Coming from a SQL background and learning R, I wanted to do the equivalent of GROUP BY ... HAVING (really, embedded in a subquery in order to subset the data), but the most obvious Google searches turned up nothing. The answer is probably a no-brainer for R experts, but here it is in case future R novices who are SQL experts search for it.
Taking the example data set chickwts,
> data(chickwts)
> summary(chickwts)
     weight             feed
 Min.   :108.0   casein   :12
 1st Qu.:204.5   horsebean:10
 Median :258.0   linseed  :12
 Mean   :261.3   meatmeal :11
 3rd Qu.:323.5   soybean  :14
 Max.   :423.0   sunflower:12
and supposing we want to exclude "low-popularity" feeds, meaning those that occur in the data set fewer than 12 times (yes, this is a contrived example), the code below discards them, leaving only the high-popularity feeds.
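# count the rows for each feed (the GROUP BY ... COUNT(*) step)
x <- sapply(split(chickwts, chickwts$feed), nrow)
# keep only the rows whose feed occurs at least 12 times (the HAVING step)
chickwts <- chickwts[chickwts$feed %in% names(x[x >= 12]), ]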
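For comparison, here is a minimal sketch of the same filter using table() for the per-group counts instead of split() and sapply(). The variable names counts, keep, and popular are my own, and the droplevels() step is an optional extra that is not part of the code above: feed is a factor, so the discarded feeds would otherwise linger as empty levels in later summaries.

```r
# Rough SQL equivalent:
#   SELECT * FROM chickwts WHERE feed IN
#     (SELECT feed FROM chickwts GROUP BY feed HAVING COUNT(*) >= 12)
data(chickwts)

# table() counts the rows per feed, much like COUNT(*) with GROUP BY feed
counts <- table(chickwts$feed)

# names of the feeds that meet the threshold (the HAVING clause)
keep <- names(counts)[counts >= 12]

# subset to those feeds (the outer WHERE feed IN (...))
popular <- chickwts[chickwts$feed %in% keep, ]

# optional: drop the now-empty factor levels for the discarded feeds
popular <- droplevels(popular)
```

Writing the result to a new data frame rather than back to chickwts keeps the original data around, which is handy when experimenting with different thresholds.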