Technical Tidbit of the Day: SQL HAVING in R

Sunday, January 20, 2013

SQL HAVING in R

Coming from a SQL background and now learning R, I wanted to do the equivalent of GROUP BY ... HAVING (really, a GROUP BY ... HAVING embedded in a subquery, used to subset the data), but the most obvious Google searches turned up nothing. The answer is probably a no-brainer for R experts, but here it is in case future SQL experts who are new to R Google for it.

Taking the example data set chickwts,

> data(chickwts)
> summary(chickwts)
     weight             feed
 Min.   :108.0   casein   :12
 1st Qu.:204.5   horsebean:10
 Median :258.0   linseed  :12
 Mean   :261.3   meatmeal :11
 3rd Qu.:323.5   soybean  :14
 Max.   :423.0   sunflower:12

and supposing we want to exclude "low-popularity" feeds that occur in the data set fewer than 12 times (yes, this is a contrived example), the code below discards those feeds, keeping only the feeds that have at least 12 rows.

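# Count the rows in each feed group (the GROUP BY / COUNT(*) step).
x <- sapply(split(chickwts, chickwts$feed), nrow)
# Keep only rows whose feed has at least 12 rows (the HAVING step).
chickwts <- chickwts[chickwts$feed %in% names(x[x >= 12]), ]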
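
For comparison, here is a sketch of the same filter written with table() and with ave(), both from base R. The variable names (feed_counts, popular_feeds, popular, popular2, group_size) are made up for illustration, and the block reloads chickwts so it starts from the full data set rather than the already-filtered one. Treat it as one possible alternative, not as the canonical way to do this.

```r
# Reload the built-in data set so the example starts from all 71 rows.
data(chickwts)

# Count rows per feed, roughly: SELECT feed, COUNT(*) ... GROUP BY feed
feed_counts <- table(chickwts$feed)

# Names of the groups that pass the condition, roughly: HAVING COUNT(*) >= 12
popular_feeds <- names(feed_counts[feed_counts >= 12])

# Subset the rows, roughly: WHERE feed IN (subquery).
# droplevels() removes the factor levels that no longer have any rows.
popular <- droplevels(chickwts[chickwts$feed %in% popular_feeds, ])

# One-step variant: ave() gives every row the size of its own group,
# so the condition can be applied to the rows directly.
group_size <- ave(chickwts$weight, chickwts$feed, FUN = length)
popular2 <- droplevels(chickwts[group_size >= 12, ])
```

Both popular and popular2 should contain only the casein, linseed, soybean and sunflower rows. The ave() variant avoids the intermediate list of names, which may be handy when the condition depends on something other than a plain count (for example, FUN = mean on the weights).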
