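Helpfully, the e1071 package (best known for its support vector machine implementation) provides a function, skewness(), that measures the skewness of data. Below is a function that automatically deskews a range of columns of a data frame: it applies a shifted log transform to each column and keeps the result only if that lowers the absolute skewness by more than the given threshold factor.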
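    library(e1071)  # provides skewness()

    deskew <- function(df, mincol=1, maxcol=ncol(df), threshold=1.10) {
      for (i in mincol:maxcol) {
        # Shift the column so its minimum maps to log(1) = 0, then take the log
        t <- log(1 + df[[i]] - min(df[[i]]))
        # Keep the transformed column only if it is clearly less skewed
        if (abs(skewness(df[[i]])) > threshold * abs(skewness(t)))
          df[[i]] <- t
      }
      df
    }

Deskewing data can improve the performance of linear models, both regular lm()/glm() and linear svm() support vector machines. Understandably, it doesn't help with decision trees such as rpart() or random forests: trees split on the ordering of values, and a monotonic transform like the log leaves that ordering unchanged.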
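To see what the per-column test in deskew() is doing, here is a minimal sketch on a single synthetic column. It assumes nothing beyond base R: moment_skewness() is a hypothetical hand-rolled stand-in for e1071's skewness() (whose default estimator differs by a small sample-size correction), and the exponential data and the seed are made up for illustration.

```r
# Hypothetical stand-in for e1071::skewness(): third central moment divided by
# the 1.5th power of the second central moment.
moment_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

set.seed(42)                 # arbitrary seed, only for reproducibility
x <- rexp(1000, rate = 1)    # synthetic right-skewed column (an assumption)

# Same shifted log transform as in deskew(): the minimum maps to log(1) = 0
logged <- log(1 + x - min(x))

raw_skew <- moment_skewness(x)
log_skew <- moment_skewness(logged)

# Same decision rule as deskew(), with its default threshold of 1.10
threshold <- 1.10
keep_transformed <- abs(raw_skew) > threshold * abs(log_skew)

print(c(raw = raw_skew, logged = log_skew))
print(keep_transformed)
```

The threshold above 1 means the transform has to reduce skewness by a margin before it replaces the original column, so columns that are already roughly symmetric are left alone.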