Thursday, December 17, 2015

GPU off Apache Spark roadmap: Deeplearning4j best bet for Spark GPU


Last night, Reynold Xin took SPARK-3785 "Support off-loading computations to a GPU" off the Apache Spark roadmap, marking it "Closed" with a resolution of "Later". This is a little different from June 2015, when GPU support was mentioned at Spark Summit as a possibility for Project Tungsten in 1.6 and beyond.
So for now, the best bet for using GPUs on Spark is Deeplearning4j, the source of the architecture diagram above. As I've blogged previously, the DL4J folks are waiting until they have solid benchmarks before advertising them. Nevertheless, you can already do deep learning on GPU-powered Spark today.

Tuesday, December 1, 2015

Free book excerpt: Semi-Supervised Learning With GraphX

Manning Publications has made available for free an excerpt from my book Spark GraphX In Action. The excerpt is entitled Poor Man’s Training Data: Graph-Based Semi-Supervised Learning and shows how to:
  • Construct a graph from a collection of points using a K-Nearest Neighbors Graph Construction algorithm (not to be confused with KNN machine learning prediction, which actually gets used below).
  • Do the above in a way optimized for distributed computing.
  • Propagate labels to unlabeled nodes to achieve semi-supervised learning.
  • Make predictions from the trained model (using conventional KNN machine learning prediction).
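
To make the steps in the list above concrete, here is a minimal single-machine sketch in plain Python. It is not the GraphX code from the excerpt and skips the distributed optimization entirely. The function names (knn_graph, propagate_labels, predict), the brute-force neighbor search, and the majority-vote propagation rule are all assumptions chosen for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_graph(points, k):
    """Brute-force KNN graph construction: map each point index to the
    indices of its k nearest other points. (Illustrative only; O(n^2).)"""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        graph[i] = others[:k]
    return graph


def propagate_labels(graph, seed_labels, max_rounds=10):
    """Assumed propagation rule: in each round, every unlabeled node takes
    the majority label among its already-labeled neighbors."""
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node, neighbors in graph.items():
            if node in labels:
                continue
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if votes:
                updates[node] = votes.most_common(1)[0][0]
        if not updates:  # nothing changed, so propagation has converged
            break
        labels.update(updates)
    return labels


def predict(points, labels, query, k):
    """Conventional KNN prediction: majority label of the k labeled
    points closest to the query."""
    nearest = sorted(labels, key=lambda i: dist(query, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Two well-separated clusters, with only one labeled point in each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
graph = knn_graph(points, k=2)
labels = propagate_labels(graph, {0: "a", 4: "b"})
prediction = predict(points, labels, (0.5, 0.5), k=3)
```

With just two seed labels, propagation fills in the remaining six points, and the query near the first cluster is then classified from the propagated labels. That is the "poor man's training data" idea in miniature.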
And as part of Manning's site-wide MEAP sale for Cyber Monday week, the MEAP is 50% off today using the code dotd120115.
My co-author, Robin East, and I finished the second draft this past weekend, so the print version should be available in Q1 2016.