Sunday, November 16, 2014

Single GPU-Powered Node 4x Faster Than 50-node Spark Cluster

The above chart comes from a new dissertation out of Berkeley entitled High Performance Machine Learning through Codesign and Rooflining. Huasha Zhao and John F. Canny demonstrate that for the PageRank problem, their custom GPU-optimized matrix library they called BIDMat outperforms a 50-node Spark cluster by a factor of four. Their single GPU-powered node had two dual-GPU Nvidia cards for a total of four GPUs.

And BIDMat is just one component of their full BIDMach software stack illustrated below (illustration also from their dissertation).
Intel MKL and GPU/Cuda are of course off-the-shelf libraries. Butterfly mixing is a new 2013 technique by the same two authors that updates a machine learning model "incrementally" by using small subsets of training data and propagating model changes to neighboring subsets. They do not state it explicitly, but these network communication diagrams between the small subsets resemble the butterfly steps in the Fast Fourier Transform algorithm.

Kylix is an even newer (2014) algorithm, again by the same two authors, that further optimizes the butterfly approach by varying the degree of each butterfly node (the number of butterfly nodes each butterfly node must communicate with) in a way that is optimized for real-life power-law data distributions.

Finally, part of their overall approach is what they have coined "rooflining", which is where they compute the theoretical maximum communication and computation bandwidth, say of a GPU, and ensuring that their measured performance comes close to it. In their dissertation, they show they reach 80-90% of CPU/GPU theoretical maximums.

By doing so, the authors have turned GPU hype into reality, and have implemented numerous machine learning algorithms using their BIDMach framework. Now it remains to either make BIDMach available for commercial production use, or to incorporate the concepts into an existing cluster framework like Spark.

3 comments:

LifeVoxel.AI said...

LifeVoxel.AI has developed a Interactive Streaming and AI Platform for medical imaging using GPU clusters cloud computing. It is a leap in cloud technology platform in medical imaging that encompasses use cases in visualization, AI, image management and workflow. It’s approach is unique that it has been granted 12 International patents.

Interactive Streaming AI Platform RIS PACS

LifeVoxel.AI said...

LifeVoxel.AI platform helps imaging diagnostic centers and hospitals to save up to 50%+ over conventional RIS PACS with higher functionality. LifeVoxel.AI is the fastest RIS PACS available globally and have unimaginable capabilities of centralized PACS across all your network of Imaging Centers to single window HUB.

RIS PACS
RIS PACS software

LifeVoxel.AI said...

LifeVoxel.AI platform offers a better alternative for the medical imaging community. Its unique approach is based on 14 international patents that encompass critical patient-centric delivery of care using intuitive, diagnostic and instantly accessible visualizations over the Internet. It works 100X faster and its cheaper by 50%+ than other service providers. We are so honored to be selected by Plug-and-Play from over 120 companies to become the Alliance For Covid-19 (AFC-19) company of choice when it comes to diagnostic imaging software. Our remote 4D and AI inherent capabilities of our patented cloud platform that replaces antiquated on-premises medical imaging systems has made us achieve this important feat.

RIS PACS
RIS PACS Software