Desktop supercomputing is now cheap, mainstream, and mature. Using GPGPU (General-Purpose computing on Graphics Processing Units), you can write C programs that run 25x as fast as they would on a high-end desktop computer alone, for just $500 more.
The OpenCL standard, started in 2008, has matured. It gives C/C++ programs on Windows and Linux a way to compile special OpenCL C programs and load them onto GPGPUs, which are just the off-the-shelf high-end graphics cards that videogame enthusiasts buy. When you order one of these cards for your supercomputing project, expect plenty of snickers from your purchasing or shipping/receiving department when it arrives with videogame monsters on the box.
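To make "special OpenCL C programs" a little more concrete, here is a minimal sketch of what one looks like next to the ordinary C loop it replaces. The kernel name saxpy and the helper saxpy_reference are my own illustrative choices, not anything from a particular SDK. The kernel lives in the host program as a plain string; the host-side calls that hand that string to the driver (clCreateProgramWithSource and clBuildProgram in the OpenCL API) are left out here, so treat this as a picture of the programming model rather than a complete host program.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source, kept as a string inside the host C program.
   The kernel name "saxpy" is an illustrative assumption. Each work-item
   handles exactly one array element, selected by get_global_id(0). */
const char *const kSaxpySource =
    "__kernel void saxpy(const float a,\n"
    "                    __global const float *x,\n"
    "                    __global float *y)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

/* Plain C reference for the same computation. On the GPU, each value of i
   becomes a separate work-item instead of one iteration of this loop. */
void saxpy_reference(float a, const float *x, float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The point of the side-by-side is that the kernel has no loop: the card runs the body once per element, spread across its cores.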
As an example, the roughly $500 Radeon 7970 has 2048 processing cores, each running at about 1 GHz and executing on average one double-precision floating-point operation per clock cycle. Double precision is actually new to this generation of Radeon, and the OpenCL standard itself (the PDF document) hasn't even been updated yet to include the data type, even though the SDK header files have been.
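The arithmetic behind those figures is just a product, sketched below. The function name peak_gflops is hypothetical, and the result is a nominal ceiling computed from the numbers quoted above, not a benchmark.

```c
#include <assert.h>

/* Nominal peak throughput in GFLOPS:
   cores * clock (GHz) * operations per core per clock cycle.
   This is an upper bound from spec-sheet style figures, not a measurement. */
double peak_gflops(int cores, double clock_ghz, double ops_per_cycle)
{
    assert(cores > 0 && clock_ghz > 0.0 && ops_per_cycle > 0.0);
    return cores * clock_ghz * ops_per_cycle;
}
```

Plugging in the figures from the text, peak_gflops(2048, 1.0, 1.0) comes to 2048 GFLOPS, or roughly 2 TFLOPS nominal. Real programs will see less, since memory bandwidth and kernel structure usually get in the way.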
According to the freeware GPU Caps software, the Radeon 7970 by itself (without any help from my desktop computer's CPU) clocks in at 25x the computational power of that CPU, a four-core, single-processor 3.3 GHz Intel i5 2500.
Moving to a dual-processor Intel motherboard with a second Intel processor is a $1000 increment, and that buys only a 2x speedup. So a 25x speedup for a $500 increment isn't just a better deal, it's a new paradigm. As Douglas Engelbart said, a large enough quantitative change produces a qualitative change.
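One way to put numbers on that comparison is dollars per unit of speedup. This is a back-of-envelope sketch using only the prices and speedups quoted above; the function name dollars_per_speedup is hypothetical.

```c
#include <assert.h>

/* Incremental cost divided by the speedup factor it buys.
   Lower is better. Both inputs come straight from the quoted figures. */
double dollars_per_speedup(double extra_cost_dollars, double speedup_factor)
{
    assert(extra_cost_dollars >= 0.0 && speedup_factor > 0.0);
    return extra_cost_dollars / speedup_factor;
}
```

The second CPU works out to dollars_per_speedup(1000.0, 2.0), or $500 per 1x, while the graphics card works out to dollars_per_speedup(500.0, 25.0), or $20 per 1x. By this crude measure the card is 25 times cheaper per unit of speedup.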
Up to four such cards can be ganged together in a single computer for a nominal 100x speedup, assuming the work divides evenly across them. But each card is physically three slots wide (to accommodate the built-in liquid cooling and fans) even though it has just one PCIe connector, so you will need a special rack-mount motherboard to go to that extreme (note I have not tried this!).
By comparison, to go 100x in the other direction, to a computer with 1% of the computational power of my desktop i5, you would have to go back 15 years to a Pentium II. So a four-Radeon system represents a sudden 15-year leap into the future.
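As a rough sanity check on that 15-year figure, the sketch below counts how many doublings it takes to reach a given factor and spreads them over the years. The function names are hypothetical, and I round up to whole doublings to keep the arithmetic simple, so the answer is approximate.

```c
#include <assert.h>

/* Number of whole doublings needed to reach at least `factor`. */
int doublings_to_reach(double factor)
{
    assert(factor >= 1.0);
    int count = 0;
    double reached = 1.0;
    while (reached < factor) {
        reached *= 2.0;
        ++count;
    }
    return count;
}

/* Average years per doubling if a gain of `factor` took `years`. */
double years_per_doubling(double factor, double years)
{
    int n = doublings_to_reach(factor);
    assert(n > 0);
    return years / n;
}
```

A 100x gain takes seven doublings (2^7 = 128), so 15 years works out to a little over two years per doubling. Four cards compress all seven of those doublings into a single purchase.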