How Architecture Drove Me to Submit to an Algorithms Workshop: Lessons from the FAWN project

David Andersen, Associate Professor in Computer Science at Carnegie Mellon University



In this talk, I’ll describe our experiences building energy-efficient clusters of “wimpy” nodes. Using large numbers of low-GHz processors to accomplish big data tasks capitalizes on several fundamental realities of computer hardware: high clock frequencies are energy-inefficient; easily exploitable parallelism in programs is good; and designing systems with an intense focus on locality is similarly good.

It is also painful.

In the past four years, we’ve built a series of these “fast arrays of wimpy nodes,” containing from 20 to 85 nodes, and used them for tasks such as high-performance key-value storage on flash memory, searching text corpora for millions of search phrases at a time, and in-memory caching à la memcached. In each case, the changes required to achieve efficient, fast performance were substantial. They ranged from manual optimization and configuration changes, to software design changes, to changing and re-engineering the underlying algorithms. The engineering and programming effort for each system was large but worthwhile: the systems we built were several times more energy-efficient than their predecessors.

The talk will conclude on a note of both caution and optimism. The FAWN project, with all of its attendant challenges in software construction, is one potential harbinger of what architecture may deliver in the future: not just parallelism, but massive parallelism, mandatory locality, and very constrained per-node memory. Fortunately, continuing to provide better, faster software in such a future should keep the computer science community as a whole out of trouble for a few decades. Doing so will require simultaneous, concerted help from architecture, theory, and programming languages.


David Andersen is an associate professor in the Computer Science Department at Carnegie Mellon University. He received his Ph.D. and M.S. degrees from MIT and B.S. degrees in Computer Science and Biology from the University of Utah. Before attending MIT, he was a co-founder and CTO of an Internet service provider in Salt Lake City. His research interests center on computer systems in the networked environment.