In broad terms, I pursue research in large-scale parallelism in computer systems and its implications for operating systems and computer architecture. My particular interests are large-scale clustering technologies, parallel and distributed file systems, storage and system area networking, and the design and optimization of secondary memory technologies such as magnetic disk and flash storage. I have a strong interest in shepherding technological advances from blackboard to commercial reality and widespread use. All of my research is housed in CMU's Parallel Data Laboratory (www.pdl.cmu.edu).
My early research on redundancy in parallel storage systems, called RAID, spawned a storage industry revolution; RAID is now a checklist requirement in a $15+ billion marketplace. My research on network-attached secure disks (NASD) is shaping new storage technologies, including SCSI Object Storage Devices (OSD), high-performance IP-based storage, and the IETF Parallel NFS file system standards. NASD graduate students have gone on to shape Google's file system and database software and Seagate's next-generation storage devices. By founding Panasas in 1999, I have also been driving the realization and deployment of these technologies into the mainstream of high-performance storage. For example, the world's first PetaFLOP (and, as of July 2008, the world's fastest) computer is Los Alamos' Roadrunner cluster of Opteron nodes and 64b Cell accelerators. The primary storage system for Roadrunner is a Panasas storage cluster of about 1800 object storage servers, bound together as a single distributed system that employs novel RAID techniques and is virtualized as one storage pool.
On a broader note, I play a leadership role in academic and industrial storage system developments. I sit on the steering committee of the leading storage systems conference, the USENIX Conference on File and Storage Technologies (FAST). I have sat on the technical council of the Storage Networking Industry Association. I also chair the IEEE technical field award for information storage systems contributions. Of late I have been reviewing my research with international communities: I have recently spoken in China, Britain, Germany, Canada, and Israel, and I have joined a scientific advisory board for a storage systems institute in Singapore.
DOE Petascale Data Storage Institute: In this institute, chartered by the Office of Science at the US Department of Energy, I lead a team of researchers from CMU, U. Michigan, U. of California, and five National Labs: Los Alamos, Sandia, Oak Ridge, Pacific Northwest and Lawrence Berkeley. Our job is to anticipate the challenges of, and guide efforts toward, scaling high-performance storage by 100% per year to meet the needs of the world's biggest computers over the next decade (peta- to exa-scale systems).
Los Alamos Institute for Reliable High Performance Information Technology: I co-direct a partnership between CMU's Parallel Data Laboratory, CMU's Institute for Software Research, and Los Alamos National Laboratory. Projects such as database interfaces to the metadata of huge scientific file systems and software debugging tools for large-scale parallel scientific applications augment our basic goal of integrating the advanced systems thinking under way in the academic and national laboratory communities.
File Systems and Databases at Scale: My newest research directions explore the reorganization of large-scale storage services, inspired by internet services such as Google's GFS and Bigtable and Yahoo's Hadoop open source cluster software stack. Parallel file systems and unstructured databases are being reconsidered and reorganized to be much more inherently scalable. Data Intensive Scalable Computing (DISC) is a paradigm shift in how large-scale computing serves a very broad range of science, web users and industry. This is a very exciting time to be a core scalable storage systems researcher!