I have broad research interests in computer systems, including cloud computing, storage/file systems, operating systems and distributed systems. I am involved in several ongoing projects in such areas as systems for large-scale ML, cloud/cluster resource scheduling, and exploitation of new storage/NVM technologies.
Big-learning systems for Big Data
Modern data analytics often relies on statistical machine learning (ML) to parameterize models that fit observed data, for use in making predictions, correlating causes with effects, etc. Growth in data and desired model precision dictate parallel execution of ML algorithms on clusters, which brings corresponding challenges in work distribution, synchronization, and data consistency. The big-learning group is exploring new approaches for efficient, scalable, and robust big-learning on Big Data.
We are exploring software systems challenges in efficiently supporting and exploiting cloud computing, such as resource allocation and scheduling, and the use of elasticity for stateful services (e.g., storage) and long-running computations (e.g., large-scale ML).
Parallel Data Lab (PDL)
As Director of the Parallel Data Lab, I lead and collaborate on a number of storage-related projects in areas such as storage system architecture, file systems, and Big Data systems. For example, in addition to the activities discussed above, we are exploring how system software should change to accommodate new storage technologies like non-volatile RAM (e.g., PCM) and to make the best use of Flash.