Friday, April 27, 2018 - 12:00pm to 1:00pm
Location: Traffic21 Classroom 6501 Gates Hillman Centers
Speaker: QING ZHENG, Ph.D. Student http://www.cs.cmu.edu/~qingzhen/
In this talk, Qing introduces the Indexed Massive Directory, a new technique for indexing data within DeltaFS. DeltaFS is a software-defined user-space file system optimized for HPC systems and workloads. The Indexed Massive Directory is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files. We achieve this through a memory-efficient I/O pipeline to reorder data, and a log-structured storage layout to pack small writes into large log objects, all while ensuring compute node resources are used frugally. We demonstrate the efficiency of this indexing mechanism through VPIC, a widely used simulation code that scales to trillions of particles. With DeltaFS, we modify VPIC to create a file for each particle to receive writes of that particle's output data. Dynamically indexing the directory's underlying storage allows us to achieve a 5000x speedup in single-particle trajectory queries, which require reading all data for a single particle. This speedup increases with application scale, while the overhead is approximately a 15% increase in I/O time.
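As a rough illustration of the idea in the abstract (packing many small per-particle writes into one large log and indexing them as they are written), the toy sketch below may help. It is not DeltaFS code and does not reflect its actual API or on-disk format: the class `ToyIndexedLog`, the helper `simulate`, and the record encoding are all hypothetical names chosen for this example, and everything is kept in memory.

```python
from collections import defaultdict


class ToyIndexedLog:
    """Hypothetical in-memory stand-in for a log-structured, indexed directory."""

    def __init__(self):
        # One large append-only log object that receives every small write.
        self.log = bytearray()
        # Index built at write time: particle id -> list of (offset, length).
        self.index = defaultdict(list)

    def append(self, particle_id, payload):
        """Append one small record to the log and remember where it landed."""
        offset = len(self.log)
        self.log += payload
        self.index[particle_id].append((offset, len(payload)))

    def trajectory(self, particle_id):
        """Return all records of one particle by reading only its indexed ranges."""
        ranges = self.index.get(particle_id, [])
        return [bytes(self.log[off:off + n]) for off, n in ranges]


def simulate(num_particles, num_steps):
    """Write one record per particle per timestep, as a simulation dump might."""
    store = ToyIndexedLog()
    for step in range(num_steps):
        for pid in range(num_particles):
            # Assumed record encoding, for illustration only.
            store.append(pid, f"{pid}:{step};".encode())
    return store


if __name__ == "__main__":
    store = simulate(num_particles=3, num_steps=2)
    print(store.trajectory(1))
```

In this sketch a single-particle query touches only the byte ranges recorded for that particle rather than scanning the whole log, which is the intuition behind the query speedup described above; the memory-efficient reordering pipeline mentioned in the abstract is not modeled here.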
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement