Sara McAllister
Toward Sustainable Datacenters through Efficient Data Retrieval
Degree Type: CS
Advisor(s): Nathan Beckmann, Gregory R. Ganger
Graduated: August 2025
Keywords: Datacenters, sustainability, storage, caching, flash, hard disk drives

Abstract

Datacenters are projected to account for 33% of global carbon emissions by 2050. As datacenters increasingly rely on renewable energy for power, the majority of datacenter emissions will be embodied: emissions from life-cycle stages such as acquiring raw materials, manufacturing, transportation, and disposal. To reach the ambitious emission-reduction goals set by both companies and governments, datacenters need to reduce emissions throughout their operations, including (and particularly relevant for this thesis) the storage system. Unfortunately, although data storage and retrieval systems are large contributors to embodied emissions, reducing their embodied emissions has largely been overlooked.

This dissertation addresses how to reduce emissions in data retrieval for large-scale storage systems. These storage systems can reduce their carbon footprint by enabling storage devices to have longer lifetimes and to use denser media. However, storage hardware's IO limits, combined with unnecessary additional IO from software, often severely restrict emission reductions or, at worst, increase emissions. Thus, this thesis focuses on reducing IO in several parts of the storage stack to enable efficient and sustainable data retrieval.

First, this dissertation addresses the sustainability of flash caching, a critical layer in datacenter storage systems that is limited by flash write endurance. It improves flash caching through two caching systems: Kangaroo and Fairy-WREN. Together, these caches reduce writes by over 28x, allowing flash caches to use denser flash and to keep devices in service longer, ultimately reducing emissions.
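Why fewer writes translate into longer lifetimes and denser media can be sketched with a simple endurance model. This is a minimal sketch under stated assumptions, not the model used in the thesis. It assumes a device can absorb roughly capacity × rated program/erase (P/E) cycles of physical writes before wearing out, and that write amplification multiplies host writes into physical writes. The function name `flash_lifetime_years` and every number below (capacity, write rate, P/E ratings) are hypothetical and chosen only for illustration.

```python
DAYS_PER_YEAR = 365.0


def flash_lifetime_years(capacity_tb: float,
                         rated_pe_cycles: float,
                         host_writes_tb_per_day: float,
                         write_amplification: float = 1.0) -> float:
    """Estimate years until a flash device exhausts its rated P/E budget.

    Simplified (assumed) model: the device tolerates capacity * rated_pe_cycles
    of physical writes in total, and each host write becomes
    write_amplification physical writes.
    """
    if host_writes_tb_per_day <= 0 or write_amplification <= 0:
        raise ValueError("write rate and write amplification must be positive")
    total_budget_tb = capacity_tb * rated_pe_cycles
    physical_tb_per_day = host_writes_tb_per_day * write_amplification
    return total_budget_tb / physical_tb_per_day / DAYS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not measurements from the thesis.
    capacity_tb = 2.0
    writes_tb_per_day = 20.0  # 10 full-drive writes per day

    baseline = flash_lifetime_years(capacity_tb, 3000, writes_tb_per_day)
    # Same device after a 28x reduction in writes.
    reduced = flash_lifetime_years(capacity_tb, 3000, writes_tb_per_day / 28)
    # Denser, lower-endurance flash (hypothetical 1000 P/E cycles) at the reduced rate.
    denser = flash_lifetime_years(capacity_tb, 1000, writes_tb_per_day / 28)

    print(f"baseline: {baseline:.2f} years")
    print(f"28x fewer writes: {reduced:.2f} years")
    print(f"denser flash, 28x fewer writes: {denser:.2f} years")
```

In this model lifetime scales inversely with the write rate, so a 28x write reduction stretches the same device's lifetime by 28x. Alternatively, that headroom can be spent on denser flash with a third of the rated P/E cycles while still lasting several years under these hypothetical inputs.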
Then, this thesis enables more sustainable bulk storage, where bandwidth limitations prevent deployment of denser HDDs. Declarative IO, a new interface for distributed storage, empowers the storage system to eliminate duplicate IO in maintenance tasks by exposing the flexibility these tasks have in the timing and order of their accesses. This work enables deployment of larger HDDs, further reducing emissions from storage systems.

Thesis Committee
Nathan Beckmann (Co-Chair)
Gregory R. Ganger (Co-Chair)
George Amvrosiadis
Daniel S. Berger (Microsoft Azure / University of Washington)
Margo Seltzer (University of British Columbia)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Thesis Document
CMU-CS-25-126.pdf (16.71 MB) (159 pages)
Creative Commons: CC-BY (Attribution)