CSD Home | Important Links | SCS Admin Services | SCS Home

 

 

GENERAL INFO
  About
News Page
Faculty Positions Available 
  SCS Calendar
EDUCATION
Ph.D. in CS
B.S. in CS
M.S in CS
Doctoral Catalog
RESEARCH
Faculty Research Guide
Areas of Research
Undergraduate Research
Publications
PEOPLE
Who's Who
   Faculty List
  Administrative Staff
Grad Student Directory
Undergrad Student Site
  CSD On the Road
   

 

Search SCS

google

Research Areas - Large-Scale Distributed Systems Research in the Computer Science Department at Carnegie Mellon

 

CSD faculty: Anastassia Ailamaki, Dave Andersen , Greg Ganger (ECE), Phil Gibbons (Intel), Garth Gibson, Mor Harchol-Balter, David O'Hallaron, Chris Olston, Mike Reiter, M. Satyanarayanan, Srini Seshan, Peter Steenkiste, John Wilkes (HP Labs), Haifeng Yu (Intel), Hui Zhang

 

In the last few decades, computing has spread throughout our society to become an essential part of our lives. Today, many applications follow a distributed computing paradigm in which users access a distributed service infrastructure. Examples range from simple applications such as web browsing to sophisticated e-commerce and game applications. While distributed computing has been around since the early days of the DARPA net, the scale of today’s service infrastructure’s is unprecedented. At the same time we are seeing that embedded systems, which were typically stand-alone systems, are becoming part of the global infrastructure. The rapid deployment of sensors and sensor networks in many environments further blurs the line between embedded systems and “open” distributed systems.

In this section we focus on the distributed service infrastructure used in today’s distributed systems, thus complementing the client-centric view taken in Section Mobile and Pervasive Computing. The research agenda is driven by the critical role the distributed service infrastructure plays in today’s society. While high performance remains an important goal, system properties such as high availability, security and privacy, and manageability are in practice more important. These requirements are often bundled in the term “trustworthy computing,” meaning that the system should predictably and consistently meet the user’s requirements along each of the above dimensions. Trustworthy computing in large-scale distributed computing environments is a major research area at Carnegie Mellon.

Carnegie Mellon has a rich tradition in the area of distributed systems, with early work in parallel and distributed computers (CM*, iWarp), distributed file systems (AFS), and cluster computing (Nectar). This research was characterized by an empirical, application-driven approach: the research addressed pressing application needs and developed prototypes that could be used and evaluated by users. This research style continues to drive today’s research. In the last 5-10 years, we have also developed a “foundations of systems” research thrust that complements, but is integrated with, this empirical approach. It uses mathematical tools to resolve fundamental challenges in systems.

In the remainder of this section, we elaborate on recent and ongoing distributed systems research in our department.. The research is organized in four categories: data centers, information retrieval, peer-to-peer systems, and foundations. While the research is very diverse, “self-management” is a common theme shared by many of the projects. Self-management means that the system automatically optimizes one or more system features, in response to changes in the load, system status, and environment. This trend towards self-managing systems is caused by several factors. A first factor is cost: management is the dominant cost in computing systems, so self-management can lead to significant cost reductions. A second factor is complexity: today’s systems are so complicated that manual management is impractical and is likely to result in poor performance.

We focus on projects in CSD. However, many projects involve faculty, students, and staff from the Department of Electrical and Computer Engineering, Intel Research Pittsburgh, and other units in the School of Computer Science.

 

1 Data Centers

A significant research effort in data centers is taking place in the context of the Parallel Data Lab, the premier academic research center in data storage (Ganger, Gibson, Schlosser). In the last 10 years, the PDL has made research contributions in areas ranging from file systems, databases, survivable storage, self-securing devices, and MEMS-based storage. Based on this earlier work, the PDL has recently started a new broad effort in the area of self-managing storage systems. The motivation is that human administration of storage systems is a large and growing issue in modern IT infrastructures. PDL is exploring new storage architectures that integrate automated management functions and simplify the human administrative task. Self-* storage systems are self-configuring, self-organizing, self-tuning, self-healing, self-managing systems of storage bricks. Borrowing organizational ideas from corporate structure and technologies from AI and control systems, self-* storage should simplify storage administration, reduce system cost, increase system robustness, and simplify system construction.

A number of research efforts are also addressing the question of how to manage computation and services in large-scale distributed computing environments. The main challenges in this area are the development of a distributed infrastructure that can support services in a trustworthy fashion, and resource management algorithms that optimize service quality.

“Virtualization” is a key technique used to control resources in distributed computing infrastructures. It is traditionally used as a technique to meet application quality of service (QoS) requirements. Carnegie Mellon has done significant QoS research both for networks (discussed in the Networking Section) and processors. For example, the RT OS kernel provides QoS guarantees for CPU resources in a Linux environment. The more recent emergence of light-weight virtual machines has broadened the use of virtualization. VM technology is for example the enabling technology for Internet Suspend/Resume (Mobile and Pervasive Computing), but it can also be applied inside the service infrastructure. One example is the use of virtual machines to provide isolation between applications with respect to security and faults in large computing clusters.

The emergence of sensor networks as a new type of infrastructure has resulted in new requirements, including power-awareness and small kernel size. Leveraging the RT kernel, the NanoOS project has developed new techniques in this area (Rajkumar).

The Libra project is looking at how to develop a general, reusable infrastructure for service optimization and deployment, considering that these operations tend to be highly service-specific (Steenkiste). The approach being pursued is to have service developers write “service recipes” that capture the services-service specific knowledge (e.g., optimization metrics, desirable adaptation behavior, and so on). A generic service infrastructure uses these recipes to automatically instantiate, optimize, and adapt the services. This work leverages ideas from the Carnegie Mellon Rainbow project (Section Software Engineering).

 

2 Information Access

Another research direction is optimizing the client accesses to information services. More details on this appear in the description of Mobile and Pervasive Computing, but many applications can benefit from infrastructure-driven optimizations. Depending on the application domain, different challenges must be addressed.

The first research direction is a variant on an old idea (caching) (Kozuch, Helfrich, Satyanarayanan, Andersen). The new twist is to use the cryptographic hash of data blocks as a means of identifying contents. The main feature of this approach is semantically neutral, so it can be applied in many different contexts. Example storage-oriented applications include using portable storage for caching in Coda and improving WAN access to databases. The same idea can also be used to support data transport over networks with intermittent and highly variable connectivity. The Data-Oriented Transfer (DOT) system breaks the data stream into individual chunks that are addressed by the hash of their content. The sender transmits these hashes to the receiver before transmitting the actual data. Using the hashes, the receiver can retrieve the data from either the original sender or from other sources.

Databases play an important in role in today’s service infrastructure, e.g., many web applications make heavy use of back-end databases. As a result, optimizing and scaling of databases is an important research problem. One approach is to develop scalable distributed databases, which is being explored in the StagedDB project (see Databases). An alternative approach is being explored in the S3 project, which aims to allow Web application providers to outsource “database scalability” to a third party. The idea is to develop techniques that will support the selective caching of database contents, making it possible to scale the delivery of dynamic Web contents by relying on third-party content-providers, similar to what is done for static contents today. We have developed an initial working prototype based on a network of cooperating proxy cache nodes. Challenges include developing consistency models that are fiexible, “load-aware” cache management policies that avoid placing excessive load on application servers, and developing means of ensuring data security and privacy while still permitting efficient and effective consistency management.

Other information retrieval applications are CPU-limited. The Diamond project (Helfrich, Satyanarayanan, Steenkiste, Sukthankar) has developed a distributed application that supports interactive exploration of non-indexed data sets. The search is specified as a set of filters that can be dynamically mapped on different computing resources, e.g., active disks, compute servers, or the users PC. Challenges include mapping of the filters on available resources, runtime adaptation, and cross-query optimizations.

 

3 Peer-to-peer Services

The research described so far has focused on services that hosted by data centers. For many services, especially storage-oriented services, this is a natural solution since it simplifies management and helps in meeting requirements such as consistency and persistence. However, other services do not have such requirements (i.e., they have only short-term state) they use or are naturally distributed (e.g., video broadcast or sensor networks). For such services, fully-distributed implementations are an attractive solution since they avoid single-points of failures, scale naturally, and can leverage resources from users, thus avoiding the need for expensive data centers. Such fully distributed, peer-to-peer applications often borrow ideas from networking, such as overlay networking, and they often sit at the boundary between networking and distributed computing. The Networking section describes a number of network-oriented peer-to-peer research efforts. In this section, we focus on research in application-oriented peer-to-peer systems.

A first category of peer-to-peer projects includes applications that are inherently distributed. One example is the End System Multicast (ESM) project that aims to make real-time multicast video over the Internet as ubiquitous and easy as publishing a web page or making a telephone call. Traditional broadcast media such as satellite and cable are expensive because they rely on special-purpose communication infrastructures. Broadcasting over the Internet is possible, but today it requires powerful servers, which managed by third-party content providers (e.g., Akamai). Video broadcast using ESM leverages the bandwidth and processing resources of the machines participating in the broadcast, and has the potential to allow any source machine to broadcast to any group of receiver machines over the Internet. Our prototype system has been operational for more than 2 years and has been used to successfully broadcast many events, including a campaign rally by a 2004 presidential candidate at Carnegie Mellon. These broadcasts are not only important proof-of-concept demonstrations, but they also are an indispensable tool for our ongoing research on overlay multicast .

The availability of cheap sensors has made sensor networks a hot area, but most research efforts are focusing on embedded applications — for example, closed networks of sensor motes have been used to monitor wildlife. In contrast, the IrisNet project (Gibbons, Seshan, Yu) was the first to focus on software for a new general-purpose, shared sensor infrastructure in which sensor nodes, such as webcams and microphones, are attached directly to the Internet. The goal is to enable the easy development and deployment of a wide variety of interesting new sensor-enriched services. One challenge is the collection, filtering and processing of sensor data from sensing devices shared by a number of different services. A second challenge is the development of a distributed, XML database-based infrastructure for the storage of sensor readings. Queries are supported by a set of algorithms that efficiently route queries and cache information in any distributed XML database.

Another class of peer-to-peer applications uses distributed data structures such as Distributed Hash Tables (DHTs) to support efficient access to data in a scalable fashion. One example is the Camel publish-subscribe system. The goal of Camel is to support rich queries for realistic (i.e., skewed) workloads in a scalable fashion. A first component of Camel is a load balancing algorithm that dynamically recruits more nodes to support registrations and queries for popular items. A second element is a set of protocols that support range queries and multi-dimensional nearest neighbor queries by embedding either range search tree or a KD-tree into a DHT. Both mechanisms operate in a fully distributed fashion.

Mercury (Seshan) targets peer-to-peer multiplayer games and is developing naming/data lookup building blocks that are useful for building real applications, specifically multi-attribute, range-based lookup of data. Mercury leverages the idea of distributed sampling of the system workload, which enables looking up data in an n node system in O(log n) hops, effective load-balancing across all nodes, and selectivity estimation on the parts of a multi-attribute query. Using this basic multi-attribute, range-based lookup primitive, we have constructed a novel hybrid distributed object repository that that forms the basis for the implementation of a P2P version of the popular Quake 2 game that supports hundreds of players. We believe that the Mercury design can actually support distributed implementations of a wide variety of applications (e.g., databases, service discovery and financial data filtering).

 

4 Foundations of systems

Systems research often relies on empirical techniques for evaluation. However, many aspects of distributed computing can benefit from mathematical analysis. We briefiy discuss foundations of systems research on load balancing and fault tolerance in this section. Several other sections (formal methods, principles of programming) describe additional foundations of systems research.

Cycle stealing, whereby one server can improve its performance by ”stealing” the idle cycles of another server, is a key performance optimization in many types of distributed systems. However, until recently, it was not possible to mathematically compute the benefit of cycle stealing to the beneficiary server and to determine the optimal threshold. Harchol-Balter developed a new analytical technique, called Dimensionality Reduction, which allows the transformation of a 2D-infinite Markov chain into a 1D-infinite Markov chain, without truncation and with very little loss in accuracy. This technique has resulted in the first formal analysis of many common distributed computing problems involving dependencies between servers including: cycle stealing, priority queueing in multiserver systems, and sharing in networks with job affinities.

System availability, i.e. the ability to continuously provide uninterrupted service to the users, is a key feature of today’s computing systems. Despite its importance, availability research is still in a preliminary state. For example, fundamental results, such an understanding of component failure properties, and analytical availability studies of modern systems are missing. Gibbons, Yu, and Seshan have worked on the availability of large-scale distributed systems in the context of the Iris-Net project. In order to understand machine failure characteristics, their work starts by analyzing several real-world large-scale distributed systems. This analysis shows that there exists significant failure correlation in the systems we studied, and ignoring such failure correlation can result in orders of magnitude error in the availability evaluation. These researchers have also developed analytical techniques to improve availability solutions. Using Internet measurement traces, they observed that traditional quorums systems are overly pessimistic regarding the possibility of false failure detection and developed the concept of signed quorum systems to improve availability. In the second effort, they used analytical approaches to study and improve the availability of user tasks that access multiple data objects.

Additional research looks at techniques to improve the performance of quorum systems (Yu). One direction is the design of structures for accessing minimal portions of a distributed system while still achieving consistent semantics in quorum systems. Another research direction is making systems tolerant to Byzantine faults. Since a Byzantine fault makes no assumptions about the behavior of a faulty process, i.e. it can act arbitrarily maliciously, it can be used to model security compromises. Finally, an important problem is the consistency semantics for data access protocols, i.e., what does it mean for a system to be ”correct” in the face of partial failures and concurrent accesses, and what are protocols for implementing these?

 

 

 

      CSD Home   Webteam  ^ Top   SCS Home