Benjamin Lengerich
Sample-Specific Models for Precision Medicine

Degree Type: Ph.D. in Computer Science
Advisor(s): Eric Xing
Graduated: December 2020
Keywords: Personalized Machine Learning, Sample-Specific Models, Precision Medicine

Abstract

Modern applications of artificial intelligence are often characterized by training large machine learning (ML) models on large datasets. These datasets are composed of overlapping groups of samples, either explicitly (e.g. the large dataset is created by combining multiple datasets) or implicitly (e.g. the samples belong to latent sub-populations). Population models prefer weakly-predictive global patterns over highly-predictive localized effects, a problem because localized effects are critical to understanding complex processes such as in applications to computational biology (in which samples come from latent cell types) and precision medicine (in which patients come from latent disease subtypes).

In this thesis, we propose that: The performance of intelligent computer systems can be improved by treating different samples as different tasks. This is especially helpful in domains such as computational biology and precision medicine, in which we care about understanding the highly specific context of each sample.

We propose to solve this problem by estimating a collection of many small models. For large collections, each model is responsible for only a small number of samples, enabling simultaneous interpretability and accuracy. As we show in this thesis, this framework can be scaled to estimate different model parameters for every sample.

This thesis begins by studying the challenges of characterizing real-world data with population-level models. Next, we develop the methodology of Personalized Regression.
Finally, we apply sample-specific inference to computational biology and precision medicine by: (1) Identifying Discriminative Subtypes of Cancers from Histopathology Images and (2) Cell-Specific Transcriptomic Regulatory Network Inference.

Thesis Committee
Eric P. Xing (Chair)
Zico Kolter
Ziv Bar-Joseph
Manolis Kellis (Massachusetts Institute of Technology)
Rich Caruana (Microsoft Research)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Thesis Document: CMU-CS-20-139.pdf (27.67 MB) (103 pages)