Dimitris Margaritis

Learning Bayesian Network Model Structure from Data

Degree Type: Ph.D. in Computer Science
Advisor(s): Sebastian Thrun
Graduated: May 2003

Keywords: Bayesian networks, Bayesian network structure learning, continuous variable independence test, Markov blanket, causal discovery, DataCube approximation, database count queries

Abstract

In this thesis I address the important problem of the determination of the structure of directed statistical models, with the widely used class of Bayesian network models as a concrete vehicle of my ideas. The structure of a Bayesian network represents a set of conditional independence relations that hold in the domain. Learning the structure of the Bayesian network model that represents a domain can reveal insights into its underlying causal structure. Moreover, it can also be used for prediction of quantities that are difficult, expensive, or unethical to measure -- such as the probability of lung cancer for example -- based on other quantities that are easier to obtain.

The contributions of this thesis include (a) an algorithm for determining the structure of a Bayesian network model from statistical independence statements; (b) a statistical independence test for continuous variables; and finally (c) a practical application of structure learning to a decision support problem, where a model learned from the database -- most importantly its structure -- is used in lieu of the database to yield fast approximate answers to count queries, surpassing in certain aspects other state-of-the-art approaches to the same problem.

Thesis Committee

Sebastian Thrun (Chair)
Christos Faloutsos
Andrew W. Moore
Peter Spirtes
Gregory F. Cooper (University of Pittsburgh)

Randy Bryant, Head, Computer Science Department
James Morris, Dean, School of Computer Science

Thesis Document

CMU-CS-03-153.pdf (1.05 MB) (126 pages)