Di Wang

Static Analysis of Probabilistic Programs: An Algebraic Approach Degree Type: Ph.D. in Computer Science
Advisor(s): Jan Hoffmann
Graduated: May 2022

Abstract:

Probabilistic programs are programs that can draw random samples from probability distributions and involve random control flows. They are becoming increasingly popular and have been applied in many areas such as algorithm design, cryptographic protocols, uncertainty modeling, and statistical inference. Formal reasoning about probabilistic programs comes with unique challenges, because it is usually not tractable to obtain the exact result distributions of probabilistic programs. This thesis focuses on an algebraic approach for static analysis of probabilistic programs. The thesis first provides a brief background on measure theory and introduces an imperative arithmetic probabilistic programming language Appl with a novel hyper-graph program model. Second, the thesis presents an algebraic denotational semantics for Appl that can be instantiated with different models of nondeterminism. The thesis also develops a new model of nondeterminism that involves nondeterminacy among state transformers and presents a domain-theoretic characterization of the new model. Based on the algebraic denotational semantics, the thesis proposes a general algebraic framework PMAF for designing, implementing, and proving the correctness of static analyses of probabilistic programs. The thesis also includes a concrete static analysis–central-moment analysis for cost accumulators in probabilistic programs–and elaborates implementation strategies to improve the usability and efficiency of the analysis. There is a gap between the general PMAF framework and the central-moment analysis, in the sense that the former is based on abstraction and iterative approximation, but the latter is based on constraint solving. The thesis provides some preliminary results on bridging the gap, via the development of novel regular hyper-path expressions, which finitely represent possibly-infinite hyper-paths on control-flow hyper-graphs of probabilistic programs without nondeterminism, and DMKAT algebraic structures, which can be used to interpret regular hyper-path expressions. Future directions for extending the research covered by this thesis include developing an algebraic static-analysis framework based on DMKAT and instantiating it to perform central-moment analysis, generalizing DMKAT with support for nondeterminism and formalizing an equational axiomatization for the generalized algebraic framework, as well as applying the analysis framework developed in the thesis to the analysis and verification of statistical guarantees for Bayesian probabilistic programming.

Thesis Committee:
Jan Joffmann (Chair)
Matt Fredrikson
Stephen Brookes
Thomas Reps (University of Wisconsin-Madison)
Hongfei Fu (Shanghai Jiao Tong University)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Probabilistic programs, Program analysis, Denotational semantics, Markov algebras, Control-flow hyper-graphs

CMU-CS-22-109.pdf (1.33 MB) ( 185 pages)
Copyright Notice