Rajesh Jayaram

Sketching and Sampling Algorithms for High-Dimensional Data Degree Type: Ph.D. in Computer Science
Advisor(s): David Woodruff
Graduated: August 2021

Abstract:

Sketching is a powerful technique which compresses high-dimensional datasets into lower-dimensional sketches, while preserving properties of interest. Sampling is a quintessential form of sketching, where one generates samples from the data to use as the basis for a sketch. Sketching and sampling methods have become ubiquitous across computer science, and are a central component of modern algorithm design.

This thesis studies the design and analysis of sketching and sampling algorithms for a variety of computational tasks. In particular, we make contributions to three closely related areas: Streaming and Distributed Algorithms, Numerical Linear Algebra, and Database Query Evaluation. Our contributions to these areas include:

Streaming and Distributed Algorithms:

  • The first perfect Lp samplers, near-optimal algorithms for Lp estimation for p ∈ (0, 1), improved sketches for the Earth Mover's Distance, and the introduction of the adversarially robust and bounded deletion streaming models.

Numerical Linear Algebra:

  • The first property testing algorithms for matrix positive semi-definiteness, and the first input-sparsity solvers for linear regression on design matrices which are (1) a Kronecker product of several smaller matrices or (2) a database join of several smaller tables.

Database Query Evaluation:

  • A characterization of the classes of conjunctive queries for which approximate counting and uniformly sampling can be accomplished in polynomial time. Along the way, we obtain the first polynomial time algorithm for estimating the number of strings of length n accepted by a non-deterministic finite automaton.

Thesis Committee:
David Woodruff (Chair)
Anupam Gupta
Andrej Risteski
Alexandr Andoni (Columbia University)
Jelani Nelson (University of California, Berkeley)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Sketching, Streaming, Sampling, Numerical Linear Algebra

CMU-CS-21-118.pdf (4.5 MB) ( 722 pages)
Copyright Notice