Doctoral Thesis Proposal - Honghao Lin

Time:
— 5:30pm

Location:
In Person and Virtual - Newell-Simon 4305 and Zoom

Speaker:
HONGHAO LIN, Ph.D. Student
Computer Science Department
Carnegie Mellon University

https://honghlin.github.io/

Advances in Algorithms for Massive Data: Optimal Bounds, Adversarial Robustness, and Data-Driven Insights

With the rapid growth of massive datasets in areas such as machine learning and numerical linear algebra, classical algorithms are often no longer feasible. In this thesis proposal, we develop provably efficient algorithms for a variety of problems in these settings, including the streaming and distributed models. Our contributions span three directions:

  1. Optimal Bounds. We introduce a general technique for lifting dimension lower bounds for real-valued linear sketches to polynomially bounded integer inputs. This yields the first optimal sketching lower bounds for discrete data streams on fundamental problems such as frequency moment approximation, operator norm estimation, and compressed sensing. Beyond this, we establish nearly optimal bounds for a variety of streaming and sketching tasks, including ℓ_p subspace sketches for constant dimension d, ℓ_p regression in the arbitrary-partition distributed model, and graph problems such as approximating the minimum cut and constructing cut sparsifiers in balanced directed graphs. (A minimal example of the kind of linear sketch these bounds constrain appears after this list.)
  2. Adversarial Robustness. While most streaming algorithms are studied in static worst-case models, many practical scenarios involve adaptive adversaries who generate inputs based on previous outputs. We present the first adaptive attack against linear sketches for ℓ_p-estimation over turnstile integer streams: any linear streaming algorithm with sketching matrix A ∈ ℤ^{r×n} can be broken using only poly(r log n) adaptive queries, with high constant probability. This result highlights fundamental limits on the robustness of linear sketches against adaptive inputs. (The underlying query game is sketched after this list.)
  3. Learning-based Algorithms. Classical algorithms guarantee correctness in the worst case but often ignore structure in real-world data, while machine learning methods exploit such structure but typically lack guarantees. We design learning-based algorithms that incorporate machine-learned predictions to adapt to input distributions, achieving faster runtimes, reduced space, or improved accuracy. Crucially, these algorithms retain rigorous worst-case guarantees even when the predictions are imperfect, bridging the gap between theory and data-driven practice. (A toy learning-augmented sketch follows the list.)
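
To make the objects in item 1 concrete, here is a minimal AMS-style linear sketch for approximating the second frequency moment F2 = Σᵢ fᵢ² of a turnstile integer stream. The explicit sign matrix, the row count r, and the (item, delta) stream format are illustrative assumptions, not constructions from the proposal.

```python
import random

def ams_f2_sketch(stream, n, r=64, seed=0):
    """Approximate F2 = sum_i f_i^2 of a turnstile integer stream.

    The sketch is y = A f for a random sign matrix A in {-1, +1}^{r x n};
    each (A_j . f)^2 is an unbiased estimate of F2, and averaging over the
    r rows reduces the variance.
    """
    rng = random.Random(seed)
    A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]
    y = [0] * r
    # Turnstile updates (item, delta): by linearity, sketching the updates
    # one by one is the same as sketching the final frequency vector f.
    for item, delta in stream:
        for j in range(r):
            y[j] += A[j][item] * delta
    return sum(v * v for v in y) / r

if __name__ == "__main__":
    updates = [(0, 3), (1, 2), (0, -1), (2, 5)]   # final f = (2, 2, 5), F2 = 33
    print(ams_f2_sketch(updates, n=3, r=256))     # typically close to 33
```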
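
The adaptive model in item 2 can be viewed as a query game in which each update may depend on every previous answer. The toy Python below shows only this interaction protocol; the SignSketchF2 estimator and the naive random adversary are stand-ins, and the proposal's poly(r log n)-query attack is not reproduced here.

```python
import random

class SignSketchF2:
    """A linear sketch y = Af with A in {-1, +1}^{r x n} answering F2 queries."""

    def __init__(self, n, r=32, seed=1):
        rng = random.Random(seed)
        self.A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]
        self.y = [0] * r

    def query(self, item, delta):
        # Apply one turnstile update, then report the current F2 estimate.
        for j, row in enumerate(self.A):
            self.y[j] += row[item] * delta
        return sum(v * v for v in self.y) / len(self.y)

def run_adaptive_stream(algorithm, adversary, num_queries):
    """The adaptive game: every update is chosen after seeing all previous
    answers, so later queries can correlate with the sketch's randomness."""
    transcript = []
    for _ in range(num_queries):
        item, delta = adversary(transcript)   # adaptively chosen update
        transcript.append(((item, delta), algorithm.query(item, delta)))
    return transcript

if __name__ == "__main__":
    n, rng = 16, random.Random(3)
    # Placeholder adversary: an actual attack would instead mine the
    # transcript to steer the stream onto an input where the sketch fails.
    naive = lambda transcript: (rng.randrange(n), rng.choice((-1, 1)))
    for update, answer in run_adaptive_stream(SignSketchF2(n), naive, 5):
        print(update, round(answer, 2))
```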
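
In the spirit of item 3, here is a minimal learning-augmented CountMin sketch: a hypothetical predictor flags items expected to be heavy, which are then counted exactly, while all other items fall back to the classical sketch and keep its worst-case guarantee. The predictor interface and all parameters are assumptions for illustration, not the algorithms from the proposal.

```python
import random

class LearnedCountMin:
    """Frequency estimation with a learned oracle (illustrative only).

    A predictor flags items it believes are heavy; those receive exact
    counters, while everything else goes into a standard CountMin sketch.
    Good predictions make heavy-item queries exact; even a bad predictor
    leaves the usual insert-only CountMin guarantee intact (no
    underestimates).
    """

    def __init__(self, width, depth, predict_heavy, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.salts = [rng.randrange(1 << 30) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.exact = {}                    # exact counts for predicted-heavy items
        self.predict_heavy = predict_heavy

    def _bucket(self, item, row):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, delta=1):
        if self.predict_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + delta
        else:
            for row in range(self.depth):
                self.table[row][self._bucket(item, row)] += delta

    def estimate(self, item):
        if item in self.exact:
            return self.exact[item]        # exact, thanks to the prediction
        # Worst-case fallback: the CountMin row minimum never underestimates.
        return min(self.table[row][self._bucket(item, row)]
                   for row in range(self.depth))

if __name__ == "__main__":
    # Hypothetical predictor: item ids below 10 are expected to be heavy.
    cm = LearnedCountMin(width=64, depth=4, predict_heavy=lambda x: x < 10)
    for item, count in [(1, 1000), (2, 500), (123, 3), (456, 7)]:
        cm.update(item, count)
    print(cm.estimate(1), cm.estimate(123))  # 1000 exactly; at least 3
```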

Thesis Committee
David P. Woodruff (Chair)
Yang P. Liu 
Richard Peng
Jelani Nelson (University of California, Berkeley)


For More Information:
matthewstewart@cmu.edu

