Computer Science Thesis Proposal

Friday, November 30, 2018 - 3:00pm to 4:00pm


8102 Gates Hillman Centers



Pancasting: forecasting epidemics from provisional data

Speaker: Logan Brooks

Location: GHC 8102

Infectious diseases remain among the top contributors to human illness and death worldwide. While some infectious disease activity appears in consistent, regular patterns within a population, many diseases produce less predictable epidemic waves of illness. Uncertainty and surprises in the timing, intensity, and other characteristics of these epidemics stymie planning and response by public health officials, health care providers, and the general public. Accurate forecasts of this information, with well-calibrated descriptions of the associated uncertainty, can help stakeholders tailor countermeasures, such as vaccination campaigns, staff scheduling, and resource allocation, to the situation at hand, which in turn could reduce the impact of a disease.

Domain-driven epidemiological models of disease prevalence can be difficult to fit to observed data while incorporating enough detail and flexibility to explain that data well. More general statistical approaches can also be applied, but traditional modeling frameworks seem ill-suited to irregular bursts of disease activity, and they focus on producing accurate single-number estimates of future observations rather than well-calibrated measures of uncertainty on more complicated functions of the data. The first part of the proposed work develops more flexible variants of simple statistical approaches to address both of these shortcomings.

Epidemiological surveillance systems commonly incorporate a data revision process, whereby each measurement may be updated multiple times to improve accuracy as additional reports and test results are received and data is cleaned. The second part of the proposed work discusses how this process affects proper forecast evaluation and visualization. It also extends the models above to "backcast" how existing measurements will be revised, and these backcasts can in turn be used to improve forecast accuracy.

Often, there are multiple available sources of estimates of a disease's prevalence, which vary in geographical and temporal scope and resolution, accuracy, and timeliness, and each of which may exhibit its own peculiarities. The final part of the proposed work further generalizes the above methodology to incorporate multiple data sources with similar temporal scopes and resolutions, in order to produce better forecasts than are possible with a single data source alone.

Thesis Committee:
Roni Rosenfeld (Chair)
Ryan Tibshirani
Zico Kolter
Jeffrey Shaman (Columbia University)

Copy of Thesis Proposal Summary
