Lifestyle Disease Surveillance Using Spatio-Temporal Search Intensity Models
The worldwide growth in “Googling” about health-related information
on the Internet over the past few years has created new
possibilities of using web search data for public health surveillance.
Diseases that are typically tracked at the population level
can be categorized into two domains: communicable diseases and
non-communicable diseases (NCDs). The poster child for tracking
communicable diseases was Google Flu Trends (GFT), launched by
Google to monitor the spread of influenza, a communicable disease.
Although the system was shut down in 2015 for overestimating
the influenza epidemics, it started a whole line of health care research
using Google Trends to nowcast anything from fast-moving
communicable diseases such as dengue fever and chickenpox to
slow-moving “lifestyle diseases” such as diabetes and obesity.
As lifestyle diseases are typically slow-moving, statistics for
these are only available on an annual basis, creating sparsity issues
when training temporal models. Furthermore, these statistics are
released with several months of lag and the data for 2016 is not yet
available as of March 2017. However, even though the prevalence
rates for these diseases only change by a few percentage points
year-over-year, that small change still translates to billions of US
dollars, motivating attempts to bring down the latency for creating
In our research we present novel spatio-temporal calibration
approaches that overcome data sparsity issues by leveraging both
temporal and spatial trends for model fitting of such slow-moving
diseases and trends. Our approach takes into account regional
variation in population sizes and Internet penetration. Furthermore,
we show how the predictive performance of the fitted models can be
further improved by combining both historic online data and recent
online data. We also suggest a bootstrapping method of feature
selection using Google Correlate and related-search queries. Finally,
we describe important idiosyncrasies related to using Google Trends
and suggest best practices.