Never mind stethoscopes and blood tests: could Tweets, hashtags and Google searches become diagnostic tools for influenza and other contagious diseases?

Much research has focused on using data mining and analytics to monitor diseases and predict their progression. For example, HealthMap, a website created by researchers at Boston Children’s Hospital, collects and analyzes information from a variety of platforms, including social media, online news and travel sites, to provide early detection and surveillance of disease outbreaks. HealthMap made headlines in 2014 for detecting references to Ebola infections in local newspapers in Guinea more than a week before the country’s government informed the World Health Organization about the outbreak.

The website Sickweather, meanwhile, was not designed as a public health tool but is geared towards individual use. The so-called “Facebook for hypochondriacs” displays maps marking the locations of users who have self-reported symptoms, allowing you to “burrow down to street level to see who is sick in your neighbourhood.” Sickweather also mines Twitter and Facebook to supplement the data contributed by its own members; in 2011, the site reportedly detected an outbreak of whooping cough two weeks before health officials issued a public statement. Mind you, some things are better left to the imagination: after someone sitting behind you on the bus coughs or sneezes, maybe you don’t want to pull up a map and see a bright blue “sick” bubble.

Drawing on social media and other online data to predict and track diseases isn’t exactly new. Probably the best-known tool is Google Flu Trends, a website launched back in 2008 that compares flu-related search activity with reported flu incidence rates. Notably, the Centers for Disease Control and Prevention (CDC) monitors Google Flu Trends for possible early warnings of outbreaks.

Other researchers have monitored keyword usage on Twitter to measure the rate of flu-related Tweets, and thus predict future infection rates of the flu itself. Still others have examined how the “digital traces” people generate in metropolitan areas can be used to map how a disease might spread through a population, drawing on data from public transit services to build model networks of contact.
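Keyword tracking of this kind comes down to counting, for each day, the share of posts that mention flu-related terms. The sketch below is a minimal illustration that assumes a hypothetical keyword list and a handful of made-up posts; real studies work with much larger vocabularies and data sets.

```python
# Minimal sketch: the daily share of posts that mention flu-related keywords.
# FLU_KEYWORDS and the sample posts are hypothetical, chosen for illustration.
import re
from collections import Counter
from datetime import date

FLU_KEYWORDS = ["flu", "fever", "influenza", "cough", "sore throat"]

# One case-insensitive regex with word boundaries, so "flu" does not
# match inside words such as "influential".
FLU_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(kw) for kw in FLU_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def mentions_flu(text: str) -> bool:
    return FLU_PATTERN.search(text) is not None

def daily_flu_rate(posts: list[tuple[date, str]]) -> dict[date, float]:
    """Map each day to the fraction of that day's posts that mention a flu keyword."""
    totals = Counter()
    hits = Counter()
    for day, text in posts:
        totals[day] += 1
        if mentions_flu(text):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}

posts = [
    (date(2015, 1, 5), "Stuck in bed with the flu"),
    (date(2015, 1, 5), "Great game last night"),
    (date(2015, 1, 6), "Fever again, ugh"),
    (date(2015, 1, 6), "Is this a cough or allergies?"),
    (date(2015, 1, 6), "Coffee time"),
    # Written about influenza, not by someone who has it, yet still counted.
    (date(2015, 1, 6), "Writing a blog post about influenza"),
]

for day, rate in daily_flu_rate(posts).items():
    print(day, f"{rate:.2f}")
```

The last sample post is about influenza rather than from someone who has it, yet it is counted all the same: the kind of false positive that makes it hard to tell healthy users from sick ones.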

Of course, traditional public health methods provide a level of accuracy that is not (and may never be) possible with social media and analytics. For instance, in October 2011, the singer Rihanna reportedly skewed Flu Trends data after Tweeting about having the flu, leading to a spike in search queries from users who were presumably curious about the celebrity’s health. The underlying assumption of Flu Trends (and many other tools that draw on social media and online data) is that we can use online behaviour – searches, Tweets, and so forth – as an indicator of infection status. Yet as Rihanna’s concerned fans demonstrated, this isn’t always the case.

Sure, these kinds of ‘Big Data’ approaches are typically viewed as complements to traditional methods, providing real-time, efficient data that could enable faster and better-informed public health decisions. But they share an obvious limitation of large data sets generated from social media and other digital sources: it can be difficult to distinguish between healthy and sick users. Despite all of my recent flu-related searches and online activity, for instance, I’m not sick; I was just writing a blog post.

However, Penn State researchers have been working on a system that could get around this barrier, possibly addressing the questionable reliability of Flu Trends and similar tools. Whereas those tools attempt to identify outbreaks or predict their course, these researchers set out to determine whether specific individuals were sick or not; in other words, they used social media data as a kind of diagnostic tool. Starting with data from people who were known to be sick, they examined their Twitter data and tried to develop a model whose output would match the known diagnosis. The model looked at both the content of Tweets (keywords such as “medicine” or “fever”) and the rate of tweeting, on the premise that illness changes an individual’s tweeting behaviour. The system matched the diagnosis more than 99 percent of the time.
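The general idea of combining Tweet content with a change in tweeting rate can be sketched as a small classifier. The Python below is a hypothetical illustration, not the Penn State researchers’ actual model: the symptom word list, the two features (share of recent Tweets containing a symptom word, and relative change in daily tweet rate), the made-up training users and the plain logistic regression trained by stochastic gradient descent are all assumptions chosen for brevity.

```python
# Hypothetical sketch, NOT the Penn State researchers' model: a tiny logistic
# regression over two per-user features, trained on users whose diagnosis is known.
import math
import re

# Assumed symptom vocabulary; the study's real keyword list is not reproduced here.
SYMPTOM_WORDS = {"fever", "medicine", "flu", "sick", "cough"}

def features(recent_tweets: list[str], baseline_per_day: float,
             recent_per_day: float) -> list[float]:
    """[share of recent tweets containing a symptom word, relative change in tweet rate]."""
    with_symptom = sum(
        1 for tweet in recent_tweets
        if SYMPTOM_WORDS & set(re.findall(r"[a-z]+", tweet.lower()))
    )
    keyword_share = with_symptom / len(recent_tweets) if recent_tweets else 0.0
    rate_change = (recent_per_day - baseline_per_day) / baseline_per_day
    return [keyword_share, rate_change]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(xs: list[list[float]], ys: list[int],
          lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def is_sick(w: list[float], b: float, x: list[float]) -> bool:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5

# Made-up training users: [keyword_share, rate_change] with known diagnosis (1 = sick).
train_x = [[0.6, -0.5], [0.4, -0.3], [0.5, 0.4], [0.7, -0.6],
           [0.0, 0.05], [0.1, -0.1], [0.0, 0.2], [0.05, 0.0]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train(train_x, train_y)

new_user = features(["Fever and chills, taking medicine", "so sick today", "watching tv"],
                    baseline_per_day=5.0, recent_per_day=2.0)
other_user = features(["Lunch with friends", "Great run this morning"],
                      baseline_per_day=4.0, recent_per_day=4.0)
print(new_user, is_sick(w, b, new_user))
print(other_user, is_sick(w, b, other_user))
```

With real data, the features would be computed from each user’s timeline, and the model would need to be checked on users it was not trained on; accuracy on the training users alone says little about how well it diagnoses anyone else.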

Altogether, this extensive body of research on the public health uses of social media and other ‘Big Data’ points to promising tools for epidemiologists and health officials. Maybe the next step is looking at Instagram data to see if people are posting pictures of chicken noodle soup.
