Wikipedia Positioned To Track Disease Outbreak: The Model That Could Rival Current Resources

Stephanie Castillo | Medical Daily | November 13, 2014

Since its launch in 2001, Wikipedia has become the sixth most visited site in the world. Researchers reported that the site contains around 30 million articles in 287 languages and serves roughly 850 million article requests per day. More importantly, it’s a free, open source of data that is gaining traction as a tool for “effective and timely disease surveillance.”

“Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social Internet data, such as social media and search queries, are emerging,” researchers wrote. “These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness.”

To see whether these challenges can be overcome, researchers used two data sources, Wikipedia article access logs and official disease incidence reports from the World Health Organization, to build a linear model and analyze around three years of data for seven diseases (cholera, dengue, Ebola, HIV/AIDS, influenza, plague, and tuberculosis) in nine locations (Haiti, Brazil, Thailand, Uganda, China, Japan, Poland, United States, and Norway). The premise is that health-related searches and page views leave a record online, and that record can be captured and used to derive actionable information. With the WHO's data and the online traffic of select Wikipedia articles, researchers were able to forecast disease incidence at least 28 days ahead of time. The one exception was rates of tuberculosis in China...
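
For readers curious what a model of this kind might look like, here is a minimal sketch in Python. It is not the researchers' code: the function names (`fit_lagged_linear_model`, `forecast`), the choice of ordinary least squares, and the synthetic numbers are all assumptions made for illustration. The idea is simply to pair each day's article traffic with the disease incidence reported 28 days later and fit a straight-line relationship between the two.

```python
import numpy as np


def fit_lagged_linear_model(views, incidence, lag):
    """Fit incidence[t + lag] ~ intercept + views[t] @ weights by least squares.

    views:     array of shape (n_days, n_articles), daily requests per article
    incidence: array of shape (n_days,), reported cases per day
    lag:       forecast horizon in days (28 in the article)
    Returns an array [intercept, weight_1, ..., weight_k].
    """
    views = np.asarray(views, dtype=float)
    incidence = np.asarray(incidence, dtype=float)
    n_days = views.shape[0]
    if lag <= 0 or lag >= n_days:
        raise ValueError("lag must be between 1 and n_days - 1")

    # Pair traffic on day t with incidence on day t + lag.
    X = views[:-lag]
    y = incidence[lag:]

    # Prepend a column of ones so the model can learn an intercept.
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast(coef, views_today):
    """Predict incidence `lag` days ahead from today's article traffic."""
    return coef[0] + np.asarray(views_today, dtype=float) @ coef[1:]


# Synthetic, made-up data: two hypothetical articles over 200 days.
rng = np.random.default_rng(0)
demo_views = rng.uniform(100, 1000, size=(200, 2))
demo_incidence = np.zeros(200)
demo_incidence[28:] = 5 + 0.02 * demo_views[:-28, 0] + 0.01 * demo_views[:-28, 1]

demo_coef = fit_lagged_linear_model(demo_views, demo_incidence, lag=28)
demo_prediction = forecast(demo_coef, demo_views[-1])
```

Because the synthetic incidence is built directly from the traffic with no noise, the fit recovers the made-up weights almost exactly. Real access logs and case counts are far messier, so a sketch like this shows the shape of the approach rather than its accuracy.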