The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic
Twitter is a free social networking and micro-blogging service that enables its millions of users to send and read each other's “tweets,” short messages of up to 140 characters. The service has more than 190 million registered users and processes about 55 million tweets per day. Useful information about news and geopolitical events lies embedded in the Twitter stream, which embodies, in the aggregate, Twitter users' perspectives on and reactions to current events. By virtue of sheer volume, content embedded in the Twitter stream may be useful for tracking or even forecasting behavior if it can be extracted efficiently. In this study, we examine the use of information embedded in the Twitter stream to (1) track rapidly evolving public sentiment with respect to H1N1, or swine flu, and (2) track and measure actual disease activity. We also show that Twitter can be used as a measure of public interest in or concern about health-related events. Our results show that estimates of influenza-like illness derived from Twitter chatter accurately track reported disease levels....
...An estimated 113 million people in the United States use the Internet to find health-related information [1], with up to 8 million people searching for health-related information on a typical day. Given these volumes, patterns showing how and when people use the Internet may provide early clues about future health concerns or expectations. For example, in the case of influenza, search engine query data from Yahoo [2] and Google [3] are known to be closely associated with seasonal influenza activity and, to a limited extent, provide information about seasonal disease trends ahead of official reports of disease activity.
Search query data provide one view of Internet activity (i.e., the proportion of individuals searching for a particular topic over time), albeit one that is both noisy and coarse. The general idea is that increasing search query activity approximates increasing interest in a given health topic. Since some search query data also carry geographic information (generally based on the issuing IP address), it may also be possible to detect simple geospatial patterns. But search query data do not provide any contextual information; questions such as why a search was initiated in the first place are difficult to answer. People search for health information for any number of reasons, including concern about themselves, their family, or their friends. Some searches are simply due to general interest, perhaps prompted by a news report or a recent scientific publication. Without sufficient contextual information, the relation between search query activity and underlying disease trends remains somewhat unclear....
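As a rough illustration of the search-activity measure described above, the following minimal sketch computes the weekly share of queries that mention a topic. It is not the method used by Yahoo, Google, or this study: the function name `weekly_topic_share`, the simple keyword matching, and the sample query log are all hypothetical, and the share is taken over queries rather than over individuals, since the sketch assumes no user identifiers are available.

```python
from collections import Counter
from datetime import date


def weekly_topic_share(queries, keywords):
    """Return {(iso_year, iso_week): share of queries mentioning any keyword}.

    `queries` is an iterable of (date, query_text) pairs. Matching is a
    naive case-insensitive substring test, an assumption made for brevity.
    """
    totals = Counter()
    matches = Counter()
    for day, text in queries:
        year, week, _ = day.isocalendar()
        key = (year, week)
        totals[key] += 1
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            matches[key] += 1
    # Divide matching queries by all queries in the same ISO week.
    return {key: matches[key] / totals[key] for key in sorted(totals)}


# Made-up query log, for illustration only.
sample_queries = [
    (date(2009, 4, 27), "swine flu symptoms"),
    (date(2009, 4, 28), "weather tomorrow"),
    (date(2009, 4, 29), "H1N1 vaccine"),
    (date(2009, 4, 30), "movie times"),
    (date(2009, 5, 4), "flu shot near me"),
    (date(2009, 5, 5), "cheap flights"),
    (date(2009, 5, 6), "pizza recipe"),
    (date(2009, 5, 7), "bus schedule"),
]

shares = weekly_topic_share(sample_queries, ["flu", "h1n1"])
```

A rising share from one week to the next would be read as rising interest in the topic. As the paragraph above notes, the number alone says nothing about why people searched.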