Massive sets of collected information are collectively called big data. From analyzing consumer behaviour and measuring security threats to monitoring financial risk and markets, and even tracking baseball players, big data is used everywhere.
Arguably, big data’s most valuable application is improving health and healthcare. Global public health is now more intricate and interconnected than ever. The recent Ebola alert is a classic case of how epidemics are deadly threats not just to poor West African countries but also to the United States, a First World country with some of the finest healthcare and hospitals. And big data watches it all.
Epidemiologists have been asking the big data question for some time now: how can big data be used effectively to contain, or even prevent, epidemics? Since the study of epidemics is all about incidence, forecasting, distribution, and control, a big data solution is perhaps the cure it needs.
But how can big data help us do this? First, it is important to understand that such data is unstructured, and only big data analytics can help visualize global health trends clearly. Digital epidemiology is the new tool in the fight against global health hazards.
Big data analytics can help track the spread of disease from a continuous data stream and visualize global outbreaks. This ultimately aids in zeroing in on the source. Since information can now be acquired, shared, and transmitted easily, mobile phones, other devices, and surveillance systems all become data hotspots: dots in the evolving picture of an epidemic’s spread.
Another approach is mining over-the-counter sales data. Medicine sales are early indicators of possible outbreaks and help in identifying short-term infection trends. Such data has revealed that for respiratory and gastrointestinal illnesses, the lead time can be as much as two weeks!
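To make the idea of sales data as an early indicator concrete, here is a minimal sketch in Python. It is not the method used by any particular surveillance system; it simply flags days on which sales rise well above a trailing baseline. The function name `flag_sales_spikes`, the seven-day window, the threshold of three standard deviations, and the sales figures are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def flag_sales_spikes(daily_sales, window=7, threshold=3.0):
    """Return indices of days whose sales sit far above the trailing baseline.

    A day is flagged when its value is more than `threshold` standard
    deviations above the mean of the preceding `window` days.
    """
    flagged = []
    for i in range(window, len(daily_sales)):
        baseline = daily_sales[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: treat any increase as a departure from normal.
            if daily_sales[i] > mu:
                flagged.append(i)
            continue
        if (daily_sales[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily unit sales of an over-the-counter remedy (made-up numbers).
sales = [20, 22, 19, 21, 20, 23, 21, 22, 20, 48, 55, 60]
print(flag_sales_spikes(sales))
```

Note that only the first day of the jump is flagged here: once the spike enters the trailing window it inflates the baseline, so a sketch like this detects the onset of a rise rather than its duration.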
The counterargument is that finding a pattern does not, by itself, stop a pathogen from spreading. But in the case of the 2010 cholera epidemic in Haiti, which claimed nearly 7,000 lives, the news first went ‘viral’ on Twitter before officials got on the ground and gathered real reports. Similarly, during the H1N1 pandemic of 2009, one manufacturer unwittingly used big data, in the form of daily tissue paper use forecasts, to gain a competitive edge and increase sales!
The US military’s ESSENCE (Electronic Surveillance System for the Early Notification of Community-based Epidemics) already uses big data and analytics to predict and combat epidemics, supporting troops on the ground and those on duty overseas.
IBM scientists collaborating with Johns Hopkins University and the University of California, San Francisco are using smarter data tools to fight infectious diseases, particularly dengue and malaria. An open-source framework is proving an invaluable contribution here, alongside analytical models and advanced mathematical skills. Their use of big data is slightly different: these scientists observe, record, and analyze climatic and environmental data (rainfall, temperature, soil pH) to understand how parasites and vectors interact and how a human outbreak can eventually be predicted and monitored!
In the current Ebola crisis, the Centers for Disease Control and Prevention (CDC) forecasts up to 1.4 million new infections. Big data was at ground zero well before anything else and, as Matthew Wall outlines, it helped map the disease and proved more than useful in tracking the Ebola contagion. HealthMap used massive computing power to filter early indicators from millions of social media posts and web pages, and was monitoring the Ebola scare even before it was formally announced.
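As a rough illustration of what filtering early indicators from posts can mean, here is a toy sketch in Python. It is not HealthMap’s actual pipeline, which is not described here; it only counts posts per location that mention a symptom keyword. The keyword list, the function name `count_signals`, and the sample posts are all hypothetical.

```python
from collections import Counter

# Hypothetical symptom keywords; a real system would use a curated vocabulary.
KEYWORDS = {"fever", "hemorrhagic", "vomiting", "outbreak"}


def count_signals(posts):
    """Count keyword-matching posts per location.

    `posts` is an iterable of (location, text) pairs.
    """
    counts = Counter()
    for location, text in posts:
        # Lowercase each word and strip trailing punctuation before matching.
        words = {w.strip(".,!?;:").lower() for w in text.split()}
        if words & KEYWORDS:
            counts[location] += 1
    return counts


# Made-up posts for illustration only.
posts = [
    ("Region A", "Strange fever reported in several villages."),
    ("Region A", "Clinic overwhelmed, possible outbreak?"),
    ("Region B", "Great game last night!"),
    ("Region A", "Market day was busy."),
]
print(count_signals(posts).most_common())
```

Simple keyword counts like these are noisy, which is one reason such signals are treated as prompts for on-the-ground verification rather than as confirmed cases.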
While big data continues to shape how we track the rate and spread of disease, what is still needed is the development of computational systems capable of real-time prediction. And although traditional on-the-ground reports take precedence over social media noise, large data-driven models coupled with machine-learning algorithms are necessary to better forecast any outbreak.
The main criticism is that big data, sourced from the Internet and other channels, is likely to be deceptive and may even lead to false interpretations. The havoc that erroneous information can wreak is considerable, because health workers and agencies rely on this data to contain and control epidemics. Since epidemics transcend international boundaries and pathogens are just as unpredictable, abnormal data patterns can emerge that either dilute or mask the real threat. Taking any disease data at face value is thus a grave folly.
Big data offers scope for identifying patterns, recognizing shifts, and gauging the impact of epidemics.
Across aggregated disease data, analytics can support illness reporting and disease anticipation, and can draw attention to the multifaceted sentiments involved in the spread of a disease.
Of course, a great deal hinges on managing this data smartly and having powerful analytics, but the key challenge is on the ground: educating the public about how diseases spread.