Big data in Science
Rapidly making in-roads into the scientific research domain, the potential of big data in advancing research is undeniable.
Traditionally though, big data has always been used by scientists. Large data sets were used by high-energy particle and nuclear physics. When the Human Genome Project finished mapping in 2003, they were sitting on a data set worth 3 billion in alphabets.
Of course, this is discounting the many codes for specific proteins which would make this number even more fantastic!
What has now changed is the magnitude and the scope of data. New-age sciences like the -omics (genomics, proteomics etc.), astrophysics, and astronomers in particular, are looking to harness big data and analytics to answer questions long in the dark.
Science = Big Data!
The scientific method relies on data: experimental and observational, yet always empirical. Big data is cold, unbiased information and available in large volumes – we’re talking petabytes! Data is generated exponentially every second.
This data is as diverse as: imaging, phenotypic, molecular, health, social demographics, behavioral, and others.
Bigdata analytics is invariably a path that must be set upon.
Who is using it
The goal of scientific research is to accelerate and improve on our understanding as it operates fields as diverse as climate and biodiversity research, seismology, neuroscience, human behavior and every other sphere of knowledge. Add to this ontology, conservation, ecology, biophysics, and evolution and the list of big data applications doesn’t even yet begin!
Computation has become an integral part of the scientific process and everyone from biologists, chemists, physicists, astronomers, earth and social scientists do, must or can use big data.
In fact, a few of the Higgsboson particles were discovered earlier this year by algorithms that recognized their signature among terabytes of collision data, something that isn’t humanly possible!
What this means is that growing computational and analytical programs are evolving and used to infer from data, a position formerly held by the researcher.
Social sciences which relied on smaller sample data today use real-time data crunching to derive more insights from reports. It makes easy to work with big data at the population level and offers more leverage in answering questions.
All is not well
While big data and analytics does slingshot knowledge by allowing researchers to refine questions by analyzing large data volumes, theinsights that are drawn from it can oft be poor. More often than not, this is a problem not with the data, but its structure and form. A big problem with big data is its high redundancy of unusable data values.
As more data is added to existing silos of knowledge, it becomes increasingly difficult to stay focused and sift through layers and layers of information that keeps coming in.
Not necessarily does large data mean more observations and ultimately, a deeper understanding; big data is only useful if it is combined with the right tools and theories.
Sitting on silos of data and magically expecting scientific breakthroughs is simply unwise.
Microscope vs. Computers
While big data may be actually be the magic stick in behavioral sciences, genetics, economics, psephology or even particle physics, it seems that it may not have takers in life sciences; yet.
Large data sets coupled with greater, faster computational machines threaten individual investigator-inspired projects. Typically, a researcher chews on a hypothesis and then proceeds to test it with sets of experiments. The results can be visualized in totality of the hypothesis or questions close to its domain. With big data the numbers speak for themselves, but it isn’t considered puritan by many.
Big data costs money, lots of it. With crunch in grants and research money, not many researchers find it easy to harness big data. Moreover, there is also the concern of big data projects funded while bleeding smaller, hypothesis-driven experiments.
Climate-change scientists use big data from satellites as observational data tocreate and predict better models.
The National Institutes of Health, meanwhile, has made a 200-terabyte data set on human genetic variation available online. The data is stored on Amazon Web services and free for researchers to query and analyze.1
Even as grants and research programs are turning to accommodate the need for infrastructure to make sense of big data, universities are setting up dedicated centers to train and churn data scientists and engineers and big data is turning into a science all by itself!