With huge investments flowing into big data applications and business intelligence, it is no surprise that companies are gathering diverse types and formats of data from equally diverse and non-uniform sources. What is a surprise, however, is how prepared companies actually are for the next step: properly analyzing the collected data to aid business growth. The data value chain gives equal emphasis to:
- Collection and storage of data
- Processing of data
- Data Analytics
While competent IT infrastructure helps to design data warehouses and schemas that collect and store huge volumes of structured and unstructured data, the value chain doesn’t end there. Specialists need to come in and establish a method to the madness, using this enormous body of data for data-driven decision making with the help of smart data analytics.
This is where data science comes into the picture.
What is data science?
Broadly speaking, data science helps businesses uncover insights from mounds of disparate data drawn from diverse sources. Whereas a data analyst typically looks at data from a single source and in a single format, a data scientist looks at data from multiple sources and in multiple formats. Because of their strong business acumen, data scientists not only address business problems but also choose the right problems, those with the potential to bolster the organization’s value proposition. Imagine the world of opportunities that data science opens up. It not only affects traditional domains such as economics, finance, and the humanities, but also gives businesses a solid weapon in the form of competitive and market intelligence.
Since its introduction as a separate discipline in 2001, data science for business has become a powerful tool that helps companies of every size and scale offer better products and services to targeted customers. It also helps them reveal areas of the business that can provide growth impetus and a competitive edge. Its associated technologies have also evolved greatly since those early days, from data mining, data engineering, web scraping, and predictive analysis to NoSQL, machine learning, pattern recognition, artificial intelligence, and probability models.
What do data scientists do?
It is not without reason that Harvard Business Review called data scientist ‘the sexiest job of the 21st century’. However, just as with big data, there are certain misconceptions about data scientists worth clarifying. First of all, contrary to popular notion, data scientists aren’t necessarily computer science majors. Rather, some of the most successful professionals in this area come from mathematics and physics backgrounds, and there is a good reason for this: a strong mathematical background, computing and analytical competency, and the ability to see the larger picture are traits common among physicists and mathematicians. Blend in a bit of statistical aptitude, a pinch of programming flair, some visualization competency, and, most importantly, a lot of business sense, and what you get is the profile of a data scientist.
According to Anjul Bhambhri, VP of Big Data Products at IBM, “A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.” An inquisitive attitude, the use of ‘what if’ analysis, a knack for questioning conventional thinking, and the ability to communicate findings to stakeholders through attractive visualizations: these define the thinking of a data scientist.
Data scientists use high-value tools such as indexing solutions that scan almost a billion documents daily and surface interesting patterns, trends, and segmentation opportunities. Got more than a billion documents to scan and analyze quickly? Then opt for clustered search engines, which greatly increase the value of the meaningful correlations found. Using indexing solutions on top of enterprise search helps data scientists reduce costs, improve operational efficiency, lower staff dependency, and even optimize processes.
Like entrepreneurs, data scientists combine patience with out-of-the-box thinking. They swim in the reams of data all around them and emerge with gleaming insights that push strategic decision making in the right direction. In a nutshell, a data scientist carries out these tasks:
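To make the idea of an indexing solution concrete, here is a toy sketch of an inverted index, the basic structure behind most search tools. It is not how any particular enterprise product works; the sample documents and the function names build_index and search are made up for illustration.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, *words):
    """Return the ids of documents that contain every one of the given words."""
    matches = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*matches) if matches else set()

# Hypothetical documents, invented for this example.
documents = {
    1: "late delivery complaint from retail customer",
    2: "retail customer praises fast delivery",
    3: "invoice dispute from wholesale customer",
}
index = build_index(documents)
print(sorted(search(index, "retail", "delivery")))  # prints [1, 2]
```

Once the index is built, each lookup only touches the documents that mention the word, which is why this approach can scale to very large collections when spread across a cluster.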
Step A – Pre-planning
- Designs the schema of the data to be stored and processed in an ETL system.
- Runs the data through computing tools to derive business intelligence (BI) from huge volumes of data (more than 5-10 TB, with thousands of variables or characteristics).
Step B – Getting into the groove
- Asks difficult questions of a mind-boggling array of structured and unstructured data.
- Formulates hypotheses and tests them so that the best solution can be found to known and unknown problems.
Step C – Bringing in the artillery
- Passes the data through multiple tools, techniques, programs, and algorithms.
- Applies the skills of traditional science (graphs, statistics, physics, mathematics, machine learning) and social science (geopolitics, economics, sociology).
- Brings the patience, expertise, and experience needed to work through a mind-numbing volume, variety, and velocity of data.
- Formulates the right approach to analyzing such enormous data so that patterns can be revealed, hypotheses tested, and insights derived.
Step D – Storytelling
- Has a knack for weaving an engrossing story around the trends, signals, and insights derived from the data.
- Uses this skill to convey the findings to senior leadership and business management in a way that they can understand and relate to.
- This storytelling can take the form of data visualizations, projections, charts, or graphs.
Is data science leaving the ‘science’ behind?
The increasing spread of data scientists into organizations of every size, scale, and shape has put the methodology of the data scientist into perspective. Today’s data scientists need to possess three key traits:
- The ability to unravel insights from shapeless data and convey their findings in a language that their business stakeholders can comprehend.
- The ability to use tools such as visualization, machine learning, predictive analysis, data mining, warehousing, cloud computing, Hadoop, artificial intelligence, or plain old computer programming to achieve the first trait.
- A burning curiosity about the world they work in. They need a desire to go beyond the traditional and the obvious, which helps them uncover insights that previously escaped the attention of stakeholders.
Putting these traits to work takes out-of-the-box thinking, propelled by the ability to question the known and embrace the unknown. This is done by formulating hypotheses, testing them against several influencing variables, and then deriving insights that weren’t previously known to business owners. This is precisely where the ‘scientist’ aspect becomes a vital driver of success in the data scientist’s work.
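As a minimal sketch of what formulating and testing a hypothesis can look like in practice, the example below runs a permutation test, one simple technique among many. The hypothesis, the order values, and the function name permutation_test are all invented for illustration; a real analysis would use real data and account for more influencing variables.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Estimate a two-sided p-value for the difference between two group means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the groups truly do not differ,
        # random regroupings should often look as extreme as the real split.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical data: order values under an old and a new page layout.
old_layout = [20, 22, 19, 24, 21, 23, 20, 22]
new_layout = [25, 27, 24, 29, 26, 28, 25, 27]

p_value = permutation_test(old_layout, new_layout)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone, which is the kind of evidence a data scientist would bring to the business before recommending a change.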
Take the case of Jonathan Goldman and LinkedIn. When he joined the company, the ‘social’ element was sorely missing from the website. It had scores of user accounts but lacked the spark needed to help users expand their connections with individuals who shared a common denominator, be it the same school, skills, company, role, or even industry. This gave him the impetus to examine which network a given profile would most likely land in. His theories, testing, and pattern finding led him to see the value of showing each user a network of people he or she would likely be interested in connecting with. This ‘scientific’ approach to data analytics led to one of the most successful spaces on LinkedIn, the “People You May Know” portion of the site.
This success story is one of the instances that highlight why science can never be left out of data science. Since the primary goal of the data scientist is to unravel insights and answers that couldn’t be unearthed through conventional methods, the element of ‘science’ is bound to come in at one stage or another whenever the data scientist works within the realm of the ‘unknown’ in big data applications.
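The idea of suggesting connections through a common denominator can be sketched in a few lines. This is an illustration only and not LinkedIn’s actual method, which the article does not describe; the profiles, field names, and the function suggest_connections are hypothetical.

```python
from collections import Counter

# Hypothetical member profiles; names and attribute values are invented.
profiles = {
    "ana":   {"school": "MIT", "company": "Acme", "industry": "finance"},
    "ben":   {"school": "MIT", "company": "Acme", "industry": "retail"},
    "chloe": {"school": "MIT", "company": "Globex", "industry": "retail"},
    "dev":   {"school": "UCLA", "company": "Initech", "industry": "energy"},
}

def suggest_connections(member, profiles, top_n=3):
    """Rank other members by how many profile attributes they share with `member`."""
    target = profiles[member]
    scores = Counter()
    for other, attrs in profiles.items():
        if other == member:
            continue
        # Count attributes (school, company, industry) with identical values.
        shared = sum(1 for key, value in target.items() if attrs.get(key) == value)
        if shared:
            scores[other] = shared
    return scores.most_common(top_n)

print(suggest_connections("ana", profiles))  # prints [('ben', 2), ('chloe', 1)]
```

A production system would weigh many more signals and would be validated by testing whether users actually act on the suggestions, which is where the hypothesis-driven approach comes in.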
Our parting thoughts
To propel its growth trajectory, an organization must focus on what its customers are saying and what the competition is doing. To do so, it needs to listen to a mind-boggling array of data sources: social signals, community forums, reviews, and so on. This data is then passed through various processing stages to optimize the layout and format of the various sources. The most interesting part is what happens when the processed data is passed through BI and business analysis tools. Without this step, your data will remain powerless. Hence we feel that the role of data scientists is here to stay.
Whether they actually implement the ‘science’ part of their job title is another matter. As a part of their job description, they need to make use of empirical thought processes and question the set norms. Adhering to this key factor will ensure that they do not leave behind the ‘science’ in data science.