Data is growing at an exponential rate, and organizations now routinely measure it in petabytes! Can you believe that, by one often-quoted estimate, ninety percent of the world’s data was created in the last two years alone? At this volume, data management has become a tricky affair. No wonder essential data science skills have taken a front seat.
The Five Vs of Big Data
Big data is often described using five Vs: volume, velocity, variety, veracity, and value.
- 1. Data Volume refers to the vast amounts of data generated every second by the millions of mobile devices in use these days. All the emails, Twitter messages, photos, video clips, sensor data, and more that we produce are valuable to many companies.
- 2. Data Velocity refers to the speed at which new data is generated, and the speed at which it can be moved from one place to another. The faster data reaches the people who use it, the sooner it can contribute to profitability.
- 3. Data Variety is something we can all relate to. In the past, data for most organizations meant databases and Excel sheets. Today, data means a lot more. An estimated eighty percent of the world’s data is unstructured: think of the photos, videos, and Twitter updates you post.
- 4. Data Veracity refers to the level of trustworthiness of the data. With data growing to enormous sizes, it is important to keep it as clean as possible, since dirty data is a virus that can inflict pain upon you like no other.
- 5. Data Value is the true worth of your data. You gather a lot of data and decide to work on it. All well and good. But what value does the data add to your company? What matters is the benefit you get out of investing in it.
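To make the veracity point a little more concrete, here is a minimal sketch of what "keeping data clean" can look like in practice. It uses plain Python, and everything in it is an assumption for illustration: the `clean_records` helper, the field names, and the sample rows are hypothetical, and real projects usually need far more thorough checks.

```python
def clean_records(records):
    """Trim whitespace, drop rows with missing values, and remove duplicates.

    Hypothetical helper for illustration only; not taken from any library.
    """
    seen = set()
    cleaned = []
    for record in records:
        # Trim stray whitespace from every string value.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Skip rows where any field is missing or empty.
        if any(value in (None, "") for value in normalized.values()):
            continue
        # Skip rows we have already seen (exact duplicates after trimming).
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Made-up sample rows, not real data.
raw = [
    {"name": " Ada ", "city": "London"},
    {"name": "Ada", "city": "London"},     # duplicate once whitespace is trimmed
    {"name": "Grace", "city": ""},         # missing city
    {"name": "Linus", "city": "Helsinki"},
]

cleaned = clean_records(raw)
print(cleaned)
```

Of the four sample rows, only two survive: the duplicate and the incomplete row are dropped. That is the kind of small, boring check that keeps dirty data from spreading.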
So, if you are taken in by the buzz around big data and data science, I suggest you look at the five resources listed below.
DataCamp is best for people with little to no experience in Python and R. It starts with the very basics and takes a stepwise approach, giving you one problem after another. It is a godsend for beginners and is budget-friendly.
Look out for the heavy discounts that DataCamp offers if you want to subscribe for a year and access the premium projects and features. It has several tracks that you can master, each consisting of some 20-30 courses. Popular tracks include:
- a. Data Scientist with Python
- b. Quantitative Analysis with R
- c. Data Manipulation with Python
- d. Importing & Cleaning Data with R
- e. Data Visualization with R
If you are short on time, you can also take shorter courses like:
- a. Intro to Python for Data Science
- b. Introduction to R
- c. Joining Data in PostgreSQL
- d. Intermediate R
Coursera is one of the best platforms for learning anything from data science to military history, and I can say so from first-hand experience. You can choose to audit courses and get access to the course materials for free. Some of the best data science courses on Coursera are:
- a. Data Analysis and Presentation Skills: the PwC Approach: This Specialization gives you hands-on experience with data analysis and the know-how to turn business intelligence into real-world outcomes. You will learn to understand, filter, and apply data better, which in turn will help you solve problems faster. You will become adept with Microsoft Excel, PowerPoint, and other common data analysis and communication tools. Most importantly, you will learn to read data and present it.
- b. Big Data, UCSD: If you need to understand big data and how it will impact your business, this Specialization is for you. You will get hands-on experience with the tools and systems used by big data scientists and engineers, such as Hadoop with MapReduce, Spark, Pig, and Hive. You will learn to perform predictive modeling and leverage graph analytics to model problems. If you stick with it to the very end, you can complete a Capstone Project, developed in partnership with data software company Splunk, in which you apply the basic concepts you learned.
- c. Data Science Specialization by Johns Hopkins University: This Specialization covers the concepts and tools you’ll need throughout the entire data path, from asking the right questions to making inferences and publishing results in a format that is simple yet powerful.
- d. SQL for Data Science, UC Davis: This course is a primer on the fundamentals of SQL and on working with data, and it will help you handle the database needs of the data science world. The course starts with the very basics and assumes zero SQL knowledge. The complexity grows steadily, and you will gradually write both simple and complex queries to select data from tables.
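If you are wondering what "simple and complex queries" might look like, here is a small sketch using Python's built-in `sqlite3` module. It is not taken from the course: the `courses` table, its columns, and the rows in it are made up purely for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE courses (title TEXT, platform TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("Intro to SQL", "A", 10),
        ("Advanced SQL", "A", 20),
        ("Intro to R", "B", 15),
    ],
)

# A simple query: select one column, filter the rows, sort the result.
simple_rows = conn.execute(
    "SELECT title FROM courses WHERE hours > 12 ORDER BY title"
).fetchall()

# A slightly more complex query: group rows and aggregate per group.
summary_rows = conn.execute(
    "SELECT platform, COUNT(*), SUM(hours) "
    "FROM courses GROUP BY platform ORDER BY platform"
).fetchall()

conn.close()

print(simple_rows)   # titles of courses longer than 12 hours
print(summary_rows)  # (platform, number of courses, total hours) per platform
```

The first query only filters and sorts, while the second groups rows and computes totals, which is roughly the jump from "simple" to "complex" that a beginner course tends to build toward.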
Check out Datastock if you need comprehensive, clean web datasets from different industries across the globe. It is ideal for those looking for ready-to-use datasets to analyze, gain insights from, and build data science skills with.
What’s great is that you get a free sample dataset before you make a purchase, so you can test the data quality for yourself and then decide.
Kaggle is the place to do data science projects, and it is one of the most popular websites among budding data scientists. It offers several options:
- a. Starting your own new project
- b. Exploring projects created by others
- c. Joining one of their sponsored competitions
Their hands-on method teaches you all the skills you need to become a data scientist, data analyst, or data engineer. You can learn in various ways:
- a. Writing code
- b. Working with data
- c. Building projects