People all over the world are generating more data every single day. Data is growing at an exponential rate, to the point that the organizational data of some companies has reached petabytes. Ninety percent of the world’s data is said to have been created in the last two years alone. People have started to analyze this data, but the harder problem isn’t the analysis. It is the management of the data.
According to Gartner, “Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
The five Vs of big data
Big data is often described using five Vs: Volume, Velocity, Variety, Veracity, and Value.
- Volume refers to the vast amounts of data generated every second, much of it from the millions of mobile devices now in use. All the emails, Twitter messages, photos, video clips, sensor data, and more that we produce are valuable to many companies.
- Velocity refers to the speed at which new data is generated and the speed at which it can be moved from one place to another, which affects how quickly it can be put to profitable use.
- Variety refers to the different forms that data takes. In the past, data for most organizations meant databases and Excel sheets. Today, data means a lot more. Eighty percent of the world’s data is unstructured: think of photo updates, videos, and Twitter updates.
- Veracity refers to the trustworthiness of the data. With data growing to enormous sizes, it is important to keep it as clean as possible, since dirty data can undermine every analysis built on it.
- Value is the final V to take into account when looking at big data. You gather a lot of data and decide to work on it, which is all well and good. But what value does the data add to your company? What matters is the benefit you get out of investing in data.
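To make the point about veracity concrete, here is a minimal sketch of what keeping data clean can look like in practice. The records, field names, and the `clean_records` helper are all hypothetical and exist only for illustration; real cleaning rules depend on the dataset.

```python
# Hypothetical raw records; names and values are made up for illustration.
raw_records = [
    {"email": " Alice@Example.com ", "age": "34"},
    {"email": "alice@example.com", "age": "34"},   # duplicate once normalized
    {"email": "bob@example.com", "age": ""},       # missing age
    {"email": "carol@example.com", "age": "abc"},  # age is not a number
    {"email": "dave@example.com", "age": "41"},
]


def clean_records(records):
    """Normalize emails, drop rows with unusable ages, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        email = record["email"].strip().lower()
        age_text = record["age"].strip()
        if not age_text.isdigit():
            continue  # skip missing or non-numeric ages
        if email in seen:
            continue  # skip duplicate emails
        seen.add(email)
        cleaned.append({"email": email, "age": int(age_text)})
    return cleaned


cleaned = clean_records(raw_records)
print(f"kept {len(cleaned)} of {len(raw_records)} records")
```

Even in this tiny example, more than half of the rows are unusable as given, which is why cleaning tends to come before any analysis.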
So, if you are taken in by the buzz around big data and data science, I suggest you look at the five resources listed below.
Datacamp is best for people with little or no experience in Python and R. It starts with the very basics and takes a stepwise approach: you are given one problem after another, each containing one or more incomplete lines of code that you have to complete or correct. It is a godsend for beginners and is inexpensive. It also offers discounts of up to 60-70% every few months on special occasions, so be on the lookout for those if you want to subscribe for a year and access the premium projects and features. It has several tracks that you can master, each consisting of some 20-30 courses. Popular tracks include:
- Data Scientist with Python
- Quantitative Analysis with R
- Data Manipulation with Python
- Importing & Cleaning Data with R
- Data Visualization with R
If you are short on time, you can also take shorter courses such as:
- Intro to Python for Data Science
- Introduction to R
- Joining Data in PostgreSQL
- Intermediate R
Coursera is one of the best platforms for learning anything from data science to military history, and I have first-hand experience with some of the courses it offers. It offers a wide range of courses from the best colleges all over the world. You can audit the courses and get access to the course materials for free, or you can pay a monthly subscription, finish the courses at your own pace, and get a certificate from the college offering the course. You can also apply for financial aid, which can amount to a hundred percent scholarship if you explain well why you need it, or for a short-term loan.
Some of the best data science courses on Coursera are:
- Data Analysis and Presentation Skills: the PwC Approach: This Specialization gives you hands-on experience with data analysis and the know-how to turn business intelligence into real-world outcomes. You will learn to understand, filter, and apply data better, which in turn will help you solve problems faster. You will become adept with Microsoft Excel, PowerPoint, and other common data analysis and communication tools. Most importantly, you will learn to read data and present it.
- Big Data, UCSD: If you need to understand big data and how it will impact your business, this Specialization is for you. You will get hands-on experience with the tools and systems used by big data scientists and engineers, such as Hadoop with MapReduce, Spark, Pig, and Hive. You will learn to perform predictive modeling and to use graph analytics to model problems. If you stick with it to the very end, you will complete a Capstone Project, developed in partnership with the data software company Splunk, in which you apply the concepts you have learned.
- Data Science Specialization by Johns Hopkins University: This Specialization covers the concepts and tools you’ll need throughout the entire data path, from asking the right questions to making inferences and publishing results in a format that is simple yet powerful.
- SQL for Data Science, UC Davis: This course is a primer in the fundamentals of SQL and in working with data, and it prepares you for the database needs of the data science world. It starts with the very basics and assumes zero SQL knowledge. The complexity grows steadily, and you will gradually write both simple and complex queries to select data from tables.
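To give a feel for what "simple and complex queries" can mean, here is a minimal sketch using Python's built-in `sqlite3` module. It is not taken from any of the courses above, and the `customers` and `orders` tables, their columns, and their rows are hypothetical, made up purely for illustration.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha', 'IN'), (2, 'Ben', 'US'), (3, 'Chen', 'US');
    INSERT INTO orders VALUES (1, 1, 150.0), (2, 2, 80.0), (3, 2, 40.0);
""")

# A simple query: filter rows from a single table.
us_customers = conn.execute(
    "SELECT name FROM customers WHERE country = ? ORDER BY name", ("US",)
).fetchall()

# A more complex query: join two tables and aggregate per customer.
# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL sum into 0.
totals = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC, c.name
""").fetchall()

print(us_customers)
print(totals)
conn.close()
```

The first query only filters one table, while the second combines a join, grouping, and aggregation, which is roughly the progression a beginner SQL course tends to follow.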
Check out Datastock if you need comprehensive, clean, ready-to-use, pre-crawled web datasets from different industries across the globe. It is ideal for those who want ready-made datasets to analyze, gain insights from, and use to build data science skills.
Kaggle is the place to do data science projects and one of the most popular websites among budding data scientists. It offers several options:
- Starting your own new project
- Exploring projects created by others
- Joining one of their sponsored competitions
Its hands-on method teaches you the skills you need to become a data scientist, data analyst, or data engineer. You can learn in various ways:
- Writing code
- Working with data
- Building projects