If you have decided to grow your business with the support of data, and to put together an experienced data science team, keep in mind that data science is a team sport: you need to hire people who work well in a team setting, whatever their individual capabilities. It takes a group of people with different backgrounds, working together, to solve practical data science problems. So who should your ideal data science team include?
Data scientists, who drive innovation in the projects.
Project managers, who make sure that everyone sticks to a timeline and that projects stay well-scoped experiments.
Data engineers, who build and maintain the infrastructure.
People with outside contacts, mainly those in management roles, who help with getting data and feedback.
A data engineer sets up the required infrastructure and environment, and converts theoretical algorithms and ideas into running code and applications. They might construct a database, or pull data out of that database for people to analyze. They might also have to turn ideas into production-level machine learning products and deploy them in a client-server model, so that they can be applied to a huge database of observations, or even run in real time, so that the product uses data to get smarter over time.
A data scientist is somebody who pulls data out of a database, analyzes it, performs experiments on it, visualizes it, and communicates the results to the data science manager and to other people in the organization, who then move things forward. Often, a data scientist will hand the implementation of any machine learning or prediction algorithm they develop to the data engineer, who then makes sure that the program can run at scale.
The third key person is the data science manager, the person in charge of keeping the team together and running efficiently. In an ideal world you might not need a data science manager at all, but in practice the manager makes sure that everybody interacts with each other and that things keep moving. They also recruit and build the data science team, and they interact with upper management and with collaborators at their own level across the organization to make sure that information gets across.
They advertise the data science team’s findings and capabilities to other people, and encourage them to bring their problems to the team.
The team works together as a unit. Often each person works on an individual project, or on an individual sub-problem of a larger data science problem, and then everyone comes together in joint group meetings and joint presentations to discuss their ideas and the challenges they are facing. The team also talks to external people for input on what would be attractive to customers, and has to keep everyone updated on regular infrastructure costs, including monthly costs such as AWS.
So you start your hunt by finding the perfect data engineer for your team. But who should that perfect person be?
They need to have strong hardware knowledge, in terms of both storage and computing, along with knowledge of database software. You’ll be dealing with a massive amount of data, so these qualities are important for running the data processes and prediction algorithms you’ve developed at scale, without interruption. They also need to know enough data science and algorithms to interact with the rest of the data science team.
Although data engineers most often have a background in computer science or computer engineering, there is no hard and fast rule, and they could also come from other places. They might come from a quantitative background and have picked up some computer science along the way, through online courses on Coursera or courses taken in person. They might also need to know how to implement and run complex algorithms using software like Hadoop, which is a parallel processing infrastructure. They don’t necessarily need to know any one of the latest buzzwords. But they do need the combination of skills that allows them to build a data infrastructure that is maintainable and built to scale.
They also need to be able to solve problems on their own. The data engineer is often one of the few people, or the only person, responsible for the data infrastructure, so they often need to be able to answer questions themselves. They need to be able to gather information from the Internet, ask questions, and figure out the right hardware, whether through online resources or through forums. They need to be aware of security measures and protocols. The role is not well defined, in the sense that new features and platforms come out all the time, so the data engineer has to know which tool to pick up and which technology to integrate.
You cannot build a data science team without one or more data scientists, since they act as the engine of the car. A data scientist must have the set of skills that allows them to perform all of the research, analysis, and discovery tasks they might need to do on a day-to-day basis. If you’re at a very early stage and hiring your first data science team, they might have to be a bit more of a jack of all trades, able to do parts of data engineering as well as data science. In general, they need to be able to do statistics as well as coding. They need to know quite a bit about both statistical inference and prediction (machine learning). Those are two different tasks, and it’s important to know that some people will be better at one and some will be better at the other. It boils down to what your organization is doing. If you’re mostly building predictive tools, they might need to be a little stronger at machine learning.
But if you are more into experiments and need to come up with new hypotheses, they might need to be a little better at statistics and inference. In the end, they need to perform whatever statistical inference or prediction is required to crunch the data, and then communicate the results. Data communication skills involve both analyzing the data and creating intelligent visualizations, so that people with no background in data science understand how the data relates to a real-life business problem. R and Python are the most popular languages among data scientists, and someone who knows only one can easily pick up the other on the fly. Knowing a web framework such as angular.js for building interactive visualizations would be a plus. They should have experience with at least one database technology, such as MongoDB, Cassandra, PostgreSQL, or another SQL database, where they have actually pulled data out of a database.
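To make the distinction between inference and prediction a little more concrete, here is a minimal sketch of what pulling data out of a database and then doing one of each could look like. Everything in it is an assumption made for illustration: the in-memory SQLite database, the orders table, the made-up numbers, and the helper names load_rows, welch_t, and fit_line. It is not how any particular team does it.

```python
import sqlite3
import statistics
from math import sqrt


def load_rows(conn):
    """Pull (variant, visits, spend) rows out of the hypothetical orders table."""
    query = "SELECT variant, visits, spend FROM orders ORDER BY id"
    return conn.execute(query).fetchall()


def welch_t(a, b):
    """Inference: Welch's t statistic for the difference in means of two samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


def fit_line(xs, ys):
    """Prediction: ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


# Made-up data standing in for a real company database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, variant TEXT, visits INTEGER, spend REAL)"
)
conn.executemany(
    "INSERT INTO orders (variant, visits, spend) VALUES (?, ?, ?)",
    [
        ("A", 1, 10.0), ("A", 2, 12.0), ("A", 3, 14.0), ("A", 4, 16.0),
        ("B", 1, 13.0), ("B", 2, 15.0), ("B", 3, 17.0), ("B", 4, 19.0),
    ],
)

rows = load_rows(conn)
spend_a = [spend for variant, _, spend in rows if variant == "A"]
spend_b = [spend for variant, _, spend in rows if variant == "B"]
visits_a = [visits for variant, visits, _ in rows if variant == "A"]

# Inference question: do the two variants differ in average spend?
t_stat = welch_t(spend_b, spend_a)

# Prediction question: how much will a variant-A customer spend at 5 visits?
intercept, slope = fit_line(visits_a, spend_a)
predicted = intercept + slope * 5

print("Welch t statistic (B vs A):", round(t_stat, 3))
print("Predicted spend at 5 visits:", round(predicted, 2))
```

The inference half asks whether a difference is real, while the prediction half only cares about guessing the next value well. In practice a data scientist would likely reach for R, or for Python libraries built for this, rather than hand-rolled helpers.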
The last, and probably the most important, member of the data science team is the data science manager. It might seem that a team of grown-up, experienced data analysts, scientists, and engineers needs no managing, but without a data science manager the team might fall apart because of ego clashes, differences of opinion, and so on. Managers act as communication bridges between the members of the team and are also responsible for identifying and recruiting new people. They help everyone identify their personal goals and priorities, identify the problems within the organization that need to be solved by data science, and put the right people on the right problem.
So, is it time to buckle up, build the right team, and beat the world in the quest for data?