If you have a data project, whether it’s setting up a data pipeline, making sense of the data you’ve already collected, or something more complex, you’re going to need data scientists with the right skills. Which skills and qualifications you need will depend on the questions you’re trying to answer and the technologies required to answer them.
Data science is a complex field with a number of specialties. Their mix of technical and analytical skills makes data scientists some of the most in-demand experts around. But not every data project requires someone with a PhD in statistics and machine learning expertise. In this article, we’ll walk through some things you should consider before choosing a data scientist and explain the differences between common areas of data science expertise, so you can find the right talent for your project. From there, we’ll help you estimate a budget for your project based on the skills you’ll need.
FIGURE OUT WHAT KIND OF DATA SCIENTIST YOU NEED
There’s a good deal of confusion and ambiguity around the term data scientist. The title can actually describe several distinct specialties, which we’ll outline here.
- A data analyst is someone who spends most of their time querying databases. They’re often the most junior level of data scientist, and may or may not have much experience with statistical analysis or algorithms. They’re best suited to answering ad hoc questions that can be answered by pulling data from Excel or SQL databases. They should also be able to produce basic visualizations with tools like Tableau as needed. If you already have a well-constructed database and just need someone to answer specific questions, a data analyst might be right for you.
- A data engineer is less engaged in running specific queries than in building the systems that help provide the answers. Unlike data analysts, data engineers often work directly with developers, ensuring that data is being properly captured and stored by the relevant systems. They also make sure that scheduled processing jobs run on time. They are the most likely to need expertise with big data frameworks like Hadoop and Spark, as well as knowledge of production-oriented programming languages like Java and Python.
- A data scientist needs to be able to oversee complex data projects from conception to execution. In addition to having great technical skills, they need to be able to communicate their findings effectively to others in the organization and to manage a team. They should be able to query databases like an analyst, but also perform much more sophisticated analysis using statistical techniques and machine learning, depending on the task at hand.
START ASKING THE RIGHT QUESTIONS
A common mistake that non-data people tend to make is thinking that the job of a data scientist is just to explore an organization’s data to look for insights. This is especially common in organizations that have been collecting data for a long time but haven’t had the time or resources to dive into it. Another mistake is assuming that your organization needs to undertake a complex data project (such as building a recommendation engine with machine learning algorithms) without first considering what problem the project is meant to solve.
So, before you go any further, ask yourself: What problem am I trying to solve? A data scientist can help guide your organization’s decision-making process with data. Here are some common questions a data scientist might be able to help you answer:
- How do we improve user retention?
- Who are our most valuable customers?
- How can we decrease turnover among our employees?
- What new features should we prioritize?
As you can see, these questions get right to the core of your business goals. Once you’ve settled on a question or set of questions, you can start getting into more specifics. What data will you need to answer these questions? Are you collecting that data at present? What metrics will you use to measure success?
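To make this concrete, here is a minimal sketch of how a question like "Who are our most valuable customers?" might turn into an ad hoc query. It assumes a hypothetical `orders` table with `customer_id` and `amount` columns and defines "most valuable" as highest total spend; your own schema and your own definition of value will almost certainly differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            ("alice", 120.0),
            ("bob", 40.0),
            ("alice", 80.0),
            ("carol", 150.0),
            ("bob", 25.0),
        ],
    )
    return conn


def top_customers(conn, limit=3):
    """Return (customer_id, total_spend) pairs, highest total spend first."""
    query = """
        SELECT customer_id, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer_id
        ORDER BY total_spend DESC, customer_id ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    for customer_id, total_spend in top_customers(build_demo_db()):
        print(customer_id, total_spend)
```

Notice that the metric (total spend) had to be chosen before the query could be written. That is the kind of decision worth settling with your data scientist up front.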
Identify the Skills and Technologies You Need
What kind of data scientist you’re looking for will depend on three things: what kind of question you’re trying to answer, what the current state of your data operation is, and what technologies your team is currently using.
For example, a data analyst who’s trying to segment your customer base is going to use very different tools than a data engineer who’s trying to build a streaming analytics platform that processes thousands of transactions a minute. Where the former will likely be using Excel, Tableau, or SQL to query your databases and produce nice graphs, the latter will need to know how to set up a data processing framework (like Hadoop’s MapReduce or Spark) and a distributed file system.
A lot will also depend on your current data setup. Does answering your question require you to start collecting data that you haven’t before? If so, you’ll need a data engineer to work with your dev team to make sure trackers are properly set up and that the data being collected is going to the right place. If you don’t already have a steady data pipeline, this can be a significant engineering outlay by itself. Once you’re collecting the data you need, how much will it need to be processed? If your data is messy (meaning it needs to be reformatted or otherwise transformed before it can be used) this adds another layer of complexity.
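As a rough illustration of what "messy" can mean in practice, the sketch below normalizes records whose dates and dollar amounts arrive in inconsistent formats. The field names, the list of date formats, and the sample rows are all assumptions made for the example; real cleanup work depends entirely on how your data was collected.

```python
from datetime import datetime

# Date formats we assume might appear in the raw data (illustrative only).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Try each known format; return an ISO date string, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Strip currency symbols and thousands separators; return a float or None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_records(rows):
    """Split rows into normalized records and rows that could not be parsed."""
    cleaned, rejected = [], []
    for row in rows:
        date = parse_date(row["date"])
        amount = parse_amount(row["amount"])
        if date is None or amount is None:
            rejected.append(row)  # keep the original row for manual review
        else:
            cleaned.append({"date": date, "amount": amount})
    return cleaned, rejected
```

Setting aside the rows that fail to parse, rather than silently dropping them, gives you a sense of how much of your data needs attention before any analysis can start.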
At present, skills in the programming language Scala and the big data framework Spark are especially valuable. According to the Stack Overflow Developer Survey, data scientists who use Scala, Spark, and Hadoop command the highest rates in the field, while those who use R, Java, and Python typically charge somewhat less.
How Much Is This Going to Cost?
Data science is a hot field, and qualified data scientists can charge more than other kinds of developers or business analysts. On Upwork, rates charged by freelance data scientists can range from $36 to $200 an hour, with an average project cost of around $400. Keep in mind, however, that rates tend to go up when a project calls for more specialized skills, like Scala and Spark. That said, it may make more sense to negotiate a project fee based on the scope of your project.
Remember, aside from ad hoc queries, most data projects are long-term commitments. You’ll need someone who’s familiar with your systems and can help you measure the impact your decisions have had over time. For this reason, you may also want to consider a retainer arrangement. Below we’ve put together a table of some common data science projects, along with the relevant skills and the hourly rate ranges that some data scientists charge for them.
| Type of Project | Relevant Skills | Hourly Rate |
|---|---|---|
| Analysis and ad hoc queries | Excel, SQL, visualization | $20-$100 |
| Set up a data pipeline | Java, Python, Scala, Hadoop, Spark, data cleaning | $25-$100 |
| Build a recommendation engine | Statistical analysis, machine learning, Python, Scala, Spark | $50-$210 |
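
To turn the hourly ranges in the table into a rough budget, multiply them by the number of hours you expect the project to take. The sketch below does just that. The rates are copied from the table, while the hour count is something you would have to estimate yourself (the 40 hours used here is purely illustrative).

```python
# Hourly rate ranges (low, high) in USD, taken from the table above.
HOURLY_RATES = {
    "Analysis and ad hoc queries": (20, 100),
    "Set up a data pipeline": (25, 100),
    "Build a recommendation engine": (50, 210),
}


def estimate_budget(project_type, estimated_hours):
    """Return a (low, high) budget range in USD for the given project type."""
    low_rate, high_rate = HOURLY_RATES[project_type]
    return low_rate * estimated_hours, high_rate * estimated_hours


if __name__ == "__main__":
    # 40 hours is an assumed figure, not a typical project length.
    low, high = estimate_budget("Set up a data pipeline", 40)
    print(f"Estimated budget: ${low:,} to ${high:,}")
```

A range this wide is a starting point for a conversation about scope, not a quote. Narrowing it down is one reason to consider negotiating a fixed project fee.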