What Can a Data Scientist Do for You?


It’s no surprise that data is an invaluable resource for companies. Even the New York City Fire Department is mining data to identify buildings that are more at risk of catching fire.

You hear a lot about data-driven business these days—and the amount of data collected is simply staggering. What may come as a surprise, though, is just how companies are using it and what conclusions they’re able to draw from it.

It all starts with data mining: the process of exploring large amounts of data, analyzing it for consistent patterns and relationships between variables, and then making sense of those patterns. Whether the patterns are used to summarize past data or to predict future outcomes, data mining offers valuable insights and can tell detail-rich stories about a business.

The New York Times article “How Companies Learn Your Secrets” looks at how Target analyzed purchase records to determine which customers were about to become new mothers. Research dating back to the 1980s showed that people facing major life changes, such as having a baby, getting married, or buying a new house, are particularly willing to change their purchasing routines and try new brands. Data mining for pregnant customers was especially valuable to Target’s marketing department because it could send these customers not only baby-related offers but also offers for other products at a time when their brand loyalties were open to change. So how did Target do it?


Target used what is called “predictive analytics”—examining past data to look for correlations that suggest how people will behave in the future. This technique is all around us: not only with retailers trying to figure out our shopping habits, but also with transportation companies determining what truck parts are likely to fail, or credit card companies looking for signs of fraud. In the neonatal intensive care unit at a hospital, algorithms can look for worrisome combinations of vital statistics that might suggest an imminent infection—even when no symptoms are otherwise present.
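To make “looking for correlations” concrete, here is a minimal Python sketch that computes a Pearson correlation coefficient between a past purchasing signal and a later outcome. The product, the customer records, and the names `pearson`, `product_a_units`, and `later_baby_purchase` are all hypothetical; this is not Target’s actual model, which the article does not describe.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical past records, one entry per customer:
# units of some product bought in a quarter, and whether the customer
# later bought baby products (1 = yes, 0 = no).
product_a_units = [0, 3, 1, 4, 0, 2, 5, 0]
later_baby_purchase = [0, 1, 0, 1, 0, 1, 1, 0]

r = pearson(product_a_units, later_baby_purchase)
print(f"correlation: {r:.2f}")  # prints "correlation: 0.89"
```

A value near +1 suggests the product is a strong signal for the outcome. It does not show that one causes the other, and a real predictive model would combine many such signals rather than rely on one.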

While it might seem like magic that data scientists can know so much about us, statistical methods are what actually drive big data analytics. Some of these methods are more than half a century old; others were developed only in the past five years.

Data scientists use these methods to find previously unseen patterns and correlations. Regression models, cluster analysis, probability, algorithms, and software such as Hadoop and R are just a few of the tools of their trade.

Let’s look at a few ways a data scientist can help you use your company’s data to improve efficiency, streamline processes, find new revenue streams, and grow existing ones.

Gauging Sentiment on Social Media

Mentions on social media sites like Facebook and Twitter can be a powerful real-time indicator of how consumers feel about your product—and also how intensely they feel about it.

For example, based on search criteria that you enter, Twitter offers a statistical snapshot of relevant current, popular, or recent tweets. So what can a data scientist do with all those tweets?

A variety of data science techniques can zero in on the sentiment of a tweet. As Jared Dean shows in his book Big Data, Data Mining, and Machine Learning, a data scientist might choose from a host of mathematical techniques, such as singular value decomposition or latent Dirichlet allocation, to glean the main idea of a given piece of text.

Whatever techniques your data scientists use, they’ll extract certain features, such as the language a tweet is written in (which can be helpful for geographic analysis) and whether the author feels positively or negatively toward your brand. Once they’ve categorized each tweet as praise, a complaint, a question, or something else, you can incorporate the categorized tweets into the structured data you already have.

Positive tweets, for example, can be weighted and fed into updated sales forecasts or used as a signal to order more supplies. Negative feedback can point to problems with product quality, processes, or customer service.
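The categorization step described above can be sketched in a few lines of Python. Note that this swaps in a much simpler technique than the singular value decomposition or latent Dirichlet allocation mentioned earlier: a hand-built keyword list plus a check for a trailing question mark. The keyword sets, sample tweets, and the `categorize` function are hypothetical and only illustrate how free text becomes a structured label.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration; a real system
# would learn sentiment from labeled data rather than a fixed word list.
POSITIVE = {"love", "great", "awesome", "excellent", "fast"}
NEGATIVE = {"broken", "hate", "slow", "refund", "terrible", "worst"}

def categorize(tweet: str) -> str:
    """Label a tweet as 'question', 'praise', 'complaint', or 'other'."""
    if tweet.strip().endswith("?"):
        return "question"
    words = re.findall(r"[a-z']+", tweet.lower())
    # Net sentiment: +1 per positive keyword, -1 per negative keyword.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "praise"
    if score < 0:
        return "complaint"
    return "other"

tweets = [
    "I love my new phone, the battery life is great!",
    "Screen arrived broken. Worst purchase ever.",
    "Does the new model support wireless charging?",
    "Picked one up at the store today.",
]
labels = [categorize(t) for t in tweets]
# A simple tally that could be joined to existing structured sales data.
print(Counter(labels))
```

Keyword rules are easy to read but miss sarcasm and context, which is one reason data scientists turn to the statistical techniques named above.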

Segmenting Your Customer Base

Big data analytics is particularly valuable when it uncovers a useful correlation you didn’t know was there, such as finding like-minded groups among your customers.

For example, in his book Profiting from the Data Economy, David A. Schweidel tells the story of a lead-generation firm trying to sign up new customers for a credit card using ads on Facebook. By analyzing response rates, the company found that the people most likely to sign up for the card were women who also happened to be fans of the movie Dirty Dancing.

Big data can’t say why this movie was an effective proxy for signups, but the story is a useful example of market segmentation: using response or transaction data to discover groups of users with common, and unexpected, interests. The lead-generation firm was then able to adjust its ad campaign to make it more appealing to female Facebook users who like Dirty Dancing.

The good news is that you very likely have customer data, too, that you can segment into groups you might not have known existed. Once you identify these groups, you can target them with customized promotions or newsletters.

One useful family of statistical methods is “clustering,” which uses mathematical techniques such as k-means clustering to find similarities among your customers that might not be apparent to the naked eye. In clustering, algorithms sift through customer records without being told in advance which groups to look for, surfacing subsets of similar customers, such as Dirty Dancing fans.
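Here is a minimal sketch of k-means (Lloyd’s algorithm) in plain Python, assuming each customer has already been reduced to a few numeric features. The customer records, the choice of features (age and annual spend in hundreds of dollars), and the `kmeans` function are hypothetical; libraries such as scikit-learn provide production-ready implementations.

```python
import random
from math import dist

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means (Lloyd's algorithm): returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct customers
    for _ in range(max_iter):
        # Assignment step: put each customer with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the average of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (age, annual spend in hundreds of dollars)
customers = [(22, 5), (25, 6), (23, 4), (24, 5),
             (55, 30), (60, 32), (58, 29), (57, 31)]
centroids, clusters = kmeans(customers, k=2)
for center, members in zip(centroids, clusters):
    print("center", center, "members", members)
```

The number of clusters, k, is a choice the analyst makes; in practice data scientists try several values and check which grouping is most useful for the business.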



Optimizing Business Operations

Making business operations run more efficiently is a central duty of any manager, and data science can help through “optimization” modeling, in which a data scientist represents the features of a business process mathematically in order to improve production while keeping costs as low as possible.

Optimization is different from using an algorithm to predict something like which customers are pregnant based on how similar their purchases are to those of known pregnant customers from the past.

Instead, optimization lets you set business rules and constraints and then search for the decision that best satisfies them. For example, say you run a consumer electronics business and want to weigh the following factors:

  • Seven smartphone component suppliers
  • Four different component quality trade-offs
  • A quarterly budget of $4.7 million
  • Suppliers closest to your factory to make delivery more reliable

The optimization model finds the best combination of these inputs, and the payoffs can be enormous. In their book Big Data, Kenneth Cukier and Viktor Mayer-Schönberger tell the story of how UPS cut 30 million miles from its delivery truck routes, saving 3 million gallons of fuel and 30,000 metric tons of CO2 emissions.
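To make the supplier example concrete, here is a minimal brute-force sketch in Python. Only the seven suppliers, four quality tiers, and $4.7 million budget come from the example; the prices, distances, quality scores, order size, the 1,000 km distance rule, and the `best_plan` function are hypothetical. With just 7 × 4 = 28 combinations, checking every option is enough; larger real-world problems are usually handed to linear or integer programming solvers.

```python
from itertools import product

UNITS = 100_000          # hypothetical order size for the quarter
BUDGET = 4_700_000       # quarterly budget in dollars (from the example)
MAX_DISTANCE_KM = 1_000  # hypothetical "close enough for reliable delivery" rule

# Hypothetical suppliers: base price per unit (whole dollars), distance from factory (km)
SUPPLIERS = {
    "S1": (30, 1200), "S2": (34, 150), "S3": (28, 2500), "S4": (36, 80),
    "S5": (31, 600), "S6": (33, 300), "S7": (29, 1800),
}
# Hypothetical quality tiers: price multiplier (percent), quality score (0-100)
TIERS = {
    "basic": (100, 60), "standard": (115, 75),
    "premium": (135, 88), "flagship": (160, 95),
}

def total_cost(supplier, tier):
    base, _ = SUPPLIERS[supplier]
    pct, _ = TIERS[tier]
    return UNITS * base * pct // 100  # integer math avoids float rounding

def best_plan():
    """Return (supplier, tier, cost) with the highest quality that meets every rule."""
    feasible = [
        (s, t, total_cost(s, t))
        for s, t in product(SUPPLIERS, TIERS)
        if total_cost(s, t) <= BUDGET and SUPPLIERS[s][1] <= MAX_DISTANCE_KM
    ]
    if not feasible:
        return None
    # Maximize quality first; break ties by choosing the cheaper option.
    return max(feasible, key=lambda plan: (TIERS[plan[1]][1], -plan[2]))

print(best_plan())  # prints ('S5', 'premium', 4185000)
```

Changing a single rule, such as relaxing the distance limit or raising the budget, and rerunning the search shows how sensitive the best choice is to each business constraint.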

By Andrew Rosenblum, Writer

Andrew Rosenblum reports on drones, artificial intelligence, security, and the commercial space business for Popular Science, MIT Technology Review, Wired, Fortune, and other publications.