Hire the best Big Data developers

Check out Big Data developers with the skills you need for your next job.

Clients rate Big Data developers
Rating is 4.8 out of 5.
based on 1,697 client reviews
Toan H.
$200/hr
  • Big Data
  • Tableau
  • Data Analysis
  • Data Visualization
  • SQL
  • Business Intelligence
  • Data Warehousing
  • Data Modeling
  • Project Management
  • Stakeholder Management
  • API Integration
  • Data Science

Tableau Visionary (2020, 2021, and 2022) for Excellence in Teaching, Mastery of the Platform, and Collaborator. I have worked in Business Intelligence, Data Management and Digital Transformation since 2004 and dedicated the last seven years to helping people get the most out of their Tableau investments. I can assist you with all aspects of Tableau, from developing dashboards and administering and setting up your enterprise infrastructure to developing customer Extensions or other technical integrations. If you require support in your projects, do get in touch.

Andrew C.
$115/hr
  • Big Data
  • Apache Spark
  • Apache Kafka
  • Apache Hadoop
  • Data Engineering
  • Apache Hive
  • Linux

Passionate about bringing older, constrained data pipelines, processing, and storage into their modern equivalents. Expertise in Spark, Kafka, and Hadoop-ecosystems on bare metal and cloud infrastructure. Currently loving Trino and Iceberg, and finding interesting use cases for Kafka tiered storage.

Franck L.
$120/hr
  • Big Data
  • MySQL
  • Database Administration
  • Database Programming
  • Data Warehousing
  • Data Modeling
  • ETL Pipeline
  • Amazon Relational Database Service
  • Snowflake
  • Amazon Web Services
  • AWS Glue
  • PostgreSQL
  • Database Optimization
  • MongoDB
  • Elasticsearch

Hello, Update: Due to high demand for our services, we don't take any project with a budget less than $500. If you are looking for Database Architecture, DBA work (DB migration, Optimization), Data Warehousing, Data Engineering, and Data Visualization expertise you have come to the right place. I have more than 25 years of expertise with database technologies, data modeling, and 12 years as a MySQL DBA. As a Sr MySQL DBA, I have been involved in various projects for Fortune 500 companies and independent businesses in the entertainment industry, and I lead projects for websites and database management for Acura, Variety, Lexus,...

Vignesh I.
$30/hr
  • Big Data
  • Apache Spark
  • PySpark
  • Apache Hadoop
  • Apache Kafka
  • Scala
  • Apache Hive
  • Apache NiFi
  • Apache Cassandra
  • SQL
  • AWS Glue
  • ETL Pipeline

🏆 Expert in creating robust, scalable and cost-effective solutions using Big Data technologies for past 9 years. 🏆 The main areas of expertise are: 📍 Big data - Apache Spark, Spark Streaming, Hadoop, Kafka, Kafka Streams, HDFS, Hive, Solr, Airflow, Sqoop, NiFi, Flink 📍 AWS Cloud Services - AWS S3, AWS EC2, AWS Glue, AWS RedShift, AWS SQS, AWS RDS, AWS EMR 📍 Azure Cloud Services - Azure Data Factory, Azure Databricks, Azure HDInsights, Azure SQL 📍 Google Cloud Services - GCP DataProc 📍 Search Engine - Apache Solr 📍 NoSQL - HBase, Cassandra, MongoDB...

Rohit K.
$35/hr
  • Big Data
  • Data Visualization
  • Data Analysis
  • JavaScript
  • Data Science
  • Marketing Data Analytics
  • Python
  • SQL
  • Database
  • Data Warehousing
  • ETL Pipeline
  • BigQuery
  • Snowflake
  • Amazon Redshift
  • Microsoft SQL Server

Hi, I am a tech-savvy data analytics expert who also knows the marketing side of things. I can set up a proper technology stack for data gathering, data processing, and data visualization so that you can track your important business KPIs. I know JavaScript and Python and can write custom scripts for tracking purposes wherever required. I can also build data pipelines (ETL) for your organization to get data from all the available data sources and put it into a data warehouse of your choice, such as BigQuery, Snowflake, or Amazon Redshift. After that, I can work with SQL to combine data from various sources and create meaning out...

Teofil N.
$75/hr
  • Big Data
  • MySQL
  • Bioinformatics
  • Git
  • Data Analysis
  • Data Visualization
  • Docker
  • Microsoft Excel
  • R Shiny
  • Data Science
  • Analytics
  • Data Mining
  • SQL
  • Machine Learning
  • Dashboard

"Teofil successfully delivered a complex R project with exceptional professionalism and craft. I would love to hire him again!" (ArizeAI) "Teofil is an absolute professional and a pleasure to work with. His communication was top-notch and he provided regular updates on how the project was progressing. Teofil delivered our project on time and on budget. If you are looking for a freelancer in a field relevant to Teofil's brilliant skills, you won't be disappointed!" (University of Newcastle, Australia) "Teofil was able to meet established goals and deadlines. He has superb collaboration skills and was a key factor in helping develop a...

Anton B.
$168/hr
  • Big Data
  • Python
  • Apache Spark
  • Blockchain
  • Apache Airflow
  • SQL
  • Apache Hadoop
  • Data Engineering
  • Cloud Architecture
  • Data Warehousing
  • Amazon Web Services
  • ETL Pipeline
  • Cloud Computing
  • ClickHouse
  • Kubernetes

Spark – SQL – Blockchain – ETL – Airflow – Hadoop – ClickHouse – Python – DWH – AWS – GCP – Azure – MapReduce – BI – Analytics – NoSQL. 𝗢𝗽𝗲𝗻 𝗽𝗿𝗼𝗳𝗶𝗹𝗲 𝘁𝗼 𝘀𝗲𝗲 𝗱𝗲𝘁𝗮𝗶𝗹𝘀. Cloud Data Engineer & Solutions Architect with 7 years of expertise and hands-on experience. Including Blockchain projects. ✅ 𝗧𝗼𝗽-𝗿𝗮𝘁𝗲𝗱 𝗣𝗹𝘂𝘀 & Expert-Vetted on Upwork. 🤝 𝙒𝙃𝘼𝙏 𝙔𝙊𝙐 𝙂𝙀𝙏 𝙃𝙄𝙍𝙄𝙉𝙂 𝙈𝙀: — Architectural design: I will help you understand your technical needs and design a Data Platform 𝗳𝗼𝗿 𝘆𝗼𝘂𝗿 𝗽𝗮𝗿𝘁𝗶𝗰𝘂𝗹𝗮𝗿 𝘂𝘀𝗲 𝗰𝗮𝘀𝗲. — Coding: I will implement everything myself or with involving your team to...

Yangyang D.
$80/hr
  • Big Data
  • Data Cleansing
  • Master Data Management
  • Data Mining
  • SAS
  • Data Entry
  • Data Modeling
  • Data Analysis
  • Mathematics
  • Statistical Modeling

A dynamic Statistician and Data Analyst seeks to bring an accomplished combination of education and experience to an industry-leading employer. 5+ years of experience in related data analysis, computer programming, and instruction roles for distinguished universities. Ph.D. Candidate in Biostatistics, with an M.S. in both Bio-Engineering and Applied Mathematics. Proficient in data management, mathematical modeling, grad/undergrad instruction, longitudinal/statistical analysis, project management, and more. Diligent and goal-oriented, with the skills, education, and hands-on experience needed to make an immediate impact as part of any...

Chunyi W.
$55/hr
  • Big Data
  • SAS
  • R
  • Data Science
  • Linear Regression
  • Data Visualization
  • Quantitative Analysis
  • Statistics
  • Analytics
  • Logistic Regression
  • Biostatistics
  • Statistical Analysis
  • Epidemiology
  • Healthcare & Medical
  • Public Health

I obtained my Ph.D. degree in Epidemiology at the University of Michigan and I also have the SAS Programmer certification. Currently, I am a Lead Data Analyst in a medical school. I have a strong background in biostatistics/epidemiology and have 14 years of experience analyzing large epidemiological, clinical, genetic and National Inpatient Sample data using various software packages (SAS, SPSS, R and RStudio). I have extensive knowledge of statistical models, and have developed various analysis strategies for different studies and meta-analyses. Statistical methods that I have applied in research projects: 1. Multilevel...

Lucas W.
$50/hr
  • Big Data
  • Tableau
  • Data Analysis
  • Data Visualization
  • Business Intelligence
  • Data Science
  • Microsoft Excel
  • Presentations
  • Dashboard
  • Excel VBA
  • Microsoft Power BI
  • Microsoft Power BI Data Visualization
  • Google Sheets
  • Infographic
  • Chart Presentation

I believe that data tells a story, and all too often the data is telling a different story than what managers think their story is. I'm passionate about drilling down on the company's KPIs, building dashboards, and generating insights needed to improve your business. The data world has grown faster than we've had the ability to consume it, and I'm here to help us catch up. My recent work history includes three years of data strategy focusing on agile analytics and data visualizations. I am able to understand your data and present it to you in an appealing way to bring you the answers you need. Further, I have five years of experience as a...

Siddharth G.
$80/hr
  • Big Data
  • Data Analysis
  • Python
  • Data Visualization
  • Hypothesis Testing
  • Critical Thinking Skills
  • Problem Solving
  • ETL
  • Dashboard
  • SQL
  • Web Scraper
  • Machine Learning
  • Microsoft Power BI
  • Tableau
  • Marketing Analytics
  • Funnel Testing

🌟 Worked with World Bank and an Asian Govt on Machine Learning (Water predictions) 🌟 Top Rated Plus Freelancer 🌟 5 Star Client Feedback on all Analytics projects 🌟 1,150+ hours booked on Upwork 🌟 7+ years of global experience with Fortune 500 companies AND fast growing startups 🌟 Skills appreciated: Asking the right questions, attention to detail, clear communication, thoughtful, having a solid work ethic and being a wild optimist Hi, from India! I am a top 3% Upwork consultant for Analytics, Visualizations, and Machine Learning projects, with experience building end-to-end analytics processes for businesses. Industries I have...

Mark F.
$75/hr
  • Big Data
  • Microsoft Excel
  • Excel VBA
  • Visual Basic
  • Macro Programming
  • Financial Analysis
  • Data Analysis
  • Data Cleansing
  • Data Extraction
  • Data Migration
  • Data Processing
  • Data Structure
  • Spreadsheet Software

An automotive engineer by trade, I've been working with spreadsheets for over 40 years. I've created a wide variety of Excel spreadsheets, both for my engineering work, and also for other projects (such as stock analysis, church recordkeeping, business processes, payroll records, etc.). I've completed over 650 different Excel projects for over 240 different clients, in the USA and around the world. My strengths are spreadsheet structure, complex formulas, and VBA macros. Whether you need a complex spreadsheet to automate data analysis, or a simple one, I can deliver. NOTE that I work in Excel (for Windows). I do NOT work with Google...

Amgad E.
$50/hr
  • Big Data
  • Python
  • R
  • Tableau
  • Data Analysis
  • Machine Learning
  • Data Visualization
  • SQL
  • TensorFlow
  • Python Scikit-Learn
  • Deep Learning
  • Support Vector Machine
  • Random Forest
  • Data Science
  • Amazon Web Services
  • Google Cloud Platform

Available to work with one more client at current capacity. Top-rated machine learning engineer/ data scientist with over 5 years of industry and academic experience. Versed in powerful machine learning and data science tools, including Python, Scikit-Learn, TensorFlow, R, SQL, and Tableau. I have trained, built, and deployed models that have created revenue in excess of 10x of the investment my customers have put into working with me. I have expertise in every aspect of the data process, including wrangling, analyzing, visualizing, model building, and deployment, and will work with you to ensure we build the best product together. 100%...

Simon S.
$50/hr
  • Big Data
  • Python
  • Predictive Analytics
  • Classification
  • Artificial Intelligence
  • Artificial Neural Network
  • Data Analysis
  • Code Review
  • Machine Learning
  • Machine Learning Model
  • Data Science
  • C++
  • Technical Writing
  • Data Visualization

If you are looking for a data scientist: I am an information technology engineer and data scientist. My background is in speech science, but I am more than capable of applying my skills to other fields as well, as you can tell by the positive feedback I received from my former clients. If you have a job involving data mining, analysis, and/or visualization, I am the guy to solve it quickly and thoroughly! If you are looking for help writing a scientific paper: I have 8+ years of academic writing experience, published 20+ papers in peer-reviewed journals and conference proceedings, and am a part-time lecturer, as well. I have been a member of...

Yuri O.
$40/hr
  • Big Data
  • Data Engineering
  • Data Analysis
  • Apache Spark
  • Apache Kafka
  • Scala
  • Functional Programming
  • Amazon Web Services
  • Software Architecture & Design
  • Leadership Skills

An enthusiastic and dedicated engineer with 8+ years in the industry. Hands-on experience and thorough understanding of all aspects of software development process - from requirements collection to end-to-end architecture design, from coding to solution delivery. A number of teams and environments that Yurii worked in throughout his career gave him leadership and mentoring skills, a wide range of technical skills, ability to adapt to fast-paced environment, great communication and cooperation skills. He considers software development as an art and creativity – and, as every artist enjoys the result, he gets a huge satisfaction of being...

Kseniia D.
$40/hr
  • Big Data
  • Python
  • ETL
  • Python Numpy
  • Python Pandas
  • G-Cloud
  • MongoDB
  • PostgreSQL
  • BigQuery
  • SQL
  • ETL Pipeline
  • Data Migration

Briefly describing myself, I’m a self-motivated, dedicated and enthusiastic Python programmer with 2 years of production experience creating ETL processes, designing software architecture and functionality, deploying apps on GCP, etc. My determination and eagerness to improve my hard and soft skills help me get up to speed with new technologies and challenges. As a result, I want to take part in real projects and be as useful there as I can.

Umar M.
$40/hr
  • Big Data
  • Machine Learning
  • Data Science
  • Apache Spark
  • Python
  • Tableau
  • Data Visualization
  • Data Engineering
  • ETL
  • Amazon Web Services
  • Elasticsearch
  • Salesforce Lightning
  • Salesforce CRM
  • Website Development
  • React

Experienced engineer with 9 years of experience working as a product developer in the domain of Data Science and Big Data. - Expertise in Python, Java and Scala - Backend development using Flask, Play Framework, NoSQL databases and serverless technologies - Experience building machine learning and ETL pipelines from scratch - Expertise in using Kafka, Apache Spark, Spark Streaming, MapReduce and the Hadoop ecosystem - Experience with cloud environments like AWS, GCP and Azure - Experience using CI/CD tools like Chef, Jenkins, Capistrano - Experience in validation and selection of ensembled machine learning models. - Experience using...

Afzaal A.
$25/hr
  • Big Data
  • C#
  • Visual Basic
  • .NET Framework
  • VB.NET
  • .NET Core
  • VBA
  • SQL
  • Microsoft SQL Server Reporting Services
  • WordPress
  • ASP.NET MVC
  • RESTful API
  • MySQL Programming
  • ASP.NET
  • Database
  • Google Data Studio

Expert Programming Services C#, VB.Net, JavaScript. Over 15 years of software development experience, mainly as a programmer and database developer. I am an expert at converting legacy projects written in VB6, FoxPro, Classic ASP, and MS Access into modern web-based and cloud applications. Languages - C#, VB.Net, JavaScript, VB6, VBA, Java Databases - MSSQL, MySQL, PostgreSQL, Access, DynamoDB Reporting Tools - DevExpress Reports, Crystal Reports, SSRS, and Data Reports. Third-Party Tools and Controls - DevExpress, Telerik, etc. API - Bloomberg API SAP Business One Customization, SAP Business One Reports, SAP Business One Dashboards using Power...

Darin H.
$40/hr
  • Big Data
  • Python
  • Web Scraper
  • Microsoft Excel

I am an experienced and proficient web scraper using Python to obtain very large amounts of data from a variety of online sources. I do fixed-price work and have successfully pulled data from 100's of sites with examples being business locations, directories, public information, IMDB movie info, sports-reference stats, music charts, Forbes company rankings/info, ESPN player pages, Google search results, as well as hundreds of other queries of all genres. You can see some of my results via the data sets which are used on my big data quiz site, hugequiz.com. I have been able to retrieve data from articles, tables, lists, recursively via...

Omkar R.
$60/hr
  • Big Data
  • Snowflake
  • Python
  • BigQuery
  • Tableau
  • Data Visualization
  • SQL
  • Alteryx Analytic Process Automation Platform
  • ETL Pipeline
  • Microsoft SQL Server
  • Data Analysis
  • Statistical Analysis
  • Conversion Rate Optimization
  • Machine Learning
  • A/B Testing
  • Data Science
  • R

I help businesses collect data, extract meaningful insights and work with their team to implement them while ensuring exceptional ROI. Hello! I am Omkar and in my 10+ years of experience, I have helped transform 50+ global businesses with their complex problems in Data Science, Business Intelligence, and Advanced Analytics Services. I have worked with top companies like S&P Global, Newsweek, KPMG, Edelweiss, and many more. Let’s turn your enterprise data into a machine that directly impacts your revenue generation, profits, and strategic decision-making with insights that you need to become or maintain industry leadership. I excel at...

Hakob J.
$40/hr
  • Big Data
  • Machine Learning
  • Python
  • Data Science
  • scikit-learn
  • PySpark
  • Apache Spark
  • Keras
  • Deep Learning
  • Software Testing
  • Automated Tests
  • Anomaly Detection
  • Model Tuning
  • PyTorch
  • Flask

I am an ML scientist working at a startup called Intent.ai. My projects are mainly in the areas of recommendation systems, predictive modeling, and time series modeling. Here are some of them: - Diversification: Developed a recommendation system algorithm that ensured a diverse result set and prevented the repetition of similar items during user search. - Developed an algorithm that detects real-time user trends, used by internal users for marketing purposes. - Worked on time-series anomaly detection of app crashes and rule mining on crash data. I have a strong mathematical background (Calculus,...

Trishna M.
$50/hr
  • Big Data
  • Scientific Writing
  • HTML
  • Graphic Design
  • MATLAB
  • Excel
  • Scientific Research
  • WordPress
  • Data Visualization
  • Sales Analytics
  • R
  • SQL
  • Snowflake
  • Adobe Illustrator
  • Office 365
  • Tableau

I have worked as a data analyst at AstraZeneca UK and the University of Chicago for over 10 years. My background is in Biotechnology and Bioinformatics, but I have worked in Neuroscience (Vision) for the last 5 years. I also worked as a domain knowledge expert for a startup last year. My top programming skills include R, MATLAB, Google Analytics, SQL, and HTML. I have built several websites while at work and have experience writing SOPs, setting up labs and working protocols. I have several online publications, some of which are listed below. I have built several Google dashboards and analyzed retail and sales data for marketing insights, KPIs, key...

Hassan K.
$55/hr
  • Big Data
  • Stata
  • Statistics
  • Data Analysis
  • Econometrics
  • Microsoft Excel
  • R
  • R Shiny
  • Microsoft Power BI Data Visualization
  • Quantitative Research
  • Data Science
  • Data Visualization
  • Statistical Analysis
  • Financial Modelling
  • Financial Projection
  • Presentations

I have a strong and proven track record of doing DATA ANALYSIS, STATISTICAL ANALYSIS, DATA MANIPULATION, DATA VISUALIZATION, SURVEY ANALYSIS, LIFE TIME VALUE (LTV) CALCULATIONS AND FORECASTING FUTURE OUTCOMES. I have expertise in tools like Stata, R, R Shiny, Excel, Google Data Studio, Microsoft Power BI, Tableau, SPSS, VBA and Google Analytics. Proficient in doing analysis related to Life Time Value (LTV), Customer Retention Rate, Event Study and Probabilistic modeling to predict future outcomes. I have worked across the entire data analysis lifecycle, from data mining, listing, questionnaire design, field strategy, data gathering, cleaning,...

Rodion K.
$50/hr
  • Big Data
  • Data Modeling
  • ETL Pipeline
  • Database Design
  • Cloud Migration
  • Amazon Redshift
  • Data Warehousing
  • Business Intelligence
  • ETL
  • Informatica
  • Talend Open Studio
  • IBM InfoSphere DataStage
  • Data Integration
  • Data Management
  • Snowflake

ETL developer and Data Warehouse Architect with 17 years experience in retail, financial, telecom, pharma, airline business areas. I specialize in data integration solutions. If you need to build BI/DWH/Database solution from scratch till production or need support in some part of the project - don't hesitate to contact me. I can help you in these areas - BI Solution Architecture Development (Software, Hardware layers) - Data Warehouse project (ROLAP, MOLAP) - Data Model Design (Logical, Physical, Star/Snowflake, Multidimensional, Data Vault) - Database design (Oracle, MS SQL, Netezza, Vertica, PostgreSQL) - Data Management Design...

Mariusz S.
$100/hr
  • Big Data
  • Apache Spark
  • Apache Hadoop
  • Apache Kafka
  • ETL
  • Data Visualization
  • Data Scraping
  • Data Warehousing
  • Python
  • Apache Airflow
  • Data Migration
  • PySpark
  • MongoDB

I have over 4 years of experience in Data Engineering (especially using Spark and PySpark to gain value from massive amounts of data). I worked with analysts and data scientists by conducting workshops on working in Hadoop/Spark and resolving their issues with the big data ecosystem. I also have experience with Hadoop maintenance and building ETL, especially between Hadoop and Kafka. You can find my profile on Stack Overflow (link in Portfolio section), where I mostly help with questions tagged spark and pyspark.

Murat K.
$108/hr
  • Big Data
  • SAS
  • Statistics
  • Data Analysis
  • Python
  • IBM SPSS
  • R
  • Tableau
  • Regression Testing
  • Econometrics
  • Time Series Analysis
  • Marketing Analytics
  • Neural Network
  • Biostatistics
  • Google Ads
  • Product Analytics

Cornell/MIT educated data scientist with financial modeling and digital analytics experience at Fortune 500 companies. • Academic experience as TA/RA at MIT and Cornell University. • Advanced R, Python, SPSS, SAS, Tableau, SQL, MPlus. Some of the projects I have done are: • Time series analysis, financial forecasting and other econometric methods • ANOVA, MANOVA, GLM, SEM in social sciences • Biostatistics and medical research including analysis of clinical trials • Regression modeling • Machine learning and various statistical models in big data • Valuation, investment analysis and financial modeling in the investment banking industry...

John L.
$150/hr
  • Big Data
  • Domo
  • Marketing Strategy
  • Marketing
  • Entrepreneurship
  • Startup Consulting
  • Marketing Analytics
  • Campaign Reporting
  • Dashboard
  • Report
  • Microsoft Excel
  • KPI Metric Development
  • ETL
  • Usability Testing
  • Landing Page

Imagine never having to push a button again to see reports. For instance, I had a client where an analyst spent 8 HOURS A WEEK just downloading data to then put together weekly reports. They brought me in, I hooked up different data sources like Google Analytics, Google Ads, Bing, etc., and now those reports update each day and some update every 15 minutes. Not only did this save time for the analyst, but it also allowed the analyst to spend time analyzing instead of stitching together reports. I have a lot more stories like this that I'd love to share with you.


How it works

1. Post a job (it’s free)

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.

Trusted by 5M+ businesses

What Is Big Data?

While big data has become a trendy catchphrase, the good news is that there is real substance to it. With a little effort, even nontechnical people can understand that substance and start putting it to work for their companies.

Part of demystifying the trendy catchphrase “big data” is understanding that you’re analyzing your business using techniques of statistical analysis, some of which have been around for 50 years or more.

What is fundamentally different about the 21st-century phenomenon of “big data” is the computing power we can bring to bear. Advances in the sensors that collect data, the drives that store it, and the software and hardware to analyze it mean that we can efficiently analyze far more material than was feasible in earlier centuries.

It’s no longer hard to create and store gigabytes of data—the challenge is to find something meaningful in all of that material. What makes analyzing the data such a rich source of business insights?

Big data is good at finding correlations but not at causality

A great place to start is with the distinction between “what you like” and “why you like it,” or what is technically called the difference between correlation and causality. The algorithms behind product recommendations, for example, don’t know why you like what you like. But they have learned what you will like based on what you’ve purchased before.

From a business perspective, that’s OK: the “what” matters far more than the “why.” Knowing what you will like drives clicks and sales. Skilled data scientists have a host of statistical techniques, some new and some old, for analyzing information. Before you start working with a data scientist, however, there’s an important question you need to ask first.

What’s the type of dataset you want to learn more about?

If you don’t ask this all-important question, you could get overwhelmed with raw data. Many executives feel pressure to just do something with big data, so they begin collecting without a clear goal in mind.

If you do “track everything,” you’ll still have to go through that data again once you figure out what you’re trying to do. And in the meantime, you’ll be racking up software, hardware, and personnel costs.

A key takeaway? Don’t just rush in and start tracking everything. The best way to get started is to look at the types of problems people have successfully attacked with big data in order to see what you might accomplish in your business. Here are a few examples:

  • Branding: Look at mentions of a product on Twitter in order to derive an analysis of “customer sentiment.” By collecting mentions of your brand from Twitter, data scientists can tell not only how customers feel about it but also how strongly they feel about it. Data scientists can then help you automate your responses: retweeting positive comments and sending prompt, private messages to unhappy customers.
  • Market research: Analyze your past sales records to segment your customer base so that you can find and target like-minded clusters of people with carefully customized marketing campaigns.
  • Operations: Analyze the geolocation data of your delivery drivers to find the most efficient routes in terms of gasoline usage and time. Data scientists can compare up-to-the-minute data about where your vans are on the road with historical data about what routes are congested with vehicles or require time-consuming left-hand turns across traffic.
  • Production optimization: A large beverage company used data to find the optimal blend of different kinds of oranges, which have different costs, astringency, sweetness, and tartness, in order to maximize profit while maintaining quality standards.
  • Research: A large hedge fund hired researchers to keep track of real-time news on 200 companies at a time. The team was spending so much time seeking data, like looking for company press releases, regulatory sites, SEC filings, and updates to company websites, that they couldn’t keep up with all of the changes. Data consultancy BrightPlanet put together an algorithm to search the Internet and compile information automatically, freeing up the team to focus on analyzing the findings.
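
As a rough illustration of the market research example above, here is a minimal Python sketch that groups customers into segments based on past sales records. The sample records, the segment names, and the thresholds are all hypothetical assumptions chosen for illustration; a real project would likely derive segments from the data itself, for example with a clustering algorithm.

```python
from collections import defaultdict

# Hypothetical past sales records: (customer_id, order_total_in_dollars).
sales = [
    ("c1", 20.0), ("c1", 35.0), ("c1", 15.0),
    ("c2", 400.0),
    ("c3", 5.0),
    ("c4", 120.0), ("c4", 180.0),
]

def summarize(records):
    """Return {customer_id: (order_count, total_spend)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for customer_id, amount in records:
        counts[customer_id] += 1
        totals[customer_id] += amount
    return {cid: (counts[cid], totals[cid]) for cid in counts}

def segment(summary, big_spend=200.0, frequent_orders=3):
    """Assign each customer to one illustrative segment.

    The thresholds are arbitrary assumptions, not recommended values.
    """
    segments = {}
    for cid, (order_count, total) in summary.items():
        if total >= big_spend:
            segments[cid] = "high value"
        elif order_count >= frequent_orders:
            segments[cid] = "frequent small buyer"
        else:
            segments[cid] = "occasional"
    return segments

customer_segments = segment(summarize(sales))
for cid, name in sorted(customer_segments.items()):
    print(cid, name)
```

Each segment could then receive its own marketing campaign, which is the "like-minded clusters" idea in miniature.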

Tips for analyzing big data

There are some unusual features of massive datasets that you should keep in mind.

1. The “messiness” of big data

You may be surprised by how much time your consultants spend on a stage of the project called “data preparation.” Don’t be. Computers, databases, and algorithms have gotten so fast that the analysis itself is rarely the bottleneck; getting large datasets, often disorganized and drawn from multiple sources, into a position to be analyzed is the challenging part.

Data scientists unabashedly describe their datasets as “messy.” (That’s really the technical term for it.) Imagine, for example, you tell a web-crawling algorithm to compile massive amounts of press releases, tweets, news reports, and government filings from different websites and in different formats. The results from the web-crawling algorithm are not going to consist of neat, well-organized rows in a spreadsheet or fields in a database.

This “unstructured” data will need to be “cleaned” or made uniform in a way that algorithms can analyze. That’s why “data preparation” often takes so much time.
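
To make “cleaning” a little more concrete, here is a small Python sketch that makes records from different sources uniform. The sample records, the field names, and the list of accepted date formats are hypothetical assumptions; in particular, the sketch assumes that slash-separated dates are written month first.

```python
import re
from datetime import datetime

# Hypothetical records scraped from different sources in different formats.
raw_records = [
    {"source": "press release", "date": "2023-03-01", "company": "  Acme Corp. "},
    {"source": "tweet", "date": "03/02/2023", "company": "ACME corp"},
    {"source": "filing", "date": "March 3, 2023", "company": "Acme Corp"},
]

# Assumed date formats; unknown formats are flagged rather than guessed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")

def parse_date(text):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_company(name):
    """Drop punctuation, collapse whitespace, and lowercase the name."""
    without_punctuation = re.sub(r"[^\w\s]", "", name)
    return " ".join(without_punctuation.split()).lower()

cleaned = [
    {
        "source": record["source"],
        "date": parse_date(record["date"]),
        "company": normalize_company(record["company"]),
    }
    for record in raw_records
]

for record in cleaned:
    print(record)
```

After this step, all three records share one date format and one spelling of the company name, so they can be grouped and counted together.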

2. You don’t need to sample

Unlike the analog days of statistics, when you might have given a survey to 1,100 people to stand in for your entire customer base, computing power today means you can look at all the data. And using all the data instead of a sample can make an enormous difference.
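
For context on why a survey of roughly 1,100 people was a common stand-in, the sketch below computes the standard margin of error for a sample proportion. It assumes a simple random sample from a large population, a 95% confidence level (z = 1.96), and the worst-case proportion of 0.5; the function name is just an illustrative choice.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error for a sample proportion under simple random sampling."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(1100)
print(f"Margin of error for n=1100: {moe:.1%}")  # prints 3.0%
```

Under those assumptions, a 1,100-person sample pins down a percentage to within about three points either way. Analyzing every record removes that sampling error, although it does nothing about errors in how the data was collected.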

3. “Datafication”

Viktor Mayer-Schönberger and Kenneth Cukier coined the term “datafication,” meaning that inexpensive sensors, hardware, and data storage have made it possible to collect certain types of data that were impractical to track previously.

4. Data exhaust

Because storage and collection have gotten cheap, you can save the equivalent of data “junk” and perhaps find ways to use it. For example, Google receives a large number of search queries with typos or misspelled words each day. The company has used this “exhaust” from its lucrative search engine business not only to improve search (“Did you mean ornithologist?”) but also to build a powerful spell-checker.
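
The toy Python sketch below shows how a log of past queries could, in principle, drive a “Did you mean” suggestion. It is not a description of how Google’s system works: the query log, the similarity cutoff, and the popularity factor are all made-up assumptions, and the matching relies on the standard library’s `difflib.get_close_matches`.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical query log, including misspellings (the "exhaust").
query_log = (
    ["ornithologist"] * 50
    + ["ornitologist"] * 3
    + ["ornithologst"] * 2
    + ["weather"] * 40
)
counts = Counter(query_log)

def did_you_mean(query, counts, min_ratio=0.8, popularity_factor=5):
    """Suggest a similar query that is far more common, or return None."""
    candidates = get_close_matches(query, list(counts), n=5, cutoff=min_ratio)
    own_count = max(counts[query], 1)
    better = [
        c for c in candidates
        if c != query and counts[c] >= popularity_factor * own_count
    ]
    if not better:
        return None
    # Prefer the most frequently typed of the similar queries.
    return max(better, key=lambda c: counts[c])

print(did_you_mean("ornitologist", counts))
print(did_you_mean("weather", counts))
```

The idea is that a rare query which closely resembles a much more popular one is probably a typo, so the popular spelling is offered as the suggestion.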
