Big Data Developer job description template
An effective description can help you hire the best fit for your job. Check out our tips to provide details that skilled professionals are looking for.
Tips for Writing a Big Data Engineer Job Description
A big data engineer is a professional responsible for managing data sets that are too large for traditional database systems to handle. They design and implement data processing jobs that transform raw data into more usable formats, and they ensure the data is secure and complies with industry standards to protect the company’s information.
Below, we will cover a sample job description, exploring the daily responsibilities and necessary qualifications for a big data engineer.
The Job Overview
We are seeking a big data engineer to join our data analytics team. The successful candidate will oversee the creation and maintenance of our database infrastructure, including collecting and curating data, ensuring its integrity, and creating and training data models.
Responsibilities
Below are some of the responsibilities of a big data engineer:
- Design the architecture of our big data platform
- Perform and oversee tasks such as writing scripts, calling APIs, web scraping, and writing SQL queries (a brief example of this kind of work follows this list)
- Design and implement data stores that support the scalable processing and storage of our high-frequency data
- Maintain our data pipeline
- Customize and oversee integration tools, warehouses, databases, and analytical systems
- Configure and maintain the availability of data-access tools used by our data scientists
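To make these responsibilities concrete, here is a minimal sketch of the kind of data-processing job described above, written with PySpark. The bucket paths, schema, and column names are hypothetical and used only for illustration.

```python
# A minimal, hypothetical PySpark batch job: read raw JSON events,
# drop malformed rows, and write partitioned Parquet for downstream SQL
# and BI tools. Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean_events").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical source

cleaned = (
    raw.filter(F.col("user_id").isNotNull())              # remove malformed rows
       .withColumn("event_date", F.to_date("event_ts"))   # derive a partition key
       .dropDuplicates(["event_id"])                       # de-duplicate replayed events
)

# Write a columnar, partitioned copy that analysts can query efficiently.
(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-bucket/curated/events/"))
```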
Job Qualifications and Skill Sets
Below are the qualifications expected of a big data engineer:
- 3 to 5 years of relevant data engineering experience
- Bachelor’s degree or higher in computer science, data science, or a related field
- Hands-on experience with data cleaning, visualization, and reporting
- At least 2 years of relevant experience with real-time data stream platforms such as Kafka and Spark Streaming (see the sketch after this list)
- Experience working in an agile environment
- Familiarity with the Hadoop ecosystem
- Experience with platforms such as MapReduce, Apache Cassandra, Hive, Presto, and HBase
- Excellent analytical and problem-solving skills
- Excellent communication and interpersonal skills
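For context, here is a minimal sketch of the kind of streaming work the Kafka and Spark Streaming qualification refers to, assuming Spark Structured Streaming consuming a Kafka topic and writing micro-batches to Parquet. The broker address, topic name, and output paths are placeholders, not part of any specific stack.

```python
# Hypothetical sketch: Spark Structured Streaming reading a Kafka topic
# and persisting micro-batches to Parquet. Broker, topic, and paths are
# placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_stream_demo").getOrCreate()

events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
         .option("subscribe", "clickstream")                 # placeholder topic
         .load()
         .selectExpr("CAST(value AS STRING) AS json_payload", "timestamp")
)

# A checkpoint directory is required so the query can recover its Kafka
# offsets after a restart.
query = (
    events.writeStream
          .format("parquet")
          .option("path", "/tmp/clickstream/")
          .option("checkpointLocation", "/tmp/checkpoints/clickstream/")
          .trigger(processingTime="1 minute")
          .start()
)
```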
Big Data Developers you can meet on Upwork
Chunyi W.
Big Data Developer
Big Data
- SAS
- R
- Data Science
- Linear Regression
- Data Visualization
- Quantitative Analysis
- Statistics
- Analytics
- Logistic Regression
- Biostatistics
- Statistical Analysis
- Epidemiology
- Healthcare & Medical
- Public Health
I obtained my Ph.D. in Epidemiology at the University of Michigan and also hold the SAS Programmer certification. I am currently a Lead Data Analyst in a medical school. I have a strong background in biostatistics and epidemiology and 14 years of experience analyzing large epidemiological, clinical, genetic, and National Inpatient Sample data using various software packages (SAS, SPSS, R, and RStudio). I have extensive knowledge of statistical models and have developed analysis strategies for a wide range of studies and meta-analyses.

Statistical methods I have applied in research projects:
1. Multilevel logistic regression models and ordinal logistic/logistic regression models
2. Linear mixed models and linear regression models
3. Survival models (Cox proportional hazards model, accelerated failure time modeling, Kaplan-Meier plots)
4. Poisson regression models
5. GEE (generalized estimating equations)
6. Propensity score matching (PSM)
7. ROC curves, ANOVA, t-tests, nonparametric statistics (Kruskal-Wallis test and Wilcoxon signed-rank test), Cohen's alpha, Pearson's correlation coefficients, chi-squared test
8. CMS-HCC risk adjustment models (HCC, RxHCC, ESRD)
9. Analysis of weighted survey-sample data
10. Power analysis

In addition, I have performed statistical analysis using large longitudinal national data sets, including:
A. Health and Retirement Study
B. National Health and Nutrition Examination Survey (NHANES)
C. National Inpatient Sample (NIS) and Healthcare Cost and Utilization Project (HCUP)
D. CMS-HCC risk adjustment models (HCC, RxHCC, ESRD)
E. Meta-analysis on large databases (Genome-Wide Association Studies)

As a data scientist, I am passionate about data analysis and solving complex, interesting tasks. Once you hire me as a freelancer, results will be delivered within 1-10 days (including weekends):
- Small project (1-4 hours): results delivered within 1-2 days.
- Medium project (4-10 hours): results delivered within 2-4 days.
- Large project (10-20 hours): results delivered within 4-6 days.
- Projects over 20 hours: results delivered within 5-15 days.
Please feel free to contact me; I will respond to your message within 24 hours. Thank you.
Amar K.
Big Data Developer
Big Data
- DevOps
- Amazon Web Services
- Google Cloud Platform
- AWS Lambda
- PySpark
- MongoDB
- Content Writing
- Apache Kafka
- SQL
- Apache Airflow
- Data Engineering
- Docker
- Python
Top Rated | #1 Freelancer in India for Big Data, Python, GCP, AWS, etc. I have 8+ years of professional data engineering and programming experience with Python and Java on GCP and AWS Cloud. I have been fortunate to work with Fortune 500 and top investment banking companies in the past. I also possess solid DevOps experience with strong hands-on work in cloud infrastructure. Currently, I am an Upwork Top-Rated freelancer focused on providing premium service to my clients and delivering quality projects on time. Previously, I worked full-time with top-notch product companies including Cerner by Oracle, KPMG, Goldman Sachs, and Morgan Stanley.

Skills:
- Cloud: GCP (Google Cloud Platform), AWS (Amazon Web Services)
- Programming languages: Java, Scala, Python, Ruby, Groovy
- Data engineering: Spark, Kafka, Crunch, MapReduce, Hive, HBase
- DevOps: GitHub, GitLab, BitBucket, Chef, Jenkins, Bamboo, Nexus, JFrog, etc.
- API: Spring Boot, Jersey, Flask
Muhammad Jarir K.
Big Data Developer
Big Data
- Python
- Machine Learning
- Apache Spark
- Data Engineering
- Database Design
- R
- Data Science
- Marketing Data Analytics
- Data Visualization
I'm a data science and analytics professional with a physics degree to boot. I'm currently pursuing a master's degree in computer science at Georgia Tech, focusing on machine learning and AI. If you have a data problem, I'm your guy. Data engineering, visualization, and analysis: I can do it all, using either Python or R. Want to build a machine learning model for your business but lack the technical expertise? I will build it for you! I can also design a database to help you manage all that precious data, or use big data tools like Spark and MapReduce to help improve your data pipeline.

Some of my skills include:
✔️ Data Science
✔️ Machine Learning
✔️ Data Engineering
✔️ Big Data Systems (Hadoop, MapReduce, Spark)
✔️ High Dimensional Data Analysis
✔️ Database Design (MySQL, Postgres, BigQuery, EER models, DB normalisation, etc.)
✔️ Python, R, SQL
✔️ Flask
✔️ Python libraries: NumPy, pandas, SciPy, sklearn, matplotlib, seaborn, networkx, Tensorly, OpenCV, PyCaret, CatBoost, TensorFlow, etc.
✔️ R libraries: tidyverse, ggplot, caret, kernlab, etc.
Join the world’s work marketplace

Post a job to interview and hire great talent.