Apache Spark Engineer Job Description Template
An effective description can help you hire the best fit for your job. Check out our tips to provide details that skilled professionals are looking for.
Example of Apache Spark Engineer job description
Apache Spark is an open-source framework that supports both data streaming and batch processing. The engine provides APIs in several programming languages, including Scala, Java, Python, and R, and its users can harness it for data engineering, data science, and machine learning. With over 1,000 contributing software engineers and developers from hundreds of organizations, the project has been adopted by businesses across industries to process data at any scale.
Spark is one of the most widely used engines for large-scale data processing, and even engineers with just a few years of experience can learn it, build on it, and put it to work. Experienced engineers bring deeper knowledge to support various functions, from improving processing speed to rebuilding and monitoring data pipelines. The value they add can benefit many facets of an organization.
The job overview
We're looking to hire an Apache Spark engineer who can help us develop and evolve a large real-time data processing system. As an expert software engineer, you'll apply your problem-solving and scripting skills to turn business requirements into solutions that support our data scientists. You'll work closely with our system designers and software developers on interface development and data pipelines.
Responsibilities of an Apache Spark Engineer
Below are the responsibilities of an Apache Spark team member:
- Design and implement Spark jobs to define, schedule, monitor, and control processes
- Develop and test algorithms for large-scale machine learning
- Optimize Spark jobs to maximize speed and scalability while complying with data-use policies
- Manage data pipelines and acquisition processes
- Perform data processing and analysis
- Build machine learning models using Spark or MapReduce, then visualize and present the results
- Work with other Spark developers and back-end data engineers to design interactive Spark pipelines
- Develop REST APIs for Spark jobs
Job qualifications for an Apache Spark Engineer
Below are the qualifications for an Apache Spark engineer:
- Expertise building data and processing pipelines
- Familiarity with Spark's modules, including Spark SQL
- Familiarity with APIs including RDD, DataFrame, Dataset, and PySpark
- Fluency in programming languages including Python, Java, and Scala
- Knowledge of Spark internals and streaming technology (Kafka, KSQL, etc.)
- Expertise in SQL and big data processing (Hadoop ecosystems, Hive, Impala, Druid, etc.)
- Familiarity with machine learning algorithms and frameworks such as PyTorch
- Experience with an ETL tool and expertise in managing data after it has been loaded
- Expertise in one or more distributed storage systems, such as HDFS, S3, or Ceph
- Familiarity with visualization tools
- Familiarity with Amazon Web Services (AWS) for building Apache Spark clusters
A bachelor's degree in data science, software development, or computer science isn't required for Apache Spark jobs, but a professional certification (for example, a Cloudera, MapR, or Hadoop certification) is highly encouraged.
Apache Spark Engineers you can meet on Upwork
- $45/hr
Moises R.
- 4.9
- (10 jobs)
Barcelona, CT
Apache Spark, RESTful API, Microsoft Azure, Databricks Platform, Amazon Web Services, NoSQL Database, Apache Kafka, Docker, ETL, Python
Data Engineer with a demonstrated history of working in the consulting industry. Skilled in the development of ETL processes and the development of APIs. Proficient in Python, PostgreSQL, R, and Azure with knowledge also in AWS, Spark, and Scala. Analytical, team-oriented, and resilient. Technologies and skills: AWS, Azure, Databricks, Data Architect, Data Lake, Docker, Hadoop, Lakehouse, Microsoft Fabric, MongoDB, Python, Spark
- $40/hr
Hassan U.
- 5.0
- (13 jobs)
Karachi, SD
Apache Spark, Microsoft Excel, Amazon RDS, Apache Airflow, Amazon S3, Amazon Redshift, dbt, Python, SQL, Data Engineering
7+ Years Building Modern Data Stacks & AI Solutions | Data Engineer & Analytics Specialist
I’m a Data Engineer and Analytics Specialist delivering production-ready data pipelines, scalable architectures, and cloud-based platforms that hold up under real-world usage. Currently pursuing a Master’s in Data Science, I seamlessly bridge the gap between heavy-duty data engineering and advanced AI/machine learning implementations. I work with founders, startups, and enterprise product teams to design, build, and optimize data systems. Whether you need to migrate legacy workflows, build an AI-powered forecasting tool on Databricks, or establish a single source of truth for your business, I build data infrastructure that performs reliably under heavy data loads. Over the years, I have successfully supported high-growth organizations across SaaS, Retail, Finance, Telecom, IoT, and Pharmaceuticals.
How I can help you:
✔️ Data Architecture & Warehousing: Planning, structuring, and implementing end-to-end cloud data warehouses.
✔️ Scalable ETL/ELT Pipelines: Designing, building, and optimizing robust ingestion and automation workflows.
✔️ Databricks & AI Implementation: Developing AI-enabled solutions, advanced analytics, and intelligent reporting features on Databricks.
✔️ Performance Optimization: Troubleshooting complex data pipeline bottlenecks, slow queries, and performance issues.
✔️ Workflow Automation: Turning manual data processes (like legacy Excel tracking) into automated, clean, and well-modeled data systems.
✔️ Data Infrastructure & Security: Implementing database replication, secure backups, and reliable recovery solutions.
Tech stack (production-focused):
✔️ Data Engineering & Orchestration: Apache Airflow, Airbyte, dbt, PySpark, SparkSQL, Hadoop (Impala), Batch & Distributed Processing
✔️ Cloud & Infrastructure: Azure Databricks, Azure Data Factory, AWS (Redshift, S3, EC2, RDS, Athena, EMR), Docker, CI/CD (Jenkins)
✔️ Databases & Warehouses: SQL (PostgreSQL, MySQL, MariaDB), NoSQL (MongoDB - Aggregation Pipelines, Replication), ClickHouse
✔️ Programming & Analytics: Python, SQL, Pandas, NumPy, PyMongo, BeautifulSoup, Requests, Plotly
✔️ AI & Data Science: Databricks AI Solutions, Machine Learning Foundations, Predictive Reporting & Models
Availability:
✔️ Open to part-time, full-time, and long-term roles
✔️ Available for a free consultation call (discounts applied for long-term projects)
Let’s connect! If you’re looking for a senior data partner to reduce manual work, eliminate data engineering overhead, and unlock AI-driven insights for your platform, feel free to send me a message.
- $150/hr
Dan S.
- 5.0
- (17 jobs)
Corvallis, OR
Apache Spark, API, Data Analysis, Database, Amazon Web Services, Business Analysis, Snowflake, Databricks Platform, ETL Pipeline, Python, Apache Airflow, Dashboard, Tableau, SQL
As a Data and Business Intelligence Engineer, I strive to deliver consulting and freelance data engineering services, with a focus on overseeing and executing projects in alignment with customer needs. With services encompassing the full data journey, I create and implement robust data foundations that streamline process development and enable leaders to execute rapid business decisions. Three categories of service include:
• Consulting: Data Strategy Development, Data & Reporting Solution Architecture, and Process Development.
• Data Products/Engineering: Dashboard Development & Reporting, Data Pipelines (ETL), Process Automation, and Data Collection.
• Analytics: Key Performance Indicators (KPIs), Metrics, Data Analysis, and Business Process Analysis.
Leveraging over eight years of experience in business intelligence, data visualization, business analysis, and requirements analysis, I build data pipelines and translate data into actionable insights that provide a competitive edge. Tools of choice include Amazon Web Services (AWS), Databricks, Snowflake, Kafka, Snowpipe Streams, Airflow, Tableau/PowerBI, SQL, NoSQL, APIs, Python, and Spark/PySpark. Let me know what I can do for YOU!
- $60/hr
Mohamed A.
- 5.0
- (18 jobs)
Giza, AL OMRANEYAH
Apache Spark, ETL Pipeline, Data Warehousing, SQL Programming, Elasticsearch, MongoDB, Database Architecture, Scala, Flask, Neo4j, Database Design, Apache Kafka, Apache Hadoop, Apache Hive, Python
Senior Data Architect & AWS Data Engineering Manager | 12+ Years Enterprise Experience
I help enterprises design, build, and govern scalable data platforms that drive real business decisions, not just store data. Currently leading enterprise-wide data transformation at Publicis Sapient as AWS Data Architect for Omantel's Customer Personalization Platform: architecting real-time analytics pipelines, Data Governance frameworks using AWS DataZone, and Agentic AI solutions for advanced customer intelligence. Previously at DELL Technologies, I led migration of DELL's global data lake from Azure to on-prem Greenplum, built data pipeline strategies, and supported MLOps teams across multiple business units.
What I bring to your project:
• End-to-end Cloud Data Architecture (AWS-certified: Data Analytics Specialty + Cloud Practitioner)
• Big Data Platform design & migration (Hadoop, Spark, Kafka, Hive, Airflow)
• Data Governance & Cataloging (AWS DataZone, Alation, Collibra)
• ETL/ELT pipeline engineering (Airflow, NiFi, Talend, SSIS, ODI)
• Database expertise: Greenplum, PostgreSQL, Oracle RAC, MSSQL Server
• Infrastructure as Code: Terraform (HashiCorp Certified)
I've delivered data solutions across Telecom, Fintech, Banking, and Government sectors in Egypt, the Gulf, and globally. Whether you need a full architecture review, a migration plan, or hands-on engineering, I deliver.
- $60/hr
Azamat A.
- 5.0
- (3 jobs)
Kenosha, WI
Apache Spark, Jakarta EE, Android SDK, Android App Development, Data Lake, Data Modeling, Amazon Web Services, Microsoft Azure, AWS Lambda, AWS Glue, PySpark, ETL, Data Engineering, Machine Learning, Databricks Platform, SQL, Java, Python
ABOUT ME: I am a Lead Data Engineer with a strong software development background. I have over 10 years of professional experience in IT, 7 of them in Data Engineering. I have an MS in Software Engineering from DePaul University (Chicago, IL, USA).
WHAT I CAN DO FOR YOU: Having worked as a Lead Data Engineer in Fortune 500 enterprises, I can help startups with:
• developing comprehensive data governance and security strategies
• designing and implementing cloud data platforms (Azure, AWS, Databricks)
• data warehouse modelling
• data lake/data lakehouse modelling
• cost optimization of data and ML pipelines
• performance optimization of data and ML pipelines
TECHNICAL SKILLS: Python | Java | Scala | PySpark | Apache Spark | Apache Airflow | Databricks | AWS | Azure | AWS EMR | AWS Glue | Azure Data Factory | Azure Synapse
- $90/hr
Mihail K.
- 5.0
- (31 jobs)
Shtip, ŠTIP
Apache Spark, Data Mining, GitLab, Docker, Google Cloud Platform, PostgreSQL, BigQuery, Terraform, Big Data, Amazon Web Services, Apache Airflow, Data Scraping, Python, ETL Pipeline, SQL
I have 6 years of experience in ETL, Big Data processing, streaming, web scraping, and infrastructure as code using Terraform.
Technologies I work with:
• ETL: Python, Scala, PySpark, Airflow
• Storage: BigQuery, Cloud Storage, S3
• Streaming: Apache Beam, GCP DataFlow
• Web Scraping: Python
• Databases: PostgreSQL, MongoDB, InfluxDB
• Infrastructure as Code: Terraform
• Visualization: Grafana
• Vendors: GCP, AWS
My specialization lies in providing end-to-end solutions, ensuring seamless processes in ETL, robust handling of Big Data, efficient streaming capabilities, effective web scraping, and proficient database management. I am well-versed in utilizing Python, PySpark, and Airflow for smooth data extraction, transformation, and loading. Leveraging Apache Beam and GCP DataFlow, I enable real-time data processing. I also utilize PostgreSQL and MongoDB to efficiently store and organize data. With my expertise in Terraform, I bring automation to infrastructure management, making provisioning and maintenance hassle-free. Let's collaborate and transform your data into valuable insights. Reach out to me now, and together, we can leverage my expertise in ETL, Big Data processing, streaming, web scraping, and database management, backed by the convenience of infrastructure as code with Terraform!
- $35/hr
Aleksandr B.
- 5.0
- (2 jobs)
Alvsjo, AB
Apache Spark, Data Quality Assessment, Big Data, Software Testing, React, TypeScript, Python, Robot Framework, Selenium WebDriver, Automated Testing, Functional Testing
• Lead QA Automation professional with 8+ years of experience in test process optimization in GCP and AWS environments.
• Enhanced CI pipeline performance, advocated for TestCases-as-a-code, and achieved high automation coverage.
• Proficient in TypeScript, Python, Groovy, Java, Scala, and tools like WebdriverIO, Mocha, Allure, and Robot Framework.
• Specialized in Big Data and Machine Learning with a focus on Data Quality, using AWS, Deequ, Great Expectations, Hadoop, Spark, Airflow, and Kubernetes.
- $60/hr
Joaquim V.
- 5.0
- (4 jobs)
Amora, SETÚBAL
Apache Spark, ETL Pipeline, Amazon Web Services, Web Scraping, API, Kubernetes, Terraform, PySpark, AWS Lambda, Apache Hadoop, Python, pandas, Apache Hive
Over the past years I have been gathering knowledge of all things data. Throughout my career I have successfully merged the concerns of data processing with those of software development, delivering datasets and tools with immense added value for my employers. As of late I have increasingly adopted the philosophy of DevOps, not only managing data transformation pipelines, but also their life-cycle and that of their supporting infrastructure, most notably by the use of Terraform in combination with AWS. I am hoping to capitalize on my accumulated expertise in a way that would not be possible on a long term job, delivering great value to individuals and companies that are willing to invest in order to reap excellent results. I am looking for projects that require a wide range of expertise and a capacity to think outside the box, projects with hard and challenging projects. I am also a fan of automation so projects that aim at a software solution for repetitive tasks (either for increased efficiency or scale) are also welcome. I hope my profile fits your requirements and I'm looking forward to hearing from interesting clients.
- $40/hr
Muhammad Umar A.
- 5.0
- (8 jobs)
Dubai, DU
Apache Spark, MLOps, Solution Architecture, Deep Neural Network, Model Tuning, Large Language Model, Microsoft Azure, Data Engineering, Python, Data Science, TensorFlow, Natural Language Processing, Deep Learning, Machine Learning, Artificial Intelligence
A seasoned Data & AI Solution Architect with over 6 years of experience delivering cutting-edge solutions in GenAI, Machine Learning, and Advanced Analytics across diverse industries, including telecom, retail, automotive, finance, and energy. I specialize in designing and implementing end-to-end data-driven solutions leveraging platforms like Databricks, AWS, Azure, and GCP, ensuring scalability, efficiency, and business impact.
Key Highlights
• AI Expertise: Proven success in developing AI-powered solutions, including fine-tuning LLM models (e.g., Llama-3-8B), automating workflows, and creating recommendation engines that increase customer engagement and revenue.
• Generative AI: Skilled in designing and implementing Agentic AI solutions using LangGraph and Model Context Protocol (MCP), enabling AI assistants to interact with enterprise systems, APIs, databases, cloud resources, and business applications through standardized tool interfaces. Built intelligent multi-agent workflows capable of orchestrating business processes, automating decision-making, and integrating seamlessly with enterprise ecosystems.
• Agentic AI & MCP: Skilled in designing and implementing Agentic AI solutions using LangGraph and Model Context Protocol (MCP), enabling AI assistants to interact with enterprise systems, APIs, databases, cloud resources, and business applications through standardized tool interfaces.
• Data Engineering Excellence: Proficient in building optimized data pipelines, transforming raw data into actionable insights, and implementing Delta Lakehouse architectures to reduce costs and improve operational efficiency.
• Cloud Mastery: Extensive hands-on experience in cloud environments (AWS, Azure, GCP) for deploying scalable infrastructure and integrating cloud-native AI/ML solutions.
• Databricks Expertise: A Databricks-certified professional with deep expertise in Unified Analytics, Delta Live Tables, and enabling AI-driven efficiencies for large-scale enterprises.
• Business Impact: Delivered measurable results, such as reducing incident handling time by 90%, increasing app engagement by 150%, and optimizing production assembly lines across 40+ plants.
Certifications & Recognition
• Databricks Certified (Data Engineer Professional, Machine Learning Associate, Spark Developer).
• AWS Community Builder for the past 2 years, showcasing expertise and active contributions to the AI and cloud community.
• 25+ certifications from Coursera in Data Science, AI, and Cloud Computing.
What I Bring to the Table
• A client-centric approach with a knack for understanding business challenges and aligning technical solutions to meet organizational goals.
• A proven track record of leadership, having led teams of data scientists and engineers to deliver impactful projects across geographies.
• Expertise in designing AI-driven systems for personalization, predictive analytics, and automation, enhancing customer experiences and driving growth.
- $350/hr
Michael M.
- 5.0
- (35 jobs)
Brigham City, UT
Apache Spark, Large Language Model, Visual Basic for Applications, Modeling, Forecasting, ChatGPT, Natural Language Processing, Machine Learning, Python Scikit-Learn, Microsoft Excel, SQL, TensorFlow, Python
"Michael is just FANTASTIC. He is by far the best freelancer I have worked with over the past four years. He makes the process so seamless."
Ranked in the top 1% of freelancers, member of the Upwork vetted expert program, and over 12 years experience. Please reach out to me for any of your AI/ML & Data Science Needs. Please see modelforge.ai for more information.
- $40/hr
Muhammad U.
- 4.8
- (16 jobs)
Lahore, PB
Apache Spark, AWS Cloud9, React Native, Mobile App, Flutter, Spring Framework, React, TypeScript, Angular, Spring Boot, Node.js
I help companies modernize legacy systems and accelerate SaaS development. With 9+ years of experience in Angular, Spring Boot and AWS, I’ve led projects ranging from low-code platforms to multi-tenant enterprise applications serving thousands of users.
What I can do for you
• Custom Web Applications – From MVPs to full-scale enterprise platforms.
• SaaS Development – Multi-tenant architectures, subscription systems, and complex integrations.
• Front-End Excellence – Clean, responsive interfaces built with Angular or React.
• Back-End APIs – Secure and efficient Node.js or Spring Boot services with REST or GraphQL.
• Performance & Scalability – Optimized solutions that evolve with your business.
Why clients choose me
• 9+ years of proven full-stack experience.
• Strong grasp of both technical and business perspectives.
• Clear communication, reliable delivery and long-term partnership mindset.
If you’re looking for a full-stack architect who can take ownership from concept to deployment, let’s discuss how I can help bring your vision to life.
- $35/hr
Rakesh D.
- 5.0
- (13 jobs)
Pune, MAHARASHTRA
Apache Spark, C++, Java, Scala, Apache Hadoop, Python, Apache Cassandra, Oracle PLSQL, Apache Hive, Cloudera, Google Cloud Platform
✨ Seasoned software professional with 20+ years of experience in end-to-end software development, including 8+ years specializing in Big Data technologies and cloud-based solutions. Proven expertise in building scalable, high-performance data platforms using Apache Spark, Hadoop, Hive, Cassandra, and programming in Scala, Python, Java and C++.
✨ I focus on designing robust, enterprise-grade Big Data and Data Engineering architectures on GCP, AWS, and Azure, both in on-prem and cloud environments. My role involves solution architecture, technical leadership, and hands-on development of critical components.
✨ I am passionate about leveraging my experience to build cutting-edge data and AI solutions. Open to senior technical roles, consulting opportunities, and innovative startup environments.
🔹 Keen eye on scalability, sustainability of the solution
🔹 Can come up with maintainable & good object-oriented designs quickly
🔹 Highly experienced in seamlessly working with remote teams effectively
🔹 Aptitude for recognizing business requirements and solving the root cause of the problem
🔹 Can quickly learn new technologies
🔹 Transparency, Dedication, Quality and Satisfaction Guaranteed
Sound experience in following technology stacks:
✨ Big Data: Apache Spark, Spark Streaming, HDFS, Hadoop MR, Hive, Apache Kafka, Cassandra, Google Cloud Platform (Dataproc, Cloud storage, Cloud Function, Datastore/Firestore, Pub/Sub), Cloudera Hadoop 5.x
✨ Languages: Scala, Python, Java, C++, C, Scala with Akka and Play frameworks
✨ Build Tools: Sbt, Maven
✨ Databases: Postgres, Oracle, MongoDB/CosmosDB
✨ GCP Services: GCS, DataProc, Cloud functions, Pub/Sub, Data-store, BigQuery
✨ AWS Services: S3, VM, VM Auto-scaling Group, EMR, S3 Java APIs, Redshift, MongoDB
✨ Azure Services: Blob, VM, VM scale-set, Blob Java APIs, Synapse, CosmosDB
✨ Other Tools/Technologies: Kubernetes, Dockerization, Terraform
Worked with different types of Input & Storage formats: CSV, XML, JSON file, Mongodb, Parquet, ORC
- $40/hr
Tahir A.
- 5.0
- (7 jobs)
Islamabad, IS
Apache Spark, Supabase, CRM Development, Automation, Airtable, AWS Lambda, Data Engineering, Artificial Intelligence, ETL, Microsoft Power BI, Data Analytics, Machine Learning, Data Science, Python, SQL
Results-driven Cloud Solution Architect with deep expertise in Data Engineering, DevOps, and AI Engineering, specializing in LLM (Large Language Models), LangChain, and RAG (Retrieval-Augmented Generation). Adept at designing scalable cloud-native solutions, optimizing data pipelines, and implementing cutting-edge AI integrations to drive business innovation.
Core Skills & Expertise:
✔ Cloud Architecture & DevOps – AWS, Azure, GCP | Kubernetes, Docker, Terraform, CI/CD
✔ Data Engineering – Big Data (Spark, Hadoop), ETL/ELT, Data Lakes/Warehouses (Delta Lake, Snowflake)
✔ AI/ML Engineering – LLM (GPT, Llama 2), LangChain, RAG, Vector Databases (Pinecone, FAISS)
✔ Generative AI & NLP – Fine-tuning, Prompt Engineering, AI Agent Development
✔ Integration & Automation – API-first Architectures, Event-Driven Systems (Kafka), MLOps
✔ Optimization & Scalability – High-Performance AI/Data Systems, Cost-Efficient Cloud Deployments
Key Contributions:
• Designed AI-powered cloud solutions leveraging LLMs, LangChain, and RAG for enterprise applications.
• Built scalable data pipelines for real-time analytics and AI model training.
• Implemented MLOps & DevOps best practices to streamline AI/ML deployments.
• Developed custom AI agents for automation, knowledge retrieval, and intelligent decision-making.
Passionate about bridging the gap between cloud infrastructure, data engineering, and AI innovation to deliver transformative business solutions.
- $35/hr
Gideon A.
- 4.9
- (5 jobs)
Ile-Ife, OS
Apache Spark, Selenium, Amazon Web Services, Data Analysis, BigQuery, Data Extraction, AWS Glue, Web Crawling, Data Engineering, ETL Pipeline, Scrapy, Microsoft Power BI, SQL, Data Science, Python
Your Go-To Data & Analytics Engineer for Scalable, Cloud-Native Solutions
Need someone who can clean up chaotic data, design pipelines that don't break, and turn raw numbers into real decisions? That's where I come in. I engineer end-to-end data solutions using:
• GCP & AWS for cloud-native deployments
• Airflow, dbt, PySpark, BigQuery, Snowflake for seamless data orchestration and warehousing
• Kafka for real-time streaming pipelines
• PostgreSQL, MongoDB for robust data storage
• Great Expectations for ensuring data quality and trust
From building batch/streaming pipelines and handling SCD Type 1 & 2, to modeling clean, analytics-ready layers (Star Schema, 3NF, or Data Vault), I bring structure, clarity, and business focus to every project. Clients appreciate my no-fluff approach: clear communication, fast turnarounds, and data systems that just work. Let's build a data foundation that scales with your business.
- $125/hr
Chisom E.
- 4.8
- (14 jobs)
Dallas, TX
Apache Spark, Java, Apache Hadoop, Amazon Web Services, Snowflake, Microsoft Azure, Google Cloud Platform, Database Management, Linux, ETL, API Integration, Scala, SQL, Python
🏆 Achieved Top-Rated Freelancer status (Top 10%) with a proven track record of success. Past experience: Twitter, Spotify, & PwC. I am a certified data engineer & software developer with 5+ years of experience. I am familiar with almost all major tech stacks on data science/engineering and app development. If you require support in your projects, please do get in touch.
Programming Languages: Python | Java | Scala | C++ | Rust | SQL | Bash
Big Data: Airflow | Hadoop | MapReduce | Hive | Spark | Iceberg | Presto | Trino | Scio | Databricks
Cloud: GCP | AWS | Azure | Cloudera
Backend: Spring Boot | FastAPI | Flask
AI/ML: Pytorch | ChatGPT | Kubeflow | Onnx | Spacy | Vertex AI
Streaming: Apache Beam | Apache Flink | Apache Kafka | Spark Streaming
SQL Databases: MSSQL | Postgres | MySql | BigQuery | Snowflake | Redshift | Teradata
NoSQL Databases: Bigtable | Cassandra | HBase | MongoDB | Elasticsearch
Devops: Terraform | Docker | Git | Kubernetes | Linux | Github Actions | Jenkins | Gitlab
- $35/hr
Vignesh I.
- 5.0
- (32 jobs)
Chennai, TAMIL NADU
Apache Spark, SQL, AWS Glue, PySpark, Apache Cassandra, ETL Pipeline, Apache Hive, Apache NiFi, Apache Kafka, Big Data, Apache Hadoop, Scala
Seasoned data engineer with over 11 years of experience in building sophisticated and reliable ETL applications using Big Data and cloud stacks (Azure and AWS). TOP RATED PLUS. Collaborated with over 20 clients, accumulating more than 2000 hours on Upwork.
🏆 Expert in creating robust, scalable and cost-effective solutions using Big Data technologies for past 9 years.
🏆 The main areas of expertise are:
📍 Big data - Apache Spark, Spark Streaming, Hadoop, Kafka, Kafka Streams, Trino, HDFS, Hive, Solr, Airflow, Sqoop, NiFi, Flink
📍 AWS Cloud Services - AWS S3, AWS EC2, AWS Glue, AWS RedShift, AWS SQS, AWS RDS, AWS EMR
📍 Azure Cloud Services - Azure Data Factory, Azure Databricks, Azure HDInsights, Azure SQL
📍 Google Cloud Services - GCP DataProc
📍 Search Engine - Apache Solr
📍 NoSQL - HBase, Cassandra, MongoDB
📍 Platform - Data Warehousing, Data lake
📍 Visualization - Power BI
📍 Distributions - Cloudera
📍 DevOps - Jenkins
📍 Accelerators - Data Quality, Data Curation, Data Catalog
- $50/hr
Junaid A.
- 5.0
- (16 jobs)
Islamabad, IS
Apache Spark, Databricks Platform, Claude, ChatGPT, Performance Optimization, Chatbot Training, Data Engineering, Generative AI, ETL Pipeline, PySpark, Data Science, Machine Learning, Python, PyTorch, Large Language Model
✅ Top 1% AI Freelancer ✅ Top Rated Plus ✅ 100% Job Success Score ✅ $80K+ Earnings ✅ 90%+ Model Accuracies for Clients
Over the years I have built production grade AI systems for SMBs combining LLM training, fine-tuning and deployment at scale. Here's what I deliver:
✔️ ML Model Training: High performance pipelines that deliver accuracy using scikit-learn stack for e-commerce, healthcare, fintech.
✔️ LLM Fine-Tuning: Domain specific LLM fine-tuning using PyTorch, HuggingFace transformers that solve benchmarks and allow you to ship intelligence to users.
✔️ Inference Optimization: Deploy the LLMs on NVIDIA clusters (H100, H200, B200) such that they sustain the production load of long LLM chats using SGLang, vLLM, TensorRT and NVIDIA Dynamo
✔️ RAG Solutions: Help reduce the model costs and improve the model efficiency using Milvus, Qdrant, Embedding models, LangChain and LangGraph.
✔️ Data Engineering: High-throughput pipelines for warehouses using Kafka, BigQuery, Prometheus, SQL and Python.
✔️ Crypto AI: Bots that continuously read the market signals and make trading decisions based on the LLMs and in-house ML models.
✔️ Fintech: AI solutions powered by Claude, Qwen, DeepSeek that solve taxation problems for personal taxes.
⭐ Recent Client Feedback
"He cleaned up training artifacts (models, calibrators, label maps, vendor vocab), helped us get an acceptance bar we can trust. He handled data engineering + deployment details without drama. We shipped on his work. Strongly recommend and would hire again."
Let's have a FREE consultation call to understand your requirements and you get a production level roadmap for your problem.
- $45/hr
Eniko V.
- 5.0
- (7 jobs)
London, ENGLAND
Apache Spark, AWS Lambda, Terraform, Snowflake, Data Ingestion, Grafana, SQL, AWS Glue, Amazon ECS, Python, dbt, CI/CD, Data Modeling, Apache Hadoop
I’m a Senior Data Engineer and freelance consultant with 9+ years of experience designing, building, and optimizing cloud-based data platforms. I help startups and enterprises scale their data infrastructure, improve performance, and ensure reliability, while reducing costs and improving governance. I specialize in ETL/ELT pipelines, cloud databases, serverless architectures, and infrastructure as code, working with tools like AWS, Terraform, Spark, dbt, Snowflake, Redshift, and PostgreSQL/MySQL databases. I’ve partnered with clients across HealthTech, PropTech, Telecoms, Banking, Retail, Marketing, and Cybersecurity, delivering high-impact, production-ready solutions.
What I Do Best
✅ Design and implement scalable ETL/ELT pipelines using Python, PySpark, and AWS
✅ Architect and manage cloud-based databases (RDBMS and cloud warehouses) for performance, security, and scalability
✅ Build serverless and event-driven architectures using AWS Lambda, SQS, SNS, Glue, Athena, EMR, and ECS
✅ Provision infrastructure reliably with Terraform and implement CI/CD pipelines (CircleCI, GitLab CI/CD, GitHub Actions)
✅ Implement data warehousing, modeling, and analytics solutions using Snowflake, BigQuery, dbt, and PostgreSQL/MySQL/Aurora
✅ Monitor and alert on job health with CloudWatch, Grafana, and custom dashboards
✅ Containerize applications with Docker and manage batch or service-based workloads on AWS
Tech Stack
✅ Languages & Tools: Python, PySpark, Pandas, SQL, dbt, Docker
✅ Cloud & Infrastructure: AWS (Lambda, Glue, EMR, ECS, Batch, EC2, S3, SNS/SQS, ELB, DMS, DynamoDB, RDS, MWAA, API Gateway), Terraform, Serverless Architecture
✅ Databases: Snowflake, BigQuery, RDS (PostgreSQL, MySQL, Aurora), Redshift
✅ Monitoring: CloudWatch, Grafana
✅ CI/CD & Version Control: CircleCI, GitLab CI/CD, GitHub Actions, GitHub, GitLab
- $56/hr
Abha K.
- 5.0
- (9 jobs)
Mumbai, MH
Apache Spark, Apache NiFi, PySpark, Databricks Platform, ETL Pipeline, Big Data, Grafana, Kibana, Apache Kafka, PostgreSQL, Microsoft Azure, MongoDB, Scala, Python, Elasticsearch, Google Cloud Platform, Amazon Web Services
🚀 Data Engineer & Solution Architect | Scaling Data Platforms 10× Without Breaking Them
I design data systems that don’t just run; they scale, perform, and stay reliable under real-world pressure. With 7+ years building enterprise-grade platforms, I’ve seen the same story repeat: A pipeline works at 10M records… then collapses at 100M. Costs spiral. Latency explodes. Nobody wants to touch the legacy system. That’s where I come in.
🧠 What I Actually Deliver
I architect cloud-native data platforms built for tomorrow, not quick fixes for today.
✔ Migrate fragile legacy systems to modern, resilient architectures
✔ Design scalable data lakes and lakehouses
✔ Optimize pipelines bleeding money and compute
✔ Build real-time analytics for mission-critical decisions
✔ Create foundations ready for AI/ML workloads
Result: Systems that grow with your business instead of holding it back.
⚙️ Deep Technical Expertise Across the Stack
☁️ Cloud Platforms
AWS: Glue, EMR, Redshift, Kinesis, S3, Lambda, Lake Formation, DMS, MSK, RDS
Azure: Data Factory, Synapse, Databricks, DevOps
GCP: Dataflow, Cloud Functions, Cloud Storage
🔥 Big Data & Streaming
Apache Spark (Scala & PySpark) • Kafka • Kinesis • NiFi • Hadoop Ecosystem • Airflow • Delta Lake
💻 Programming
Python • Scala • SQL • Shell • Java
🗄️ Databases & Storage
PostgreSQL • MySQL • Oracle • SQL Server • MongoDB • Cassandra • DynamoDB • Elasticsearch
🛠️ DevOps & Infrastructure
Docker • Kubernetes • OpenShift • Terraform • Jenkins • Ansible • Git
📊 Observability & Governance
CloudWatch • ELK • Grafana • Athena
IAM • Lake Formation • Encryption • Audit Logging • Okta • Cognito
🏢 Enterprise Experience That Matters
I’ve delivered production systems for Fortune 500 organizations across finance, energy, hospitality, and SaaS, handling hundreds of millions of records daily. From ingestion → transformation → real-time analytics → security → DevOps automation, I design the full lifecycle.
🏆 Proven Impact
✔ Re-architected legacy pipelines → 5× performance boost & 60% cost reduction
✔ Built event-driven systems processing 500M+ records/day
✔ Delivered secure data lakes with row-level governance
✔ Reduced MTTR by 70% with end-to-end observability
✔ Led zero-downtime cloud migrations
✔ Secured $2B+ transaction data with encryption platforms
🤝 Best Fit For Organizations That Need
🔹 Cloud migration with strong architectural guidance
🔹 Performance or scalability bottlenecks
🔹 Data platforms for AI/ML initiatives
🔹 Multi-cloud or hybrid strategies
🔹 Long-term reliability over quick hacks
⚠️ Not a Fit For
❌ One-off scripts or basic SQL tasks
❌ Temporary data cleanup work
❌ Short-term patch solutions
I focus where architecture decisions create lasting business value.
💬 What Clients Value Most
Clear thinking on complex problems
Communication executives understand
Engineering teams trust
Systems built to last
👉 If your data platform needs to scale, stabilize, or modernize, then let’s talk.
- $40/hr
Teoman Y.
- 5.0
- (18 jobs)
Ankara, ANKARA
Apache Spark, Ansible, Red Hat Administration, Apache NiFi, DevOps Engineering, Kubernetes, Docker, Scripting, Python, Bash
Hi! I'm Teoman. I currently work as a full-time DevOps Engineer for the Ministry of Interior. My main responsibilities include:
- Managing Kubernetes clusters ranging from development and staging to production. I also hold the CKA certificate. The applications running on the clusters include Java microservices, infrastructure-related applications such as internal packaging systems (plugins, image registries), and deployment-related applications such as ArgoCD and GitLab runners. I also manage Big Data Engineering Kubernetes clusters that host Spark applications, NiFi clusters, Trino backends, etc.
- Running GitLab CI/CD pipelines: as a DevOps team, we manage more than 50 projects, trace every pipeline 24/7, and create smooth deployments that are used by the whole country.
- Managing infrastructure as code, where I make calls affecting hundreds of Linux servers, including production servers, and track changes with Git.
- Monitoring the running infrastructure, where this many servers require immediate intervention in case of any failure. For this I rely on the Grafana, Prometheus, and Loki stack, which is deployed on both Kubernetes and bare metal, with many instances running on many networks collecting logs and metrics.
I've been a Linux user since 16 and a professional administrator for 3 years now. I would be glad to be of service. Thanks
- $60/hr
Fernando M.
- 5.0
- (8 jobs)
Bradenton, FL
Apache Spark, Business Intelligence, Big Data, SQL Programming, Data Modeling, SAS, Data Mining, Data Warehousing, Microsoft SQL Server, ETL, BigQuery, Snowflake, SQL, Data Engineering
I have successfully harnessed a wide range of data sources, skillfully extracting and transforming them into valuable assets by leveraging cost-effective open-source architectures. In the process, I have adeptly addressed architectural and modeling challenges for businesses. I am eager to contribute my expertise to projects, enhancing their effectiveness while cutting costs through the use of open-source solutions and my proven problem-solving abilities.
- $80/hr
Omer Emirhan T.
- 5.0
- (3 jobs)
Istanbul, ISTANBUL
Apache Spark, Web Scraping, Apache Airflow, Jupyter Notebook, Data Science, Data Engineering, PostgreSQL, pandas, Python
💡 About Me
I’m an AI Engineer & Data Scientist with hands-on experience in building, deploying, and optimizing end-to-end machine learning and AI systems. I specialize in LLMs, MLOps pipelines, and predictive modeling, turning complex data into actionable insights and intelligent automation.
🧠 AI & Machine Learning
I design, train, and deploy machine learning and deep learning models using Python, PyTorch, TensorFlow, and LightGBM. From LLM fine-tuning and prompt engineering to computer vision and time-series forecasting, I’ve worked across diverse domains: e-commerce, logistics, and manufacturing. I also build AI microservices and APIs using FastAPI and Docker, integrating models seamlessly into production systems.
📊 Data Science & Analytics
I have deep expertise in data analysis, feature engineering, and statistical modeling. Using tools like SQL, Pandas, and Scikit-learn, I build robust, explainable models that drive business impact. I’m experienced in A/B testing, experimentation, and visualization with Matplotlib, Plotly, and Power BI.
⚙️ Data & Cloud Engineering
I build scalable data pipelines using AWS (Redshift, S3, Athena, Lambda) and Kafka, ensuring reliability and performance. I design ETL processes and data warehouse architectures that enable real-time analytics and automated model retraining.
🌐 Web Scraping & Automation
I develop custom web scrapers and automation pipelines using BeautifulSoup, Selenium, and Playwright, handling dynamic pages, proxies, and CAPTCHAs. Data can be delivered in your preferred format: CSV, JSON, SQL, or via API.
- $70/hr
Matthew D.
- 4.7
- (12 jobs)
New York City, NY
Apache Spark, ggplot2, Data Visualization, PySpark, Microsoft Power BI, Apache Hive, R Shiny, Apache Hadoop, SQL, Tableau, Machine Learning, Python, Deep Learning, R
I’m Matt, a U.S.-based Data Scientist and AI Consultant with an M.S. in Data Science from Columbia University’s Fu Foundation School of Engineering and Applied Science. I help clients understand, visualize, and act on their data, translating advanced machine learning and AI concepts into clear business insights. With experience spanning finance, healthcare, and analytics consulting, I specialize in designing solutions that balance technical depth with practical clarity. Clients hire me to communicate complex models simply, advise on strategy, and deliver production-ready systems that executives can trust.
My core services include:
- AI & ML Consulting: Business problem scoping, model design
- Machine Learning Engineering: Predictive modeling, feature pipelines, optimization, and deployment
- Natural Language Processing: Text classification, sentiment analysis, topic modeling, summarization, and retrieval
- Data Visualization & Storytelling: Dashboards and reports for stakeholders (Plotly, Dash, Streamlit, Power BI, ggplot2)
- Client Communication: Presenting findings, running client meetings, and translating technical work for non-technical teams
My technical skills include:
- Languages: Python, R, SQL, NoSQL (MongoDB)
- Frameworks: scikit-learn, PyTorch, TensorFlow, spaCy, Hugging Face, BERTopic
- Visualization: Plotly, Dash, Streamlit, ggplot2, Power BI
- MLOps & Cloud: AWS (SageMaker, S3, Lambda), MLflow, Prefect, Docker, Git
- Databases: PostgreSQL, Hive, MS SQL, MongoDB
Selected Experience:
- Deutsche Bank – Anti-Financial Crime Modeling: Developed anomaly-detection models that improved fraud detection precision while maintaining interpretability.
- Epic Systems – Healthcare Analytics: Built readmission risk and quality-metric models using claims and registry data.
- Political Data Dashboards: Created interactive demographic and voter-trend dashboards used by advocacy and policy groups.
- Financial Forecasting: Modeled stock-market and economic indicator trends with advanced time-series and sentiment features.
- NLP Summarization: Deployed transformer-based summarizers for long-form financial reports and research analysis.
Communication & Delivery
Clients value my ability to bridge the technical and strategic. I routinely:
- Lead and participate in client meetings to align business goals with technical design
- Present data findings in clear, jargon-free language to executives and stakeholders
- Provide written reports, annotated notebooks, and reproducible deliverables
- Manage timelines, expectations, and transparency from start to finish
Approach
Every engagement starts with one question: “What decision needs to be made?” I design data workflows and AI systems that make those decisions faster, more accurate, and more explainable. Each project ends with clean, interpretable, and documented outputs, ready for production or presentation.
- $55/hr
Adnan A.
- 5.0
- (12 jobs)
Ely, ENGLAND
Apache Spark, Artificial Intelligence, Statistical Analysis, Microsoft Azure, Data Science Consultation, Python Scikit-Learn, Data Science, Python, Databricks Platform, Apache Spark MLlib, Azure Machine Learning, Machine Learning, Deep Learning
- Rich Academic Pedigree: PhD in Data Science and Machine Learning from the University of Surrey, UK, complemented by postdoctoral research in Artificial Intelligence at King's College London.
- Decade-Long Experience in AI: Over 10 years of hands-on experience, particularly in machine learning, statistical data analysis, and AI applications spanning intelligent transportation systems, smart energy management, online learning analytics, and public healthcare.
- Expert in MLOps and Generative AI: Demonstrated excellence in deploying machine learning models with MLOps principles and in leveraging generative AI techniques, notably with GPT-4, for AI-driven tools and conversational solutions.
- Strategic Leadership Role: Currently serving as Head of Data Science & Innovation at The Open University, driving innovation and setting benchmarks in AI-driven strategies and execution.
- Renowned Scholar: Proven track record in research with significant publications in high-impact journals; recognized for extracting valuable insights from big data in both academic and commercial settings.
- Collaborative Spirit: A history of thriving in interdisciplinary teams on various research and commercial projects, ensuring optimal outcomes and impactful innovations.
- Cloud Computing Aficionado: Strong proponent of cloud-based solutions, with hands-on proficiency on platforms like Microsoft Azure and Google Cloud Platform.
- Python & PySpark Maestro: A decade of mastery of Python and PySpark, underlining a robust technical foundation.
- $50/hr
Ahmed E.
- 5.0
- (8 jobs)
Milton, ON
Apache Spark, Microsoft Power BI, Statistics, Data Visualization, Arabic, Big Data, Forecasting, Web Scraping, Data Analysis, Microsoft Excel, Mechanical Engineering, Statistical Analysis, SQL, Machine Learning, Python
I help businesses automate complex operations and make smarter decisions through advanced data solutions and AI-powered systems. With a Mechanical/Mechatronics Engineering background and proven experience across manufacturing and supply chains, I've delivered results for companies from startups to Fortune 500 enterprises across four continents.
What I Deliver:
- Intelligent Automation: Transform manual processes into scalable systems that eliminate bottlenecks and reduce operational costs
- Predictive Analytics: Build forecasting models and real-time dashboards that guide strategic decisions
- Legacy Modernization: Convert outdated systems into robust, cloud-based solutions
- Enterprise Data Solutions: Process massive datasets and create actionable insights for executive teams
Technical Capabilities: Advanced Python/SQL development, machine learning implementation, cloud platforms (AWS), and enterprise dashboard creation.
Let's discuss how I can streamline your operations and accelerate growth.
- $60/hr
Arthur M.
- 5.0
- (28 jobs)
Swindon, ENGLAND
Apache Spark, Data Management, Database Design, GraphQL, Neo4j, Scala, Golang, PostgreSQL, Data Scraping, MySQL, ETL Pipeline, Python, SQL
Skilled Data Engineer and Analyst with experience in multiple programming languages (Python, SQL, Golang, Scala) and on multiple platforms. I specialise in building data pipelines that take data from source to destination and can process data as needed in the pipeline. Currently focused on using Talend and Pentaho Data Integration (PDI) as tools of choice, but equally comfortable creating bespoke ETL solutions using other software or writing data processing scripts in Bash, Python, etc. Proficient in databases and setting up data warehousing solutions. Finally, long experience building data dashboards using R Shiny, Pentaho Server, and Power BI.
- $40/hr
Huzefa K.
- 4.9
- (51 jobs)
Islamabad, ISLĀMĀBĀD
Apache Spark, API Integration, Amazon Athena, Data Modeling, AWS Lambda, Amazon Web Services, ETL Pipeline, Amazon Redshift, ETL, Data Ingestion, PySpark, AWS Glue, Python, Apache Kafka, SQL
Seasoned Senior Data Engineer with 10 years of expertise crafting and implementing sophisticated data enrichment solutions. Proficient in developing and architecting robust data systems within production environments, using data engineering tools such as Python, SQL, PySpark, Scala, and more. Specialized in constructing top-tier ETL pipelines leveraging Airflow, AWS Glue, and Apache Spark for seamless data processing. Proficient in building and managing CI/CD pipelines, automating deployment workflows, and ensuring seamless integration and delivery of data engineering solutions. Extensive proficiency in cloud-based technologies within the AWS ecosystem, spanning S3, Glue, EMR, Athena, Redshift, Lambda functions, and RDS. Proficient at extracting data from diverse sources and optimizing it for data scientists' use in constructing machine learning models that predict various customer-centric scenarios. Adept at remote work environments, delivering consistent excellence in collecting, analyzing, and interpreting extensive datasets. Skilled in data pipeline development using Spark, managing data across DWH, data marts, and data cubes within SQL, NoSQL, and Hadoop-based systems. Proficient in building Python scrapers with Scrapy and Beautiful Soup to streamline data acquisition processes. Extensive freelance experience has broadened my expertise, enabling me to collaborate with diverse clients on challenging data engineering projects. This exposure has strengthened my capabilities and equipped me to tackle any forthcoming challenges as a seasoned data engineer.