Hire the Best PySpark Developers in the United States

Upwork is rated 4.5 out of 5 by G2 peer reviewers, based on more than 3,000 reviews on G2.
Tochukwu I.

Columbia, Missouri

$35/hr
5.0
3 jobs

About Me: I'm an AWS-certified Machine Learning Engineer with 4+ years of hands-on experience building and deploying production-grade AI and ML systems. I specialize in Large Language Model (LLM) development, Retrieval-Augmented Generation (RAG) pipelines, and Natural Language Processing (NLP) applications such as classification, summarization, and sentiment analysis. With deep expertise in AWS SageMaker and Bedrock, I help organizations move generative AI projects from proof of concept to production while ensuring scalability, compliance, and performance. My focus is on building reliable, maintainable AI solutions that integrate seamlessly into existing business workflows. I am currently pursuing an MSc in Computer Science (AI/ML) at Georgia Tech, combining strong theoretical foundations with proven practical skills to deliver solutions that drive business outcomes.

Services I Offer:
1. Fine-tuning Large Language Models (LLMs) for domain-specific applications
2. Building end-to-end RAG systems with vector databases (Pinecone, Bedrock Knowledge Bases, OpenSearch)
3. Developing AI agents for intelligent automation and knowledge retrieval
4. Designing and deploying NLP solutions: classification, summarization, sentiment analysis, and entity extraction
5. Prompt engineering and model optimization for improved performance and accuracy
6. Production ML systems: CI/CD pipeline automation, model versioning, and continuous monitoring using AWS SageMaker and related services

Skills:
Programming: Python, PySpark, JavaScript, FastAPI
AI/ML: LLM development (Hugging Face, LangChain, Bedrock), NLP (spaCy, Transformers)
Frameworks/Libraries: scikit-learn, pandas, NumPy, TensorFlow, PyTorch
Cloud: AWS (SageMaker, Bedrock, Lambda, Glue, API Gateway, CloudWatch, S3)
Model Lifecycle: MLflow, CI/CD for ML, model monitoring
Databases: SQL, Pinecone (vector DB), DynamoDB
AI Optimization: prompt engineering, model fine-tuning, quantization
Visualization: Tableau, Amazon QuickSight

Education:
🎓 MSc in Computer Science (AI/ML), Georgia Institute of Technology, Atlanta, United States
🎓 MSc in Cloud Computing (Software Engineering), Munster Technological University, Cork, Ireland
🎓 BEng in Computer Engineering, Federal University of Technology, Owerri, Nigeria

Certifications:
✅ AWS Certified Solutions Architect – Associate
✅ AWS Certified Machine Learning – Specialty
✅ AWS Certified GenAI Developer – Professional (in progress)

Industry Experience:
AWS Data/ML Engineer @ HiveTekCorp (Dec 2025 to present)
MLOps Engineer @ Veterans United (Sept 2022 to Sept 2023)
Data Scientist @ Veterans United (Dec 2020 to Sept 2022)

  • PySpark
  • Python
  • Machine Learning
  • MLflow
  • CI/CD
  • SQL
  • PyTorch
  • Data Analysis
  • Dashboard
  • Amazon SageMaker
  • Amazon Bedrock
  • LangChain
Mel L.

Los Angeles, California

$60/hr
5.0
15 jobs

My Specialties:
- PySpark development and optimization
- Airflow DAG architecture
- AWS EMR pipelines
- Parquet / S3 performance
- Spark shuffle optimization
- Python development

I am a Python Data Engineer, and my services primarily include:
1/ The design, development, and evaluation of ETL/ELT pipelines (the core of my expertise)
2/ Python development
3/ Data engineering services, with expertise in:
- Airflow
- Spark/PySpark
- EMR, including setting up Spark EMR clusters
- Python coding
- S3 buckets (creation, updates, OCR upload, ...)
- DynamoDB
- PostgreSQL

I also offer:
- Unix shell scripting (in which I am a guru)
- ETL workflow design (in which I am an expert)
- GenAI with Milvus or pgvector

My development work comes fully QA-tested and ready to be promoted to production. On request, I will include pytest unit tests for all functions. My work is guaranteed: you are 100% satisfied or it is 100% free! My work also includes project management: I will divide my tasks accordingly and can segment the project into user stories if that helps your organization.

  • Python
  • FastAPI
  • Bash Programming
  • ETL Pipeline
  • Vector Database
Tomasz D.

Westmont, Illinois

$45/hr
4.8
81 jobs

I am a certified AWS Developer with over 10 years of experience working with AWS Cloud, specializing in serverless architectures such as ETL processes, serverless APIs, and serverless websites.

ETL processes:
- Various compute environments, from EMR, Glue, and Batch to Lambda, with a focus on being config-driven and as serverless as possible
- Custom orchestration systems with Step Functions, scheduled events, on-demand runs, queueing, and post-processing
- Python, PySpark, NodeJS (with TypeScript)

APIs:
- As serverless as possible to reduce cost and development time, with the ability to deploy a whole API service as a single Lambda function (NestJS, Fastify, Express, FastAPI, Flask)
- Python, NodeJS (with TypeScript)

Serverless websites:
- As serverless as possible, for system status reports, small intake forms, management tools, etc.
- Whole websites built with statically generated content or server-side rendering (NuxtJS)
- Python, NodeJS (with TypeScript)

I prefer to use the Serverless Framework as IaC (infrastructure as code) for deployment, connected to GitHub Actions as the pipeline. I also have experience with, and am open to using, other deployment tools such as AWS CDK, SAM, and Terraform. I have experience designing DynamoDB tables (single-table design), connecting them to Lambdas as data producers, and creating DynamoDB Streams with Lambdas as event processors (which could be useful in your project). Let's talk about your project needs and put my 10 years of AWS experience to work.

Tomasz

  • PySpark
  • Amazon Web Services
  • AWS Lambda
  • Amazon DynamoDB
  • AWS Glue
  • Node.js
  • TypeScript
  • Python
  • Vue.js
  • CI/CD
  • Serverless Stack
  • Terraform
  • SQL
  • ETL Pipeline
  • PHP
Uwais K.

Jersey City, New Jersey

$30/hr
5.0
2 jobs

Updated on 3rd June 2026

I build AI systems that act as a "force multiplier" for executives and their teams.

Case Study: Built an enterprise AI analytics and reporting platform for a multi-company organization with 1,000+ users, integrating Salesforce, Slack, Asana, Google Drive, and Google Sheets into a unified AI ecosystem. Leveraged MCP, RAG, Qwen, LangChain, Pinecone, and Python-based knowledge ingestion pipelines to enable enterprise search, automated reporting, cross-platform data retrieval, and AI-driven decision support, significantly improving operational efficiency and reducing manual workflows.

Most AI implementations fail because they are disconnected from the data employees actually use every day. I specialize in building agentic workflows on Azure that bridge the gap between your raw data (Slack, email, meetings) and actionable business intelligence.

My Signature Achievements:
1. The Virtual COO (Executive Intelligence): Developed an AI "Chief of Staff" that consolidates Slack, Gmail, and meeting transcripts into a single decision engine. Built with Azure OpenAI and n8n, it identifies bottlenecks and prioritizes tasks so nothing falls through the cracks.
2. AI Clinical Assistant (Scaled Healthcare): Built a RAG-based diagnostic support tool currently used by 300,000+ practitioners in Europe. It leverages LangChain and Pinecone to analyze patient patterns in real time, surfacing evidence-based treatment suggestions in Portuguese while staying strictly compliant with local privacy laws.
3. Private Vision Pipeline (Secure Taxation): A client needed to extract data from handwritten tax documents but couldn't use OpenAI/Claude for privacy reasons. I deployed a 2B-parameter Vision Language Model on private servers. The result: 98% accuracy on handwritten text and a 14,000% increase in processing speed (from 480 docs/day manually to processing a doc every 5 seconds).

What I can do for you:
- Azure & Private AI: Whether you need Azure AI Foundry or a fully private, locally hosted LLM to keep your data secure, I've built both.
- Agentic Automation: I don't just "connect APIs." I build agents that think, categorize, and act, using n8n, Python, and custom hooks.
- Production-First Data: As a Microsoft Certified Data Engineer, I ensure your data is clean before the AI touches it. No "garbage in, garbage out."

The Tech Stack: Azure (OpenAI/Fabric) • LangChain • n8n • Python • Pinecone • SQL • dbt • Vision-Language Models (VLM)

Why $30/hr? I am a veteran engineer, but I am new to the Upwork ecosystem. My current rate reflects my goal to quickly build a 5-star reputation on this platform. You are getting enterprise-grade architecture at a "reputation-building" price.

Let's hop on a call to talk about your workflow. I'll tell you exactly what's possible and, more importantly, what's a waste of your money.

Uwais

  • PySpark
  • Data Engineering
  • Python
  • Databricks MLflow
  • AI Agent Development
  • AI Development
  • ETL Pipeline
  • Fabric
  • SQL
  • Business Intelligence Software
  • Apache Kafka
  • AWS CloudFormation
  • Azure DevOps
  • LangChain
  • Retrieval Augmented Generation
Russell V.

Port St. Lucie, Florida

$68/hr
5.0
1 job

I'm a data engineer and back-end developer.

Technical skills

-- Data Engineering
• Expertise in Python
• SQL master
• ETL/ELT with Python, Databricks (PySpark), dbt, Dagster, Airbyte, and many AWS services
• Microsoft Power BI / Tableau for data analysis
• Agile, Extreme Programming (XP), Clean Code, and the Google Python Style Guide

-- Back-end Development
• Django / FastAPI
• Node.js
• PostgreSQL

-- Cloud/DevOps
• AWS: Batch, Step Functions, Glue, Athena, Boto3, Lambda, S3, EC2, IAM, KMS, SQS, etc.
• Bash, Docker + ECS
• CI/CD with GitHub Actions; Terraform, SAM, CodePipeline

If your project requires my skills, call me anytime.

  • PySpark
  • Python
  • Data Engineering
  • ETL
  • SQL
  • pandas
  • Django
  • FastAPI
  • BigQuery
  • Data Analysis
Chun-Min J.

San Jose, California

$30/hr
5.0
9 jobs

I refactor your brittle, high-latency data workflows and volatile schema mutations to engineer clean, secure, production-grade GenAI platforms that eliminate operational lag and guarantee absolute output accuracy.
- Slashed LLM hallucinations by 40% and accelerated processing 10x by replacing legacy monolithic parsing with a layout-aware multi-agent architecture and semantic chunking.
- Permanently eliminated a cluster of 75+ daily data defects and reduced pipeline lag by 12 hours across 500,000+ daily records by deploying serverless PySpark ETL workflows built on SOLID principles.
- Accelerated executive decision cycles by two weeks and cut underlying API infrastructure costs by 25% by building real-time, metadata-grounded Streamlit and centralized BI ecosystems.

Ready to transition your AI infrastructure from an unpredictable, costly prototype into a high-fidelity, secure production engine? Let's connect to map out your architecture.

  • PySpark
  • Machine Learning
  • TensorFlow
  • pandas
  • Python Scikit-Learn
  • Deep Learning
  • AWS Development
  • Data Analytics & Visualization Software
  • Keras
  • Docker
  • AI Agent Development
  • Google Cloud Platform
  • Natural Language Processing
  • Time Series Forecasting
  • PostgreSQL Programming
  • dbt
  • Data Science Consultation
  • Statistical Programming
  • STEM Tutoring
  • MATLAB

How it works

Post a job for free

Tell us what you need. Create your own job post or generate one with AI, then filter talent matches.

Hire top talent fast

Consult with, interview, and quickly hire the freelancers you're excited about.

Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

Payment simplified

Manage payments in one place with flexible billing options. Only pay for approved work, hourly or by milestone.


How do I hire a PySpark Developer in the United States on Upwork?

You can hire a PySpark Developer in the United States on Upwork in four simple steps:

  • Create a job post tailored to your PySpark Developer project scope. We'll walk you through the process step by step.
  • Browse top PySpark Developer talent on Upwork and invite them to your project.
  • Once the proposals start flowing in, create a shortlist of top PySpark Developer profiles and interview them.
  • Hire the right PySpark Developer for your project from Upwork, the world's largest work marketplace.

At Upwork, we believe talent staffing should be easy.

How much does it cost to hire a PySpark Developer?

Rates charged by PySpark Developers on Upwork can vary with a number of factors, including experience, location, and market conditions. See hourly rates for in-demand skills on Upwork.

Why hire a PySpark Developer in the United States on Upwork?

As the world's work marketplace, we connect highly skilled freelance PySpark Developers and businesses and help them build trusted, long-term relationships so they can achieve more together. Let us help you build the dream PySpark Developer team you need to succeed.

Can I hire a PySpark Developer in the United States within 24 hours on Upwork?

Depending on availability and the quality of your job post, it's entirely possible to sign up for Upwork and receive PySpark Developer proposals within 24 hours of posting a job description.