Hire the best PySpark developers

Check out PySpark developers with the skills you need for your next job.
Clients rate PySpark developers 4.7/5 based on 169 client reviews.
  • $30 hourly
    Seasoned data engineer with over 11 years of experience building sophisticated and reliable ETL applications using Big Data and cloud stacks (Azure and AWS). TOP RATED PLUS. Collaborated with over 20 clients, accumulating more than 2,000 hours on Upwork. 🏆 Expert in creating robust, scalable, and cost-effective solutions using Big Data technologies for the past 9 years. 🏆 Main areas of expertise:
    📍 Big Data - Apache Spark, Spark Streaming, Hadoop, Kafka, Kafka Streams, HDFS, Hive, Solr, Airflow, Sqoop, NiFi, Flink
    📍 AWS Cloud Services - AWS S3, AWS EC2, AWS Glue, AWS Redshift, AWS SQS, AWS RDS, AWS EMR
    📍 Azure Cloud Services - Azure Data Factory, Azure Databricks, Azure HDInsight, Azure SQL
    📍 Google Cloud Services - GCP Dataproc
    📍 Search Engine - Apache Solr
    📍 NoSQL - HBase, Cassandra, MongoDB
    📍 Platform - Data Warehousing, Data Lake
    📍 Visualization - Power BI
    📍 Distributions - Cloudera
    📍 DevOps - Jenkins
    📍 Accelerators - Data Quality, Data Curation, Data Catalog
    SQL
    AWS Glue
    PySpark
    Apache Cassandra
    ETL Pipeline
    Apache Hive
    Apache NiFi
    Apache Kafka
    Big Data
    Apache Hadoop
    Scala
    Apache Spark
  • $75 hourly
    Tool-oriented data science professional with extensive experience supporting multiple clients in Hadoop and Kubernetes environments, deployed with Cloudera Hadoop on-premises and Databricks in AWS. My passion is client adoption and success, with a focus on usability. With my computer science and applied math background, I have been able to fill the gap between platform engineers and users, continuously pushing for product enhancements. As a result, I have continued to create innovative solutions for clients in an environment where use cases evolve every day. I find fulfillment in driving the direction of a solution in a way that keeps open lanes of communication between client and support teams, creating success and growth. I enjoy working in a diverse environment that pushes me to learn new things, and I'm interested in working on emerging solutions as data science continues to evolve.
    R
    Serverless Stack
    React
    Apache Hadoop
    Java
    Cloudera
    AWS Lambda
    Apache Impala
    R Hadoop
    Bash Programming
    PostgreSQL
    Apache Spark
    Python
    AWS Development
    Apache Hive
  • $40 hourly
    ✔️ Experienced Data Engineer specializing in data ETL, machine learning pipelines, managed/serverless solutions on AWS, and AWS cloud architecture. I have worked with high-profile organizations and customers in my career, including:
    ✔️ A top-5 organization in the automotive industry (Fortune 500)
    ✔️ A top-3 organization in the railway public sector (Fortune 500)
    ✔️ An organization among the top 20 jewelry brands (€2.7B revenue per year)
    Main competencies: ✔️ Data pipeline development ✔️ Building dashboards ✔️ Cloud architecture ✔️ DevOps know-how ✔️ Stakeholder management ✔️ Requirements analysis ✔️ Troubleshooting ✔️ Knowledge transfer
    Main technologies: ✔️ Python, Jupyter ✔️ SQL ✔️ PySpark ✔️ AWS S3, EMR Serverless, Glue, Athena, Redshift, SNS, EKS, EC2, VPC, CloudFormation ✔️ Apache Airflow, Kafka, NiFi ✔️ Docker, Kubernetes
    Why work together? ✔️ Clear understanding and breakdown of requested services ✔️ Correct and timely delivery ✔️ Responsiveness
    Please reach out so we can discuss how to address your business needs.
    Amazon S3
    Amazon Athena
    Jira
    Jupyter Notebook
    PySpark
    AWS Glue
    AWS Lambda
    Data Integration
    ETL Pipeline
    JSON
    Data Extraction
    Amazon SageMaker
    Amazon Web Services
    Python
    Google Cloud Platform
  • $100 hourly
    I have over 4 years of experience in data engineering, especially using Spark and PySpark to gain value from massive amounts of data. I have worked with analysts and data scientists, conducting workshops on working in Hadoop/Spark and resolving their issues with the big data ecosystem. I also have experience in Hadoop maintenance and building ETL pipelines, especially between Hadoop and Kafka. You can find my profile on Stack Overflow (link in the Portfolio section), where I mostly help with questions tagged spark and pyspark.
    MongoDB
    Data Warehousing
    Data Scraping
    ETL
    Data Visualization
    PySpark
    Python
    Data Migration
    Apache Airflow
    Apache Spark
    Apache Kafka
    Apache Hadoop
  • $20 hourly
    I am a Data Engineering and Data Science professional with 3+ years of experience. I hold a Master of Science in Data Analytics and a Bachelor of Engineering in Computer Science. In the past I have worked as an SME on multiple projects in the analytics domain. I have successfully delivered projects where I was responsible for building data pipelines, performing data wrangling, analyzing data with ML algorithms, building dynamic dashboards, and more. I will perform end-to-end analysis, from ETL through analysis and reporting, gathering every ounce of information from your data and backing the generated insights scientifically with statistical tests and ML algorithms. I have experience working with AWS services such as AWS Lambda, AWS ECR, AWS S3, AWS Step Functions, AWS EC2, AWS Batch, AWS Fargate, AWS EFS, AWS Glue, AWS EMR, AWS IAM, AWS RDS/Aurora, AWS Secrets Manager, AWS DynamoDB, and AWS Redshift. Technically, I am sound in Python, SQL, Machine Learning, PySpark, AWS, statistical tests, Power BI, and Tableau. I have experience working in tech startups and MNCs alike. I am no stranger to hard work and take pride in a sincere attitude, learning on the fly as needed to get the job done. I offer:
    1 - 100% satisfaction
    2 - Multiple revisions until satisfaction
    3 - 24/7 support
    4 - Post-delivery follow-up
    AWS Glue
    ETL
    Amazon Web Services
    Data Mining
    PySpark
    AWS Lambda
    Machine Learning
    Python
    Apache Spark
    Data Analysis
    SQL
  • $30 hourly
    Over four and a half years of working experience in data engineering, ETL, AWS, and Python. AWS Data Analytics and Machine Learning certified.
    Docker
    Terraform
    Apache Airflow
    Amazon ECS
    AWS Lambda
    Amazon Redshift
    Amazon S3
    Amazon Web Services
    Analytics
    PostgreSQL
    PySpark
    SQL
    pandas
    AWS Glue
    Python
  • $15 hourly
    -- Cloud Big Data Engineer
    I am an Azure-certified data engineer with professional experience in Databricks, Data Factory, Stream Analytics, Event Hubs, and Data Lake Store. I have developed API-driven and Data Factory orchestration, as well as Databricks job orchestration, cluster creation, and job management through the Databricks REST API. I have successfully delivered around three full-scale enterprise solutions on the Microsoft cloud (Databricks, Data Factory, Stream Analytics, Data Lake Store, Blob Storage), and have built Databricks orchestration and cluster management mechanisms in .NET C#, Java, and Python. The Big Data and cloud tools in which I have expertise:
    - Apache Spark
    - Scala
    - Python
    - Kafka
    - Data Factory
    - Stream Analytics
    - Event Hubs
    - Spark Streaming
    - Azure Data Lake Store
    - Azure Blob Storage
    - Parquet files
    - Snowflake MPP
    - Databricks
    - .NET C#
    -- Web Scraping and Data Mining
    I have professional experience in data mining and web scraping with Selenium and Python, including scraping many e-commerce sites such as Amazon, AliExpress, eBay, and Walmart, and social sites such as Facebook, Twitter, and LinkedIn. I will provide the required scraped data and scripts, as well as support. I hope to serve you well thanks to my relevant professional experience and knowledge.
    Google Cloud Platform
    Apache Airflow
    Apache Spark
    Data Management
    Microsoft Azure
    Snowflake
    Big Data
    Selenium
    Data Scraping
    Python
  • $40 hourly
    I am highly dedicated to achieving professional excellence, career progression, and personal development by working in a learning environment that encourages growth and enriches experience. Key data integration capabilities:
    * Access, cleanse, transform, and blend data from any data source
    * Create complex datasets with no coding required to drive downstream processes (Talend)
    * Improve ad-hoc analysis requests to answer business questions faster
    Specialties:
    Big Data ecosystems: HDFS, HBase, Hive, and Talend
    Databases: Oracle, Postgres, MySQL
    Data warehouse concepts
    ETL tools: Talend Data Integration Suite / Talend Open Studio / Talend ESB
    API integration: Salesforce / Zoho / Google Freebase / Google AdWords / Google Analytics / Marketo
    Programming: Java, SQL, HTML, Unix
    Reporting tools: Yellowfin BI, Tableau, Power BI, SAP BO
    I am an expert in creating ETL data flows in Talend Studio, using best design patterns and practices to integrate data from multiple data sources, and I have a good understanding of the Java programming language for building Talend routines that extend its built-in functionality. I will be happy to show you a demo of existing Talend jobs, or you can share a sample requirement so you can gain confidence in my services.
    Talend Data Integration
    Data Warehousing
    Data Visualization
    SQL
    ETL
    Data Migration
  • $100 hourly
    Senior technologist (18+ years) with strong business acumen and technical experience in Big Data and cloud. A results-oriented, decisive leader in the Big Data and cloud space who combines an entrepreneurial spirit with corporate-refined execution in tech strategy.
    • Architect, Big Data and Cloud (AWS, Azure, Google Cloud), with 18+ years of professional experience in the analysis, design, and development of enterprise-grade applications
    • Databricks Certified Developer - Apache Spark 2.x (2019)
    • AWS Solutions Architect Certified (2018)
    • AWS Big Data Specialty Certified (2019)
    • Good experience with AWS services (EC2, EMR, S3, RDS, Athena, Glue, CloudTrail, Redshift)
    • Deep expertise in the Spark and Hadoop ecosystem
    • Experience setting up Enterprise Data Lakes (cloud and on-premises) and deploying Big Data solutions on clusters ranging from 50 to 200 servers
    • Proven ability to learn quickly and apply new technologies with an innovative approach
    • Previous experience in architecture design, database design, and performance management
    I am AWS Solutions Architect Associate and AWS Big Data Specialty certified. I also have prior experience working with Java and Python, and I am hands-on with PySpark and Spark SQL. I am interested in work on Big Data and cloud solution design and implementation. Since I have worked extensively in the healthcare and telecom domains, security considerations in the cloud are one of my areas of expertise. I have good English conversational skills; my role requires a lot of interaction with clients, and this is one of my strong areas.
    Microsoft Azure
    AWS Lambda
    Google Cloud Platform
    AWS Application
    AWS Fargate
    Apache Spark
    YARN
    Big Data
    Amazon ECS
    Machine Learning
    Apache Hadoop
  • $15 hourly
    👋 Greetings, and thank you sincerely for considering my profile. I am a passionate and proactive professional with diverse expertise as both a Microsoft Certified Data Scientist and a full-stack developer, specializing in crafting engaging user experiences and delivering high-quality software solutions that drive business success.
    **Data Science Expertise:** With over four years of industry experience, I specialize in GenAI, data analytics, and natural language processing across diverse sectors such as Telecom, Transportation, and Customer Success. My skills include:
    ✔️ GenAI technologies (LLMs, RAG, Prompt Engineering, LangChain, Agents)
    ✔️ Data Analytics & Visualization
    ✔️ Predictive Analytics
    ✔️ Text Analytics
    ✔️ Time Series Analysis
    ✔️ Big Data Analytics (Spark)
    **Full-stack Development Expertise:** As a full-stack developer, I specialize in crafting visually stunning user interfaces and seamless navigation using modern frameworks like Next.js and React. My technical proficiency includes:
    ✔️ Frontend: Next.js, React, Angular
    ✔️ Backend: Node.js, Express, Nest.js, FastAPI, Django
    ✔️ Databases: PostgreSQL, MongoDB, MySQL
    ✔️ Cloud Services: Azure Cloud, AWS, Firebase
    ✔️ APIs: GraphQL, REST, WebSockets
    ✔️ DevOps: Docker, Kubernetes
    ✔️ Version Control: Git
    **Offers Provided:** From a data science perspective, I offer:
    ✔️ Production-ready AI-powered applications
    ✔️ Fully automated Power BI dashboards with relevant charts and KPIs
    ✔️ Fully functional, industry-ready AI chatbots for business
    ✔️ Data processing, including data cleaning, transformation, and integration
    ✔️ Azure Databricks, Azure OpenAI, Azure Document Intelligence, ADLS Gen2, Azure authentication, Azure Data Explorer (ADX), and Azure Machine Learning solutions
    ✔️ Task automation using Python
    ✔️ Univariate and multivariate forecasting for demand, inventory, sales, etc.
    ✔️ Technical support in Python, JavaScript, SQL, KQL, NLP, and various libraries and frameworks
    From a full-stack development perspective, I offer:
    ✔️ Visually stunning user interfaces crafted with modern frameworks
    ✔️ Seamless navigation and captivating user experiences
    ✔️ DevOps practices for seamless deployment and continuous integration
    ✔️ Cloud service providers like Azure and AWS
    ✔️ Smooth collaboration and efficient code management within development teams using Git
    I am committed to delivering high-quality results on challenging projects within tight timelines, prioritizing accessibility, collaboration, and effective communication. It is my passion and responsibility to execute tasks in the most efficient manner possible, earning your satisfaction and trust. I am open to flexibility and negotiation regarding budget and costs. Looking forward to the opportunity to collaborate with you. Warm regards, Mishab. NB: Examples of my work are available in the portfolio section.
    Data Analysis
    Data Processing
    Data Visualization
    LangChain
    Generative AI
    PySpark
    Kusto Query Language
    Microsoft Power BI
    Chatbot
    Time Series Analysis
    Databricks Platform
    Machine Learning
    Natural Language Processing
    Python
  • $55 hourly
    I have more than seven years of hands-on experience in data engineering. My specialities are building data platforms and data pipelines from different sources, and I'm keen to work on end-to-end data pipeline builds on AWS or GCP. I can fix your time- and resource-killing data pipeline issues. Share your gig. Feel the difference. I also have expertise in:
    - Full-stack web application development / database design / API development & integration
    - DevOps / Linux server administration / deployment / migrations / hosting
    - Web automation / scraping / crawlers / bots
    PySpark
    API
    AWS Lambda
    Amazon Web Services
    ETL Pipeline
    Apache Spark
    Python
    Scrapy
    Amazon S3
    Data Mining
    AWS Glue
    Apache Airflow
    DevOps
    Docker
    Data Migration
  • $20 hourly
    Proficient data engineer experienced in big data pipeline development and designing data solutions for retail, healthcare, and other industries. I've designed and implemented multiple cloud-based data pipelines for companies located in Europe and the USA. I'm experienced in designing enterprise-level data warehouses, have good analytical and communication skills, am a team player, and am hardworking.
    Experience:
    - 4+ years of experience in data engineering
    - Hands-on experience developing data-driven solutions using cloud technologies
    - Designed multiple data warehouses using Snowflake and star schema
    - Requirements gathering and understanding business needs in order to propose solutions
    Certifications:
    - Databricks Certified Data Engineer
    - Microsoft Azure Associate Data Engineer
    Tools and tech:
    - PySpark
    - dbt
    - Airflow
    - Azure Cloud
    - Python
    - Data Factory
    - Snowflake
    - Databricks
    - C#
    - AWS
    - Docker
    - CI/CD
    - RESTful API development
    AWS Lambda
    PySpark
    Microsoft Azure
    Databricks MLflow
    dbt
    Snowflake
    API Development
    Data Lake
    ETL
    Databricks Platform
    Python
    Apache Airflow
    Apache Spark
  • $40 hourly
    I am a passionate person, and I am most passionate about solving problems with data. As a Data Scientist with four years of industry experience, I am equipped with the machine learning knowledge to make the world a better place with data science. In my professional career, I have worked both as a freelance data scientist and as a full-time employee. I have worked in the IoT industry for clients in Pakistan and the Middle East, and in the transport industry, providing solutions using text analytics and NLP. My current industry is retail: I work as a Data Scientist for MATAS, a Danish retail and beauty company, where I am responsible for all stages of the data science process, from business understanding to model deployment.
    Skill sets:
    - Understanding the business problem and where data science can create value.
    - Researching academia and industry for modern solutions.
    - Explaining data science to non-technical business stakeholders.
    - Key areas where I consider myself well versed: recommendation systems, multi-armed bandits, send-time optimization, demand forecasting, price elasticity, word2vec and sentence embeddings, and pretty much all the machine learning algorithms.
    - Well versed in big data frameworks such as Spark, with hands-on experience in PySpark DataFrames and the Databricks platform.
    - Building data integration pipelines and collaborating with data engineers to support the ETL.
    - Designing Power BI dashboards to present insights to stakeholders.
    - Developing DevOps pipelines for model deployment using Docker and Kubernetes.
    - Maintaining motivation and enthusiasm within the team when model accuracy falls.
    ETL Pipeline
    Data Integration
    PySpark
    Data Visualization
    Machine Learning
    Apache Spark MLlib
    Python
    Apache Spark
    R
    Natural Language Processing
    Deep Learning
    Recommendation System
    Databricks Platform
    Computer Vision
  • $45 hourly
    As a highly experienced Data Engineer with over 10 years of expertise in the field, I have built a strong foundation in designing and implementing scalable, reliable, and efficient data solutions for a wide range of clients. I specialize in developing complex data architectures that leverage the latest technologies, including AWS, Azure, Spark, GCP, SQL, Python, and other big data stacks. My extensive experience includes designing and implementing large-scale data warehouses, data lakes, and ETL pipelines, as well as systems that process and transform data in real time. I am also well-versed in distributed computing and data modeling, having worked extensively with Hadoop, Spark, and NoSQL databases. As a team leader, I have successfully managed and mentored cross-functional teams of data engineers, data scientists, and data analysts, providing guidance and support to ensure the delivery of high-quality data-driven solutions that meet business objectives. If you are looking for a highly skilled Data Engineer with a proven track record of delivering scalable, reliable, and efficient data solutions, please do not hesitate to contact me. I am confident that I have the skills, experience, and expertise to meet your data needs and exceed your expectations.
    Snowflake
    ETL
    PySpark
    MongoDB
    Unix Shell
    Data Migration
    Scala
    Microsoft Azure
    Amazon Web Services
    SQL
    Apache Hadoop
    Cloudera
    Apache Spark
  • $60 hourly
    I am thrilled to introduce myself as an experienced Qlik project manager and BI developer with a track record of successfully delivering more than 50 projects in the last 10 years. My expertise spans various industries, including manufacturing, retail, distribution, finance, and SCM, and I have built a version control environment with full CI/CD support. Throughout my career, I have developed several BI projects using Qlik and other systems, contributing to the analysis of stocks using the Theory of Constraints and flow analysis, multi-factorial sales analysis, and creating KPI analysis apps for sales teams. Furthermore, I have conducted extensive analyses of incomes and costs, involving marginal analysis of sales, payment discipline of contractors, and cash-flow forecasting. I have also conducted long-term statistical analyses of market trends in FMCG, focusing on sales forecasting for the next 12 months, and performed sales and financial analysis using data from 23 ERP and accounting systems. My diverse experience in Qlik project management and BI development equips me to deliver results that exceed expectations in your organization. I look forward to the opportunity to utilize my expertise for your team. Thank you for your time and consideration.
    Data Warehousing
    Report Writing
    ETL
    SQL
    QlikView
    Data Analysis
    Data Mining
    Business Analysis
    Business Process Modeling
    Business Intelligence
  • $100 hourly
    Do you need a Data Scientist or a Data Engineer specializing in Big Data? I have worked more than 500 hours on Upwork coding data science and big data engineering projects for clients like you, performing data quality work, ETL, data ingestion from data lakes, and master data design. I am an excellent data scientist/data engineer, and I am committed to top-rated performance. With ten years of experience as an engineer with an entrepreneurial background, my tech stack is Apache Spark, Dask, Python, Python Flask, SQL, and machine learning. I am fully capable of taking charge of your data-driven application using Python, Flask, Apache Spark, Dask, and SQL from scratch to production.
    Data Science and Analytics:
    * Python (files & notebooks)
    * NumPy
    * Pandas
    * Sklearn
    * SQL
    * NoSQL
    Machine Learning and Model Prediction:
    * Supervised learning (classification and regression)
    * Unsupervised learning
    * Natural language processing (NLP)
    * Deep learning with PyTorch
    * Recommender systems
    Big Data Analysis:
    * Apache Spark
    * Dask
    * SQL
    * NoSQL
    Data Engineering:
    * Data modeling
    * Cloud data warehouse architecture
    * OLAP cubes
    * ETL pipelines
    * Data lakes with Spark
    * Data pipelines with Airflow
    * Python, Pandas, PySpark
    * Python Flask
    * Dask
    * Dedupe and normalization of data
    Full Stack Development:
    * RESTful APIs with Python Flask
    * Data-driven web applications
    Data science analytics, such as descriptive and inferential analysis, predictive analytics, and machine learning, will be performed on your project. I have experience designing the ETL and Big Data components of projects using Python Pandas DataFrames, Dask, and Apache Spark (PySpark). I hold a Bachelor's in Civil Engineering, graduated from Udacity's Data Science Nanodegree in March 2019, and am currently taking Udacity's Data Engineer Nanodegree. I am talented, creative, and very hardworking, ensuring that your project gets completed in the safe hands of a professional data scientist/data engineer who will deliver within your budget and by the given deadline. Whether you belong to a team that needs a data scientist for a specific task or for data engineering, you receive the best of both worlds: quickly understanding machine learning and data science, cleaning and analyzing your data, turning your projects frictionlessly into web applications, or simply maintaining your web application code. I can save you money and time by integrating your data science projects while building their production web application. I will quote you a reasonable estimate after going through all the details and covering all aspects of the project. Contact today. Cheers, and talk to you soon! Best regards, Atif Z. FYI: Relentlessly working within the deadline until I have derived accurate and excellent results is my motto. I am very thorough in my work, and I don't cut any corners.
    ETL Pipeline
    Data Modeling
    Data Warehousing
    Data Management
    AWS Glue
    Business Intelligence
    PySpark
    Database Architecture
    Apache Spark
    pandas
    SQL
    Supervised Learning
    Machine Learning
    Data Science Consultation
    Data Science
    PostgreSQL
  • $40 hourly
    Data Engineer with over 5 years of experience developing Python-based solutions and leveraging machine learning algorithms to address complex challenges. I have a strong background in data integration, data warehousing, data modelling, and data quality. I excel at implementing and maintaining both batch and streaming big data pipelines with automated workflows. My expertise lies in driving data-driven insights, optimizing processes, and delivering value to businesses through a comprehensive understanding of data engineering principles and best practices.
    KEY SKILLS: Python | SQL | PySpark | JavaScript | Google Cloud Platform (GCP) | Azure | Amazon Web Services (AWS) | TensorFlow | Keras | ETL | ELT | dbt | BigQuery | Bigtable | Redshift | Snowflake | Data warehouse | Data lake | Dataproc | Dataflow | Data Fusion | Dataprep | Pub/Sub | Looker | Data Studio | Data Factory | Databricks | AutoML | Vertex AI | Pandas | Big Data | NumPy | Dask | Apache Beam | Apache Airflow | Azure Synapse | Cloud Data Loss Prevention | Machine learning | Deep learning | Kafka | scikit-learn | Data visualisation | Tableau | Power BI | Django | Git | GitLab
    Data Engineering
    dbt
    ETL
    Chatbot
    CI/CD
    Kubernetes
    Docker
    Apache Airflow
    Apache Kafka
    PySpark
    Machine Learning
    Exploratory Data Analysis
    Python
    SQL
    BigQuery
  • $40 hourly
    Do you need to expand your data team with someone who can deliver results? Or do you need someone to build data processes and applications that drive your business forward? I have built ML/data pipelines and infrastructure across the globe for clients like Audible, BMGF, and Amazon Channels. I have 6+ years of cumulative experience (in Data Engineer and Data Scientist roles) with a strong focus on building data pipelines and warehouses, big data analytics, ETL processes, machine learning models, visualization, and business intelligence.
    What can I do?
    - Process your raw data and present it in a meaningful way: output from a machine learning pipeline, a BI dashboard, or an intermediate table for you to work with.
    - Create new channels to update your existing systems with insights and KPIs derived from BI projects.
    - Suggest the best course of action for your data processes moving forward.
    - Suggest possible new revenue sources or cost reductions.
    I feel comfortable with most modern tech stacks, but these are the technologies I have used in past projects.
    DATABASES: Postgres, Redshift, MSSQL, Aurora, BigQuery, Snowflake, Cassandra, stored procedures
    AI STACK: Keras, NLTK, Sklearn, Pandas, Seaborn, Matplotlib, NumPy, SciPy, Gensim
    ETL: Airbyte, Glue, Data Factory, Airflow, Dagster, dbt
    LANGUAGES & LIBRARIES: Python, Java, C++, R, PySpark, NetworkX
    BI & ANALYTICS: Plotly, Streamlit, Data Studio, Power BI, Tableau, Power Query
    WEB DEV: Flask, Bootstrap, APIs, authentication, Auth0, FastAPI
    CLOUD: AWS, GCP, Azure, EMR, EC2, SageMaker Studio, Lambda functions, SNS, IAM, SSO, ECR, RDS, CloudFormation, Databricks
    MISC: Docker & custom images, Kubernetes, docker-spawn, Git, Bash, GitHub Actions
    Want to discuss your project? Please drop me a message. Thanks. - Sagar
    PostgreSQL
    AWS Lambda
    ETL Pipeline
    Docker
    dbt
    PySpark
    Amazon Redshift
    Amazon S3
    AWS Glue
    Snowflake
    Tableau
    BigQuery
    Amazon SageMaker
    Chatbot
    Databricks Platform
  • $35 hourly
    Data Engineer with extensive experience in building large-scale data warehouses, data lakes, and data pipelines with a cloud-native approach. In my previous projects, I have worked with:
    Hadoop ecosystem / Big Data tools:
    • Apache Spark, Airflow, Cloudera Impala, Hive, Cassandra, Snowflake
    AWS tools:
    • EC2, S3, EMR, Athena, Secrets Manager, Lambda, Redshift, RDS, Glue
    Azure tools:
    • VM, Blob Storage, ADLS, HDI, Synapse, Databricks
    Databases:
    • Oracle PL/SQL, PostgreSQL, MySQL, T-SQL
    Programming/scripting:
    • Java, Python, Scala, Bash
    Apache Airflow
    PySpark
    Data Management
    Apache Spark
    Amazon Web Services
    Cloud Computing
    Big Data
    ETL
    Data Extraction
    ETL Pipeline
    SQL
    Data Scraping
    Python
  • $110 hourly
    Distributed Computing: Apache Spark, Flink, Beam, Hadoop, Dask
    Cloud Computing: GCP (BigQuery, Dataproc, GFS, Dataflow, Pub/Sub), AWS EMR/EC2
    Containerization Tools: Docker, Kubernetes
    Databases: MongoDB, Postgres-XL, PostgreSQL
    Languages: Java, Python, C/C++
    MapReduce
    Apache Kafka
    Cloud Computing
    Apache Hadoop
    White Paper Writing
    Academic Writing
    Google Cloud Platform
    Dask
    Apache Spark
    Research Paper Writing
    Apache Flink
    Kubernetes
    Python
    Java
  • $55 hourly
    I focus on data engineering, software engineering, ETL/ELT, SQL reporting, high-volume data flows, and development of robust APIs using Java and Scala. I prioritize three key elements: reliability, efficiency, and simplicity. I hold a Bachelor's degree in Information Systems from Pontifícia Universidade Católica do Rio Grande do Sul, as well as graduate degrees in Software Engineering from Infnet/FGV and Data Science (Big Data) from IGTI. In addition to my academic qualifications, I have acquired a set of certifications:
    - Databricks Certified Data Engineer Professional
    - AWS Certified Solutions Architect – Associate
    - Databricks Certified Associate Developer for Apache Spark 3.0
    - AWS Certified Cloud Practitioner
    - Databricks Certified Data Engineer Associate
    - Academy Accreditation - Databricks Lakehouse Fundamentals
    - Microsoft Certified: Azure Data Engineer Associate
    - Microsoft Certified: DP-200 Implementing an Azure Data Solution
    - Microsoft Certified: DP-201 Designing an Azure Data Solution
    - Microsoft Certified: Azure Data Fundamentals
    - Microsoft Certified: Azure Fundamentals
    - Cloudera CCA Spark and Hadoop Developer
    - Oracle Certified Professional, Java SE 6 Programmer
    My professional journey has been marked by deep involvement in the world of Big Data solutions. I've fine-tuned my skills with Apache Spark, Apache Flink, Hadoop, and a range of associated technologies such as HBase, Cassandra, MongoDB, Ignite, MapReduce, Apache Pig, Apache Crunch, and RHadoop. Initially I worked extensively with on-premises environments, but over the past five years my focus has shifted predominantly to cloud-based platforms: I've dedicated over two years to mastering Azure and am currently immersed in AWS. I have great experience with Linux environments, as well as strong knowledge of programming languages like Scala (8+ years) and Java (15+ years). In my earlier career phases, I worked with Java web applications and Java EE applications, primarily leveraging the WebLogic application server and databases like SQL Server, MySQL, and Oracle.
    Scala
    Apache Solr
    Apache Kafka
    Apache Spark
    Bash Programming
    Elasticsearch
    Java
    Progress Chef
    Apache Flink
    Apache HBase
    Apache Hadoop
    MapReduce
    MongoDB
    Docker
  • $70 hourly
    ✅ AWS Certified Solutions Architect
    ✅ Google Cloud Certified Professional Data Engineer
    ✅ SnowPro Core Certified Individual
    ✅ Upwork Certified Top Rated Professional Plus
    ✅ Author of a Python package for the Currency.com cryptocurrency market (python-currencycom)
    Specializing in business intelligence development, ETL development, and API development with Python, Apache Spark, SQL, Airflow, Snowflake, Amazon Redshift, GCP, and AWS. I have accomplished many projects, complicated and not so complicated, such as:
    ✪ Highly scalable distributed applications for real-time analytics
    ✪ Data warehouse design and ETL pipeline development for multiple mobile apps
    ✪ Cost optimization for existing cloud infrastructure
    But the main point: I take responsibility for the final result.
    Data Scraping
    Snowflake
    ETL
    BigQuery
    Amazon Redshift
    Big Data
    Data Engineering
    Cloud Architecture
    Google Cloud Platform
    ETL Pipeline
    Python
    Amazon Web Services
    Apache Airflow
    SQL
    Apache Spark
  • $60 hourly
    I focus on data warehousing, SQL reporting, ETL/ELT, high-volume data flows, API development, and data visualization. My work emphasizes reliability, efficiency, and simplicity. Previous assignments have involved most major programming languages and databases.
    Data Analysis
    TensorFlow
    Ecommerce Website Development
    Amazon Web Services
    Node.js
    MongoDB
    Tableau
    Prompt Engineering
    ChatGPT
    Amazon Redshift
    PostgreSQL
    ETL
    API Integration
    SQL
    Python
  • $10 hourly
    Hello! I'm Arslan, a seasoned full-stack developer with 7+ years of industry experience. My core strengths lie in Python and Node.js development, coupled with a deep understanding of data engineering principles. I thrive on creating dynamic web applications and implementing robust security measures.
    **Skills:**
    - **Web Frameworks:** Python (Django, Flask, Pyramid); Node.js (Express.js, Nest.js); PHP (building web applications); WordPress (custom themes and plugins)
    - **Frontend Technologies:** React.js, Angular, Vue.js
    - **Databases and Cloud Architecture:** SQL-based (MySQL, PostgreSQL); NoSQL (MongoDB, Cassandra, Redis); cloud platforms (AWS, Azure, Google Cloud)
    - **DevOps Excellence:** Jenkins, Travis CI, Kubernetes, Docker, continuous integration/continuous deployment (CI/CD)
    - **Web Scraping and Automation:** Python libraries (BeautifulSoup, Scrapy); automation tools for development and deployment
    - **Python Backend API Mastery:** Flask custom API development with endpoints, authentication, and serialization
    - **Data Engineering Proficiency:** Python (PySpark, ETL processes); Big Data (Hadoop, Spark); data warehousing, SQL, Tableau
    - **Cybersecurity and Pentesting:** Penetration testing, firewall implementation, secure authentication, data encryption
    - **Effective Communication:** Slack, Microsoft Teams for real-time collaboration
    - **Version Control and Collaboration:** Git, GitHub, Bitbucket for seamless collaboration
    - **Third-Party Integration:** APIs such as Google Maps and Stripe
    **Client-Centric Approach:** With 7 years of experience in backend, team lead, and project manager roles, I'm committed to understanding your unique needs and delivering tailored solutions.
    **Why Choose Me? Data-Driven Solutions with Ironclad Security:**
    - **Data and Security Integration:** A unique combination of data engineering and cybersecurity expertise for efficient and secure projects.
    - **Innovative Problem Solver:** Passionate about finding creative solutions to complex challenges.
    - **Timely Delivery:** Committed to delivering high-quality work within agreed timelines.
    - **Client Collaboration:** Dedicated to keeping you involved throughout the project, providing updates and insights.
    **Client Offers: Custom Solutions for Your Data and Security Needs:**
    - **Comprehensive Data Analysis:** A tailored approach to understanding your data requirements and objectives.
    - **Efficient Data Pipelines:** Design and implementation of efficient ETL processes and data pipelines.
    - **Robust Cybersecurity Measures:** State-of-the-art cybersecurity measures for data integrity and protection.
    - **Data Visualization:** Insightful visualizations for informed decision-making.
    - **Continuous Security Monitoring:** Ongoing security monitoring to detect and address potential threats proactively.
    - **Training and Consultation:** Training on best practices in data management and cybersecurity.
    **Let's collaborate to bring your projects to new heights! Reach out to discuss how I can contribute to your success.**
    WordPress
    PHP
    Back-End Development Framework
    RESTful API
    RESTful Architecture
    Flask
    API Framework
    API
    Django
    API Integration
    API Testing
    API Development
    ETL Pipeline
    Python
    Docker
  • $35 hourly
    I am an expert data engineer with over 5 years of experience in data ingestion, integration, and manipulation. To date, I have completed many projects in data engineering and big data. I have worked on business analytics and telco analytics, using multiple data platforms and frameworks such as Cloudera Data Platform, NiFi, RStudio, Spark, Hadoop, and Kafka. If this is what you want, then get in touch with me.
    Cloud Engineering
    Cloudera
    Apache Hadoop
    Data Warehousing
    Apache NiFi
    Linux
    Apache Spark
    Data Lake
    Data Analysis
    SQL
    Big Data
    Business Intelligence
    Scala
    Apache Hive
    Python
  • $50 hourly
    DataOps leader with 20+ years of experience in software development and IT, and expertise in a wide range of cutting-edge technologies:
    * Databases: NoSQL, SQL Server, SSIS, Cassandra, Spark, Hadoop, PostgreSQL, PostGIS, MySQL, GIS, Percona, TokuDB, HandlerSocket (NoSQL), CRATE, Redshift, Riak, Hive, Sqoop
    * Search engines: Sphinx, Solr, Elasticsearch, AWS CloudSearch
    * In-memory computing: Redis, Memcached
    * Analytics: ETL, analytics on data from a few million to billions of rows, sentiment analysis, Google BigQuery, Apache Zeppelin, Splunk, Trifacta Wrangler, Tableau
    * Languages & scripting: Python, PHP, shell scripts, Scala, Bootstrap, C, C++, Java, Node.js, .NET
    * Servers: Apache, Nginx, CentOS, Ubuntu, Windows, distributed data, EC2, RDS, and Linux systems
    Proven track record of success in leading IT initiatives and delivering solutions:
    * Full-lifecycle project management experience
    * Hands-on experience leading all stages of system development
    * Ability to coordinate and direct all phases of project-based efforts
    * Proven ability to manage, motivate, and lead project teams
    Ready to take on the challenge of DataOps: I am a highly motivated and results-oriented IT specialist with a proven track record of leading IT initiatives and delivering solutions. I am confident that my skills and experience would be a valuable asset to any team looking to implement DataOps practices, and I am excited about the opportunity to help organizations of all sizes achieve their data goals.
    Python
    Scala
    ETL Pipeline
    Data Modeling
    NoSQL Database
    BigQuery
    Apache Spark
    Sphinx
    Linux System Administration
    Amazon Redshift
    PostgreSQL
    ETL
    MySQL
    Database Optimization
    Apache Cassandra
  • $20 hourly
    I'm a developer with experience in Big Data/AI and the Spring framework.
    1. Experienced in PySpark/HBase/Redis/Kafka/Spark/Flink (6+ years)
    2. Experienced in TensorFlow/PyTorch (2+ years)
    3. Experienced in HTML/Spring MVC (1+ year)
    4. Experienced in AWS components such as EMR/EC2/S3/CodeDeploy (1 year)
    5. Experienced in GCP components such as Cloud Storage/Dataproc/BigQuery (1 year)
    Google Cloud Platform
    PySpark
    ETL Pipeline
    Artificial Intelligence
    Spring MVC
    Apache Flink
    Apache Spark
    AWS Application
    Data Science
    Java
    Big Data
    Recommendation System
    Scala
    Python
    Data Mining

How it works

1. Post a job (it’s free)

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.


How do I hire a PySpark Developer on Upwork?

You can hire a PySpark Developer on Upwork in four simple steps:

  • Create a job post tailored to your PySpark Developer project scope. We’ll walk you through the process step by step.
  • Browse top PySpark Developer talent on Upwork and invite them to your project.
  • Once the proposals start flowing in, create a shortlist of top PySpark Developer profiles and interview.
  • Hire the right PySpark Developer for your project from Upwork, the world’s largest work marketplace.

At Upwork, we believe talent staffing should be easy.

How much does it cost to hire a PySpark Developer?

Rates charged by PySpark Developers on Upwork can vary with a number of factors, including experience, location, and market conditions. See hourly rates for in-demand skills on Upwork.

Why hire a PySpark Developer on Upwork?

As the world’s work marketplace, we connect highly skilled freelance PySpark Developers with businesses and help them build trusted, long-term relationships so they can achieve more together. Let us help you build the dream PySpark Developer team you need to succeed.

Can I hire a PySpark Developer within 24 hours on Upwork?

Depending on availability and the quality of your job post, it’s entirely possible to sign up for Upwork and receive PySpark Developer proposals within 24 hours of posting a job description.
