Hire the best PySpark developers

Check out PySpark developers with the skills you need for your next job.
Clients rate PySpark developers 4.7/5 based on 169 client reviews.
  • $30 hourly
    🏆 Expert in creating robust, scalable and cost-effective solutions using Big Data technologies for the past 9 years. 🏆 My main areas of expertise are: 📍 Big Data - Apache Spark, Spark Streaming, Hadoop, Kafka, Kafka Streams, HDFS, Hive, Solr, Airflow, Sqoop, NiFi, Flink 📍 AWS Cloud Services - AWS S3, AWS EC2, AWS Glue, AWS Redshift, AWS SQS, AWS RDS, AWS EMR 📍 Azure Cloud Services - Azure Data Factory, Azure Databricks, Azure HDInsight, Azure SQL 📍 Google Cloud Services - GCP Dataproc 📍 Search Engine - Apache Solr 📍 NoSQL - HBase, Cassandra, MongoDB 📍 Platform - Data Warehousing, Data Lake 📍 Visualization - Power BI 📍 Distributions - Cloudera 📍 DevOps - Jenkins 📍 Accelerators - Data Quality, Data Curation, Data Catalog
    SQL
    AWS Glue
    PySpark
    Apache Cassandra
    ETL Pipeline
    Apache Hive
    Apache NiFi
    Apache Kafka
    Big Data
    Apache Hadoop
    Scala
    Apache Spark
  • $100 hourly
    I have over 4 years of experience in Data Engineering, especially using Spark and PySpark to extract value from massive amounts of data. I have worked with analysts and data scientists, conducting workshops on working in Hadoop/Spark and resolving their issues with the big data ecosystem. I also have experience in Hadoop maintenance and building ETL pipelines, especially between Hadoop and Kafka. You can find my profile on Stack Overflow (link in the Portfolio section) - I mostly help with spark- and pyspark-tagged questions.
    MongoDB
    PySpark
    Data Migration
    Apache Airflow
    Python
    Data Warehousing
    Data Scraping
    Data Visualization
    ETL
    Apache Kafka
    Apache Hadoop
    Apache Spark
  • $15 hourly
    Hello, and greetings! 😊 Thank you for looking into my profile. I am a passionate, proactive, Microsoft Certified Data Scientist with 4+ years of industry experience in analytics, machine learning, deep learning and natural language processing in domains like Telecom, Transportation, Customer Success, etc. I have exceptional skills in the following areas: ✔️ Data Processing & Manipulation ✔️ Data Cleaning ✔️ Data Visualization ✔️ Predictive Analytics ✔️ Text Analytics ✔️ Log Analytics ✔️ Time Series Analysis ✔️ Statistical Analysis ✔️ Big Data Analytics (Spark) Some of the services I offer include: ✔️ Fully automated real-time Power BI dashboards with relevant charts and KPIs to help businesses monitor their performance and make data-driven decisions ✔️ AI-powered applications leveraging LangChain and LLMs (OpenAI & open-source models) ✔️ Data analysis and visualisation to draw meaningful insights that drive business growth ✔️ Setting up MLflow in Databricks for MLOps pipelines ✔️ Data processing - data cleaning, data transformation and data integration ✔️ Azure Databricks, Azure Data Explorer (ADX) and Azure Machine Learning solutions ✔️ Task automation using Python ✔️ Univariate and multivariate forecasting (demand, inventory, sales, etc.) ✔️ Python, KQL and SQL support Technical skills: ✔️ Python ✔️ Spark ✔️ SQL ✔️ KQL (Kusto Query Language) ✔️ NLP ✔️ LangChain ✔️ Python libraries like Pandas, NumPy, Scikit-Learn, Flask, LightGBM, XGBoost, PyCaret, FBProphet, etc. ✔️ Azure Databricks ✔️ Power BI ✔️ Azure Data Explorer (ADX) ✔️ Azure Machine Learning ✔️ Streamlit With a proven track record of delivering high-quality results on challenging projects within tight timelines, I am dedicated to being accessible and collaborative 24/7. I am committed to effective communication and brainstorming sessions to ensure your needs are met efficiently and effectively. It is my passion and responsibility to complete your task in the best and most efficient way, one that will satisfy you and build your trust. I am flexible and negotiable on budget. See you soon. NB: Example work is provided in the portfolio section.
    Time Series Analysis
    Data Analysis
    Data Processing
    Data Visualization
    LangChain
    Generative AI
    PySpark
    Kusto Query Language
    Microsoft Power BI
    Databricks Platform
    Machine Learning
    Natural Language Processing
    Python
  • $40 hourly
    I am a data engineering expert with more than 5 years of experience in data ingestion, integration and manipulation. To date, I have completed many projects in data engineering and big data. I have worked on business analytics and telco analytics, using multiple data platforms and frameworks such as Cloudera Data Platform, NiFi, RStudio, Spark, Hadoop, Kafka... If this is what you need, get in touch with me.
    Cloud Engineering
    Cloudera
    Apache Hadoop
    Data Warehousing
    Apache NiFi
    Linux
    Apache Spark
    Data Lake
    Data Analysis
    SQL
    Big Data
    Business Intelligence
    Scala
    Apache Hive
    Python
  • $55 hourly
    I have more than seven years of hands-on experience in data engineering. My specialities are building data platforms and data pipelines from different sources. I'm keen to work on end-to-end data pipeline builds on AWS or GCP, and I can fix your time- and resource-killing data pipeline issues. Share your gig. Feel the difference. I also have expertise in: - Full Stack Web Application Development / Database Design / API Development & Integration - DevOps / Linux Server Administration / Deployment / Migrations / Hosting - Web Automation / Scraping / Crawlers / Bots
    PySpark
    API
    AWS Lambda
    Amazon Web Services
    ETL Pipeline
    Apache Spark
    Python
    Scrapy
    Amazon S3
    Data Mining
    AWS Glue
    Apache Airflow
    DevOps
    Docker
    Data Migration
  • $30 hourly
    Data Engineer with over 5 years of experience in developing Python-based solutions and leveraging Machine Learning algorithms to address complex challenges. I have a strong background in Data Integration, Data Warehousing, Data Modelling, and Data Quality. I excel at implementing and maintaining both batch and streaming Big Data pipelines with automated workflows. My expertise lies in driving data-driven insights, optimizing processes, and delivering value to businesses through a comprehensive understanding of data engineering principles and best practices. KEY SKILLS Python | SQL | PySpark | JavaScript | Google Cloud Platform (GCP) | Azure | Amazon Web Services (AWS) | TensorFlow | Keras | ETL | ELT | dbt | BigQuery | Bigtable | Redshift | Snowflake | Data Warehouse | Data Lake | Dataproc | Dataflow | Data Fusion | Dataprep | Pub/Sub | Looker | Data Studio | Data Factory | Databricks | AutoML | Vertex AI | Pandas | Big Data | NumPy | Dask | Apache Beam | Apache Airflow | Azure Synapse | Cloud Data Loss Prevention | Machine Learning | Deep Learning | Kafka | Scikit-Learn | Data Visualisation | Tableau | Power BI | Django | Git | GitLab
    Data Engineering
    dbt
    ETL
    Chatbot
    CI/CD
    Kubernetes
    Docker
    Apache Airflow
    Apache Kafka
    Exploratory Data Analysis
    PySpark
    Python
    SQL
    Machine Learning
    BigQuery
  • $20 hourly
    Data Scientist with around 6 years of experience. I have worked on projects like price optimization and space optimization for retail clients. I worked for Bank of America as an ML Engineer and currently serve a telecom client on their ML and Data Science initiatives.
    Microsoft Power BI
    Statistics
    Apache Cassandra
    MySQL Programming
    MySQL
    PySpark
    PostgreSQL
    Predictive Analytics
    Tableau
    Machine Learning
    Classification
    Natural Language Processing
    Logistic Regression
    Data Science
    Python
  • $75 hourly
    Tool-oriented data science professional with extensive experience supporting multiple clients in Hadoop and Kubernetes environments, deployed with Cloudera Hadoop on-premise and Databricks in AWS. My passion is client adoption and success, with a focus on usability. With my computer science and applied math background, I have been able to fill the gap between platform engineers and users, continuously pushing for product enhancements. As a result, I have continued to create innovative solutions for clients in an environment where use-cases continue to evolve every day. I find fulfillment in being able to drive the direction of a solution in a way that allows both client and support teams to have open lanes of communication, creating success and growth. I enjoy working in a diverse environment that pushes me to learn new things. I'm interested in working on emerging solutions as data science continues to evolve.
    R
    Serverless Stack
    React
    Apache Hadoop
    Java
    Cloudera
    AWS Lambda
    Apache Impala
    R Hadoop
    Bash Programming
    PostgreSQL
    Apache Spark
    Python
    AWS Development
    Apache Hive
  • $40 hourly
    DataOps Leader with 20+ Years of Experience in Software Development and IT Expertise in a Wide Range of Cutting-Edge Technologies * Databases: NoSQL, SQL Server, SSIS, Cassandra, Spark, Hadoop, PostgreSQL, PostGIS, MySQL, GIS, Percona, TokuDB, HandlerSocket (NoSQL), CRATE, Redshift, Riak, Hive, Sqoop * Search Engines: Sphinx, Solr, Elasticsearch, AWS CloudSearch * In-Memory Computing: Redis, Memcached * Analytics: ETL, analytics on data from a few million to billions of rows, sentiment analysis, Google BigQuery, Apache Zeppelin, Splunk, Trifacta Wrangler, Tableau * Languages & Scripting: Python, PHP, shell scripts, Scala, Bootstrap, C, C++, Java, Node.js, .NET * Servers: Apache, Nginx, CentOS, Ubuntu, Windows, distributed data, EC2, RDS, and Linux systems Proven Track Record of Success in Leading IT Initiatives and Delivering Solutions * Full lifecycle project management experience * Hands-on experience in leading all stages of system development * Ability to coordinate and direct all phases of project-based efforts * Proven ability to manage, motivate, and lead project teams Ready to Take on the Challenge of DataOps I am a highly motivated and results-oriented IT Specialist with a proven track record of success in leading IT initiatives and delivering solutions. I am confident that my skills and experience would be a valuable asset to any team looking to implement DataOps practices. I am excited about the opportunity to use my skills and experience to help organizations of all sizes achieve their data goals.
    Python
    Scala
    ETL Pipeline
    Data Modeling
    NoSQL Database
    BigQuery
    Apache Spark
    Sphinx
    Linux System Administration
    Amazon Redshift
    PostgreSQL
    ETL
    MySQL
    Database Optimization
    Apache Cassandra
  • $20 hourly
    I am a Data Engineering and Data Science professional with 3+ years of experience. I have a Master of Science in Data Analytics and a Bachelor of Engineering in Computer Science. In the past I have worked as an SME on multiple projects in the analytics domain. I have successfully delivered projects where I was responsible for building data pipelines, performing data wrangling and data analysis using ML algorithms, building dynamic dashboards, etc. I will perform end-to-end analysis, from ETL to analysis and reporting. I will gather every ounce of information from your data and back the insights generated with statistical tests and ML algorithms. I have experience working with AWS services such as AWS Lambda, AWS ECR, AWS S3, AWS Step Functions, AWS EC2, AWS Batch, AWS Fargate, AWS EFS, AWS Glue, AWS EMR, AWS IAM, AWS RDS/Aurora, AWS Secrets Manager, AWS DynamoDB, AWS Redshift. Technically I am sound in Python, SQL, Machine Learning, PySpark, AWS, statistical tests, Power BI and Tableau. I have experience working in tech startups and MNCs alike. I am no stranger to hard work and pride myself on a sincere attitude and on learning on the fly as needed to get the job done. I offer: 1 - 100% satisfaction 2 - Multiple revisions till satisfaction 3 - 24/7 support 4 - Follow-up post-delivery
    AWS Glue
    ETL
    Amazon Web Services
    Data Mining
    PySpark
    AWS Lambda
    Machine Learning
    Python
    Apache Spark
    Data Analysis
    SQL
  • $60 hourly
    𝗖𝗲𝗿𝘁𝗶𝗳𝗶𝗲𝗱 𝗗𝗮𝘁𝗮 𝗣𝗿𝗼𝗳𝗲𝘀𝘀𝗶𝗼𝗻𝗮𝗹 big data engineer with 𝟯+ 𝘆𝗲𝗮𝗿𝘀 of experience. I have been working extensively on building ETL pipelines, designing and synchronizing elegant dashboards, and developing and maintaining end-to-end big data solutions for customers based on various big data tools and technologies such as Apache Spark, Hadoop, Kudu, Hive, Apache NiFi and more. 𝙄 𝙝𝙖𝙫𝙚 𝙚𝙭𝙥𝙚𝙧𝙞𝙚𝙣𝙘𝙚 𝙞𝙣 𝙩𝙝𝙚 𝙛𝙤𝙡𝙡𝙤𝙬𝙞𝙣𝙜 𝙖𝙧𝙚𝙖𝙨, 𝙩𝙤𝙤𝙡𝙨 𝙖𝙣𝙙 𝙩𝙚𝙘𝙝𝙣𝙤𝙡𝙤𝙜𝙞𝙚𝙨: ► BIG DATA & DATA ENGINEERING Apache Spark, Hadoop, MapReduce, YARN, Pig, Hive, Kudu, HBase, Impala, Delta Lake, Oozie, NiFi, Kafka, Airflow, Kylin, Druid, Flink, Presto, Drill, Phoenix, Ambari, Ranger, Cloudera Manager, ZooKeeper, Spark Streaming, StreamSets, Snowflake ► CLOUD AWS -- EC2, S3, RDS, EMR, Redshift, Lambda, VPC, DynamoDB, Athena, Kinesis, Glue GCP -- BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud Data Fusion Azure -- Data Factory, Synapse, HDInsight ► ANALYTICS, BI & DATA VISUALIZATION Tableau, Power BI, QuickSight, SSAS, SSMS, Superset, Grafana, Looker ► DATABASE SQL, NoSQL, Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, PL/SQL, HBase, Cassandra ► OTHER SKILLS & TOOLS Docker, Kubernetes, Ansible, Pentaho, Python, Scala, Java, C, C++, C#
    PostgreSQL
    Apache Hive
    Apache Hadoop
    Amazon Athena
    Amazon Redshift
    Amazon S3
    BigQuery
    AWS Lambda
    AWS Glue
    ETL Pipeline
    Python
    Apache Spark
    Apache Impala
    Apache NiFi
    Big Data
  • $80 hourly
    ✅ AWS Certified Solutions Architect ✅ Google Cloud Certified Professional Data Engineer ✅ SnowPro Core Certified Individual ✅ Upwork Certified Top Rated Professional Plus ✅ Author of a Python package for the cryptocurrency market Currency.com (python-currencycom) Specializing in Business Intelligence Development, ETL Development, and API Development with Python, Apache Spark, SQL, Airflow, Snowflake, Amazon Redshift, GCP, and AWS. I have completed many projects, complicated and not so complicated, like: ✪ Highly scalable distributed applications for real-time analytics ✪ Designing data warehouses and developing ETL pipelines for multiple mobile apps ✪ Cost optimization for existing cloud infrastructure But the main point: I take responsibility for the final result.
    Data Scraping
    Snowflake
    ETL
    BigQuery
    Amazon Redshift
    Big Data
    Data Engineering
    Cloud Architecture
    Google Cloud Platform
    ETL Pipeline
    Python
    Amazon Web Services
    Apache Airflow
    SQL
    Apache Spark
  • $40 hourly
    🏆 Top Rated Plus ㅤㅤ✅ 9+ years of experience on Upwork ㅤㅤ✅ Expert in Python, AWS Cloud, ReactJS, NodeJS ㅤㅤ✅ Proven expertise in Web Scraping and Web Development ㅤㅤ✅ Available 7 days/week ㅤㅤ✅ Fluent English ㅤㅤ✅ Certified AWS Solutions Architect Associate ㅤㅤ✅ Certified Agile Scrum Master ㅤㅤ✅ Long-term project support 🟢𝙒𝙃𝘼𝙏 𝙄 𝘾𝘼𝙉 𝘿𝙊 𝙁𝙊𝙍 𝙔𝙊𝙐: 🔸 Quickly build application prototypes 🔸 Create scalable web applications using microservice architecture (need a website that can handle 100k+ users? I've got you!) 🔸 Build minimum-cost MVPs using serverless architecture (don't want to pay too much for your MVP's infrastructure? We are here to help 😊) 🔸 Scrape websites regularly 🔸 Provide long-term project support & maintenance 🔸 Large-scale distributed scraping projects 🟢𝙔𝙊𝙐 𝙉𝙀𝙀𝘿 𝙈𝙀, 𝙄𝙁: 🔸 You have an idea but don't know how to turn it into a prototype 🔸 You need to develop a prototype cheaply and fast to test your ideas 🔸 You need to maintain & develop a legacy web scraping / web application project 🔸 You want your legacy web application to scale automatically 🔸 You need to optimize cost and speed for your web application / web scraper 🔸 You need long-term support of data feeds / web maintenance 🔸 High availability is important 🔸 You want to customize an open-source project 🔸 You want to build applications that can handle over 20M rows of data on a daily basis 🟢𝙔𝙊𝙐𝙍 𝘽𝙀𝙉𝙀𝙁𝙄𝙏𝙎 𝙊𝙁 𝙒𝙊𝙍𝙆𝙄𝙉𝙂 𝙒𝙄𝙏𝙃 𝙈𝙀: 🔸 I've worked on 40+ web scraping / web development projects 🔸 I have plenty of AWS credits that can save you costs when developing a prototype 🔸 I have a large code base of components that I can reuse to save you time and cost 🔸 Commitment 🔸 Relevant project-related suggestions 🔸 Expert advice on best practices to optimize your software product 🕷️𝗠𝗬 𝗧𝗘𝗖𝗛-𝗦𝗧𝗔𝗖𝗞🕷️ Python, Scrapy, Puppeteer, PostgreSQL, MongoDB, DynamoDB, Redshift, Athena, Aurora, Selenium, BeautifulSoup, Requests, Proxy Rotation, ReactJS, NodeJS, NextJS, Django, Flask, AWS, Firebase, GCP, Docker, PySpark, Elasticsearch, Neptune, Neo4j 🕸️𝗦𝗔𝗠𝗣𝗟𝗘 𝗪𝗘𝗕 𝗦𝗖𝗥𝗔𝗣𝗜𝗡𝗚 𝗣𝗥𝗢𝗝𝗘𝗖𝗧𝗦 🕸️ Developed a social network and freelance marketplace platform with an integrated payment system; developed a big data platform that updates data in real time and handles over 1M rows of data daily; developed a stock analysis and optimization tool using SciPy and Python; downloaded and processed over 25GB of trademark data and ingested it into Elasticsearch; scraped 50 eCommerce websites for drop shippers; scraped over 100 book sellers on Amazon daily for an eCommerce website; scraped and compared over 1,000 products daily on Amazon and eBay for price arbitrage
    TensorFlow
    MySQL
    Web Scraper
    Browser Extension
    Google Chrome Extension
    React
    Python Scikit-Learn
    Machine Learning
    Google Cloud Platform
    Docker
    Amazon Web Services
    PySpark
    Selenium WebDriver
    Python Numpy
    Python
  • $50 hourly
    Do you need to expand your data team with someone who can deliver results? Or do you need someone to build data processes and applications that drive your business forward? I have built ML/data pipelines and infrastructure across the globe for clients like Audible, BMGF, Amazon Channels, etc. I have cumulative experience of 6+ years (in Data Engineer and Data Scientist roles) with a strong focus on building data pipelines and warehouses, Big Data Analytics, ETL processes, machine learning models, visualization, and business intelligence. What can I do? - Process your raw data and present it in a meaningful way - this could be either output from a machine learning pipeline, a BI dashboard, or an intermediate table for you to work with. - Create new channels to update your existing systems with insights and KPIs derived from BI projects. - Suggest the best course of action for your data processes moving forward. - Suggest possible new revenue sources or cost reductions. I am comfortable with most of the modern tech stack, but these are the technologies I have used in the past for various projects. DATABASES: Postgres, Redshift, MSSQL, Aurora, BigQuery, Snowflake, Cassandra, Stored Procedures AI STACK: Keras, NLTK, Sklearn, Pandas, Seaborn, Matplotlib, NumPy, SciPy, Gensim ETL: Airbyte, Glue, Data Factory, Airflow, Dagster, dbt LANGUAGES & LIBRARIES: Python, Java, C++, R, PySpark, NetworkX BI & ANALYTICS: Plotly, Streamlit, Data Studio, Power BI, Tableau, Power Query WEB DEV: Flask, Bootstrap, APIs, Authentication, Auth0, FastAPI CLOUD: AWS, GCP, Azure, EMR, EC2, SageMaker Studio, Lambda Functions, SNS, IAM, SSO, ECR, RDS, CloudFormation, Databricks MISC: Docker & custom images, Kubernetes, docker-spawn, Git, Bash, GitHub Actions Want to discuss your project? Please drop me a message. Thanks. - Sagar
    PostgreSQL
    AWS Lambda
    ETL Pipeline
    Docker
    Snowflake
    dbt
    PySpark
    Amazon Redshift
    Amazon S3
    AWS Glue
    Tableau
    BigQuery
    Amazon SageMaker
    Chatbot
    Databricks Platform
  • $60 hourly
    ⭐⭐⭐⭐⭐ "Shivam is totally fantastic. His work had a tremendous impact on our software systems. He is very punctual, responsive, and professional in every way imaginable." I'm a senior ML Engineer and have solved several data science challenges and made massive performance improvements for Fortune 500 companies and well-known organizations, including: ✅ Nike ✅ Two Sigma ✅ MakeMyTrip ✅ IndiaMart ⭐ Here's what I can help you with ⭐ ✅ Machine/Deep Learning: - Extensive research experience in developing custom model architectures and training routines (complex losses, etc.) for high accuracy and fast execution - Computer Vision (image and video analytics, OCR, industrial automation), Deep RL, AutoML, NLP, Speech - End-to-end deployment routines for production environments including Cloud, Web, Mobile, IoT, or Edge devices ✅ Software Development: - Software design practices for high performance, optimal memory usage and a modular codebase - Design and development of algorithms with a vectorized and parallel programming mindset - Able to pick up any new technology and get things done ✅ Expertise: Software Development, NLP (Natural Language Processing), CV (Computer Vision), Machine Learning, Artificial Intelligence, Python, TensorFlow, Keras, PyTorch ⭐ Why you should choose me over other freelancers ⭐ ✅ Client Reviews: I focus on providing value to all of my clients and earning their TRUST. ✅ Over-Delivering: this is core to my work as a freelancer. My focus is on giving more than what I expect to receive. I take pride in leaving all of my clients saying "WOW" ✅ Responsiveness: being extremely responsive and keeping all lines of communication readily open with my clients. ✅ Resilience: reach out to any of my current or former clients and ask them about my resilience. For any issue my clients face, I attack it and find a solution. ✅ Kindness: one of the main aspects of my life that I implement in every facet - treating everyone with respect, understanding all situations with empathy, and genuinely wanting to improve my clients' situations.
    AWS Application
    OCR Algorithm
    Data Engineering
    Computer Science
    PySpark
    Data Analysis
    Natural Language Processing
    TensorFlow
    Machine Learning
    PyTorch
    GPT-3
    ChatGPT
    Computer Vision
    Recommendation System
  • $65 hourly
    🎖 TOP RATED PLUS (Top 3% 🥇 of GLOBAL talent on Upwork) I am passionate about making data useful for insights and have 10+ years of experience overall. I can provide scalable, cost-effective solutions for building your data platform. I am well versed in building end-to-end data analytics solutions, from ingestion, ETL/ELT pipelines and warehousing to building effective dashboards. I have worked on large-scale data engineering with good exposure to AWS/GCP cloud services. ✅ Proficient in ● Data Lake / Data Warehouse / Data Migration to cloud / event streaming to lake, etc. ● Delta Lakes ● BigQuery ● Snowflake ● Redshift ● DuckDB, MotherDuck ● Airflow ● MWAA - AWS-managed Airflow ● dbt ● AWS Glue ● AWS Lake Formation ● AWS Lambda ● GCP Dataflow ● GCP Dataproc ● GCP Data Fusion ● Airbyte ● Looker / LookML ● Looker Studio ● Tableau ● Power BI ● AWS ● GCP ● Python ● SQL ● Streamlit ● Java ● PySpark ● Kafka ● Kinesis ● Lightdash ✅ I am a computer engineering graduate with 10+ years of experience overall, including 4+ years full-time with companies like Tata Consultancy Services, Ericsson, etc. If you are a start-up looking to set up your data analytics platform, I can help design the system from scratch with tools that are efficient and cost-friendly. Or, if you are a large enterprise facing challenges with your data platform, I can jump in to suggest the best solution for implementing the modern data stack. Looking forward to working on interesting data engineering challenges.
    Amazon Web Services
    Data Warehousing
    Amazon Redshift
    Google Cloud Platform
    LookML
    AWS Lambda
    Business Intelligence
    Java
    BigQuery
    ETL Pipeline
    Python
    Apache Airflow
    Data Integration
    Looker
    AWS Glue
  • $300 hourly
    ⭐⭐⭐⭐⭐ 5-Star Expert-Vetted 𝗧𝗼𝗽 𝗥𝗮𝘁𝗲𝗱 𝗣𝗹𝘂𝘀 Data Professional (Top 1% on Upwork) ❗FREE 15 MINUTE CONSULTATION❗ 💡𝕊𝕥𝕣𝕒𝕥𝕖𝕘𝕪 & ℂ𝕠𝕟𝕤𝕦𝕝𝕥𝕒𝕥𝕚𝕠𝕟💡 👉 ⭐"Ismail was awesome and extremely knowledgeable and helpful! He saved me and my team not only money but time! Words can’t describe how great he is, we will definitely work with him again."⭐ 👉 ⭐"Ismail is someone we will continue to involve in our companies most critical initiatives"⭐ ⭐"Ismail is an outstanding partner to work collaborate with. We continue to work with him again and again."⭐ ⭐"Ismail is great and very patient and responsive. He really took a large chunk of data and improved how we analyze medical evaluations. High recommend!"⭐ ⭐"Ismail is a polished professional; friendly, punctual, clear communicator, & best-in-class ability to translate technical terms & concepts into layman's terms. Highly recommend!"⭐ 🔨𝕀𝕞𝕡𝕝𝕖𝕞𝕖𝕟𝕥𝕒𝕥𝕚𝕠𝕟🔨 ⭐"He was able to interpret our desires quickly and rollout an amazing MVP based on our first discovery call. He is someone we will continue to work with as his skill set provided an end result far beyond our expectations."⭐ ⭐"Ismail was nothing short of remarkable. He was prompt, responsive, and just an incredible pro. Can't possibly give this gentleman a higher stamp of approval. Looking forward to working with him again in the future. Amazing guy!"⭐ ⭐"Ismail was a great resource in freelancer! He worked very hard and communicated to a high ability. He was very dependable and I always knew that he would get back to me and answer questions in a timely matter"⭐ ⭐"Ismail was great to work with. He was very responsive and provided a quick turn-around on the deliverable. He was thorough by adding notes and references to the job which made it helpful when going through the analysis. Ismail's approach to solving the project was methodical and very clean. 
I really appreciated the way he posed questions because it gave me assurance that he took the time to review the project and understand it to provide the best output. Thank you Ismail! I hope to work with you again!"⭐ ⭐"It was a great collaboration - Ismail was able to provide contextualized & quality output in short timings. Highly recommended for dependable & high quality contribution to your projects."⭐ ✨ℝ & ℝ 𝕊𝕙𝕚𝕟𝕪✨ ⭐"We worked with Ismail for his R and Shiny expertise. He was able to efficiently, effectively and reliably work on our project. Ismail is very easy to work with, and able to independently work on required tasks with little oversight to get the job done to high standards. We would certainly work with Ismail again on future projects."⭐ ⭐"Excellent quality of work and very quick turnaround. Will absolutely use again."⭐ ⭐"Ismail is a pro. Built us a solid, client-facing, Shiny app quickly. Will definitely hire him again."⭐ ⭐"Ismail was excellent and very professional - understood what we needed and was able to adjust for the additional customization request"⭐ ⭐"Ismail was excellent to work with. Really quick turnaround and great trouble-shooting when issues arose. Wrote super efficient code that did exactly what I needed. Highly recommend."⭐ ⭐"Excellent assignment! The articles delivered were high quality, insightful, and did a good job of walking through some complex technical topics. I look forward to working with Ismail in the future on similar engagements!"⭐ 🛑𝗠𝘆 𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲 𝗮𝗻𝗱 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗰𝗼𝘂𝗿𝘀𝗲 𝗼𝗻 𝗨𝗱𝗲𝗺𝘆 𝗵𝗮𝘀 1️⃣0️⃣0️⃣0️⃣+ 𝗿𝗲𝘃𝗶𝗲𝘄𝘀 𝗮𝗻𝗱 8️⃣6️⃣⸴0️⃣0️⃣0️⃣+ 𝗿𝗲𝗴𝗶𝘀𝘁𝗲𝗿𝗲𝗱 𝘀𝘁𝘂𝗱𝗲𝗻𝘁𝘀🛑
    Data Modeling
    Database
    Cloud Engineering Consultation
    Data Warehousing
    ETL Pipeline
    Data Science Consultation
    SQL
    Data Visualization
    Data Scraping
    Data Extraction
    Data Analysis
    Data Science
    Machine Learning
    Microsoft Excel
    R
  • $50 hourly
    Hello, I am a machine learning and natural language processing developer with vast experience of more than 8 years in the following tools and skills: Data Visualization; Machine Learning (Python, Keras, TensorFlow, Scikit-Learn, GraphLab, Spark MLlib); NLP (BERT, Transformers, Word2vec, GloVe, textacy, spaCy, Rasa, fastText, IBM Watson, Wit.ai); Deep Learning (RNN, LSTM, GRU, CNN, Auto-Encoders, GANs); Transformers (BERT, ELMo, etc.); Data Analytics; Web Scraping; Docker, Vagrant; AWS, GCP; Web development; Mobile app development. Frameworks I have been working with: TensorFlow, PyTorch, Keras, Pandas, NumPy, spaCy, Transformers, Scikit-Learn, Gensim, Scrapy, OpenCV, FastAPI, Plotly, Elasticsearch
    Web Application
    Deep Learning Modeling
    BERT
    Supervised Learning
    Data Science
    Natural Language Processing
    Amazon SageMaker
    PyTorch
    Deep Neural Network
    Python
    Deep Learning
    Tesseract OCR
    Machine Learning
    scikit-learn
    Python Scikit-Learn
  • $25 hourly
    I'm a Databricks- and Azure-certified Data Engineer, currently working as an FTE at a leading analytics IT firm. I have expertise in PySpark, Python and SQL on technologies like Databricks, Azure, Snowflake and SQL Server. I look forward to leveraging my expertise for your requirements.
    Snowflake
    pandas
    PySpark
    NumPy
    Databricks Platform
    Microsoft SQL Server
    MySQL
    Microsoft Azure
    Python
    Scala
  • $80 hourly
    Expert-Vetted! LLMs, Generative AI, Machine Learning with 6 years of industry experience. "🌟🌟🌟🌟🌟 Extremely professional data scientist and very friendly guy. Handling all the problems with maturity and delivering extraordinary results, I would suggest Ashmit to anyone who is in need of an experienced data scientist. Working with him was very pleasant and would hire him again." - Generative AI, LLMs, ChatGPT, Prompt Engineering, Machine Learning, NLP, Image Processing, Statistical Reports - Python and R, Data Modelling and Data Mining, Regression, Prediction - Topic Modelling, Question-Answering Models, Computer Vision More than 6 years of industry experience with Python and Data Science: - Machine Learning: Keras, TensorFlow, PyTorch, Gensim, Scikit-learn, OpenAI, LangChain - Experience: * Projects: Sentiment Analysis, Semantic Clustering, ChatBots, E-Commerce Product Matching (Image and Text), Graph Data Mining, etc. * Skills: Sequence Prediction, Text Classification, Text Generation, Summarization, Computer Vision and Image Processing * Architectures: Transformers, Bidirectional LSTM, Graph Neural Networks, Attention Models, Reinforcement Learning Personal research domain: unearthing text associations in deep learning models for Euclidean and non-Euclidean data, semantic associations in text data, image sentiment analysis Other experience: Web scraping and development: BeautifulSoup, Scrapy, Selenium, Requests, Django, Flask, Falcon DevOps: AWS SageMaker, AWS, Nginx, Apache 2.4, mod_wsgi, uWSGI, Apache Storm (Python), PySpark, Redis, Celery, Elasticsearch Databases: SQL, PostgreSQL, MongoDB, Cassandra, Azure DB, Pinecone DB Languages: R, C++, JavaScript, Shell (all) # Note: No fixed-price projects, please
    AI Text-to-Image
    AI Text-to-Speech
    Hugging Face
    AI Chatbot
    LLM Prompt Engineering
    Large Language Model
    OpenAI Codex
    Prompt Engineering
    AI Consulting
    Sentiment Analysis
    ChatGPT
    Apache Spark
    Deep Learning
    Python
    Machine Learning
  • $40 hourly
    • 9+ years in IT with extensive and diverse experience in Microsoft Azure Cloud Computing, Data Discovery & Analysis, Data Modelling, ETL Design, Development, Testing, Implementation and Troubleshooting in the field of data warehousing and application development. • Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake. • Data warehousing experience in Business Intelligence technologies and databases, with extensive knowledge in data analysis, T-SQL queries, ETL & ELT processes, Reporting Services (using SSRS, Power BI) and Analysis Services. • Good experience working with 11-12 TB SQL databases. • Used SQL Azure extensively for database needs in various applications. • Good experience designing cloud-based solutions in Azure by creating Azure SQL databases and setting up elastic pool jobs. • Extensive experience in creating pipeline jobs and scheduling triggers using Azure Data Factory. • Experience migrating SQL databases to Azure Data Lake, Azure SQL Database, Azure SQL Managed Instance and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure storage using Azure Data Factory. • An expert in the ETL tool SSIS & SQL Server, specializing in RDBMS, ETL concepts, strategy, design & architecture, development & testing. • Comfortable interacting with senior management, project teams, business users and product vendors. • Extensive background in database design and development, creating indexes, views, stored procedures, triggers, and user-defined functions. • Experience in data transfer from one server to other servers using tools like DTS/SSIS; involved in large data migrations and transfers (loading) using DTSX packages. • Experience in performance tuning, query optimization and maintaining and optimizing a data warehouse. • Designing and development of reports for the end user using SSRS. • Strong skills in the visualization tool Power BI. • Involved in major migration projects. • Good knowledge of the Insurance, Healthcare and Logistics & Transportation domains; ability to work with minimum guidance. • Perfect team player; always volunteers to take on responsibilities. • Ability to work collaboratively with an understanding of business requirements and system architecture. • Goal-oriented with a strong desire to work with huge data sets. Skills: • Microsoft Azure - Cloud Services (PaaS & IaaS), Active Directory, Azure Data Factory, Databricks, Azure Synapse & Azure Storage (Azure Data Lake, Blob, Table, Queue & File Storage), Key Vault and SQL Azure, Azure DevOps, Azure SQL Managed Instance, Azure SQL Database & Azure Snowflake SQL, Azure Logic Apps, Azure Function Apps. • MSBI (SSIS, SSRS, SSAS), SQL Server and Power BI • Python, PowerShell, C# & VB.NET • Data Migration • Project Analysis • Requirement Gathering • Client Relationship Management • Analytic Problem-Solving
    Pyspark
    Data Integration
    Python Script
    Azure Synapse Analytics
    Databricks Platform
    Snowflake
    Microsoft Power BI
    Business Intelligence
    Microsoft Azure
    Microsoft Power BI Data Visualization
    Microsoft Azure SQL Database
    Data Warehousing
    Data Modeling
    ETL Pipeline
    Microsoft SQL Server Programming
    SQL Server Integration Services
  • $55 hourly
    I am a data engineer with 7 years of experience designing and implementing big data solutions. My expertise lies in Apache Spark, the Hadoop ecosystem, NoSQL databases, and writing ETLs in Python, Scala, and Java. I am also an avid Go developer specializing in microservices, with experience migrating large enterprises from legacy software to Go-based microservices. My educational background is in Computer Science, and I began my career as a software developer before transitioning to big data. In my current role as a Data Engineer, I am responsible for building data pipelines, processing and analyzing large-scale data sets, and creating data models to support data-driven decision making. I have a strong understanding of distributed computing and extensive experience in optimizing data processing workflows for performance and scalability. I have worked on a variety of projects, including real-time data processing, data warehousing, and building streaming data pipelines. My technical skills include expertise in programming languages such as Python, Scala, and Java, and technologies such as Apache Spark, Hadoop, and NoSQL databases like MongoDB and Cassandra.
    Pyspark
    Software Architecture & Design
    Amazon Web Services
    Golang
    Microservice
    Play Framework
    API Integration
    Scala
    ETL Pipeline
    Serverless Computing
    Big Data
    Apache Spark
    Apache Hadoop
    Python
  • $36 hourly
    Data Engineer with 5+ years of experience designing and implementing scalable, cost-effective data engineering solutions using Python, Databricks, and AWS. Expertise in: ⚙️ Python (PySpark, Airflow) ⚙️ Databricks ⚙️ AWS Data Analytics and AWS Serverless ⚙️ SQL (MySQL, PostgreSQL, SnowSQL) ⚙️ ETL, Data Management, and Data Integration ⚙️ Data Analytics and Engineering ⚙️ Talend Open Studio ⚙️ Talend Data Integration Proven ability to: ✅ Design and implement robust ETL processes to extract, transform, and load data from various sources, ensuring data quality and integrity throughout the pipeline. ✅ Manage and integrate data across various systems, ensuring seamless flow and accessibility. ✅ Build and maintain scalable data platforms and pipelines to support data-driven applications and analytics. ✅ Analyze complex datasets to extract meaningful insights and inform business decisions. Looking to collaborate with clients on challenging and innovative data engineering projects.
    Pyspark
    Data Analysis
    Database Management
    Amazon API Gateway
    Amazon Web Services
    SQL Programming
    ETL
    Automation
    Data Warehousing & ETL Software
    Amazon S3
    Selenium
    AWS Lambda
    Data Scraping
    Python
    SQL
    Data Integration
    AWS Glue
  • $25 hourly
    Hello, I'm Anup Patil, an accomplished Python developer with 8 years of extensive experience in web scraping, data automation, and API integration. I specialize in utilizing powerful tools and technologies like Scrapy, BeautifulSoup (bs4), Selenium, and more to extract valuable data efficiently. My expertise extends to cloud-based solutions with AWS, Django, Flask, and FastAPI. Here's what sets me apart: 🌟 8 Years of Proven Experience: With a track record of 8 years and 291 successful projects, I have honed my skills to deliver top-notch solutions. 🔧 Comprehensive Skill Set: I excel in web scraping, crawling, data mining, and data extraction using Scrapy, BeautifulSoup (bs4), and Selenium. 🔄 API Integration: I am proficient in integrating various APIs seamlessly to automate processes and gather data. 💽 Database Management: My expertise includes working with MySQL and SQL databases for efficient data storage and retrieval. 🔒 Proxy and CAPTCHA Handling: I have hands-on experience in bypassing security mechanisms, working with proxies, and solving CAPTCHA challenges. ☁️ Cloud Expertise: I am well-versed in AWS cloud services, ensuring scalable and reliable solutions for your projects. 🌐 Web Frameworks: I can develop web applications using Django, Flask, and FastAPI, making your data accessible through user-friendly interfaces. 🕒 24/7 Support: I am committed to providing continuous support and prompt communication to ensure project success. 🔐 Security Conscious: I am well-versed in security measures, ensuring data integrity and protection. 🤖 Familiar with ChatGPT and OpenAI: I have experience incorporating AI-powered solutions, such as ChatGPT, into projects. 🌐 Apify and Zyte Integration: I can integrate Apify and Zyte (formerly ScrapingHub) to enhance data extraction capabilities. My goal is to leverage my skills and experience to provide you with efficient and reliable solutions that meet your specific needs. 
Whether it's scraping complex websites, automating data retrieval, or building data-driven applications, I'm here to help you succeed. Let's discuss your project requirements and explore how I can contribute to your success. Contact me today, and let's get started on your next project!
    Pyspark
    Web Crawler
    Django
    Data Collection
    Web Crawling
    ASP.NET MVC
    Data Science
    Data Extraction
    Data Scraping
    Python Numpy
    Data Entry
    Data Mining
    Web Scraper
    Python
  • $35 hourly
    Over 4 years of working experience in data engineering, ETL, AWS, and Python. AWS Data Analytics and Machine Learning certified.
    Pyspark
    Amazon ECS
    AWS Lambda
    Amazon Redshift
    Amazon S3
    PySpark
    Amazon Web Services
    Analytics
    PostgreSQL
    SQL
    pandas
    AWS Glue
    Python
  • $25 hourly
    More than 6 years of highly skilled experience in software development, testing, and integration, building cross-platform applications with Cloudera Distribution of Hadoop (CDH), Google Cloud Platform (GCP), Spark, Scala, Hive, MySQL, and Sqoop. Four years of experience in the Hadoop big data ecosystem, including Apache Spark, HDFS, Hive, HBase, Sqoop, and the streaming, querying, processing, and analysis of big data. Knowledgeable in big data technology using the Hadoop and Spark frameworks. Analytical and skilled at understanding business problems to develop systems that improve functionality.
    Pyspark
    ETL Pipeline
    Big Data
    Data Management
    GitHub
    Amazon S3
    SQL
    Hive
    Scala
    Apache Spark
    Apache Kafka
  • $40 hourly
    Seeking challenging work in the design and development of scalable backend infrastructure solutions. I work in the following domains: 1. ETL pipelines 2. Data analysis 3. AWS (Amazon Web Services) and GCP (Google Cloud Platform) deployment I design most solutions in Python, in which I have 9 years of experience, with extensive experience in the following frameworks/libraries: Flask, Django, Pandas, and NumPy. For ETL pipelines, I build end-to-end data pipelines using AWS, GCP, and custom frameworks, with more than 7 years of experience in this domain. I have a strong command of Scrapy and have built more than 300 crawlers to date. In data warehousing, I have extensive experience with Google BigQuery and AWS Redshift, and hands-on experience handling and analyzing millions of records using GCP and AWS data warehousing solutions. I also have 4+ years of experience designing serverless applications on AWS and GCP, plus hands-on experience with many other GCP and AWS services, where I provide efficient and cost-effective solutions.
    Pyspark
    Data Analysis
    Apache Spark
    PySpark
    ChatGPT
    Generative AI
    AWS Glue
    Google Cloud Platform
    BigQuery
    Snowflake
    Kubernetes
    Django
    Docker
    Serverless Stack
    Python
    Scrapy
    Data Scraping
    ETL Pipeline
  • Want to browse more freelancers?
    Sign up

How it works

1. Post a job (it’s free)

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.

Trusted by

How do I hire a Pyspark Developer on Upwork?

You can hire a Pyspark Developer on Upwork in four simple steps:

  • Create a job post tailored to your Pyspark Developer project scope. We’ll walk you through the process step by step.
  • Browse top Pyspark Developer talent on Upwork and invite them to your project.
  • Once the proposals start flowing in, create a shortlist of top Pyspark Developer profiles and interview.
  • Hire the right Pyspark Developer for your project from Upwork, the world’s largest work marketplace.

At Upwork, we believe talent staffing should be easy.

How much does it cost to hire a Pyspark Developer?

Rates charged by Pyspark Developers on Upwork can vary with a number of factors including experience, location, and market conditions. See hourly rates for in-demand skills on Upwork.

Why hire a Pyspark Developer on Upwork?

As the world’s work marketplace, we connect highly skilled freelance Pyspark Developers with businesses and help them build trusted, long-term relationships so they can achieve more together. Let us help you build the dream Pyspark Developer team you need to succeed.

Can I hire a Pyspark Developer within 24 hours on Upwork?

Depending on availability and the quality of your job post, it’s entirely possible to sign up for Upwork and receive Pyspark Developer proposals within 24 hours of posting a job description.

Schedule a call