Hire the best PySpark Developers in Pune, IN

Check out PySpark Developers in Pune, IN with the skills you need for your next job.
  • $40 hourly
    As a Senior Data Engineer with 9 years of extensive experience in data engineering with Python, Spark, Databricks, ETL pipelines, and Azure and AWS services, I develop PySpark scripts and store data in ADLS using Azure Databricks. I have also created data pipelines that read streaming data from MongoDB and built Neo4j graphs from that stream-based data, and I am well versed in designing and modeling databases using Neo4j and MongoDB. I am seeking a challenging opportunity in a dynamic organization that can enhance my personal and professional growth while enabling me to make valuable contributions towards the company's objectives. Highlights (a hedged streaming-ingestion sketch follows the skill list below):
    • Using Azure Databricks to develop PySpark scripts and store data in ADLS.
    • Developing producers and consumers for stream-based data using Azure Event Hubs.
    • Designing and modeling databases using Neo4j and MongoDB.
    • Creating data pipelines that read streaming data from MongoDB.
    • Creating Neo4j graphs based on stream-based data.
    • Visualizing data for supply-demand analysis using Power BI.
    • Developing data pipelines on Azure that integrate Spark notebooks.
    • Developing ADF pipelines for a multi-environment, multi-tenant application.
    • Using ADLS and Blob Storage to store and retrieve data.
    • Proficient in Spark, HDFS, Hive, Python, PySpark, Kafka, SQL, Databricks, and Azure and AWS technologies.
    • Using AWS EMR clusters to run Hadoop-ecosystem components such as HDFS, Spark, and Hive.
    • Experienced in using AWS DynamoDB for data storage and ElastiCache for caching.
    • Involved in data migration projects moving data from SQL databases and Oracle to AWS S3 or Azure storage.
    • Skilled in designing and deploying dynamically scalable, fault-tolerant, and highly available applications on the AWS cloud.
    • Executed transformations using Spark and MapReduce, loaded data into HDFS, and used Sqoop to extract data from SQL databases into HDFS.
    • Proficient with Azure Data Factory, Azure Data Lake, Azure Databricks, Python, Spark, and PySpark.
    • Implemented a cognitive model for telecom data using NLP and a Kafka cluster.
    • Competent in big data processing using Hadoop, MapReduce, and HDFS.
    Featured Skill PySpark
    Microsoft Azure SQL Database
    SQL
    MongoDB
    Data Engineering
    Microsoft Azure
    Apache Kafka
    Apache Hadoop
    AWS Glue
    PySpark
    Databricks Platform
    Hive Technology
    Apache Spark
    Azure Cosmos DB
    Apache Hive
    Python
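As a hedged sketch of the streaming-ingestion pattern this profile describes (PySpark on Azure Databricks landing stream data in ADLS), here is a minimal Structured Streaming job reading from an Azure Event Hubs Kafka-compatible endpoint and writing Delta to ADLS Gen2. Every namespace, container, credential, and path below is a placeholder, not the freelancer's actual pipeline.

```python
# Minimal sketch (placeholders throughout): stream events from a
# Kafka-compatible Azure Event Hubs endpoint into ADLS Gen2 as Delta.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("eventhub-to-adls").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
    .option("subscribe", "<event-hub-name>")  # topic name = Event Hub name
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    # JAAS string carrying the Event Hubs connection string goes here:
    .option("kafka.sasl.jaas.config", "<JAAS config placeholder>")
    .load()
    .select(col("value").cast("string").alias("body"), col("timestamp"))
)

query = (
    events.writeStream.format("delta")
    .option("checkpointLocation",
            "abfss://<container>@<account>.dfs.core.windows.net/_chk/events")
    .start("abfss://<container>@<account>.dfs.core.windows.net/bronze/events")
)
```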
  • $50 hourly
    Hands-on experience developing analytics, machine learning, data science, big data, and AWS solutions.
    Featured Skill PySpark
    Apache Cordova
    Cloud Services
    Analytics
    PySpark
    Data Science
    Python
    Apache Spark
    Machine Learning
  • $35 hourly
    Welcome to my profile! I'm a dedicated freelancer specializing in API development, data analytics, and web scraping. With strong expertise in Python, Django, Flask, Docker, Kubernetes, Java, and Angular, I offer a wide range of specialized services to cater to your specific needs.
    My services:
    ✔ Building robust REST APIs using Flask, Django, Angular, AngularJS, and Java Spring Boot.
    ✔ Web scraping using Selenium and Beautiful Soup to efficiently extract data from websites (a minimal example follows the skill list below).
    ✔ Data visualization using Python, R, Apache Superset, and Kibana (ELK stack) to transform complex data into actionable insights.
    ✔ Data analysis using Python and R to uncover valuable patterns and trends.
    ✔ Data modeling using SQL and Hive Query Language (HQL) for structured data stored in Hadoop.
    ✔ Identifying and formulating Key Performance Indicators (KPIs) tailored to your domain.
    ✔ Creating visually appealing dashboards to provide real-time, data-driven decision-making capabilities.
    ✔ Web scraping and basic descriptive statistics on tabular data.
    ✔ Implementing CI/CD pipelines for efficient software development and deployment.
    ✔ Leveraging Natural Language Processing (NLP) techniques to extract insights from textual data.
    ✔ Applying Machine Learning and Data Science algorithms to drive predictive analytics.
    ✔ Data engineering using Spark and PySpark for big data processing.
    I have successfully delivered solutions based on data analytics, predictive analytics using various machine learning techniques (especially regression and tree-based models), data visualization, and dashboard creation across diverse industries including retail, marketing, manufacturing, e-commerce, and more. My web-scraping expertise relies on tools such as Selenium and Beautiful Soup to efficiently extract data from websites; whether it's product information, news articles, or any other data source, I can deliver accurate and reliable results to meet your requirements.
    I guarantee high-quality work at an affordable price and stand behind the accuracy of my deliverables. Your satisfaction is my top priority, and I am committed to exceeding your expectations. If you're seeking a reliable and skilled freelancer to elevate your business with cutting-edge APIs, data analytics, visualization solutions, or web scraping capabilities, feel free to reach out. Let's collaborate and achieve your objectives! Contact me today to discuss your project requirements in detail. Thank you for visiting my profile!
    Sameer A
    Featured Skill PySpark
    JSON
    Ruby on Rails
    API
    Data Mining
    Angular
    Android App
    Artificial Intelligence
    Django
    PySpark
    Flask
    Machine Learning
    Data Science
    Python
    Java
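To make the web-scraping service above concrete, here is a minimal, hedged requests + Beautiful Soup example. The URL and CSS selectors are hypothetical, and production scraping would add robots.txt checks, rate limiting, and Selenium for JavaScript-heavy pages.

```python
# Minimal scraping sketch using requests + Beautiful Soup.
# URL and selectors are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/products", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for item in soup.select("div.product"):  # hypothetical selector
    name = item.select_one("h2").get_text(strip=True)
    price = item.select_one("span.price").get_text(strip=True)
    print(name, price)
```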
  • $40 hourly
    Selected Achievements:
    # Spearheaded the creation of a comprehensive Bill of Materials for the Pharma, MedTech, and Vision Care sectors, leading to a 20% increase in inventory efficiency. This involved detailed analysis and categorization of over 3,000 individual components, ensuring compliance with industry standards and streamlining supply chain processes.
    # Developed and optimized supplier volume data for key business partners, enhancing the procurement strategy. This involved aggregating and analyzing data from more than 50 suppliers, leading to a 15% reduction in costs and a 10% improvement in supplier delivery times.
    # Managed and executed an end-to-end ETL pipeline, integrating data from multiple sources to various destinations. This project involved processing over 5TB of data monthly, resulting in a 30% improvement in data processing efficiency and a 25% reduction in related errors.
    # As the lead of the Data Engineering team, successfully delivered numerous high-impact projects, resulting in a significant increase in team productivity and a 35% improvement in data quality. This was achieved by implementing best practices in data management and fostering a collaborative team environment.
    Ms. Mangal is a distinguished Data Engineer with 6.5 years of experience in diverse sectors like banking and healthcare. Known for quick learning and problem-solving, she specializes in ETL/ELT pipeline development using Azure technologies and in crafting complex database queries, enhancing efficiency. Her career is marked by a passion for innovation and a commitment to delivering high-quality data solutions. She specializes in:
    Azure Data Factory, Azure Databricks, Azure Logic Apps
    Azure Data Lake Storage, Azure Blob Storage
    Azure Key Vault, Azure App Directory
    SQL Server, Azure SQL Database, Oracle, MongoDB, SQL/PL-SQL
    Python, PySpark
    ETL/ELT pipeline development using ADF, ASA
    Migration using ARM templates
    Jenkins, CI/CD pipelines, SonarQube
    Docker
    Git, Bitbucket
    Tableau
    Together, let's unlock the mysteries of your data, propelling your business towards uncharted territories of success. The data universe is vast, and I'm here to help you navigate its stars! 🌌 Let's make data magic happen! 🌌
    Featured Skill PySpark
    PySpark
    Python
    SQL
    Oracle
    Data Flow Diagram
    Databricks Platform
    Azure App Service
    Microsoft Azure SQL Database
    Data Integration
    Oracle PLSQL
    Data Lake
    Microsoft Azure
    Data Engineering
    Microsoft SQL Server
    Tableau
  • $45 hourly
    As a highly experienced Data Engineer with over 10 years of expertise in the field, I have built a strong foundation in designing and implementing scalable, reliable, and efficient data solutions for a wide range of clients. I specialize in developing complex data architectures that leverage the latest technologies, including AWS, Azure, Spark, GCP, SQL, Python, and other big data stacks. My extensive experience includes designing and implementing large-scale data warehouses, data lakes, and ETL pipelines, as well as data processing systems that process and transform data in real time. I am also well versed in distributed computing and data modeling, having worked extensively with Hadoop, Spark, and NoSQL databases. As a team leader, I have successfully managed and mentored cross-functional teams of data engineers, data scientists, and data analysts, providing guidance and support to ensure the delivery of high-quality, data-driven solutions that meet business objectives. If you are looking for a highly skilled Data Engineer with a proven track record of delivering scalable, reliable, and efficient data solutions, please do not hesitate to contact me. I am confident that I have the skills, experience, and expertise to meet your data needs and exceed your expectations.
    Featured Skill PySpark
    Snowflake
    ETL
    PySpark
    MongoDB
    Unix Shell
    Data Migration
    Scala
    Microsoft Azure
    Amazon Web Services
    SQL
    Apache Hadoop
    Cloudera
    Apache Spark
  • $20 hourly
    - Senior Software Engineer with 8+ years of experience building data-intensive applications and tackling challenging architectural and scalability problems.
    - Excels at delivering analytical and technical solutions in accordance with customer requirements.
    - Hands-on experience in data engineering functions including, but not limited to, data extraction, transformation, loading, and integration in support of enterprise data infrastructures such as data warehouses, operational data stores, and master data management.
    - Takes ownership of delivery: coordinating with relevant stakeholders, updating the client on status daily, and working with the testing team and other teams to fix bugs.
    - 3+ years of relevant experience in big data analytics and big data handling using Hadoop-ecosystem tools such as Hive, HDFS, Spark, Sqoop, and YARN.
    - Extensive experience on cloud platforms such as AWS, with hands-on use of AWS services: S3, Glue, Lambda & Step Functions, RDS (Aurora), and Redshift.
    - Applied knowledge of AWS EMR, EC2, SNS, SQS, and CloudWatch.
    - More than 3 years of experience handling and processing unstructured and structured data using Python, PySpark, and SQL (a hedged batch-ETL sketch follows the skill list below).
    - Comprehensive knowledge of query building and expertise in RDBMS systems: MS SQL Server, MySQL, PostgreSQL, and Oracle (PL/SQL).
    - Strong experience creating ETL pipelines using tools like Talend Studio DI and its Big Data platform.
    - Proficient in analyzing requirements and architecture specifications to create detailed designs, and in providing technical advice, training, and mentoring to other associates in a lead capacity.
    - Good understanding of machine learning and statistical analysis.
    - Sound knowledge of the retail domain; beginner-level knowledge of the insurance domain and learning more.
    - Participates in deployment releases and release-readiness reviews, and maintains the release repository.
    - Interacts with clients to gather requirements, participates in PI planning, and ensures the timely completion of projects; analyzes and designs project requirements, modules, and their functionality.
    - Hands-on experience handling on-site applications and participating in client review meetings and brainstorming sessions with the technical team, team lead, and product delivery manager.
    - Strong analytical skills that often surface requirement gaps early, helping ensure timely delivery and avoid production issues.
    Featured Skill PySpark
    Data Extraction
    Data Scraping
    Amazon Redshift
    MongoDB
    Generative AI
    LLM Prompt Engineering
    PostgreSQL Programming
    Data Warehousing & ETL Software
    Big Data
    Amazon Web Services
    PySpark
    Databricks Platform
    Machine Learning
    Python
    Apache Hadoop
    SQL
    Talend Open Studio
    Data Migration
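As a hedged sketch of the structured-data processing with Python, PySpark, and SQL that this profile describes, here is a minimal batch ETL that reads raw CSV from S3, standardizes it, and writes partitioned Parquet. Bucket names and column names are placeholders, not the freelancer's actual jobs.

```python
# Hedged batch-ETL sketch: read raw CSV from S3, clean it, and write
# partitioned Parquet. Buckets and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, trim

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

raw = spark.read.option("header", True).csv("s3://<raw-bucket>/orders/")

clean = (
    raw.withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
    .withColumn("customer_id", trim(col("customer_id")))
    .dropDuplicates(["order_id"])
    .filter(col("order_id").isNotNull())
)

(
    clean.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://<curated-bucket>/orders/")
)
```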
  • $30 hourly
    Data Architect with sound knowledge of data analytics, architecture, and engineering; proficient in cloud technologies, Python, Spark, and big data ecosystems, with 8+ years of experience in problem solving and consulting. I am also proficient in designing and implementing end-to-end architectures for data-driven approaches, optimized for performance and efficiency.
    Featured Skill PySpark
    Data Analysis
    Machine Learning
    PySpark
    Big Data
    Apache Spark
    Databricks Platform
    Python
  • $64 hourly
    I'm a cloud enthusiast with experience building AWS cloud-native apps as well as migrating existing code and applications to AWS. I am also a certified AWS architect, holding both the AWS Associate Architect and AWS Professional Architect certifications. Whether you are thinking of moving your existing app to the cloud or planning to build a new one directly on the cloud, I can help! I'll manage the project brief from start to finish.
    Featured Skill PySpark
    AWS CodePipeline
    AWS CodeDeploy
    AWS Development
    Amazon Athena
    AWS Glue
    AWS CloudFormation
    PySpark
    Amazon S3
    AWS Cloud9
    Amazon RDS
    AWS Application
    Amazon EC2
    Cloud Computing
    Python
  • $5 hourly
    Nidhi Sharma
    2x Certified Microsoft Azure Data Engineering Associate (DP-203)
    2x Certified Microsoft Power BI Associate (PL-300)
    Certified in Microsoft Azure Data Fundamentals (DP-900)
    Certified in Microsoft Azure Fundamentals (AZ-900)
    Professional Summary: A data engineering consultant with 4 years of experience in Python programming, Azure data engineering, and machine learning. A good grasp of SQL, Azure Data Factory, data warehousing concepts, Databricks, and machine learning. Actively involved in data ingestion, transformation, and warehousing. Also worked on automation and manual testing, which involved creating test cases and running them with PyCharm and Data Analytics Studio, and used JIRA Xray reports and test plans for their proper storage and implementation. Good expertise and experience in communicating with clients and stakeholders and helping the team with release activities and proper documentation. A good communicator with good analytical skills.
    Featured Skill PySpark
    Data Engineering
    Data Analytics & Visualization Software
    Data Analysis
    PySpark
    Microsoft Azure
    Databricks Platform
    Python
  • $50 hourly
    Proficient Palantir developer with experience in Azure Databricks, Azure Data Factory, Microsoft Azure, and the Palantir Foundry application, along with Python, MySQL, TypeScript, and PySpark.
    Featured Skill PySpark
    Microsoft Power BI
    Power Query
    Databricks Platform
    Big Data
    PySpark
    Microsoft Azure
    TypeScript
    pandas
    SQL
    Python
  • $15 hourly
    I am a big data analyst and data scientist with 3+ years of financial-industry experience, and I hold a PG Diploma in Big Data analysis. Skilled in Python, PySpark, Java, SQL, Streamlit, data visualization, and data manipulation, along with AI/ML. I have delivered projects automating risk scorecard development, including deployment pipelines and model development.
    Featured Skill PySpark
    YOLO
    PyQt
    Selenium
    Web Scraping
    API
    AI Chatbot
    Data Science
    Streamlit
    Front-End Development
    Java
    PySpark
    SQL
    Python
    Data Analysis
  • $60 hourly
    * Data Engineer with around 15+ years of extensive experience analyzing requirements and designing data solutions for credit card, insurance, banking, healthcare, and retail companies.
    * Designed, architected, and developed a data ingestion framework to ingest terabytes of data for top retail clients while at Toshiba Global Commerce Solutions.
    * Well versed in Azure Databricks PySpark ETL and related Azure services such as Azure Blob Storage, Azure Key Vault, Azure PostgreSQL DB, and Azure Synapse.
    * Experience implementing the medallion architecture with Databricks Delta tables to extract, transform, and load terabytes of data across Bronze/Silver/Gold layers using the Databricks Auto Loader utility (a hedged Auto Loader sketch follows the skill list below).
    * Experience implementing real-time streams using Spark.
    Featured Skill PySpark
    Scala
    Python
    Apache Spark
    PySpark
    Databricks Platform
    Ab Initio
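A hedged sketch of the Databricks Auto Loader ingestion into a Bronze Delta table mentioned above. Paths, schema location, and table name are placeholders, and `spark` is the ambient session a Databricks notebook provides.

```python
# Hedged sketch of Databricks Auto Loader ("cloudFiles") landing raw JSON
# into a bronze Delta table; all paths and names are placeholders.
# `spark` is the ambient SparkSession in a Databricks notebook.
bronze = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/events")
    .load("/mnt/landing/events/")
)

(
    bronze.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/events")
    .trigger(availableNow=True)  # incremental, batch-style run
    .toTable("bronze.events")
)
```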
  • $40 hourly
    Seasoned Data Engineer and Generative AI Specialist with extensive experience in building scalable, end-to-end data solutions.
    - Proficient in Databricks, Snowflake, AWS, and Azure.
    - I design and optimize data pipelines, cloud architectures, and AI-driven applications.
    - My expertise spans advanced data engineering, Generative AI, and cloud-native solutions that drive business insights and innovation.
    Let's collaborate to transform your data challenges into efficient, impactful results.
    Featured Skill PySpark
    Golang
    Data Analytics
    NoSQL Database
    Apache Spark
    AWS Glue
    Amazon Web Services
    Elixir
    Python
    Machine Learning
    Generative AI
    ETL Pipeline
    ETL
    PySpark
    Snowflake
    Databricks Platform
  • $50 hourly
    Experience: 11.7 years. Seasoned Senior Data Engineer with over 11 years of proven expertise in the Information Technology industry, specializing in delivering robust, scalable, and innovative data solutions across diverse domains including Banking, Finance, and Telecom. Big Data Expertise: 9+ years of comprehensive experience in Big Data ecosystems; proficient in Hadoop, MapReduce, Spark (Scala and PySpark), Hive, Kafka, Sqoop, Apache Pig, and Oozie. Cloud Proficiency: hands-on expertise in leading cloud platforms, including Google Cloud Platform (GCP) with tools like BigQuery, Dataproc, and Cloud Composer, and Azure with Databricks, Azure Data Lake Storage (ADLS), and Azure Data Factory (ADF). Demonstrated ability to design and implement solutions leveraging Delta tables and Databricks for PoCs (a hedged Delta upsert sketch follows the skill list below). Data Engineering & Processing: adept in data cleansing, curation, migration, and ingestion.
    Featured Skill PySpark
    BigQuery
    Google Cloud Platform
    Big Data File Format
    Data Engineering
    dbt
    Sqoop
    Apache Hive
    Apache Kafka
    Apache Airflow
    Apache Hadoop
    Python
    Scala
    Apache Spark
    PySpark
    Big Data
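As a hedged illustration of working with Delta tables on Databricks, as this profile mentions, here is a minimal upsert (MERGE) sketch using the delta-spark API. The table path, join key, and the `updates_df` DataFrame are assumptions for illustration, not the freelancer's code.

```python
# Hedged upsert sketch with delta-spark: merge a batch of changes into a
# silver Delta table. Path and join key are placeholders; `spark` and
# `updates_df` (the incoming changes) are assumed to exist already.
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/delta/silver/customers")

(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```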
  • $70 hourly
    • Expertise in cloud platforms: Cloudera, AWS, GCP, and Azure
    • Expertise in designing end-to-end solutions for Data Governance, Data Warehousing, Business Intelligence (BI), Data Modelling, Data Integration, Data Replication, MDM, Data Quality, and Data Migration projects
    • Expertise in Big Data/Hadoop development, including architecture design, development, system integration, and infrastructure readiness
    • Extensive work with Teradata utilities like BTEQ, FastExport, FastLoad, and MultiLoad to export and load data to/from different source systems, including flat files
    • Good understanding of GCP managed services, e.g., Dataproc, Dataflow, Pub/Sub, Cloud Functions, Cloud Composer, BigQuery, and Bigtable
    • Good understanding of GCP core services like Google Cloud Storage, Google Compute Engine, Cloud SQL, and Cloud IAM
    • Good experience in Azure Databricks (ADB), Azure Data Lake, and Azure Synapse
    • Experience in continuous delivery through CI/CD pipelines, containers, and orchestration technologies
    • Expert knowledge of Agile approaches to software development, able to put key Agile principles into practice to deliver solutions incrementally
    • Expertise in architecting reporting and analytics solutions
    Featured Skill PySpark
    Teradata
    Technical Support
    Microsoft Power BI Data Visualization
    Data Migration
    AWS Application
    Cloudera
    PySpark
    dbt
    Snowflake
    Data Engineering
    Data Extraction
    Machine Learning Model
    ETL
    Artificial Intelligence
    Data Analysis
  • $35 hourly
    Total work experience: 8+ years
    KEY SKILLS: Spark, Hive, Impala, Snowflake, Snowpipe, Airflow, Apache NiFi, PySpark, EMR, Delta, AWS, Python, Data Warehousing, Snowpark, Azure Databricks, ETL, Data Modeling, Kafka, Azure Data Factory, Git, Spinnaker, SQL Server, Amazon Redshift, Data Build Tool (dbt), Power BI, Scala, SQL, Hadoop, HDFS, Shell Scripting, Data Analytics, Big Data, HBase, Azure Data Lake, System Design
    PROFILE SUMMARY: Experienced Big Data Engineer with 8 years of experience developing Big Data applications using a diverse tech stack including Spark, Scala, Hive, Impala, Azure Data Lake, Databricks, ADF, Python, AWS services, ETL, PySpark, Snowflake, SQL, SCOPE, T-SQL, Cosmos, and Power BI. Expert at understanding project requirements and creating high-level designs, proficient in writing low-level code based on established designs, and experienced in setting up Bitbucket, Git, and Spinnaker (CI/CD) pipelines across multiple environments.
    Featured Skill PySpark
    Apache Airflow
    Apache Hadoop
    Docker
    SQL Server Integration Services
    Java
    Scala
    Snowflake
    Dashboard
    SQL
    Azure App Service
    AWS Application
    PySpark
    Data Extraction
    Data Analysis
    ETL
  • $35 hourly
    I'm Yogendra, a hands-on technology executive with 19+ years of experience building AI-native platforms, enterprise data systems, and cloud-scale infrastructure. I've led global engineering teams at MSCI and BNY Mellon, built data platforms handling billions of daily events, and delivered low-latency APIs over petabyte-scale datasets. As founder of Colrows, I designed a proprietary SQL engine and orchestrated LLMs (ChatGPT, Claude, Gemini) using Graph RAG and agentic AI to turn natural language into accurate, actionable insights.
    Expertise:
    - LLM integration, prompt engineering, Graph RAG
    - Cloud-native architecture (AWS, Azure, GCP)
    - Data lakes, real-time systems, and platform scalability
    - Team building, roadmap ownership, GTM support
    I work with startups and enterprises as a fractional CTO, technical advisor, or platform architect, helping scale products, modernize data infrastructure, and bring AI into production. Let's build something impactful.
    Featured Skill PySpark
    Apache Spark
    Data Engineering
    Stream Processing Framework
    MongoDB
    Vector Database
    PySpark
    Apache Cassandra
    Elasticsearch
    LangChain
    LLM Prompt Engineering
    Apache Kafka
    Java
    Artificial Intelligence
    Machine Learning Model
    ETL
  • $40 hourly
    Experienced Data Engineer | Python • Spark • AWS • Elasticsearch • EMR
    I'm a results-driven Data Engineer with a strong background in designing and building scalable data pipelines for big data environments. I specialize in:
    - Python scripting and automation
    - Distributed data processing with Apache Spark
    - Search and analytics using Elasticsearch / OpenSearch
    - Real-time caching with Redis
    - Cloud-native workflows using AWS EMR and S3
    (A hedged aggregate-and-cache sketch follows the skill list below.)
    Whether you need to process large datasets, build ETL/ELT pipelines, or integrate search into your applications, I can manage projects end-to-end with clean, efficient code and best practices. 💬 I value clear and consistent communication and always keep clients updated throughout the project. Let's connect and discuss how I can help bring your data project to life!
    Featured Skill PySpark
    Redis
    PySpark
    Apache Kafka
    Java
    Elasticsearch
    Python
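As a hedged illustration of two items this profile lists, Spark processing and Redis caching, here is a minimal sketch that aggregates events with PySpark and caches the small result in Redis with a TTL. The host, bucket, key, and columns are hypothetical placeholders.

```python
# Hedged sketch: aggregate with PySpark, then cache the small result in
# Redis for low-latency reads. Host, bucket, and key are placeholders.
import json

import redis
from pyspark.sql import SparkSession
from pyspark.sql.functions import count, desc

spark = SparkSession.builder.appName("metrics").getOrCreate()
events = spark.read.parquet("s3://<bucket>/events/")

top_pages = (
    events.groupBy("page")
    .agg(count("*").alias("hits"))
    .orderBy(desc("hits"))
    .limit(10)
)

r = redis.Redis(host="<redis-host>", port=6379)
payload = json.dumps([row.asDict() for row in top_pages.collect()])
r.setex("metrics:top_pages", 300, payload)  # expire after 5 minutes
```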
  • $35 hourly
    Results-oriented Data Engineer with 8+ years of experience building scalable data pipelines and business intelligence solutions for products, using batch and real-time ETL/ELT and Big Data frameworks like Apache Spark. Skilled in Python scripting, databases, and dimensional data modeling, leveraging SQL and dbt for data transformations, and deploying solutions on cloud platforms such as AWS, Snowflake, and Databricks.
    Featured Skill PySpark
    Amazon Redshift
    ETL Pipeline
    Data Modeling
    PySpark
    ETL
    Data Warehousing & ETL Software
    Apache Superset
    dbt
    Snowflake
    Databricks Platform
    AWS Glue
    Data Visualization
    Data Engineering
    Python
    SQL
  • $25 hourly
    🚀 Skyrocket Your Business with Cutting-Edge AI and ML Solutions! 🚀 ⭐ Fortune 500 | 🤖 AI & Data Science and Analytics Expert | 🤖 LLM, AI Automation & Business Intelligence Consultant | Natural Language Processing | Generative AI | ✅ 15+ Years of Experience
    In today's fast-evolving landscape of AI, Data Science, and Business Intelligence (BI), I understand the challenge of finding reliable expertise. My clients, from Fortune 500 companies to startups worldwide and across various sectors, have partnered with me to navigate these complexities. I have streamlined processes, built scalable AI models, and transformed their data into actionable strategies for growth and efficiency. My approach focuses on selecting the right tools tailored to each project's needs, rather than chasing every new trend.
    My expertise:
    ✅ AI & Machine Learning – I specialize in developing AI-driven predictive analytics, anomaly detection, and natural language processing (NLP) solutions to automate processes and uncover valuable insights. By leveraging advanced technologies such as TensorFlow, PyTorch, Hugging Face, Scikit-learn, GPT-4, LangChain, LangGraph, and the OpenAI API, I help businesses optimize decision-making and improve operational efficiency.
    ✅ Data Analysis & Big Data – I specialize in processing and analyzing large-scale structured and unstructured datasets for both real-time and batch analytics. My proficiency in SQL, BigQuery, Snowflake, Apache Spark, and Databricks empowers businesses to build scalable cloud-based analytics platforms that enhance decision-making capabilities.
    ✅ ETL & Data Engineering – I build efficient, automated ETL/ELT pipelines to enable fast, scalable data transformation. I specialize in Apache Airflow, AWS Glue, and Databricks, ensuring businesses have clean, structured, and reliable data pipelines for analytics, AI, and reporting.
    ✅ Business Intelligence (BI) & Data Visualization – I design real-time, interactive dashboards that simplify complex data and enhance decision-making. Using Power BI (DAX, Power Query), Tableau, Streamlit apps, Shiny apps, and Google Data Studio, I create custom BI solutions integrated with data sources such as ERP, CRM, and SaaS systems to deliver seamless analytics.
    ✅ AI-Driven Automation & RPA – I streamline manual workflows and enhance efficiency with AI-powered automation. Using Microsoft Power Automate, I design end-to-end automation workflows, AI-driven RPA solutions, and API integrations that optimize business processes.
    Why Work With Me?
    ✅ Data-Driven Results – Leveraged Industry 4.0 technologies to enhance manufacturing processes and operational efficiency, reducing operational costs by $0.5 million annually per factory.
    ✅ Scalable AI, ML & BI Solutions – Predictive analytics that reduced manufacturing operational costs by 27%; AI-powered solutions integrated into data products, achieving efficiency gains of ~$4M annually.
    ✅ Enterprise-Grade Data Infrastructure – Optimized ETL pipelines, reducing data processing time by 30%.
    Featured Skill PySpark
    ETL
    PySpark
    Data Visualization
    Tableau
    R
    SQL
    Python
    Microsoft Power BI
    Industry 4.0
    Databricks Platform
    Machine Learning
    OpenAI API
    Natural Language Processing
    Generative AI
    Data Analytics
  • $40 hourly
    I'm a highly results-driven Automation and Data Engineering expert with 15+ years of experience building and scaling robust, efficient, and cost-effective solutions for businesses of all sizes. I specialize in cloud computing (Azure, AWS, GCP), automation, and data engineering, with a strong background in full-stack development. My passion lies in leveraging cutting-edge technologies to solve complex challenges and deliver tangible business value. I'm eager to collaborate with you on your next project and help you achieve your goals.
    **Core Competencies:**
    * **Data Engineering & Analytics:** I design, build, and maintain high-performance data pipelines using Azure data engineering tools, PySpark, and Databricks. I have extensive experience in data warehousing, ETL processes, data modeling, and business intelligence. For example, I recently developed a PySpark-based ETL pipeline that reduced data processing time by 60% for a financial services client, resulting in significant cost savings and improved reporting accuracy (a generic join-optimization sketch, not this client's code, follows the skill list below). I'm also proficient in Big Data technologies like Hadoop, MapReduce, and Pig.
    * **DevOps & Cloud:** I excel at implementing and managing CI/CD pipelines using Jenkins, Azure DevOps, GCP Cloud Build, and AWS CodePipeline. I'm highly experienced with containerization and orchestration technologies like Docker, Kubernetes (AKS, EKS, GKE), and OpenShift. I'm proficient in infrastructure as code (Terraform) and cloud platforms (Azure, GCP, and AWS). In a recent project, I automated the deployment process for a SaaS application using Kubernetes and Terraform, reducing deployment time from 2 days to 2 hours and increasing release frequency by 50%. I also specialize in monitoring and logging with Splunk and the ELK stack (Elasticsearch, Kibana, Logstash).
    * **Automation:** I have a deep understanding of Robotic Process Automation (RPA) using UiPath (Studio, Orchestrator, and RE Framework). I have a proven track record of automating complex business processes, resulting in increased efficiency and reduced operational costs. For instance, I automated a manual invoice-processing workflow for a manufacturing company, saving them 15 hours per week and eliminating human error. I also leverage Python scripting for various automation tasks.
    * **Full Stack Development:** I'm a proficient Java full-stack developer with experience building robust and scalable web applications. I'm skilled in front-end technologies (HTML, CSS, JavaScript, React) and back-end frameworks (Spring Boot, REST APIs). I recently developed a web application for an e-commerce startup using Spring Boot and React, which handled over 10,000 transactions per day with 99.9% uptime.
**Technical Skills (Upwork Keywords):** * DevOps * Data Engineering * Cloud Computing (AWS, Azure, GCP) * Kubernetes (AKS, EKS, GKE) * Docker * Terraform * Jenkins * CI/CD * RPA (UiPath) * Python * PySpark * Java * Full Stack Development * Web Development (HTML, CSS, JavaScript, React) * SQL (MSSQL, MySQL, PostgreSQL) * Big Data (Hadoop, MapReduce, Pig) * ELK Stack (Elasticsearch, Kibana, Logstash) * Splunk * Agile * JIRA * AzDo Boards * Git * GitHub * GitLab * Business Process Automation * REST APIs * API Development * AWS Lambda * Azure Functions * GCP Cloud Functions * AWS Fargate * Azure Container Registry * Cloud Foundry * Red Hat OpenShift * Amazon ECS * Google Kubernetes Services * Data Warehousing * ETL * Data Modeling * Business Intelligence * Databricks **Availability:** Full-time **Contact:** I'm available for consultations and eager to discuss your project requirements. Let's connect and explore how I can help you achieve your business objectives. **Cheers!**
    Featured Skill PySpark
    Automation
    Scala
    Web Scraping
    Data Analytics
    Data Analytics & Visualization Software
    Microsoft Power Automate
    Big Data
    ETL
    Analytics
    PySpark
    Data Engineering
    Terraform
    UiPath
    SQL
    Python
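The 60% speedup cited in this profile is the freelancer's own claim about a client pipeline; as a generic, hedged illustration of one common PySpark optimization of that kind, here is a broadcast-join sketch. Paths and column names are hypothetical.

```python
# One common PySpark speedup: broadcast a small dimension table so the
# join avoids a shuffle. Illustrative only, not the client pipeline above.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-opt").getOrCreate()

facts = spark.read.parquet("/data/transactions/")  # large fact table
dims = spark.read.parquet("/data/branch_lookup/")  # small lookup table

enriched = facts.join(broadcast(dims), on="branch_id", how="left")
enriched.write.mode("overwrite").parquet("/data/enriched/")
```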
  • $90 hourly
    *******Certified Apache Airflow Developer******* With more than 7 years of professional experience, I hold a Master of Engineering in Information Technology. Currently working full time as a Senior Consultant with a multi-national company, I'm in a Data Engineering role working mostly with Python, PySpark, Airflow, Palantir Foundry, Collibra, and SQL. In past roles I have also worked as a Full Stack Developer building REST APIs and UI functionality, and I have mobile development experience using Flutter, Android, and Xojo (for iOS). Please consider me if you want your work done on time.
    Featured Skill PySpark
    Amazon Web Services
    RabbitMQ
    Node.js
    Amazon S3
    JavaScript
    PySpark
    Databricks Platform
    Apache Airflow
    SQL
    Python
    ETL Pipeline
    Kubernetes
    Docker
    Java
    Apache Spark
  • $40 hourly
    Professional Summary
    Versatile Solution Architect and Developer with 17 years of extensive IT experience across multiple domains including healthcare, manufacturing, telecom, banking, insurance, retail, e-commerce, energy, government, and education. Proven track record delivering enterprise-grade solutions in data engineering, cloud infrastructure, and application development. Expert in AWS, Azure, GCP, Oracle, SQL Server, Snowflake, Python, Java, Spark, Databricks, and modern AI/ML technologies. I provide comprehensive end-to-end solutions with meticulous attention to performance optimization and scalability, ensuring detailed documentation and adherence to industry best practices.
    Core Competencies
    Data Engineering & Analytics: Expert in Apache Spark/PySpark, Databricks, AWS Glue, Azure Data Factory, Informatica, Kafka, NiFi, Airflow, and the Python data stack (pandas, numpy, polars, scikit-learn). Delivered enterprise-scale ETL pipelines and real-time data streaming solutions across multiple domains.
    Data Warehousing & Database Technologies: Extensive experience with modern data platforms including Snowflake, Redshift, BigQuery, Synapse, Oracle, SQL Server, PostgreSQL, MySQL, MongoDB, Cassandra, Redis, DynamoDB, time-series databases (InfluxDB, Timestream), graph databases (Neo4j, Neptune), and vector databases (Pinecone, Weaviate). Expert in advanced data modeling, MPP optimization, and SQL performance tuning for analytical workloads.
    Cloud & Infrastructure: Certified cloud architect with hands-on expertise across AWS (EMR, Glue, Athena, Redshift, Lambda, S3), Azure (Databricks, Synapse, Data Lake, Functions, Cosmos DB), and GCP (BigQuery, Dataflow, Dataproc, Spanner). Proficient in Infrastructure as Code (Terraform, CloudFormation), serverless architectures, and multi-cloud strategy planning.
    AI/ML Implementation: Implemented end-to-end ML solutions using TensorFlow, PyTorch, and JAX frameworks with MLOps pipelines (MLflow, Kubeflow, SageMaker). Experience in feature engineering, LLM fine-tuning, and production model deployment across healthcare analytics, fraud detection systems, recommendation engines, and predictive maintenance applications.
    Development & Programming: Proficient in multiple programming languages (Python, Java, JavaScript/TypeScript, C#, SQL, Bash/PowerShell) with expertise in web technologies (REST/GraphQL APIs, microservices), DevOps practices (CI/CD, Docker, Kubernetes), and comprehensive testing frameworks. Developed and maintained mission-critical applications across diverse industry verticals.
    Service Offering
    As a Solution Architect and Developer, I provide: comprehensive solution design documents, production-ready code implementation, performance optimization and tuning, testing strategy and implementation, deployment automation, scalability planning, and knowledge transfer and documentation.
    Domain Expertise
    Over 17 years of implementing technology solutions across diverse industries:
    Healthcare: patient data management, claims processing, clinical analytics
    Banking & Finance: transaction processing, fraud detection, regulatory compliance
    Telecom: customer data management, network analytics, billing systems
    Manufacturing: supply chain optimization, IoT integration, predictive maintenance
    Retail & E-commerce: inventory management, recommendation engines, customer analytics
    Insurance: risk assessment, claims processing, policy management
    Energy & Utilities: smart grid solutions, consumption analytics
    Government: secure data management, compliance reporting
    Transportation & Logistics: route optimization, fleet management
    Approach
    I prioritize understanding business requirements before recommending technology solutions. My process involves thorough requirements gathering, architecture design, implementation planning, and continuous validation to ensure solutions meet both current needs and future scalability requirements. Ready to tackle challenging projects as a contractor, consultant, or project-based freelancer. I deliver production-ready solutions with comprehensive documentation, knowledge transfer, and ongoing support. Available for remote collaboration worldwide with flexible scheduling options.
    Featured Skill PySpark
    Data Integration
    Data Ingestion
    Java
    Perl
    Redis
    PySpark
    Databricks Platform
    Amazon Redshift
    Snowflake
    MLOps
    SQL
    Microsoft Azure
    Amazon Web Services
    Informatica Cloud
    Python
  • $20 hourly
    I work to understand your business needs thoroughly, identify problems in your business using your past data, and find or create new approaches to solve them.
    Featured Skill PySpark
    Snowflake
    PySpark
    Databricks Platform
    Weka
    Apache Spark MLlib
    Data Science
    Data Mining
    Oracle PLSQL
    Apache Kafka
    Scala
    Python
    SQL
    Microsoft SQL Server
    Spring Framework
    Apache Spark
  • $15 hourly
    Dedicated Data Engineer with 3.5 years of hands-on experience in building and maintaining data systems. Skilled in using SQL, Python, and ETL tools to handle and process large volumes of data. Experienced with cloud services like Azure, and knowledgeable in data warehousing and real-time data processing. Strong problem-solving abilities with a focus on ensuring data quality, security, and efficiency. Committed to supporting data-driven decisions and enhancing business intelligence through reliable data solutions.
    Featured Skill PySpark
    Databricks Platform
    Data Model
    Interactive Data Visualization
    Data Warehousing & ETL Software
    ETL
    Data Warehousing
    Data Modeling
    SQL
    Python
    PySpark
    Microsoft Power BI Development
    Microsoft Power BI Data Visualization
    Microsoft Power BI
  • $15 hourly
    I am a dedicated Data Engineer with over 6 years of experience in designing and building data pipelines, data warehouses, and data lakes. My expertise includes ETL processes using AWS services, Databricks, and other cutting-edge technologies. I have a proven track record of transforming complex business requirements into scalable and efficient data solutions. I am proficient in Python, SQL, and cloud platforms like AWS and Azure, and have extensive experience in optimizing data workflows and ensuring data quality.
    Key Skills:
    - ETL & ELT Processes: Expertise in extracting, transforming, and loading data using modern tools and techniques.
    - Data Warehousing: Proficient in designing and managing data warehouses for large-scale data storage and retrieval.
    - AWS Services: Skilled in using AWS services like S3, EC2, Lambda, Redshift, Athena, and Glue for various data engineering tasks.
    - Databricks & Airflow: Experienced in building and orchestrating data workflows using Databricks and Airflow.
    - Programming Languages: Proficient in Python, SQL, and PySpark for data manipulation and analysis.
    - Cloud Platforms: Extensive experience with AWS and Azure for cloud-based data solutions.
    - Big Data Technologies: Knowledgeable in big data tools and technologies for handling large datasets.
    - Process Improvement: Skilled in using tools like GitHub and Bitbucket for version control and process improvement.
    Key Projects:
    Item Maintenance: Created a real-time process to gather and transform item-related data from Oracle UCM using Databricks and AWS services. Technologies: AWS S3, Databricks, Elasticsearch.
    Visibility Report: Developed a real-time ERP process to gather supplier and item data, applied business rules, and exposed data to Elasticsearch and Unity Catalog for dashboard creation. Technologies: AWS S3, Databricks, Power BI.
    Oasis Claims Data Analytics: Built custom queries and modules to process patient medical data, optimize Spark jobs, and create ETL pipelines using Airflow (a minimal DAG sketch follows the skill list below). Technologies: PySpark, SQL, AWS Athena.
    Education:
    - M.Sc. Computer Science - University of Pune, 2020 (9.45 CGPA)
    - B.Sc. Computer Science - University of Pune, 2018 (81%)
    I am passionate about leveraging data to drive business insights and look forward to collaborating on projects that require innovative data solutions. Let's work together to turn your data into a strategic asset!
    Featured Skill PySpark
    Microsoft Power BI
    Looker Studio
    Amazon RDS
    AWS CloudFormation
    AWS Lambda
    DevOps
    Big Data
    GitHub
    Databricks Platform
    PySpark
    Apache Airflow
    Python
    SQL
    AWS Glue
    ETL Pipeline
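As a hedged illustration of the Airflow orchestration mentioned in the Oasis project above, here is a minimal Airflow 2.x DAG that runs a PySpark job via spark-submit. The DAG id, schedule, and script path are placeholders, not the freelancer's actual pipeline.

```python
# Hedged Airflow sketch: a daily DAG that launches a PySpark job via
# spark-submit. DAG id, schedule, and script path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="claims_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_spark_etl",
        bash_command="spark-submit /opt/jobs/claims_etl.py",
    )
```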
  • $17 hourly
    Welcome to my profile! With over 5 years of hands-on experience in cloud technology, specializing in AWS and Azure, I am a dedicated Data Engineer passionate about transforming complex data into actionable insights. I thrive on designing and implementing scalable solutions that drive organizational growth and efficiency.
    As a Cloud Data Engineer, I possess a deep understanding of cloud architecture, data storage, and processing frameworks. My expertise extends to AWS services such as EC2, S3, Redshift, Glue, and Lambda, as well as Azure services including Azure Data Factory, Azure Databricks, and Azure SQL Database. I leverage these tools and technologies to build robust data pipelines, optimize data ingestion, and ensure data integrity (a standard AWS Glue job skeleton is sketched after the skill list below).
    Throughout my career, I have successfully executed end-to-end data engineering projects, collaborating with cross-functional teams to deliver high-quality solutions. I have a proven track record of designing and implementing data warehouses, data lakes, and ETL processes to enable efficient data management and analysis. In previous engagements, I have tackled complex challenges such as data integration across multiple systems, real-time data processing, and implementing scalable architectures to handle large volumes of data. I am skilled in transforming raw data into meaningful insights using SQL, Python, and other relevant programming languages.
    My commitment to delivering excellence is complemented by my ability to understand business requirements and translate them into technical solutions. I prioritize performance, security, and cost optimization in every project, ensuring that my clients achieve their desired outcomes while maximizing ROI. Client satisfaction is at the core of my work philosophy: I communicate effectively, maintain regular progress updates, and actively seek client feedback to ensure alignment and exceed expectations. I am committed to fostering long-term partnerships and providing ongoing support to my clients.
    I hold an AWS certification and continually expand my knowledge through professional development, staying up to date with the latest advancements in cloud technology and data engineering. If you are seeking a dedicated Cloud Data Engineer who can drive your data initiatives forward, I am ready to collaborate with you. Let's discuss your project requirements and how I can leverage my expertise to deliver exceptional results. Contact me now to get started!
    Featured Skill PySpark
    Apache Spark
    Databricks Platform
    Data Analysis
    Git
    Microsoft Azure
    AWS Glue
    Database Modeling
    Data Cleaning
    AWS IoT Analytics
    PySpark
    AWS Lambda
    Spreadsheet Software
    Amazon Redshift
    Apache Kafka
    Data Scraping
    Amazon S3
    Microsoft Azure SQL Database
    Amazon EC2
    Data Lake
    SQL
    Python
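For readers unfamiliar with AWS Glue jobs like those this profile mentions, here is the standard Glue PySpark job skeleton, shown as a hedged sketch with placeholder catalog and S3 names rather than any client code.

```python
# Standard AWS Glue job skeleton (PySpark). Catalog database/table and
# the output bucket are placeholders; shown as a hedged illustration.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog, convert to a DataFrame, write Parquet.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="<database>", table_name="<table>"
)
dyf.toDF().write.mode("overwrite").parquet("s3://<curated-bucket>/out/")

job.commit()
```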

How hiring on Upwork works

1. Post a job

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.


How do I hire a PySpark Developer near Pune, IN on Upwork?

You can hire a PySpark Developer near Pune, IN on Upwork in four simple steps:

  • Create a job post tailored to your PySpark Developer project scope. We’ll walk you through the process step by step.
  • Browse top PySpark Developer talent on Upwork and invite them to your project.
  • Once the proposals start flowing in, create a shortlist of top PySpark Developer profiles and interview.
  • Hire the right PySpark Developer for your project from Upwork, the world’s largest work marketplace.

At Upwork, we believe talent staffing should be easy.

How much does it cost to hire a PySpark Developer?

Rates charged by PySpark Developers on Upwork can vary with a number of factors, including experience, location, and market conditions. See hourly rates for in-demand skills on Upwork.

Why hire a PySpark Developer near Pune, IN on Upwork?

As the world’s work marketplace, we connect highly skilled freelance PySpark Developers and businesses and help them build trusted, long-term relationships so they can achieve more together. Let us help you build the dream PySpark Developer team you need to succeed.

Can I hire a PySpark Developer near Pune, IN within 24 hours on Upwork?

Depending on availability and the quality of your job post, it’s entirely possible to sign up for Upwork and receive PySpark Developer proposals within 24 hours of posting a job description.