Hire the best PySpark developers

Check out PySpark developers with the skills you need for your next job.
Clients rate PySpark developers 4.7/5, based on 169 client reviews.
  • $35 hourly
    Seasoned data engineer with over 11 years of experience building sophisticated and reliable ETL applications on Big Data and cloud stacks (Azure and AWS). TOP RATED PLUS. Has collaborated with over 20 clients, accumulating more than 2,000 hours on Upwork. Expert in creating robust, scalable, and cost-effective Big Data solutions for the past 9 years. Main areas of expertise:
    - Big Data: Apache Spark, Spark Streaming, Hadoop, Kafka, Kafka Streams, Trino, HDFS, Hive, Solr, Airflow, Sqoop, NiFi, Flink
    - AWS: S3, EC2, Glue, Redshift, SQS, RDS, EMR
    - Azure: Data Factory, Databricks, HDInsight, Azure SQL
    - GCP: Dataproc
    - Search engines: Apache Solr
    - NoSQL: HBase, Cassandra, MongoDB
    - Platforms: data warehousing, data lakes
    - Visualization: Power BI
    - Distributions: Cloudera
    - DevOps: Jenkins
    - Accelerators: data quality, data curation, data catalog
    SQL
    AWS Glue
    PySpark
    Apache Cassandra
    ETL Pipeline
    Apache Hive
    Apache NiFi
    Apache Kafka
    Big Data
    Apache Hadoop
    Scala
    Apache Spark
  • $150 hourly
    I'm a seasoned data engineer with 10+ years of experience specializing in end-to-end data platform development and AI integration. My expertise spans:
    - Building scalable data platforms using AWS, GCP, Snowflake, and Databricks
    - Optimizing ETL/ELT pipelines that process billions of records
    - Integrating AI/ML solutions with enterprise data systems
    - Advising startups on cost-effective, scalable data strategies
    Notable achievements:
    - Improved ETL performance by 60-70% using a 3-phase optimization strategy
    - Migrated ~8 billion tracking click events via Spark jobs on AWS EMR
    - Developed an end-to-end data platform with a medallion architecture using Delta Lake
    - Increased team productivity by 30% through Python automation
    I don't just write code; I architect complete solutions with performance, cost, and reliability in mind. My approach includes production-ready code with proper architecture, comprehensive documentation and knowledge transfer, one week of post-deployment support, and regular communication and progress updates. Whether you need data pipeline optimization, AI integration, or strategic guidance, I deliver solutions that scale with your business.
    Technical stack: Python, Spark, dbt, Airflow, AWS, GCP, Snowflake, Databricks, Docker, Kubernetes, ChatGPT/LLM integration. Let's discuss how I can help transform your data challenges into opportunities!
    Fivetran
    Data Engineering
    Databricks Platform
    Snowflake
    Data Lake
    Data Annotation
    Data Integration
    ETL Pipeline
    Data Analysis
    Machine Learning
    Big Data
    Python
    Apache Kafka
    Amazon Web Services
    Apache Spark
  • $30 hourly
    I am an AWS-certified machine learning specialist with 7 years of experience building data engineering pipelines and machine learning models.
    Data Warehousing & ETL Software
    Cloud Computing
    Amazon Web Services
    Microsoft SQL Server
    dbt
    Snowflake
    PySpark
    SQL
    ETL Pipeline
    Apache Spark
    Apache Airflow
    Python
    Data Engineering
    Databricks Platform
    AWS Glue
  • $60 hourly
    ๐Ÿ… Top 1% Expert Vetted Talent ๐Ÿ… 5โ˜… Service, 100% Customer Satisfaction, Guaranteed FAST & on-time delivery ๐Ÿ† Experience building enterprise data solutions and efficient cloud architecture ๐Ÿ… Expert Data Engineer with over 13 years of experience As an Expert Data Engineer with over 13 years of experience, I specialize in turning raw data into actionable intelligence. My expertise lies in Data Engineering, Solution Architecture, and Cloud Engineering, with a proven track record of designing and managing multi-terabyte to petabyte-scale Data Lakes and Warehouses. I excel in designing & developing complex ETL pipelines, and delivering scalable, high-performance, and secure data solutions. My hands-on experience with data integration tools in AWS, and certifications in Databricks ensure efficient and robust data solutions for my clients. In addition to my data specialization, I bring advanced proficiency in AWS and GCP, crafting scalable and secure cloud infrastructures. My skills extend to full stack development, utilizing Python, Django, ReactJS, VueJS, Angular, and Laravel, along with DevOps tools like Docker, Kubernetes, and Jenkins for seamless integration and continuous deployment. I have collaborated extensively with clients in the US and Europe, consistently delivering high-quality work, effective communication, and meeting stringent deadlines. A glimpse of a recent client review: โญโญโญโญโญ "Abdulโ€™s deep understanding of business logic, data architecture, and coding best practices is truly impressive. His submissions are invariably error-free and meticulously clean, a testament to his commitment to excellence. Abdulโ€™s proficiency with AWS, Apache Spark, and modern data engineering practices has significantly streamlined our data operations, making them more efficient and effective. In conclusion, Abdul is an invaluable asset โ€“ a fantastic data engineer and solution architect. 
His expertise, dedication, and team-oriented approach have made a positive impact on our organization." โญโญโญโญโญ "Strong technical experience, great English communications skills. Realistic project estimates." โญโญโญโญโญ "Qualified specialist in his field. Highly recommended." โœ… Certifications: โ€” Databricks Certified Data Engineer Professional โ€” Databricks Certified Associate Developer for Apache Spark 3.0 โ€” CCA Spark and Hadoop Developer โ€” Oracle Data Integrator 12c Certified Implementation Specialist โœ… Key Skills and Expertise: โšก๏ธ Data Engineering: Proficient in designing multi-terabyte to petabyte-scale Data Lakes and Warehouses, utilizing tools like Databricks, Spark, Redshift, Hive, Hadoop, Snowflake. โšก๏ธ Cloud Infrastructure & Architecture: Advanced skills in AWS and GCP, delivering scalable and secure cloud solutions. โšก๏ธ Cost Optimization: Implementing strategies to reduce cloud infrastructure costs significantly. โœ… Working Hours: - 4AM to 4PM (CEST) - 7PM to 7AM (PDT) - 10PM - 10AM (EST) โœ… Call to Action: If you are looking for a dedicated professional to help you harness the power of AWS and optimize your cloud infrastructure, I am here to help. Let's collaborate to achieve your technological goals.
    Amazon Web Services
    Apache Hive
    Apache Hadoop
    Microsoft Azure
    Snowflake
    BigQuery
    Apache Kafka
    Data Warehousing
    Apache Spark
    Django
    Databricks Platform
    Python
    ETL
    SQL
  • $35 hourly
    Seasoned senior data engineer with 10 years' experience crafting and implementing sophisticated data enrichment solutions. Skilled in developing and architecting robust data systems in production environments using Python, SQL, PySpark, Scala, and more. Specialized in building top-tier ETL pipelines with Airflow, AWS Glue, and Apache Spark for seamless data processing. Experienced in building and managing CI/CD pipelines, automating deployment workflows, and ensuring smooth integration and delivery of data engineering solutions. Deep familiarity with the AWS ecosystem: S3, Glue, EMR, Athena, Redshift, Lambda, and RDS. I design and extract data from diverse sources, optimizing it for data scientists building machine learning models that predict customer-centric scenarios. Comfortable in remote work environments, consistently delivering excellence in collecting, analyzing, and interpreting large datasets. Skilled in data pipeline development with Spark and in managing data across data warehouses, data marts, and data cubes on SQL, NoSQL, and Hadoop-based systems, as well as building Python scrapers with Scrapy and Beautiful Soup to streamline data acquisition. Extensive freelance experience has broadened my expertise, enabling me to collaborate with diverse clients on challenging data engineering projects and equipping me to tackle whatever comes next.
    API Integration
    Amazon Athena
    Data Modeling
    AWS Lambda
    Amazon Web Services
    ETL Pipeline
    Amazon Redshift
    ETL
    Data Ingestion
    PySpark
    AWS Glue
    Apache Spark
    Python
    Apache Kafka
    SQL
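The profile above mentions building Python scrapers with Scrapy and Beautiful Soup. As a dependency-free sketch of the same idea, here is a minimal link extractor using only the standard library's `html.parser`; the HTML snippet and the goal of collecting `href` targets are invented for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags: the core of a simple scraper."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


html = '<ul><li><a href="/jobs/1">ETL role</a></li><li><a href="/jobs/2">Spark role</a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/jobs/1', '/jobs/2']
```

Scrapy and Beautiful Soup layer crawling, retries, and more forgiving parsing on top of this same extract-from-markup pattern.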
  • $20 hourly
    With 5 years of experience, I specialize in delivering high-quality, scalable AI and data science solutions, especially on Azure. As a top-rated Upwork freelancer, I've spent the last year empowering small businesses by turning complex data into actionable insights that drive growth. My expertise spans key industries including telecom, customer success, and healthcare. I thrive on solving intricate challenges, transforming raw data into innovative, AI-powered solutions that deliver real business value.
    Proven expertise:
    - Machine learning solutions: CSAT score prediction, customer churn prediction, propensity to buy, product price prediction, campaign recommendation
    - BI dashboards: insightful, interactive Power BI dashboards that visualize business KPIs and data trends
    - AI chatbots for domain-specific databases: pull insights out of complex databases using natural language
    - AI assistants for structured and unstructured docs: simplifying interactions with complex texts and scanned documents
    - Information extraction applications: automating extraction from detailed documents
    - Virtual assistants: supporting customer service centers and consumers
    - AI-based conversation summarizer and recommendation engine
    - AI-driven data visualization: interactive charts powered by natural language queries
    - Real-time anomaly detection: identifying log data anomalies using Azure Data Explorer
    Services I offer:
    - Data science and machine learning solutions: unlock your data's potential with tailored, high-impact machine learning
    - AI product development: from AI ideas to production-ready applications, including POCs, MVPs, and virtual assistants that enhance customer experience and business growth
    - Data analytics and visualization: actionable insights and data storytelling with Power BI and Google Looker Studio
    - Data cleaning and transformation: clean, structured, ready-for-analysis data
    - Time series forecasting and anomaly detection: anticipate trends and mitigate risks with precise forecasts
    Core skills: Power BI and Google Looker Studio for analytics; machine learning with Python, PySpark, Pandas, NumPy, scikit-learn, and LightGBM; big data analytics with Azure Databricks, ADLS Gen2, and Delta/Parquet tables; text analytics and NLP; time series analysis and forecasting; Azure; frontend and backend development with Django, Next.js, React, Node.js, and FastAPI.
    Technology stack: LangChain, LlamaIndex, OpenAI APIs, LightGBM, XGBoost, MLflow; Docker and Kubernetes; Python, PySpark, JavaScript, SQL, KQL; REST, GraphQL, WebSockets.
    Why choose me? I consistently deliver beyond expectations, working collaboratively to achieve your business goals with accessibility, effective communication, and attention to detail. I'm flexible on budget and open to negotiation to ensure long-term success. Looking forward to creating solutions that drive your business forward! Examples of my work are available in the portfolio section.
    Time Series Analysis
    OpenAI API
    Data Visualization
    LangChain
    Generative AI
    PySpark
    Kusto Query Language
    Chatbot
    Databricks Platform
    Python
    Microsoft Power BI
    Data Processing
    Machine Learning
    Data Analysis
    Data Science
  • $100 hourly
    I have over 4 years of experience in data engineering, especially using Spark and PySpark to extract value from massive amounts of data. I have worked with analysts and data scientists, running workshops on Hadoop/Spark and resolving their issues with the big data ecosystem. I also have experience in Hadoop maintenance and in building ETL pipelines, especially between Hadoop and Kafka. You can find my profile on Stack Overflow (link in the Portfolio section), where I mostly help with questions tagged spark and pyspark.
    MongoDB
    Data Warehousing
    Data Scraping
    ETL
    Data Visualization
    PySpark
    Python
    Data Migration
    Apache Airflow
    Apache Spark
    Apache Kafka
    Apache Hadoop
  • $45 hourly
    As a highly experienced data engineer with over 10 years in the field, I have built a strong foundation in designing and implementing scalable, reliable, and efficient data solutions for a wide range of clients. I specialize in developing complex data architectures that leverage the latest technologies, including AWS, Azure, GCP, Spark, SQL, Python, and other big data stacks. My experience includes designing and implementing large-scale data warehouses, data lakes, and ETL pipelines, as well as systems that process and transform data in real time. I am also well versed in distributed computing and data modeling, having worked extensively with Hadoop, Spark, and NoSQL databases. As a team lead, I have managed and mentored cross-functional teams of data engineers, data scientists, and data analysts, providing guidance and support to ensure the delivery of high-quality, data-driven solutions that meet business objectives. If you are looking for a skilled data engineer with a proven track record of delivering scalable, reliable, and efficient data solutions, please do not hesitate to contact me. I am confident that I have the skills and experience to meet your data needs and exceed your expectations.
    Snowflake
    ETL
    PySpark
    MongoDB
    Unix Shell
    Data Migration
    Scala
    Microsoft Azure
    Amazon Web Services
    SQL
    Apache Hadoop
    Cloudera
    Apache Spark
  • $35 hourly
    Over 5 years of working experience in data engineering, ETL, AWS, ML, and Python. AWS certified in data analytics and machine learning.
    OpenAI Embeddings
    Docker
    Terraform
    Amazon ECS
    AWS Lambda
    Amazon Redshift
    Amazon S3
    Amazon Web Services
    Analytics
    PostgreSQL
    PySpark
    SQL
    pandas
    AWS Glue
    Python
  • $70 hourly
    - 10+ years of experience at Google, Amazon, and Twitter, and 5+ years of experience on Upwork
    - Technical skills: Python (OOP and functional), Scala, SQL, Spark (PySpark), AWS/GCP/Snowflake, and Tableau
    - Well experienced with the AWS stack (Redshift, PostgreSQL/RDS, DynamoDB, Athena, EMR, Glue, Kinesis, SQS, S3, EC2) and the GCP stack (BigQuery, etc.) for analytics, batch and real-time data pipelines, orchestration (Airflow, dbt), and data visualisations
    - Data science and machine learning expertise (model building, analytics)
    - NYC/US based, UK born, open to working with international clients
    Business Intelligence
    BigQuery
    Salesforce
    Data Engineering
    API Integration
    AWS Glue
    PySpark
    Amazon Redshift
    AWS Lambda
    Google Cloud Platform
    Amazon Web Services
    SQL
    Tableau
    Python
    Apache Spark
  • $55 hourly
    Unlock scalable solutions with a seasoned data engineer and multi-cloud architect. Are you looking for a professional who can transform complex data challenges into efficient, scalable solutions across cloud platforms? With over 10 years of experience in data engineering and cloud architecture, I specialize in creating robust infrastructures that propel businesses forward, whether on AWS, Azure, Google Cloud, or others.
    What I bring to the table:
    - Multi-cloud mastery (AWS, Azure, GCP): designing and deploying scalable architectures on the leading cloud providers, optimized for performance and cost-efficiency on your preferred platform
    - Advanced Python development: high-performance applications and automation scripts, delivered as clean, maintainable code
    - API development: efficient, secure APIs that enable seamless integration and communication between your services and applications
    - Containerization and orchestration: Docker and Kubernetes for deploying, scaling, and managing applications across clusters
    - In-memory data solutions: caching and in-memory storage to accelerate application responsiveness and handle high-throughput workloads
    - Machine learning pipelines: integrating ML models into production environments with scalable workflows, helping businesses make data-driven decisions
    Why work with me:
    - Versatile problem solver: multi-cloud expertise lets me tackle complex challenges with innovative solutions, regardless of platform
    - Independent and collaborative: whether leading a project solo or working with your team, I adapt to your needs
    - Transparent communication: I keep you informed at every stage, ensuring transparency and trust throughout our collaboration
    - Results-oriented: I focus on tangible results that align with your business objectives
    Ready to elevate your project's infrastructure and performance on any cloud platform? Click the 'Invite' button and let's discuss how my expertise can contribute to your success.
    PySpark
    API
    AWS Lambda
    Amazon Web Services
    ETL Pipeline
    Apache Spark
    Python
    Scrapy
    Amazon S3
    Data Mining
    AWS Glue
    Apache Airflow
    DevOps
    Docker
    Data Migration
  • $20 hourly
    Experienced software engineer with a demonstrated history of working in the information technology and services industry. Skilled in Java, Spring Boot, DevOps, Jenkins, Ansible, Eureka, React, and Groovy.
    Apache NiFi
    Docker
    Linux
    Apache Spark MLlib
    DevOps
    Ansible
    Apache Hadoop
    Big Data
    Apache Spark
    Elasticsearch
    Python
    Cloud Computing
    JavaScript
    Java
  • $40 hourly
    Data engineer with over 5 years of experience developing Python-based solutions and applying machine learning algorithms to complex challenges. I have a strong background in data integration, data warehousing, data modelling, and data quality, and I excel at implementing and maintaining both batch and streaming big data pipelines with automated workflows. My expertise lies in driving data-driven insights, optimizing processes, and delivering value to businesses through a comprehensive understanding of data engineering principles and best practices.
    Key skills: Python | SQL | PySpark | JavaScript | Google Cloud Platform (GCP) | Azure | Amazon Web Services (AWS) | TensorFlow | Keras | ETL | ELT | dbt | BigQuery | Bigtable | Redshift | Snowflake | data warehouses | data lakes | Dataproc | Dataflow | Data Fusion | Dataprep | Pub/Sub | Looker | Data Studio | Data Factory | Databricks | AutoML | Vertex AI | Pandas | big data | NumPy | Dask | Apache Beam | Apache Airflow | Azure Synapse | Cloud Data Loss Prevention | machine learning | deep learning | Kafka | scikit-learn | data visualisation | Tableau | Power BI | Django | Git | GitLab
    Data Engineering
    dbt
    ETL
    Chatbot
    CI/CD
    Kubernetes
    Docker
    Apache Airflow
    Apache Kafka
    PySpark
    Machine Learning
    Exploratory Data Analysis
    Python
    SQL
    BigQuery
  • $18 hourly
    Greetings! I'm a data enthusiast with over 6 years of experience and a skill set that combines the art of data extraction, transformation, and loading (ETL) with the power of business intelligence and data modeling. With a passion for turning raw data into actionable insights, I'm here to help businesses navigate the complex world of data.
    Expertise areas: data scraping, data engineering, business intelligence, data analytics.
    Languages and tools: proficient in Python, SQL, PySpark, DAX, and advanced Excel functions. My toolkit includes Power BI, Looker Studio, SSIS, Azure Databricks, Redshift, Snowflake, and more.
    Cloud technologies: well versed in Microsoft Azure, AWS, and GCP, making me proficient in cloud-based data solutions.
    Skills spectrum: automation, consulting strategy, data cleansing, and data assurance across business domains such as manufacturing and production, ed-tech, aerospace, travel and hospitality, and e-commerce.
    If you are invested in optimizing your data, require modern data solutions, and aim to reduce manual dependencies, I'm here to help. I specialize in developing scalable, dynamic, and optimized solutions that reduce time to value and ensure long-term value from your data. If it sounds like we may be a good fit, feel free to book a consultation, explore the packages that fit your requirements, or message me. I would love to learn more about what you are working on and how I can add the most value for you.
    Thanks and regards, Yash
    Microsoft Power BI Data Visualization
    Data Scraping
    Data Analysis Consultation
    Data Engineering
    Artificial Intelligence
    ETL
    Data Visualization
    Data Science Consultation
    Data Ingestion
    Business Intelligence
    Microsoft Azure
    Python
    SQL
    Node.js
    JavaScript
  • $35 hourly
    Welcome to my Upwork portfolio!
    Upwork insights:
    - 20+ Upwork enterprise clients
    - $170,000+ earnings
    - 5,000+ work hours
    - 270+ completed jobs
    I am available to discuss requirements and project scope on a Zoom call, and I am flexible to work in your preferred timezone.
    For quick insight, here are the key areas of my business intelligence, data engineering, data analysis, and data visualization career:
    1. Azure Data Factory
    2. Data engineering
    3. Data analytics and data analysis
    4. Data visualization
    5. Microsoft Power BI
    6. Data modeling
    7. Data extraction, transformation, and loading | ETL | ETL pipelines
    8. Data warehousing | Azure, AWS, GCP, Snowflake
    9. Business intelligence
    10. Data integration, data transformation, data migration, database design, database architecture, database administration, Microsoft Excel
    11. SQL programming: stored procedures, ad-hoc queries, correlated queries, CTEs, T-SQL, MySQL, SQL Server
    12. Microsoft Azure, AWS, GCP
    13. Big data management
    14. Data architecture | data infrastructure | data quality assurance
    My Azure expertise (6+ years):
    - Azure Data Warehouse | Azure SQL databases
    - Azure Data Factory
    - Azure Synapse Analytics
    - Microsoft Power BI | Microsoft Fabric
    - Azure Lakehouse | Azure Function Apps
    - Azure Blob Storage | Azure Data Lake Storage (ADLS) | Azure Logic Apps
    - Azure Virtual Machines | Azure Dedicated SQL Pool | Azure Key Vault | Azure Active Directory
    - Power BI | Power BI Embedded | Power BI Pro
    Hobbies: reading, snooker, table tennis.
    Data Modeling
    Microsoft Power BI Data Visualization
    Microsoft Power BI
    SQL
    Python
    Data Migration
    Data Ingestion
    ETL Pipeline
    ETL
    Data Warehousing
    Database
    Microsoft SQL Server
    Microsoft Azure SQL Database
    Data Warehousing & ETL Software
    Data Engineering
  • $85 hourly
    I work with ambitious innovators and entrepreneurs to design, prototype, and build AI-powered applications, fast. After leading the AI solutions engineering team at a Silicon Valley AI startup backed by Google Ventures, I know how to take your vision and turn it into AI solutions that result in real profits.
    After working in development for 10 years, I've seen three problems with how other providers typically run projects:
    1. They overwhelm clients with technical jargon.
    2. They say "yes" to everything, over-promise, under-deliver, and overrun.
    3. They delay projects by focusing on shiny objects instead of real business results.
    When this happens, projects take far longer than planned, cost more than quoted, and time is wasted on alignment. In reality, finding a partner with the end-to-end practical experience needed to design and launch AI products is harder than you'd think, but when your AI project succeeds, the payoff is immense. I've seen it first-hand.
    Here are a few reasons I'm the right person to turn your great idea into a roaring success:
    - Top 1% on Upwork as a Top-Rated Plus and Expert-Vetted freelancer
    - 9+ years in full-stack mobile and web development
    - Helped 26+ businesses scale with my solutions
    - Google Ventures startup experience: 2+ years of cutting-edge AI work
    - IBM-certified expertise in Enterprise Design Thinking
    - London based
    Some practical examples of the outcomes my solutions have achieved: the last startup I was at raised over $50m from Google and Sequoia, was valued at $400m, and was bought by Steve Wozniak's company (the Apple co-founder). Since then I have:
    - Helped 2 startups raise 7 figures with clickable web app prototypes
    - Built an entire warehouse management platform for a US logistics company expanding to Europe, syncing with their US operations and shipping providers
    - Prototyped an app for Canada's leading security tech company, which they used to land the biggest contract in company history; I then built them a full-stack web application to visualise 1.8m crimes
    - Built a web app for a supermajor energy trading team to monitor 136 petrochemical sites with connected-car data and alert them to shutdowns that would move commodity prices
    - Developed an AI pipeline to detect deforestation across the whole of Indonesia (1.9m sq km) for the world's biggest CPG company
    - Developed a web app for an industrial insurance company to analyse thousands of invoices using OCR and a fine-tuned AI model, freeing up 200+ hours per claim
    - Built a web MVP for a startup that uses generative AI to create thousands of personalised videos with AI avatars in multiple languages
    - Engineered an internal PDF AI-extraction web platform for a US real estate law firm, freeing up their time for more important tasks
    - Worked with a Japanese bank to develop an application using cellular footfall data and financial inputs to optimise how much cash to leave in US ATMs
    - Built an app for a London property company that automated their legal letter generation and streamlined surveys, saving 60+ hours per week
    And there's more. Feel free to look at my case studies and customer reviews below. With 10 years in engineering, including 2.5 years leading the AI solutions engineering team at a Google Ventures startup, I've delivered multi-million dollar projects for Fortune 500 companies.
    Why does this matter for your project? I know how to turn your idea into a functional AI application, fast. You get a solution that works, in the shortest time possible, without surprises, plus industry-leading Silicon Valley expertise on your team. What's more, I can prototype your idea in just 5 days, because nothing helps you get buy-in, investment, and your first customers like a beautifully crafted, clickable prototype.
    Here is how I work with clients:
    1. You get hours of research before I even give you a proposal.
    2. You get access to top Silicon Valley talent, immediately.
    3. You get business results, fast.
    Typically the first step is a quick call to talk through your project goals and some discovery questions. After the call, I'll give you a full project plan, scope, and timelines, and you can decide if you want to move forward. If you're interested, let me know and I will share my calendar to find a time that works for us to meet.
    ETL Pipeline
    Data Lake
    Data Warehousing
    Apache Airflow
    Data Engineering
    Flask
    FastAPI
    Django
    Databricks Platform
    Apache Spark
    Apache Kafka
    Amazon Web Services
    Artificial Intelligence
    API Integration
    Python
  • $20 hourly
    Data scientist with around 6 years of experience. I have worked on projects such as price optimization and space optimization for retail clients, worked at Bank of America as an ML engineer, and currently support a telecom client with their ML and data science initiatives.
    Microsoft Power BI
    Statistics
    Apache Cassandra
    MySQL Programming
    MySQL
    PySpark
    PostgreSQL
    Predictive Analytics
    Tableau
    Machine Learning
    Classification
    Natural Language Processing
    Logistic Regression
    Data Science
    Python
  • $70 hourly
    ๐ŸŽ“ ๐—–๐—ฒ๐—ฟ๐˜๐—ถ๐—ณ๐—ถ๐—ฒ๐—ฑ ๐——๐—ฎ๐˜๐—ฎ ๐—ฃ๐—ฟ๐—ผ๐—ณ๐—ฒ๐˜€๐˜€๐—ถ๐—ผ๐—ป๐—ฎ๐—น with ๐Ÿฒ+ ๐˜†๐—ฒ๐—ฎ๐—ฟ๐˜€ of experience and hands-on expertise in Designing and Implementing Data Solutions. ๐Ÿ”ฅ 4+ Startup Tech Partnerships โญ๏ธ 100% Job Success Score ๐Ÿ† In the top 3% of all Upwork freelancers with Top Rated Plus ๐Ÿ† โœ… Excellent communication skills and fluent English If youโ€™re reading my profile, youโ€™ve got a challenge you need to solve and you are looking for someone with a broad skill set, minimal oversight and ownership mentality, then Iโ€™m your go-to expert. ๐Ÿ“ž Connect with me today and let's discuss how we can turn your ideas into reality with creative and strategic partnership.๐Ÿ“ž โšก๏ธInvite me to your job on Upwork to schedule a complimentary consultation call to discuss in detail the value and strength I can bring to your business, and how we can create a tailored solution for your exact needs. ๐™„ ๐™๐™–๐™ซ๐™š ๐™š๐™ญ๐™ฅ๐™š๐™ง๐™ž๐™š๐™ฃ๐™˜๐™š ๐™ž๐™ฃ ๐™ฉ๐™๐™š ๐™›๐™ค๐™ก๐™ก๐™ค๐™ฌ๐™ž๐™ฃ๐™œ ๐™–๐™ง๐™š๐™–๐™จ, ๐™ฉ๐™ค๐™ค๐™ก๐™จ ๐™–๐™ฃ๐™™ ๐™ฉ๐™š๐™˜๐™๐™ฃ๐™ค๐™ก๐™ค๐™œ๐™ž๐™š๐™จ: โ–บ BIG DATA & DATA ENGINEERING Apache Spark, Hadoop, MapReduce, YARN, Pig, Hive, Kudu, HBase, Impala, Delta Lake, Oozie, NiFi, Kafka, Airflow, Kylin, Druid, Flink, Presto, Drill, Phoenix, Ambari, Ranger, Cloudera Manager, Zookeeper, Spark-Streaming, Streamsets, Snowflake โ–บ CLOUD AWS -- EC2, S3, RDS, EMR, Redshift, Lambda, VPC, DynamoDB, Athena, Kinesis, Glue GCP -- BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud Data Fusion Azure -- Data Factory, Synapse. 
HDInsight โ–บ ANALYTICS, BI & DATA VISUALIZATION Tableau, Power BI, SSAS, SSMS, Superset, Grafana, Looker โ–บ DATABASE SQL, NoSQL, Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, PL/SQL, HBase, Cassandra โ–บ OTHER SKILLS & TOOLS Docker, Kubernetes, Ansible, Pentaho, Python, Scala, Java, C, C++, C# ๐™’๐™๐™š๐™ฃ ๐™ฎ๐™ค๐™ช ๐™๐™ž๐™ง๐™š ๐™ข๐™š, ๐™ฎ๐™ค๐™ช ๐™˜๐™–๐™ฃ ๐™š๐™ญ๐™ฅ๐™š๐™˜๐™ฉ: ๐Ÿ”ธ Outstanding results and service ๐Ÿ”ธ High-quality output on time, every time ๐Ÿ”ธ Strong communication ๐Ÿ”ธ Regular & ongoing updates Your complete satisfaction is what I aim for, so the job is not complete until you are satisfied! Whether you are a ๐—ฆ๐˜๐—ฎ๐—ฟ๐˜๐˜‚๐—ฝ, ๐—˜๐˜€๐˜๐—ฎ๐—ฏ๐—น๐—ถ๐˜€๐—ต๐—ฒ๐—ฑ ๐—•๐˜‚๐˜€๐—ถ๐—ป๐—ฒ๐˜€๐˜€ ๐—ผ๐—ฟ ๐—น๐—ผ๐—ผ๐—ธ๐—ถ๐—ป๐—ด ๐—ณ๐—ผ๐—ฟ your next ๐— ๐—ฉ๐—ฃ, you will get ๐—›๐—ถ๐—ด๐—ต-๐—ค๐˜‚๐—ฎ๐—น๐—ถ๐˜๐˜† ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฐ๐—ฒ๐˜€ at an ๐—”๐—ณ๐—ณ๐—ผ๐—ฟ๐—ฑ๐—ฎ๐—ฏ๐—น๐—ฒ ๐—–๐—ผ๐˜€๐˜, ๐—š๐˜‚๐—ฎ๐—ฟ๐—ฎ๐—ป๐˜๐—ฒ๐—ฒ๐—ฑ. I hope you become one of my many happy clients. Reach out by inviting me to your project. I look forward to it! All the best, Anas โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ ๐Ÿ—ฃโ Muhammad is really great with AWS services and knows how to optimize each so that it runs at peak performance while also minimizing costs. Highly recommended! โž โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ ๐Ÿ—ฃโ You would be silly not to hire Anas, he is fantastic at data visualizations and data transformation. โž ๐Ÿ—ฃโ Incredibly talented data architect, the results thus far have exceeded our expectations and we will continue to use Anas for our data projects. โž โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ ๐Ÿ—ฃโ The skills and expertise of Anas exceeded my expectations. The job was delivered ahead of schedule. He was enthusiastic and professional and went the extra mile to make sure the job was completed to our liking with the tech that we were already using. 
I enjoyed working with him and will be reaching out for any additional help in the future. I would definitely recommend Anas as an expert resource. โž โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ ๐Ÿ—ฃโ Muhammad was a great resource and did more than expected! I loved his communication skills and always kept me up to date. I would definitely rehire again. โž โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ ๐Ÿ—ฃโ Anas is simply the best person I have ever come across. Apart from being an exceptional tech genius, he is a man of utmost stature. We blasted off with our startup, high on dreams and code. We were mere steps from the MVP. Then, pandemic crash. Team bailed, funding dried up. Me and my partner were stranded and dread gnawed at us. A hefty chunk of cash, Anas and his team's livelihood, hung in the balance, It felt like a betrayal. We scheduled a meeting with Anas to let him know we were quitting and request to repay him gradually over a year, he heard us out. Then, something magical happened. A smile. "Forget it," he said, not a flicker of doubt in his voice. "The project matters. Let's make it happen!" We were floored. This guy, owed a small fortune, just waved it away? Not only that, he offered to keep building, even pulled his team in to replace our vanished crew. As he spoke, his passion was a spark that reignited us. He believed. In us. In our dream. In what he had developed so far. That's the day Anas became our partner. Not just a contractor, but a brother in arms. Our success story owes its spark not to our own leap of faith, but from the guy who had every reason to walk away. Thanks, Anas, for believing when we couldn't.โž
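Profiles like the one above build ETL pipelines on Spark and the cloud. The heart of any such pipeline is the transform step; the sketch below shows typical cleaning logic in plain Python so it stays self-contained. The field names are hypothetical, and in PySpark the same steps would be DataFrame operations such as dropna and dropDuplicates.

```python
def clean_events(records):
    """Typical ETL cleaning step: drop incomplete rows, cast types, de-duplicate.

    `records` is a list of dicts with (hypothetical) keys
    'event_id', 'user_id', and 'amount'.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records missing any required field.
        if any(rec.get(k) in (None, "") for k in ("event_id", "user_id", "amount")):
            continue
        # Skip duplicate event ids, keeping the first occurrence.
        if rec["event_id"] in seen:
            continue
        seen.add(rec["event_id"])
        cleaned.append({
            "event_id": rec["event_id"],
            "user_id": rec["user_id"],
            "amount": float(rec["amount"]),  # normalize amounts to float
        })
    return cleaned

raw = [
    {"event_id": "e1", "user_id": "u1", "amount": "9.99"},
    {"event_id": "e1", "user_id": "u1", "amount": "9.99"},  # duplicate
    {"event_id": "e2", "user_id": None, "amount": "5.00"},  # incomplete
    {"event_id": "e3", "user_id": "u2", "amount": "3.50"},
]
result = clean_events(raw)
```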
    Solution Architecture Consultation
    AWS Lambda
    ETL Pipeline
    Data Management
    Data Warehousing
    AWS Glue
    Apache Spark
    Amazon Redshift
    ETL
    Python
    SQL
    Marketing Analytics
    Big Data
    Data Visualization
    Artificial Intelligence
  • $40 hourly
    ๐Ÿ† Top Rated Plus ใ…คใ…คโœ… 9+ years of experience in Upwork ใ…คใ…คโœ… Expert in Python, AWS Cloud, ReactJS, NodeJS ใ…คใ…คโœ… Proven expertise in Web Scraping and Web Development ใ…คใ…คโœ… Available 7 days/week ใ…คใ…คโœ… Fluent English ใ…คใ…คโœ… Certified AWS Solution Architect Associate ใ…คใ…คโœ… Certified Agile Scrum Master ใ…คใ…คโœ… Long-term project support ๐ŸŸข๐™’๐™ƒ๐˜ผ๐™ ๐™„ ๐˜พ๐˜ผ๐™‰ ๐˜ฟ๐™Š ๐™๐™Š๐™ ๐™”๐™Š๐™: ๐Ÿ”ธ Quickly build application prototype ๐Ÿ”ธ Create scalable web applications using microservice architecture (need website that can handle 100k+ users? I've got you!) ๐Ÿ”ธ Build minimum cost MVPs using serverless architecture (don't want to pay too much for MVP's infrastructure? We are here to help๐Ÿ˜Š) ๐Ÿ”ธ Scrape websites regularly ๐Ÿ”ธ Provide long-term project support & maintenance ๐Ÿ”ธ Large scale distributed scraping projects ๐ŸŸข๐™”๐™Š๐™ ๐™‰๐™€๐™€๐˜ฟ ๐™ˆ๐™€, ๐™„๐™: ๐Ÿ”ธ You have an idea but don't know how to turn it into a prototype ๐Ÿ”ธ You need to develop a prototype cheap and fast to test your ideas ๐Ÿ”ธ You need to maintain & develop a legacy web scraping / web application project ๐Ÿ”ธ You want your legacy web application to scale automatically ๐Ÿ”ธ You need to optimize for cost and speed for your web application / web scraper ๐Ÿ”ธ You need long-term support of data feeds / web maintenance ๐Ÿ”ธ High availability is important ๐Ÿ”ธ You want to customize an open-source project ๐Ÿ”ธ You want to build applications that can handle over 20M rows of data on a daily basis ๐ŸŸข๐™”๐™Š๐™๐™ ๐˜ฝ๐™€๐™‰๐™€๐™๐™„๐™๐™Ž ๐™Š๐™ ๐™’๐™Š๐™๐™†๐™„๐™‰๐™‚ ๐™’๐™„๐™๐™ƒ ๐™ˆ๐™€: ๐Ÿ”ธ Iโ€™ve worked on 40+ web scraping / web development projects ๐Ÿ”ธ Iโ€™ve plenty AWS credits that can save you cost when developing a prototype ๐Ÿ”ธ I have a large code base of components that I can reuse to save your time and cost ๐Ÿ”ธ Commitment ๐Ÿ”ธ Relevant project-related suggestions ๐Ÿ”ธ Expert advice for best 
practices to optimize your software product 🕷️𝗠𝗬 𝗧𝗘𝗖𝗛-𝗦𝗧𝗔𝗖𝗞🕷️ Python, Scrapy, Puppeteer, PostgreSQL, MongoDB, DynamoDB, Redshift, Athena, Aurora, Selenium, BeautifulSoup, Requests, proxy rotation, ReactJS, NodeJS, NextJS, Django, Flask, AWS, Firebase, GCP, Docker, PySpark, Elasticsearch, Neptune, Neo4j 🕸️𝗦𝗔𝗠𝗣𝗟𝗘 𝗪𝗘𝗕 𝗦𝗖𝗥𝗔𝗣𝗜𝗡𝗚 𝗣𝗥𝗢𝗝𝗘𝗖𝗧𝗦🕸️ Developed a social network and freelance marketplace platform with an integrated payment system. Developed a big data platform that updates data in real time and handles over 1M rows of data daily. Developed a stock analysis and optimization tool using SciPy and Python. Downloaded and processed over 25GB of trademark data and ingested it into Elasticsearch. Scraped 50 eCommerce websites for drop shippers. Scraped over 100 book sellers on Amazon daily for an eCommerce website. Scraped and compared over 1,000 products daily on Amazon and eBay for price arbitrage.
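Web scraping work like the projects listed above comes down to fetching pages and extracting structured fields. As a toy illustration, this sketch pulls prices out of a static HTML snippet using the standard library's html.parser; a production scraper in this stack would use Scrapy or BeautifulSoup with proxy rotation, and the "price" class name is an assumption made for the example.

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect the text content of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

# Static sample markup; a real scraper would fetch this with Requests or Scrapy.
html = '<div><span class="price">$9.99</span><span class="price">$19.50</span></div>'
parser = PriceParser()
parser.feed(html)
```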
    Amazon Web Services
    Selenium WebDriver
    PySpark
    MySQL
    Docker
    Browser Extension
    React
    Google Cloud Platform
    Google Chrome Extension
    Machine Learning
    Python
    Python Scikit-Learn
    TensorFlow
  • $150 hourly
    As a Data and Business Intelligence Engineer, I strive to deliver consulting and freelance data engineering services, with a focus on overseeing and executing projects in alignment with customer needs. With services encompassing the full data journey, I create and implement robust data foundations that streamline process development and enable leaders to execute rapid business decisions. Three categories of service include: โ€ข Consulting: Data Strategy Development, Data & Reporting Solution Architecture, and Process Development. โ€ข Data Products/Engineering: Dashboard Development & Reporting, Data Pipelines (ETL), Process Automation, and Data Collection. โ€ข Analytics: Key Performance Indicators (KPIs), Metrics, Data Analysis, and Business Process Analysis. Leveraging over eight years of experience in business intelligence, data visualization, business analysis, and requirements analysis, I build data pipelines and translate data into actionable insights that provide a competitive edge. Tools of choice include Amazon Web Services (AWS), Databricks, Snowflake, Kafka, Snowpipe Streams, Airflow, Tableau/PowerBI, SQL, NoSQL, APIs, Python, and Spark/PySpark. Let me know what I can do for YOU!
    API
    Data Analysis
    Database
    Amazon Web Services
    Business Analysis
    Snowflake
    Databricks Platform
    ETL Pipeline
    Apache Spark
    Python
    Apache Airflow
    Dashboard
    Tableau
    SQL
  • $60 hourly
Experienced professional with more than 10 years of work experience in data-focused cloud architecture on platforms such as AWS, Azure, and GCP. - Architecting distributed database clusters and data pipelines for big data analytics and data warehousing, using tech stacks that include but are not limited to Redshift, Spark, Kinesis, Trino/PrestoDB, Athena, Glue, Hadoop, Hive, and S3 data lakes. - Python, Bash, and SQL scripting for database management and automation. - Architecting enterprise-level software solutions. - Linux server administration for setup and maintenance of services on cloud and on-premise servers. - Creating scripts to automate tasks, web scraping, and so on; proficient in scripting with Python, Bash, and PowerShell. - Expert in deploying Presto/Trino via Docker/Kubernetes and in the cloud. Professional certifications: AWS Certified Data Analytics - Specialty; AWS Certified Solutions Architect - Associate; Google Associate Cloud Engineer; Microsoft Azure Fundamentals; Microsoft Azure Data Fundamentals; Starburst Certified Practitioner.
    Amazon Web Services
    Apache Hadoop
    Big Data
    AWS Glue
    Amazon Athena
    Database Design
    Amazon Redshift
    PySpark
    AWS CloudFormation
    Amazon RDS
    AWS Lambda
    Data Migration
    ETL
    SQL
    ETL Pipeline
  • $140 hourly
    AWS RDS | MySQL | MariaDB | Percona | Semarchy xDM | AWS Glue | PySpark | dbt | SQL Development | Disaster Recovery | Business Continuity | ETL Development | Data Governance / Master Data Management | Data Quality Assessments | Appsheet | Looker Studio | Percona PMM *** Please see my portfolio below.*** I have over two decades of experience immersed in a variety of data systems oriented roles on both cloud-based and on-premise platforms. Throughout my career, I have served in senior-level roles as Data Architect, Data Engineer, Database Administrator, and Director of IT. My technology and platform specialties are diverse, including but not limited to AWS RDS, MySQL, MariaDB, Redshift, Percona XtraDB Cluster, PostgreSQL, Semarchy xDM, Apache Spark/PySpark, AWS Glue, Airflow, dbt, Amazon AWS, Hadoop/HDFS, Linux (Ubuntu, Red Hat). My Services Include: Business Continuity, High Availability, Disaster Recovery: Ensuring minimal downtime of mission-critical databases by utilizing database replication, clustering, and backup testing and validation. Performance Tuning: I can analyze the database configuration, errors and events, physical resources, physical table design, and SQL queries to address performance issues. Infrastructure Engineering: In the AWS environment I use a combination of Ansible, Python with the boto3 SDK, as well as the command line interface (CLI) to create and manage a variety of AWS services including EC2, RDS, S3, and more. System Monitoring: Maintaining historical performance metrics can be useful for proactive capacity planning, immediate outage detection, alerting, and analysis for optimization. I can use tools including Percona Monitoring & Management (PMM), and AWS tools such as Performance Insights and CloudWatch. ETL Development: I develop data processing pipelines using Python, Apache Spark/PySpark, and dbt. For process orchestration, I utilize AWS Glue or Airflow. 
I am experienced in integrating a variety of sources including AWS S3, REST APIs, and all major relational databases. Data Governance / Master Data Management: I am experienced in all phases of development and administration on the Semarchy xDM Master Data Management platform: - Building the infrastructure and installing the software in AWS. - Entity design. - Developing the UI components used by data stewards to view and manage master data. - Creating the internal procedures for data enrichment, validation, and duplicate consolidation. - Data ingestion (ETL). - Dashboard creation.
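ETL pipelines like those described above usually load incrementally rather than re-reading whole tables, most often by filtering on a high-watermark column. Here is a minimal sketch of that pattern; the 'updated_at' column and sample rows are hypothetical, and in practice the filter would be pushed down into the source query or expressed as a dbt incremental model.

```python
def incremental_batch(rows, last_watermark):
    """Return only rows newer than the stored watermark, plus the new watermark.

    `rows` are dicts carrying a (hypothetical) 'updated_at' ISO-8601 timestamp;
    lexicographic comparison of ISO timestamps matches chronological order.
    """
    fresh = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

source = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00"},  # already loaded
    {"id": 2, "updated_at": "2024-01-03T12:00:00"},
    {"id": 3, "updated_at": "2024-01-02T08:30:00"},
]
batch, wm = incremental_batch(source, "2024-01-01T23:59:59")
```

The returned watermark is persisted (e.g. in a control table) and passed back in on the next run, so each execution only processes new or changed rows.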
    Database Management
    Looker Studio
    Data Lake
    Apache Airflow
    AWS Glue
    PySpark
    Amazon RDS
    dbt
    System Monitoring
    Master Data Management
    High Availability and Disaster Recovery
    MySQL
    MariaDB
    Database Administration
    SQL Programming
  • $40 hourly
Seeking challenging work in the design and development of scalable backend infrastructure solutions. I work in the following domains: 1. ETL pipelines 2. Data engineering 3. DevOps and deployment on AWS (Amazon Web Services) and GCP (Google Cloud Platform) 4. Machine learning I design most solutions in Python, in which I have 10+ years of experience, including extensive experience with frameworks and libraries such as Flask, Django, Pandas, NumPy, PyTorch, Scrapy, and many more. For ETL pipelines, I deliver end-to-end data pipelines using AWS, GCP, or custom frameworks, with more than 7 years of experience in this domain. I have a strong command of Scrapy and have built more than 300 crawlers to date. In data warehousing, I have extensive experience with Google BigQuery and AWS Redshift, and hands-on experience handling and analyzing millions of records using GCP and AWS warehousing solutions. I also have 5+ years of experience designing serverless applications on AWS and GCP, along with hands-on experience across a wide range of GCP and AWS services, which lets me deliver efficient and cost-effective solutions.
    Data Analysis
    Apache Spark
    PySpark
    ChatGPT
    Generative AI
    AWS Glue
    Google Cloud Platform
    BigQuery
    Snowflake
    Kubernetes
    Django
    Docker
    Serverless Stack
    Python
    Scrapy
    Data Scraping
    ETL Pipeline
  • $90 hourly
Senior Data Engineer | ex-Google | ex-McKinsey Certifications: - Professional Data Engineer GCP - Databricks Certified Associate Developer for Apache Spark 3.0 Do you have complex data? Don't know what to do with your data? Don't have any data at all? Do you have a complex problem? Do you want to optimize code? Do you need a quick, high-quality, documented solution? If the answer to any of these questions is "yes", then I am the person you are looking for. I am not an engineer who solves problems; I am a problem solver who writes code. Since I was 10 years old I have participated in and won multiple mathematics and programming contests, nationally and internationally. I studied mathematics and have taught competitive math/programming since I was 16 (HackerRank profile: hec10r). Moreover, I have worked as a Senior Data Engineer consultant at McKinsey and Google, so I have *proven* experience solving the hardest problems for some of the most important companies in the world. I have experience in multiple industries (insurance, O&G, agriculture, procurement) performing different data engineering tasks, and strong communication and leadership skills that I developed working as a consultant and leading data teams. I feel comfortable working with different data engineering tools and frameworks. I have expertise with: Cloud providers: - Google Cloud Platform (Cloud Storage, BigQuery, Pub/Sub, Cloud SQL, Cloud Spanner, Composer, Dataflow, Looker) - Microsoft Azure (Data Factory, Databricks, SQL Server, Analysis Services, Blob Storage, Power BI) - Amazon Web Services (S3, RDS, Lambda) Programming languages: - Python (Pandas, NumPy, Kedro, Poetry, Anaconda) - Spark (PySpark) - SQL/NoSQL (BigQuery, Oracle, Postgres, MySQL, SQL Server, GraphQL) - JavaScript (Node.js) Visualization tools: - Python (Dash, Matplotlib) - Power BI (DAX guru :)) - Looker
    Google Dataflow
    API
    Data Processing
    Looker
    Data Analysis
    Business Intelligence
    Data Visualization
    Google Cloud Platform
    Microsoft Power BI
    Databricks Platform
    PySpark
    SQL
    Python
    Data Migration
    ETL Pipeline
  • $30 hourly
I am a talented data scientist with extensive experience in working with big data, statistics, and building machine learning models. I have worked with data from various domains and extracted insights using statistical approaches (hypothesis testing) and machine learning approaches (such as support vector machines and random forests). Moreover, I am comfortable using scikit-learn, Keras, and PyTorch, which are machine learning and deep learning frameworks. Finally, I also have a good skill set in NLP problems such as text classification, question answering, and summarization, using LSTMs, BiLSTMs, and Transformers (e.g. BERT). I have built many projects covering regression, classification, sentiment analysis, text summarization, and entity extraction. Areas of expertise: ❇️ Data wrangling (e.g. tabular data, JSON, text data) using pandas, NumPy, spaCy, NLTK. ❇️ Data visualization (e.g. dashboards, Matplotlib, Plotly). ❇️ Data modeling, including: 👉 Traditional models: linear regression, SVM, KNN, k-means, Naive Bayes, Random Forest, XGBoost. 👉 Deep learning models: CNNs, RNNs (including LSTMs, GRU), Transformers (e.g. BERT, RoBERTa, and other language and generative models such as GPT-3 and OpenAI models). ❇️ Model deployment: creating endpoints using FastAPI and deploying machine learning models on Microsoft Azure. I am a target-oriented person: I focus on using all my expertise and tools to provide the best solution within the allocated time.
    ChatGPT API
    Generative AI
    Jupyter Notebook
    Data Visualization
    Docker
    Python
    Data Science
    pandas
    Machine Learning Model
    Python Scikit-Learn
    NLTK
    Azure Machine Learning
    Deep Learning
    NumPy
    Machine Learning
  • $110 hourly
Distributed computing: Apache Spark, Flink, Beam, Hadoop, Dask. Cloud computing: GCP (BigQuery, Dataproc, GFS, Dataflow, Pub/Sub), AWS EMR/EC2. Containerization tools: Docker, Kubernetes. Databases: Neo4j, MongoDB, PostgreSQL. Languages: Java, Python, C/C++.
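The distributed-computing tools listed above (Spark, Hadoop, Dask) all build on the MapReduce model: a map phase emits key-value pairs and a reduce phase aggregates them by key. A single-machine sketch of the two phases in plain Python; in Spark the equivalent would be a flatMap followed by reduceByKey.

```python
from collections import Counter
from functools import reduce
from itertools import chain

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reducer(acc, pair):
    """Reduce phase: accumulate counts per word."""
    word, count = pair
    acc[word] += count
    return acc

lines = ["Spark and Flink", "spark and Hadoop"]
pairs = chain.from_iterable(mapper(line) for line in lines)  # shuffle stand-in
counts = reduce(reducer, pairs, Counter())
```

In a real cluster the pairs would be partitioned by key across workers between the two phases; here the flattening step simply stands in for that shuffle.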
    MapReduce
    Apache Kafka
    Cloud Computing
    Apache Hadoop
    White Paper Writing
    Academic Writing
    Google Cloud Platform
    Dask
    Apache Spark
    Research Paper Writing
    Apache Flink
    Kubernetes
    Python
    Java
  • $55 hourly
I focus on data engineering, software engineering, ETL/ELT, SQL reporting, high-volume data flows, and development of robust APIs using Java and Scala. I prioritize three key elements: reliability, efficiency, and simplicity. I hold a Bachelor's degree in Information Systems from Pontifícia Universidade Católica do Rio Grande do Sul as well as graduate degrees in Software Engineering from Infnet/FGV and Data Science (Big Data) from IGTI. In addition to my academic qualifications, I have acquired a set of certifications: - Databricks Certified Data Engineer Professional - AWS Certified Solutions Architect - Associate - Databricks Certified Associate Developer for Apache Spark 3.0 - AWS Certified Cloud Practitioner - Databricks Certified Data Engineer Associate - Academy Accreditation - Databricks Lakehouse Fundamentals - Microsoft Certified: Azure Data Engineer Associate - Microsoft Certified: DP-200 Implementing an Azure Data Solution - Microsoft Certified: DP-201 Designing an Azure Data Solution - Microsoft Certified: Azure Data Fundamentals - Microsoft Certified: Azure Fundamentals - Cloudera CCA Spark and Hadoop Developer - Oracle Certified Professional, Java SE 6 Programmer My professional journey has been marked by deep involvement in the world of big data solutions. I've fine-tuned my skills with Apache Spark, Apache Flink, Hadoop, and a range of associated technologies such as HBase, Cassandra, MongoDB, Ignite, MapReduce, Apache Pig, Apache Crunch, and RHadoop. Initially, I worked extensively with on-premise environments, but over the past five years my focus has shifted predominantly to cloud-based platforms. I've dedicated over two years to mastering Azure and I'm currently immersed in AWS. I have extensive experience with Linux environments as well as strong knowledge of programming languages such as Scala (8+ years) and Java (15+ years).
In my earlier career phases, I had experience working with Java web applications and Java EE applications, primarily leveraging the WebLogic application server and databases like SQL Server, MySQL, and Oracle.
    Scala
    Apache Solr
    Apache Kafka
    Apache Spark
    Bash Programming
    Elasticsearch
    Java
    Progress Chef
    Apache Flink
    Apache HBase
    Apache Hadoop
    MapReduce
    MongoDB
    Docker
  • Want to browse more freelancers?
    Sign up

How it works

1. Post a job

Tell us what you need. Provide as many details as possible, but donโ€™t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates youโ€™re excited about. Hire as soon as youโ€™re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.


How do I hire a Pyspark Developer on Upwork?

You can hire a Pyspark Developer on Upwork in four simple steps:

  • Create a job post tailored to your Pyspark Developer project scope. Weโ€™ll walk you through the process step by step.
  • Browse top Pyspark Developer talent on Upwork and invite them to your project.
  • Once the proposals start flowing in, create a shortlist of top Pyspark Developer profiles and interview.
  • Hire the right Pyspark Developer for your project from Upwork, the worldโ€™s largest work marketplace.

At Upwork, we believe talent staffing should be easy.

How much does it cost to hire a Pyspark Developer?

Rates charged by Pyspark Developers on Upwork can vary with a number of factors including experience, location, and market conditions. See hourly rates for in-demand skills on Upwork.

Why hire a Pyspark Developer on Upwork?

As the worldโ€™s work marketplace, we connect highly-skilled freelance Pyspark Developers and businesses and help them build trusted, long-term relationships so they can achieve more together. Let us help you build the dream Pyspark Developer team you need to succeed.

Can I hire a Pyspark Developer within 24 hours on Upwork?

Depending on availability and the quality of your job post, itโ€™s entirely possible to sign up for Upwork and receive Pyspark Developer proposals within 24 hours of posting a job description.

Schedule a call