Hire the best PySpark Developers in Canada

Check out PySpark Developers in Canada with the skills you need for your next job.
  • $25 hourly
    I am a Data Engineer and tech enthusiast experienced with multiple cloud platforms. Whatever your data-centric need, I am the go-to person! - Expert with Airflow, Python and Spark, developing end-to-end, robust ETL solutions. - Experienced in advanced technical writing, with several articles and a book already published. - Hobbyist turned professional IoT project creator; I will support you throughout the process. - Regular communication is important to me, so let’s keep in touch.
    Featured Skill PySpark
    Technical Writing
    MQTT
    Amazon Web Services
    Google Cloud Platform
    AWS IoT Core
    PySpark
    Apache Spark
    Apache Airflow
    Kubernetes
    OpenAI Embeddings
    Data Science
    Machine Learning
    Data Engineering
    Python
    Raspberry Pi
  • $80 hourly
    I have been working on a variety of projects involving project management, coding, machine learning, neural networks and data presentation. I am well-versed in ML tools, cloud-based applications and data exploration.
    Featured Skill PySpark
    Android Studio
    PostgreSQL Programming
    Artificial Intelligence
    IBM Watson
    SQLite Programming
    PySpark
    Django
    Deep Neural Network
    Flask
    Tableau
    Apache Spark
    Python
    Data Science
    Java
    Machine Learning Model
  • $85 hourly
    I have 7+ years of experience working in the big data ecosystem, both on-premises and in the cloud. I developed an ETL framework to ease ingestion of complex data pipelines with minimal code for the end developer, ingested streaming data using custom-built Spark sources and sinks, and created a framework to deploy Spark pipelines on Kubernetes (k8s) using Argo Workflows. Extensive knowledge of Python and Scala; working knowledge of MLOps and DevOps.
    Featured Skill PySpark
    Big Data
    Apache Spark
    Snowflake
    MLflow
    Kubernetes
    Python Scikit-Learn
    Apache Airflow
    Apache Kafka
    PySpark
    AWS Lambda
    pandas
    Python
  • $125 hourly
    You need a data scientist, but finding the right one for your project can be laborious. What makes me trustworthy: 🔸15 years of experience as a senior data scientist 🔸Strong work ethic 🔸Varied industry experience (media measurement, banking, retail loyalty programs (B2C), distribution and logistics (B2B), early-stage startups) 🔸PhD in Physics 🔸High-performance computing background 🔸Expertise in predictive analytics using supervised machine learning algorithms (regression and classification models) and prescriptive models (mixed-integer optimization, non-linear optimization, pricing models, procurement planning) 🔸Open-source solutions using Python, PySpark, Jupyter notebooks, scikit-learn, NumPy, etc., consuming data from SQL databases, binary files (HDF5, NetCDF, etc.) or flat files (CSV, TXT, etc.) I'd like to learn more about your project and better understand your needs. Please reach out with your availability, and we can coordinate a Zoom meeting to discuss further. Examples of deliverables: 🔸One of my customers faced the challenge of optimizing their procurement strategy to maximize savings while managing stock levels. The company sources products from multiple suppliers, each offering various price breaks and conditions for free shipping based on minimum order quantities. Using mixed-integer programming (MIP), we aimed to determine the optimal order quantities from each supplier that not only satisfy inventory requirements but also minimize overall purchasing costs. The MIP model involves defining decision variables for the quantity ordered from each supplier, constraints to keep stock levels within specific limits, and an objective function that minimizes the total cost, including adjustments for suppliers' discounts and shipping costs. By implementing this solution, my customer saves thousands of dollars each week by optimizing their purchases.
🔸Designing and building custom forecast models for planning future demand and aligning resources accordingly. These models leverage advanced analytics to predict demand patterns accurately, enabling efficient resource allocation. From sales forecasting to inventory management, customized models empower businesses to stay agile and responsive to market fluctuations, ensuring optimal operational efficiency and customer satisfaction. 🔸Developed propensity models in Jupyter notebooks to predict lead conversion likelihood, enabling prioritization of leads that align with business objectives and optimize ROI. The models were seamlessly integrated into production on a SQL server. 🔸Created a price simulation tool to estimate the impact of various input parameters on profitability, accounting for non-linear effects such as churn rate based on margin and marketing cost to expand the customer base. 🔸 Implemented price optimization models to maximize revenue or profit, collaborating closely with sales and marketing teams to align with business objectives and revenue management rules. 🔸 Designed a customized optimization process to find the value of the parameters maximizing a business outcome within constraints. 🔸Generated synthetic tabular data using Generative Adversarial Networks (GANs) with the ctgan library. Designed a custom cost function to produce synthetic records that adhere to specific criteria, including distribution functions and goal metrics.
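    A procurement model of the kind described above can be sketched with a toy example. This is a minimal illustration, not the customer's actual model: the two suppliers, their price breaks, free-shipping thresholds, and the stock requirement are all invented, and exhaustive enumeration stands in for a real MIP solver (e.g. PuLP or OR-Tools).

```python
from itertools import product

# Hypothetical two-supplier setup (illustrative numbers only): each supplier's
# unit price drops past a quantity break, shipping is free past a threshold,
# and we must stock at least MIN_STOCK units in total.
MIN_STOCK, MAX_PER_SUPPLIER = 100, 150
suppliers = {
    "A": {"price": 5.0, "break_qty": 60, "break_price": 4.5, "ship": 20.0, "free_ship_qty": 80},
    "B": {"price": 4.8, "break_qty": 90, "break_price": 4.2, "ship": 15.0, "free_ship_qty": 100},
}

def cost(name, qty):
    """Total cost of ordering `qty` units from one supplier, with price break and shipping."""
    if qty == 0:
        return 0.0
    s = suppliers[name]
    unit = s["break_price"] if qty >= s["break_qty"] else s["price"]
    shipping = 0.0 if qty >= s["free_ship_qty"] else s["ship"]
    return unit * qty + shipping

# Exhaustive search plays the role of the MIP solver: the decision variables are
# the per-supplier order quantities, the stock constraint filters out infeasible
# plans, and the objective is the total purchasing cost.
best = min(
    (q for q in product(range(MAX_PER_SUPPLIER + 1), repeat=len(suppliers))
     if sum(q) >= MIN_STOCK),
    key=lambda q: sum(cost(n, x) for n, x in zip(suppliers, q)),
)
total = sum(cost(n, x) for n, x in zip(suppliers, best))
print(dict(zip(suppliers, best)), total)  # prints {'A': 0, 'B': 100} 420.0
```

    Here the cheapest feasible plan is to order everything from supplier B, whose price break and free-shipping threshold both kick in at the required volume; a real solver handles many suppliers and SKUs where enumeration is intractable.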
    Featured Skill PySpark
    Generative Adversarial Network
    Price Optimization
    Pricing Research
    Pricing
    Statistics
    PySpark
    Operations Research
    Jupyter Notebook
    Predictive Analytics
    Python Scikit-Learn
    Machine Learning
    Python
    Data Science
    pandas
  • $140 hourly
    Mark is a senior data engineer with over 25 years of experience. His primary expertise is in the field of analytics / data warehousing / business intelligence with a specific focus on the Microsoft Fabric platform. He began his consulting career with Accenture and then founded IGENO in 1999. Mark has consulted in Canada, the United States, Australia and UAE and across a variety of industries including healthcare, e-commerce, real estate, retail, telecom, financial services, tourism and hospitality. He holds a Bachelor of Engineering (electrical) from the University of Victoria and an MBA from the University of Western Ontario (Ivey). He also served as an Adjunct Professor at the University of British Columbia (UBC) where he taught a course in data analytics.
    Featured Skill PySpark
    MongoDB
    Data Engineering
    Microsoft Azure
    Azure DevOps
    React
    NodeJS Framework
    Microsoft Power BI Data Visualization
    Microsoft SQL Server Programming
    Data Science
    NoSQL Database
    JavaScript
    PySpark
    SQL
    Python
    Fabric
  • $75 hourly
    🔹 Welcome! I’m Rizwan, a Senior Data Engineer & AI Consultant with 10+ years of experience designing, optimizing, and deploying scalable data solutions, AI-driven analytics, and cloud architectures. I specialize in Big Data, ETL, AI-powered automation, and cloud-based data engineering, helping businesses process large-scale datasets, integrate AI models, and optimize cloud infrastructure for faster decision-making and business growth. 💡 Industries I’ve Worked With: Healthcare, FinTech, SaaS, E-commerce, Infrastructure, AI Startups 🚀 What I Offer: ✅ Data Engineering & ETL Pipelines – Optimized ETL/ELT workflows using Python, Apache Spark, dbt, and SQL for real-time data processing. ✅ Big Data & Cloud Solutions – Proficient in Hadoop, Databricks, Snowflake, Redshift, and BigQuery for scalable and cost-efficient analytics. ✅ AI & Machine Learning Integration – Expertise in LLMs, Retrieval-Augmented Generation (RAG), NLP, AI chatbots, and data-driven AI workflows. ✅ Database Design & Optimization – SQL & NoSQL databases (PostgreSQL, MongoDB, Pinecone, Faiss) for efficient querying and high-speed analytics. ✅ Cloud Data Engineering & API Development – Deploying AWS (Glue, Redshift, Lambda), GCP (BigQuery, Vertex AI), Azure Data Factory for secure & scalable AI+data solutions. 🔥 Why Hire Me? 🚀 AI-Driven Data Architect – Built high-performance AI+data pipelines processing billions of records. ⚡ Proven Track Record – Delivered enterprise-grade AI & data solutions across multiple industries. 💡 Scalable & Future-Proof Designs – Implementing highly efficient cloud-native architectures. 📢 Clear Communication & Transparency – Providing real-time updates and reports. ❓ FAQs 💾 What data engineering tools & frameworks do you use? Python, Spark, Hadoop, Databricks, Snowflake, PostgreSQL, MongoDB, Kafka, dbt, Airflow, FastAPI, Power BI, Looker. ⚡ Can you handle enterprise-level AI & big data projects? Absolutely! 
I specialize in AI-powered data pipelines, LLM integrations, and scalable cloud architectures for large-scale businesses. ☁️ Do you work with cloud AI & data platforms? Yes! Expertise in AWS (Redshift, Glue, EMR), GCP (BigQuery, Vertex AI), Azure Data Factory. 📊 Can you help optimize my data infrastructure for AI & analytics? Yes! I can design data pipelines, automate AI workflows, and optimize your cloud infrastructure to boost performance and reduce costs. 📩 Let’s Connect & Build Scalable AI-Powered Data Solutions! Looking to transform your business with AI, big data, and cloud automation? Let’s discuss how I can help optimize your data infrastructure, AI applications, and analytics workflows. 🚀 Let’s collaborate and scale your AI-driven success!
    Featured Skill PySpark
    Data Migration
    BigQuery
    Big Data
    ETL
    Data Analysis
    Databricks Platform
    Apache Spark
    Back-End Development
    Django
    Django Stack
    ETL Pipeline
    Python
    PySpark
    Data Science
    Data Engineering
  • $25 hourly
    As a dedicated and skilled Data Engineer, I specialize in designing, building, and optimizing data pipelines and systems that empower businesses to make data-driven decisions. My expertise spans various technologies and tools, ensuring seamless data processing, storage, and integration tailored to meet unique business needs. What I Bring to the Table - Data Engineering Expertise: Proficient in SQL, Python, PySpark, SparkSQL, and modern data frameworks for efficient processing and transformation. - Cloud Solutions Mastery: Extensive experience with Azure services like Azure Data Factory, Azure Synapse Analytics, and Azure Blob Storage for building scalable cloud-based solutions. - Data Warehousing & Modeling: Expertise in developing high-performance data warehouses and implementing effective data models to support analytics and reporting. - End-to-End Workflow Automation: Skilled in orchestrating workflows, automating ETL/ELT processes, and integrating diverse data sources into unified systems. Why Work With Me? - Proven ability to deliver scalable, reliable, and secure data solutions tailored to business goals. - Focus on writing clean, efficient, and maintainable code. - Strong analytical and problem-solving skills to handle complex data challenges. - Transparent communication and commitment to delivering quality results on time. Services I Offer - Design and development of data pipelines and workflows. - Implementation of ETL/ELT processes for data integration and transformation. - Cloud-based data engineering with Azure tools and platforms. - Performance optimization for data processing jobs and systems. - Develop and manage data warehouses and models. Let’s Collaborate - Whether you're building a new data solution or optimizing your existing systems, I can help you unlock the full potential of your data. Let’s work together to create impactful, efficient, and reliable data solutions that drive insights and growth for your business.
    Featured Skill PySpark
    Apache Hive
    Apache Spark
    Apache Hadoop
    Database Management System
    Sqoop
    Linux
    PySpark
    MySQL
    SQL Programming
    SQL
  • $20 hourly
    I'm a Software Engineer specialising in Python and Azure, with 5+ years of experience. Certified as an Azure Developer Associate (AZ-203), I can build software using advanced Python practices. Whether it's creating object-oriented backend solutions or navigating the cloud with Azure, I've got it covered. I have previously helped large organisations in the FMCG, Manufacturing and Finance domains, as well as startups, with digital transformation and backend development projects. Core Skills: - Orchestrating cloud infrastructure, with experience setting up Azure VMs/AWS EC2 instances, cloud storage, Azure Data Lake, cloud data warehousing resources, Log Analytics, etc. - System design and architecture of data applications for both cloud and self-hosted solutions - Data transformation in Python using pandas, Koalas and DuckDB - Building modern data applications with interactive GUIs in Python using PyQt, Tkinter, Streamlit and Panel - Production-grade web scraping solutions using Beautiful Soup and Requests - Building backends and REST APIs in Python using FastAPI, with experience automating Postman tasks - Custom LLM and Gen AI solutions using FastAPI and OpenAI - Experienced in both SQL and NoSQL databases, including Microsoft SQL Server, Azure Cosmos DB and MySQL - CI/CD using GitHub Actions and Azure DevOps, and experienced in utilising Git for complete version-control efficiency - Hands-on experience implementing OOP and SOLID design principles in my projects - I have worked with multiple types of data and databases, which has enabled me to develop optimised solutions for different use cases. I am always keen on developing reusable and optimised solutions. - Deployment and maintenance of solutions on servers/clouds like AWS, Azure and Heroku. I thrive in big team vibes, ensuring our projects hit the sweet spot of collaboration and innovation.
Keywords: python, azure, API developer, web scraping, data extraction, aws, pyspark, cloud developer, data developer, data ingestion, pandas, mongodb
    Featured Skill PySpark
    Microsoft Azure
    Data Management
    Databricks Platform
    Apache Spark
    PySpark
    Microsoft Azure SQL Database
    Data Analysis
    Python
    SQL
    Microsoft Power BI
  • $45 hourly
    Pablo Guinea Benito - 🌐 Welcome to My World of Full Stack Development! 🔍 Who Am I? As a Computer Engineer specialised in AI & Robotics, I bring a rich blend of over 6 years in Full Stack Development and cutting-edge technological innovation. My journey in tech is grounded in a strong academic foundation, including a B.Sc. in Computer Science with specialisation in Robotics and a Postgraduate Diploma in Applied AI Solutions Development. 🔧 My Technical Strengths: Advanced Technologies: Mastery in Python, JavaScript, R, MATLAB, SQL, HTML, and CSS. Data Analytics & Machine Learning: Proficient in TensorFlow, Keras, PyTorch, Pandas, NumPy, and more. Web Development: Expertise in Vue.js, React, HTML5, CSS3. Business Intelligence & Cloud Technologies: Skilled in Tableau, Power BI, AWS, Azure, and GCP. 🏆 Accomplishments & Projects: Co-led the development of an innovative medical question-answering system with SyTaCa, a startup recognised in the European Startup Challenge. This project aimed to revolutionise patient-doctor interactions, targeting a 65% reduction in consultation time. Collaborated on the creation of a Decision Support System for early detection of Alzheimer's and Parkinson's diseases, showcasing my ability to apply AI in healthcare. 📚 Educational Background & Certifications: DeepLearning.AI TensorFlow Developer Professional Certificate (2023). IBM Data Analytics Professional Certificate (2022). Machine Learning: Data Science in Python – Udemy (2020). Advanced Python for Data Scientists – LinkedIn (2021). ✨ Why Choose Me? Innovative Problem-Solver: Leveraging AI and robotics knowledge to develop advanced solutions. Effective Communicator: Fluent in both Spanish and English, ensuring clear and efficient communication. Proven Leadership: Successful track record in leading project teams and delivering high-impact solutions.
🔭 Looking Ahead: Ready to bring my diverse expertise to your project, I am committed to realizing your vision with top-notch quality and efficiency. Let’s connect and embark on a journey of innovation and excellence together!
    Featured Skill PySpark
    Data Processing
    PySpark
    Apache Hadoop
    LLM Prompt Engineering
    Genetic Algorithm
    Trading Strategy
    Statistical Process Control
    Probability Theory
    Mathematical Modeling
    RESTful API
    Python Scikit-Learn
    Amazon Web Services
    Docker
    TensorFlow
    Flask
  • $28 hourly
    I am an experienced consultant and developer in the Data Analytics (Big Data & AI) domain. As part of my job responsibilities, I have gained proficiency in gathering, interpreting and understanding client requirements. Furthermore, I have designed and developed their big data analytics workflows/architectures on the cloud in an optimal way by evaluating cost/performance trade-offs. Some of my successfully implemented tasks are: 💎 Experience in designing large-scale Big Data pipelines using AWS, GCP & Azure. 💎 Developing curation jobs in Apache Spark for data integration, cleaning, entity resolution and format conversion for optimized performance in complex data processing. 💎 Writing and optimizing Extract, Transform and Load (ETL) jobs in Spark under resource constraints to transform curated data into analysed data sets where it starts to add value to the business. 💎 Developing methods to store transformed full-load/incremental data using Apache Hudi. 💎 Developing automated pipelines to export transformed data from cloud storage services to SQL and NoSQL databases for data warehousing and creating reports. 💎 Deploying and maintaining big data infrastructure as code. 💎 Designing strategies for testing data conformity, accuracy, duplication, consistency, validity, and completeness. 💎 Developing automated pipelines to migrate data files among multiple clouds. 💎 Using Databricks for data orchestration and ETL development, and Azure Functions and Azure Queue to push and store events in Redis for near-real-time Spark ETL job processing. 💎 Customer Sentiment Analysis (Natural Language Processing) project on Google Cloud Platform to check whether customer reviews of the client’s products are positive, negative or neutral. 💎 Designed a Google Cloud Platform-based architecture where Google Cloud Functions and Cloud Pub/Sub were used to automatically load incoming CSV files from the client’s Gmail attachments to Cloud Storage and then BigQuery for queries and analysis.
💎 Successfully delivered multiple proof-of-concepts (POCs) in the Big Data Analytics domain. Highlighted Skills: 🔷 Amazon Web Services: EC2, EMR, S3, Step Functions, CloudFormation, Lambda, Kinesis, SNS, CloudWatch, CloudTrail, IAM, Redshift 🔷 Google Cloud Platform: Cloud Functions, Dataflow, Pub/Sub, Cloud Storage, Dataproc, BigQuery, Compute, AutoML, Natural Language 🔷 Microsoft Azure: Data Factory, Databricks, Blob Storage, Redis, Azure Functions, Azure Queue, Azure Synapse 🔷 Big Data: Hadoop, HDFS, Spark, Sqoop, MapReduce, Hudi, ETL, APIs 🔷 Databases: PostgreSQL, SQL Server, AWS Redshift 🔷 Programming: Python, C++, SQL, Spark, MATLAB, Assembly Language, ML (scikit-learn, Keras), R 🔷 Others: Linux, GitHub, GitLab, TFS, Robotics, Microcontrollers, MS Project, MS Office, MS Visio In my work, I do my best to meet my clients' expectations and deadlines. Looking forward to discussing your project together!
    Featured Skill PySpark
    Data Processing
    Terraform
    PySpark
    Databricks Platform
    Data Engineering
    Microsoft SQL Server
    Data Analytics
    ETL
    Database
    Apache Spark
    Python
    SQL
    Microsoft Azure
    Google Cloud Platform
    Amazon Web Services
  • $95 hourly
    Data Engineering and Business Intelligence professional with 8+ years of experience, specializing in SQL, Python, and Azure Cloud Services. Pioneered a machine learning classifier project that secured $300,000 in funding. Led data integration initiatives, improving operational efficiency by 30%. Expertise in cloud migration, data governance, business intelligence solutions, and database design.
    Featured Skill PySpark
    SQL
    Microsoft SQL Server
    SQL Server Integration Services
    Cloud Architecture
    Cloud Migration
    Microsoft Azure SQL Database
    ETL Pipeline
    PySpark
    Python
    Data Engineering
    Data Warehousing
    Data Modeling
    Business Intelligence
  • $40 hourly
    I am a passionate, results-driven Big Data professional with proven knowledge of Python, PySpark, Java, SQL, Hadoop, Kafka, Spark, Scala, Sqoop, R, Hive, Pig Latin, MySQL, NoSQL, HiveQL, Flume, and Oozie. I have utilized hands-on and academic knowledge to provide efficient solutions for Big Data problems, with a willingness to learn and grow.
    Featured Skill PySpark
    Big Data
    Eclipse IDE
    Jupyter Notebook
    Microsoft Azure
    Data Analysis
    Java
    Python
    Microsoft Excel
    ETL Pipeline
    Scala
    SQL
    Apache Kafka
    Apache Spark
    PySpark
    Data Engineering
  • $10 hourly
    Statistician / data analyst interested in stochastic modelling, parameter estimation and forecasting.
    Featured Skill PySpark
    Lasso
    Python Scikit-Learn
    pandas
    NumPy
    CI/CD
    SQL
    PySpark
    R
    Forecasting
    Experimental Music
    Python
  • $70 hourly
    As an experienced and highly skilled data scientist, I have consistently applied my knowledge of machine learning, product analytics, statistical analysis, online experimentation, dashboard creation, and stakeholder communication in previous roles. I thrive on translating business needs into empirical data science projects that bring meaning and focus to overwhelming data and insights. If you are interested in a data science career or need help with any related data science projects, I can help.
    Featured Skill PySpark
    Data Science
    Statistical Analysis
    Data Analysis
    A/B Testing
    Machine Learning
    Mode Analytics
    GitHub
    Apache Airflow
    Git
    BigQuery
    PySpark
    Python
    SQL
  • $45 hourly
    I am a Data Professional passionate about leveraging data to provide valuable insights and analytics. I am a proactive problem solver with solid organizational and time management skills.
    Featured Skill PySpark
    Machine Learning
    Data Analysis
    Report Writing
    Microsoft PowerPoint
    Microsoft Excel
    PySpark
    Tableau
    SQL
    Python
  • $18 hourly
    Hello, I'm Paul, a Data Scientist passionate about transforming raw data into actionable insights. With expertise in: - Excel, Python (Pandas, NumPy, Scikit-learn), SQL, data visualization tools - Experience with machine learning algorithms for predictive modeling - Strong background in statistical analysis and data interpretation I specialize in data analysis and modeling for diverse applications. I believe in regular communication throughout the project to ensure your needs are met. Let's work together to harness the power of data for your business success.
    Featured Skill PySpark
    SQL
    Data Modeling
    Microsoft Power BI Data Visualization
    Scientific Illustration
    Data Analysis
    Environment
    Climate Science
    Plotly
    PySpark
    Matplotlib
    NumPy
    Apache Spark
    Microsoft Excel
    Python
    Data Science
  • $80 hourly
    I am a Data Engineering Specialist with over 10 years of experience building robust, scalable data pipelines and solutions for governments, banks, and the transportation industry. My expertise includes Python, MongoDB, Sybase, PostgreSQL, and Azure cloud technologies. I specialize in ETL processes, database optimization, and big data analytics to help businesses transform their data into actionable insights. Let’s collaborate to make your data work efficiently!
    Featured Skill PySpark
    Snowflake
    PySpark
    Databricks Platform
    Microsoft Azure
    ETL Pipeline
    ETL
    Data Extraction
  • $75 hourly
    Senior Data Engineer with over 6 years of experience in data engineering and data science. Specialized in data pipeline optimization and cloud computing solutions, with a passion for technological innovation and continuous improvement. Proven experience across various sectors, including energy, transportation, and finance.
    Featured Skill PySpark
    Kubernetes
    PySpark
    Python
    SQL
    Data Analysis
    ETL
    ETL Pipeline
    Data Extraction
  • $40 hourly
    TECHNICAL SUMMARY Microsoft Azure Data Engineer Associate certified Data Engineer with over 10 years of experience in the IT industry and more than 4 years of expertise in data engineering. Demonstrates a strong focus on designing, developing, and maintaining sophisticated data infrastructure and pipelines. Adept at constructing and optimizing data systems, ETL processes, and data warehouses to guarantee the dependable and efficient collection, storage, and retrieval of data. Possesses a comprehensive skill set in data modeling, proficient database management, and a high level of proficiency in utilizing tools such as Apache Spark and diverse database technologies. Proven ability to contribute to the creation of data architectures that align with organizational goals.
    Featured Skill PySpark
    Data Warehousing
    Microsoft Azure
    ADF Faces
    PySpark
    Databricks Platform
    Python
    SQL
    ETL Pipeline
    ETL
  • $45 hourly
    I am a Senior IT Developer Data Engineer with 5 years of experience in designing, developing, and optimizing large-scale data pipelines in cloud environments. Currently, I work at TD Bank, where I specialize in data ingestion, transformation, and automation using Azure, Databricks, Kafka, and Delta Lake. End-to-End Data Engineering: Expertise in building robust data pipelines for batch and streaming workloads, ensuring efficient ingestion, transformation, and storage of large datasets. Cloud & Big Data Technologies: Strong proficiency in Azure Data Factory (ADF), Databricks, Delta Lake, Kafka, and Snowflake, enabling scalable and high-performance data processing. - Automation & Optimization: Experience implementing CI/CD pipelines, automating data workflows, and optimizing ETL processes to enhance efficiency and reduce manual effort. - Financial & Fraud Data Processing: Deep understanding of financial data models, fraud detection systems, and regulatory compliance, having worked extensively on projects like DCMCR, TSYS Pipeline, and UAP Ingestion. -Solution Architecture & Data Governance: Skilled in defining data models, schema enforcement, deduplication strategies, and rule-based transformations to maintain data integrity and quality. Key Projects & Achievements: DCMCR Project – Led the end-to-end processing of financial data, ensuring seamless integration between DSAP, FRAM, and ACI systems while maintaining security and compliance. TSYS Pipeline – Developed a rule-based JSON mapping approach for ingesting MBNA, ADS, and fraud logs into Delta tables, handling structured and semi-structured data efficiently. UAP Ingestion & Migration – Engineered solutions to support missing batch ingestion in streaming pipelines, transforming Avro source files into Delta tables with optimized workflows. 
PRM Upgrade – Enhanced existing streaming Delta tables by incorporating new columns and handling schema evolution, solving technical challenges such as 256-parameter limits in linked services. I am passionate about solving complex data challenges and building innovative solutions that drive data-driven decision-making. My goal is to transition into a leadership role in data engineering or solution architecture, where I can mentor teams, shape data strategies, and design high-impact cloud solutions.
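    The rule-based JSON mapping idea mentioned above can be sketched in plain Python. This is only an illustrative sketch: the rule table, field names, and record are invented, and plain dicts stand in for the Spark/Delta machinery the actual pipeline uses.

```python
import json

# Hypothetical mapping rules (invented for illustration): target column -> dotted
# path into the nested source JSON.
RULES = {
    "account_id": "account.id",
    "txn_amount": "transaction.amount",
    "fraud_flag": "flags.fraud",
}

def get_path(record, dotted):
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    cur = record
    for key in dotted.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur

def map_record(raw_json):
    """Apply every rule to one raw JSON record, producing a flat row."""
    rec = json.loads(raw_json)
    return {col: get_path(rec, path) for col, path in RULES.items()}

row = map_record('{"account": {"id": "A1"}, "transaction": {"amount": 12.5}, "flags": {"fraud": false}}')
print(row)  # prints {'account_id': 'A1', 'txn_amount': 12.5, 'fraud_flag': False}
```

    Keeping the mapping in a data-driven rule table rather than hard-coded parsing is what lets one ingestion job handle structured and semi-structured sources with different layouts.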
    Featured Skill PySpark
    Bitbucket
    GitHub
    Microsoft Windows PowerShell
    Visualization
    Microsoft Azure
    Microsoft Azure SQL Database
    SQL Server Reporting Services
    Data Mining
    Data Analysis
    Snowflake
    Scala
    PySpark
    SQL
    ETL Pipeline
    Data Extraction
  • $65 hourly
    SUMMARY AND TECHNICAL SKILLS: 5+ years of ML/MLOps experience in Generative AI, Machine Learning, Data Engineering, and Data Science across the Public Services, Aviation, Market Research, Medical, Finance, Insurance, and Customer Service domains. SKILLS: AI, ML, Data Science, Cloud Engineering, DevOps, and BI Tools: • Computer Languages: Python, C++, MS SQL, Java, Shell scripting, JavaScript, Bash • Toolkits: Databricks, NumPy, Pandas, scikit-learn, PySpark, MLOps, NLP, AWS, Azure OpenAI, GCP, Copilot Studio, Visual Studio Code, Docker, Kubernetes, LangChain, Google Vertex AI, Linux, Jupyter, GitLab, PyTorch, TensorFlow, LLMs, RAG, Azure DevOps, Azure ML Studio, Azure App Services, MLflow, Spark, Kafka, Azure AI Search, PowerApps, AWS SageMaker
    Featured Skill PySpark
    Stanford CoreNLP
    PySpark
    SQL
    Python
    Databricks MLflow
    ETL
    Machine Learning
    Machine Learning Model
    Artificial Intelligence
  • $85 hourly
    AVAILABILITY * Interview Availability: 1-2 days' notice. * Start: 2 weeks' notice upon offer (open to discuss). * Vacations: No vacations planned for the next 6 months. * Work Status: Permanent resident of Canada - no sponsorship required. PROFESSIONAL SUMMARY * I'm a Senior Big Data Engineer with 8+ years of experience providing both on-premises and cloud-based solutions across industries such as finance, technology, insurance, capital markets, telecommunications, and government. I have extensive experience designing and implementing ETL pipelines that process terabytes of data daily, ensuring efficient and scalable data integration across various systems. * Azure Certified Data Engineer Associate and Databricks Certified Data Engineer Associate with sound analytical and big data expertise, having hands-on experience across multiple data
    Featured Skill PySpark
    Tableau
    Snowflake
    Apache Hadoop
    PySpark
    Databricks Platform
    Microsoft Azure
    Python
    SQL
    Data Extraction
    Mining
    Data Analysis
    ETL Pipeline
    ETL
  • $35 hourly
    SUMMARY * Skilled IT professional with 8+ years of diverse experience, excelling as a Big Data Engineer for the past 4+ years. Proficient in developing industry-specific software applications and implementing Big Data technologies in core and enterprise environments. * Expertise in Analysis, Design, Development, Deployment, and Integration, utilizing SQL and Big Data tools. * Proficient in SQL programming across various databases including MySQL, SQL Server, Oracle, Cassandra, and HBase. * Demonstrated proficiency in Hadoop architecture and components such as HDFS, Hive, Pig, and MapReduce. * Skilled in Scala for functional programming and Python for scripting and data analysis. * Experience in Extraction, Transformation, and Loading (ETL) processes, adept at data processing and manipulation using Apache Spark. * Certified AWS Solutions Architect Associate with hands-on experience in AWS services such as
    Featured Skill PySpark
    PySpark
    Apache Spark
    Scala
    SQL
    Data Analysis
    ETL Pipeline
    Data Extraction
    ETL
  • $70 hourly
    Professional Summary * 15+ years of IT experience, including 7+ years in Cloud Analytics using Azure, AWS, GCP, Snowflake, ETL, Business Objects, and SAP HANA. * Experience with Azure transformation projects and Azure architecture decision-making; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF). * Implemented Copy activity and custom Azure Data Factory pipeline activities for on-cloud ETL processing using various databases, including Azure Synapse and Snowflake. * Spearheaded the development of medium to large-scale BI solutions leveraging Azure Data Platform services, including Azure Data Factory (ADF), Azure Databricks, and Snowflake, alongside Power BI for comprehensive data analytics and visualization. * Executed meticulous data extraction, transformation, loading, ingestion, integration, cleansing, and aggregation processes across diverse sources, ensuring data accuracy.
    Featured Skill Pyspark
    Microsoft Azure
    PySpark
    Python
    Snowflake
    Microsoft Power BI
    Tableau
    PostgreSQL
    MongoDB
    SQL
    Data Engineering
    Data Visualization
    Data Modeling
    Data Warehousing
    Data Analysis
    ETL
  • $34 hourly
    I’m a Business Intelligence Analyst with 6+ years of experience helping organizations make data-driven decisions. Whether you need interactive dashboards, data analysis, or workflow optimization, I can help. Proficient in Power BI, Tableau, SQL, Python, and database technologies. I manage projects end-to-end and prioritize clear communication. Let’s keep in touch!
    Featured Skill Pyspark
    PySpark
    SQL
    Python
    Tableau
    Microsoft Power BI Data Visualization
    ETL Pipeline
    Mining
    Beta Testing
    Alpha Testing
    Data Analysis
    ETL
    Data Extraction
    Data Mining
    Agriculture & Mining
    Analytical Presentation
  • $20 hourly
    Over 5 years of experience as a Data Engineer with expertise in Data Analytics & Statistics, Web Scraping, Big Data, Data QA, and Business Intelligence. I have experience with automation, data analysis, and dashboard creation.
    Technical Skills:
    • Data Analysis & Data Visualization
    • Data Mining
    • Web Scraping
    • Big Data Analysis using PySpark, pandas, and MySQL
    • Business Intelligence using Google Data Studio and Tableau
    • Data Quality Assurance
    • Building data analysis tools with Python
    • Google Sheets & Excel
    Programming Languages:
    • Python
    • SQL
    Featured Skill Pyspark
    PySpark
    Data Science
    Web Scraping
    Data Analysis
    Selenium WebDriver
    Data Visualization
    Data Mining
    Python
    Data Scraping
    pandas
  • $25 hourly
    As a dedicated and detail-oriented Data Engineer with 4 years of proven experience, I offer a robust skill set tailored to meet your project needs. Proficient in designing, constructing, and maintaining highly scalable data management systems, I specialize in developing efficient data pipeline architectures capable of handling large volumes of complex data from diverse sources. My strengths lie in collaborating closely with data and business analysts to enhance data models and systems, ensuring alignment with evolving business requirements.
    Key highlights of my experience include:
    * Developing ingestion frameworks for streaming data sources using technologies such as Databricks, PySpark, Apache Hive, and Kafka, resulting in a 60% reduction in development time.
    * Designing and implementing data ingestion frameworks and pipelines, including configuring Spark Streaming to receive real-time data from Kafka.
    * Leveraging Azure Databricks to mount and transform different data stores, improving data quality and speeding up analysis.
    * Coordinating automation strategies to improve system monitoring and reduce service tickets through process restructuring.
    * Completing a Master's degree in Applied Computing with a focus on Artificial Intelligence at the University of Windsor, complemented by certifications and a strong academic background in Computer Science.
    With a track record of delivering high-quality projects on time and within budget, I am committed to continuous learning and staying abreast of cutting-edge technologies to drive innovation. I am adept at project management, delegation, and maintaining thorough documentation, ensuring seamless collaboration within teams. If you are seeking a skilled Data Engineer with a proven ability to tackle complex challenges and deliver results, I am confident in my ability to exceed your expectations.
    Featured Skill Pyspark
    Microsoft Azure
    Azure DevOps
    Apache Kafka
    PySpark
    Relational Database
    Django
    Azure Machine Learning
    Apache Spark
    Machine Learning
    Keras
    TensorFlow
    Python
    C#
    WordPress
    Java

How hiring on Upwork works

1. Post a job

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.