Hire the Best Big Data Engineers

Upwork is rated 4.5 out of 5 by G2 peer reviewers, based on more than 3,000 reviews on G2.
Kashif S.

Gudja, Malta

$25/hr
5.0
7 jobs

I build data platforms that work at scale and keep working as your business grows. Over the past 10 years I've served as the lead or founding data engineer across fintech, e-commerce, ride-hail, legal tech, and cybersecurity companies. That means I've designed systems from scratch, made architecture decisions with no one to fall back on, and delivered platforms that product teams actually use.

Here's what I typically get hired to do:
→ Build greenfield data platforms on AWS or GCP from the ground up
→ Design and ship production ETL/ELT pipelines (Airflow, Dagster, dbt)
→ Set up scalable warehouses and governance (Snowflake, BigQuery, Redshift)
→ Implement real-time streaming pipelines (Kafka, Spark Streaming, CDC)
→ Build AI-powered data applications (RAG, LLMs, LangChain, vector DBs)
→ Fix broken or unreliable pipelines and make them production-grade
→ Architect cloud infrastructure on AWS, GCP, Azure (Terraform, Kubernetes)

Recent work includes:
- Led data platform engineering for a US e-commerce company processing billions of events daily. I re-architected ingestion pipelines, built Snowflake governance from scratch, and introduced Prometheus monitoring and CI/CD standards across the platform.
- Built a full data platform on GCP (BigQuery, Dataproc, Airflow) for a music streaming company. Firebase, AppsFlyer, and app store data were all flowing into one warehouse within weeks.
- Designed an AWS data platform for a ride-hail company managing 500+ streaming and 700+ batch jobs, including a self-serve portal that replaced multi-step CLI workflows for engineers.
- Built a legal AI search engine using LangChain, Pinecone, and RAG: a full pipeline from document ingestion to LLM-generated answers, deployed on AWS with auto-scaling.
- Built an AI inventory insights agent for a US automotive company: multi-source data pipelines, real-time APIs, and a conversational interface.

I work in English daily, communicate proactively, and deliver production-ready code, not prototypes. I'm used to working directly with CTOs and technical leads in US and European time zones.

Tools I work with regularly: Python · SQL · Airflow · Dagster · dbt · Snowflake · BigQuery · Spark · Meltano · Kafka · AWS (S3, EMR, Glue, ECS, Lambda, EC2, EKS) · Databricks · GCP · Azure · Terraform · Docker · Kubernetes · LangChain · FastAPI · MLflow · Weaviate · Celery

If you're building a data platform, fixing one, or adding AI/ML capabilities to your stack, let's talk.

  • Data Engineering
  • Docker
  • DevOps
  • GitHub
  • BigQuery
  • Snowflake
  • Python
  • Apache Airflow
  • Apache Spark
  • Google Cloud Platform
  • Terraform
  • Microsoft Azure
  • ETL
  • Amazon Web Services
  • Apache Kafka
Sonali J.

Gurgaon, India

$40/hr
4.3
3 jobs

Power BI developer + Data Engineering: I build the automated pipelines (Databricks, ADF, SQL, Fivetran) and the dashboards leadership trusts, end to end.

If your team is still pulling numbers by hand, stitching exports together, and quietly not trusting what's in the report, your data isn't slow; it's unbuilt. And every decision made on top of it is shakier than it should be.

I'm a data engineer and BI specialist with 13+ years building the full path from raw, messy source data to the dashboards leadership actually trusts. Co-founder of GrowthBI, an analytics consultancy serving scaling companies across Australia, Europe, and the US.

Here's what makes me different: most freelancers do one half. Pure data engineers hand you clean tables and no story. Pure dashboard builders make polished reports on top of data that breaks every week. I build both halves, the pipeline and the reporting, so you hire one person instead of managing two. 80%+ of my clients come back for follow-on work.

Where I deliver the most impact:
→ Data pipelines & automation: Fivetran, dbt, and SQL pipelines that ingest from SAP, CRMs, ad platforms, and spreadsheets, model the data in Databricks/BigQuery, and refresh on a schedule so reports build themselves
→ Data modelling & warehousing: clean, documented, single-source-of-truth tables your whole team can rely on
→ Power BI & Looker Studio dashboards for marketing, product, sales, and finance leadership
→ Executive reporting: YTD, MTD, YOY, and scenario comparisons leadership can act on
→ Customer & growth analytics: RFM modelling, cohort analysis, churn prediction, CAC vs. LTV, channel attribution
→ Row-level security (RLS) & governance for multi-team and multi-entity reporting

Results I've delivered:
→ Consolidated [X] disconnected sources into one automated pipeline: eliminated ~[X] hrs/week of manual reporting and gave leadership numbers they could finally trust
→ 10x growth in customer acquisition: surfaced channel-mix insights that reshaped how a leadership team allocated marketing spend
→ 10% churn reduction: identified the highest-risk customer cohort with RFM segmentation and built a live re-activation tracker for the sales team
→ Prevented significant revenue loss: caught a payment failure from a default product setting before it compounded, with a live monitoring dashboard run until the fix shipped

My core stack: SQL · dbt · Fivetran · Databricks · BigQuery · Power BI · DAX · Power Query · Alteryx · SSIS · Excel

Industries: E-commerce · SaaS · Real Estate · Financial Services · Insurance · Consumer Goods

Clients I've worked with: Rate My Agent · Keep It Cleaner · Montgomery Homes · Peeplcoach · Paylater · Joolca

I work directly with C-suite leaders, senior leadership teams, and cross-functional analysts. I understand what each team needs to answer, and I build the data layer and the dashboards that answer it, without a manual every time.

  • Big Data
  • Microsoft Power BI
  • Microsoft Power BI Data Visualization
  • Microsoft Power BI Development
  • Data Analysis Expressions
  • Power Query
  • Data Analysis
  • Data Analysis Consultation
  • Data Analytics
  • Data Visualization
  • SQL
  • Data Modeling
  • Analytics Dashboard
  • Query Development
  • AI Data Analytics
Muhammad Umer S.

Arlington, Texas

$85/hr
5.0
21 jobs

I build production-grade data platforms for companies dealing with broken pipelines, scattered systems, slow reporting, and unreliable metrics.

I’m a Senior Data Engineer with 10+ years of experience building cloud data platforms, ETL/ELT pipelines, lakehouses, warehouses, and analytics-ready data layers using Microsoft Fabric, Snowflake, AWS, BigQuery, dbt, Python, SQL, Airflow, Fivetran, Airbyte, and Databricks.

My focus is not just moving data from point A to point B. I design reliable data systems that are automated, scalable, well-modeled, and trusted by business teams.

What I help with
✅ Microsoft Fabric Engineering: Lakehouse, Warehouse, Dataflows Gen2, pipelines, notebooks, semantic models, Medallion architecture, and Power BI-ready data layers.
✅ Cloud Data Warehouse Development: Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, SQL Server, and Azure Synapse architecture.
✅ ETL/ELT Pipeline Development: API ingestion, database replication, SaaS integrations, file ingestion, batch jobs, incremental loads, and scheduled workflows.
✅ dbt & Analytics Engineering: Staging, intermediate, marts, incremental models, tests, documentation, metric definitions, and business logic standardization.
✅ Pipeline Automation & Orchestration: Airflow, Dagster, AWS Lambda, Glue, Step Functions, ADF, CI/CD, monitoring, retries, alerts, and production workflow automation.
✅ Data Quality & Governance: Deduplication, reconciliation, schema drift handling, validation rules, MDM, Golden Record logic, RBAC, access control, and audit-ready reporting layers.

Recent project experience:
Microsoft Fabric enterprise analytics platform: Built a centralized Fabric platform with Lakehouse, Warehouse, Dataflows, pipelines, semantic models, and Power BI reporting layers for a global organization with fragmented reporting sources.
Near real-time Snowflake data platform: Designed operational pipelines into Snowflake with incremental ingestion, schema change handling, deduplication, retries, and reliable dbt-based reporting models.
Modern SaaS data stack: Centralized HubSpot, Stripe, GA4, Google Ads, Salesforce, MongoDB, and product data into BigQuery/Snowflake using Fivetran, Airbyte, dbt, Dagster, and Metabase.
Legacy migration to cloud warehouse: Led migrations from SQL Server, SAP BW, Oracle, Hadoop, and on-prem systems into modern cloud warehouses with optimized performance and automated workflows.
AWS data automation: Built automated workflows using S3, Lambda, Glue, Step Functions, IAM, SNS, CloudWatch, and Python to reduce manual reporting and improve pipeline reliability.

Tools I work with: Microsoft Fabric, Snowflake, BigQuery, Redshift, Databricks, Azure Synapse, PostgreSQL, SQL Server, Oracle, SAP BW, Hadoop, dbt, Python, SQL, Airflow, Dagster, Fivetran, Airbyte, ADF, SSIS, Talend, AWS S3, Lambda, Glue, Step Functions, IAM, SNS, CloudWatch, Power BI, Tableau, Metabase, and Looker.

You should reach out if you need a senior data engineer to:
✅ Build a cloud data warehouse or lakehouse
✅ Migrate legacy systems to Snowflake, Fabric, BigQuery, or AWS
✅ Fix unreliable ETL/ELT pipelines
✅ Design dbt models and trusted reporting layers
✅ Automate manual reporting workflows
✅ Integrate APIs, CRMs, ERPs, databases, and SaaS platforms
✅ Build production-ready data infrastructure for analytics and BI

If your data stack is messy, slow, or hard to trust, send me a message. I’ll help you map the cleanest path from scattered systems to a reliable data platform.

  • Big Data
  • Data Engineering
  • Azure DevOps
  • Snowflake
  • Data Analytics
  • ETL
  • Amazon Web Services
  • Data Migration
  • Python
  • SQL
  • Artificial Intelligence
  • Microsoft Power BI
  • Databricks Platform
  • Data Modeling
  • dbt
  • Apache Airflow
  • API Integration
  • Cloud Engineering
  • Apache Kafka
  • Azure Service Fabric
Michael N.

Toronto, Canada

$75/hr
4.8
36 jobs

☑️ 14+ Years Professional Experience
☑️ US Client Specialty
☑️ <10 Minute Client Message Response Times
☑️ Outstanding Pro-Active Communication
☑️ 4,600+ Hours Delivered, Maintaining a 100% Job Success Score
☑️ Extensive Experience In: Software - SaaS - eCommerce - Finance
☑️ Worked with Netskope & RSAC
☑️ On Time Every Time
☑️ Long-Term Partnership Mindset

📞 Invite me to your project on Upwork or send me a direct message to book a complimentary consultation call together.

❝ Michael and the DSC team is absolute gold! I cannot speak highly enough of their data insights and software development capabilities, which based on my experience, are second to none! I have served as a product strategy and development leader for a number of major brands, including Starbucks, Microsoft, Wells Fargo and more. Furthermore, I have worked with many talented Power BI, data analytics and software development colleagues, consultants and third party providers. I have never found a more capable team to deliver best-in-class solutions for any data related initiative - simple or complex. ❞
🗣 Bryan Guy - CEO - Databillity

Hi! I’m Michael. I specialize in supporting US-based clients with exceptional data engineering, data architecture and advanced analytics solutions.

I’m the leader of a team of 10 hand-picked Canadian data scientists and data engineers. When you work with me, you work with an industry-leading team of native-English experts. Together, we create outcomes that would be impossible to achieve individually.

The main outcome I consistently deliver is enabling my clients to make higher-quality business decisions with reliable data, while reducing manual work through automation for their entire team.

I offer end-to-end solutions. I take responsibility for every step of your data journey. This includes building automated data pipelines that pull information directly from systems like your CRM, ERP, and financial platforms. Your data can then be centralized in a cloud-based data warehouse, creating a single reliable source of data that serves as a stable foundation for analysis and reporting. From there, we can work together to determine what matters for your business to track with genuinely helpful reporting dashboards. Your KPIs and the data you need to make meaningful decisions are communicated clearly in accessible reporting that you and your team will actually use.

One example of a recent successful data engineering project: data ingestion and analytics with an American IT security company. I ingested data from Salesforce (Opps, Leads, Contacts, Campaigns, etc.) into Snowflake using Fivetran, built a metadata layer in Looker using LookML code, and created dashboards to track key metrics such as ARR and MRR. I also oversaw the rollout of dbt, a data transformation platform, used as an intermediary data transformation step between Snowflake and Looker.

Data sources I often work with include:
🔷 CRM & Sales Systems: Salesforce, HubSpot, Proprietary CRM Systems
🔷 Marketing & Analytics Platforms: Google Analytics, Google Tag Manager, Facebook Ads, Marketing Attribution Platforms
🔷 Financial & Payment Systems: Stripe, QuickBooks, ERP Systems, Financial Reporting Tools
🔷 Product & Application Data: Web Apps, Internal Databases, Event Tracking Systems, Custom Application Data
🔷 Cloud & Data Infrastructure: Google BigQuery, Snowflake, Azure-Hosted Data Environments
🔷 Custom & API Integrations: REST APIs, Webhook-Based Systems, Custom Data Pipelines From Internal Tools

Data architecture and data engineering services I commonly support my clients with include:
🔸 Specialties: Snowflake - dbt - Microsoft Power BI - Tableau - Looker Studio
🔸 Data Modeling - Data Architecture - Data Engineering
🔸 Reporting - Custom Dashboards - Data Visualization
🔸 Data Warehouse Projects - Data Migrations - Data Transformation - Data Ingestion
🔸 Business Intelligence Projects
🔸 Microsoft Azure - Azure Hosted Data - Azure Hosted Analytics
🔸 Data Engineering Projects
🔸 Automated Testing
🔸 Fivetran - Data Integration Platform Projects
🔸 Airbyte Data Monitoring Tool - Custom Connections Built With: QuickBooks - Microsoft Dynamics - Leadspedia - Everflow - Proprietary CRM systems
🔸 Custom Python Scripts - Custom SQL
🔸 LookML - Amazon Quicksight - Holistics BI - Lightdash - Google BigQuery - Sigma Computing

Thank you for learning about how I can support you with your data architecture and data engineering project.

Michael Nandlall

  • Big Data
  • Data Visualization
  • Data Analysis
  • Database Architecture
  • Database Design
  • Data Warehousing
  • ETL Pipeline
  • Data Engineering
  • Data Migration
  • Data Integration
  • Dashboard
  • Snowflake
  • Tableau
  • Looker
  • dbt
  • Microsoft Power BI
  • Data Science
  • Microsoft Excel
  • SQL
  • Python
M Haseeb A.

Stockholm, Sweden

$55/hr
5.0
39 jobs

Struggling to unlock value from your data or build scalable, high-performance analytics platforms?

I’m Haseeb Asif, a Senior Data Engineer specializing in Databricks, Snowflake, Big Data Engineering, and scalable ETL/ELT solutions. With expertise in PySpark, Python, SQL, GCP, AWS, Azure, and NLP, I build high-performance data pipelines, cloud data platforms, and real-time analytics solutions. I’m experienced in data warehousing, cloud integration, machine learning workflows, and performance optimization to transform raw data into actionable business insights. Let’s build reliable, scalable, and data-driven solutions for your business growth.

I’ve successfully completed 99+ projects across industries, designing ETL pipelines, MLOps workflows, Delta Lake architectures, and cloud analytics solutions on AWS, Azure, and GCP.

✔️ How I Help Businesses Transform Data into Insights

➜ Databricks & Big Data Engineering
I specialize in designing enterprise-grade Databricks Lakehouse architectures and Delta Lake solutions. My expertise in Spark and PySpark allows me to build high-performance pipelines for both batch and real-time analytics, ensuring your data infrastructure is robust and scalable.

➜ Machine Learning & MLOps
With a focus on machine learning and MLOps, I build and deploy predictive models using tools like MLflow and TensorFlow. I automate end-to-end ML pipelines to enhance efficiency and accuracy, driving impactful insights from your data.

➜ Cloud & Data Platforms
I implement secure, scalable cloud solutions on platforms like AWS, Azure, and GCP. My experience includes cloud migration, Kubernetes, Docker, and CI/CD automation, ensuring seamless integration and optimal performance.

➜ ETL & Data Pipelines
I develop reliable ETL processes and data pipelines that streamline data integration and transformation. My work with streaming analytics using Kafka and Spark ensures real-time data processing and actionable insights.

➜ Data Analysis & Visualization
I create actionable dashboards and visualizations using Power BI, Tableau, and Databricks SQL. My focus is on driving KPI reporting and business intelligence to support strategic decision-making.

➜ Snowflake
I leverage Snowflake's capabilities to build efficient data warehousing solutions, optimizing data storage and retrieval for enhanced performance and scalability.

➜ Python
My proficiency in Python allows me to develop complex data processing scripts and machine learning models, ensuring robust and efficient data handling.

➜ NLP (Natural Language Processing)
I apply NLP techniques to extract meaningful insights from unstructured data, enabling advanced text analytics and improved decision-making processes.

➜ GCP (Google Cloud Platform)
I utilize GCP's powerful tools to design and deploy scalable cloud solutions, ensuring high availability and performance for your data-driven applications.

➜ Data Warehouses
I design and manage data warehouses that provide a centralized repository for your data, facilitating efficient data analysis and reporting.

✔️ Key Tools & Technologies
▪ Databricks & Big Data: Databricks, Delta Lake, Apache Spark, PySpark, Unity Catalog, Kafka, Hadoop, Real-time Streaming
▪ Machine Learning: MLflow, TensorFlow, PyTorch, scikit-learn, Feature Store, Predictive Analytics, NLP
▪ Cloud Platforms: AWS, Azure, GCP, Kubernetes, Docker, CI/CD
▪ Analytics & BI: Power BI, Tableau, Databricks SQL, KPI Dashboards, Data Strategy
▪ Data Engineering: ETL Pipelines, Data Lakes, Data Warehousing, Data Migration, Performance Optimization

✔️ Why Choose Me
I combine deep technical expertise with practical business understanding, delivering scalable, cost-efficient, and AI-ready data solutions. My goal is to turn your data into a strategic asset that powers smarter decisions and measurable growth.

Let’s collaborate to build your next-generation analytics platform and unlock the full potential of your data. Check my portfolio for architecture samples, dashboards, and case studies.

Databricks Engineer, Big Data Consultant, Spark Developer, MLOps Engineer, Data Engineer, AWS Data Specialist, Azure Databricks, GCP Analytics, ETL Developer, Data Analytics, Delta Lake Expert, Machine Learning Engineer, Python, Database Architecture, Data Processing, ETL, Big Data, Database Design, Data Engineering, Data Analytics & Visualization Software, Data Visualization, Deep Learning Modeling, Data Warehousing & ETL Software, Snowflake, Amazon Web Services, ETL Pipeline, Machine Learning, Deep Learning, Data Science, Data Analysis, Cloud Engineering, Artificial Intelligence

  • Big Data
  • Python
  • ETL
  • Data Engineering
  • Snowflake
  • Machine Learning
  • ETL Pipeline
  • Database Architecture
  • Data Processing
  • Database Design
  • Data Analysis
  • Cloud Engineering
  • Data Analytics & Visualization Software
  • Data Warehousing & ETL Software
  • BigQuery
  • Data Integration
  • Databricks Platform
  • Database
  • Data Analytics
  • Apache Flink
Aleksey Y.

Kushuhum, Ukraine

$50/hr
4.9
43 jobs

👋 Hi, I’m a 🏆 ⭐️⭐️⭐️⭐️⭐️ Top Rated Python Web Scraping and Data Mining Expert at GroupBWT - a 100-person engineering team delivering scalable, high-accuracy data extraction and web scraping systems since 2009.

I specialize in building end-to-end data mining and lead generation pipelines using tools like Python, Selenium, Playwright, and Puppeteer to extract structured information from complex, dynamic, and protected sources. Over the years, I’ve worked on systems where data mining and web scraping are core infrastructure components, not just scripts.

We help clients replace spreadsheets, manual monitoring, and unreliable sources with fully automated pipelines delivered into APIs, dashboards, or databases:
✅ Our Python Web Scraping and data mining systems process over 1.5 million URLs per day, with proxy rotation, retry logic, and real-time monitoring.
✅ We reduce reporting delays by 70% using real-time data mining and web scraping pipelines for inventory, pricing, and market tracking.
✅ Our systems collect and structure more than 15,000 news articles weekly using web scraping and data mining pipelines, converting unstructured content into usable insights.

🔁 What I do

📦 Python Web Scraping Data Mining
We build large-scale Python Web Scraping data mining systems that extract structured data from marketplaces, real estate platforms, B2B directories, social networks, and news sources.
✔ Product & Price Monitoring
✔ Competitor Analysis
✔ Business Directory extraction
✔ Lead Generation & Contact Discovery

🧹 Cleaning, Enrichment & Integration
We don’t stop at extraction - every data mining and Python Web Scraping pipeline includes cleaning, normalization, deduplication, and enrichment. We turn raw outputs into Data Analyst-ready datasets that integrate with CRMs, BI tools, and internal systems.
✔ Document & Market analysis
✔ Data matching and enrichment
✔ Classification and tagging

📤 Automation & Scalable Delivery
We design recurring web scraping data mining pipelines that run daily or hourly with full observability. Every web scraping data mining system includes monitoring, alerts, retries, and structured delivery into APIs, databases, or BI dashboards.
✔ Scheduled web scraping data mining pipelines
✔ Real-time dashboards
✔ Automated reporting feeds
✔ Lead Generation database creation and enrichment
✔ Custom bot developer solutions
✔ Data Analyst-ready dataset

🧾 Data Entry & Data Scraping Support
We provide structured Data Entry services:
✔ Lead Generation database creation and enrichment
✔ Data Entry for large-scale scraped datasets into spreadsheets, CRMs, and databases
✔ Data Entry automation for transforming unstructured data scraping outputs into structured records
✔ High-volume Data Entry handling for product, business, and directory information
✔ Data Entry validation to ensure accuracy and consistency across datasets
✔ Semi-automated Data Entry workflows

⚙️ Tech Stack:
Languages: Python, JavaScript, PHP
Databases: MySQL, PostgreSQL, MongoDB, DynamoDB, MSSQL, Snowflake
Frameworks: Scrapy, Twisted, Hadoop, DRF, Laravel, Apache Spark
Containerization: Docker, k8s
Message Brokers: RabbitMQ, Redis
AWS: EC2, SQS, RDS, S3, Lambda, Gateway, Neptune
Libraries: Celery, Puppeteer, NLTK, numpy, pandas, spaCy, TensorFlow, scikit-learn, React-PHP, Keras, sqlalchemy, Gensim, PyTorch
RPA & Workflow: Selenium, Playwright, UI automation, Slack/Google Sheets integrations, API workflows, Excel automation
Bot Developer: Custom bot developer | data scraper solutions
Platform: Databricks, Snowflake, BigQuery, Airflow, dbt, Redshift, Power BI

🧩 SAMPLE WEB SCRAPING & DATA MINING PROJECTS
✅ Built a platform covering 300+ marketplaces using Python Web Scraping and data mining, processing 60M+ records daily
✅ Built pricing intelligence system using data mining and Python Web Scraping across 120+ eCommerce domains
✅ Delivered news intelligence platform powered by web scraping data mining from 80+ global media sources
✅ Built a lead generation platform using Python scraping and web scraping data collection across multiple business directories, generating thousands of verified leads weekly

🤝 Why Clients Work With Us:

✅ Enterprise-Grade Web Scraping & Data Mining Systems
🔹 We build full-scale ecosystems, not scripts - every Python Web Scraping and data mining system includes monitoring, scaling, error recovery, and Data Analyst-ready outputs.
🔹 Clients reduce manual work by up to 60% using automated web scraping data mining pipelines
🔹 Result: 25-35% lower operational costs through stable web scraping data mining infrastructure

✅ Built to Scale
🔹 Systems handle 100M+ pages/month using web scraping data mining architecture
🔹 Designed for horizontal scaling of web scraping data mining workloads
🔹 Real-time updates through scheduled web scraping and data mining execution

🚀 Ready to turn data into structured intelligence using web scraping and data mining?
➡️ Invite me to your job and let’s build your pipeline

  • Data Scraping
  • Data Extraction
  • Robotic Process Automation
  • Data Mining
  • Python
  • ETL
  • ETL Pipeline
  • Web Scraping
  • Data Warehousing & ETL Software
  • Data Analysis
  • Data Science
  • Data Engineering
  • Data Analytics
  • Snowflake
  • Databricks Platform
  • Data Entry
  • Web Crawling
  • Beautiful Soup
  • Selenium
  • Microsoft Excel

How it works

Post a job for free

Tell us what you need. Create your own job post or generate one with AI then filter talent matches.

Hire top talent fast

Consult, interview, and hire quickly, so you can meet the freelancers you're excited about.

Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

Payment simplified

Manage payments in one place with flexible billing options. Only pay for approved work, hourly or by milestone.


How do I hire a Big Data Engineer on Upwork?

You can hire a Big Data Engineer on Upwork in four simple steps:

  • Create a job post tailored to your Big Data Engineer project scope. We’ll walk you through the process step by step.
  • Browse top Big Data Engineer talent on Upwork and invite them to your project.
  • Once the proposals start flowing in, create a shortlist of top Big Data Engineer profiles and interview them.
  • Hire the right Big Data Engineer for your project from Upwork, the world’s largest work marketplace.

At Upwork, we believe talent staffing should be easy.

How much does it cost to hire a Big Data Engineer?

Rates charged by Big Data Engineers on Upwork can vary with a number of factors including experience, location, and market conditions. See hourly rates for in-demand skills on Upwork.

Why hire a Big Data Engineer on Upwork?

As the world’s work marketplace, we connect highly skilled freelance Big Data Engineers with businesses and help them build trusted, long-term relationships so they can achieve more together. Let us help you build the dream Big Data Engineer team you need to succeed.

Can I hire a Big Data Engineer within 24 hours on Upwork?

Depending on availability and the quality of your job post, it’s entirely possible to sign up for Upwork and receive Big Data Engineer proposals within 24 hours of posting a job description.