Data Harvesting Specialist (Senior Level)

Posted 2 days ago

Worldwide

Summary

We are looking for a Senior Data Harvesting and Automation Engineer who can build strong, reliable systems that collect large amounts of information from online sources. The goal is to turn raw online data into clean, organized, useful information that our teams can use for automation, analytics, sales, research, and decision-making.

In this role, you will design full data pipelines. This means you will collect the data, clean it, enrich it, organize it, store it, and then send it to the tools our teams use every day. To do this well, you must be very comfortable working with Python, Playwright, Selenium, Scrapy, and modern data-handling tools. You should also know how to work with databases, APIs, and ETL processes.

The systems you build must handle JavaScript-heavy websites, login pages, dynamic content, changing layouts, and anti-automation defenses. You will create pipelines that run on their own, update themselves, and recover from minor failures without stopping.

We want someone who thinks about long-term solutions, not quick fixes. You should be able to plan, design, test, and maintain full data systems that remain reliable as they scale and handle more data. The work you do will support our automation systems, AI tools, dashboards, and market-research programs across many industries.

This job is perfect for someone who enjoys solving complex problems, building strong data systems, and making information easy for others to use.

Deliverables
1. Data Harvesting Systems
  • Build high-volume data harvesting tools using Python, Playwright, Selenium, Scrapy, and similar libraries.
  • Handle dynamic websites, multi-step forms, login pages, and heavy JavaScript content.
  • Create pipelines that collect data in real time, on schedules, or in large batches.
  • Use methods like proxy rotation, session management, and browser automation for reliability.
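As a rough illustration of the proxy-rotation point above: at its simplest, rotation is a cycling pool that each outgoing request draws from. The proxy URLs below are placeholders, and a production harvester would add health checks and per-session stickiness on top of this.

```python
from itertools import cycle

# Placeholder proxy endpoints; a real pool would come from a proxy provider's API.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]
_pool = cycle(PROXIES)

def next_proxy() -> str:
    """Return the next proxy so consecutive requests exit from different IPs."""
    return next(_pool)
```

Each browser context or HTTP session would be handed `next_proxy()` at creation time, spreading load across exit IPs.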
2. Automated ETL and Data Enrichment
  • Build ETL workflows that clean, organize, validate, and enrich data.
  • Standardize messy information into clear formats.
  • Add extra value through enrichment such as categories, tags, metadata, and location details.
  • Ensure the final datasets are clean, accurate, and ready to use.
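One hedged sketch of what "standardize and enrich" can look like in practice; the field names and the category rule here are illustrative placeholders, not a prescribed schema.

```python
def clean_record(raw: dict) -> dict:
    """Normalize a messy scraped record and attach a simple enrichment tag."""
    name = " ".join(raw.get("name", "").split()).title()  # collapse whitespace, fix casing
    phone = "".join(ch for ch in raw.get("phone", "") if ch.isdigit())  # digits only
    record = {"name": name, "phone": phone}
    # Toy enrichment rule; real pipelines would use lookup tables or classifiers.
    record["category"] = "retail" if "shop" in name.lower() else "other"
    return record
```

For example, `clean_record({"name": "  acme   SHOP ", "phone": "(555) 123-4567"})` yields `{"name": "Acme Shop", "phone": "5551234567", "category": "retail"}`.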
3. API and System Integrations
  • Build REST APIs that deliver processed data to internal tools.
  • Connect harvested data to systems such as CRMs, dashboards, and automation platforms.
  • Design schemas for leads, companies, products, service categories, and multi-industry datasets.
  • Maintain consistent data flow from pipelines into our operational tools.
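For instance, a lead schema can start as a typed record that serializes cleanly for an internal REST endpoint. The fields here are assumptions about what a lead record might carry, not the actual schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class Lead:
    """Illustrative lead record; the field set is a placeholder, not the real schema."""
    company: str
    domain: str
    industry: str
    source_url: str

# Serialize to a plain dict, ready to return as JSON from an API endpoint.
payload = asdict(Lead(company="Acme Co", domain="acme.example",
                      industry="manufacturing", source_url="https://acme.example/about"))
```

Keeping the schema in one typed definition like this gives the CRM sync, the dashboard feed, and the API one shared contract.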
4. Database Architecture
  • Create and manage SQL and NoSQL databases.
  • Build data models that support fast reading, writing, and updating.
  • Use indexing and caching techniques for better performance.
  • Keep data stored safely, clearly, and logically.
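A minimal sketch of the indexing idea using Python's built-in sqlite3 (table and column names are made up): an index on the lookup column keeps reads fast as the table grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, domain TEXT, score REAL)")
conn.execute("CREATE INDEX idx_leads_domain ON leads (domain)")  # speeds up domain lookups
conn.executemany("INSERT INTO leads (domain, score) VALUES (?, ?)",
                 [("acme.example", 0.9), ("widgets.example", 0.4)])
row = conn.execute("SELECT score FROM leads WHERE domain = ?", ("acme.example",)).fetchone()
```

`EXPLAIN QUERY PLAN` on that `SELECT` confirms the engine searches via `idx_leads_domain` instead of scanning the whole table.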
5. Monitoring and System Stability
  • Build monitoring tools that check pipeline health, speed, and accuracy.
  • Set up alerts for errors, slowdowns, or unusual behavior.
  • Add retry systems, backup methods, and fail-safe processes.
  • Keep pipelines running smoothly with little manual work.
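The retry idea above can be sketched as a small wrapper with exponential backoff (delays shortened here for illustration); a real pipeline would also log each failure and fire an alert when the last attempt is exhausted.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to monitoring/alerting
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, ...
```

Wrapping each fetch or write step this way lets transient network errors heal themselves with no manual intervention.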
6. Documentation
  • Write clear instructions and explanations for each pipeline and tool.
  • Create diagrams showing how data moves through the system.
  • Document best practices so the system can grow over time.
  • Keep code and SOPs organized and easy for others to understand.
7. Team and Data Support
  • Work with analytics, automation, marketing, and sales teams.
  • Provide clean data that powers dashboards, workflows, and research.
  • Help teams use the data to improve decision-making and strategy.
  • Support multi-industry intelligence projects with accurate, timely information.
  • Hourly: more than 30 hrs/week
  • Duration: 6+ months
  • Experience Level: Expert
  • Rate: $50.00 - $100.00 /hr
  • Remote Job
  • Project Type: Complex project
Skills and Expertise
Mandatory skills
Data Analysis
Data Science
Nice-to-have skills
Data Mining
Big Data
Tools
Apache Spark
Python
SQL
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: 2 days ago
  • Interviewing: 3
  • Invites sent: 3
  • Unanswered invites: 0
About the client
Member since Dec 5, 2022
  • United States (Canton)
  • $114K total spent
  • 61 hires, 29 active
  • 7,424 hours
