Data Scraper & Engineer Needed: Web Scraping, Data Parsing (Nested to Flat), and QA

Posted 8 hours ago

Worldwide

Summary

Overview

We need a skilled Data Engineer / Web Scraper to extract, clean, and structure nutritional data from a public directory. Because this data will be used to generate commercial food labels, 100% data accuracy and meticulous attention to detail are absolute requirements.

Scope of Work

  • Automated Scraping: The target website hosts thousands of food items. However, there is no bulk download feature; each item's data must be exported individually as a CSV file. You will write a robust, respectful automation script (Python/Selenium/Playwright/Scrapy) to download all available item files without overloading the host servers.
  • Data Parsing & Flattening: The exported CSVs use a deeply nested, hierarchical classification system muddled into semicolon (;) separated formats. You will write a script (e.g., using Python/Pandas) to parse this hierarchical structure and flatten it into a clean, unified relational database format (SQL or structured JSON), capturing the lowest-level nutrient metrics available.
  • Quality Assurance (QA): Perform rigorous validation and spot-checks against the source website to ensure no numbers, decimal points, or units (grams, milligrams, kcal) were corrupted or shifted during the extraction and flattening process.

Required Skills

  • High proficiency in Web Scraping tools (Python, Selenium, Playwright, or Scrapy).
  • Strong background in Data Engineering and Data Cleaning (Pandas, NumPy).
  • Experience handling nested, hierarchical, or poorly formatted raw text/CSV files.
  • Extremely detail-oriented with a proven track record in Data QA/Validation.

To Apply, Please Provide:

  • A brief explanation of the tools/libraries you would choose for this specific workflow (scraping + flattening).
  • Your estimated timeline & budget to complete the scraping and cleaning phases.
  • Examples of past projects where you successfully scraped complex structures and delivered highly accurate, flattened datasets.

Note: A sample raw CSV and a screenshot of the target web layout are ready to be shared with shortlisted candidates for a precise technical assessment. Access to the portal may also be restricted by IP location; the candidate must resolve this without violating any law.
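
For illustration only, the flattening step described under Scope of Work might look roughly like the sketch below. The real export layout is not shown in this posting, so the sample is an assumption: it supposes each semicolon-separated row holds a nesting depth, a label, a value, and a unit. The function name `flatten` and the sample rows are hypothetical. Values are parsed with `Decimal` rather than `float` so that decimal points survive exactly as written in the source.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample rows; the actual CSV layout is an assumption.
# Assumed columns: depth;label;value;unit
SAMPLE = """\
0;Carbohydrates;12.50;g
1;Sugars;4.20;g
2;Fructose;1.05;g
2;Glucose;3.15;g
1;Fibre;2.0;g
0;Energy;250;kcal
"""


def flatten(text):
    """Return one flat record per lowest-level (leaf) nutrient row."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=";") if r]
    records = []
    path = []  # labels of the current row and its ancestors
    for i, (depth, label, value, unit) in enumerate(rows):
        depth = int(depth)
        # Drop labels from deeper or sibling branches, then add this one.
        path = path[:depth] + [label]
        # A row is a leaf when the following row is not nested deeper.
        next_depth = int(rows[i + 1][0]) if i + 1 < len(rows) else 0
        if next_depth <= depth:
            records.append({
                "path": " > ".join(path),
                "value": Decimal(value),  # exact decimal, no float rounding
                "unit": unit,
            })
    return records


if __name__ == "__main__":
    for record in flatten(SAMPLE):
        print(record["path"], record["value"], record["unit"])
```

Keeping the full ancestor path on every leaf record means each flat row can be traced back to its place in the source hierarchy during QA spot-checks.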

  • Less than 30 hrs/week
    Hourly
  • < 1 month
    Duration
  • Expert
    Experience Level
  • Remote Job
  • One-time project
    Project Type
Skills and Expertise
Mandatory skills
Data Scraping
Python
Data Extraction
Activity on this job
  • Proposals: 50+
  • Last viewed by client: 4 hours ago
  • Hires: 1
  • Interviewing: 0
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Nov 6, 2024
  • SAU
    Riyadh 10:50 PM
  • $905 total spent
    8 hires, 0 active
