Data Scraper & Engineer Needed: Web Scraping, Data Parsing (Nested to Flat), and QA
Worldwide
Overview
We need a skilled Data Engineer / Web Scraper to extract, clean, and structure nutritional data from a public directory. Because this data will be used to generate commercial food labels, 100% data accuracy and meticulous attention to detail are absolute requirements.
Scope of Work
- Automated Scraping: The target website hosts thousands of food items, but there is no bulk download feature; each item's data must be exported individually as a CSV file. You will write a robust, respectful automation script (Python/Selenium/Playwright/Scrapy) to download all available item files without overloading the host servers.
- Data Parsing & Flattening: The exported CSVs use a deeply nested, hierarchical classification system packed into semicolon-separated (;) fields. You will write a script (e.g., using Python/Pandas) to parse this hierarchical structure and flatten it into a clean, unified relational format (SQL or structured JSON), capturing the lowest-level nutrient metrics available.
- Quality Assurance (QA): Perform rigorous validation and spot-checks against the source website to ensure that no numbers, decimal points, or units (grams, milligrams, kcal) were corrupted or shifted during extraction and flattening.
Required Skills
- High proficiency in web scraping tools (Python, Selenium, Playwright, or Scrapy).
- Strong background in data engineering and data cleaning (Pandas, NumPy).
- Experience handling nested, hierarchical, or poorly formatted raw text/CSV files.
- Extremely detail-oriented, with a proven track record in data QA/validation.
To Apply, Please Provide:
- A brief explanation of the tools/libraries you would choose for this specific workflow (scraping + flattening).
- Your estimated timeline and budget to complete the scraping and cleaning phases.
- Examples of past projects where you successfully scraped complex structures and delivered highly accurate, flattened datasets.
Note: A sample raw CSV and a screenshot of the target web layout are ready to be shared with shortlisted candidates for a precise technical assessment. Access to the portal may also be restricted based on IP location; the candidate must resolve this without violating any laws.
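For candidates thinking through the parsing and flattening step, here is a minimal sketch. The real export layout is unknown until the sample CSV is shared, so everything about the format is an assumption: a semicolon-delimited file whose first `DEPTH` cells encode the hierarchy level by position, followed by a value cell and a unit cell. The names `DEPTH`, `SAMPLE`, `flatten`, and the output keys are hypothetical. Values are parsed with `Decimal` rather than `float` so that digits and decimal points are not altered in transit.

```python
import csv
import io
from decimal import Decimal

DEPTH = 3  # assumed number of hierarchy columns (hypothetical)

# Hypothetical export: the position of the name cell gives the nesting level.
SAMPLE = """Energy;;;250;kcal
Fat;;;10.5;g
;Saturated;;3.2;g
;Unsaturated;;;
;;Monounsaturated;4.1;g
"""

def flatten(raw_text, item_id):
    """Return one flat record per lowest-level (leaf) nutrient row."""
    nodes = []  # (path tuple, value string, unit) in file order
    path = []
    for row in csv.reader(io.StringIO(raw_text), delimiter=";"):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        row = row + [""] * (DEPTH + 2 - len(row))  # pad short rows
        levels = [cell.strip() for cell in row[:DEPTH]]
        filled = [i for i, cell in enumerate(levels) if cell]
        if not filled:
            raise ValueError(f"row has no nutrient name: {row!r}")
        level = filled[0]
        # Truncate the running path to the parent, then append this node.
        path = path[:level] + [levels[level]]
        nodes.append((tuple(path), row[DEPTH].strip(), row[DEPTH + 1].strip()))

    records = []
    for i, (node_path, value, unit) in enumerate(nodes):
        # A node is a parent if the next row sits deeper in the hierarchy.
        has_child = i + 1 < len(nodes) and len(nodes[i + 1][0]) > len(node_path)
        if has_child or not value:
            continue
        records.append({
            "item_id": item_id,
            "nutrient_path": " > ".join(node_path),
            "value": Decimal(value),  # raises on malformed numbers
            "unit": unit,
        })
    return records

records = flatten(SAMPLE, "item-001")
for record in records:
    print(record)
```

Failing loudly on malformed numbers or unnamed rows, instead of silently coercing them, suits the QA requirement: every rejected row becomes an explicit item to check against the source page.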
- Hourly: Less than 30 hrs/week
- Duration: < 1 month
- Experience Level: Expert
- Remote Job
- Project Type: One-time project
Activity on this job
- Proposals: 50+
- Last viewed by client: 1 hour ago
- Hires: 1
- Interviewing: 0
- Invites sent: 0
- Unanswered invites: 0
About the client
- SAU, Riyadh, 8:11 PM
- $905 total spent; 8 hires, 0 active