You will get a cleaned, structured, chunked dataset for your LLM / RAG pipeline
Rising Talent

Project details
Building a RAG app or fine-tuning a model? I turn your raw files, public records, or open datasets into a clean, structured, ready-to-ingest dataset — deduplicated, normalized, chunked, and schema-consistent, stripping out the junk that triggers hallucinations.
You provide the source — raw files, a public dataset, or open-repository links — and choose the output format (JSON, JSONL, CSV), chunk size, and metadata fields. I clean, normalize, dedupe, and structure the data so it drops straight into your pipeline. Backed by 13 years in software QA, the data is programmatically validated before it reaches your model; anything not publicly available is flagged, never invented.
Proof: I built and run a live data product — 40,000+ deduplicated records, automated nightly. Send a sample or describe your source and I will return a small, ingest-ready sample first.
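The clean, dedupe, and chunk steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for delivery: the function names (`normalize`, `chunk_words`, `build_records`, `to_jsonl`), the word-based chunking, and the record fields (`id`, `source`, `chunk_index`, `text`) are all assumptions chosen for the example.

```python
import hashlib
import json
import re
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size words (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_records(docs, chunk_size: int = 200, overlap: int = 0) -> list[dict]:
    """docs is an iterable of (source_id, raw_text) pairs.

    Documents whose normalized, lowercased text is identical are treated
    as exact duplicates and skipped. Near-duplicate detection is out of
    scope for this sketch.
    """
    seen = set()
    records = []
    for source_id, raw in docs:
        clean = normalize(raw)
        if not clean:
            continue  # drop empty documents
        digest = hashlib.sha256(clean.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        for i, chunk in enumerate(chunk_words(clean, chunk_size, overlap)):
            records.append({
                "id": f"{source_id}-{i}",
                "source": source_id,
                "chunk_index": i,
                "text": chunk,
            })
    return records


def to_jsonl(records) -> str:
    """Serialize records as JSONL, one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Chunking by word count keeps the sketch dependency-free. A real job might chunk by tokens or by document structure instead, depending on the embedding model and the spec the client provides.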
Data Tool: Python
What's included
| Service Tiers | Starter ($70) | Standard ($150) | Advanced ($300) |
|---|---|---|---|
| Delivery Time | 3 days | 4 days | 7 days |
| Number of Revisions | 1 | 2 | 2 |
3 reviews
This project doesn't have any reviews.
Alejandro D.
Jun 28, 2026
Add AI-Generated Renovation Brief to Existing Google Slides Automation
Great working with Andrii
Robert N.
Jun 12, 2026
General Ledger Transaction Matching
Andrii performed work well, came up with intelligent suggestions and responses to my suggestions. Very quick and professional.
Scott B.
Jun 8, 2026
U.S. Pet Organization Data Acquisition
It is always a challenge picking from 40+ contractors for a job. Andrii was excellent right from the start and was a great choice. Not only did he communicate clearly every step of the way, he went above and beyond in the final delivery of the web data extract that I needed. I will go directly to Andrii to see if he can handle my future work needs (on Upwork, of course) before I search anyone else, he was that good. Choose him, you will not be disappointed.
About Andrii
Web Apps & Data Pipelines | Web Scraping | B2B Lead Lists & Python
100% Job Success
Kyiv, Ukraine
WEB APPS & DASHBOARDS - software that puts your data to work:
I build the full web side in Python (FastAPI) and React on PostgreSQL: searchable tools, dashboards, admin panels, and data-backed sites, built end-to-end. I collect and clean the data, then build the app that serves it - so you get one person who owns the whole chain instead of a file handed off to a separate developer. This is the exact stack powering my live portfolio project.
WHAT YOU GET - a clean, ready-to-use dataset:
For lead lists: company or organization, named decision-maker and title, published email, phone, website, address, and LinkedIn. Deduplicated, format-checked, and source-verified. Every email is run through deliverability verification so bounce-prone addresses are dropped, not shipped. Where a data point is not publicly available, I flag it as "not found" rather than guess or invent it.
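To illustrate the "flag it, never invent it" rule, here is a small sketch of how a lead record could be assembled. The field names, the `NOT_FOUND` marker, and the helpers `build_lead` and `dedupe_leads` are hypothetical. The regular expression is only a rough format check and does not stand in for the deliverability verification mentioned above, which requires an external service.

```python
import re

NOT_FOUND = "not found"  # explicit marker for data that is not publicly available

# Assumed field set, based on the lead-list description above.
FIELDS = (
    "organization", "contact_name", "title", "email",
    "phone", "website", "address", "linkedin",
)

# Rough shape check only: something@something.tld with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def build_lead(raw: dict) -> dict:
    """Return a record with every field present; gaps are flagged, never guessed."""
    lead = {}
    for field in FIELDS:
        value = (raw.get(field) or "").strip()
        lead[field] = value if value else NOT_FOUND
    if lead["email"] != NOT_FOUND:
        email = lead["email"].lower()
        # A malformed address is dropped rather than shipped.
        lead["email"] = email if EMAIL_RE.match(email) else NOT_FOUND
    return lead


def dedupe_leads(leads: list[dict]) -> list[dict]:
    """Keep the first lead for each (organization, email) pair, case-insensitively."""
    seen = set()
    unique = []
    for lead in leads:
        key = (lead["organization"].casefold(), lead["email"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(lead)
    return unique
```

Because every record carries the same keys, a missing phone number shows up as an explicit "not found" value instead of a silently absent column.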
HOW I GUARANTEE IT - the QA pipeline:
Most data work fails on quality: duplicates, wrong contacts, emails that bounce and burn your sending domain, silent gaps. My background in rigorous software QA is the difference. I build a real collection-and-verification pipeline - gather from public sources, clean, dedupe, cross-check against the live source, and verify - so what you get is accurate, current, and ready to use. I am honest about realistic coverage (typically 40-55% published emails for public-only sources) and I will steer you away from anyone promising "90k valid emails", because lists like that tend to be full of spam traps that hurt your sending domain.
DATA PIPELINES & AUTOMATION - need data moved and transformed on a schedule?
I build Python pipelines that pull from sites, APIs, and PDFs, clean and validate, and load into your sheet, CRM, or database. I use LLMs only where they raise accuracy on messy extraction, always with QA on the output - never a raw model dump.
SCRAPER RESCUE - already have a scraper that broke?
If your Python scraper died, got blocked, or started returning garbage after a site change, I diagnose the real cause (layout shift, anti-bot, rate limits, parser bug), fix it, and validate the output on real data. I pin down the root cause fast and give you an honest timeline before any fix work - many issues clear within a day or two, and I tell you up front if yours is the harder kind. I do not bypass paywalls, logins, or CAPTCHAs.
PROOF (see my portfolio):
My live portfolio piece is a cultural-events aggregator I built and run end-to-end: it collects from dozens of public sources into PostgreSQL, validates them, and serves a fast, searchable React site with maps. 40,000+ deduplicated records, refreshed automatically by a nightly pipeline. The full stack - collect, clean, validate, store, and serve - running in production.
Toolkit: Python (FastAPI, Scrapy, Playwright, BeautifulSoup, Selenium, Pandas), React, PostgreSQL, REST APIs, Google Places API, email-deliverability verification, LLM enrichment (Claude / OpenAI / Gemini).
I am building my reputation on Upwork, so my focus is 100% on your result. Want proof before you commit? Tell me your project - a web build, a target audience, or a broken scraper - and I will send a small working sample or a quick diagnosis first.
Steps for completing your project
After purchasing the project, send requirements so Andrii can start the project.
Delivery time starts when Andrii receives requirements from you.
Andrii works on your project following the steps below.
Revisions may occur after the delivery date.
Share your source and spec
You provide raw files or open-repository links and specify output format, chunk size, and metadata fields.
Clean, structure, and chunk
I deduplicate, normalize, validate, and chunk the data into a schema-consistent, ingest-ready format.
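A programmatic check of the delivered file might look like the sketch below. The required fields in `REQUIRED`, the `validate_jsonl` function, and the optional word limit are assumptions for illustration. The actual schema would follow the output format, chunk size, and metadata fields agreed in the first step.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED = {"id": str, "text": str, "chunk_index": int}


def validate_jsonl(lines, required=REQUIRED, max_words=None):
    """Return a list of (line_number, message) problems; an empty list means clean."""
    errors = []
    seen_ids = set()
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(rec, dict):
            errors.append((n, "record is not an object"))
            continue
        for key, typ in required.items():
            if key not in rec:
                errors.append((n, f"missing field: {key}"))
                continue
            value = rec[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, typ):
                errors.append((n, f"wrong type for field: {key}"))
        rec_id = rec.get("id")
        if isinstance(rec_id, str):
            if rec_id in seen_ids:
                errors.append((n, f"duplicate id: {rec_id}"))
            seen_ids.add(rec_id)
        text = rec.get("text")
        if max_words is not None and isinstance(text, str):
            if len(text.split()) > max_words:
                errors.append((n, f"text exceeds {max_words} words"))
    return errors
```

Returning every problem with its line number, rather than stopping at the first one, makes it easier to review a whole file in a single pass before it is handed over.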