You will get Custom Text Dataset Creation for NLP & AI Training


Project details
I build custom text datasets for NLP and AI training in any domain. I specialize in low‑resource Philippine languages, but I also work with English and other high‑resource languages.
My proven pipeline: data sourcing (web scraping or your files) → cleaning & normalization → annotation (sentiment, classification, or custom labels) → quality validation → delivery in CSV/JSON/Parquet with source code and documentation.
Proof: I created HiliSenti v1, the first publicly available sentiment dataset for Hiligaynon (23,337 sentences, 93.5% accuracy, DOI: 10.57967/hf/8737, 117+ downloads).
I work async (chat/email only). Fixed‑price quotes. Free revisions included. Let me know your requirements.
Machine Learning Tools
BERT, ChatGPT, NLTK, NumPy, pandas, Python, Python Scikit-Learn, PyTorch, Scrapy, SQL, TensorFlow, TextBlob, Word2vec
What's included
| Service Tiers | Starter $40 | Standard $120 | Advanced $250 |
|---|---|---|---|
| Delivery Time | 2 days | 7 days | 14 days |
| Number of Revisions | 1 | 2 | 2 |
| Number of Model Variations | 0 | | |
| Model Validation/Testing | - | | |
| Model Documentation | - | | |
| Data Source Connectivity | - | - | - |
| Source Code | | | |
About Jessie James
Junior ML Engineer | NLP Researcher | Linux Administration
Hinigaran, Philippines
Steps for completing your project
After purchasing the project, send requirements so Jessie James can start the project.
Delivery time starts when Jessie James receives requirements from you.
Jessie James works on your project following the steps below.
Revisions may occur after the delivery date.
1. Data Sourcing
Collect raw text from provided files, web scraping, or APIs. Deliver a raw data log.
2. Data Cleaning & Normalization
Deduplicate, fix encoding, normalize text (lowercase, punctuation, abbreviation expansion). Provide cleaned dataset.
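As a rough illustration of the cleaning step described above, here is a minimal Python sketch using only the standard library. It is not the actual pipeline code: the `ABBREVIATIONS` map and the `normalize` and `clean` function names are hypothetical, Unicode NFKC normalization stands in for "fix encoding", and punctuation handling is limited to collapsing repeated marks. A real project would adapt each rule to the target language.

```python
import re
import unicodedata

# Hypothetical abbreviation map (assumption); a real project would use
# a list tailored to the domain and language.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}


def normalize(text: str) -> str:
    """Normalize one sentence: Unicode form, case, abbreviations, punctuation."""
    # NFKC folds compatibility characters (e.g. full-width letters) into
    # their standard forms; used here as a simple stand-in for encoding fixes.
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    # Expand abbreviations token by token; split() also collapses whitespace.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    text = " ".join(tokens)
    # Collapse runs of the same punctuation mark, e.g. "!!!" becomes "!".
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    return text.strip()


def clean(records):
    """Normalize every record, then drop empties and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in records:
        norm = normalize(raw)
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned


sample = ["Dr. Cruz is here!!!", "dr. cruz is here!", "   ", "Ｈello"]
result = clean(sample)
print(result)
```

Deduplicating after normalization, rather than before, lets near-identical lines that differ only in case or repeated punctuation count as one record. Whether that is desirable depends on the annotation task.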
