You will get custom, RAG-structured, sourced QA pairs


Project details
I can produce a high-quality, RAG-structured QA dataset that is fully customized to your specific needs, with one minor constraint on context size (2,000 characters).
The work is performed by an in-house, custom foundation model designed for the task, followed by light automated cleaning. The final output serves as an enriched training substrate: a question and a subject classification are appended to each input, and the relationship between them is captured in a JSON structure.
This process creates a QA/RAG map of a given information source.
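As a rough illustration of the output described above, the sketch below splits a source text into passages of at most 2,000 characters and wraps each passage in a JSON record holding the passage, a question, and a subject label. The field names (`context`, `question`, `subject`) and all helper functions are hypothetical, and question/subject generation is stubbed out because the in-house model is not public.

```python
import json

MAX_CONTEXT_CHARS = 2000  # context limit stated in this listing


def chunk_text(text: str, max_chars: int = MAX_CONTEXT_CHARS) -> list[str]:
    """Split text into passages no longer than max_chars, breaking on spaces where possible."""
    passages = []
    remaining = text.strip()
    while remaining:
        if len(remaining) <= max_chars:
            passages.append(remaining)
            break
        # Prefer to cut at the last space within the limit; fall back to a hard cut.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars
        passages.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    return passages


def build_record(passage: str, question: str, subject: str) -> dict:
    """Hypothetical record layout linking a passage to its question and subject."""
    if len(passage) > MAX_CONTEXT_CHARS:
        raise ValueError("passage exceeds the context limit")
    return {"context": passage, "question": question, "subject": subject}


def generate_question_and_subject(passage: str) -> tuple[str, str]:
    """Placeholder for the in-house model (hypothetical stub, not the real generator)."""
    return ("What is this passage about?", "unclassified")


if __name__ == "__main__":
    source = "Photosynthesis converts light energy into chemical energy. " * 60
    records = []
    for passage in chunk_text(source):
        question, subject = generate_question_and_subject(passage)
        records.append(build_record(passage, question, subject))
    print(len(records), "records")
    print(json.dumps(records[0], indent=2)[:200])
```

Keeping each record self-contained (passage plus its question and label) means the dataset can be written out as JSON lines or loaded into a database table without any further joins.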
Machine Learning Tools
NumPy, Python, PyTorch, SQL, Tesseract OCR
What's included
| Service Tiers | Starter ($119.99) | Standard ($189.99) | Advanced ($249.99) |
|---|---|---|---|
| Delivery Time | 3 days | 6 days | 10 days |
| Number of Revisions | 0 | 0 | 0 |
| Model Validation/Testing | - | - | - |
| Model Documentation | - | - | - |
| Data Source Connectivity | - | - | - |
| Source Code | - | - | - |
Optional add-ons
Fast Delivery
+$49.99 - $199.99
Additional Revision
+$49.99
About Cameron
LLM/ML Data Engineer
Rycroft, Canada
AI developer and LLM engineer with end-to-end experience in custom GPT model design, synthetic dataset generation, and domain-specialized automation systems. Creator of a from-scratch 200M-parameter GPT model built on a custom GPT decoder architecture featuring SwiGLU activation, rotary positional embeddings (RoPE), learned positional embeddings (WPE), and adapter support (integration completed, training pending). Primary specializations are RAG and function calling, dialogue-generation pipelines, and SQLite-backed data orchestration, which have produced 18+ high-quality datasets across technical support, STEM education, drone telemetry, and embedded-systems troubleshooting.
This work is demonstrated on a live Discord AI server with real-time, domain-specific dialogue, and in a diverse range of procedurally generated and model-generated synthetic datasets on Hugging Face (e.g., Cosmopedia_QA_RAG_JSON_SQLite).
SUMMARY OF SKILLS
* Expert-level computer hardware and software knowledge
* Experienced in customer-service-oriented tech support, as well as AI model and software development
Tech Stack
* Linux: Ubuntu, Lubuntu, Kali, Android
* Windows: All versions up to Windows 10
* Server OS: Windows Server 2003 - 2012
* Microsoft Office: 2003 - 2010, Office 365
* Protocols: POP3, SMTP, UDP, TCP, SOAP
* Databases: SQLite, MySQL, MS Access
* Languages: Java (Selenium, HtmlUnit, jsoup), VB.NET, HTML5, CSS
Steps for completing your project
After purchasing the project, send requirements so Cameron can start the project.
Delivery time starts when Cameron receives requirements from you.
Cameron works on your project following the steps below.
Revisions may occur after the delivery date.
Aggregating Data Sources
Once the target information has been outlined, I begin processing the data and generating results.
Manual Review and Cleaning
I check the data for quality and consistency and confirm that it matches your expectations, updating the logic as necessary for better results.
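The automated side of this review step could look roughly like the sketch below: it drops records with missing or blank fields or over-length contexts, and removes questions that repeat an earlier one apart from case and surrounding whitespace. The record fields and the specific rules are assumptions for illustration only; the real checks would be tuned per project.

```python
MAX_CONTEXT_CHARS = 2000  # context limit stated in this listing
REQUIRED_FIELDS = ("context", "question", "subject")  # hypothetical record fields


def clean_records(records: list[dict]) -> tuple[list[dict], dict[str, int]]:
    """Return records that pass basic checks, plus counts of why others were dropped."""
    kept: list[dict] = []
    seen_questions: set[str] = set()
    dropped = {"missing_field": 0, "too_long": 0, "duplicate": 0}
    for record in records:
        # Every required field must be a non-blank string.
        if any(not isinstance(record.get(f), str) or not record[f].strip()
               for f in REQUIRED_FIELDS):
            dropped["missing_field"] += 1
            continue
        if len(record["context"]) > MAX_CONTEXT_CHARS:
            dropped["too_long"] += 1
            continue
        # Questions differing only in case or surrounding whitespace count as duplicates.
        key = record["question"].strip().lower()
        if key in seen_questions:
            dropped["duplicate"] += 1
            continue
        seen_questions.add(key)
        kept.append(record)
    return kept, dropped


if __name__ == "__main__":
    sample = [
        {"context": "Water boils at 100 C at sea level.", "question": "When does water boil?", "subject": "physics"},
        {"context": "Water boils at 100 C at sea level.", "question": "when does water boil? ", "subject": "physics"},
        {"context": "x" * 2500, "question": "Too long?", "subject": "misc"},
        {"context": "Some text.", "question": "", "subject": "misc"},
    ]
    kept, dropped = clean_records(sample)
    print(len(kept), dropped)  # 1 {'missing_field': 1, 'too_long': 1, 'duplicate': 1}
```

Returning the drop counts alongside the kept records makes it easy to see which rule is removing the most data, which is the kind of signal that guides the manual logic updates described above.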



