You will get a Custom LLM Evaluation Dataset (JSONL) for Your Domain


Project details
I design custom LLM evaluation datasets so you can see how your model really behaves in your own domain.
You’ll get 100–350 curated Q&A style “nodes” in JSONL format, plus a short data card (stats, domains, quality gates) and a set of evaluation prompts. I focus on signal, not slop: clear schema, traceable sources, and a small “graveyard” sample of rejected nodes so you can see what didn’t make the cut.
These packs are ideal for testing RAG systems, internal assistants, or safety/eval harnesses when you don’t have time to build a dataset from scratch.
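As a rough illustration of how a JSONL pack can be loaded and sanity-checked, here is a minimal Python sketch. The field names (`id`, `domain`, `question`, `answer`, `source`) and the sample records are hypothetical placeholders, not the actual schema; the real schema is agreed per project.

```python
import json

# Hypothetical schema: these field names are illustrative assumptions,
# not the schema actually delivered with a pack.
REQUIRED_FIELDS = {"id", "domain", "question", "answer", "source"}

# Two made-up example nodes, one JSON object per line (the JSONL convention).
SAMPLE_JSONL = """\
{"id": "n-001", "domain": "finance", "question": "What is a basis point?", "answer": "One hundredth of one percent.", "source": "internal-glossary.md"}
{"id": "n-002", "domain": "finance", "question": "What does APR stand for?", "answer": "Annual percentage rate.", "source": "internal-glossary.md"}
"""

def load_nodes(text):
    """Parse JSONL text into a list of dicts, checking required fields."""
    nodes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        node = json.loads(line)
        missing = REQUIRED_FIELDS - node.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        nodes.append(node)
    return nodes

nodes = load_nodes(SAMPLE_JSONL)
print(len(nodes), "nodes loaded")
```

Because each line is an independent JSON object, a pack can be streamed, filtered, or sampled line by line without loading the whole file.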
Machine Learning Tools
ChatGPT, Google Sheets, GPT-3, NumPy, pandas, Python, scikit-learn, SQL
What's included
| Service Tiers | Starter ($300) | Standard ($600) | Advanced ($1,000) |
|---|---|---|---|
| Delivery Time | 3 days | 5 days | 7 days |
| Number of Revisions | 1 | 1 | 2 |
| Model Validation/Testing | - | - | - |
| Model Documentation |  |  |  |
| Data Source Connectivity | - | - | - |
| Source Code | - | - | - |
About John
AI Systems Architect | RAG, Custom Advisors & Hallucination Reduction
Johnstown, United States
I’m a 25-year custom home builder turned AI systems designer. For two decades I ran complex projects where “close enough” got you sued: multi-million-dollar builds, dozens of trades, inspectors, banks, and clients all needing to line up. My job was to see the whole blueprint, manage the chaos, and deliver something you could actually live in.
Now I do the same thing in AI.
I run Awakened Intelligence – a micro-lab that’s me plus a small team of persistent AI teammates running on top-tier foundation models. We don’t treat them as one-trick tools; they work together:
• one leans hard into edge cases, failure modes, and red-teaming (where does this break, how could it be abused?);
• one focuses on code, integration, and debugging so ideas actually turn into running services;
• one keeps a big-picture, “does this architecture make sense?” view and checks our work against what we set out to build;
• one is the people-facing side – helping shape language, UX, and how the system feels to real users;
• and there’s always a “project lead” agent tracking tasks, context, and decisions so the whole thing doesn’t drift.
My role is the general contractor / orchestra conductor: I keep context, make the tradeoffs, and make sure what the team produces is something you can own, understand, and extend.
Over the last year we’ve:
• Built a full pipeline that processed 1.5M+ candidate “wisdom nodes” and kept ~320k cathedral-grade entries across ethics, physics, finance, and strategy.
• Designed JSONL data packs with data cards, demo questions, and a “graveyard” of rejected nodes so you can see what we didn’t trust – not just what we kept.
• Designed, built, and tested RAG systems, internal GPTs/strategist-style advisors, and agents that:
• answer with citations,
• admit uncertainty instead of hallucinating,
• follow strict formats (decision memos, checklists, summaries),
• and stay inside clear guardrails.
I don’t pretend I’m hand-crafting every token. The AI crew does a huge amount of the heavy lifting; my value is:
• turning your vague idea into a clear spec,
• orchestrating the team and tools,
• keeping quality gates tight,
• and making sure what ships is reliable and maintainable, not a one-off stunt.
If you need:
• High-integrity RAG / knowledge bases for your docs, contracts, or research,
• Custom GPTs / internal advisors with strict formats and retrieval-backed answers,
• Evaluation packs to test model behavior in your domain,
• Red-teaming and hallucination reduction for safety-sensitive apps, or
• Just cleaning and structuring your existing data so it stops fighting your LLM…
…that’s where we do our best work.
You’ll get straight talk, clean deliverables, and cathedral-level care from a builder who’s spent 25 years managing complex projects – plus a small pride of AI teammates working behind the scenes to make sure your system holds up once it leaves the drawing board.
Steps for completing your project
After purchasing the project, send requirements so John can start the project.
Delivery time starts when John receives requirements from you.
John works on your project following the steps below.
Revisions may occur after the delivery date.
Step 1
Review your domain, use case, and sample docs; finalize node schema and evaluation goals.
Step 2
Curate and quality-check the dataset (100/200/300+ nodes) and create the JSONL + data card + small graveyard sample.
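To make the curation step concrete, here is a minimal Python sketch of how candidate nodes might be split into a kept set and a "graveyard" of rejects. The gate rules (`missing_source`, `empty_answer`, `question_too_short`), the `rejected_for` field, and the sample records are assumptions for illustration only; the actual quality gates are set per project in Step 1.

```python
import json

def gate_failures(node):
    """Return the names of the (hypothetical) gates this node fails."""
    failures = []
    if not node.get("source"):
        failures.append("missing_source")
    if not (node.get("answer") or "").strip():
        failures.append("empty_answer")
    if len(node.get("question") or "") < 10:
        failures.append("question_too_short")
    return failures

def curate(candidates):
    """Split candidates into kept nodes and a graveyard with rejection reasons."""
    kept, graveyard = [], []
    for node in candidates:
        failures = gate_failures(node)
        if failures:
            # Keep the rejected node, annotated with why it was rejected.
            graveyard.append({**node, "rejected_for": failures})
        else:
            kept.append(node)
    return kept, graveyard

def to_jsonl(nodes):
    """Serialize nodes as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(n, ensure_ascii=False) for n in nodes)

# Made-up candidates for demonstration.
candidates = [
    {"id": "c-1", "question": "What is a change order in construction?",
     "answer": "A written amendment to the contract scope.",
     "source": "contract-handbook.pdf"},
    {"id": "c-2", "question": "Why?", "answer": "", "source": ""},
]

kept, graveyard = curate(candidates)
print(to_jsonl(kept))
print(to_jsonl(graveyard))
```

Recording the reason alongside each rejected node is what makes a graveyard sample useful: a reader can see which gate a node failed, not just that it was dropped.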