You will get an AI agent trajectory audit and evaluation
Rising Talent

Project details
If your AI agent works in demos but behaves unpredictably in production, you do not have an AI problem; you have an evaluation problem.
I evaluate AI agent trajectories, tool usage, task completion quality, and workflow efficiency using rubric-based analysis informed by real-world production engineering experience. My background as a senior software engineer allows me to assess not only whether an agent succeeds, but whether it follows the correct workflow to get there efficiently and reliably.
This service is designed for teams already running AI agents who need structured evaluation, trajectory analysis, and actionable reliability insights. I review real workflows, analyze tool-call behavior against expected trajectories, identify failure patterns, and provide clear recommendations to improve agent performance and operational reliability.
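To illustrate what rubric-based analysis can look like in practice, here is a minimal sketch of a weighted rubric score for a single agent run. The criterion names, the weights, and the `rubric_score` helper are all hypothetical and chosen only for illustration; a real rubric would be defined per project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative importance (illustrative value, not a standard)


# Hypothetical rubric: names and weights are assumptions for this sketch.
RUBRIC = [
    Criterion("task_completed", 0.4),
    Criterion("correct_tools_used", 0.3),
    Criterion("steps_in_expected_order", 0.2),
    Criterion("no_redundant_calls", 0.1),
]


def rubric_score(ratings: dict[str, float], rubric: list[Criterion] = RUBRIC) -> float:
    """Return the weighted average of per-criterion ratings, each in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    weighted_sum = 0.0
    for criterion in rubric:
        rating = ratings[criterion.name]  # KeyError if a criterion was not rated
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating for {criterion.name!r} must be in [0, 1]")
        weighted_sum += criterion.weight * rating
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Example: the agent finished the task but made redundant tool calls.
    run = {
        "task_completed": 1.0,
        "correct_tools_used": 1.0,
        "steps_in_expected_order": 0.5,
        "no_redundant_calls": 0.0,
    }
    print(f"rubric score: {rubric_score(run):.2f}")
```

Scoring each criterion separately keeps a successful outcome from hiding a poor workflow: a run can complete the task and still lose points for sequencing or redundant calls.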
AI Development Type
Knowledge Representation, Model Tuning, Software Maintenance
AI Tools
Amazon SageMaker, Azure Machine Learning, Keras, MLflow, PyTorch, TensorFlow
AI Development Language
Python
What's included $120
These options are included with the project scope.
- Delivery Time 5 days
- Number of Revisions 2
- AI Model Integration
- Knowledge Graph
- Model Documentation
About Semih
Senior Full-stack Developer | E-commerce, Logistics, Custom Back-offic
Berlin, Germany
What I built:
• Beliaa Shop (beliaashop com): Aftermarket auto spare parts marketplace built on Medusa, Next.js, React. Custom vehicle-to-parts matching logic at scale.
• Trends Budget (trends-budget com): White-label e-commerce platform with full operations back-office (customer service, logistics, finance, expenses). Built in Laravel.
• Syal Express (syal-express com): End-to-end cargo logistics system handling waybill lifecycle, COD reconciliation, and delivery tracking. Built in Laravel.
Stack: Next.js, React, Node.js, Laravel, PHP, Medusa.
Bonus: I run my own production infrastructure. I can deploy and host your project on my managed VPS, free of charge for the first few months of a development engagement. I also offer standalone hosting and deployment services: migrations from expensive cloud platforms, fixes for broken deployments, and ongoing managed hosting from 25 EUR/month per app.
Available immediately.
Steps for completing your project
After purchasing the project, send requirements so Semih can start the project.
Delivery time starts when Semih receives requirements from you.
Semih works on your project following the steps below.
Revisions may occur after the delivery date.
Review Your Agent Workflow
Share your AI agent architecture, workflows, tools, and example traces or conversations.
Trajectory & Tool Evaluation
I analyze task completion, tool usage, workflow sequencing, and efficiency across real interactions.
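As a rough illustration of the tool usage, sequencing, and efficiency checks in this step, the sketch below compares the tool calls an agent actually made against an expected trajectory. The `evaluate_trajectory` function, the `TrajectoryReport` fields, and the tool names are hypothetical, and the efficiency ratio is one simple assumed definition, not a standard metric.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrajectoryReport:
    missing_tools: list[str]     # expected tools the agent never called
    unexpected_tools: list[str]  # tools called that are not in the expected trajectory
    in_order: bool               # expected steps appear in order within the actual calls
    efficiency: float            # expected call count / actual call count, capped at 1.0


def _is_subsequence(expected: list[str], actual: list[str]) -> bool:
    """True if every expected step appears in `actual` in the same relative order."""
    remaining = iter(actual)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step must be found after the previous one.
    return all(step in remaining for step in expected)


def evaluate_trajectory(expected: list[str], actual: list[str]) -> TrajectoryReport:
    """Compare an agent's actual tool-call sequence with the expected one."""
    missing = [tool for tool in expected if tool not in actual]
    unexpected = sorted(set(actual) - set(expected))
    # Efficiency only reflects call count; read it together with missing_tools.
    efficiency = min(1.0, len(expected) / len(actual)) if actual else 0.0
    return TrajectoryReport(
        missing_tools=missing,
        unexpected_tools=unexpected,
        in_order=_is_subsequence(expected, actual),
        efficiency=efficiency,
    )


if __name__ == "__main__":
    # Hypothetical tool names for a refund workflow.
    expected = ["search_orders", "get_order", "issue_refund"]
    actual = ["search_orders", "search_orders", "get_order", "send_email", "issue_refund"]
    print(evaluate_trajectory(expected, actual))
```

In this example the agent reaches the right outcome in the right order, but the repeated `search_orders` call and the extra `send_email` call lower the efficiency ratio to 0.6, which is the kind of pattern a trajectory review is meant to surface.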




