Turkish AI-Generated Text Detection

Posted 2 weeks ago

Worldwide

Summary

We are seeking a skilled freelancer to improve our existing (or a newly created) machine learning model for detecting AI-generated text in Turkish. The ideal candidate has experience in natural language processing and machine learning, with a focus on improving model accuracy and efficiency. The work is done in Python and requires a strong understanding of AI-generated text detection techniques.

---

**What we have built:**

We have a working alpha system that detects AI-generated text in Turkish academic documents. The current pipeline:

- Fine-tuned XLM-RoBERTa (xlm-roberta-base) on a labeled Turkish dataset
- Chunk-based inference with sliding-window sentence scoring
- 94.2% accuracy on definitive decisions in production testing (unfortunately, real-world results are poor)
- Trained on 5,000 labeled texts across 25 academic disciplines (human vs. AI-generated)

**What we need:**

Improve the model's performance using one or more of the following approaches. You choose the best method based on results:

1. **Temperature scaling / calibration**: Make confidence scores meaningful and well calibrated
2. **Perplexity-based signal (Binoculars approach)**: Add a training-free, generator-agnostic signal using two open Turkish-capable LLMs (e.g. Qwen2.5). Fuse it with the existing classifier.
3. **Paraphrase augmentation (RADAR approach)**: Augment AI training samples with paraphrased versions to improve robustness against humanization tools
4. **Stronger backbone**: Evaluate mDeBERTa-v3-base or XLM-R-large as a drop-in replacement
5. **Active learning**: Identify and prioritize the most informative uncertain examples for labeling

You may also build a brand-new model from scratch using the data. We accept that as well.
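To illustrate what approach 1 is asking for, here is a minimal sketch of temperature scaling for a binary classifier. It assumes each example is reduced to a single logit (for a two-class softmax head, the AI logit minus the human logit) and that a held-out validation set with 0/1 labels is available. The function names `nll` and `fit_temperature` and the grid-search range are illustrative assumptions, not part of our current code.

```python
import math


def nll(logits, labels, temperature):
    """Mean binary cross-entropy of sigmoid(logit / T) against 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        # Numerically stable softplus(s) = log(1 + exp(s)).
        softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
        # Binary cross-entropy written as softplus(s) - y * s.
        total += softplus - y * s
    return total / len(logits)


def fit_temperature(logits, labels, lo=0.05, hi=20.0, steps=400):
    """Grid-search a single temperature T on a log scale (assumed range)."""
    best_t, best_loss = 1.0, nll(logits, labels, 1.0)
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        loss = nll(logits, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t


# Toy validation set: very confident logits, but only 6 of 8 are correct.
val_logits = [8.0, 8.0, 8.0, -8.0, -8.0, -8.0, 8.0, -8.0]
val_labels = [1, 1, 1, 0, 0, 0, 0, 1]
T = fit_temperature(val_logits, val_labels)
calibrated = [1.0 / (1.0 + math.exp(-z / T)) for z in val_logits]
```

Dividing by a fitted T greater than 1 softens overconfident scores without changing which class is predicted, so accuracy stays the same while the reported confidence becomes more trustworthy. A gradient-based fit would work just as well; the grid search only keeps the sketch dependency-free.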
**Deliverables:**

- Python scripts only (no UI, no API, no deployment)
- Training script with the improved method
- Evaluation report: accuracy, FPR per category, comparison with baseline
- Saved model weights + inference script

**Data:**

- 5,000 labeled JSONL examples provided (title-matched human/AI pairs, 53 disciplines, 5 AI models)
- Can be increased if needed (I can expand it up to 10x)
- Format: `"discipline"`, `"title"`, `"text"` (human), `"ai_text"` (AI), `"model"`

The most important point is this: we test with data that is in neither the training set nor the internal test set. For example, we test with works written before 2020. Even though the scores look high on the internal test run in Python, they come out lower in real-world tests on data that was never part of the dataset. For that reason, "done" does not mean passing the Python test on the selected, curated data. It means passing a real-world test that the developer can also run themselves.
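As a rough sketch of the "FPR per category" item in the evaluation report, the snippet below computes the false positive rate per discipline over the human-written `"text"` field of the JSONL records. The `predict_is_ai` callable is a hypothetical stand-in for the delivered inference script, and the sample records are made up for illustration only.

```python
import json
from collections import defaultdict


def fpr_by_discipline(jsonl_lines, predict_is_ai):
    """Return {discipline: share of human texts wrongly flagged as AI}."""
    counts = defaultdict(lambda: [0, 0])  # discipline -> [false positives, total]
    for line in jsonl_lines:
        record = json.loads(line)
        stats = counts[record["discipline"]]
        stats[1] += 1
        # "text" holds the human-written side of each pair,
        # so any AI verdict on it is a false positive.
        if predict_is_ai(record["text"]):
            stats[0] += 1
    return {name: fp / total for name, (fp, total) in counts.items()}


# Made-up records in the provided format, for illustration only.
sample_lines = [
    json.dumps({"discipline": "History", "title": "t1",
                "text": "flag this one", "ai_text": "x", "model": "m"}),
    json.dumps({"discipline": "History", "title": "t2",
                "text": "clean", "ai_text": "x", "model": "m"}),
    json.dumps({"discipline": "Physics", "title": "t3",
                "text": "clean", "ai_text": "x", "model": "m"}),
]


def dummy_predict(text):
    # Hypothetical placeholder for the real classifier.
    return "flag" in text


rates = fpr_by_discipline(sample_lines, dummy_predict)
```

The same function can be pointed at a file of pre-2020 works: if those are treated as human-written, every AI verdict on them counts as a false positive, which makes this a direct version of the real-world check described above.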

  • Fixed-price: $300.00
  • Experience Level: Intermediate
  • Remote Job
  • Project Type: Ongoing project
Skills and Expertise
Mandatory skills
Machine Learning
Artificial Intelligence
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: last week
  • Hires: 1
  • Interviewing: 14
  • Invites sent: 26
  • Unanswered invites: 9
About the client
Member since Oct 11, 2021
  • Turkey
    Elazig, 3:20 AM
  • $650 total spent
    2 hires, 2 active
