AI Engineer / RAG Pipeline Developer for Compliance Law Management Information System
Worldwide
Key Responsibilities

You will be responsible for building an end-to-end pipeline, including:

1. Data Collection & Crawling
- Design and implement web crawling pipelines for legal/compliance sources
- Extract structured and unstructured legal content from websites and portals
- Ensure compliance with robots.txt and legal scraping constraints

2. Document Processing (PDF + Text)
- Build a robust PDF parsing and extraction pipeline using tools like Docling
- Handle complex legal documents (tables, footnotes, multi-column layouts)
- Clean, normalize, and structure extracted content for downstream AI use

3. RAG Pipeline Development
- Design and implement a Retrieval-Augmented Generation architecture
- Develop chunking strategies optimized for legal/compliance context
- Implement embedding generation and metadata enrichment
- Implement query understanding and response synthesis using LLMs

4. Vector Database (Pinecone)
- Set up and optimize the Pinecone vector database
- Design the indexing schema (metadata, filters, namespaces)
- Optimize retrieval speed and accuracy
- Implement hybrid search if needed (keyword + vector)

5. AI/LLM Integration
- Integrate LLMs (OpenAI / open-source models)
- Develop prompts for compliance/legal reasoning
- Ensure traceability and citation-backed responses

Required Skills
- Strong experience building RAG systems in production
- Hands-on experience with Pinecone or other vector databases
- Experience with PDF parsing tools (Docling, PyMuPDF, Unstructured, etc.)
- Strong Python backend development skills
- Experience with web scraping/crawling frameworks (Scrapy, Playwright, etc.)
- Familiarity with LLM APIs (OpenAI, Anthropic, or open-source models)
- Understanding of embeddings, vector search, and semantic retrieval
- Experience handling large-scale document pipelines

Nice to Have
- Experience with legal tech or compliance systems
- Knowledge of information retrieval / NLP
- Experience with LangChain, LlamaIndex, or similar frameworks
- Cloud deployment (AWS/GCP/Azure)
- Docker / Kubernetes experience

Deliverables
- Fully functional ingestion + crawling pipeline
- PDF processing system using Docling or equivalent
- Pinecone vector database setup with optimized schema
- Working RAG system with API endpoints
- Documentation of architecture and setup
- Optional: simple UI for testing queries

Project Type
- Short-term MVP with potential for long-term extension
- Possibility of ongoing development and scaling

How to Apply

Please include:
- Relevant experience building RAG systems
- Examples of similar AI or document intelligence projects
- Your preferred stack for RAG pipelines
- Any experience with legal/compliance data systems

- Hourly: Less than 30 hrs/week
- Duration: 1-3 months
- Experience Level: Expert
- Hourly rate: $10.00 - $40.00
- Remote Job
- Project Type: Complex project

Activity on this job
- Proposals: 50+
- Last viewed by client: 2 days ago
- Hires: 1
- Interviewing: 0
- Invites sent: 0
- Unanswered invites: 0

About the client
- United States, Columbus, 3:12 AM
- $984 total spent; 5 hires, 1 active
- 35 hours