Parse PDFs into Structured JSON
Worldwide
We are looking for an experienced Python developer with strong skills in PDF parsing and data extraction to help us process a large batch of educational PDF files (exam papers) into a structured JSON format. The exams contain a mix of text, multiple-choice questions, math formulas, reading comprehension texts, and graphical elements (diagrams, tables, and images).
Your Responsibilities:
- Data Extraction: Extract text, multiple-choice options, and correct answers from the PDF files.
- JSON Structuring: Map the extracted data into a predefined, highly structured JSON schema.
- Image Cropping/Extraction: Programmatically identify, crop, and save relevant images, diagrams, and graphs associated with specific questions.
- Edge Case Handling: Handle complex layouts, including multi-column text, rotated pages, and questions that span across multiple pages.
Required Skills & Experience:
- Proven experience working with PDF extraction libraries in Python (e.g., PyMuPDF / fitz, pdfplumber, or similar).
- Experience with OCR tools or Vision-Language Models (e.g., OpenAI GPT-4o, Claude 3.5 Sonnet) for parsing complex graphical layouts is a huge plus.
- Strong understanding of JSON and data structuring.
- Attention to detail: the output JSON must be 100% accurate and ready for production use.
Project Scope:
We will provide a set of test PDFs and the desired JSON schema. You will develop a scalable script/pipeline to process these files. Once the pipeline is validated, it will be run across our entire library of PDFs.
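To illustrate the core extraction step, here is a minimal sketch, not the client's pipeline. It assumes text blocks shaped like the tuples PyMuPDF's `page.get_text("blocks")` returns, `(x0, y0, x1, y1, text, block_no, block_type)`. It also assumes a simple two-column layout split at the page midpoint, questions numbered like "1." and options labelled "A)" to "D)". The output fields (`number`, `stem`, `options`, `answer`) are a hypothetical stand-in for the predefined schema, which is not given here. `extract_questions` and the regex patterns are illustrative names, not part of any library. Rotated pages, image cropping (for example via PyMuPDF's `Page.get_pixmap(clip=...)`), and OCR fallbacks are not covered.

```python
import json
import re

# Hypothetical block format mirroring PyMuPDF's page.get_text("blocks") tuples:
# (x0, y0, x1, y1, text, block_no, block_type), where block_type 0 = text, 1 = image.
Block = tuple[float, float, float, float, str, int, int]

# Illustrative patterns: "1. Stem text" and "A) Option text" (assumed numbering style).
QUESTION_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*)")
OPTION_RE = re.compile(r"^\s*([A-D])[.)]\s+(.*)")


def reading_order(blocks: list[Block], page_width: float) -> list[str]:
    """Return text-block contents in two-column reading order (left, then right).

    Assumes the column boundary is the page midpoint; image blocks are skipped.
    """
    mid = page_width / 2
    text_blocks = [b for b in blocks if b[6] == 0]
    # Column index first, then top-to-bottom, then left-to-right within a column.
    text_blocks.sort(key=lambda b: (0 if b[0] < mid else 1, b[1], b[0]))
    return [b[4] for b in text_blocks]


def parse_questions(lines: list[str]) -> list[dict]:
    """Group numbered question stems with their A-D options.

    Lines matching neither pattern are appended to the current option (or to
    the stem if no option has been seen yet), so wrapped text stays attached.
    """
    questions: list[dict] = []
    for line in lines:
        if m := QUESTION_RE.match(line):
            questions.append({
                "number": int(m.group(1)),
                "stem": m.group(2).strip(),
                "options": {},
                "answer": None,  # hypothetical: to be filled from an answer key
            })
        elif questions and (m := OPTION_RE.match(line)):
            questions[-1]["options"][m.group(1)] = m.group(2).strip()
        elif questions and line.strip():
            q = questions[-1]
            if q["options"]:
                last = list(q["options"])[-1]
                q["options"][last] += " " + line.strip()
            else:
                q["stem"] += " " + line.strip()
    return questions


def extract_questions(blocks: list[Block], page_width: float) -> list[dict]:
    """Order one page's blocks, split them into lines, and parse questions."""
    ordered = reading_order(blocks, page_width)
    lines = [ln for text in ordered for ln in text.splitlines()]
    return parse_questions(lines)


if __name__ == "__main__":
    # Made-up sample page (A4 width, 595 pt): question 1 continues below an image.
    sample = [
        (300, 50, 550, 90, "2. Which number is prime?\nA) 4\nB) 7", 2, 0),
        (40, 50, 290, 90, "1. What is\n2 + 2?\nA) 3\nB) 4", 0, 0),
        (40, 100, 290, 200, "", 1, 1),
        (40, 210, 290, 240, "C) 5\nD) 22", 3, 0),
    ]
    print(json.dumps(extract_questions(sample, page_width=595), indent=2))
```

Sorting by column before vertical position is what keeps question 1's options together even though they are separated by an image and the right-hand column starts at the same height. Questions that span pages could be handled the same way by concatenating the ordered lines of consecutive pages before calling `parse_questions`.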
- Hourly: Less than 30 hrs/week
- Duration: < 1 month
- Experience Level: Intermediate
- Hourly rate: $8.00 - $25.00
- Remote Job
- Project Type: One-time project
Activity on this job
- Proposals: 50+
- Last viewed by client: 6 days ago
- Hires: 1
- Interviewing: 1
- Invites sent: 1
- Unanswered invites: 0
About the client
- Sweden, Lulea, 7:47 PM
- $1.1M total spent; 183 hires, 61 active
- 18,314 hours