Python Developer for Patent XML/PDF Data Ingestion Pipeline - Fixed Scope Trial

Posted last week

Worldwide

Summary

I need a Python data engineer to build a fixed-scope ingestion/parser milestone for patent data. This is not a full platform rebuild. The goal is to take provided sample patent data files and implement a clean, testable ingestion pipeline that can parse XML/PDF-source metadata into structured outputs. Sample is provided.

Initial milestone scope:
1. Inspect provided Chinese patent XML sample package structure.
2. Build a Python parser for the sample XML files.
3. Extract key patent fields including:
   - publication/application identifiers
   - claims
   - claim numbers
   - independent/dependent claim indicators where available
   - description sections
   - bibliographic metadata
   - legal/current-owner metadata where available
4. Output parsed data into clean structured tables or files suitable for PostgreSQL loading.
5. Provide clear source-to-target field mapping.
6. Add basic tests using the provided sample files.
7. Provide runnable setup instructions.

Possible follow-on work may include:
- Korean PDF description extraction
- Japanese bulk XML full-text parsing
- PostgreSQL integration
- translation pipeline integration

Important:
- This first milestone does not include dashboard work.
- This first milestone does not include production deployment.
- This first milestone does not include legal translation.
- Please do not estimate a large open-ended rebuild. I am looking for a practical fixed-scope parser/data pipeline milestone.

Ideal freelancer:
- Strong Python experience
- Comfortable with XML parsing
- Comfortable with messy real-world data files
- Experience with ETL/data pipelines
- PostgreSQL experience is helpful
- Patent data experience is a plus but not required

Please include in your proposal:
1. Similar XML/ETL parsing work you have done.
2. How you would structure the parser.
3. What you would deliver for the fixed-price milestone.
4. Confirmation that you understand this is a bounded trial milestone, not a full platform rebuild.
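To illustrate the kind of output expected from milestone steps 2 to 4, here is a rough sketch only. The sample package is not reproduced in this posting, so every element name below (`patent-document`, `publication-reference`, `doc-number`, `claim`, `claim-ref`), the `num` attribute, the placeholder publication number, and the helper names `parse_claims` and `rows_to_csv` are assumptions for illustration; the real files may be structured differently. The dependent-claim check (presence of a claim reference inside the claim text) is likewise an assumed heuristic, not a confirmed rule for this data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; element names in the real package may differ.
SAMPLE_XML = """<patent-document>
  <publication-reference><doc-number>CN000000000A</doc-number></publication-reference>
  <claims>
    <claim num="1"><claim-text>A device comprising a housing.</claim-text></claim>
    <claim num="2"><claim-text>The device of <claim-ref idref="c1">claim 1</claim-ref>, further comprising a lid.</claim-text></claim>
  </claims>
</patent-document>"""

# Target columns, in the order they would be loaded into a table.
FIELDS = ["publication_number", "claim_number", "is_independent", "claim_text"]


def parse_claims(xml_text):
    """Return one flat dict per claim, ready for tabular loading."""
    root = ET.fromstring(xml_text)
    pub_number = root.findtext("publication-reference/doc-number", default="")
    rows = []
    for claim in root.iterfind("claims/claim"):
        # Join all nested text and collapse runs of whitespace.
        text = " ".join("".join(claim.itertext()).split())
        rows.append({
            "publication_number": pub_number,
            # Assumes every claim carries a numeric "num" attribute.
            "claim_number": int(claim.get("num")),
            # Assumed heuristic: a claim that references another claim is dependent.
            "is_independent": claim.find(".//claim-ref") is None,
            "claim_text": text,
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(parse_claims(SAMPLE_XML)))
```

A flat one-row-per-claim layout like this is one possible shape for "suitable for PostgreSQL loading"; proposals are free to suggest a different table design and field mapping.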

  • $500.00

    Fixed-price
  • Intermediate
    Experience Level
  • Remote Job
  • Ongoing project
    Project Type
Skills and Expertise
Mandatory skills
API Integration
Python
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: last week
  • Interviewing: 1
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since May 22, 2025
  • IND
    Hyderabad 9:36 PM
  • $542 total spent
    4 hires, 2 active
  • 9 hours
