Python/PostgreSQL Engineer for China Patent Data Pipeline Productionization, QA, and Scale-Up

Posted 6 days ago

Worldwide

Summary

We need a Python/PostgreSQL data engineer to implement the China-only patent data integration for an existing patent analytics platform. This is a fixed-price China milestone for IPPH patent data.

Note: Japan and Korea loads are excluded based on conversations with Krish and may be handled later as separate follow-on work.

The existing platform already has Python ingestion patterns, PostgreSQL, bronze/silver processing, MinIO/file-ingestion infrastructure, translation infrastructure, assignee standardization, and a dashboard. The goal is to extend the existing system, not rebuild it from scratch.

Total Budget: $800 fixed price

Milestone 1: IPPH File Ingestion, XML Parsing, and Initial Database Load

Budget: $300

Scope:
- Inspect the IPPH sample / initial delivery package structure.
- Use the existing MinIO/file-ingestion pattern.
- Handle package manifests and nested ZIP/XML packages.
- Parse key China patent fields where available:
  - publication identifiers
  - application identifiers
  - claims
  - claim numbers
  - independent/dependent claim indicators
  - claim counts
  - description sections
  - bibliographic metadata
  - legal status metadata
  - current owner / assignee metadata
  - applicant/inventor metadata
  - drawings metadata
  - rich citation fields
- Load raw and parsed data into PostgreSQL following the existing bronze/silver architecture.
- Preserve source traceability: source file, package date, document path, document ID, load timestamp, and processing status.

Acceptance Criteria:
- Provided IPPH sample files can be processed end to end.
- Parsed records are loaded into PostgreSQL or clearly structured for PostgreSQL loading.
- Key source fields are mapped and documented.
- Failed/partial records are logged with useful error messages.

Milestone 2: Delta Handling, Translation, and Assignee Standardization (Full Load)

Budget: $300

Scope:
- Implement CREATE/UPDATE/DELETE handling for the confirmed IPPH package format.
- Track processed packages/documents to avoid duplicate loads on rerun.
- Add retry-safe/idempotent behavior where practical.
- Integrate Chinese-to-English translation using the existing approved model endpoint / infrastructure.
- Store original Chinese text, English translation, translation status, model/prompt/version metadata, and errors.
- Integrate Chinese applicant/current-owner/assignee names into the existing assignee standardization pipeline.
- Preserve raw Chinese names and translated/normalized names.
- Add confidence/status fields or review flags where useful.

Acceptance Criteria:
- Rerunning the job does not duplicate already processed records.
- CREATE/UPDATE/DELETE records are handled according to the confirmed IPPH package semantics.
- Chinese text is routed through the agreed translation endpoint and stored with status metadata.
- Chinese assignee data flows through the existing standardization process.

Milestone 3: Dashboard Integration, QA, Tests, and Handover

Budget: $200

Scope:
- Make China data visible in the existing dashboard.
- Reuse existing dashboard patterns; no dashboard rebuild.
- Ensure China records can be filtered/viewed in relevant existing views.
- Surface key parsed fields and standardized assignee information where supported by the current dashboard.
- Add focused tests using sample files.
- Provide validation counts:
  - files processed
  - documents parsed
  - records loaded
  - translations attempted/succeeded/failed
  - assignee records processed
- Provide runnable setup instructions and short handover documentation.

Acceptance Criteria:
- All ingested China records after assignee standardization and translation are visible in the existing dashboard.
- Basic tests pass against sample data.
- A run summary/log is available for validation.
- Documentation is sufficient for another developer to run, monitor, and validate the pipeline.
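
Illustrative sketch (not part of the existing platform): the Milestone 1 items on nested ZIP/XML packages and source traceability could be approached roughly as below. It assumes a package is available as bytes (for example after download from MinIO) and that nested archives end in `.zip` and documents end in `.xml`; the actual IPPH layout still has to be confirmed from the sample delivery. The names `XmlDocument` and `iter_xml_documents` are hypothetical, and the `!/` path separator is only one possible convention for recording the document path.

```python
import hashlib
import io
import zipfile
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class XmlDocument:
    """One XML member plus the traceability data needed for the bronze layer."""
    document_path: str  # full path through every nesting level (hypothetical format)
    sha256: str         # content hash; could also serve as a rerun/dedup key
    content: bytes      # raw XML bytes, unparsed


def iter_xml_documents(data: bytes, source_path: str) -> Iterator[XmlDocument]:
    """Recursively walk an in-memory ZIP archive and yield every XML member."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member_path = f"{source_path}!/{info.filename}"
            payload = archive.read(info)
            name = info.filename.lower()
            if name.endswith(".zip"):
                # Nested package: descend, carrying the path so far.
                yield from iter_xml_documents(payload, member_path)
            elif name.endswith(".xml"):
                digest = hashlib.sha256(payload).hexdigest()
                yield XmlDocument(member_path, digest, payload)
            # Other members (manifests, drawings) are skipped in this sketch.
```

Each yielded record carries the full path back to the source package, which maps onto the "source file" and "document path" traceability fields; very large packages would likely need streaming from disk rather than in-memory reads.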

  • $800.00

    Fixed-price
  • Intermediate
    Experience Level
  • Remote Job
  • Ongoing project
    Project Type
Skills and Expertise
Mandatory skills
API Integration
Python
Activity on this job
  • Proposals: 10 to 15
  • Last viewed by client: 6 days ago
  • Hires:
    1
  • Interviewing:
    0
  • Invites sent:
    1
  • Unanswered invites:
    0
About the client
Member since May 22, 2025
  • IND
    Hyderabad 10:05 PM
  • $542 total spent
    4 hires, 2 active
  • 9 hours
