Python/PostgreSQL Engineer for China Patent Data Pipeline Productionization, QA, and Scale-Up
Worldwide
We need a Python/PostgreSQL data engineer to implement the China-only patent data integration for an existing patent analytics platform. This is a fixed-price China milestone for IPPH patent data.

NOTE: Japan and Korea loads are excluded, based on conversations with Krish, and may be handled later as separate follow-on work.

The existing platform already has Python ingestion patterns, PostgreSQL, bronze/silver processing, MinIO/file-ingestion infrastructure, translation infrastructure, assignee standardization, and a dashboard. The goal is to extend the existing system, not rebuild it from scratch.

Total Budget: $800 fixed price

Milestone 1: IPPH File Ingestion, XML Parsing, and Initial Database Load
Budget: $300
Scope:
- Inspect the structure of the IPPH sample / initial delivery package.
- Use the existing MinIO/file-ingestion pattern.
- Handle package manifests and nested ZIP/XML packages.
- Parse key China patent fields where available:
  - publication identifiers
  - application identifiers
  - claims
  - claim numbers
  - independent/dependent claim indicators
  - claim counts
  - description sections
  - bibliographic metadata
  - legal status metadata
  - current owner / assignee metadata
  - applicant/inventor metadata
  - drawings metadata
  - rich citation fields
- Load raw and parsed data into PostgreSQL following the existing bronze/silver architecture.
- Preserve source traceability: source file, package date, document path, document ID, load timestamp, and processing status.
Acceptance Criteria:
- Provided IPPH sample files can be processed end to end.
- Parsed records are loaded into PostgreSQL or clearly structured for PostgreSQL loading.
- Key source fields are mapped and documented.
- Failed/partial records are logged with useful error messages.

Milestone 2: Delta Handling, Translation, and Assignee Standardization (Full Load)
Budget: $300
Scope:
- Implement CREATE/UPDATE/DELETE handling for the confirmed IPPH package format.
- Track processed packages/documents to avoid duplicate loads on rerun.
- Add retry-safe/idempotent behavior where practical.
- Integrate Chinese-to-English translation using the existing approved model endpoint/infrastructure.
- Store the original Chinese text, English translation, translation status, model/prompt/version metadata, and errors.
- Integrate Chinese applicant/current-owner/assignee names into the existing assignee standardization pipeline.
- Preserve both the raw Chinese names and the translated/normalized names.
- Add confidence/status fields or review flags where useful.
Acceptance Criteria:
- Rerunning the job does not duplicate already processed records.
- CREATE/UPDATE/DELETE records are handled according to the confirmed IPPH package semantics.
- Chinese text is routed through the agreed translation endpoint and stored with status metadata.
- Chinese assignee data flows through the existing standardization process.

Milestone 3: Dashboard Integration, QA, Tests, and Handover
Budget: $200
Scope:
- Make China data visible in the existing dashboard.
- Reuse existing dashboard patterns; no dashboard rebuild.
- Ensure China records can be filtered/viewed in the relevant existing views.
- Surface key parsed fields and standardized assignee information where the current dashboard supports them.
- Add focused tests using sample files.
- Provide validation counts:
  - files processed
  - documents parsed
  - records loaded
  - translations attempted/succeeded/failed
  - assignee records processed
- Provide runnable setup instructions and short handover documentation.
Acceptance Criteria:
- All ingested China records, after translation and assignee standardization, are visible in the existing dashboard.
- Basic tests pass against sample data.
- A run summary/log is available for validation.
- Documentation is sufficient for another developer to run, monitor, and validate the pipeline.
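The nested-package, traceability, and rerun requirements in Milestones 1 and 2 can be illustrated with a minimal Python sketch. It assumes each package arrives as ZIP bytes (for example after being fetched through the existing MinIO pattern) and that the IPPH layout is ZIPs that may contain further ZIPs and XML files; the real layout is not confirmed here. `SourceDocument`, `iter_xml_documents`, and `process_package` are hypothetical names, the in-memory `processed` set stands in for what would likely be a tracking table in PostgreSQL, and package date and document ID are omitted because they depend on the IPPH format. Only well-formedness is checked, not field extraction.

```python
from __future__ import annotations

import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator


@dataclass
class SourceDocument:
    """One XML file found in a package, plus traceability fields (hypothetical)."""
    source_file: str    # outer package name as delivered
    document_path: str  # path inside the (possibly nested) archive
    sha256: str         # content hash, used here as part of the dedup key
    loaded_at: str      # load timestamp, UTC ISO 8601
    xml_bytes: bytes
    status: str = "pending"  # processing status: "parsed" or "failed: <error>"


def iter_xml_documents(source_file: str, data: bytes, prefix: str = "") -> Iterator[SourceDocument]:
    """Walk a ZIP archive, recursing into nested ZIPs and yielding every XML file."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            payload = zf.read(info)
            path = prefix + info.filename
            name = info.filename.lower()
            if name.endswith(".zip"):
                # Nested package: keep its path as a prefix so traceability survives.
                yield from iter_xml_documents(source_file, payload, prefix=path + "/")
            elif name.endswith(".xml"):
                yield SourceDocument(
                    source_file=source_file,
                    document_path=path,
                    sha256=hashlib.sha256(payload).hexdigest(),
                    loaded_at=datetime.now(timezone.utc).isoformat(),
                    xml_bytes=payload,
                )
            # Other entries (manifests, images, ...) are ignored in this sketch.


def process_package(
    source_file: str, data: bytes, processed: set[tuple[str, str]]
) -> tuple[list[SourceDocument], Counter]:
    """Check each not-yet-processed XML document and return it with run counts.

    A document is marked processed only after it parses, so failed
    documents are retried on the next run instead of being skipped.
    """
    summary: Counter = Counter()
    documents: list[SourceDocument] = []
    for doc in iter_xml_documents(source_file, data):
        summary["documents_found"] += 1
        key = (doc.document_path, doc.sha256)
        if key in processed:
            summary["documents_skipped"] += 1
            continue
        try:
            ET.fromstring(doc.xml_bytes)
        except ET.ParseError as exc:
            doc.status = f"failed: {exc}"
            summary["documents_failed"] += 1
        else:
            doc.status = "parsed"
            summary["documents_parsed"] += 1
            processed.add(key)
        documents.append(doc)
    return documents, summary
```

Keying on document path plus content hash means a rerun of an identical package skips everything already parsed, while a changed document (for example in an UPDATE package) gets a new hash and is picked up again. The returned counter is one possible source for the validation counts requested in Milestone 3.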
- $800.00 (fixed price)
- Experience Level: Intermediate
- Remote Job
- Project Type: Ongoing project
Activity on this job
- Proposals: 10 to 15
- Last viewed by client: 6 days ago
- Hires: 1
- Interviewing: 0
- Invites sent: 1
- Unanswered invites: 0
About the client
- IND, Hyderabad (10:05 PM local time)
- $542 total spent; 4 hires, 2 active
- 9 hours