AI Document Data Extraction — Test Assignment Specification
Worldwide
1. Objective

We need to build a solution that can automatically extract specific business data points from complex document packages. The documents may include contracts, leases, amendments, reports, financial documents, due diligence materials, exhibits, schedules, and other supporting files.

A document package may contain hundreds of pages and several related documents. The same business field may be mentioned in multiple places, changed by later documents, or appear in sections that look relevant but should not be used. The goal is to return the most accurate, applicable, and business-correct value for each requested data point.

2. Input

The system should receive:
- one or more documents as a document package;
- a list of requested data points;
- a business description of what each data point means;
- an expected output format.

The documents may contain:
- original agreements;
- amendments or addendums;
- exhibits and schedules;
- supporting reports;
- historical or superseded provisions;
- similar but non-applicable clauses;
- conflicting or overlapping language.

3. Output

For each requested data point, the system should return:
- extracted value or summary;
- source reference showing where the answer came from;
- indication when no relevant information is available;
- indication when the result is uncertain or requires human review;
- short explanation of why the value was selected.

The output should be concise, structured, and suitable for review by an operations or business user.

4. Example Data Point: Landlord Audit Rights

Business Meaning

Determine whether the Landlord has the right to audit, inspect, review, or verify Tenant’s books and records when those records relate to Gross Sales, Percentage Rent, sales reporting, or similar revenue-based rent obligations.

This is an example of a complex data point. It is not enough to simply search for the words “audit”, “records”, “books”, or “sales”.
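To illustrate the point, here is a minimal sketch of a naive keyword matcher. The clause texts and the `keyword_hits` helper are hypothetical and invented for illustration; they only show that a clause can contain every keyword without granting an audit right.

```python
import re

# The four keywords named in the example above.
KEYWORDS = ("audit", "records", "books", "sales")

def keyword_hits(text: str) -> set[str]:
    """Return the keywords that occur as whole words in the text (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {k for k in KEYWORDS if k in words}

# Hypothetical clauses, not taken from any real lease.
clauses = {
    "grants_right": (
        "Landlord may audit Tenant's books and records relating to Gross Sales."
    ),
    "cost_only": (
        "If an audit of Tenant's books and records shows that sales were "
        "understated, Tenant shall pay the cost of the audit."
    ),
}

for name, text in clauses.items():
    # Both clauses match all four keywords, although only the first grants a right.
    print(name, sorted(keyword_hits(text)))
```

Both clauses produce the same keyword hits, so keyword matching alone cannot separate a granted right from a cost or consequence provision.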
The system must determine whether the language actually describes an applicable Landlord audit right.

5. Business Rules for This Example

The system should include provisions where Landlord has a right to:
- audit, inspect, examine, review, verify, or reconcile Tenant’s books or records;
- review records related to Gross Sales, Percentage Rent, sales statements, or sales reporting;
- exercise the audit right within a specific time period or lookback window;
- verify reported sales or revenue-based rent obligations.

The system should exclude provisions that only describe:
- procedural audit details;
- notice requirements;
- audit location;
- who performs the audit;
- business hours;
- audit costs or reimbursement;
- penalties or remedies;
- consequences of audit findings;
- default-related rights;
- termination-related rights;
- expired or historical rights;
- finality, waiver, or dispute periods that do not grant an actual audit right.

If a later related document changes the original provision, the system should use the currently applicable language.

If the relevant information is not available, the system should return: No information available

6. Example Output

    {
      "datapoint": "Landlord Audit Rights",
      "value": "Landlord may audit Tenant's books and records relating to Gross Sales within two years after receipt of the applicable sales statement.",
      "source": {
        "document": "Lease Agreement",
        "page": 84,
        "section": "Percentage Rent / Sales Records"
      },
      "status": "Extracted",
      "requiresReview": false,
      "explanation": "The selected clause grants Landlord the right to audit Tenant's books and records relating to Gross Sales and includes a two-year audit period."
    }

7. Quality and Iteration Expectations

The extraction results should be evaluated against expected business-approved answers. The solution should be designed with the understanding that complex documents may contain edge cases, ambiguous wording, conflicting provisions, superseded language, and similar but non-applicable text.
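One possible way to evaluate results against business-approved answers is sketched below. It assumes exact string comparison, which is a simplification: real answers would likely need normalization or reviewer judgment. The category names, the `classify` and `evaluate` helpers, and every data point other than Landlord Audit Rights are hypothetical.

```python
from collections import Counter

NO_INFO = "No information available"

def classify(expected: str, actual: str) -> str:
    """Compare one extracted value with the business-approved answer."""
    if expected == actual:
        return "correct"
    if actual == NO_INFO:
        return "missed"          # an answer existed but nothing was extracted
    if expected == NO_INFO:
        return "false_positive"  # non-applicable text was selected
    return "wrong_value"         # something was extracted, but not the approved value

def evaluate(expected: dict[str, str], actual: dict[str, str]) -> Counter:
    """Count outcomes per category; a data point absent from `actual` counts as NO_INFO."""
    return Counter(
        classify(exp, actual.get(name, NO_INFO)) for name, exp in expected.items()
    )

# Hypothetical approved answers and system output, for illustration only.
expected = {
    "Landlord Audit Rights": "Audit within two years",
    "Renewal Option": NO_INFO,
    "Security Deposit": "$10,000",
}
actual = {
    "Landlord Audit Rights": "Audit within two years",
    "Renewal Option": "One five-year option",
}

print(evaluate(expected, actual))
```

Tracking these counts per data point across review rounds gives a simple signal of where rules or examples need attention.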
Business rules, extraction instructions, examples, and edge cases should be possible to refine over time based on review results and quality findings.

The approach should make it possible to understand:
- which data points were extracted correctly;
- which data points were missed;
- where similar but non-applicable text was incorrectly selected;
- where conflicting or updated document language caused an incorrect result;
- which cases require clearer business rules or additional examples.

The goal is not only to produce one extraction result, but to demonstrate a path for improving accuracy and consistency over time.

8. Expected Deliverable

Provide a short explanation of the proposed approach and, if possible, a working prototype or pseudo-implementation.

The solution should demonstrate how it would:
- process a package of related documents;
- identify relevant information for a requested data point;
- avoid extracting similar but non-applicable text;
- handle conflicts between original documents and later related documents;
- return a structured answer with source reference;
- identify cases where no reliable answer can be found;
- identify cases that require human review;
- support evaluation of extraction quality;
- allow business rules and extraction behavior to be improved over time.

The technical approach, tools, architecture, and implementation details may be chosen freely.
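As one illustration of the conflict-handling and structured-answer items, the sketch below picks the most recently effective provision and wraps it in the output shape from section 6. It is a simplification under stated assumptions: candidates are already filtered to applicable clauses, each document's effective date is known, and the latest document wins. The `Candidate` class, the `build_result` helper, the `NotFound` status name, and the sample clauses are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

NO_INFO = "No information available"

@dataclass
class Candidate:
    """A clause already judged applicable to the data point (hypothetical shape)."""
    document: str
    page: int
    section: str
    text: str
    effective: date  # assumption: each document's effective date is known

def build_result(datapoint: str, candidates: list[Candidate]) -> dict:
    """Pick the most recently effective candidate and wrap it in the output shape."""
    if not candidates:
        return {
            "datapoint": datapoint,
            "value": NO_INFO,
            "source": None,
            "status": "NotFound",  # hypothetical status name
            "requiresReview": False,
            "explanation": "No applicable provision was found in the package.",
        }
    current = max(candidates, key=lambda c: c.effective)
    return {
        "datapoint": datapoint,
        "value": current.text,
        "source": {
            "document": current.document,
            "page": current.page,
            "section": current.section,
        },
        "status": "Extracted",
        # Assumption: competing provisions are flagged for a human reviewer.
        "requiresReview": len(candidates) > 1,
        "explanation": (
            f"Selected the provision from '{current.document}', effective "
            f"{current.effective.isoformat()}, the latest of {len(candidates)} candidate(s)."
        ),
    }

# Hypothetical candidates, for illustration only.
lease = Candidate("Lease Agreement", 84, "Percentage Rent / Sales Records",
                  "Landlord may audit within two years.", date(2015, 3, 1))
amendment = Candidate("First Amendment", 3, "Section 2",
                      "Landlord may audit within three years.", date(2020, 6, 15))

print(build_result("Landlord Audit Rights", [lease, amendment])["value"])
```

A real solution would also need to confirm that the later document actually amends the provision, since a later date alone does not prove that the earlier language was superseded.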
- Hourly: Less than 30 hrs/week
- Duration: 3-6 months
- Experience Level: Expert
- Hourly rate: $30.00 - $40.00
- Remote Job
- Project Type: Ongoing project