AI Document Data Extraction — Test Assignment Specification

Posted 2 days ago

Worldwide

Summary

1. Objective

We need to build a solution that can automatically extract specific business data points from complex document packages. The documents may include contracts, leases, amendments, reports, financial documents, due diligence materials, exhibits, schedules, and other supporting files. A document package may contain hundreds of pages and several related documents. The same business field may be mentioned in multiple places, changed by later documents, or appear in sections that look relevant but should not be used. The goal is to return the most accurate, applicable, and business-correct value for each requested data point.

2. Input

The system should receive:
  • one or more documents as a document package
  • a list of requested data points
  • a business description of what each data point means
  • an expected output format

The documents may contain:
  • original agreements
  • amendments or addendums
  • exhibits and schedules
  • supporting reports
  • historical or superseded provisions
  • similar but non-applicable clauses
  • conflicting or overlapping language

3. Output

For each requested data point, the system should return:
  • the extracted value or summary
  • a source reference showing where the answer came from
  • an indication when no relevant information is available
  • an indication when the result is uncertain or requires human review
  • a short explanation of why the value was selected

The output should be concise, structured, and suitable for review by an operations or business user.

4. Example Data Point: Landlord Audit Rights

Business meaning: determine whether the Landlord has the right to audit, inspect, review, or verify Tenant’s books and records when those records relate to Gross Sales, Percentage Rent, sales reporting, or similar revenue-based rent obligations.

This is an example of a complex data point. It is not enough to simply search for the words “audit”, “records”, “books”, or “sales”.
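
To make the point concrete, here is a minimal Python sketch of a naive keyword filter run over three invented clauses. The clause texts, `CLAUSES`, and `keyword_hits` are hypothetical illustrations, not part of the assignment. Every clause contains at least one of the four words, yet only clause A grants an audit right: B only describes audit costs, and C only the audit location and hours.

```python
import re

# Hypothetical clauses invented for illustration; none comes from a real lease.
CLAUSES = {
    "A": "Landlord may audit Tenant's books and records relating to Gross Sales "
         "within two years after receipt of the applicable sales statement.",
    "B": "The cost of any audit shall be borne by Landlord unless Gross Sales "
         "were understated by more than three percent.",
    "C": "Any audit shall be conducted at Tenant's office during normal business hours.",
}
KEYWORDS = ("audit", "records", "books", "sales")


def keyword_hits(text: str) -> list[str]:
    """Return the keywords that occur as whole words in text, ignoring case."""
    return [k for k in KEYWORDS if re.search(rf"\b{k}\b", text, re.IGNORECASE)]


# A naive keyword filter keeps every clause, including B (audit costs only)
# and C (audit location and business hours only).
flagged = [cid for cid, text in CLAUSES.items() if keyword_hits(text)]
print(flagged)
```

Keyword matching is probably best treated as a recall step for finding candidate passages; deciding applicability still requires the business rules.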

The system must determine whether the language actually describes an applicable Landlord audit right.

5. Business Rules for This Example

The system should include provisions where Landlord has a right to:
  • audit, inspect, examine, review, verify, or reconcile Tenant’s books or records
  • review records related to Gross Sales, Percentage Rent, sales statements, or sales reporting
  • exercise the audit right within a specific time period or lookback window
  • verify reported sales or revenue-based rent obligations

The system should exclude provisions that only describe:
  • procedural audit details
  • notice requirements
  • audit location
  • who performs the audit
  • business hours
  • audit costs or reimbursement
  • penalties or remedies
  • consequences of audit findings
  • default-related rights
  • termination-related rights
  • expired or historical rights
  • finality, waiver, or dispute periods that do not grant an actual audit right

If a later related document changes the original provision, the system should use the currently applicable language. If the relevant information is not available, the system should return “No information available”.

6. Example Output

{
  "datapoint": "Landlord Audit Rights",
  "value": "Landlord may audit Tenant's books and records relating to Gross Sales within two years after receipt of the applicable sales statement.",
  "source": {
    "document": "Lease Agreement",
    "page": 84,
    "section": "Percentage Rent / Sales Records"
  },
  "status": "Extracted",
  "requiresReview": false,
  "explanation": "The selected clause grants Landlord the right to audit Tenant's books and records relating to Gross Sales and includes a two-year audit period."
}

7. Quality and Iteration Expectations

The extraction results should be evaluated against expected business-approved answers. The solution should be designed with the understanding that complex documents may contain edge cases, ambiguous wording, conflicting provisions, superseded language, and similar but non-applicable text.
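
Section 7 asks that results be evaluated against expected business-approved answers. One way to do this is to assign each data point an outcome label and tally the labels. The sketch below is a minimal, hypothetical Python example: the label names, `classify`, `evaluate`, and the sample data points are invented here. Exact matching after case and whitespace normalization is a simplifying assumption. Free-text values would more likely be judged by a reviewer or by a semantic comparison.

```python
from collections import Counter

NO_INFO = "No information available"


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so formatting differences are not counted as errors."""
    return " ".join(value.lower().split())


def classify(predicted: str, expected: str) -> str:
    """Label one data point; the label names are hypothetical."""
    pred_none = normalize(predicted) == normalize(NO_INFO)
    exp_none = normalize(expected) == normalize(NO_INFO)
    if pred_none and exp_none:
        return "correct"
    if pred_none:
        return "missed"       # an approved answer exists but nothing was returned
    if exp_none:
        return "spurious"     # text was selected although nothing applies
    if normalize(predicted) == normalize(expected):
        return "correct"
    return "wrong_value"      # a value was returned but differs from the approved one


def evaluate(predictions: dict[str, str], gold: dict[str, str]) -> Counter:
    """Tally labels over every data point that has a business-approved answer."""
    return Counter(classify(predictions.get(dp, NO_INFO), exp) for dp, exp in gold.items())


gold = {
    "Landlord Audit Rights": "Landlord may audit Tenant's books and records "
                             "relating to Gross Sales within two years.",
    "Data Point X": NO_INFO,        # hypothetical data points
    "Data Point Y": "Value Y",
}
predictions = {
    "Landlord Audit Rights": "Landlord may audit Tenant's books and records "
                             "relating to Gross Sales within  two years.",
    "Data Point X": "Some unrelated clause text",
}
print(evaluate(predictions, gold))
```

A "wrong_value" label does not by itself show whether superseded language or similar but non-applicable text was chosen. Recording that cause would need a reviewer annotation for each case.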

It should be possible to refine business rules, extraction instructions, examples, and edge cases over time based on review results and quality findings. The approach should make it possible to understand:
  • which data points were extracted correctly
  • which data points were missed
  • where similar but non-applicable text was incorrectly selected
  • where conflicting or updated document language caused an incorrect result
  • which cases require clearer business rules or additional examples

The goal is not only to produce one extraction result, but to demonstrate a path for improving accuracy and consistency over time.

8. Expected Deliverable

Provide a short explanation of the proposed approach and, if possible, a working prototype or pseudo-implementation. The solution should demonstrate how it would:
  • process a package of related documents
  • identify relevant information for a requested data point
  • avoid extracting similar but non-applicable text
  • handle conflicts between original documents and later related documents
  • return a structured answer with a source reference
  • identify cases where no reliable answer can be found
  • identify cases that require human review
  • support evaluation of extraction quality
  • allow business rules and extraction behavior to be improved over time

The technical approach, tools, architecture, and implementation details may be chosen freely.
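
The capabilities listed in section 8 can be sketched end to end for the Landlord Audit Rights example. This is a minimal, rule-based Python sketch under stated assumptions, not a definitive design. It assumes that clause segmentation, page numbers, and each document's effective date are already available. The regular expressions `CANDIDATE`, `GRANT`, and `SCOPE` are hypothetical stand-ins for what would more likely be a model-based applicability check. The returned value is the clause text rather than a business summary, and the status label "Not found" is an invented name.

```python
import re
from dataclasses import dataclass
from datetime import date

NO_INFO = "No information available"

# Hypothetical rules for "Landlord Audit Rights". The regexes stand in for a
# model- or classifier-based applicability check and would be refined over time.
CANDIDATE = re.compile(r"\b(?:audit|inspect|examine|records|books)\b", re.I)
GRANT = re.compile(
    r"\bLandlord\b[^.]{0,80}?\b(?:may|shall have the right to|has the right to)\b"
    r"[^.]{0,40}?\b(?:audit|inspect|examine|review|verify|reconcile)\b",
    re.I,
)
SCOPE = re.compile(r"\b(?:Gross Sales|Percentage Rent|sales statements?|sales reports?)\b", re.I)


@dataclass
class Clause:
    document: str
    effective: date  # effective date of the containing document (assumed known)
    page: int
    section: str
    text: str


def is_applicable(clause: Clause) -> bool:
    """True if the clause itself grants Landlord a review right over sales-related records.

    Clauses that only describe costs, location, hours, or remedies contain no
    grant phrase, so they fail this test even though they mention "audit".
    """
    return bool(GRANT.search(clause.text) and SCOPE.search(clause.text))


def extract_audit_rights(package: list[Clause]) -> dict:
    """Return one structured answer shaped like the example output in section 6."""
    result = {"datapoint": "Landlord Audit Rights", "value": NO_INFO, "source": None,
              "status": "Not found", "requiresReview": False, "explanation": ""}
    candidates = [c for c in package if CANDIDATE.search(c.text)]
    applicable = [c for c in candidates if is_applicable(c)]
    if not applicable:
        # Keyword hits without a qualifying grant are near misses worth a reviewer's look.
        result["requiresReview"] = bool(candidates)
        result["explanation"] = f"{len(candidates)} keyword match(es); none grants an applicable audit right."
        return result
    # Assumption: the applicable clause from the most recently effective document is the
    # current language. A later document that simply deletes the right is NOT detected here.
    latest = max(c.effective for c in applicable)
    current = [c for c in applicable if c.effective == latest]
    chosen = current[0]
    conflict = len({c.text for c in current}) > 1  # same-date clauses that disagree
    result.update(
        value=chosen.text,
        source={"document": chosen.document, "page": chosen.page, "section": chosen.section},
        status="Extracted",
        requiresReview=conflict,
        explanation=(
            f"Clause grants Landlord a review right over sales-related records; taken from "
            f"{chosen.document} (effective {latest.isoformat()}), superseding "
            f"{len(applicable) - len(current)} earlier applicable clause(s)."
            + (" Conflicting same-date clauses need review." if conflict else "")
        ),
    )
    return result


# Hypothetical package: an original lease plus a later amendment.
package = [
    Clause("Lease Agreement", date(2015, 1, 1), 84, "Percentage Rent / Sales Records",
           "Landlord may audit Tenant's books and records relating to Gross Sales within "
           "two years after receipt of the applicable sales statement."),
    Clause("Lease Agreement", date(2015, 1, 1), 85, "Percentage Rent / Sales Records",
           "The cost of any audit shall be borne by Landlord unless Gross Sales were "
           "understated by more than three percent."),
    Clause("First Amendment", date(2019, 6, 1), 3, "Sales Records",
           "Landlord may audit Tenant's books and records relating to Gross Sales within "
           "three years after receipt of the applicable sales statement."),
]
result = extract_audit_rights(package)
print(result["value"], result["source"], result["requiresReview"])
```

Keeping the rules as data (the compiled patterns) rather than burying them in control flow is one way to support the refinement loop described in section 7. A reviewer finding can become a new or adjusted pattern plus a regression case, and the evaluation can then be rerun.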

  • Hourly: less than 30 hrs/week
  • Duration: 3-6 months
  • Experience level: Expert
  • Rate: $30.00-$40.00 hourly
  • Remote job
  • Project type: Ongoing project

Contract-to-hire opportunity

This lets talent know that this job could become full time.
Skills and Expertise
Mandatory skills
Document AI
Activity on this job
  • Proposals: 50+
  • Last viewed by client: 2 days ago
  • Hires: 1
  • Interviewing: 4
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Sep 9, 2008
  • United States (Brooklyn)
  • $277K total spent; 25 hires, 5 active
  • 9,855 hours
