Python developer for parsing HTML regulatory documents

Posted 3 weeks ago

Worldwide

Needs to hire 10 Freelancers
Summary

We turn structured documents from public websites into clean, structured JSON. For each source you'll receive a list of sample URLs and write one HTML parser for that source, mostly by pointing CSS selectors at the page's headings and body so the document comes out as a hierarchy of chapters, sections, and articles.

This is mostly CSS selectors, with some Python. If you're comfortable reading HTML in your browser's dev tools and writing selectors like div.content h2.section-title, you can do this work.

The development loop is fast and fully offline

You capture a copy of each page yourself and work against it offline. No logins, and no access to any client app or system. Your work is checked by automated validators plus a quick manual review. When it passes both, it's merged.

How it pays

Fixed price: $25 per validated parser. No hourly tracking. Billable shortly after the parser is approved.

This is high, steady volume, on the order of 200 parsers in June alone, with more after, so there's ongoing work for people who do good work.

Step one is a paid screening task

The first thing you do is a real document parser. We hand you one source; you deliver it through the same pipeline everyone uses. If it passes validation and a quick review, you get paid for it and you're in.

Required skills

  • Comfort reading HTML and writing CSS selectors (the core skill)
  • Basic Python
  • Familiarity with BeautifulSoup
  • Basic git / pull-request workflow (clone, branch, open a PR)

Working style

Fully remote, asynchronous, your own hours. Overlap with European business hours (CET) is required for quick back-and-forth.

Example URL: https://www.irishstatutebook.ie/eli/2018/act/25/enacted/en/print.html

A preview of a parsed document is in the attachment.
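
To give a feel for the kind of parser described above, here is a minimal sketch using BeautifulSoup. Everything specific in it is an assumption made for illustration: the sample markup, the class names (chapter-title, section-title, article), the parse_document function, and the output field names are hypothetical and are not the client's actual schema or pipeline.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; every real source has its own structure.
SAMPLE_HTML = """
<div class="content">
  <h1 class="chapter-title">Chapter 1 General</h1>
  <h2 class="section-title">Section 1 Scope</h2>
  <p class="article">1. This Act applies to published documents.</p>
  <p class="article">2. It enters into force on publication.</p>
  <h2 class="section-title">Section 2 Definitions</h2>
  <p class="article">3. A document is any published text.</p>
</div>
"""

# Assumed selectors; in practice these come from inspecting the page in dev tools.
SELECTORS = (
    "div.content h1.chapter-title, "
    "div.content h2.section-title, "
    "div.content p.article"
)


def parse_document(html):
    """Group headings and body paragraphs into chapters > sections > articles."""
    soup = BeautifulSoup(html, "html.parser")
    chapters = []
    # A comma-separated selector list yields matches in document order.
    for node in soup.select(SELECTORS):
        text = node.get_text(" ", strip=True)
        classes = node.get("class") or []
        if "chapter-title" in classes:
            chapters.append({"title": text, "sections": []})
            continue
        if not chapters:
            # Content before the first chapter heading gets an untitled chapter.
            chapters.append({"title": None, "sections": []})
        sections = chapters[-1]["sections"]
        if "section-title" in classes:
            sections.append({"title": text, "articles": []})
            continue
        if not sections:
            # Articles before the first section heading get an untitled section.
            sections.append({"title": None, "articles": []})
        sections[-1]["articles"].append(text)
    return {"chapters": chapters}


if __name__ == "__main__":
    print(json.dumps(parse_document(SAMPLE_HTML), indent=2))
```

Running the script against a locally saved copy of a page (read from disk instead of SAMPLE_HTML) matches the offline loop described in the posting; for a new source, most of the work would likely be adjusting SELECTORS rather than the Python around them.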

  • $1,500.00
    Fixed-price
  • Entry level
    Experience Level
  • Remote Job
  • Ongoing project
    Project Type
Skills and Expertise
Mandatory skills
Web Scraping
Python
bs4
Activity on this job
  • Proposals: 50+
  • Last viewed by client: last week
  • Hires: 4
  • Interviewing: 0
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Jun 19, 2021
  • Serbia
    Belgrade 9:01 AM
  • $25K total spent
    56 hires, 5 active
  • 1,991 hours
  • Tech & IT
    Mid-sized company (10-99 people)
