Python Developer Needed for Legislative Data Pipeline, XML Parsing, and NLP-Based Bill Analysis
Worldwide
We are seeking an experienced Python developer to build a reproducible workflow for collecting, processing, and analyzing legislative bill data from a recent state legislative session. The project involves parsing legislative XML files, constructing a bill-level database, enriching records with legislator metadata, and performing initial text-based classification and exploratory analysis.

### Scope of Work

* Parse a master legislative XML index file containing all measures introduced during a legislative session
* Extract and structure bill-level metadata, including:
  * Bill number
  * Title
  * Sponsor information
  * Committee assignments
  * Status/history information
  * Related document links
* Apply filtering and data-cleaning procedures to create a research-ready dataset
* Merge bill records with legislator roster datasets to enrich sponsor information
* Automate retrieval of linked bill-history XML files and associated bill-text documents
* Build a reproducible data-processing pipeline that can be reused for future legislative sessions
* Perform initial NLP-based topic classification and content categorization of bill text
* Generate descriptive summaries and exploratory statistics across bills, sponsors, committees, and policy topics

### Technical Requirements

Required experience:

* Python
* pandas
* requests
* lxml
* BeautifulSoup
* Regular Expressions (regex)
* XML parsing and data extraction
* Data cleaning and transformation workflows
* Relational data merging and normalization

Preferred experience:

* Natural Language Processing (NLP)
* Topic modeling or text classification
* Document processing (PDF/XML)
* Exploratory data analysis and visualization
* Reproducible research workflows and project documentation

### Deliverables

* Fully documented Python workflow
* Clean bill-level analytical dataset
* Automated data collection and processing scripts
* Topic-classified bill dataset
* Summary statistics and exploratory analytical outputs
* Documentation explaining workflow execution and data structure

### Additional Information

To keep the initial posting concise, detailed source materials, sample files, data schemas, and project-specific documentation will be shared only with shortlisted candidates. Selected candidates will receive access to representative XML files, supporting datasets, and additional project requirements necessary for preparing an accurate implementation plan and estimate.

The solution should be modular, reproducible, and designed so that additional legislative sessions can be processed with minimal modifications.
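To make the first scope items concrete (parsing the index, extracting bill-level fields, and enriching sponsors from a roster), here is a minimal sketch. The actual XML schema is not part of this posting, so every element name (`measure`, `number`, `sponsor`, `historyUrl`), the sample data, and the roster layout are hypothetical placeholders. It uses the standard library's `xml.etree.ElementTree` and plain dictionaries to stay dependency-free; a real pipeline would likely use `lxml` and a pandas left merge instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample index; the real schema is not given in the posting.
SAMPLE_INDEX = """\
<measures session="2025">
  <measure>
    <number>HB 101</number>
    <title>An act relating to school funding</title>
    <sponsor id="L001"/>
    <committee>Education</committee>
    <historyUrl>https://example.org/history/hb101.xml</historyUrl>
  </measure>
  <measure>
    <number>SB 7</number>
    <title>  An act relating to water rights </title>
    <sponsor id="L999"/>
    <committee>Natural Resources</committee>
  </measure>
</measures>
"""

# Hypothetical roster keyed by legislator id.
ROSTER = {"L001": {"name": "Jane Doe", "party": "D", "district": "12"}}


def text_of(node, tag):
    """Return the stripped text of a child element, or None if absent or empty."""
    child = node.find(tag)
    if child is None or child.text is None:
        return None
    return child.text.strip() or None


def parse_index(xml_text):
    """Turn the index XML into a list of flat bill records."""
    root = ET.fromstring(xml_text)
    records = []
    for m in root.iter("measure"):
        sponsor = m.find("sponsor")
        records.append({
            "bill_number": text_of(m, "number"),
            "title": text_of(m, "title"),
            "sponsor_id": sponsor.get("id") if sponsor is not None else None,
            "committee": text_of(m, "committee"),
            "history_url": text_of(m, "historyUrl"),
        })
    return records


def enrich(records, roster):
    """Left-join roster fields onto each bill; unmatched sponsors stay None."""
    enriched = []
    for rec in records:
        info = roster.get(rec["sponsor_id"], {})
        enriched.append({
            **rec,
            "sponsor_name": info.get("name"),
            "sponsor_party": info.get("party"),
        })
    return enriched


bills = enrich(parse_index(SAMPLE_INDEX), ROSTER)
for bill in bills:
    print(bill["bill_number"], "|", bill["title"], "|", bill["sponsor_name"])
```

Keeping unmatched sponsors as `None` rather than dropping the bill mirrors a left join, so roster gaps surface during data cleaning instead of silently shrinking the dataset.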
- Hourly: More than 30 hrs/week
- Duration: 1-3 months
- Experience Level: Expert
- Hourly rate: $10.00 - $40.00
- Remote Job
- Project Type: Ongoing project
Activity on this job
- Proposals: 50+
- Last viewed by client: 2 weeks ago
- Interviewing: 9
- Invites sent: 17
- Unanswered invites: 8
About the client
- United States, Washington Township
- $332 total spent; 5 hires, 1 active
- 5 hours
- Health & Fitness; mid-sized company (10-99 people)