Build a PDF Document Classification Script Using Python

Posted 5 days ago

Worldwide

Summary

We are looking for a Python developer to build a standalone script that automatically categorizes PDF documents into predefined categories. The script will extract text from each PDF, process the content, and output a structured result showing the category for each file. This is a small, self-contained task focused on clean, working code.

Job Responsibilities:

1. Develop a Python script that reads PDF files from a folder.
2. Extract text from each PDF using a Python library (e.g., `PyPDF2` or `pdfplumber`).
3. Implement text preprocessing such as cleaning, tokenization, and stop-word removal.
4. Apply rule-based logic or a lightweight ML model to classify PDFs into predefined categories.
5. Generate an output file (CSV or JSON) containing:
   - PDF file name
   - Assigned category
   - Optional confidence score
6. Handle encrypted, empty, or unreadable PDFs gracefully and provide meaningful error messages.

Requirements:

- Strong Python experience
- Familiarity with PDF text extraction libraries (`PyPDF2`, `pdfplumber`, or similar)
- Experience with text processing / NLP (Pandas, NLTK, or similar)
- Ability to implement clear, maintainable logic

Deliverables:

- Standalone Python script (`.py`)
- Sample input PDFs and output file demonstrating the classification
- Short instructions on how to run the script

Notes:

This task is focused on quickly producing a working, reliable script that can classify PDF documents according to predefined rules or categories.
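
For orientation, the responsibilities above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the expected deliverable: the category names, keyword sets, stop-word list, CSV column names, and the function names (`extract_text`, `preprocess`, `classify`, `classify_folder`) are all hypothetical, since the posting does not define them. It uses `pdfplumber` for extraction and simple keyword counting as the rule-based logic; the confidence score is just the winning category's share of all keyword hits.

```python
import csv
import re
from pathlib import Path

# Hypothetical categories and keywords; the posting does not specify any.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "payment", "total"},
    "contract": {"agreement", "party", "parties", "term", "clause"},
    "resume": {"experience", "education", "skills", "employment"},
}
# Tiny illustrative stop-word list; NLTK's list could be used instead.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
              "for", "on", "by", "with"}


def extract_text(pdf_path):
    """Return all page text of a PDF, or raise ValueError with a readable message."""
    import pdfplumber  # third-party; imported here so the other helpers run without it
    try:
        with pdfplumber.open(pdf_path) as pdf:
            text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    except Exception as exc:  # encrypted, corrupt, or otherwise unreadable file
        raise ValueError(f"could not read {pdf_path}: {exc}") from exc
    if not text.strip():
        raise ValueError(f"no extractable text in {pdf_path}")
    return text


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def classify(tokens):
    """Return (category, confidence) based on keyword hit counts."""
    hits = {cat: sum(t in kws for t in tokens)
            for cat, kws in CATEGORY_KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return "unknown", 0.0
    best = max(hits, key=hits.get)
    return best, round(hits[best] / total, 2)


def classify_folder(folder, out_csv):
    """Classify every *.pdf in folder and write one CSV row per file."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_name", "category", "confidence", "error"])
        for pdf_path in sorted(Path(folder).glob("*.pdf")):
            try:
                tokens = preprocess(extract_text(pdf_path))
                category, confidence = classify(tokens)
                writer.writerow([pdf_path.name, category, confidence, ""])
            except ValueError as err:
                # Record the failure instead of aborting the whole run.
                writer.writerow([pdf_path.name, "error", "", str(err)])
```

Under these assumptions, calling `classify_folder("pdfs", "results.csv")` would write one row per PDF found in a `pdfs` folder, with unreadable or empty files reported in the `error` column rather than stopping the run. A lightweight ML model could replace `classify` without changing the rest of the flow.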

  • Not Sure
    Hourly
  • < 1 month
    Duration
  • Expert
    Experience Level
  • Remote Job
  • One-time project
    Project Type
Skills and Expertise
Mandatory skills
Python
Machine Learning
Activity on this job
  • Proposals: 50+
  • Last viewed by client: 4 days ago
  • Interviewing: 4
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since May 30, 2026
  • United States
    Ragland, 5:39 PM
