Product Categorization/Taxonomy tagging

Posted last week

Worldwide

Summary

We have a large catalog of about 100k products across many categories. We have several large product lists that need to be cleaned, organized, and classified within our provided taxonomy. This work cannot be done by hand and needs to be automated with a combination of scripts and AI. We need a repeatable, rule-based process.

What you'll do:
- Build an automated classification pipeline that maps products to the correct categories from our controlled vocabulary (we provide the list; no free-typing).
- Assign product attributes/specs programmatically from a defined set of valid values.
- Clean and normalize messy titles and field values, deduplicate, and fix systematic mis-tags across the whole list.
- Automatically flag the rows the rules can't confidently resolve, so only a small share needs human review instead of all 100k.
- Build QA checks and sampling so we can measure accuracy and catch errors.
- Deliver clean output in CSV, formatted for a standard eCommerce product import.

Required skills:
- Python (pandas) or equivalent scripting for data work. This is the core requirement.
- Experience with rule-based classification, text normalization, fuzzy/string matching, and deduplication on large datasets.
- Strong with CSV/spreadsheet data and working to a defined schema or controlled vocabulary.
- A QA mindset: building validation, sampling, and accuracy metrics.

Please have previous project experience using scripts/LLMs to classify large sets of data.

To apply, please answer:
- At ~100,000 rows, how would you approach automating classification against a fixed category list? What's your general method, and how do you handle the rows your rules can't confidently resolve?
- How do you measure and keep accuracy high on a job this size (validation, sampling, error checks)?
- What tools/libraries would you use, and roughly how large a dataset have you cleaned or categorized before?

Thank you
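For illustration only, the kind of pipeline described above (normalize titles, apply rules against a fixed category list, fall back to fuzzy matching, flag what can't be resolved, and sample for QA) might be sketched roughly as follows. Everything named here is an assumption, not part of the posting: the category list, the keyword rules, the 0.6 threshold, and the function names are hypothetical placeholders. It uses only the Python standard library; at 100k rows the same logic would more likely be applied to a pandas DataFrame.

```python
import difflib
import random
import re

# Hypothetical controlled vocabulary; the real list would be supplied by the client.
CATEGORIES = ["Power Tools", "Hand Tools", "Kitchen Appliances", "Office Chairs"]

# Hypothetical keyword rules: lowercase token -> category from the list above.
KEYWORD_RULES = {
    "drill": "Power Tools",
    "hammer": "Hand Tools",
    "blender": "Kitchen Appliances",
}


def normalize(title):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def classify(title, min_score=0.6):
    """Return (category, method, needs_review) for one product title.

    min_score is an assumed cutoff; it would need tuning on labeled samples.
    """
    text = normalize(title)
    tokens = text.split()

    # Step 1: deterministic keyword rules on whole tokens.
    for keyword, category in KEYWORD_RULES.items():
        if keyword in tokens:
            return category, "rule", False

    # Step 2: fuzzy fallback against the category names themselves.
    best_category, best_score = None, 0.0
    for category in CATEGORIES:
        score = difflib.SequenceMatcher(None, text, category.lower()).ratio()
        if score > best_score:
            best_category, best_score = category, score
    if best_score >= min_score:
        return best_category, "fuzzy", False

    # Step 3: nothing confident enough, so flag the row for human review.
    return None, "unresolved", True


def qa_sample(rows, k, seed=0):
    """Draw a reproducible random sample of rows for manual accuracy checks."""
    rows = list(rows)
    return random.Random(seed).sample(rows, min(k, len(rows)))


titles = ["  DeWalt 20V Cordless DRILL!! ", "Office Chair", "XJ-900 replacement unit"]
results = [(t, *classify(t)) for t in titles]
review_queue = [r for r in results if r[3]]  # rows where needs_review is True
```

Because the output category can only come from `CATEGORIES` or be `None`, no free-typed label can enter the result, and the `needs_review` flag keeps the human queue limited to rows neither step could resolve.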

  • Hourly: More than 30 hrs/week
  • Duration: 3-6 months
  • Experience Level: Expert
  • Remote Job
  • Project Type: Ongoing project
Skills and Expertise
Mandatory skills
Python
data classification
Activity on this job
  • Proposals: 50+
  • Last viewed by client: yesterday
  • Interviewing: 5
  • Invites sent: 8
  • Unanswered invites: 2
About the client
Member since Dec 21, 2018
  • United States
    Skokie, 4:04 AM
  • $12K total spent
    29 hires, 14 active
  • 356 hours
  • Individual client
