Python Developer Needed for Bulk Raw TXT Retrieval and Merging

Posted 2 weeks ago

Worldwide

Summary

If this task is completed successfully, I will pay and leave a very positive review, with potential future work at a higher budget.

I need a Python developer to run a bulk raw-text retrieval workflow against a public archive/API structure. The goal is to collect many raw .txt documents, download their full text contents, and merge them into one or more large combined “mega TXT” files.

The source data comes from the SEC’s public company filing archive (EDGAR). Each company has a unique company ID called a CIK (Central Index Key), which is used to access that company’s filing records. The starting ID list comes from the SEC’s ticker.txt, which currently contains about 12,000 ticker-to-CIK entries. The workflow should be able to process this list, starting with a 3-CIK test run before scaling to the full file.

For the initial test, please use these 3 company IDs, zero-padded to 10 digits:

0000320187
0000320193
0000789019

For each company ID, request its public submissions JSON, for example:

https://data.sec.gov/submissions/CIK0000320187.json

From that JSON, extract the filing (accession) records and generate the related raw TXT archive URLs. The archive path uses the CIK without leading zeros, then the accession number without dashes, then the accession number with dashes plus “.txt”, for example:

https://www.sec.gov/Archives/edgar/data/320187/000032018726000037/0000320187-26-000037.txt

The job is to:

1. Use the CIK list I will provide
2. Request each CIK’s submissions JSON
3. Generate the matching raw TXT archive URLs
4. Retrieve/download the full raw TXT contents
5. Merge the TXT contents into one or more large “mega TXT” files
6. Keep a separate log of completed, skipped, failed, and duplicate records
7. Handle large files efficiently, without loading everything into memory

You would be responsible for running the workflow and delivering the final combined TXT output files, along with the script used. The output does not need to be organized by filing type unless that is easy to add; the main goal is bulk retrieval and merging of raw TXT data into large combined files.

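Steps 2 and 3 could be sketched roughly as follows. This is a minimal illustration, not a required implementation: it assumes the submissions JSON keeps accession numbers in the column-style list `filings.recent.accessionNumber`, and the helper names (`pad_cik`, `raw_txt_url`, `urls_from_submissions`, `fetch_submissions`) and the `USER_AGENT` value are hypothetical. The SEC’s fair-access guidance asks automated clients to send a User-Agent that identifies who is making the requests, so the placeholder would need real contact details.

```python
import json
import urllib.request
from typing import List, Union

# URL templates taken from the examples in this posting.
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik10}.json"
ARCHIVE_URL = "https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{accession}.txt"

# Hypothetical placeholder: replace with a real name and contact email.
USER_AGENT = "Example Name example@example.com"


def pad_cik(cik: Union[str, int]) -> str:
    """Return the 10-digit, zero-padded CIK used in submissions URLs."""
    return str(int(cik)).zfill(10)


def raw_txt_url(cik: Union[str, int], accession: str) -> str:
    """Build the raw .txt URL: unpadded CIK / accession without dashes / accession.txt."""
    folder = accession.replace("-", "")
    return ARCHIVE_URL.format(cik=int(cik), folder=folder, accession=accession)


def urls_from_submissions(cik: Union[str, int], submissions: dict) -> List[str]:
    """Turn the filings.recent.accessionNumber list into raw .txt URLs."""
    recent = submissions.get("filings", {}).get("recent", {})
    return [raw_txt_url(cik, acc) for acc in recent.get("accessionNumber", [])]


def fetch_submissions(cik: Union[str, int]) -> dict:
    """Download one company's submissions JSON (makes a network request)."""
    url = SUBMISSIONS_URL.format(cik10=pad_cik(cik))
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For the 3-CIK test, calling `urls_from_submissions(cik, fetch_submissions(cik))` for each of the three padded IDs would produce the URL list. Companies with long filing histories may have older filings split into extra JSON pages listed under `filings.files`; this sketch reads only `filings.recent`, so those pages would need the same treatment to reach every filing.
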
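Steps 4 to 7 could look roughly like the sketch below. It streams each document in 1 MiB chunks straight into numbered part files instead of holding it in memory, starts a new part once the current one passes `max_bytes`, and writes one CSV log row per URL with the status completed, skipped, failed, or duplicate. The file names (`mega_001.txt`, `log.csv`), the `===== BEGIN/END =====` separator lines, the 1 GiB default part size, and the `already_done` resume set are illustrative assumptions, not requirements. The 0.11-second pause is meant to keep the request rate under the SEC’s published limit of 10 requests per second.

```python
import csv
import time
import urllib.request
from pathlib import Path
from typing import BinaryIO, Callable, Dict, Iterable, Set

# Hypothetical placeholder: replace with a real name and contact email.
USER_AGENT = "Example Name example@example.com"
CHUNK_SIZE = 1 << 20  # copy 1 MiB at a time so no filing is held whole in memory


def open_url(url: str) -> BinaryIO:
    """Open one archive URL for streaming (makes a network request)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    return urllib.request.urlopen(req, timeout=60)


def merge_txt(
    urls: Iterable[str],
    out_dir: Path,
    max_bytes: int = 1 << 30,
    already_done: Iterable[str] = (),
    opener: Callable[[str], BinaryIO] = open_url,
    delay: float = 0.11,
) -> Dict[str, int]:
    """Append each URL's text to mega_NNN.txt parts and log every outcome to log.csv."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {"completed": 0, "skipped": 0, "failed": 0, "duplicate": 0}
    done: Set[str] = set(already_done)
    seen: Set[str] = set()
    part = 1
    out = open(out_dir / f"mega_{part:03d}.txt", "wb")
    with open(out_dir / "log.csv", "w", newline="", encoding="utf-8") as log_fh:
        log = csv.writer(log_fh)
        log.writerow(["url", "status", "detail"])
        try:
            for url in urls:
                if url in seen:  # same URL listed twice in this run
                    counts["duplicate"] += 1
                    log.writerow([url, "duplicate", ""])
                    continue
                seen.add(url)
                if url in done:  # finished by an earlier, interrupted run
                    counts["skipped"] += 1
                    log.writerow([url, "skipped", "completed in an earlier run"])
                    continue
                if out.tell() >= max_bytes:  # current part is full: start the next one
                    out.close()
                    part += 1
                    out = open(out_dir / f"mega_{part:03d}.txt", "wb")
                start = out.tell()
                try:
                    out.write(f"\n===== BEGIN {url} =====\n".encode())
                    with opener(url) as resp:
                        while chunk := resp.read(CHUNK_SIZE):
                            out.write(chunk)
                    out.write(f"\n===== END {url} =====\n".encode())
                except Exception as exc:  # any download error: drop the partial record
                    out.seek(start)
                    out.truncate()
                    counts["failed"] += 1
                    log.writerow([url, "failed", repr(exc)])
                else:
                    counts["completed"] += 1
                    log.writerow([url, "completed", ""])
                time.sleep(delay)  # pace requests to stay under 10 per second
        finally:
            out.close()
    return counts
```

Because a failed download is rolled back with `seek` and `truncate`, a part file never ends with a half-written document. URLs marked completed in a previous `log.csv` could be passed as `already_done` to resume an interrupted run without re-downloading them.
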
Once the 3-company test is confirmed, the same workflow should be reusable for the larger CIK list (about 12,000 entries). All further details will be provided in direct messages.
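
For the full run, the CIK list could be built from ticker.txt roughly as below. This sketch assumes each line of ticker.txt holds a ticker and an unpadded CIK separated by whitespace, and that the file is served at `https://www.sec.gov/include/ticker.txt`; the function names and the `USER_AGENT` value are hypothetical. Because several tickers (for example, different share classes) can map to the same company, the sketch keeps each CIK only once, so the number of companies processed may be lower than the roughly 12,000 ticker entries.

```python
import urllib.request
from typing import Iterable, List

# Assumed location of the ticker list; confirm before the full run.
TICKER_URL = "https://www.sec.gov/include/ticker.txt"
# Hypothetical placeholder: replace with a real name and contact email.
USER_AGENT = "Example Name example@example.com"


def unique_padded_ciks(lines: Iterable[str]) -> List[str]:
    """Parse 'ticker<whitespace>cik' lines into unique 10-digit CIKs, in first-seen order."""
    seen = set()
    ciks = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        cik = parts[1].zfill(10)
        if cik not in seen:  # several tickers can share one CIK
            seen.add(cik)
            ciks.append(cik)
    return ciks


def load_ticker_ciks(url: str = TICKER_URL) -> List[str]:
    """Download ticker.txt and return its unique padded CIKs (makes a network request)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        text = resp.read().decode("utf-8")
    return unique_padded_ciks(text.splitlines())
```

For the 3-CIK test, the list can simply be written by hand as `["0000320187", "0000320193", "0000789019"]`; `load_ticker_ciks()` would only be needed when scaling to the full file.
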

  • Fixed-price: $5.00
  • Experience level: Entry level
  • Remote job
  • Project type: One-time project

Contract-to-hire opportunity

This lets talent know that this job could become full time.
Skills and Expertise
Mandatory skills:
  • Data Extraction
  • Scripting
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: 2 weeks ago
  • Hires: 1
  • Interviewing: 0
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Oct 8, 2024
  • USA, The Woodlands (6:09 AM local time)
  • $802 total spent; 82 hires, 15 active
  • Tech & IT; individual client
