We are looking for help creating a tool/script that parses/scrapes data from the SEC EDGAR database. The use case is a unique group of about 40-50 funds for which we seek to obtain financial reporting information. There is currently no central repository that compares these funds on metrics such as returns, fees, and assets.
There are two potential sources of data we would like to tap in order to capture the desired metrics. Initially, we would prove the feasibility of scraping by extracting only a few critical fields from a specified SEC report. To the extent we are able, we would then be interested in obtaining more granular data for each fund. That data is currently available in the form of HTML and PDF documents. We would like to consume this data in a more useful format for our database.
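To make the feasibility step concrete, here is a rough sketch of pulling a few labelled values out of an HTML report using only the Python standard library. The sample HTML, the row labels ("Total net assets", "Management fee") and the output field names are hypothetical placeholders, not taken from an actual filing; real reports will likely need more tolerant matching.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_fields(html, wanted):
    """Return {field_name: raw_value} for rows whose first cell is a wanted label."""
    parser = TableRowParser()
    parser.feed(html)
    found = {}
    for row in parser.rows:
        cells = [c for c in row if c]  # drop empty spacer cells
        if len(cells) >= 2 and cells[0].lower() in wanted:
            found[wanted[cells[0].lower()]] = cells[1]
    return found


# Hypothetical report fragment and label-to-field mapping (assumptions).
SAMPLE_HTML = """
<table>
  <tr><td>Total net assets</td><td>$ 1,234,567</td></tr>
  <tr><td>Management fee</td><td>1.25%</td></tr>
  <tr><td>Shares outstanding</td><td>100,000</td></tr>
</table>
"""
WANTED = {"total net assets": "net_assets", "management fee": "management_fee"}

fields = extract_fields(SAMPLE_HTML, WANTED)
print(fields)
```

The values are kept as raw strings on purpose; converting them to numbers and units would be a separate step once the target metrics are agreed.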
Each fund (i) files HTML documents with the SEC (an annual report, a semi-annual report and two quarterly reports) and (ii) maintains a fund website that includes a Fund Factsheet in PDF format. We would like to take the information from either source and post the relevant data to our SQL database. A listener will be attached to the SEC EDGAR RSS feed (sorted by fund ID/CIK) and can pull all documents associated with a particular filing. These documents are then considered ready for ingestion/scraping. We seek to determine the most efficient and reliable approach to executing this workflow and are open to expert opinion on this matter.
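As a starting point for discussion, the listener-to-database part of this workflow might look roughly like the sketch below. It parses an Atom-style feed and records only filings not seen before. Everything specific here is an assumption for illustration: the sample feed is inline and made up (no request is sent to EDGAR), the CIK is a placeholder, the `filings` table layout is hypothetical, and SQLite stands in for our actual SQL database.

```python
import sqlite3
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed content; a real listener would fetch this from the feed URL.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>filing-0001</id>
    <title>Annual report - Example Fund</title>
    <updated>2024-03-01T12:00:00-05:00</updated>
    <link rel="alternate" href="https://example.com/filing-0001-index.htm"/>
  </entry>
  <entry>
    <id>filing-0002</id>
    <title>Semi-annual report - Example Fund</title>
    <updated>2024-09-01T12:00:00-04:00</updated>
    <link rel="alternate" href="https://example.com/filing-0002-index.htm"/>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Turn each Atom <entry> into a plain dict."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        entries.append({
            "filing_id": entry.findtext(ATOM + "id", default=""),
            "title": entry.findtext(ATOM + "title", default=""),
            "updated": entry.findtext(ATOM + "updated", default=""),
            "url": link.get("href", "") if link is not None else "",
        })
    return entries


def store_new_filings(conn, cik, entries):
    """Insert unseen filings and return how many rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings ("
        "filing_id TEXT PRIMARY KEY, cik TEXT, title TEXT, updated TEXT, url TEXT)"
    )
    before = conn.total_changes
    for e in entries:
        # The primary key makes repeated polling of the same feed idempotent.
        conn.execute(
            "INSERT OR IGNORE INTO filings VALUES (?, ?, ?, ?, ?)",
            (e["filing_id"], cik, e["title"], e["updated"], e["url"]),
        )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")  # stand-in for the real SQL database
entries = parse_entries(SAMPLE_FEED)
first_pass = store_new_filings(conn, "0000000000", entries)   # placeholder CIK
second_pass = store_new_filings(conn, "0000000000", entries)  # same feed again
print(first_pass, second_pass)
```

Rows added on a given pass would then be the filings handed to the ingestion/scraping step; whether polling, a queue, or another trigger is most reliable is exactly the kind of question we would like expert input on.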
Want more information? Contact us for specifics about the metrics we seek to capture, current platforms and architecture, infrastructure, estimates, etc.
Less than 30 hrs/week
Project length: more than 3 months
I am willing to pay higher rates for the most experienced freelancers