As a proof-of-concept, I need a developer to build scripts that scrape three websites, pull publicly available Tax Roll / Mineral Ownership data, normalize and store that data, and upload it to a cloud-based CRM system.
- The source data is delivered in non-uniform formats by sites that expose no public API.
- The CRM system will be cloud-based and have a documented API.
- The Tax Roll / Mineral Ownership data we scrape needs to be normalized and stored in an intermediary database.
- The DB write (scrape) processes need to correlate database entries with scraped website records, to prevent duplication of input data and to allow DB entries to be updated when source website data changes (a sketch of this upsert logic follows this list).
- The DB read (CRM API) processes need to correlate database entries with records in the CRM system, to prevent duplicate CRM entries and to allow CRM records to be updated when the source data changes (a sync sketch follows the technology list below).
- Both input and output processes will be run on a recurring basis to populate the CRM system with updated data.
- Both input and output processes need to be designed so that additional scrape processes can be added in the future.
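To make the deduplication requirement concrete, here is a minimal sketch of the write-side correlation, assuming a SQLite intermediary database for the proof-of-concept (a Postgres instance on AWS would follow the same pattern). The table layout, column names, and the `Scraper` interface are illustrative assumptions, not part of this brief:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class Scraper(ABC):
    """Interface each source site implements, so additional scrape
    processes can be added later without changing the storage code."""

    source_name: str  # e.g. "county_x_tax_roll" (illustrative)

    @abstractmethod
    def fetch_records(self) -> Iterable[Dict]:
        """Yield normalized records, e.g. owner_name, parcel_id, county."""


def init_db(conn: sqlite3.Connection) -> None:
    # (source_name, source_record_id) uniquely identifies a scraped entry;
    # the primary key is what prevents duplicated input data across runs.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tax_roll (
            source_name      TEXT NOT NULL,
            source_record_id TEXT NOT NULL,
            owner_name       TEXT,
            parcel_id        TEXT,
            county           TEXT,
            crm_id           TEXT,          -- filled in by the CRM sync step
            updated_at       TEXT DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (source_name, source_record_id)
        )
    """)


def upsert_records(conn: sqlite3.Connection, scraper: Scraper) -> None:
    # INSERT ... ON CONFLICT updates the existing row when the source
    # site's data changes, instead of inserting a duplicate.
    for rec in scraper.fetch_records():
        conn.execute("""
            INSERT INTO tax_roll (source_name, source_record_id,
                                  owner_name, parcel_id, county)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT (source_name, source_record_id) DO UPDATE SET
                owner_name = excluded.owner_name,
                parcel_id  = excluded.parcel_id,
                county     = excluded.county,
                updated_at = CURRENT_TIMESTAMP
        """, (scraper.source_name, rec["source_record_id"],
              rec.get("owner_name"), rec.get("parcel_id"), rec.get("county")))
    conn.commit()
```

Each additional source site then becomes one more `Scraper` subclass, so new scrape processes can be added without touching the storage layer.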
This will be implemented using the following technologies:
- Base CRM or Zoho CRM
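For the output side, here is an equally minimal sketch of the CRM correlation, assuming a generic REST API. The base URL, auth header, endpoint paths, and response shape below are placeholders; the real calls must follow the documented API of whichever CRM is chosen:

```python
import sqlite3

import requests

CRM_BASE_URL = "https://crm.example.com/api/v1"  # placeholder, not a real API
API_TOKEN = "REPLACE_ME"                         # supplied with the CRM account


def sync_to_crm(conn: sqlite3.Connection) -> None:
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    rows = conn.execute(
        "SELECT rowid, owner_name, parcel_id, county, crm_id FROM tax_roll"
    ).fetchall()
    for rowid, owner, parcel, county, crm_id in rows:
        payload = {"owner_name": owner, "parcel_id": parcel, "county": county}
        if crm_id is None:
            # No CRM record yet: create one and store the CRM-assigned ID
            # so future runs update this record instead of duplicating it.
            resp = requests.post(f"{CRM_BASE_URL}/contacts",
                                 json=payload, headers=headers, timeout=30)
            resp.raise_for_status()
            conn.execute("UPDATE tax_roll SET crm_id = ? WHERE rowid = ?",
                         (resp.json()["id"], rowid))
        else:
            # Existing CRM record: push the current values from the DB.
            resp = requests.put(f"{CRM_BASE_URL}/contacts/{crm_id}",
                                json=payload, headers=headers, timeout=30)
            resp.raise_for_status()
    conn.commit()
```

Storing the CRM-assigned ID back in the intermediary database is what lets recurring runs update existing CRM records instead of duplicating them; a natural refinement would be to compare `updated_at` against the last sync time and push only changed rows.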
The final deliverable is a configured environment, evidence that the above requirements are met, basic documentation on the use of the scripts, basic documentation on the environment (DB schema, etc.), and a demo account on a CRM system populated with the required data. I'm looking for a developer who is comfortable setting up the database and doing the basic Linux setup and configuration work. I'll provide the AWS and CRM accounts.
Examples of possible input sites: