I'm looking for someone familiar with Python, Django, Celery, Big Data, and Postgres. We are currently working on a SaaS product and need to improve and extend the existing code.
You'll be dealing with a SaaS product for lead generation (not yet in production or set up on a live server). The system automatically obtains company and contact information and gives the user the ability to find prospective customers.
The following things need to be done and integrated into the existing code:
- check and debug the Celery workers/tasks so they run properly again (possibly separate different tasks onto different workers; investigate why the saving problems occur)
- improve the code and make it faster and more efficient
- replace GoogleScraper & proxy with the Bing API (if possible, simply adding the Bing API alongside would be preferred, so that in the Django admin view I can select, separately for person and company data, how data is obtained: a) scraper or b) API)
- the response to every API request needs to be downloaded into our own DB, and each individual result stored there
- if multiple addresses have been found for a company, the one with the highest identity factor should be chosen and shown as the main address
- complete sites, including all related subpages, should be downloaded and stored in the DB (e.g. a corporate website: 1) Home 2) About Us 3) News 4) Team 5) Customers 6) Products 6a) Product A 6b) Product B, etc.)
- subsequently, all the important text (about-us, home, and product texts) shall be extracted and saved in the main database, directly associated with the company.
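To illustrate the worker separation mentioned above: one common approach is Celery's task routing, where heavy scraping tasks and DB-save tasks go to different queues, each consumed by a dedicated worker. The task and queue names below are assumptions, not taken from the existing code; with Celery the dict would be assigned to `app.conf.task_routes`.

```python
# Hypothetical routing table: scraping and persistence tasks are kept on
# separate queues so a slow scrape cannot block or starve the save tasks.
task_routes = {
    "leads.tasks.scrape_company":     {"queue": "scraping"},
    "leads.tasks.fetch_bing_results": {"queue": "scraping"},
    "leads.tasks.save_results":       {"queue": "persistence"},
}

# Each worker would then consume only its own queue, e.g.:
#   celery -A proj worker -Q scraping    --concurrency=4
#   celery -A proj worker -Q persistence --concurrency=2
```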
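The scraper-vs-API selection could reduce to a choice field on a Django model whose stored value picks the fetch function at task time. Everything here is an illustrative stub (the real implementations would call GoogleScraper or the Bing Web Search API), not the project's actual code:

```python
# Illustrative stubs standing in for the real data-source backends.
def scrape_google(query):
    return {"source": "scraper", "query": query, "results": []}

def query_bing(query):
    return {"source": "bing", "query": query, "results": []}

# The value chosen in the Django admin (separately for person and company
# data) would be one of these keys; a model CharField with choices=
# SOURCE_CHOICES could store it.
SOURCE_CHOICES = (("scraper", "GoogleScraper"), ("bing", "Bing API"))
FETCHERS = {"scraper": scrape_google, "bing": query_bing}

def fetch(query, source):
    """Dispatch to the configured data source."""
    try:
        return FETCHERS[source](query)
    except KeyError:
        raise ValueError(f"unknown data source: {source!r}")
```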
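Choosing the main address could be a simple maximum over the stored score; `identity_factor` is used here as a placeholder key for however the project actually computes that score:

```python
def pick_main_address(addresses):
    """Return the address with the highest identity factor,
    or None if no addresses were found for the company."""
    if not addresses:
        return None
    return max(addresses, key=lambda a: a["identity_factor"])

found = [
    {"street": "Main St 1",  "identity_factor": 0.62},
    {"street": "Park Ave 9", "identity_factor": 0.91},
]
# pick_main_address(found) -> the "Park Ave 9" entry
```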
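Downloading a complete site with all subpages is essentially a breadth-first walk over same-domain links. The sketch below uses a stubbed link graph instead of real HTTP requests so the traversal logic stands on its own; in the actual task, each fetched page's HTML and extracted text would be written to Postgres.

```python
from collections import deque

# Stubbed link graph for one corporate site (assumption for illustration);
# in production this mapping comes from fetching each page and parsing links.
SITE = {
    "/":         ["/about", "/products"],
    "/about":    ["/team"],
    "/products": ["/products/a", "/products/b"],
    "/team": [], "/products/a": [], "/products/b": [],
}

def crawl(start="/"):
    """Breadth-first walk collecting every reachable subpage once."""
    seen, queue, pages = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        pages.append(page)  # here: also persist page content to the DB
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```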