So ultimately I have a few goals:

- extract keywords from a given document
- extract the topic from a given document
- categorize a document and its resulting topic

I need a system that is distributed, so it can scale in request capacity. I'd like to leverage cloud storage for the documents, and I'm open to recommendations for a database for the results.

I'd love a system we can train: it outputs what it thinks, we tell it yes/no for each keyword, topic, or category suggestion, and it learns from that. And ultimately, if we can tell how close a topic is to a given category in relevance, that would be super useful.

The whole thing definitely needs to be queue based and event driven, have a well-documented source and install setup, and be fully controllable by API.

I am not willing to use third-party APIs or services to handle the NLP part. I'm willing to hire on an hourly basis to consult on the NLP core, or potentially hire for the full build.
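
To make the yes/no training loop and the topic-to-category relevance idea concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not a proposed NLP core: keywords are ranked by plain term frequency, feedback just scales a per-keyword weight, and relevance is cosine similarity over word counts. All names (`extract_keywords`, `apply_feedback`, `relevance`, `STOPWORDS`) and the `step` value are hypothetical; a real build would swap in a trained model.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "it"}


def tokenize(text):
    """Lowercase the text and return alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(text, weights, k=3):
    """Rank words by term frequency times a learned weight (default 1.0)."""
    counts = Counter(tokenize(text))
    scored = {w: c * weights.get(w, 1.0) for w, c in counts.items()}
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scored, key=lambda w: (-scored[w], w))[:k]


def apply_feedback(weights, keyword, accepted, step=0.5):
    """Raise the keyword's weight on a 'yes', lower it on a 'no'."""
    current = weights.get(keyword, 1.0)
    weights[keyword] = current * (1 + step) if accepted else current * (1 - step)


def relevance(topic_text, category_text):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    a, b = Counter(tokenize(topic_text)), Counter(tokenize(category_text))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    weights = {}  # In a real system this would live in the results database.
    doc = "The queue feeds the worker and the worker stores results in the queue database"
    print(extract_keywords(doc, weights, k=2))
    apply_feedback(weights, "queue", accepted=False)  # A reviewer says "no".
    print(extract_keywords(doc, weights, k=2))
    print(relevance("machine learning models", "machine learning"))
```

In a queue-based deployment, each of these calls would be triggered by a message (document uploaded, feedback submitted) instead of being called inline, and the `weights` dictionary would be replaced by persistent storage shared across workers.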