My name is Tyler. I have expertise in building front office investment solutions and using those solutions as a financial analyst. This is a test project.
I'd like an expert to help me think through the beginning of a longer-term big data project. This should be fairly enjoyable for you. I'm looking for a small time commitment right now and would like you to help me brainstorm.
Here's the basic idea. I want to create a fairly simple ETL architecture that can pull in data from a variety of sources and formats. Right now the two in question are Yahoo Finance data and Twitch.tv data, both of which have RESTful JSON APIs that should make things fairly easy. But I want to build with flexibility in mind and eventually support unstructured data. Basically, I could run a simple script and pull this into a MySQL database, but I want to build this as if it were a long-term solution for a larger agenda.
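As a rough illustration of the flexibility described above, here is a minimal sketch in which each source is a small plug-in that knows how to extract and transform its own data, while loading is shared. Everything in it is an assumption made for illustration: the class names, the `observations` table, and the JSON payload shapes are hypothetical and do not reflect the real Yahoo Finance or Twitch.tv response formats. SQLite stands in for MySQL so the sketch runs without a server, and the HTTP call is replaced by an injected function that returns canned JSON.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Iterator, Tuple

Row = Tuple[str, str, float]  # (entity, metric, value)


class Source(ABC):
    """One pluggable data source. All names here are hypothetical."""

    name: str

    def __init__(self, fetch: Callable[[], str]):
        # fetch returns the raw response body; in practice it would wrap an HTTP call.
        self._fetch = fetch

    def extract(self) -> str:
        return self._fetch()

    @abstractmethod
    def transform(self, raw: str) -> Iterator[Row]:
        """Turn a raw payload into normalized rows."""


class QuoteSource(Source):
    name = "quotes"

    def transform(self, raw: str) -> Iterator[Row]:
        # Assumed payload shape: {"quotes": [{"symbol": ..., "close": ...}]}
        for item in json.loads(raw)["quotes"]:
            yield (item["symbol"], "close", float(item["close"]))


class StreamSource(Source):
    name = "streams"

    def transform(self, raw: str) -> Iterator[Row]:
        # Assumed payload shape: {"streams": [{"channel": ..., "viewers": ...}]}
        for item in json.loads(raw)["streams"]:
            yield (item["channel"], "viewers", float(item["viewers"]))


def run(sources, conn: sqlite3.Connection) -> None:
    """Extract, transform, and load every source into one shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(source TEXT, entity TEXT, metric TEXT, value REAL)"
    )
    for src in sources:
        rows = [(src.name, *row) for row in src.transform(src.extract())]
        conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
run(
    [
        QuoteSource(lambda: '{"quotes": [{"symbol": "ABC", "close": 10.5}]}'),
        StreamSource(lambda: '{"streams": [{"channel": "demo", "viewers": 42}]}'),
    ],
    conn,
)
print(conn.execute("SELECT * FROM observations ORDER BY source").fetchall())
```

With this shape, supporting a new source means adding one subclass with its own `transform`; the loading step and the table stay the same.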
The part I need help with is thinking through the technologies I should use as well as what the ETL pipeline architecture should look like. I'm sure many of you have worked on big data analytics projects and have learned a lot in the process.
At first, all I'm looking for is an adviser: someone who can give me some pointers or tips, and whom I could pay for a couple of hours of their time. That said, over time, I could see the relationship expanding into something more interesting.
I'd like to use Google Cloud Platform over AWS, but I'm pretty naive here. I'm much less familiar with GCP and have created a few test projects to get used to the platform. To me, AWS feels too heavy for this project. If you're going to build an enterprise solution, you'd go with AWS, their new Kinesis Firehose, Hadoop, etc. However, I want to keep it really simple, and I'm not as familiar with non-relational databases. I'm not opposed to something like that, but I'm looking to start pulling in data sooner rather than later.
I'd like to use Python or Java as the ingestion language. I'm familiar with both and am leaning towards Python.
So, here are a few questions I'd like to answer in your time with me:
1. GCP or AWS, bearing in mind that simplicity in the architecture is key?
2. Technologies, scripts, and workflows that I need to go from data ingestion, to pushing it into data storage, to cleaning, to analyzing, and potentially to computing proprietary numbers.
3. If there is potential to package and sell finalized analytic data, I need reliability, security, and backups.
4. Data architecture / etc
5. Tips / learning tools that helped you and could help me get up and running quickly.
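To make the workflow in question 2 more concrete, the stages could be sketched roughly as follows: land the raw payload untouched, clean it in a separate step, then compute a number from the cleaned data. This is only one possible shape under stated assumptions. The `raw_events` table, the record fields `day` and `close`, and the use of a simple day-over-day return as a stand-in for a "proprietary number" are all hypothetical, and SQLite is used in place of MySQL so the sketch is self-contained.

```python
import json
import sqlite3


def land_raw(conn, source, payload):
    """Stage 1: store the payload exactly as received, so it can be reprocessed later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events "
        "(id INTEGER PRIMARY KEY, source TEXT, payload TEXT)"
    )
    conn.execute(
        "INSERT INTO raw_events (source, payload) VALUES (?, ?)", (source, payload)
    )


def clean(conn):
    """Stage 2: parse raw payloads, dropping records with missing or non-numeric prices."""
    cleaned = []
    for (payload,) in conn.execute("SELECT payload FROM raw_events ORDER BY id"):
        record = json.loads(payload)
        try:
            cleaned.append((record["day"], float(record["close"])))
        except (KeyError, TypeError, ValueError):
            continue  # skip bad records instead of failing the whole run
    return cleaned


def daily_returns(prices):
    """Stage 3: placeholder metric, the relative change between consecutive clean prices."""
    return [
        (day, (close - prev) / prev)
        for (_, prev), (day, close) in zip(prices, prices[1:])
    ]


conn = sqlite3.connect(":memory:")
for body in (
    '{"day": 1, "close": 100}',
    '{"day": 2, "close": null}',  # a bad record that cleaning should drop
    '{"day": 3, "close": 110}',
):
    land_raw(conn, "quotes", body)

prices = clean(conn)
returns = daily_returns(prices)
print(prices, returns)
```

Keeping the raw payloads separate from the cleaned data means the cleaning rules or the metric can change later without pulling everything from the APIs again.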
One important requirement is fluent English.
So if any of you are interested in being an adviser / teacher let me know.