-You need to use a Hadoop distribution (either on a local VM such as VirtualBox or on Google Cloud) to create a Kafka cluster, connect it to Flume, and push some CSV files we will download (link to be provided) into HDFS (see the first sketch after this list)
-Once the data is in HDFS, you need to manipulate it using Pig, Hive, and Scala and store it in Parquet format (see the second sketch below)
-The final step is to send the data to Elasticsearch (see the third sketch below)
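To give a rough sense of the first step, here is a minimal Scala sketch of a Kafka producer that reads a local CSV and pushes each line to a topic. It is only an illustration of the ingestion side: the broker address, topic name, and file path are placeholders, and on the cluster side a Flume agent (Kafka source plus HDFS sink) would be configured separately to land the records in HDFS.

```scala
import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object CsvToKafka {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker address is a placeholder for whatever the cluster uses
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val source = Source.fromFile("data/input.csv") // hypothetical local CSV path
    try {
      // Send each CSV line as one message; a Flume Kafka source would consume
      // this topic and write the records to HDFS
      source.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("csv-ingest", line))
      }
    } finally {
      source.close()
      producer.flush()
      producer.close()
    }
  }
}
```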
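For the second step, a minimal Spark/Scala sketch of the HDFS-to-Parquet conversion might look like the following; the HDFS paths are placeholders (the real locations will come with the documentation), and equivalent transforms could also be done in Pig or Hive before the Parquet write.

```scala
import org.apache.spark.sql.SparkSession

object CsvToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-parquet")
      .getOrCreate()

    // Read the raw CSV files that Flume landed in HDFS (placeholder path)
    val df = spark.read
      .option("header", "true")      // assumes the CSVs carry a header row
      .option("inferSchema", "true")
      .csv("hdfs:///user/ingest/raw_csv/")

    // Any cleanup/manipulation would go here, then store as Parquet (placeholder path)
    df.write
      .mode("overwrite")
      .parquet("hdfs:///user/ingest/parquet/")

    spark.stop()
  }
}
```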
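For the final step, one common option is the elasticsearch-hadoop (elasticsearch-spark) connector; the sketch below assumes that connector is on the classpath, and the Elasticsearch host, port, and index name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ParquetToEs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-es")
      .getOrCreate()

    // Read the Parquet output from the previous step (placeholder path)
    val df = spark.read.parquet("hdfs:///user/ingest/parquet/")

    // Write to Elasticsearch via the elasticsearch-spark connector
    df.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "localhost")  // ES host is a placeholder
      .option("es.port", "9200")
      .mode("append")
      .save("ingest-index")             // hypothetical index name

    spark.stop()
  }
}
```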
If this sounds interesting to you, please reach out to me with your price and I can send more documentation. The tricky part is integrating all the various components (Kafka/Flume/HDFS/Spark Scala), which I unfortunately don't have the time to do myself.
Thanks for your help!
Hours to be determined
Project length: Less than 1 month
I am looking for a mix of experience and value