You will get automatic CSV file ingestion: files sorted by header and uploaded to the cloud


Project details
The objective of this work is an automated bash script that sorts a folder of CSV files by their headers. It was built as the final project of the Google Analytics Coursera Certification program (github.com/j622amilah/Case_Studies/tree/main/1_case_study_bikeshare). The script first reads an initial header and labels it current_header. It then reads each CSV file in turn and compares that file's header with current_header. Each word in current_header is compared with each word in the new file's header, using both lexical matching and word similarity computed by a Hugging Face model called through a REST API. Sufficiently similar words in the new file's header are replaced by the equivalent word from current_header. Once all header words have been compared, the file is moved to one of three folders: exact_match_header if its header is the same as current_header, similar_match_header if at least one header word matches current_header, and no_match_header if no header words match.
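The sorting logic described above can be sketched roughly as follows. This is a minimal illustration and not the delivered script: it covers only the lexical matching step (case-insensitive, whole-word), leaves out the Hugging Face similarity call and the header-word replacement, and assumes simple comma-separated headers with no embedded commas. The function names header_fields, classify_header and sort_csv_folder are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: lexical header matching, no similarity model, no word replacement.

# Print the header words of a CSV file, one per line, lowercased and trimmed.
# Assumes a plain comma-separated first line without embedded commas.
header_fields() {
  head -n 1 "$1" | tr -d '\r"' | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | tr '[:upper:]' '[:lower:]'
}

# Compare a file's header with the reference header words (one per line)
# and print the name of the folder the file belongs in.
classify_header() {
  local current_header=$1 new_file=$2 new_header
  new_header=$(header_fields "$new_file")
  if [ "$new_header" = "$current_header" ]; then
    echo exact_match_header
  elif printf '%s\n' "$new_header" | grep -qxF -e "$current_header"; then
    # At least one header word is identical to a word in current_header.
    echo similar_match_header
  else
    echo no_match_header
  fi
}

# Move every CSV file in a folder into the subfolder matching its header.
sort_csv_folder() {
  local folder=$1 reference_file=$2 current_header file label
  # Read the reference header once, before any file is moved.
  current_header=$(header_fields "$reference_file")
  mkdir -p "$folder/exact_match_header" "$folder/similar_match_header" \
    "$folder/no_match_header"
  for file in "$folder"/*.csv; do
    [ -e "$file" ] || continue
    label=$(classify_header "$current_header" "$file")
    mv "$file" "$folder/$label/"
  done
}
```

Under these assumptions, sort_csv_folder takes the folder to sort and the file whose header serves as current_header. The reference file itself ends up in exact_match_header.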
What's included $100
These options are included with the project scope.
- Delivery Time 14 days
- Number of Revisions 0
- Number of Pages Mined/Scraped 0
- Number of Sources Mined/Scraped 0
About Jamilah
Data Scientist | modeling, cloud, scientific computing
Lyon 03, France
Steps for completing your project
After purchasing the project, send requirements so Jamilah can start the project.
Delivery time starts when Jamilah receives requirements from you.
Jamilah works on your project following the steps below.
Revisions may occur after the delivery date.
Re-verify that the code is in working condition
Add bash commands to upload the sorted csv files to a cloud service
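As a rough idea of what the upload step could look like, the sketch below copies each sorted folder to object storage. The listing does not name a cloud service, so Amazon S3 and the AWS CLI (aws s3 cp with --recursive) are assumptions here, as are the function name upload_sorted_folders, the DRY_RUN variable and the bucket name passed in. Another provider's CLI could be swapped in the same place.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Amazon S3 as the cloud service and an installed,
# configured AWS CLI. The bucket name is supplied by the caller.

upload_sorted_folders() {
  local folder=$1 bucket=$2 label
  for label in exact_match_header similar_match_header no_match_header; do
    # Skip folders that the sorting step did not create.
    [ -d "$folder/$label" ] || continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Print the command instead of running it.
      echo "aws s3 cp $folder/$label s3://$bucket/$label --recursive"
    else
      aws s3 cp "$folder/$label" "s3://$bucket/$label" --recursive
    fi
  done
}
```

Setting DRY_RUN=1 before calling the function prints the commands without contacting the cloud service, which is a cheap way to check paths before a real upload.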