We have a regularly updated set of CSV files that contain non-Western characters encoded as UTF-8. We need a Linux-based script that downloads the latest CSV, merges it into a master CSV, de-duplicates and sorts the result, and then allows the completed CSV to be searched for string matches.
Merge Program Function(s)
1. Downloads latest CSV from web location
2. Merges data from latest CSV with master CSV, overwriting old data where duplicated
3. Sorts the master CSV on a specified field
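The three merge steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the CSVs have a header row, one column uniquely identifies a record (the hypothetical `key_field`), and the function names, URL, and file paths are placeholders that do not come from the brief. The sort here is by plain code point order, which may not match the alphabetical order expected in every language.

```python
import csv
import urllib.request


def download_latest(url, dest_path):
    """Fetch the latest CSV from a web location and save it to dest_path."""
    urllib.request.urlretrieve(url, dest_path)


def read_rows(path):
    """Read a UTF-8 CSV (tolerating a leading BOM) into a list of dicts."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def merge_rows(master_rows, latest_rows, key_field):
    """Merge two row lists; a latest row replaces a master row with the same key."""
    merged = {}
    for row in master_rows:
        merged[row[key_field]] = row
    for row in latest_rows:
        # Assumption: key_field identifies duplicates, and newer data wins.
        merged[row[key_field]] = row
    return list(merged.values())


def sort_rows(rows, sort_field):
    """Sort rows by one field, comparing strings by Unicode code point."""
    return sorted(rows, key=lambda row: row[sort_field])


def write_rows(path, rows, fieldnames):
    """Write rows back out as a UTF-8 CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

A run would then call `download_latest`, `read_rows` on both files, `merge_rows`, `sort_rows`, and finally `write_rows` to replace the master CSV.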
Search Program Function(s)
1. Accepts UTF-8 or ASCII character string
2. Searches through CSV for results and returns matching lines
3. Preferably performs search as quickly as possible.
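A minimal sketch of the search tool, again in Python and with hypothetical function names, might look like this. Because ASCII is a subset of UTF-8, an ASCII search string needs no special handling once the file is decoded as UTF-8. The sketch streams the file line by line rather than loading it into memory; it assumes that no CSV field contains an embedded newline, since such a record would be split across several lines.

```python
def matching_lines(lines, needle):
    """Yield each line (without its line ending) that contains needle."""
    for line in lines:
        if needle in line:
            yield line.rstrip("\r\n")


def search_file(path, needle):
    """Return all lines of a UTF-8 CSV file that contain needle."""
    # utf-8-sig decodes plain UTF-8 and also skips a leading BOM if present.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(matching_lines(f, needle))
```

For example, `search_file("master.csv", "東京")` would return every line of a hypothetical `master.csv` that contains that string.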
The biggest headache here is handling UTF-8 text correctly. We have CSVs in several different languages, and both tools will need to handle all of them.
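One likely source of trouble, offered here as an assumption rather than something stated in the brief, is that the same visible text can be stored as different sequences of code points (for example a precomposed "é" versus "e" followed by a combining accent). A hedged Python sketch of normalizing text before de-duplicating or searching, with hypothetical helper names:

```python
import unicodedata


def normalize_text(text):
    """Return text in Unicode NFC form so equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", text)


def fold_for_search(text):
    """Return a normalized, case-folded form for case-insensitive matching."""
    folded = unicodedata.normalize("NFC", text).casefold()
    # Case folding can change composition, so normalize once more.
    return unicodedata.normalize("NFC", folded)
```

Applying the same function to both the search string and each line before comparing keeps the two sides consistent; whether case-insensitive matching is actually wanted is not specified in the requirements.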