I've got a CSV dataset A that consists of 1 column of identifying numbers and 2 columns of text (1-3 sentences each). Each row is a separate identifying number and case. I've also got a CSV file B that has a list of words in one column and a corresponding numerical value (an 'X-score') for each word in another column.
I'd like a program that goes through each text column of dataset A and does the following:
1) Removes all punctuation from the text, etc.
2) Counts the total number of words per text (1 text per row)
3) Identifies the number of words in the row that are contained in CSV B (the ones that are found on the list and have an 'X-score')
4) Creates an average of the X-scores of the matched words in each row of dataset A
5) Lists the maximum X-score from all the words in a column
There will also be some other related tasks. This should be relatively quick for someone familiar with these kinds of techniques. I could do this in Excel with a VLOOKUP and some other functions, but I'm not sure how to do it in R.
I am looking for a mix of experience and value.
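
To make steps 1-5 concrete, here is a rough sketch in base R of what I have in mind. It is only an illustration under assumptions, not a finished solution: the sample rows, the column names (id, text1, text2, word, xscore) and the file name in the comment are all made up, and it assumes words should be matched case-insensitively and split on whitespace.

```r
# Hypothetical sample data standing in for dataset A and file B.
# With real files this would be something like:
#   a <- read.csv("A.csv", stringsAsFactors = FALSE)   # file name is an assumption
a <- data.frame(
  id    = c(101, 102),
  text1 = c("The cat sat, quietly!", "Dogs bark."),
  text2 = c("A happy cat.", "No match here"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  word   = c("cat", "happy", "bark"),
  xscore = c(2, 4, 1),
  stringsAsFactors = FALSE
)

# A named vector works as the lookup table (the VLOOKUP equivalent).
# If a word appears twice in B, the first entry is the one that gets used.
lookup <- setNames(b$xscore, tolower(b$word))

score_text <- function(txt) {
  # Step 1: strip punctuation (this also removes apostrophes) and lowercase.
  clean <- tolower(gsub("[[:punct:]]", "", txt))
  # Step 2: split on whitespace and count the words.
  words <- strsplit(trimws(clean), "\\s+")[[1]]
  words <- words[nzchar(words)]
  # Step 3: look each word up in B; words not on the list come back as NA.
  scores <- unname(lookup[words])
  scores <- scores[!is.na(scores)]
  # Step 4 (mean) plus the per-row maximum; NA when nothing matched.
  c(n_words   = length(words),
    n_matched = length(scores),
    mean_x    = if (length(scores) > 0) mean(scores) else NA_real_,
    max_x     = if (length(scores) > 0) max(scores) else NA_real_)
}

# Apply the scoring to every text in one column; one output row per case.
score_column <- function(texts) {
  as.data.frame(do.call(rbind, lapply(texts, score_text)))
}

res1 <- score_column(a$text1)
res2 <- score_column(a$text2)

# Step 5: the maximum X-score over all words in each column.
col_max1 <- max(res1$max_x, na.rm = TRUE)
col_max2 <- max(res2$max_x, na.rm = TRUE)
```

The per-row results could then be joined back to the identifying numbers with something like cbind(a["id"], res1).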