I have a dataset of 29,894 sentences spoken on the floor of the US Congress. These were extracted from 1834 PDFs.
I need someone to match each sentence with its speaker's name and, if possible, also the speaker's state and district (I can provide the files that link these as well). The speaker's name is always introduced in the same way, e.g. 'Mr SMITH:', so it should be relatively straightforward for someone who knows NLP in R or Python to automate the process. I'm attaching an example PDF file to this job.
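To illustrate the kind of approach I have in mind, here is a rough Python sketch, not a finished solution. It assumes the PDF text has already been extracted into a plain string and that every speaker marker looks exactly like 'Mr SURNAME:' (an optional period after "Mr" is also accepted, which is my assumption). The function names split_by_speaker and speaker_of are hypothetical. The text is cut at each marker, and each sentence is assigned to the speaker whose turn contains it.

```python
import re

# Assumed marker format: "Mr SMITH:" (optionally "Mr. SMITH:").
# Surnames must be all caps, so mixed-case names such as "McDONALD" are NOT matched.
SPEAKER_RE = re.compile(r"\bMr\.? ([A-Z][A-Z'\-]+):")


def split_by_speaker(text):
    """Return (speaker, speech_text) pairs in document order.

    Any text before the first speaker marker is discarded.
    """
    matches = list(SPEAKER_RE.finditer(text))
    turns = []
    for i, m in enumerate(matches):
        # A turn runs from the end of this marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        turns.append((m.group(1), text[m.end():end].strip()))
    return turns


def speaker_of(sentence, turns):
    """Return the speaker whose turn contains the sentence verbatim, or None."""
    for speaker, speech in turns:
        if sentence in speech:
            return speaker
    return None


if __name__ == "__main__":
    sample = "Mr SMITH: I rise in support. The bill is sound. Mr JONES: I object."
    turns = split_by_speaker(sample)
    print(turns)
    print(speaker_of("I object.", turns))
```

Because the lookup is an exact substring match, differences in whitespace or hyphenation introduced by PDF extraction would need to be normalised first. State and district could then be added by joining on the speaker's name using the linked files.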
If you have any questions, please let me know.
I am looking for a good balance of experience and value.