Hello. I need assistance with text parsing for an article on the US presidential debate. I am turning the transcript into data. I currently have a dataset where one observation is a full statement in the transcript, but I want one sentence to be one observation. This means splitting on periods (.), question marks (?), and exclamation points (!) so that each sentence becomes a new observation (row) rather than a new variable (column). Each new observation will have the same value for the Person field as its original statement, because consecutive sentences within a statement were spoken by the same person.
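A minimal Python sketch of the split, assuming a simple rule: a sentence ends wherever ".", "?" or "!" is followed by whitespace. The function names and the sample speakers below are hypothetical. This naive rule will wrongly split abbreviations such as "Mr." or "U.S." and will not split after a closing quotation mark (for example `no." Then`), so those cases would still need a manual check.

```python
import re

# Assumed rule: split on whitespace that directly follows ".", "?" or "!".
# Abbreviations ("Mr.", "U.S.") are split incorrectly and need manual review.
_SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


def split_sentences(statement: str) -> list[str]:
    """Split one statement into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_END.split(statement.strip()) if s]


def explode_statements(rows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Turn (person, statement) rows into (person, sentence) rows.

    The speaker is copied onto every sentence of the statement.
    """
    out = []
    for person, statement in rows:
        for sentence in split_sentences(statement):
            out.append((person, sentence))
    return out


# Hypothetical sample rows, not taken from the real transcript.
rows = [
    ("MODERATOR", "Good evening. Are you ready? Let's begin!"),
    ("CANDIDATE A", "Thank you. I am."),
]
for person, sentence in explode_statements(rows):
    print(person, "|", sentence)
```

Each statement row becomes as many rows as it has sentences, and the Person value is repeated on each one.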
There are currently 315 statement-observations, and when you finish there will be many more sentence-observations. It is very important to fill in the "Person" and "ID" fields for every row in the spreadsheet. The IDs must be consecutive so that sorting by ID reproduces the transcript in the order of the actual debate.
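One way to guarantee consecutive IDs is sketched below in Python, assuming the statements have already been split into sentences and still carry their original statement ID. The new sentence ID counts up from 1 in transcript order; keeping the original ID in a separate `statement_id` column is an assumption (a hypothetical field name), not part of the spreadsheet as described.

```python
from dataclasses import dataclass


@dataclass
class SentenceRow:
    id: int            # new consecutive sentence ID (1, 2, 3, ...)
    statement_id: int  # original statement ID (assumed extra column, for traceability)
    person: str        # speaker, copied from the parent statement
    sentence: str


def number_sentences(statements: list[tuple[int, str, list[str]]]) -> list[SentenceRow]:
    """Assign consecutive IDs in debate order.

    `statements` holds (statement_id, person, sentences) tuples. Sorting by the
    original statement ID, then keeping sentence order within each statement,
    preserves the order of the transcript.
    """
    rows = []
    next_id = 1
    for statement_id, person, sentences in sorted(statements, key=lambda s: s[0]):
        for sentence in sentences:
            rows.append(SentenceRow(next_id, statement_id, person, sentence))
            next_id += 1
    return rows


# Hypothetical input, deliberately out of order to show the sort.
statements = [
    (2, "CANDIDATE A", ["Thank you.", "I am."]),
    (1, "MODERATOR", ["Good evening.", "Let's begin."]),
]
for row in number_sentences(statements):
    print(row.id, row.statement_id, row.person, row.sentence)
```

Because IDs are assigned in a single pass after sorting, there are no gaps or duplicates, and sorting the output by `id` gives the debate in order.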
In addition, please create another indicator variable equal to 1 if the sentence-observation is longer than 244 characters, and 0 otherwise.
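A minimal Python sketch of the indicator, assuming "characters" means every character in the sentence (letters, spaces, and punctuation) after trimming surrounding whitespace. The function name is hypothetical. Note that "longer than 244" means a sentence of exactly 244 characters gets 0.

```python
LIMIT = 244  # threshold from the project description


def long_sentence_flag(sentence: str, limit: int = LIMIT) -> int:
    """Return 1 if the sentence has more than `limit` characters, else 0.

    Counts all characters, including spaces and punctuation, after stripping
    leading and trailing whitespace (an assumption about how to count).
    """
    return 1 if len(sentence.strip()) > limit else 0


print(long_sentence_flag("Thank you."))  # 0
print(long_sentence_flag("x" * 245))     # 1
print(long_sentence_flag("x" * 244))     # 0 (exactly 244 is not "longer than")
```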
See the attached example, demonstrate your understanding of the project, and include any remaining questions in your proposal. I would love to get this taken care of within one day. If this goes well, I may do the same for the two upcoming debates.