I need a script that can scrape word counts from posts on a website. The input would be a text file of URLs to be scraped, with one URL per line. The output would be a CSV file, where the first column is the URL, the second column is the post title from the HTML title tag, and the third column is the word count. I'm looking for a desktop-based script that will run on a Mac.
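As a rough sketch of the input/output flow described above (not a finished script), assuming Python 3 and only its standard library: the function names `read_urls`, `extract_title`, `fetch`, and `write_report`, the CSV header names, and the `count_words` callback are all hypothetical placeholders, and the word-counting logic itself is passed in rather than defined here.

```python
import csv
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split())


def fetch(url):
    """Download a page and decode it (UTF-8 assumed if no charset is sent)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def read_urls(path):
    """Read one URL per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def write_report(urls, out_path, count_words, fetch_page=fetch):
    """Write url, title, word_count rows; count_words is supplied by the caller."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "title", "word_count"])
        for url in urls:
            html = fetch_page(url)
            writer.writerow([url, extract_title(html), count_words(html)])
```

Something like `write_report(read_urls("urls.txt"), "word_counts.csv", count_words)` would then produce the three-column file; the file names here are only examples.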
The source to be scraped is a WordPress site, so the start and end points will be consistent across posts. But it would be great if there were an editable configuration/settings file where I can specify the start and end tags for the scraping in case we change the layout in the future.
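One possible shape for the editable settings file, sketched with Python's standard `configparser` module: the `[scrape]` section name, the `start_marker` and `end_marker` keys, and the default marker strings are all assumptions for illustration (the actual tags depend on the site's theme), and `load_markers` and `slice_between` are hypothetical helper names.

```python
import configparser

# Assumed defaults; the real markers depend on the WordPress theme in use.
DEFAULTS = {
    "start_marker": '<div class="entry-content">',
    "end_marker": "</article>",
}

# Example settings file (settings.ini):
#
#   [scrape]
#   start_marker = <div class="entry-content">
#   end_marker = </article>


def load_markers(path):
    """Read the start/end markers from an INI file, falling back to DEFAULTS."""
    # interpolation=None so a literal '%' in a marker is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict({"scrape": DEFAULTS})
    parser.read(path, encoding="utf-8")  # a missing file is silently ignored
    section = parser["scrape"]
    return section["start_marker"], section["end_marker"]


def slice_between(html, start_marker, end_marker):
    """Return the HTML between the first start marker and the next end marker."""
    start = html.find(start_marker)
    if start == -1:
        return ""  # start marker not found: nothing to count
    start += len(start_marker)
    end = html.find(end_marker, start)
    return html[start:] if end == -1 else html[start:end]
```

Plain string markers keep the file easy to edit by hand; if the layout changes, only the two values in the settings file would need updating.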
With regard to the word count function, I would like the script to exclude words that are in image captions and HTML code.
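A minimal sketch of how the counting could exclude captions and markup, using the standard library's `html.parser`: it assumes captions appear either in `figcaption` elements or in elements with the class `wp-caption-text` (common in WordPress output, but worth checking against the actual site), and it assumes a "word" is a run of letters or digits, optionally joined by apostrophes or hyphens. `VisibleTextParser` and `count_words` are hypothetical names.

```python
import re
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "figcaption"}   # assumed non-content elements
SKIP_CLASSES = {"wp-caption-text"}              # assumed caption class
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}  # never have end tags
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")       # assumed definition of a word


class VisibleTextParser(HTMLParser):
    """Collects text outside of skipped elements; tags themselves are dropped."""

    def __init__(self):
        super().__init__()
        self.skip_tag = None   # name of the element currently being skipped
        self.skip_depth = 0    # nesting depth of same-named tags inside it
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if tag in VOID_TAGS:
            return  # void elements cannot contain text to skip
        classes = set((dict(attrs).get("class") or "").split())
        if tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self.skip_tag = tag
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if self.skip_tag is not None and tag == self.skip_tag:
            self.skip_depth -= 1
            if self.skip_depth == 0:
                self.skip_tag = None

    def handle_data(self, data):
        if self.skip_tag is None:
            self.chunks.append(data)


def count_words(html):
    """Count words in the visible text, ignoring captions, scripts, and styles."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(WORD_RE.findall(" ".join(parser.chunks)))
```

Because only text nodes are collected, tag names and attributes never contribute to the count; the skip lists at the top are the place to adjust if the theme marks captions differently.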
February 20, 2018
I am looking for a mix of experience and value