I have 20 books that have been scanned to PDF files, totalling about 2,500 pages.
The pages currently contain tabular data that I need to extract into CSV files.
Each of the books has a slightly different table layout, but you can see sample pages from 3 of the books here:
BOOK 1: https://app.box.com/s/vgxajvp570i0jp3i5wyngxl3uxndxf2y
BOOK 2: https://app.box.com/s/leumulos7x02jbw1n5mqd5b05klsqcgz
BOOK 3: https://app.box.com/s/vt2qy7h410h94vo3w7kg89qdpi9vnhb4
I don't need the whole table from each page copied, just selected fields, as shown in this doc (the BOOK 1 sample linked above): https://app.box.com/s/vgxajvp570i0jp3i5wyngxl3uxndxf2y
Note, if more columns can be extracted in addition to those listed (without much extra work) that would be very useful to me.
This would suit someone who:
*has strong skills with a PDF editor and can use its built-in functions to process the tables automatically.
*can work programmatically with PDF documents to extract fields. Most of the formats are almost identical, so scripts for each book will only need slight adjustment.
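To illustrate what I mean by "slight adjustment" per book, here is a rough Python sketch, not a required approach. It assumes the table on each page has already been turned into rows of text by some OCR or table-extraction step (not shown here), and that each book only differs in its header names. The layout names, header texts, field names and sample rows below are all made up for illustration; they are not taken from the actual books.

```python
import csv
import io

# Hypothetical per-book layouts: output field -> header text used in that book.
# Supporting another book would mostly mean adding one more entry here.
BOOK_LAYOUTS = {
    "book1": {"name": "Name", "year": "Year", "amount": "Amount"},
    "book2": {"name": "Full Name", "year": "Yr", "amount": "Amt"},
}


def select_fields(rows, layout):
    """Keep only the configured columns; rows[0] is treated as the header row."""
    header = [(cell or "").strip() for cell in rows[0]]
    # Find the position of each wanted column in this book's header.
    indexes = {field: header.index(source) for field, source in layout.items()}
    records = []
    for row in rows[1:]:
        # Empty cells (None) become empty strings; stray whitespace is trimmed.
        records.append({field: (row[i] or "").strip() for field, i in indexes.items()})
    return records


def to_csv(records, fields):
    """Serialise the selected records to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up rows standing in for one extracted page of a "book2"-style table.
sample_rows = [
    ["Full Name", "Address", "Yr", "Amt"],
    ["A. Smith ", "1 High St", "1901", "12"],
    ["B. Jones", None, "1902", " 7"],
]

records = select_fields(sample_rows, BOOK_LAYOUTS["book2"])
csv_text = to_csv(records, ["name", "year", "amount"])
print(csv_text)
```

The point of the sketch is only that the selection logic stays the same across books while the per-book mapping changes. Getting clean rows out of the scanned pages is the hard part and is not covered here.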
Please do not apply if you plan on doing this process manually. That is not realistic; it will take way too long.
Please do not apply if you have not opened the sample pages. This is a complex task, and I have had a number of freelancers win similar projects only to realise they could not complete them. Please don't be this person.
I would prefer to hire someone to complete the whole project (all 20 books).