I'm looking for someone with experience in performing Optical Character Recognition (OCR) on scanned PDFs. I have many thousands of scanned PDFs whose text I need for an internal project. The scanned PDFs contain blocks of text as well as tables, both of which require OCR.
The nature of this project requires that the OCR be as close to 100% accurate as possible. Any technology is acceptable (Tesseract, ABBYY, etc.) as long as the output meets that accuracy requirement.
I will provide the files in PDF format via Dropbox, and the deliverables should be plain .txt files (no formatting other than line breaks required).
Proficiency in English (written and spoken) is a must-have requirement for this job, as you will need to communicate status updates and issues.
There is a short-term need to digitize 2,000 files, with potential for follow-on work of up to 500 files a month thereafter.