Text Extraction from Scanned PDF Documents

Closed - This job posting has been filled and work has been completed.
Web, Mobile & Software Dev Web Development Posted 4 years ago

Fixed Price

Delivery by September 10, 2012

I have some scanned (OCR) and vector PDF documents on my server. I need an expert in "tesseract", "ocropus" and/or "cuneiform" to suggest a good solution for automating the task.

The task is to extract the text content from each PDF document and store it in a database. I already have a script written in PHP that does this. It currently uses the ocrwebservice.com web service, but that service does not return all the text found in a page image, so I am looking for a better solution.
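A minimal sketch of one way such a pipeline could look, written in Python for illustration only (the existing script is PHP and is not shown here). It assumes the `pdftotext` and `pdftoppm` tools from poppler-utils and the `tesseract` command-line program are installed. The 20-character threshold, the `page_text` table, and the use of SQLite as a stand-in for the real database are all assumptions made for this example. Vector pages are read from their embedded text layer, and only pages without usable text are sent to OCR.

```python
import sqlite3
import subprocess
import tempfile
from pathlib import Path

MIN_CHARS_PER_PAGE = 20  # assumed threshold; tune it on real documents


def needs_ocr(text: str, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Treat a page as scanned when its embedded text layer is nearly empty."""
    return len("".join(text.split())) < min_chars


def text_layer(pdf: Path) -> list[str]:
    """Read the embedded text layer with pdftotext, one string per page."""
    out = subprocess.run(
        ["pdftotext", "-layout", str(pdf), "-"],
        check=True, capture_output=True, text=True,
    ).stdout
    # pdftotext ends each page with a form feed character.
    pages = out.split("\f")
    return pages[:-1] if pages and pages[-1] == "" else pages


def ocr_page(pdf: Path, page: int, lang: str = "eng") -> str:
    """Rasterize one page at 300 dpi and run tesseract on the image."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(
            ["pdftoppm", "-r", "300", "-f", str(page), "-l", str(page),
             "-singlefile", "-png", str(pdf), str(prefix)],
            check=True,
        )
        out_base = Path(tmp) / "ocr"
        # tesseract writes its result to <out_base>.txt
        subprocess.run(
            ["tesseract", str(prefix) + ".png", str(out_base), "-l", lang],
            check=True, capture_output=True,
        )
        return (Path(tmp) / "ocr.txt").read_text(encoding="utf-8")


def extract_pages(pdf: Path) -> list[str]:
    """Use the text layer where it exists and fall back to OCR elsewhere."""
    pages = text_layer(pdf)
    return [ocr_page(pdf, i) if needs_ocr(t) else t
            for i, t in enumerate(pages, start=1)]


def store_pages(db: sqlite3.Connection, doc_name: str, pages: list[str]) -> None:
    """Insert or overwrite one row per page (hypothetical table layout)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS page_text "
        "(doc TEXT, page INTEGER, body TEXT, PRIMARY KEY (doc, page))"
    )
    db.executemany(
        "INSERT OR REPLACE INTO page_text VALUES (?, ?, ?)",
        [(doc_name, i, t) for i, t in enumerate(pages, start=1)],
    )
    db.commit()
```

With the tools installed, something like `store_pages(sqlite3.connect("docs.db"), "report.pdf", extract_pages(Path("report.pdf")))` would process one file; both file names are placeholders.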

Your responsibilities:
1. You evaluate "tesseract", "cuneiform" and "ocropus" and let me know which one gives the best results.
2. You tell me how to install the chosen tool in a Linux environment. I have an Ubuntu VPS. I will install it on my server myself and ask for your help only when required.
3. I give you my current script. You change the calls from ocrwebservice.com to whatever we decide to use.
4. I test the script and we complete the project.
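For the first step, one possible way to compare the engines is to run each of them on the same sample pages and score the output against a hand-typed reference transcription. The sketch below is in Python for illustration only. It assumes the text from each engine has already been collected into a dictionary, and the engine names used as keys are placeholders. The score is a plain character-level similarity ratio from the standard library, not a formal OCR accuracy metric.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse whitespace and case so layout differences are not penalized."""
    return " ".join(text.split()).lower()


def similarity(ocr_text: str, reference: str) -> float:
    """Return a ratio in [0, 1]; higher means closer to the reference."""
    matcher = SequenceMatcher(
        None, normalize(ocr_text), normalize(reference), autojunk=False
    )
    return matcher.ratio()


def rank_engines(outputs: dict[str, str], reference: str) -> list[tuple[str, float]]:
    """Sort engines from best to worst by similarity to the reference text."""
    scores = {name: similarity(text, reference) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Averaging the scores over several representative pages, including the ones where the current web service drops text, would give a fairer picture than a single page.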

For an expert in this area, this should be really simple, and maybe you could complete it in a day.

Skills: pdf, linux
