We are looking for a Senior Software Engineer with extensive experience building Optical Character Recognition (OCR) systems that extract data from structured forms. We hoped to build our solution with open source tools such as Tesseract and OpenCV, but so far we have not achieved the speed and accuracy we need. If these open source tools cannot meet our requirements, we may have to build on the ABBYY FlexiCapture Engine SDK instead. That SDK would add recurring licensing costs to our system, which is undesirable but may be necessary. Before we go in that direction, we would like an OCR expert's opinion and guidance to help us compare these options.
Our open source system works well when extracting data from the standardized PDF tax forms published on the IRS website. However, we also encounter many homegrown versions of these forms that do not match the exact geometry of the IRS originals, and our current keypoint matching and homography approach does not handle them reliably. We therefore need new methods that let the library process a wider variety of tax forms, as well as the different versions of a single tax form.
If this solution works, it will lead to a multi-year project. We have many uses for this kind of tool and will need OCR expertise on our staff for the long term.