01/20/2017 Travis Spangler

Where The “User” Stands With OCR Technology…


Automating a business process can entail many things: repurposing staff, changing corporate mindsets, and implementing new technology. Although this can seem daunting, the investment is well worth it once you look at the savings and the assurance of business continuity. This article focuses on the user-facing technology, in other words, the software used by the people who actually do the work.

First, we need to understand which parts of the process are done by a human and which parts are done by the software. You can find a simple chart below, but please note that the specific setup is different for each company and process.

Step | Step Name    | Step Description                                                                  | Human or Software?
-----|--------------|-----------------------------------------------------------------------------------|-------------------
1    | Scanning     | Using a scanner to transfer paper documents into digital format                   | Human
2    | Recognition  | Extracting data from a form or “batch” of documents/images                        | Software
3    | Verification | Validating the extracted data from Step 2                                         | Human
4    | Export       | Saving both the document and the extracted data in the business system of choice | Software
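
The division of labor in the chart can also be written down as a small data structure, which makes it easy to see which steps need user-facing software. This is only a minimal Python sketch and not any vendor's API; the `Step` class and the `human_steps` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    performed_by: str  # "Human" or "Software"


# The four steps from the chart; real setups vary by company and process.
PROCESS = [
    Step(1, "Scanning", "Human"),
    Step(2, "Recognition", "Software"),
    Step(3, "Verification", "Human"),
    Step(4, "Export", "Software"),
]


def human_steps(process):
    """Return the names of the steps that a person performs."""
    return [step.name for step in process if step.performed_by == "Human"]


print(human_steps(PROCESS))
```

The two names it prints, Scanning and Verification, are the steps the rest of this article covers.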

Now that we understand which steps are done by humans, let’s look at the technology those valuable people use. The first is what is called a “Scanning Station” or “Scanning Client”. As the chart above describes, the purpose of the scanning station is to take physical paper documents and digitize them. The scanning station is typically connected to a local or network scanner into which the user feeds paper. Once the software captures an image of the document(s), it presents the images to the user to review and adjust. Common options are rotating the image by a chosen number of degrees, cropping, deleting, and even redacting. More advanced options may allow the user to separate the batch of scanned images into documents or change image resolutions. For a great example of scanning station technology, please check out our YouTube video of ABBYY’s scanning technology: https://www.youtube.com/watch?v=yOskkih5EKo
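
To make the review step more concrete, here is a rough Python sketch of two of the operations mentioned above: rotating a page and separating a batch into documents. It does not reflect how ABBYY or any other product works internally. The `Page` class, its fields, and the idea of splitting on a separator sheet are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    name: str
    rotation: int = 0            # degrees clockwise (hypothetical field)
    is_separator: bool = False   # hypothetical marker for a separator sheet


def rotate(page, degrees):
    """Return a copy of the page rotated by the given number of degrees."""
    return replace(page, rotation=(page.rotation + degrees) % 360)


def split_into_documents(batch):
    """Start a new document at each separator sheet; separators are dropped."""
    documents, current = [], []
    for page in batch:
        if page.is_separator:
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


batch = [
    Page("invoice-1"),
    Page("invoice-2"),
    Page("separator", is_separator=True),
    Page("receipt-1"),
]
batch[0] = rotate(batch[0], 90)  # the user fixes a sideways scan
documents = split_into_documents(batch)
print([[page.name for page in doc] for doc in documents])
```

In this sketch the four scanned sheets become two documents, one with the two invoice pages and one with the receipt, and the separator sheet itself is discarded.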

The other step done by humans in some organizations is Verification. Understanding what the software extracted, seeing where it found that data on the document, and making adjustments as needed are all part of the “Verification Station” or “Verification Client”. At this point in our process, the documents or images are already captured, so a direct connection to a scanner is not needed. The Verification Station typically shows a copy of the image/document on one side of the screen and the extracted data on the other. On the extracted data side, there are fields populated with data. If the software has a low confidence level in the recognition, it highlights either the whole word or part of the word in red. Some verification software even allows the user to “train” it on where to find fields.
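
The low-confidence highlighting can be pictured with a short Python sketch. It assumes the recognition step reports a confidence score between 0 and 1 for each character, which is an assumption made for illustration and not a description of any specific product. The `RecognizedChar` class, the 0.80 threshold, and the use of square brackets in place of red highlighting are likewise hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80  # assumed cutoff; real products choose their own


@dataclass
class RecognizedChar:
    char: str
    confidence: float  # 0.0 (no confidence) to 1.0 (certain)


def render_for_review(chars, threshold=CONFIDENCE_THRESHOLD):
    """Wrap each run of low-confidence characters in [brackets].

    The brackets stand in for the red highlighting a verification
    station would show on screen.
    """
    out, in_run = [], False
    for c in chars:
        low = c.confidence < threshold
        if low and not in_run:
            out.append("[")
        elif not low and in_run:
            out.append("]")
        out.append(c.char)
        in_run = low
    if in_run:
        out.append("]")
    return "".join(out)


# An invoice number in which the engine was unsure about one character.
scores = [0.99, 0.98, 0.97, 0.99, 0.95, 0.41, 0.93, 0.96]
field = [RecognizedChar(ch, s) for ch, s in zip("INV-1O42", scores)]
print(render_for_review(field))  # prints INV-1[O]42
```

Here only the doubtful character is flagged, so the person verifying can check it against the image (is it the letter O or a zero?) without rereading the whole field.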

For a basic example of ABBYY’s verification station in use, check out this video: https://www.youtube.com/watch?v=IzEDHh3Zl30

For a more advanced view of ABBYY’s verification station and how the “training” is used, check out this video: https://www.youtube.com/watch?v=_0JAAlMt4nM

Travis Spangler

Travis writes articles dealing with various technical aspects of document capture and forms processing. He is fluent in Microsoft .NET and holds several certifications, including ABBYY FlexiCapture and IRISXTract. As general manager and sales director, he runs the daily operations and manages customer accounts to ensure both customers and prospects receive the very best from UFC, Inc. Travis has many years of experience with document capture software and content management systems. His areas of expertise include custom functions in ABBYY FlexiCapture, email APIs, Microsoft SQL Server Reporting Services, and many other applications and platforms. He has integrated Amazon Web Services EC2 instances with several applications, including the company's CRM system.
