Learn how to make non-searchable documents searchable with ABBYY FineReader Server.
Hello. Today I’d like to show you our ABBYY FineReader Server product. This product is a fantastic solution if you need to make non-searchable documents searchable, especially if you’re dealing with a variety of different formats. As you can see, FineReader Server comes with an administration console. This is where we control how documents come in, what happens to them during processing, and what happens as they’re exported. Everything here is driven by a workflow, and the workflow is really where the nitty-gritty of the solution takes place. I’m going to show you an example workflow, which has seven tabs. The first is the General tab, where we give the workflow a name and choose the input source.
Is it a folder? Are we pulling documents from email? Or is there something like a SharePoint system, where we need to retrieve them from a document repository? In today’s demo, I’m going to quickly show you how we pick up documents from a traditional hot folder. The Input tab tells us where to get those documents. In this case I’m using a shared folder, but we can also use FTP. There are a lot of other settings here that I won’t go into in detail, but I want to give you confidence that this software has been used to process billions of pages, so there are many settings you can use to tune it for your environment. The next tab is Processing, where we set the recognition languages and the OCR quality we expect from the application.
The next tab answers the question: how do we separate documents? Do we keep individual files as separate documents? Do we use barcodes? Do we use blank pages as separators? Organizations separate documents in many different ways, and as you can see, we can handle a variety of those methods. The Quality Control tab determines whether and when we want a human involved in the process, as in a traditional OCR process. It’s important to understand that the software reports a confidence level. If confidence is low for a given character or document, we may want a staff member to review the OCR results so that any low-confidence information is corrected before it’s exported into its final format. This tab is where we configure those settings.
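As a rough illustration of the separation rules and the confidence check described above, here is a minimal Python sketch. It is not FineReader Server’s implementation or API: the `Page` model (with `text`, `barcode`, and per-character `confidences`), the mode names, the 0.80 threshold, and the functions `split_documents` and `needs_review` are all hypothetical, chosen only to show the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical page model; the product's internal data structures are not shown in the demo.
@dataclass
class Page:
    text: str
    barcode: Optional[str] = None  # separator barcode value, if one was detected on this page
    confidences: list[float] = field(default_factory=list)  # per-character OCR confidence, 0.0-1.0

def split_documents(pages: list[Page], mode: str) -> list[list[Page]]:
    """Group a stream of pages into documents using one separation rule."""
    if mode not in ("individual", "blank_page", "barcode"):
        raise ValueError(f"unknown separation mode: {mode}")
    if mode == "individual":
        # Every incoming page/file stays its own document.
        return [[p] for p in pages]
    docs: list[list[Page]] = []
    current: list[Page] = []
    for page in pages:
        if mode == "blank_page" and not page.text.strip():
            # A blank page closes the current document and is itself dropped.
            if current:
                docs.append(current)
            current = []
            continue
        if mode == "barcode" and page.barcode is not None:
            # A barcode page starts a new document and is kept as its first page.
            if current:
                docs.append(current)
            current = [page]
            continue
        current.append(page)
    if current:
        docs.append(current)
    return docs

LOW_CONFIDENCE = 0.80  # hypothetical threshold; real deployments tune this

def needs_review(doc: list[Page], threshold: float = LOW_CONFIDENCE) -> bool:
    """Route a document to a human verifier if any character is below the threshold."""
    return any(c < threshold for page in doc for c in page.confidences)
```

Flagging a whole document when any single character is uncertain is the simplest policy; a per-field or per-page policy would send less work to reviewers at the cost of more bookkeeping.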
The Indexing tab lets us set up different document types, each with its own fields. It also lets a person work with the software to index a document manually. Lastly, we have the Output tab, which defines the format we want to standardize on for output. You can see here that I’m exporting each document as a PDF and also exporting its text to a text file. We support many output formats, and each one has its own settings and destination location that you can customize. I won’t go into each one individually, but again, because the software has been used on billions of pages, there are many ways to customize it so that you get the tuning and output format you expect.
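To make the idea of document types with their own fields more concrete, here is a small Python sketch of how such a configuration might be modeled, with the unfilled fields being the ones a person would index manually. The document types, field names, and the `missing_fields` helper are hypothetical; they are not taken from FineReader Server’s configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class DocumentType:
    name: str
    fields: list[str]  # fields that must be filled in, automatically or by an operator

# Hypothetical document types, standing in for what would be set up on the Indexing tab.
DOCUMENT_TYPES = {
    "invoice": DocumentType("invoice", ["vendor", "invoice_number", "total"]),
    "contract": DocumentType("contract", ["party", "effective_date"]),
}

def missing_fields(doc_type: str, values: dict[str, str]) -> list[str]:
    """Return the fields that are still empty, i.e. the ones left for manual indexing."""
    spec = DOCUMENT_TYPES[doc_type]
    return [f for f in spec.fields if not values.get(f, "").strip()]
```

For example, an invoice where only the vendor was captured would leave `invoice_number` and `total` for a person to fill in.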
So from a workflow perspective, the process is very easy: we go through these tabs, configure our settings, click OK, and move on. Now let’s see this specific workflow in action. I’m going to provide some documents to the input folder. Remember, the input folder is where we’ve told the software to pick up the documents we want OCR performed on. I have some documents on my clipboard, so I’ll just paste them into the input folder, which the software is monitoring. As you can see, it just picked up those documents and is now performing OCR on them. If we now look at the output folder, you’ll see both a PDF version and a text version of each document.
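The hot-folder pattern shown in the demo can be sketched in a few lines of Python. This is a generic polling loop, not how FineReader Server monitors folders internally. The function names `poll_hot_folder` and `watch`, the accepted file extensions, and the two-second interval are all assumptions for illustration.

```python
from __future__ import annotations

import os
import time
from typing import Callable

# Hypothetical set of image/document extensions to pick up.
EXTENSIONS = (".jpg", ".jpeg", ".tif", ".tiff", ".png", ".pdf")

def poll_hot_folder(folder: str, seen: set[str]) -> list[str]:
    """Return paths of matching files that appeared since the previous call."""
    new_files = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if (entry.is_file()
                    and entry.name.lower().endswith(EXTENSIONS)
                    and entry.path not in seen):
                seen.add(entry.path)  # remember it so it is handled only once
                new_files.append(entry.path)
    return sorted(new_files)

def watch(folder: str, handle: Callable[[str], None], interval: float = 2.0) -> None:
    """Poll forever, passing each new file to `handle` (for example, an OCR job)."""
    seen: set[str] = set()
    while True:
        for path in poll_hot_folder(folder, seen):
            handle(path)
        time.sleep(interval)
```

A production hot folder also has to avoid picking up files that are still being copied in, typically by waiting until a file’s size stops changing or by having producers write to a temporary name and rename when done; this sketch leaves that out.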
The original documents I sent in for OCR were JPEG images, but now they’re coming out as PDFs, and I have access to all of the text on them. So now I can use Ctrl+F to find information in these documents. And just for fun, I had the software produce a text file for each document, so you can be confident that we’re extracting all of the text from each file. So ABBYY FineReader Server is a fantastic solution for bulk OCR scenarios where we want to make non-searchable documents searchable so that they can be used in downstream systems. I hope you enjoyed this video. If you have any questions, feel free to reach out to us. Thank you so much!