ABBYY Recognition Server - Processing Unstructured Docs

See how your enterprise can process both structured and unstructured documents quickly and cost-effectively using the award-winning ABBYY Recognition Server.

Good afternoon, this is Jim Hill from UFC. This afternoon I'm going to show you ABBYY Recognition Server processing some unstructured documents, such as letters and contracts, as well as some structured documents, such as invoices. I'll introduce ABBYY Recognition Server and talk about some of its features, along with the advantages and disadvantages of using Recognition Server to process unstructured documents.

This is one alternative; there are many different alternatives for processing unstructured documents. Let's look at the demo. What I'm going to show you is ABBYY Recognition Server. This is what the server looks like, and this is where you set up your workflows. Let me quickly show you the workflow settings, from left to right: the name of the workflow; where the documents are coming from, in this case a shared folder; what you want done to the documents; the language; and the level of OCR, for example whether you'd like to optimize for quality or for speed (here I have it set in the middle). Next is how the documents are separated. Many times it's simply one document per file; if you're processing PDF files, that can be how it is done. If you're scanning documents, most frequently we'd use a barcode separator sheet, which we would provide to you when you want to scan documents.
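To make the separator-sheet idea concrete, here is a minimal Python sketch of how a stack of scanned pages could be grouped into documents. It is not ABBYY's implementation or API: the `Page` record, the `SEPARATOR_VALUE` string, and `split_on_separators` are hypothetical, and the sketch assumes a barcode reader has already decoded any barcode found on each page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical page record: the image file name plus whatever barcode text
# (if any) a barcode reader decoded on that page.
@dataclass
class Page:
    image: str
    barcode: Optional[str] = None

# Hypothetical value printed on the separator sheets.
SEPARATOR_VALUE = "SEPARATOR"

def split_on_separators(pages: List[Page]) -> List[List[Page]]:
    """Group scanned pages into documents, starting a new document at each
    separator sheet. The separator sheets themselves are discarded."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if page.barcode == SEPARATOR_VALUE:
            # Close the document collected so far (skip empty ones, e.g. a
            # separator at the very start of the stack).
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For example, the pages `p1.tif`, `p2.tif`, a separator sheet, and `p3.tif` would come out as two documents: `[p1.tif, p2.tif]` and `[p3.tif]`. When files arrive as one document per file, no splitting step like this is needed.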

Quality control is where you choose whether to verify the recognized data, that is, check that the OCR engine produced the correct data. Indexing is where you set up the different document types. I've got three document types set up, with different index values for each. For the contract, we have an agreement number and a date. It takes a couple of minutes to set those up. The last tab covers what you want to do with the output file. Here the output is going to Microsoft Word, but there's a variety of options: we can also produce XML files, which I also have selected here, and we can save the original images. We will show you that as we run some sample batches.

Now I'm going to show you some samples. Here's the folder that is being watched, and I have some sample documents in this other folder. Let me show you what one of those documents looks like. Here's a contract; we're going to extract the agreement number from it. There are some other forms as well: an invoice and a letter. These are all going to be processed together in one big blob, as we would expect in an unstructured document processing system. I'm going to select these documents and copy them over to the watched folder. Notice that they disappeared quickly: ABBYY has already processed those documents.

Now we go to our document indexing station. Since I did not tell the system to do any verification, all I'm doing here is entering index information. This index information will be put into an XML file, and it will also be carried in variables that can be used for programming purposes. I'm also putting that index information into the file name. I have these documents set to go to Microsoft SharePoint, so that you can access them anywhere you have access to your SharePoint server. In this particular case it's Office 365 SharePoint, but any SharePoint system will work.
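As an illustration of what an XML index file might hold, the following Python sketch serializes a document type and its index values with the standard library's `xml.etree.ElementTree`. The helper `index_to_xml` and the element and attribute names (`Document`, `Field`, `type`, `name`) are hypothetical; Recognition Server's actual XML schema is not shown in this demo.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def index_to_xml(doc_type: str, fields: Dict[str, str]) -> str:
    """Serialize one document's index values as a small XML string.

    The element and attribute names used here are illustrative only.
    """
    root = ET.Element("Document", {"type": doc_type})
    for name, value in fields.items():
        field = ET.SubElement(root, "Field", {"name": name})
        field.text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

Calling `index_to_xml("Invoice", {"Vendor": "JL Smith"})` returns `<Document type="Invoice"><Field name="Vendor">JL Smith</Field></Document>`, which a downstream program could parse to pick up the same index values that are carried in the workflow's variables.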

I can clearly see that this is an invoice. It's now asking me about the invoice: who's the vendor? I'm going to enter JL Smith, which is obviously the vendor. Then the date, so I'm going to find the invoice date. A caution on the date: it expects month, month, day, day, year (MM/DD/YYYY). You need to be careful that this is set correctly. In this case I can just click to capture the date, adjust it a little, and accept the document. That's pretty quick.

Now it moves on to the next pending document, a sales agreement. I know that's the sales agreement, so I'm going to select the field for the agreement number. There's the agreement number. Next is the contract seller signature date. I'm going to jump to the last page, since that's obviously where the signature is going to be. Again, I grabbed a little extra data, and the format needs to be month, month, day, day, year, so a little manual intervention is required here. We could program the system to strip out the extra characters once they're captured. For now I'm going to put the date into the correct format and accept the document.

Note that the system is not learning how to process these documents. Each and every time I process these documents, this is how I'm going to have to do it.

You can see that this is a letter, and it's looking for the sender. I'm going to have to go to the last page to find out who sent it. There's the sender. Who's the addressee? There's the addressee, and there's the letter date, again with the format correct: 12/09/2006. I accept the document, and that is it for the batch.
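The clean-up mentioned above (stripping extra captured characters and putting the date into month, month, day, day, year form) could look roughly like this Python sketch. It assumes the captured text contains a numeric month/day/year date; `normalize_date` is a hypothetical helper, not part of Recognition Server's built-in scripting, which may use a different language and API.

```python
import re
from datetime import date

# A numeric month/day/year date with "/", "." or "-" separators and a
# four-digit year, possibly surrounded by other captured text.
NUMERIC_DATE = re.compile(r"(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})")

def normalize_date(raw: str) -> str:
    """Find a month/day/year date inside an OCR'd snippet that may contain
    extra characters, and return it as MM/DD/YYYY."""
    match = NUMERIC_DATE.search(raw)
    if match is None:
        raise ValueError(f"no date found in {raw!r}")
    month, day, year = (int(group) for group in match.groups())
    # date() raises ValueError for impossible dates such as 13/45/2006.
    return date(year, month, day).strftime("%m/%d/%Y")
```

With this in place, a captured snippet such as `"Date: 12/9/2006"` would become `12/09/2006` without an operator retyping it. Dates written with month names would need an additional pattern.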

Now we will go to our SharePoint folder, where we expect the document results to show up, and there they are: a contract, a letter, and an invoice. Let's pull up one of those documents. Here's the contract document, showing the data exported from the contract. You can see what it did: it simply grabbed all the text of the document. If I need to move this data into another system, I can copy and paste it. Of course, we could also program the system to put this data somewhere, which requires a little custom programming; scripting is built into the system. Also, the index data that we entered is written into the file name: ABC123 was the sales agreement number, and the date of execution was 3/15/2006, as you can see. Similarly, for the invoice you get the invoice data, and it does a very good job of extracting the invoice detail. The same goes for the letter.
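Writing index values into the file name can be sketched in Python as follows. The `build_filename` helper, the underscore-separated layout, and the `.docx` extension are assumptions for illustration; the demo does not show the exact naming pattern Recognition Server uses. Characters such as the `/` in a date are replaced because Windows does not allow them in file names.

```python
import re
from typing import Dict

def build_filename(doc_type: str, fields: Dict[str, str], ext: str = ".docx") -> str:
    """Join the document type and its index values into one file name.

    Runs of characters other than ASCII letters and digits (for example the
    "/" in a date) are replaced with "-" so the name is valid on Windows.
    """
    parts = [doc_type, *fields.values()]
    safe_parts = [re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-") for part in parts]
    return "_".join(safe_parts) + ext
```

For the sales agreement above, `build_filename("SalesAgreement", {"AgreementNumber": "ABC123", "ExecutionDate": "03/15/2006"})` gives `SalesAgreement_ABC123_03-15-2006.docx`. Keeping the field order fixed makes the names predictable, so they can be sorted or searched in SharePoint.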

That concludes the ABBYY Recognition Server demo. I just want to say that there are many different alternatives we can offer; this is one cost-effective way to process unstructured data. We will also show ABBYY FlexiCapture, which is a much more powerful system but requires more complex setup and is much more expensive. Please get in touch with me if you have any questions. We wanted to show this so that you would be aware of the unstructured processing capabilities in ABBYY Recognition Server. Thank you very much.

Information about the Author
Jim Hill
Jim works to align customers' needs with software and consulting solutions in the areas of forms processing, document capture software, and content migration. His background includes:
1. Enterprise content management systems, including FileNet and SharePoint, and migration of documents to and from FileNet.
2. Document capture systems, including Quillix Capture, ABBYY FlexiCapture, and IRISXtract.
3. OCR systems, including ABBYY FineReader, ABBYY Recognition Server, and ABBYY FineReader Engine.
4. Forms processing systems, including ABBYY FlexiCapture and IRISXtract by Canon.
Jim began his career as a mechanical engineer at Ford Motor Company. He joined UFC, Inc. in 1998.