Learn how to create a classification training batch in ABBYY FlexiCapture 12.
Hello. Today I’m going to show you how we create a classification training batch within ABBYY FlexiCapture. The concept of creating this classifier is to teach the software how to determine document types automatically, so that a human doesn’t have to be involved in telling the software the document type that it is processing. So what I have are three separate types of documents. You can see them listed in my document definitions menu here, and I’m just going to double click and show you that each document definition is blank. There are no samples; there aren’t even fields listed here on the side. And what we’re going to do is tell the software, using samples, how to determine the document type. So what I’ve done is I’ve created a classifier training batch, and in just one second I’m going to show you how we load images. But I want you to understand that when we do classification, sometimes it’s helpful to have the documents organized in a folder with sub-folders, one for each document type. And the reason why we do that is just so that it’s easy for us to process them and keep the truth documents in the proper spot where they’re supposed to be.
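As a rough illustration of that folder layout, here is a minimal Python sketch that runs outside of FlexiCapture and does not use any FlexiCapture API. It assumes one sub-folder per document type and treats the sub-folder name as the class label. The folder and file names are hypothetical.

```python
from pathlib import PurePosixPath

def class_from_path(relative_path):
    """Return the top-level sub-folder name as the class label, or None."""
    parts = PurePosixPath(relative_path).parts
    # A file sitting directly in the root folder has no sub-folder to name its class.
    return parts[0] if len(parts) >= 2 else None

# Hypothetical sample files, given relative to the training folder.
samples = [
    "Banking Applications/app_001.tif",
    "Questionnaires/q_001.tif",
    "Tax Documents/tax_001.tif",
    "unsorted_scan.tif",
]

labels = {path: class_from_path(path) for path in samples}
for path, label in labels.items():
    print(f"{path} -> {label}")
```

A file that is not inside any sub-folder gets no label here, which is one way to spot truth documents that were never sorted.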
Now on the classifier batch, before we get started, I want you to understand also that there is a setting called “Classification Profile”. We also have what’s called a recall or precision priority, and it’s important for you to understand and research these types of settings and their effects, because they may impact how the software is looking at your documents and how it’s training and reviewing the document types and using that classification technology. But for today’s demo, I’m going to keep these defaulted, so I’m just going to open my classifier batch, and you can see here I’ve already put some documents in my batch. Once we’ve loaded them, sometimes it’s nice to use our sub-folders to name the class. I’m actually going to click that and you can see it’s very quick. The software comes through and determines the class for us. Just for today’s demo, I’m going to properly set the section. I’m going to click on the document and set the section here on the right. Once I’ve done that, we can modify the state, but we’re going to leave these all “For Training”. In other words, the software is going to literally train itself using these documents. The other setting that you would get is “For Testing”, and that can be changed on a per document basis as well. But for today’s demo I’m going to use “For Training”. Okay, so I’m going to go ahead and select these and I’m going to tell the software to train.
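To make the recall versus precision trade-off more concrete, here is a small Python sketch of the general definitions of the two terms. It is not a description of how FlexiCapture computes or applies its priority setting, and the class names and results are made up. A result of `None` stands for a document that was left unclassified.

```python
def precision_recall(pairs, target):
    """Compute precision and recall for one class from (reference, result) pairs."""
    tp = sum(1 for ref, res in pairs if ref == target and res == target)
    fp = sum(1 for ref, res in pairs if ref != target and res == target)
    fn = sum(1 for ref, res in pairs if ref == target and res != target)
    # Precision: of the documents assigned to the class, how many really belong to it.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the documents that really belong to the class, how many were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical (reference class, result class) pairs.
pairs = [
    ("Tax", "Tax"),
    ("Tax", "Tax"),
    ("Tax", "Questionnaire"),
    ("Tax", None),
    ("Questionnaire", "Tax"),
    ("Questionnaire", "Questionnaire"),
]

precision, recall = precision_recall(pairs, "Tax")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In general terms, favoring precision means fewer wrong assignments at the cost of more documents left for a person to sort, while favoring recall means the opposite.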
Once the software is done performing its training, I’m going to simply hit the classify button up here on the menu. And what the software is going to do is use the logic that it trained itself on to determine the result class. The reference class is sometimes what we call a truth class. So that’s the actual answer. And then the result class is what the software is telling us it thinks that document type is. So that classify button gives us the ability to run the software’s own training and see what it believes the document type is.
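The comparison between reference class and result class can be sketched in a few lines of Python. This is only an illustration of the idea; the rows below are hypothetical and are not exported from FlexiCapture.

```python
def compare_classes(rows):
    """Return overall accuracy and the rows where result differs from reference."""
    mismatches = [row for row in rows if row["reference"] != row["result"]]
    accuracy = (len(rows) - len(mismatches)) / len(rows) if rows else 0.0
    return accuracy, mismatches

# Hypothetical documents with their truth (reference) and predicted (result) class.
rows = [
    {"document": "doc_1", "reference": "Tax Documents", "result": "Tax Documents"},
    {"document": "doc_2", "reference": "Questionnaires", "result": "Questionnaires"},
    {"document": "doc_3", "reference": "Banking Applications", "result": "Questionnaires"},
]

accuracy, mismatches = compare_classes(rows)
print(f"accuracy: {accuracy:.0%}")
for row in mismatches:
    print(f"{row['document']}: expected {row['reference']}, got {row['result']}")
```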
Now that it’s done, you can see our result class matches everything and we’re good to go. There is a benchmark tool that you may want to use to determine how the software is reading a group of documents. But let me just share with you: when we use classification training, it’s very important that we potentially run classification training over hundreds of documents per document type. So even though I only have 11, that’s actually a very small number of samples. In a real world scenario, we would use hundreds per document type to train the software, because variety is good and the software needs to understand the different formats that a document type may potentially have. Now, before we run some actual documents and determine how well we did as far as training, I will tell you that it’s important that at this point you go to “Project” and “Project Properties” and make sure that we have a classification batch selected here as the classifier. You may need to do this on the batch type as well if you’re using batch types within your project. Okay, so I will go ahead and select that. I’ve got documents loaded here in a working batch, and I’m just going to simply right click them and recognize. What the software is going to do at this point is use that logic that we set up, that classifier, to determine what kind of document type these are.
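The point above about needing hundreds of samples per document type can be checked with a simple count before training. The sketch below is plain Python outside of FlexiCapture. The threshold of 100 and the split of the 11 demo documents across the three types are assumptions made for illustration.

```python
from collections import Counter

def undersampled_classes(labels, minimum=100):
    """Return the classes that have fewer than `minimum` training samples."""
    counts = Counter(labels)
    return {name: count for name, count in counts.items() if count < minimum}

# Hypothetical split of 11 training documents across three document types.
labels = (
    ["Banking Applications"] * 4
    + ["Questionnaires"] * 4
    + ["Tax Documents"] * 3
)

for name, count in undersampled_classes(labels).items():
    print(f"{name}: only {count} samples")
```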
So now that the software’s performed recognition, we can see it determined a document type per document, and of course we do have these separated, so you can see there’s our banking applications and our questionnaires and our tax documents down here at the bottom. So now we can continue processing documents with that classifier so that the software can determine automatically what those document types are. Once again, we would use potentially hundreds of document samples to train the software. From this point on, we can teach the software, using field extraction batches, which we cover in other videos that we’ve produced, how to use machine learning to automatically find where to extract the data per document type, or even per what we call a variant of the document type. So I hope you’ve enjoyed this video. This is a really quick preview of how we set up a classification training project, and if you have any questions, feel free to reach out to us. Thank you so much!