In this video you will learn how to set up field extraction training for multiple variants in ABBYY FlexiCapture. This video assumes you are familiar with setting up a field extraction project with a single variant. If you are not, or you would like a refresher, please watch our Part 1 video on the topic.
Hello, today I’m going to show you how to set up field extraction training with multiple variants. This video assumes that you’re already comfortable setting up a field extraction project with a single variant. If you’re not, please go watch our video on that first. Otherwise, let’s get started.
The first thing we want to do is open our project settings, go into our document definition, and edit it. If you recall, in the first video we set up a transcript section, so we’re simply going to modify that and add some new variants. In this case we’re dealing with transcripts, so I have Doe High School, and then I have a home school transcript, which we’ll just call our home school transcript.
We’ll add each of the variants we have, close the dialog, and hit OK. Then we’ll save and publish the document definition. That sets up the different variants we have. The next thing we’re going to do, which we haven’t done yet, is go into classification training. Classification is a fancy way of saying, “Hey software, this is how you learn what kind of document (sometimes called a document type) you’re processing.” So we’re going to create a new batch and load some images. In this case we’re loading the images from our second set, the Doe High School transcripts. Once the documents are ready to be processed, we right-click and set a reference class. We hit the Add button and give the class a name. In this case we’ll call it our Doe High School class, and we hit Add. Then we specify the variant and tell the software that this is the Doe High School variant. We hit OK, and you can see that those documents now have a reference class.
Next, I’ll load the images for our home school variant. I’ll set the reference class for these, which will be our home school class. We click Add, specify the variant, and say that this is the home school variant. Okay. We’ve now specified the reference class; “reference class” is just a more complicated way of saying document type. Now that we’ve set the document type for these documents, we’re going to train the software. It’s very similar to what we’ve done before, but instead of field training we go to classification training and click Train.
Now the software will be able to tell which document belongs to which variant. Next we need to tell the software, for each variant, how to find the fields. That’s done once again through field extraction batches, which we’ve seen before, except this time we’re going to create a new batch for Doe High School. I’ll load the images relevant to this batch. Then we can start telling the software where to find the fields for this variant. There’s the name, there’s the cumulative GPA, and we’ll do that for every document. This is what we did in the previous video for the default variant, but now we’re doing it for this document class (sometimes called the reference class). I’m just pointing and clicking to tell the software where each field is found.
Once again, I’ll go to the project menu, select field training, and train the field locations for Doe High School. Just as we did for Doe High School, we need to do the same thing for the additional variant, which in this case is our home school variant. I’ll load the last batch of documents here, go into the batch, and once these are processed, set the field locations. Remember, it’s just a point and click, or sometimes a drag if the field spans multiple words. I’m simply telling the software where these fields are for this document type.
Next, I’ll train the home school variant and save the changes. Now, just like before, we’re going to create a new working batch. When we run these documents, the software will do two things. First, it will recognize the type of document, whether it’s the default variant, Doe High School, or home school. Then, based on that variant, it will remember where the fields were found. So let’s load all of our samples, let the software chew on them, and then dive into the batch to see the results.
Now that the software has finished processing the batch, we’ll go through the documents one by one and confirm that the software remembers the location of these fields (once again, we call those regions). As I click around, you can see the software capturing these fields no matter how the document looks. I’ll keep clicking through these just so you believe me: no matter the style of document, the software remembers where the fields are. So that’s an example of a working batch that has been trained to do two things: detect the document type (the reference class) and find the location of the fields.
To summarize, we did a few different things. We went into the document definition editor and told the software that we’re going to process multiple variants. We created a classification batch so the software could learn the document types and the differences between them. Then, once it had learned those differences, we told the software which fields to capture for each document type and, more specifically, where they’re found.
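The two-stage idea described above (first classify the document into a variant, then use the field locations trained for that variant) can be pictured with a small conceptual sketch. This is not FlexiCapture code and does not use any ABBYY API; the `Region` type, the `TRAINED_FIELDS` table, its coordinates, and the keyword-based `classify` function are all hypothetical stand-ins for what classification training and field training produce inside the software.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangle where a field was marked during field training."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical per-variant field locations (made-up coordinates), standing in
# for the point-and-click field training done for each variant.
TRAINED_FIELDS = {
    "Default": {"Name": Region(40, 60, 200, 20), "Cumulative GPA": Region(420, 720, 60, 20)},
    "Doe High School": {"Name": Region(50, 40, 200, 20), "Cumulative GPA": Region(400, 700, 60, 20)},
    "Home School": {"Name": Region(30, 120, 220, 25), "Cumulative GPA": Region(300, 650, 80, 25)},
}


def classify(words: set[str]) -> str:
    """Toy stand-in for classification training: pick a variant from keywords."""
    if "doe" in words:
        return "Doe High School"
    if "home" in words:
        return "Home School"
    return "Default"


def locate_fields(words: set[str]) -> tuple[str, dict[str, Region]]:
    """Stage 1: classify the document; stage 2: look up that variant's regions."""
    variant = classify(words)
    return variant, TRAINED_FIELDS[variant]


if __name__ == "__main__":
    variant, fields = locate_fields({"home", "school", "transcript"})
    print(variant, fields["Cumulative GPA"])
```

The point of the sketch is the ordering: the field regions are only chosen after the variant is known, which is why classification training has to happen before the per-variant field training is useful.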
I hope you enjoyed this video on how to process multiple variants with the field extraction features of the software. If you have any questions, please reach out to us; we’d love to help you in any way we can. Thank you so much.