Learn how to use Master Data to classify documents in ABBYY FlexiCapture for Invoices.
Hello. Today I’m going to show you how we use master data to classify documents. By master data, I mean data that we use within our organizations on a daily basis: a list of companies, a list of accounts, or a list of vendors, for example. We use this data so that we can tell the software how to classify a document based on relevant information we already have on hand.
Alright, the first thing we’re going to do is set up some project settings. That includes image pre-processing for each file. Your environment may need different settings, so keep that in mind. I’m also going to set up an environment variable so that we can use one database connection string in multiple places throughout the project.
Now let’s go to Project. We’re going to create a document definition, and when we do, we’ll make sure to select the Unstructured option. We’ll click Next and load a sample file; in this case, we’re processing bills. We’ll click Next again, give our document definition a meaningful name, and proceed.
It’s common to rename the document section, because we’ll reference it in multiple places in the project. Once we’ve done that, it’s important to enable field extraction training on the section. Then, on the Data Sets tab, we’ll set up several variants.
We’ll pull these variants from a list in our database, which is why we set up that connection string. Another important step is to turn on the Use database of companies option down below. At that point the software creates a number of fields that we won’t use in this case, so we’ll simply mark them as not used. Then we’ll customize our own master data classification.
We need to map each field to its database column, so we’ll tell the software where to find each one. Then we’ll add a field of our own: account type. In our case, we’re going to classify by account number, which is an actual number printed on the document. From the account number, the software pulls the account name and then the account type. The account type describes the kind of bill, such as a utility bill, a phone bill, or some other kind of statement.
At this point we’ll update the data set, and then we can view the data. You’ll see here that I have three variants. The software uses this master data to locate the account number, and the account number in turn gives us the relevant information for each record.
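To make the data set idea concrete outside the GUI, here is a minimal Python sketch under stated assumptions: an in-memory SQLite table named `accounts`, with hypothetical columns `account_id`, `account_name`, and `account_type`, stands in for the master-data database, and each row becomes one variant whose columns are mapped to named fields. The table, columns, and rows are made up for illustration; this is not how FlexiCapture itself reads the database.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the master-data database.

    The table name, column names, and rows are hypothetical examples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "account_id TEXT PRIMARY KEY, account_name TEXT, account_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [
            ("100234", "City Power & Light", "Utility"),
            ("558812", "Metro Wireless", "Phone"),
            ("771903", "Northside Water", "Utility"),
        ],
    )
    return conn

def load_variants(conn: sqlite3.Connection) -> list:
    """Turn each database row into one variant: a dict keyed by column name."""
    cur = conn.execute(
        "SELECT account_id, account_name, account_type "
        "FROM accounts ORDER BY account_id"
    )
    # cursor.description holds one tuple per selected column; index 0 is its name.
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]

variants = load_variants(build_demo_db())
for v in variants:
    print(v["account_id"], v["account_name"], v["account_type"])
```

Each printed line corresponds to one variant, which mirrors the three records seen when viewing the data in the editor.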
Next we’ll create a service field. As usual, we’ll give it a name, in this case account ID. Then we’ll set its data source to the flexible section variant ID, meaning that the value comes from the matched variant. Each account also has an account name and an account type assigned to it, so we’ll create fields to hold those extra pieces of data.
Now let’s create a rule so that, when the software detects the account ID, it returns the account name and account type for us. For this database check, we’ll tell the rule to use the variant data set. After selecting it, it’s very important to set the record ID at the bottom to the account ID, because that is the unique identifier for each account.
Then we’ll map each database field to the corresponding document section field we set up earlier. Now we’ll run a quick test in the document definition editor to make sure the software returns the information. It does, so we’ll save and publish our changes.
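The lookup rule can be pictured as a keyed lookup in which the account ID is the record ID. This minimal Python sketch assumes a hypothetical in-memory dictionary in place of the variant data set, and a hypothetical mapping from database columns to document fields named `AccountName` and `AccountType`. It illustrates the idea only and is not FlexiCapture’s rule API; returning empty values when no record matches is also just an assumption of this sketch.

```python
# Hypothetical master data: record ID (the account ID) -> remaining columns.
MASTER_DATA = {
    "100234": {"account_name": "City Power & Light", "account_type": "Utility"},
    "558812": {"account_name": "Metro Wireless", "account_type": "Phone"},
}

# Hypothetical mapping: database column -> document section field.
FIELD_MAP = {"account_name": "AccountName", "account_type": "AccountType"}

def apply_lookup_rule(account_id: str) -> dict:
    """Return document field values for a detected account ID.

    The account ID must be unique; otherwise a single key could not
    identify which record to return.
    """
    record = MASTER_DATA.get(account_id)
    if record is None:
        # In this sketch, an unknown ID leaves the dependent fields empty.
        return {field: None for field in FIELD_MAP.values()}
    # Copy each mapped database column into its document field.
    return {FIELD_MAP[col]: value for col, value in record.items() if col in FIELD_MAP}

print(apply_lookup_rule("558812"))
```

Keying the lookup on a unique identifier is the same reason the record ID has to be set to the account ID: a non-unique key would make the returned name and type ambiguous.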
Now that it’s saved, let’s create a test batch. We’ll use it to process some live documents that are actual bills. We’ll see the software use our master data, the data from our database, to find the account number on each document, and once it has determined the account number, it also returns the additional information mapped to that account.
So let’s take a peek. In this first example, we have our account number, our account name, and our account type. On the second one, you can see it detected the account number, which is of course different for this document, and it returns the additional data. The last one works the same way, once again using the account number as the key field to classify it.
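Conceptually, classifying by master data means finding a value in the recognized text that matches a known key. The minimal Python sketch below assumes that account numbers are runs of six or more digits and that the known numbers are available as a set; the pattern, the numbers, and the sample text are hypothetical, and this only illustrates the idea rather than FlexiCapture’s internal matching.

```python
import re
from typing import Optional

# Hypothetical known account numbers (the master-data keys).
KNOWN_ACCOUNTS = {"100234", "558812", "771903"}

def classify_by_account(ocr_text: str) -> Optional[str]:
    """Return the first known account number found in the text, else None."""
    # Candidates are standalone runs of 6+ digits; adjust to the real format.
    for candidate in re.findall(r"\b\d{6,}\b", ocr_text):
        if candidate in KNOWN_ACCOUNTS:
            return candidate
    return None

sample = "Metro Wireless\nStatement date 03/01\nAccount No: 558812\nAmount due $84.10"
print(classify_by_account(sample))
```

Once a key is found, the matched account number plays the role of the variant, and the remaining fields come from the lookup against the master data.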
So the key to this part of the software is remembering that we’re using the master data, the data within our organization, to make the system smarter. Taking this a step further, the really useful part is that once the software detects the variants, we can create field extraction batches, so that the people using the system to verify documents can train the software on the fly.
We’re not going to go through that process together, because I have a separate video on it. But I do want you to realize that classification by master data, combined with field extraction batches, opens up a whole new way of using machine learning: end users can tell the software where fields are located and train it, so that no templates or IT resources are required. It’s a big step forward for OCR software.
I hope you enjoyed this video. If you have any questions, please reach out to us anytime; we would love to help you out. Thank you so much.