Watch our video to learn how to create your first Natural Language Processing (NLP) project in ABBYY FlexiCapture.
Hello. Today we’re going to create our first NLP project together using ABBYY FlexiCapture. The first thing we’ll do is create a brand-new document definition, and we want to select the semi-structured or unstructured option. We’ll hit Next. We won’t load a FlexiLayout in this video, but we can upload a sample image so that we have something to review together on the screen. We can also give the document definition a name that’s appropriate to our business. Once we hit Finish, the document definition editor will open, and we’ll be able to modify some settings from there.
Now, what we have on the screen here is a lease agreement, and we’re going to pull some information out of the premises details. We want this address right here, the one located on the lease agreement, and we also want to pull the full term clause from the lease agreement. So the first thing I’ll do is right-click and create a field. What we need to do is map out what we call segment fields. A segment field is a parent. For example, this address, which in this case is 950 Emerald Dreams Drive, is a child of this parent, which I’ll call premises. We call that parent a segment, so we’ll give it a proper name. The goal here is to map the whole paragraph or phrase for the premises section.
For this segment field, we want to make sure two options are checked: the text segment option and the can have region option. Now we need somewhere to map just the address itself, so we’ll create another field and call it simply premise or premises. This time the field is not a text segment, so we want to make sure we keep that option off. However, we do want the can have region option on. Lastly, I’d like to pull out the whole term. When we find this whole clause, I want to extract it in full. In fact, it’s very common to grab a whole clause and perhaps do a legal review of that specific clause. Since we only care about the parent here, we’ll just create the segment itself. I’m going to put the letters SEG (for segment) at the end of its name just to mark that it is a segment. I’ll click Apply and then OK. Now that our fields are set up, we’ll right-click the section and go to Properties. On the section there’s an NLP tab, and there we’ll create what’s called a segmentation model.
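The video doesn’t show how FlexiCapture represents fields internally, so the field hierarchy described above is sketched here as a plain Python model. This is a conceptual illustration only: `FieldDef`, its attributes, and the names `PremisesSEG`, `Premises`, and `TermSEG` are hypothetical and are not part of any ABBYY API. It also assumes the term segment has can have region enabled, which the walkthrough doesn’t state explicitly. The sketch captures the two rules from the walkthrough: a segment is a parent that captures a whole clause, and a value field such as the address points back to the segment it lives in.

```python
from dataclasses import dataclass
from typing import Optional

# Conceptual model only; these names are NOT part of any ABBYY FlexiCapture API.
@dataclass
class FieldDef:
    name: str
    is_text_segment: bool         # parent field that captures a whole clause or paragraph
    can_have_region: bool         # the field can be lassoed on the page image
    parent: Optional[str] = None  # segment that this value is extracted from, if any

fields = [
    # Parent segment: the whole premises paragraph.
    FieldDef("PremisesSEG", is_text_segment=True, can_have_region=True),
    # Child value: only the address inside that paragraph.
    FieldDef("Premises", is_text_segment=False, can_have_region=True, parent="PremisesSEG"),
    # Segment with no child: the full term clause, kept whole for legal review.
    FieldDef("TermSEG", is_text_segment=True, can_have_region=True),
]

segments = [f.name for f in fields if f.is_text_segment]
values = [f.name for f in fields if not f.is_text_segment]

# Every child value must point at a segment that actually exists.
assert all(f.parent in segments for f in fields if f.parent is not None)

print(segments)  # ['PremisesSEG', 'TermSEG']
print(values)    # ['Premises']
```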
We’re going to give the model a name. We’ll keep the source as the section. The model type in this case should be segmentation, and I’ll hit Next. What this means is that, as the NLP model learns, we’re training it on which segments to populate, and since we named our segments appropriately, we’ll have the software output them as the result fields: premises and term. Next, we’ll set up an extraction model. Remember, we’re extracting the premises value, and it comes from a segment, its parent, so we need to tell the software that this segment is the proper source. The model type is extraction because we want to extract a specific value rather than just the parent clause, and in this case the result field is premises. So now I have NLP models set up to extract three different fields: two segments and one actual field. At this point we’ll save and publish our document definition.
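The two-model setup can be sketched the same way. Again, this is a hypothetical Python model, not FlexiCapture configuration: `NlpModel`, `check_pipeline`, and the model names are illustrative assumptions. The check encodes the point made above, that an extraction model must take as its source a segment that the segmentation model produces.

```python
from dataclasses import dataclass, field
from typing import List

# Conceptual model only; these names are NOT part of any ABBYY FlexiCapture API.
@dataclass
class NlpModel:
    name: str
    model_type: str  # "segmentation" or "extraction"
    source: str      # what the model reads: the whole section, or a parent segment
    result_fields: List[str] = field(default_factory=list)

models = [
    # Segmentation model: reads the section and finds the two segments.
    NlpModel("LeaseSegmentation", "segmentation", source="Section",
             result_fields=["PremisesSEG", "TermSEG"]),
    # Extraction model: reads only the premises segment and pulls out the address.
    NlpModel("PremisesExtraction", "extraction", source="PremisesSEG",
             result_fields=["Premises"]),
]

def check_pipeline(models: List[NlpModel]) -> int:
    """Verify every extraction model reads a segment that some segmentation
    model produces; return the total number of result fields."""
    segments = {name for m in models if m.model_type == "segmentation"
                for name in m.result_fields}
    for m in models:
        if m.model_type == "extraction" and m.source not in segments:
            raise ValueError(f"{m.name}: source {m.source!r} is not a known segment")
    return sum(len(m.result_fields) for m in models)

print(check_pipeline(models))  # 3
```

Two segments plus one extracted value give the three result fields mentioned in the walkthrough; pointing the extraction model at the section instead of the segment would make the check fail.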
Once our document definition is published, we’ll open field extraction training batches from the View menu. We’ll right-click and create a new batch for this demo, choose the lease agreement, and select a variant. In this case, I’ll select the default variant. The next step is very important: we have to tell the software that this training batch is for NLP training, so we enable the NLP batch flag on the batch itself. I’ll just click that option. Next, we’ll load some sample images.
All right, now that we have some sample images, I’m going to mark some of these documents as unused. In other words, I don’t want them to affect the training of the model. Understand that in a real NLP production situation, we would load many more samples, potentially hundreds per document type, because the NLP model needs a good variety to learn from. Training the software on five samples is not sufficient for a real-world environment, but I’m doing it today to show you the initial steps of creating an NLP project. Next, we’ll force a match of these documents to the document definition. This gives us the ability to teach the model by lassoing the proper data.
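Marking documents as unused effectively holds them out of training so they can later serve as a test set. The sketch below shows that split, assuming a simple "first N documents are for training" rule. The file names, the total of ten samples, and the `split_batch` helper are all hypothetical; in FlexiCapture you mark documents as unused in the batch itself.

```python
from typing import List, Tuple

# Hypothetical helper; in FlexiCapture you mark documents as unused in the batch UI.
def split_batch(documents: List[str], n_train: int) -> Tuple[List[str], List[str]]:
    """Return (training, unused): the first n_train documents are lassoed and
    trained on; the rest are held out and matched later to test the model."""
    if not 0 < n_train < len(documents):
        raise ValueError("need at least one training and one unused document")
    return documents[:n_train], documents[n_train:]

docs = [f"lease_{i:02d}.pdf" for i in range(1, 11)]  # ten hypothetical samples
train, unused = split_batch(docs, 5)
print(train[0], unused[0])  # lease_01.pdf lease_06.pdf
```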
Once all the documents are matched, we can lasso the proper data. We’ll double-click our first document and teach the software through manual lassoing. I’ll click the premises segment here. Remember, for segments we want to provide the parent, so I’m going to lasso the full paragraph. For the premise, I’ll lasso just the specific value we want. So now we have the extracted value and the segment, the segment being the parent. Lastly, for the term, I’ll select the whole clause, because here I want the entire segment and don’t need to extract any specific detail from it. I’ll now do the same for the remaining training documents.
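Lassoing a child value inside its parent segment implies a containment relationship between the two regions. Here is a minimal sketch of that check, assuming rectangular regions in page pixel coordinates; `Region`, `child_inside_parent`, and all coordinates are hypothetical. Because a segment can span several lines or pages, the parent is a list of regions, and the sketch simplifies by requiring the child to fall inside one of them.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical geometry model; not FlexiCapture's internal representation.
@dataclass(frozen=True)
class Region:
    page: int
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Region") -> bool:
        """True if `other` lies entirely inside this region on the same page."""
        return (self.page == other.page
                and self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)

def child_inside_parent(child: Region, parent_regions: List[Region]) -> bool:
    """A segment may span lines or pages, so accept the child if any one
    of the parent's regions contains it."""
    return any(p.contains(child) for p in parent_regions)

# Made-up coordinates: a premises paragraph continuing onto page 2.
premises_seg = [Region(1, 100, 400, 1500, 700), Region(2, 100, 80, 1500, 200)]
address = Region(1, 600, 420, 1100, 460)
stray = Region(1, 100, 800, 900, 850)

print(child_inside_parent(address, premises_seg))  # True
print(child_inside_parent(stray, premises_seg))    # False
```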
Okay. Now that we’ve lassoed all the data needed to train the NLP model, we’ll simply right-click in the white area and choose Train. Training is also available from the menu. One caveat: NLP training can take quite a while. Depending on the number of segments and extraction fields you have, be prepared to wait anywhere from several minutes to hours to train a full NLP model.
Once the NLP model has been trained, we’ll let the software match our unused documents so we can test the results. Once those documents are matched, we can double-click each one to see the results. It’s pretty neat: even though I only mapped the fields on the top five documents, you can see that I’m getting both the whole premises parent and the actual value we want out of it, even when the text spans multiple lines or multiple pages. And lastly, we’re grabbing the term clause wherever it’s referenced, as you can see as I continue scrolling through the results. There are many useful ways to apply NLP in the field, though results will vary across documents and formats. NLP projects need to be set up in a particular way, so make sure you follow the steps in this video. I think you’ll enjoy the results you get from natural language processing. I hope you enjoyed this video. If you have any questions, feel free to reach out to us. Thank you.