Hello. Today I’d like to talk to you about segmentation in the ABBYY Vantage product. Segmentation lets us limit where the software looks for content, typically in unstructured document scenarios: contracts, leases, letters, and other documents that have very little structure, if any.
In a previous video, we talked about named entity recognition (NER), the ability to extract entities such as names, addresses, currencies, dates, and times. Here we’re going to build on that topic by limiting where we look for that information to the segments we find in the document.
I’ve created a skill and uploaded some documents, and now I’m going to create some fields. The first field I’ll call “Notice of Hearing Segment”. I’ll then teach the software where to locate the “Notice of Hearing” information on the samples we’re using. Normally we would do this over a decent-sized sample set. I’m using only three samples here, which typically would not be recommended; you’d probably want 20 or more documents so the software can learn the textual patterns it’s looking for.
So we have a field called “Notice of Hearing Segment”. Now that we know where we’ll locate the Notice of Hearing, we’ll add a field for the hearing date and, just for fun, one for the hearing address as well.
All right. Now that we have our fields, the first thing I’m going to do is go to “Activities”. I have a previous activity for named entity recognition; let’s delete that for now and start over with segmentation. So we’re going to add a “Segmentation Activity”. I want to teach the software that this activity outputs a segment, so I’m going to map the “Notice of Hearing Segment” to it and then open the activity editor. The software asks me which samples I want to use, and I’m going to select all of them. Then I’m going to teach the software about the “Notice of Hearing Segment”, and that’s all I’m going to do in this case. Let me zoom out so we can see the whole page. I’m just going to say, “Hey, this is where we can typically find the Notice of Hearing,” and mark where the Notice of Hearing details are on each document in the sample set. It’s a pretty simple click and lasso.
Now that I’ve taught the software about the segment, that is, the clause or specific spot in each document where the Notice of Hearing appears, I’m going to train the activity. This is an important step: training is what teaches the software where to locate the “Notice of Hearing Segment”.
Okay. Our segmentation training has completed, so let’s go back into our skill. Now that we know the segment, there are a couple of additional things we may want to find within it: the hearing date and the hearing address. So I’m going to add a “Named Entity Recognition” step here, and the source in this case will be the segment. Instead of using the text from the whole document, we’re going to use only the segment we located, called “Notice of Hearing”. Then I’ll map the hearing date and hearing address, which I can do by creating the mappings right here. In effect, we’re saying: go look for the date and the address located in the “Notice of Hearing Segment”.
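To make segment-scoped extraction concrete, here is a minimal Python sketch. It is not ABBYY Vantage’s API: the `Segment` class, the `find_dates` function, the sample text, and the simple date regex are all hypothetical stand-ins, and the sketch assumes the segment has already been located as a pair of character offsets. It shows that the same extractor returns fewer, more relevant matches when it reads only the segment text instead of the whole document.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical segment: a named span of character offsets in the document text."""
    name: str
    start: int
    end: int


# Toy pattern for dates written like "20th day of November, 2023"
# (an assumption, far narrower than a real NER model).
DATE_PATTERN = re.compile(
    r"\b\d{1,2}(?:st|nd|rd|th) day of "
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
    r"(?:,? \d{4})?"
)


def find_dates(text: str, segment: Optional[Segment] = None) -> List[str]:
    """Return date mentions, searching only inside `segment` when one is given."""
    source = text if segment is None else text[segment.start:segment.end]
    return [m.group(0) for m in DATE_PATTERN.finditer(source)]


# Made-up sample text; the offsets stand in for a trained segmentation step.
doc = (
    "This lease was signed on the 3rd day of March, 2023. "
    "NOTICE OF HEARING: A hearing will be held on the 20th day of November, 2023 "
    "at 100 Main Street. Rent is due on the first of each month."
)
hearing = Segment(
    name="Notice of Hearing Segment",
    start=doc.index("NOTICE OF HEARING"),
    end=doc.index("Rent is due"),
)

print(find_dates(doc))           # whole document: the signing date and the hearing date
print(find_dates(doc, hearing))  # segment only: just the hearing date
```

Without the segment, the lease signing date is picked up alongside the hearing date; restricting the source to the segment leaves only the date we actually want.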
One thing I’ll point out here is the option to accept multiple values. If a segment can contain multiple dates or multiple addresses, we may want to capture all of them. I’m not enabling it in today’s demo because this use case has a single hearing date and a single hearing address, but when you expect multiple values, this is the place to turn it on. Let’s hit save. To recap: I’ve trained the software where to find the Notice of Hearing segment, and now I’m telling it to find specific dates and addresses within that segment. Now, just for fun, let’s run a test on this training.
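Conceptually, the “accept multiple” choice decides whether a field keeps one value or a list of values. The following Python sketch illustrates that difference; it is a hypothetical illustration, not how Vantage implements the option. The `extract_dates` function, the `accept_multiple` parameter, the regex, and the rule of keeping the first match in single-value mode are all assumptions made for the example.

```python
import re
from typing import List, Optional, Union

# Toy pattern for dates like "20th day of November, 2023" (an assumption).
DATE_PATTERN = re.compile(
    r"\b\d{1,2}(?:st|nd|rd|th) day of "
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
    r"(?:,? \d{4})?"
)


def extract_dates(
    segment_text: str, accept_multiple: bool = False
) -> Union[List[str], Optional[str]]:
    """Hypothetical single-value vs multi-value field behavior."""
    matches = [m.group(0) for m in DATE_PATTERN.finditer(segment_text)]
    if accept_multiple:
        return matches                      # every date found in the segment
    return matches[0] if matches else None  # assumed rule: keep the first match only


segment_text = (
    "The hearing will begin on the 20th day of November, 2023 "
    "and continue on the 21st day of November, 2023."
)
print(extract_dates(segment_text))                        # a single date
print(extract_dates(segment_text, accept_multiple=True))  # both dates
```

For a single hearing date, the single-value mode is enough; a hearing that spans several days is the kind of case where returning a list would be the better fit.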
Cool. Now that training is complete, there’s a part of the software I want to show you. Recall that we trained the software on this sample set to locate the segment; you can see it highlighted here in very light green. The cool part is that the software found the hearing date and the hearing address just because we asked it, “Hey, look only in this segment, and now that you know the segment, find me a date and an address within it.” You can see the same thing on this sample: the software found the segment and is looking for the hearing date, which in this case is the 20th day of November, and of course we have a hearing address as well. On our last sample, too, it finds the date located in this section along with the address.
So with just a few clicks, by teaching the software where to locate the segment and then letting it find the named entities within that segment, I now have a model that accurately extracts these named entities from these documents. At this point, I could simply publish the skill and start processing real documents against it.
[Music: “Engineered to Perfection”, performed by Peter Nickalls, used under license from Shutterstock.]