Hello. Today I wanna set up our first Named Entity Recognition skill. A named entity recognition skill gives us the ability to extract things like names, addresses, dollar amounts, durations, parties, and locations from a document, typically an unstructured document.
So let’s go ahead and do our first one. What we’re gonna do is create a skill. We are in the Advanced Designer for ABBYY Vantage, by the way, so just make a note of that. We’re gonna create our first document skill.
Alright, now that we are within our first NER skill, we’re gonna go ahead and upload just a sample of documents. This is sometimes what we refer to as our sample set, the set that we will test against here. So I’m just gonna upload a few arraignment hearing documents, documents that are just very unstructured. They come from courts throughout the world, and we’re looking for some named entities on those documents.
Alright, now that we have our sample set uploaded, what we’re gonna do is go ahead and outline a couple of fields. Now, named entities do come in many varieties, but for today, let’s go ahead and just extract the names that we find on a document, as well as some addresses that we find on a document.
So I’ll go ahead and add two different fields. For each of these fields, we wanna make sure we hit the gear and we go into the “Advanced” tab and we make sure we allow multiple items because in a given unstructured document, it would be common for us to find multiple names and in this case, even multiple addresses. So we’re just gonna go ahead and allow multiple items here.
The next thing I will do is go to the activities. When I’m in the activities flow, I’m gonna go ahead and modify this and add a Named Entities Recognition Activity. When I do that, there are a couple of important things that we must provide. The first is the source: where are we providing the text of the document in which we’re going to look for named entities? In today’s demo, this is the whole document text. In practice, it would be very common for us to use things like segmentation (you’ll see this option here) to help narrow down where we’re looking for a list of these entities.
But in today’s situation here, we’re going to go ahead and select that we have two different outputs: names and addresses. We want to click this create mapping button. And in this case, we’re going to find the people and we’re gonna put those in the names field that we created. And we’re gonna find the address entities and put them in the addresses field that we created. If you look here, where we manage our fields, we just wanna make sure that we have the repeatable option enabled on those given fields. So I’m gonna go ahead and hit save.
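To make the mapping idea concrete, here is a minimal sketch in plain Python. It is not Vantage's API: the entity type labels, the field names, and the sample entities are all made up for illustration. It only shows the shape of the step, where each recognized entity type is routed to a field that can hold multiple items.

```python
# Hypothetical mapping from entity type to output field, mirroring the
# "create mapping" step: people -> Names, address entities -> Addresses.
ENTITY_TO_FIELD = {"PERSON": "Names", "ADDRESS": "Addresses"}


def map_entities_to_fields(entities, mapping=ENTITY_TO_FIELD):
    """Group (entity_type, text) pairs into repeatable fields.

    Every field is a list, which mirrors the "allow multiple items" setting:
    one document may yield several names and several addresses.
    Entity types that have no mapping are ignored.
    """
    fields = {field: [] for field in mapping.values()}
    for entity_type, text in entities:
        field = mapping.get(entity_type)
        if field is not None:
            fields[field].append(text)
    return fields


# Made-up entities standing in for what a recognizer might return.
sample_entities = [
    ("PERSON", "Jane Doe"),
    ("ADDRESS", "100 Main Street, Springfield"),
    ("DATE", "March 3"),  # no mapping defined, so it is dropped
    ("PERSON", "John Roe"),
]

result = map_entities_to_fields(sample_entities)
print(result)
```

Keeping each field as a list is the code-level counterpart of the repeatable option: a second name or address is appended to the field rather than overwriting the first.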
Now that we’ve done the named entity mapping, the magic of ABBYY Vantage takes place. And so if we run our test skill, we will go look at the results and see what the software extracted for both the names and the addresses on this document.
Alright. So now that we have some results completed here, let’s go ahead and take a peek at these results. Now here’s the really cool part about the solution. There are obviously reasons why we would want to narrow down a list of names and a list of addresses, but out of the box, on a completely unstructured document that could be one page, could be 100 pages, could be a thousand pages, the software can locate the names and addresses, and obviously other named entity items that we tell it to, on the set of unstructured documents. So on this document here, you can see the software has located these specific names, and it located a couple of addresses as well. If we look at our next one, you’ll see this document has one formal name of a person and one address listed. And if we look at our last sample here, you’ll see we have a couple of different names located on the document and a couple of different addresses.
Now in this demo, we’re not gonna go a step further because we’re just focusing on the type of entities. But like I mentioned, it’s common to narrow those down to a specific spot of a given document. So in this case, we may wanna go locate the address that we’re supposed to report to for this hearing. And maybe we would want to segment so that we know that we’re only looking for those addresses in the notice of hearing location of the document. Now, obviously that location will differ based on the entity that’s providing the document, but we want to teach the software, through a sample set, how we’re going to recognize that location. So that’s where we would go back to our activity and we would actually modify that and teach the software about segmentation, which is, frankly, just a few clicks on some samples and letting the software train itself. But we’ll focus on that in another video.
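The narrowing idea can be sketched in plain Python as a filter on character offsets. This is only an illustration under stated assumptions: the sample text, the addresses, and the helper names are made up, and the segment is found here with a simple heading lookup, whereas in Vantage the segment would be learned from samples.

```python
def entities_in_segment(entities, segment_start, segment_end):
    """Keep entities that lie entirely inside [segment_start, segment_end)."""
    return [
        e for e in entities
        if segment_start <= e["start"] and e["end"] <= segment_end
    ]


# Made-up document text with one address outside the hearing notice
# and one address inside it.
text = (
    "Defendant resides at 5 Oak Ave. "
    "NOTICE OF HEARING: report to 200 Court St."
)


def address_span(value):
    """Build a hypothetical address entity with character offsets in text."""
    start = text.find(value)
    return {"type": "ADDRESS", "text": value,
            "start": start, "end": start + len(value)}


addresses = [address_span("5 Oak Ave."), address_span("200 Court St.")]

# Assumption: the segment runs from the heading to the end of the text.
segment_start = text.find("NOTICE OF HEARING")
segment_end = len(text)

hearing_addresses = entities_in_segment(addresses, segment_start, segment_end)
print([e["text"] for e in hearing_addresses])
```

Only the address inside the hearing notice survives the filter, which is the effect segmentation is meant to have on the list of addresses.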
But for today’s case, I wanted to show you how simple it is to add the entities here. I will highlight that there are other entity types that we didn’t focus on, such as dates, durations, money, et cetera, but this is how simple it is. You add the fields and you map them. And then the extraction is really where the platform and its intelligence come into play.
From here we would simply publish this skill, just like any other skill in our Vantage tool set. We now have the ability to extract named entities on these unstructured documents.
[Music- “‘Engineered to Perfection’ performed by Peter Nickalls, used under license from Shutterstock”.]