Discover in this video how to perform Address Parsing in ABBYY Vantage.
Hello. Today I’d like to show you how we perform Address Parsing within ABBYY Vantage. What I have here is a very simplified version of a document that has an address. From this address we want to extract the name, the street, the city, state, and zip.
What we’re gonna do is we’re gonna go ahead and create a brand new document skill in our Advanced Designer within Vantage. We’ll call this our Address Parsing Skill. And when we do that, the first thing we’re gonna do is we’re gonna go ahead and upload a document, just kind of as our reference document.
And the next thing we’ll do is we’ll go ahead and walk through the setup. We’ll go ahead and map some fields. When we map our fields here, we’re gonna have a few things. So we know that we’re gonna have a field that describes the address. So let’s just go ahead and start mapping some things here. We’re gonna call this the full address because it’s the whole thing. And then we know that we’re gonna want fields for each of these independent parts of the address. So let’s just go ahead and add some fields here for the name, the street, the city, state, and zip.
All right. Now that we have our fields set up, the one thing I like to do at this point is make sure that we have our reference details completed as well. This gives the software the ability to compare what it extracts against what we told it is really the truth. We call that our reference spot. So what I’m gonna do is go ahead and mark each of these fields on the document, so that when the software extracts them automatically, it has the truth to compare itself against. For the name, the name is located here. For the full street, we’re here. For the city, we are located here. For the state, we’re here. And of course for our zip, we are here. So we have the full address, and we have also told the software what the truth is for the other fields that we want to extract automatically.
Now that we have that set up, let’s talk about the activities that we’re going to need. First, we have within our software the ability to extract based on rules, which we will use to locate the full address. Then we will perform Address Parsing. The last part of the video will be to actually get the name on the address, which the Address Parsing Module does not give us, so we will use our Named Entity Recognition to get that name.
So let’s go one by one here. Let’s go ahead and tell the software how to do the extraction. We want the full address. We want the software to know where to find the full address. So we’re gonna go ahead and just make sure we only map the full address here. And when we do that, we’re gonna go ahead and go to our search elements and we’re going to tell the software that we’re gonna draw this on the image.
So we’re telling the software, “Hey software, this is where we want you to pull the full address from.” I already know what has to happen here: this is gonna be a full paragraph. So not just a single line of text, but we want the software to grab that whole region here. Now we have what’s called a paragraph of text, which sits in that specific region on the document.
So at this point we’re gonna go ahead and test the activity. This should be a pretty obvious one for the software here. We’re just telling it where to pull that paragraph of text. We have a hundred percent results and that’s because the software now compares against the truth. If you remember back in one of the previous steps just a minute ago, we told the software where to find it. Now it’s telling us where it found it automatically and it’s gonna go ahead and tell us if there’s any difference, which in this case there is not.
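To make the idea of comparing against the truth concrete, here is a minimal sketch of that kind of check. This is not how Vantage computes its score internally; the function name `field_accuracy` and the sample values are hypothetical, and the sketch assumes a field counts as correct when the extracted text equals the reference text after collapsing whitespace.

```python
def field_accuracy(reference: dict[str, str], extracted: dict[str, str]) -> float:
    """Return the fraction of reference fields whose extracted value matches.

    Assumption: two values match when they are equal after collapsing
    runs of whitespace (including line breaks) into single spaces.
    """
    def normalize(text: str) -> str:
        return " ".join(text.split())

    if not reference:
        return 1.0  # nothing to compare, so nothing is wrong

    matches = sum(
        1
        for name, truth in reference.items()
        if normalize(extracted.get(name, "")) == normalize(truth)
    )
    return matches / len(reference)


# Hypothetical reference ("truth") and extraction result for one field.
reference = {"Full Address": "Example Corp\n123 Main Street\nSpringfield, IL 62704"}
extracted = {"Full Address": "Example Corp 123 Main Street Springfield, IL 62704"}
score = field_accuracy(reference, extracted)
```

With a single field that matches, the score is 1.0, which corresponds to the hundred percent result we see in the test.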
So now we have told the software where to find the full address. Next we want to teach it how to parse the address. So we’ll go ahead and add our Address Parsing option here. The software’s gonna say, “Okay. Hey, where do I find the full address?” And we’re gonna say, “Hey, you find it from this field.” And of course we’re going to tell it the things that we know it will find. From experience, Address Parsing won’t be able to find the name, so we’re gonna come back and do that separately. So we’re gonna find street, city, state, and zip. And we’ll go ahead and set up our mapping here.
And what we will do here is go ahead and test the skill. The software comes with built-in intelligence that can take that full address field and give us the result here. And now we have our results. We can see where the software was able to extract that data for us. As you can see, it found street, city, state, and zip. Of course it did not find the name. So now that we know the software can find and parse the address, let’s go back and teach the software how to extract that name.
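To illustrate what parsing means here, splitting one block of text into street, city, state, and zip, here is a rough sketch. The Address Parsing activity in Vantage is a built-in component and does not work this way as far as this video shows; the regular expression below is only a simple stand-in that assumes a US-style address whose last line looks like "City, ST 12345". The function name `parse_us_address` and the sample address are hypothetical.

```python
import re

# Assumed layout of the last line: "City, ST 12345" or "City, ST 12345-6789".
LAST_LINE = re.compile(
    r"^(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)$"
)


def parse_us_address(block: str) -> dict[str, str]:
    """Split a multi-line address block into street, city, state, and zip.

    Assumption: the last non-empty line holds city/state/zip and the line
    directly above it is the street. Any lines above that (for example the
    name) are ignored, just as the name is not returned in the video.
    """
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a street line and a city line")

    match = LAST_LINE.match(lines[-1])
    if match is None:
        raise ValueError(f"unrecognized city/state/zip line: {lines[-1]!r}")

    return {
        "street": lines[-2],
        "city": match["city"].strip(),
        "state": match["state"],
        "zip": match["zip"],
    }


# Hypothetical full-address paragraph, like the region drawn on the image.
parsed = parse_us_address("Example Corp\n123 Main Street\nSpringfield, IL 62704")
```

Notice that the name line is left over after parsing, which is exactly why a separate step is needed for it.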
What we’re gonna do here is go ahead and use our Named Entity Recognition. This gives us the ability to reference the full address here, but in this case what we’re going to do is just grab the name from the document. So we’ll call this the organization. The software is gonna use the full address, give us the organization here, and put that into our name column. Let’s go ahead and test the skill now.
And now at this point we have a hundred percent accuracy. So the software’s using what we told it as the truth and now it’s going to extract the information on that name for us.
So at this point we have taught the software where to locate the full address. Then, using our Address Parsing Activity, we were able to get the street, the city, state, and zip, and we could even get things like country. And then to actually get the entity on the address, we used our Named Entity Recognition. So we have complete control of how this Address Parsing takes place. Now we could deploy this skill like we do in other situations with the Advanced Designer and use this technique to get the address details and the entity on that address.
Hope you enjoyed this video. If you have any questions, please reach out to us.