Watch our video to learn a more advanced method for creating tables in ABBYY FlexiCapture FlexiLayout Studio using repeating groups.
Hello. Today I’m going to explain how to create tables using a more advanced approach. Sometimes a document contains repeating or tabular information, but we can’t use a table element as shown in the first video on basic table extraction. In those cases we use repeating groups, which let us capture that repeating information with a bit more intelligence and handle a bit more complexity. As you can see in my samples, the first page is fairly simple: its information sits in a single table. But then we run into situations like this one, where there is some information at the top, then some white space, and then another section with additional repeating information toward the bottom.
Although a table element can work in some of these situations, it’s probably more appropriate to use repeating group elements, so that we can tell the software how to find a row and keep repeating that search for as long as the rows continue. Let’s walk through this. I’ll go back to our first page. Before recording this video, I set the FlexiLayout up to ignore the header on these documents. Every one of my pages has a header, and to keep this video short and focused I won’t cover how I did that; all we’re doing is using a group that ignores the top header, so it doesn’t add complexity to our demonstration of repeating groups. The very first thing I’m going to do is figure out how to map the table, and when it comes to a table, we want to anchor it by its columns first.
We know that Name is a very good anchor. Looking across my samples, the Name column isn’t always in the same place, but if I can find it once, I can tell where the table is. In this price list, I can see the Name column on each of the following samples: here is another document with a Name column, with the table running down the page, and the last document also has a Name column. They’re all structured very similarly. So first I’m going to find the Name column. I’ll create a search element of type Static Text, call it our static name, and tell the software to find the word “Name”. Keep in mind that when you create a FlexiLayout, your documents will differ from mine and some of your concepts may differ too, but the way we approach them is probably very similar, so use your own names and your own element types. In my case, I want to anchor off the Name column, so I’m going to tell the software to always look for it on the first page.
I’m also going to tell it to always take the occurrence nearest the top. The word “Name” may well appear several times in this type of document, so we tell the software to always give us the one nearest the top of the page, which is our Name column header. Next, it’s very important to find a field like Price to serve as our anchor for each row. To recap: the Name field is our anchor for the table, and the Price field will be our anchor for each row, because it’s very consistent. Because I know how the software works, I’m going to create what we call a separator. The software will repeatedly find each row, but it first needs to find the separator that marks where the Price column begins. So I’ll create an element of type Separator. A separator is simply a line; I’ll give it a name and make it a vertical separator.
Next we set up its relations. I’ll tell it to use the Name field and look to the right, so that it finds the separator to the right of Name. I’ll probably give it some offset as well; again, your documents may differ, but I’m pushing the search area over to the right a little so we have some room. Lastly, I’ll tell it to take the separator nearest to the Name field. This assumes there is a good, solid line between the Name and Price columns. Let me match this document so you can see what we did.
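The two anchoring rules above (take the “Name” match nearest the top, then the vertical line nearest to it on the right, beyond an offset) can be sketched in plain Python to make the logic concrete. This is a conceptual simulation over made-up OCR boxes, not FlexiLayout code or an ABBYY API; the `Box` type, the function names, the coordinates, and the default `offset` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Hypothetical bounding box in page pixels; y grows downward."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def find_table_anchor(words: list[Box], label: str = "Name") -> Optional[Box]:
    """Static-text search: of all words matching `label`, keep the one nearest the top."""
    hits = [w for w in words if w.text.lower() == label.lower()]
    return min(hits, key=lambda w: w.top, default=None)


def find_separator(lines: list[Box], anchor: Box, offset: int = 10) -> Optional[Box]:
    """Vertical line to the right of `anchor` (shifted by `offset`), nearest to it."""
    candidates = [ln for ln in lines if ln.left >= anchor.right + offset]
    return min(candidates, key=lambda ln: ln.left - anchor.right, default=None)


# Made-up coordinates: two "Name" words and two vertical lines.
words = [Box("Name", 100, 900, 150, 915), Box("Name", 100, 300, 150, 315)]
lines = [Box("|", 600, 290, 602, 1000), Box("|", 350, 290, 352, 1000)]
anchor = find_table_anchor(words)
separator = find_separator(lines, anchor)
print(anchor.top, separator.left)  # 300 350
```

Returning `None` when nothing matches loosely parallels an element that finds no hypothesis; the real tool’s fallback behavior is configured in Studio and may differ.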
There’s the column. We found Name and we found the separator. Now that we know where Name is and how to divide the page using the separator, we can add some intelligence. I’m going to create a repeating group, and this is where we start building the table. I’ll call the group our table field. Inside it, I tell the software to ignore the header, which, again, is specific to my documents.
We tell the software to ignore all instances of this header so there’s no confusion, and we tell it that the rows are below the Name field. So the software now knows it’s looking for repeating information below the Name field. Pretty simple. That’s our table. Now let’s focus on Price, because, once again, Price is our anchor: if I can find a price, I can find everything else related to that row. I’ll create an element of type Character String, which will be our anchoring field, and call it cs price (cs stands for character string). I’ll add my own alphabet, because I know which characters commonly appear in the prices on these documents, so let’s add those common characters.
Lastly, I’ll create a new relation: now that the software can find these characters, tell it to look for them to the right of the separator.
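Here is a rough Python sketch of what the cs price element does under these rules: accept only words built from a custom alphabet, to the right of the separator and below the Name header. The alphabet (digits, `$`, `.`, `,`), the `Word` type, and the function names are assumptions for illustration; FlexiLayout Studio configures this through element properties, not code. The extra “at least one digit” check is also an assumption, added so that a lone `$` or `.` is not taken as a price.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical price alphabet; replace it with the characters your documents use.
PRICE_ALPHABET = frozenset("0123456789$.,")


@dataclass
class Word:
    """Hypothetical OCR word with its bounding box (y grows downward)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def is_price_text(text: str) -> bool:
    """Every character is in the alphabet, and at least one is a digit."""
    return bool(text) and set(text) <= PRICE_ALPHABET and any(c.isdigit() for c in text)


def find_prices(words: list[Word], separator_x: int, header_bottom: int) -> list[Word]:
    """Price candidates right of the separator and below the Name header, top to bottom."""
    hits = [
        w for w in words
        if w.left > separator_x and w.top > header_bottom and is_price_text(w.text)
    ]
    return sorted(hits, key=lambda w: w.top)


words = [
    Word("Price", 400, 300, 450, 315),
    Word("$1,200.00", 400, 380, 480, 395),
    Word("Widget", 120, 340, 200, 355),
    Word("$12.50", 400, 340, 460, 355),
]
print([w.text for w in find_prices(words, separator_x=350, header_bottom=315)])
# ['$12.50', '$1,200.00']
```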
I’ll apply this and walk through it so you can see it take effect. Here’s our Name column, highlighted; here’s our separator; and here are our prices, and we’re able to capture all of these amounts. If I match another page, the result is very similar, because it also has a Name column, a separator, and prices. So now we have Price, and that’s our row anchor. As a best practice, we now create a group called our row, which defines the full row. A row group has properties of its own, and here we tell the software not to look for the row unless it finds a price; in other words, don’t even attempt to capture a row without a price. Then we define where the row sits relative to the price.
We give the group some room to form its own rectangle, so that we can tell the software how to box in each row. We say the row is below the top of the price, with a small negative offset that makes the rectangle a little taller, and then, in a similar fashion, that it is above the bottom of the price.
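The row group’s geometry can be approximated as a horizontal band built around each price: a little above its top (the role the negative offset plays) and a little below its bottom. This Python sketch is a hypothetical model, not how FlexiCapture computes regions internally; `Rect`, `pad`, the full-page-width band, and the “vertical centre inside the band” membership test are illustrative choices.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Rect:
    """Hypothetical rectangle in page pixels; y grows downward."""
    left: int
    top: int
    right: int
    bottom: int


def row_region(price: Optional[Rect], page_width: int, pad: int = 5) -> Optional[Rect]:
    """Full-width band from `pad` above the price's top to `pad` below its bottom.

    Returns None when no price was found, mirroring "don't look for the row
    without a price".
    """
    if price is None:
        return None
    return Rect(0, price.top - pad, page_width, price.bottom + pad)


def words_in_row(words: list[tuple[str, Rect]], row: Rect) -> list[str]:
    """Texts of words whose vertical centre lies inside the row band."""
    return [text for text, box in words if row.top <= (box.top + box.bottom) / 2 <= row.bottom]


price = Rect(400, 340, 460, 355)
row = row_region(price, page_width=850)
words = [("A-100", Rect(20, 341, 80, 354)), ("Widget", Rect(120, 340, 200, 355)),
         ("Gadget", Rect(120, 380, 200, 395))]
print(row)                       # Rect(left=0, top=335, right=850, bottom=360)
print(words_in_row(words, row))  # ['A-100', 'Widget']
```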
Now we have a group that defines the whole row. Let me run one. If I open my table repeating group and look at the row, you can see in gray how the software has now mapped out the row. It found the price, and from the price it built out the whole row. Now we can add more intelligence within the group without adding much complexity. I’ll add another character string element, call it the name column, and tell it to accept any characters. The important part about the name is that it’s to the left of the price, so we simply tell the software to find it to the left of Price.
To be a bit more precise, we tell it to take the one nearest to the right edge of the page.
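Within one row, the “left of price, nearest the page’s right edge” rule amounts to picking the word closest to the price on its left. A minimal Python sketch, assuming hypothetical word boxes; it treats each OCR word separately, so a multi-word name would need its words merged first, which this sketch does not do.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """Hypothetical OCR word; only its horizontal extent matters here."""
    text: str
    left: int
    right: int


def find_name(row_words: list[Word], price: Word) -> Optional[Word]:
    """Of the row's words that end left of the price, keep the one nearest the
    page's right edge (the one with the largest right coordinate)."""
    left_of_price = [w for w in row_words if w.right <= price.left]
    return max(left_of_price, key=lambda w: w.right, default=None)


row_words = [Word("A-100", 20, 80), Word("Widget", 120, 200), Word("$12.50", 400, 460)]
print(find_name(row_words, price=row_words[2]).text)  # Widget
```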
Next, we add an element for the part number. Again, it’s a character string, and we’ll call it cs part number. It accepts any characters, and we can add a bit more intelligence: now that we’ve found the name, we can say the part number is to the left of the name.
To be careful, we also tell it to take the character string closest to the left edge of the page, which gives us a bit more assurance that we’re grabbing the right field. Now I’ll match that first page and show you some of the rows. As you can see, we’re capturing not only the price but also the part name and the part number. The nice part is that when I match the whole document, we capture all of these elements from the table, even when the document spans multiple pages. This is a very handy way to do it, and of course we can run the other samples through it as well. At this point, we create a repeating group block, which passes the information back to the FlexiCapture application. I’ll call it our repeating group table and give it a source element. As you can see, we tell the software to read it from top to bottom, which is very helpful because we typically want a table read in the same order we see it on the screen. Then we add the additional fields.
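The part-number rule mirrors the name rule from the other side, and the block’s top-to-bottom setting amounts to sorting rows in reading order. In this Python sketch, the `Word` type, the row dictionaries, and the `(page, top)` sort key for multi-page documents are assumptions for illustration, not the FlexiCapture block’s actual behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """Hypothetical OCR word; only its horizontal extent matters here."""
    text: str
    left: int
    right: int


def find_part_number(row_words: list[Word], name: Word) -> Optional[Word]:
    """Of the row's words that end left of the name, keep the one closest to
    the page's left edge (smallest left coordinate)."""
    left_of_name = [w for w in row_words if w.right <= name.left]
    return min(left_of_name, key=lambda w: w.left, default=None)


def order_rows(rows: list[dict]) -> list[dict]:
    """Reading order across pages: by page number, then top to bottom."""
    return sorted(rows, key=lambda r: (r["page"], r["top"]))


row_words = [Word("A-100", 20, 80), Word("Widget", 120, 200), Word("$12.50", 400, 460)]
print(find_part_number(row_words, name=row_words[1]).text)  # A-100

rows = [
    {"page": 2, "top": 150, "name": "Bolt"},
    {"page": 1, "top": 400, "name": "Gadget"},
    {"page": 1, "top": 340, "name": "Widget"},
]
print([r["name"] for r in order_rows(rows)])  # ['Widget', 'Gadget', 'Bolt']
```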
Now our blocks are created. Next, we save our results, go to File, choose Export, and generate the AFL file we’re familiar with from working with FlexiLayouts. From there, we create a document definition, as you may have done before. I’m going to show you something about repeating groups that I think is very helpful, especially for the many business scenarios where we read a repeating group or a table. I’ll create a new document definition and load the FlexiLayout so that we have it handy.
Here you can see our table, and I have a test sample. I’ll run it so you can see how a repeating group typically looks: each row is outlined, and as I click, the software highlights where it is on the page. That’s the default way a repeating group is displayed. Sometimes, though, we’d like a repeating group to look like a table, and there’s a very neat feature for that: you can right-click and choose Show as table. Now, instead of being broken out into separate repeating rows, the data is formatted as a table. When we run a test, you can see it looks and feels like a table, just as the user would read it on the document. This is a flexible way to extract information from tables, because repeating groups give us a lot of control over how we structure the extraction, and sometimes that is an advantage over using a table element. I hope you’ve enjoyed this video. If you have any questions, please feel free to leave us a comment. Thank you so much!
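“Show as table” changes how the captured rows are presented, not what is captured. As a loose analogy only, this Python sketch renders the same per-row records as a grid with one column per field; the field names and layout are hypothetical and say nothing about how FlexiCapture draws its own table view.

```python
from __future__ import annotations


def as_table(rows: list[dict], columns: list[str]) -> str:
    """Render row records as a plain-text grid: one header line, one line per row."""
    # Each column is as wide as its longest cell (header included).
    widths = [max([len(c)] + [len(str(r.get(c, ""))) for r in rows]) for c in columns]

    def fmt(values: list) -> str:
        return " | ".join(str(v).ljust(w) for v, w in zip(values, widths)).rstrip()

    lines = [fmt(columns)]
    lines += [fmt([r.get(c, "") for c in columns]) for r in rows]
    return "\n".join(lines)


rows = [
    {"part_number": "A-100", "name": "Widget", "price": "$12.50"},
    {"part_number": "B-7", "name": "Gadget", "price": "$1,200.00"},
]
print(as_table(rows, ["part_number", "name", "price"]))
```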