
ABBYY FlexiCapture 12 Video – Batch Types

Watch our video in order to learn how to use batch types in ABBYY FlexiCapture.

Hello. Today we’re going to talk about batch types and how we can use them within our ABBYY FlexiCapture projects. Now in order to set up a batch type, we will go through a set of menu options here, and it will actually make sense after we have a quick discussion on why we use batch types in the first place. And really the reason why we set up a batch type is because we want to process the same or even similar documents, but we may have a different workflow or different image pre-processing rules. We may want to handle things just slightly differently, even though the concept of how we extract this information is largely the same as with the default settings. We may want to just do something a little bit special with a certain type of document. So what we do here is we go to projects and we go to batch types.

When we go to batch types, we’ll create a new one and you can see here. Now we’ll get into the differences in setup. So by default, ABBYY has a default workflow and default rules for all of these settings. But by setting up a batch type, we now control individually what happens for this specific style of batches. So you can see, we can control image processing options. We can control certain recognition options. Now very commonly we would use the ability to select or even filter based on document definitions. So maybe we only want documents coming in to the software using this batch type to recognize a small subset of document definitions. That’s actually very common and that’s something that you should be aware of. Next we have control over some verification options if we want to do verification differently with these style of documents than maybe our other options here. And then lastly, we have export options and workflow. So it’s very common honestly that we would have workflow options differ just slightly based on a certain batch type.

So what I’ll do is I’ll hit next and of course I can name a batch type and I can also set up some other priorities here. Now, one of the important things to know is that a batch type can control its priority, which is actually something that you may want to be cognizant of because you may want documents that have a certain batch type flag to be processed at a higher priority or even verified at a higher priority. Also, we have access to what are called registration parameters and registration parameters are those items that kind of go along with a batch. So we can kind of keep flags so we can track all sorts of cool things in registration parameters and that information rides along with the batch so that we have access to those independently behind the scenes.
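To make the idea of priority and registration parameters concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `BatchType` and `Batch` classes, the rule that a larger priority number is processed first, and the parameter names are all hypothetical and are not the FlexiCapture object model or API.

```python
from dataclasses import dataclass, field

# Hypothetical classes for illustration only; not the FlexiCapture object model.
@dataclass
class BatchType:
    name: str
    priority: int = 0  # assumption: a larger number means "process sooner"
    document_definitions: list[str] = field(default_factory=list)

@dataclass
class Batch:
    batch_id: int
    batch_type: BatchType
    # Key/value pairs that ride along with the batch through processing.
    registration_parameters: dict[str, str] = field(default_factory=dict)

def processing_order(batches: list[Batch]) -> list[Batch]:
    """Return batches with higher-priority batch types first (ties keep their order)."""
    return sorted(batches, key=lambda b: -b.batch_type.priority)

main = BatchType("Main")
rush = BatchType("Rush Invoices", priority=10, document_definitions=["Invoice"])

queue = [
    Batch(1, main, {"Source": "Scanner 3"}),
    Batch(2, rush, {"Source": "Email", "Customer": "ACME"}),
    Batch(3, main),
]

ordered = processing_order(queue)
for b in ordered:
    print(b.batch_id, b.batch_type.name, b.registration_parameters)
```

The point of the sketch is that the priority belongs to the batch type, while the registration parameters belong to each individual batch and stay attached to it.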

So once we finish our batch type, we now have a batch type set up. Now the common ways to use batch types happen during scanning and when we create image import profiles. If we’re using the web service API, it’s also very common that we would designate a batch type. So, for example, let me open up our scanning station here and what I will do is select our project, which I’m in here and then you’ll see I have batch type settings and you will also now see that I have my batch type located here. So this is one example of when we can designate it. This is just at the scanning stage, but I’ll actually create an image import profile for you real quick and I’ll show you that we can designate items coming in through a specific channel to use a batch type as well. And that is typically done right here.

So those are a few different ways that we would use batch types. In fact, one of the common reasons we use batch types is so that we have access to registration parameters, or that information that we want to go along with the batch upfront. Sometimes as a best practice, we would actually create what we call a main batch type, because out of the box the software has a default batch type, but we don’t control some nitty gritty things like registration parameters in that default. So sometimes we would actually create a brand new batch type and name it main, and then we have access to the same exact settings that any other document coming into the software will have. To really understand the differences, though, look at these tabs, because these tabs are truly the differences. So we have image processing, we have recognition. One thing that you didn’t see when we walked through the wizard is event handlers.

So we can control programmatic operations differently with documents of a certain type and that’s a pretty important indicator or differentiator when we’re processing documents. We want to be able to control those sorts of things. But once again, all of these other tabs you saw as we were working through the actual wizard to set up the batch type. So once again, batch types are used when we want to do something special as far as these settings go versus kind of what we would consider the default settings of documents. But we still want to extract the same information. We probably want to do the same sort of exporting with those in that detail, but we just want to control the flow or the image processing or the events just slightly different than we do the defaults. So batch types are very common and just probably something you want to be able to make sure you utilize throughout your processing using ABBYY FlexiCapture. I hope you enjoyed this video. Please reach out to us if you have any questions. Thank you.

Creating Your First NLP Project

ABBYY FlexiCapture 12 Video – Creating Your First NLP Project

Watch our video in order to learn how to create your first Natural Language Processing (NLP) Project in ABBYY FlexiCapture.

Hello. Today we’re going to create our first NLP project together using ABBYY FlexiCapture. The first thing we’ll do is create a brand new document definition and we want to select the semi-structured or unstructured option. We’ll hit next. We will not load a FlexiLayout in this video, but we may potentially upload a sample image just so we have something that we can review together on the screen and in this case we can give the name of the document definition something appropriate to our business. Once we hit finish, the document definition editor will open and we’ll be able to modify some settings from there.

Now what we have on the screen here is a lease agreement and what we’re going to do is pull some information maybe out of the premises details. Perhaps we want this address right here, this address that’s located on the lease agreement and we want to pull a full term potentially on the lease agreement as well. So the one thing that I will do is I will right click create a field. And what we need to do is we need to map out what we call segment fields. The segment field is a parent. So for example, this address, which in this case is 950 Emerald Dreams Drive is a child of this parent, which I will call premises. So in this case we call that parent a segment. And so what we will do is we’ll give it a proper name. And the goal here was that we’re going to map the whole paragraph or phrase here for the section called premises.

And what we want to do is make sure we have these two options checked. Now we need some place that we can map just the address itself. So in this case we’ll call it just the premise or premises. And this time this is not a text segment, so we want to make sure we keep that off. However, we do want to make sure that we have the “can have region” option on. And then lastly I would like to pull out this whole term. So if we find this whole clause here, I want to extract it, and actually this is very common where we would want to grab the whole clause and perhaps do a legal review on a specific clause. So in this case, since we only care about the parent, we will just grab the parent itself. I’m going to put the letters SEG for segment at the end of it just to mark that we actually do have a segment there. I will apply and hit okay. So now that we have our fields at least determined and properly set up, what we will do is we will right click and go to properties on the section. On the section we have an NLP tab and what we will do is we will create what’s called a segmentation model.

So we’re going to give it a name. We’ll keep the source as the section. The model type in this case should be segmentation and I will hit next. Now what this means is that as the NLP model is learning, we’re going to train it on which segments to populate, and since we know which segments because we named them appropriately, we will have the software provide those result fields as the premises and the term. The next thing we will do is we will set up an extraction model, and once again, remember I said since we’re extracting the premises and the premises comes from a segment, which would be its parent, we just want to make sure we tell the software that that is the proper source here. This is extraction because we’re going to extract data, we don’t want just a parent clause, and in this case we’re going to put the result field in the premises. Okay, so now I have an NLP model set up to extract three different fields: two segments and one actual field that we’re going to process. At this point we will save and publish our document definition.

Once we have our document definition published, we will go to the field extraction training batches under the view menu. We will right click and create a new batch for this demo. We’re going to go to the lease agreement and we have to select a variant. And in this case, I’m going to select the default variant. The next step is very, very important because we have to tell the software that we’re using NLP training for this training batch, and in this case we just want to trigger this NLP batch flag here on the batch itself. So I’m just going to click that option. The next thing we will do is load some sample images.

All right, now that we have some sample images, I’m just going to mark some of these documents to be unused. In other words, I don’t want them to affect the training model. Now, understand that in a true NLP production situation, we would load a lot more samples, even potentially hundreds of samples per document type, because the NLP model does need a good variety to learn from. So truly, training the software on five samples is not sufficient for a real world environment. But I’m doing that today to show you the initial steps to create an NLP project. So the next step we’ll do is we’ll just go ahead and force a match on these document definitions. And what this will give us is the ability to teach the model by lassoing the proper data.

Once all the documents are matched, we can now go lasso the proper data and what we’ll do is we’ll double click our first document and we will simply teach the software through manual lassoing how this is done. So I’m going to click here the premises segment. Remember I said for segments we want to provide the parent. So I’m going to actually lasso the full data here. For the premise I’m going to lasso just the specific data we want for that. So we got the extraction value and we got the segment. Segment being the parent. And then lastly for the term I will select the whole field because in this case I want the whole clause, which is the segment and I don’t need to particularly extract any specific detail out of that segment. So I will do this now for the remaining five.

Okay. And now that we’ve lassoed all the proper detail to train the NLP model, we will simply right click in the white area and train. This training is also accessible through the menu. Now to give you a little caveat, the training can take quite a while when we do NLP modeling, so just be aware that training this detailed can take quite a bit of time depending on the number of segments or extraction details you have. Be prepared that you may wait anywhere from several minutes to hours to train a full NLP model.

Once the NLP model has been trained, we will give the software the chance to match our unused documents to see and test the results. Once the documents are matched, we can double click each document and see the results. It’s pretty neat because even though I just mapped the fields for these five top documents here, you’ll see here that I’m getting the whole premises parent and the actual value that we want out of the premise. By the way, that works even if the text spans multiple lines or multiple pages. And then of course, lastly, we’re grabbing any term that’s referenced, and you can see that here as I continue going down through the results. So there are a lot of cool different ways that we can use NLP in the field, of course with a lot of varying results and varying formats. There are some particular ways that we need to set it up, so make sure that you follow the steps in this video to do so. But I think you’ll enjoy the results for natural language processing. I hope you enjoyed this video. If you have any questions, feel free to reach out to us. Thank you.

Configuring Multipage Documents in FlexiLayout Studio

ABBYY FlexiCapture 12 Video – Configuring Multipage Documents in FlexiLayout Studio

Watch our video in order to learn how to configure multipage documents in ABBYY FlexiCapture FlexiLayout Studio.

Hello. Today I’d like to show you how we can process a multipage document or even a single file that may have multiple individual documents in it, such as a PDF that contains multiple documents. What we’re going to do is create a new FlexiLayout project. We’ll give it a proper name and then we’ll move on to a properties window. This properties window shows us down here at the bottom a checkbox that says allow multipage documents. This is very, very important. In this given scenario, we do want to set a proper minimum number of pages and maximum number of pages, if we know a maximum. Sometimes in a business situation we may not be able to predict what the maximum number of pages of a document can be, and in this case, if we don’t know the maximum, we can write INF, which stands for infinity, and that means the software is going to allow us to have a variable number of documents or even pages within a document.
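As a rough illustration of how a minimum and maximum page count with an INF upper bound behaves, here is a small Python sketch. The function names are made up for this example, and this is not how FlexiLayout Studio implements the setting; it only shows the logic of treating INF as "no upper limit".

```python
import math

def parse_page_limit(value: str) -> float:
    """Turn a limit such as "3" or "INF" into a number; INF means no upper bound."""
    text = value.strip().upper()
    if text == "INF":
        return math.inf
    limit = int(text)
    if limit < 1:
        raise ValueError("page limit must be at least 1")
    return limit

def page_count_allowed(pages: int, minimum: str, maximum: str) -> bool:
    """Check whether a document with this many pages fits the configured range."""
    return parse_page_limit(minimum) <= pages <= parse_page_limit(maximum)

print(page_count_allowed(2, "1", "3"))      # within a fixed range
print(page_count_allowed(5, "1", "3"))      # too many pages for a fixed range
print(page_count_allowed(250, "1", "INF"))  # no upper limit
```

With a fixed maximum, a longer document is rejected; with INF, any page count at or above the minimum is accepted.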

The next step that we’ll do is we will upload a document. I have a PDF here that I’m going to show you. This PDF obviously has multiple pages. This is the price list document, and so I have prices that span multiple pages. However, you’ll notice that this is the beginning of a brand new price list even within the single documents. So in today’s video we have one PDF that contains multiple pages but even contains multiple documents with multiple pages. So what we’re going to do is upload this document. I’m just going to drag and drop the document and let the software process this.

Now that the documents are processed, what we are going to do is assign proper search elements to these, specifically a header and a footer. Header and footer are critical in telling the software where a document starts and stops, and they are even critical when we want to automatically split a document and process the parts accordingly. For today’s demo, I’m actually going to disable the footer. If you can predict a footer of a document or a set of documents, it’s very important that you do so, but today, to keep it a simple and short demo for you, we will just show you how to add a header. What we’re going to do is we’re going to add a static text header and we’re going to look for the words “Price List” at the beginning of each document that I’ve shown you here. So I’m just going to call this my “Price List” header.

We’re going to look for the word “Price List”. And before I move on, I’m going to make this a required element. So in other words, in order for a document to be considered a header, it will have to find the word “Price List” on the document in some location. Now that I’ve done that, I’m going to go ahead and show you what happens if we right click and match the whole batch of documents here. First off, what I’ll show you is there are lines that come up here. The blue line indicates what I would refer to as a baseline or even a reference line. That is what we are telling the software we know about the document and how a document starts and stops. The red line indicates that ABBYY is recognizing something different than what we are considering the reference or the baseline and the reason why that is is because we’ve uploaded a document with multiple documents within it so we haven’t been able to split it at this point.

The software, after we’ve right-clicked and matched, is nevertheless saying that it recognizes that the words “Price List” are located on page one and page four of the document, therefore making it two separate documents. So in this case it may be helpful for us as end users to go ahead and set what we call a reference document. And what we do is we right click on those pages and we will assemble to a reference document. Now I’m going to uncheck this last one because I disabled my footer, but what we’re doing here is telling the FlexiLayout how to reference this, or to say it another way, what is the truth or what is the reality of how these documents are split. Now that I’ve processed them and set that reference document, you can actually see here that instead of a red line we have a green line, and the green line is simply an indicator that ABBYY FlexiCapture agrees with how we’ve split the document.

So if we select them all and right click match, you can see once again the items in blue are what I told the software it should be in reality, and green is what ABBYY is actually performing automatically. So in this case, when we see a green line that matches the blue line, that means the software agrees with what we’re telling it should be the actual splitting. To emphasize though, when we’re processing a multipage document, the most important thing you should be able to do is set up the header, and a footer if one is applicable. The header and footer are what tell the software when to start and stop a new document. And in the multipage or multidocument world, this is absolutely critical. From here, of course, we can process these, we can add additional elements. If you’re not sure how to control additional elements, please reference our video libraries for additional how-to videos. But once we’ve been able to split the document and have the parts referenced correctly, with these green lines matching our blue lines, that means we now have the ability to process a multipage and potentially even a multidocument file. I hope you’ve enjoyed this video. If you have any other questions, please reach out to us. Thank you.
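The header-based splitting described in this video can be sketched in a few lines of Python. This is a simplified stand-in, not the FlexiLayout matching engine: it assumes each page is already plain text, that a page containing the header text starts a new document, and that any pages before the first header are kept together. The sample page texts are invented.

```python
def split_by_header(pages: list[str], header: str) -> list[list[int]]:
    """Group 1-based page numbers into documents; a page with the header starts a new one."""
    documents: list[list[int]] = []
    for number, text in enumerate(pages, start=1):
        if header.lower() in text.lower() or not documents:
            documents.append([number])
        else:
            documents[-1].append(number)
    return documents

# Invented page texts: the header appears on pages 1 and 4, as in the demo.
pages = [
    "Price List  Spring catalogue",
    "item 101  4.50",
    "item 102  7.25",
    "Price List  Autumn catalogue",
    "item 201  3.10",
]

result = split_by_header(pages, "Price List")  # what the matcher produced
reference = [[1, 2, 3], [4, 5]]                # what we declared as the truth
agrees = result == reference                   # comparable to a green line over a blue line

print(result, "agrees with reference:", agrees)
```

Comparing the automatic result to the reference split is the same check the blue and green lines give you visually in the studio.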

Creating Document Sets in ABBYY FlexiCapture

ABBYY FlexiCapture 12 Video – Creating Document Sets (Updated)

Watch our video in order to learn how to create document sets in ABBYY FlexiCapture 12 which captures a set of related documents.

Hello. Today we are going to learn about document sets within ABBYY FlexiCapture. Now, with document sets, it’s actually easier to see one as a finished product before we dive into the creation of such a thing. So as you can see here, I have a batch and I have a mortgage document set, and within my document set I have multiple separate documents. Now the concept of a document set is that we can have a package or a set of documents that flow together throughout the software, throughout workflow and throughout the export process. And it’s critical that they stay together. This is very common in mortgage processing and loan processing, but it applies to any situation where as a business we want documents to stay together as a group. And this is how the software displays it to us. So we have a document set and we have additional documents that belong to the set.

If I double-click the top level, I will see what we call a summary tab, which is optional. But in this demo today we do have a summary tab. And then of course I have the subsequent documents that flow around. I can double-click them and open them up independently or I can use the summary tab and double-click them and open them up directly from the summary tab. I’m a big fan of this because I like the ability that if I double-click anywhere on the summary tab in a field, the software will not just only open the given document but will open the document with the field that I selected or double-clicked on. So the summary tab gives us a lot of capabilities. Also just like any other part of FlexiCapture, if we want to perform comparisons or if we want to create rules at the summary level, we can do that as well.

So once again, document sets allow the software to maintain a group of documents as it flows through workflow. Now what I’m going to do is actually show you how we go through the process of creating a mortgage document set. Before I do so I want you to understand that when we create a document set, we’re showing fields from a group of documents of course, and in order for a field to be accessible to us, we must create it as an index field. So in order for a field to be listed as an index field, we just need to go into the given document definition, and I’ll show you here real quick, and just double-click the field and make sure that that field is an index field here. Okay. Once we do that at the document definition level, we can reference that at a document set level. So let’s do that now. Let’s create our mortgage document set. What I’m going to do is I’m going to hit new, I’m going to go to document set. I’m going to add a summary. This is optional, but for today’s demo we will. We select the different documents that we want associated with the document set and we create our document set with a given name.

At this point, our document definition editor opens and on the right we see our document structure. So the software is going to provide us a summary tab because that’s what we selected and we will also have reference to additional fields here from the subsequent documents that we’ve created. Now if I go to testing and run test, the software will show me what that summary tab would look like, which is pretty neat because we may decide to show or reorder things to an end user. And we also have the ability to link additional fields that we’re maybe not going to display. So if I wanted to link or show an existing field on the summary tab, I can do that by right-clicking on the summary and selecting the proper field that I want to display here as well. So another way that we can do that and make it very easy for an end user to see all of the fields at one time.

Lastly, I will mention that a document set once again is just a combination of separate documents and just like any other assembly situation that we have throughout FlexiCapture, we have that same option at a document set level. So it’s very important to understand that you may want to tweak the assembly options that you have here. We may have some sections that are optional and hence they have an option of having a minimum of zero or they may be always required in a given document set or package. So important for you to understand that the assembly rules that happen throughout traditional FlexiCapture processing also happen at a document set level as well. And lastly, I’ll just kind of mention to you that just like traditional FlexiCapture, we have the ability to create rules, compare fields, perform math, do database lookups, and once again, every field will have the ability for us to create rules here just by double-clicking on the specific document section I can create rules and do all kinds of neat stuff that we do in traditional FlexiCapture as well, but do that for a whole document set.
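The assembly idea, where each section of a document set has a minimum and maximum number of occurrences, can be sketched as a simple check. Everything here is hypothetical: the section names, the limits, and the `check_document_set` function are made up for illustration and do not reflect how FlexiCapture stores or evaluates assembly rules.

```python
from collections import Counter

# Hypothetical rules: section name -> (minimum, maximum) number of occurrences.
# A minimum of 0 makes a section optional.
ASSEMBLY_RULES = {
    "Loan Application": (1, 1),
    "Pay Stub": (0, 3),
    "Appraisal": (1, 1),
}

def check_document_set(documents: list[str], rules: dict[str, tuple[int, int]]) -> list[str]:
    """Return a list of assembly problems; an empty list means the set is complete."""
    counts = Counter(documents)
    problems = []
    for name, (minimum, maximum) in rules.items():
        found = counts.get(name, 0)
        if found < minimum:
            problems.append(f"{name}: expected at least {minimum}, found {found}")
        elif found > maximum:
            problems.append(f"{name}: expected at most {maximum}, found {found}")
    for name in counts:
        if name not in rules:
            problems.append(f"{name}: not part of this document set")
    return problems

print(check_document_set(["Loan Application", "Appraisal"], ASSEMBLY_RULES))
print(check_document_set(["Loan Application"], ASSEMBLY_RULES))
```

The first set passes because the pay stub is optional; the second is flagged because the required appraisal is missing.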

Okay. So once we’re done creating a document set, we’ll publish to our document definition. And once we start processing documents, they will show up accordingly, where we will have a high level option here, which is the document set and then each independent document will show up there as well under that document set. Well, I hope you’ve enjoyed this video. If you have any questions, please feel free to reach out to us. Thank you so much.

ABBYY FlexiCapture 12- Operator (End-User) Verification Options

ABBYY FlexiCapture 12 Video – Operator (End-User) Verification Options

Watch our video to learn about the three verification options you can use in ABBYY FlexiCapture: manual, group, and field verification.

Hello. Today I’m going to show you the three different options we have when it comes to performing verification on documents within ABBYY FlexiCapture. The first way I’ll show you is manual verification. The next one is group verification and the last will be field verification. Now for manual verification, we simply open a document within a queue, which you can see here on the screen and you can go one by one by reviewing the documents. Now the benefit of this is that you get a good overall sense of what was extracted. You get to see the whole document, you get to see every single field that was extracted. The consequence of this is that sometimes the process can be a little bit slower because as you can see anything in red the software is highlighting as a lack of confidence. In other words, it questions how well it read that given character and the fact that it had a question about it, it’s going to highlight that in red for me.

So I would go one by one through those errors and as I click the software will take me to that spot and I will resolve the error. So that’s our first one. We call this manual verification and you can see that here also if we do have an error on the document, like a business rule error, then those errors are located at the bottom. So sometimes that actually is helpful to do verification this way, especially if we have a number of different rules that are violated. And also if we have a limited document set as well.
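As a loose illustration of what stepping through low-confidence characters means, here is a Python sketch. It assumes each recognized character carries a confidence between 0.0 and 1.0 and that anything under a chosen threshold would be highlighted; the threshold value, the classes, and the sample data are assumptions for this example, not FlexiCapture internals.

```python
from dataclasses import dataclass

@dataclass
class Char:
    value: str
    confidence: float  # assumption: 0.0 (no confidence) to 1.0 (certain)

def uncertain_positions(chars: list[Char], threshold: float = 0.8) -> list[int]:
    """Positions of characters the operator should look at."""
    return [i for i, c in enumerate(chars) if c.confidence < threshold]

def verification_queue(fields: dict[str, list[Char]], threshold: float = 0.8):
    """Return (field name, positions) pairs for fields that need review, in field order."""
    queue = []
    for name, chars in fields.items():
        positions = uncertain_positions(chars, threshold)
        if positions:
            queue.append((name, positions))
    return queue

# Invented sample: only the middle digit of "Total" was read with low confidence.
fields = {
    "City": [Char("F", 0.99), Char("r", 0.97)],
    "Total": [Char("1", 0.99), Char("5", 0.55), Char("0", 0.97)],
}

review = verification_queue(fields)
print(review)
```

In manual verification the operator would visit each flagged position in turn and either confirm or correct it.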

The next way I’ll show you is group verification. And to do that we simply highlight the documents and we will go to verification and run data verification. When we do group verification, there are a couple of things you should know. First off, the software labels it as group verification. And the reason why I point that out is because you can actually do a mix between group verification and field verification. In fact, in this demo you will see a mixture of group verification and field verification as well. So in group verification the software says, hey, this is what we’ve recognized as a checked checkmark field, and so every single checkmark in all of these documents in the batch, it’s pulled out for me and it’s giving us the option to see them all at once. If at any time I want to override a valid checkmark, say for example, maybe this one here is not a valid one, I can simply click on it and put it in question or fix it. By clicking it, and it may be a little bit difficult to see in the video, it’s actually saying that this is no longer checked. Or if I want to follow up with it later, I can click it again and there it will put a red checkmark, and that red checkmark is simply me saying that I don’t know the answer right now and I will come back to that later. I can also right click and show the character image. So if I need to get a little bit more specific on what’s checked there and to have a broader sense of what that specific document and that specific checkmark is, I can do that as well.

Now group verification can do much more than checkmarks. So for example, here’s one that’s very similar where the software is saying, okay, now I’m showing you the unchecked checkmarks, but it can also show me the different characters. So the software is saying it recognized these characters in the documents as zeros and I can simply review them and see what’s going on and see if I agree. And I would agree. In this case, the software recognized this character as one and these characters as two.

Now as you and I can probably see, the software misrecognized these two characters, and in fact I might even right click and show them to you quickly so that you and I can tell that the number is one. So if I want to correct them, all I need to do is type over them. So I’m going to fix them and say nope, the software recognized them as two, but I’m going to correct them and put them as one. So I’m just literally typing over the character and correcting it to the number one. These look like threes and these were recognized as fives and so on and so on. In fact, here’s a good example where the software is recognizing this character as a six, but you and I as humans can probably tell that this is actually a five. And so we want to force that to be a five by typing the number five over that box.
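The reason group verification is fast is that every character recognized as the same value is shown together, so outliers stand out. The Python sketch below shows that grouping idea only. The `RecognizedChar` class and the sample data are invented for this example and are not FlexiCapture structures.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RecognizedChar:
    document: str
    field_name: str
    position: int
    value: str  # what the software read

def group_by_value(chars: list[RecognizedChar]) -> dict[str, list[RecognizedChar]]:
    """Collect characters from every document in the batch under their recognized value."""
    groups: dict[str, list[RecognizedChar]] = defaultdict(list)
    for char in chars:
        groups[char.value].append(char)
    return dict(groups)

# Invented batch: two characters were read as "2" but are really "1".
chars = [
    RecognizedChar("doc1", "Quantity", 0, "0"),
    RecognizedChar("doc2", "Quantity", 0, "2"),
    RecognizedChar("doc3", "Quantity", 1, "2"),
    RecognizedChar("doc4", "Quantity", 0, "5"),
]

groups = group_by_value(chars)
size_of_twos = len(groups["2"])

# The operator reviews the "2" group and types "1" over both characters.
for char in groups["2"]:
    char.value = "1"

regrouped = group_by_value(chars)
print(sorted(regrouped))
```

After the correction there is no longer a "2" group, and the two fixed characters now sit in the "1" group.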

So this is an example of group verification I’m going to go through because I want to get to field verification. Like I said, the neat part about group and field verification is that we get them both in the same screen so we can use them in combination. So here’s an example of field verification. The software tells me what field I’m looking at and I can also see the given text here and I can see the context of the document below. So this kind of gives me a cool little snippet so I can understand where I’m at in the document so that if this has an incorrect reading I can override that. So if I need to override that I can simply just type over it. So it uses like any common editor of text to just simply write over and override what the software is telling us the options are.

Now if the software is correct, in this case it would be Fremont is an actual right value, then I don’t need to touch it. Only if it is wrong, do I touch it. And you can see the benefit of this is that if we have personnel that want to just be able to stay on their keyboard, they don’t want to have to click a mouse to go to the next, there are keyboard shortcuts and this whole process of correcting data is actually very quick to the human that is very comfortable with doing this. So this is an example of field verification. The cool part about this is that you can use all of them in one. We have clients that use only manual verification. And then we have clients that use a hybrid of these different verification types. So when you’re developing what the operators should verify and how they operate to perform their verification, just note that you have a few different options here.

Not every business does it the same way and that’s perfectly okay, but just we want you to make sure that you understand the different options as far as verification goes and the different ways you can use these interfaces to speed up the verification. And the last thing I will explain is that the really cool part of this is that I can see a percentage here of how far I am through this batch and how much is verified, and I can actually see the specific location of where I am in the batch as I’m verifying this exact value. So a lot of different ways that we can speed this up. I hope you enjoyed the video. If you have any questions, please let us know. Thank you so much.

ABBYY FlexiCapture 12- Creating a Classification Training Batch

ABBYY FlexiCapture 12 Video – Creating a Classification Training Batch

Watch our video in order to learn how to create a classification training batch in ABBYY FlexiCapture 12 which teaches the classifier system.

Hello. Today I’m going to show you how we create a classification training batch within ABBYY FlexiCapture. The concept of creating this classifier is the ability to teach the software how to determine document types automatically, so that a human doesn’t have to be involved in telling the software the document type that it is processing. So what I have are three separate types of documents. You can see them listed in my document definitions menu here and I’m just going to double click and show you that each document definition is blank. There are no samples, there are not even fields listed here on the side. And what we’re going to do is tell the software, using samples, how to determine the document type. So what I’ve done is I’ve created a classifier training batch and in just one second I’m going to show you how we load images. But I want you to understand that when we do classification, sometimes it’s helpful that we have documents listed in a folder and sub-folders, with each document type kept separately. And the reason why we do that is just so that it’s easy for us to process them and keep the truth documents in the proper spot where they’re supposed to be.
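The folder-per-document-type layout described above can be checked with a short script before loading samples. This Python sketch assumes one sub-folder per class under a root folder and a made-up list of image file extensions; the folder and file names in the demo are invented, and the script only counts files, it does not talk to FlexiCapture.

```python
import tempfile
from pathlib import Path

# Assumption: which file types count as sample images.
IMAGE_SUFFIXES = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png"}

def samples_by_class(root: Path) -> dict[str, list[Path]]:
    """Map each sub-folder name (the class) to the sample files found beneath it."""
    classes: dict[str, list[Path]] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(
            f for f in folder.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        if files:
            classes[folder.name] = files
    return classes

# Build a small throwaway folder tree to demonstrate the layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("Invoices/a.pdf", "Invoices/b.tif", "Leases/c.pdf", "notes.txt"):
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    summary = {label: len(files) for label, files in samples_by_class(root).items()}

print(summary)
```

A count per class like this makes it easy to spot a document type that has too few samples before training starts.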

Now on the classifier batch, before we get started, I also want you to understand that there are settings called “Classification Profile”. We also have what’s called a recall or precision priority. It’s important for you to understand and research these settings and their effects, because they may impact how the software looks at your documents and how it trains, reviews the document types, and uses that classification technology. But for today’s demo, I’m going to keep these at their defaults, so I’m just going to open my classifier batch. You can see here I’ve already put some documents in my batch, and once we’ve loaded them, sometimes it’s nice to use our sub-folders to name the class. I’m actually going to click that and you can see it’s very quick. The software comes through and determines the class for us. Just for today’s demo, I’m going to properly set the section. I’m going to click on the document and set the section here on the right. Once I’ve done that, we can modify the state, but we’re going to leave these all “For Training”. In other words, the software is going to literally train itself using these documents. The other setting that you would get is “For Testing”, and that can be changed on a per-document basis as well. But for today’s demo I’m going to use “For Training”. Okay, so I’m going to go ahead and select these and I’m going to tell the software to train.
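The sub-folder convention described above can be pictured with a short Python sketch. This is not ABBYY FlexiCapture’s API. The function name `labels_from_subfolders`, the list of file extensions, and the folder and file names in `demo` are all assumptions made for illustration. It only shows the idea of treating each sub-folder name as the reference class for the files inside it.

```python
from pathlib import Path
import tempfile

# Assumed set of image extensions; adjust to whatever your scans use.
IMAGE_SUFFIXES = {".pdf", ".tif", ".tiff", ".png", ".jpg"}


def labels_from_subfolders(root: Path) -> dict[str, list[Path]]:
    """Map each immediate sub-folder name (the class) to the image files in it."""
    samples: dict[str, list[Path]] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(
            f for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        if files:  # skip sub-folders that contain no usable samples
            samples[class_dir.name] = files
    return samples


def demo() -> dict[str, list[str]]:
    """Build a throwaway folder tree and return {class name: [file names]}."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for class_name, file_name in [
            ("Banking Application", "app_001.pdf"),
            ("Questionnaire", "q_001.tif"),
            ("Tax", "tax_001.pdf"),
            ("Tax", "notes.txt"),  # ignored: not an image extension
        ]:
            (root / class_name).mkdir(exist_ok=True)
            (root / class_name / file_name).touch()
        found = labels_from_subfolders(root)
        return {name: [f.name for f in files] for name, files in found.items()}


if __name__ == "__main__":
    for class_name, file_names in demo().items():
        print(class_name, file_names)
```

Keeping one sub-folder per document type means the truth label never has to be typed by hand, which is the convenience the video points out.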

Once the software is done performing its training, I’m going to simply hit the classify button up here on the menu. What the software is going to do is use the logic that it trained itself on to determine the result class. The reference class is sometimes what we call a truth class, so that’s the actual answer. The result class is what the software is telling us it thinks that document type is. So that classify button gives us the ability to see what the software believes the document type is, using its own training.
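To make the reference class versus result class comparison concrete, here is a minimal Python sketch that scores a list of (reference, result) pairs per class. It is not how FlexiCapture computes anything internally. The function name, the sample class names, and the choice to treat a result of `None` as “not classified” are assumptions for illustration. Precision and recall are the standard definitions, which is what the recall or precision priority setting trades off.

```python
from collections import Counter


def per_class_precision_recall(pairs):
    """Score (reference_class, result_class) pairs.

    Returns {class_name: (precision, recall)}.
    A result_class of None means the document was left unclassified.
    """
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for reference, result in pairs:
        if reference == result:
            true_pos[reference] += 1
        else:
            false_neg[reference] += 1      # the true class was missed
            if result is not None:
                false_pos[result] += 1     # a wrong class was claimed

    scores = {}
    for name in sorted(set(true_pos) | set(false_pos) | set(false_neg)):
        predicted = true_pos[name] + false_pos[name]
        actual = true_pos[name] + false_neg[name]
        precision = true_pos[name] / predicted if predicted else 0.0
        recall = true_pos[name] / actual if actual else 0.0
        scores[name] = (precision, recall)
    return scores


if __name__ == "__main__":
    sample = [
        ("Tax", "Tax"),
        ("Tax", "Questionnaire"),          # one tax form misread
        ("Questionnaire", "Questionnaire"),
        ("Banking", "Banking"),
    ]
    for name, (precision, recall) in per_class_precision_recall(sample).items():
        print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

When every result class matches its reference class, as in the demo, every class scores 1.0 on both measures.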

Now that it’s done, you can see our result class matches everything and we’re good to go. There is a benchmark tool that you may want to use to determine how the software is reading a group of documents. But let me just share this with you. When we use classification training, it’s very important that we potentially run classification training over hundreds of documents per document type. So even though I only have 11, that’s actually a very small number of samples. In a real-world scenario, we would use hundreds per document type to train the software, because variety is good and the software needs to understand the different formats that a document type may have. Now, before we run some actual documents and determine how well we did as far as training, I will tell you that it’s important at this point to go to “Project” and “Project Properties” and make sure that we have a classification batch selected here as a classifier. You may need to do this on the batch type as well if you’re using batch types within your project. Okay, so I will go ahead and select that. I’ve had documents loaded here in a working batch for me, and I’m just going to simply right click them and recognize. What the software is going to do at this point is use that logic that we set up, that classifier, to determine what kind of document type these are.

So now that the software has performed recognition, we can see it determined a document type per document, and of course we do have these separated, so you can see there are our banking applications and our questionnaires and our tax documents down here at the bottom. So now we can continue processing documents with that classifier so that the software can determine automatically what those document types are. Once again, we would use potentially hundreds of document samples to train the software. From this point on, using field extraction batches covered in other videos that we’ve produced, we can teach the software with machine learning where to extract the data per document type, or even per what we call a variant of the document type. So I hope you’ve enjoyed this video. This is a really quick preview of how we set up a classification training project, and if you have any questions, feel free to reach out to us. Thank you so much!

ABBYY FlexiCapture 12 Video – Four Simple Steps of OCR Processing

Watch our video in order to learn how to perform OCR Processing in Four Simple Steps in ABBYY FlexiCapture 12: Import, recognize, verify, and export.

Hello. I want to walk you through the four steps that we use when we do any sort of document automation or OCR process. Step number one is we import documents. Step number two is we let the software perform recognition. Step number three is a verification step where, if and only if we need to, a human will be involved in the verification of the OCR results. Then lastly, step number four, we export those results. Now what I’m going to do is send some documents in, so this represents the import step in the process. There are multiple ways to get documents into the software, but for today’s demo I’m going to show you how we can just right click and send documents in manually. To mention the other import options quickly, we can also allow the software to monitor a folder.

It can monitor an email box, or you can scan documents, but for today’s demo we’re going to keep it solely focused on uploading them from a tool that we offer here, where you can just right click and send the documents in. I’m just going to simply select the project and then go ahead and send these documents over to ABBYY FlexiCapture. At this point, the documents go into ABBYY and they become available to us after the recognition step is done. Now the recognition step happens behind the scenes, so there’s nothing to really see there, other than that ABBYY is using its intelligence to perform the data extraction that we’ve told it to. The next step, verification, is where we go into a queue and look at the results. We can double click every single document. We’ll see the results here on the left and the copy of the document on the right, and we would perform a manual verification and specifically look at items that have these red flags, so that we can pay a little bit of attention to the business rule errors that are happening on documents.

Once we are done performing the verification on every document, the very next step is to hit close task. This will allow the documents to be exported. Now when we export we have a lot of different options, but for today’s export I’m just going to show you that we’ve exported both a copy of the data that we extracted from the document and a copy of the actual document itself. Now just a little note on exporting: we have a ton of flexibility, just like in any other part of the application, so we can export to flat files like what I’m showing you in this demo, but we can also export to databases and web services. We can call other applications in your internal business as well. So there are a lot of options here, but these are the four steps. Once again, the first is import. The second is recognition, which happens behind the scenes where the software does its extraction. The third step is verification, where we allow a human to be involved in reviewing the results. And then lastly, the export step, which gives us both a copy of the data that was extracted and the actual document itself. Typically that would be in some searchable format, so we can store it in a document repository. Thank you very much!
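The four steps can be sketched as a tiny Python pipeline. Everything here is a stand-in and none of it is FlexiCapture code: the `Document` class, the “key: value” parsing used in place of real OCR, the business rule that `invoice` and `total` must be filled, and the CSV layout are all assumptions chosen to keep the example self-contained. The point is only the shape of the flow, with a human involved if and only if a rule fails.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    raw_text: str                       # stands in for the scanned image
    fields: dict = field(default_factory=dict)
    needs_review: bool = False


def import_documents(sources):
    """Step 1: bring documents in (here, from an in-memory dict)."""
    return [Document(name, text) for name, text in sources.items()]


def recognize(doc):
    """Step 2: placeholder 'recognition' that parses 'key: value' lines."""
    for line in doc.raw_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            doc.fields[key.strip().lower()] = value.strip()
    return doc


def check_rules(doc, required=("invoice", "total")):
    """Flag the document (the 'red flag') when a required field is empty."""
    doc.needs_review = any(not doc.fields.get(name) for name in required)
    return doc


def verify(doc, corrections):
    """Step 3: apply human corrections, but only to flagged documents."""
    if doc.needs_review:
        doc.fields.update(corrections.get(doc.name, {}))
        check_rules(doc)
    return doc


def export(docs, columns=("name", "invoice", "total")):
    """Step 4: write the extracted data to a flat file (CSV text here)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for doc in docs:
        writer.writerow([doc.name] + [doc.fields.get(c, "") for c in columns[1:]])
    return buffer.getvalue()


def run_pipeline(sources, corrections):
    docs = [check_rules(recognize(d)) for d in import_documents(sources)]
    docs = [verify(d, corrections) for d in docs]
    return export(docs)


if __name__ == "__main__":
    sources = {
        "a.pdf": "Invoice: 1001\nTotal: 25.00",
        "b.pdf": "Invoice: 1002\nTotal:",      # missing total, gets flagged
    }
    print(run_pipeline(sources, {"b.pdf": {"total": "13.50"}}))
```

In this sketch only `b.pdf` reaches the human step, which mirrors the idea that verification is skipped for documents with no rule errors.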

ABBYY FlexiCapture 12 Video – Creating Alternative Layouts

Watch our video in order to learn how to create an alternative layout in FlexiLayout Studio in the ABBYY FlexiCapture intelligent capture system.

Hello. Today I’d like to show you how we can use ABBYY FlexiLayout Studio to create an Alternate FlexiLayout, or what we sometimes refer to as an Alternative FlexiLayout. The reason why we would have an alternate is because we want to extract the same information from a variety of different forms, but the look, the feel, and the structure of those forms are drastically different. In this case we would want to tell the software that we’re going to use alternates, and we’ll direct the software on how to figure out which alternative to use. The first thing I’m going to show you is what the documents look like. I have two transcripts. One is a generic homeschool document and the other one is a document from Doe High School. I’ve already created the first one for us so that we don’t have to spend video time doing so.

But this is the typical structure, where you would add your search elements and define the header so that we can figure out which form this is and process it accordingly. Now I’m just going to add an alternate tier by right clicking at the very top of the tree and going to add FlexiLayout Alternative. When I do that I get a new set of search elements, and you can see here I now have brand new search elements. At this point I may want to rename these so that I can tell later on downstream which alternate was selected by the software. In this case, our typical document processing and FlexiLayout design happens just the same, so we will define a header. The header is very crucial because it tells us how we locate this document and what sets it apart from the other ones.

And then of course we can have a footer. I disabled it in this case, but we can reference other footers. Now for ease of this video, I’m not going to describe how we would go and process the results accordingly, but you can see I’m trying to find a student name and a GPA, and just like in any other FlexiLayout, we can add logic to go find that information using labeled fields and relations and all those cool little features that we have within FlexiLayout Studio. But for today’s demo, I’m not going to do this. I’m just going to create the placeholders. So I’m just going to say this would be our student name, and then of course we would provide logic for how to extract that in your forms, and I’m just going to select a different one for GPA. So now that we have the information being extracted, and of course you can test and tune your documents, we would create a block. When we create a block, we add the type of block, for example student name, and this is where it actually becomes pretty important. We want to tell the software which source to use for each layout. So when you go to the Homeschool layout, I want you to reference the Homeschool student name. You want to select alternative layouts and make sure you set the source element as well. And the software will automatically, of course with intelligence, go to the proper FlexiLayout search element for us here.

So now I can see in the Doe Alternate FlexiLayout, I want to get it from this source and in the Homeschool I want to get it from this source and just to keep it very basic, we’ll do another one. This is going to be our GPA.

And in the Homeschool one we want to reference this GPA, and in the Doe one we want to reference this GPA from this source element. So creating these different Alternate Layouts has now given us a lot of flexibility in controlling how the software extracts that information. Now let me just be clear. The normal design and logic of a FlexiLayout still applies. So you want to define your headers and footers when and if applicable. You want to define your search elements, your relationships, and your grouping, just like normal. The only thing we’re dictating here is which FlexiLayout Alternate is being used. Now one thing that is handy to do when you have an alternate is to set the layout here. So we know that this one is going to be Homeschool.

And we know that this one is going to be Doe, and the cool part is that now when we match, we can see here some logic that tells us which layout a document has referenced. So if there’s a check mark, we can tell that the software has referenced that layout for us. Now, just like normal in FlexiLayout Studio, we would want to process these and test and make sure that what’s matching is green. And then of course we would export it. And just to show you how things look on the other side, I will open our project setup station.

So in our project setup station I’ve created a document definition and uploaded that FlexiLayout that we exported. But to show you the results here, you can see that even though I have one document definition, the content and location of those fields differ based on the layout. So this is a very cool way for us to kind of be able to dictate which layout the software is using, but have one FlexiLayout so that way we don’t have to have multiple document definitions for the same content. And so later on downstream we can have the same workflow, the same rules, the same export path that every other document in this document definition will have. So it’s a really good way for us to have one document definition and apply one FlexiLayout with Alternates here. So I hope you enjoyed this video! Please feel free to reach out to us if we can help you with anything!
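The idea of one document definition fed by several alternative layouts can be sketched in Python. This is an analogy, not FlexiLayout syntax: the `ALTERNATIVES` table, the header phrases, the regular expressions, and the sample transcript wording are all hypothetical. Each alternative has a header that identifies it and its own source for the same two blocks, student name and GPA.

```python
import re

# Hypothetical alternatives: a header pattern that identifies the layout, plus
# one source pattern per block. Both layouts feed the same block names.
ALTERNATIVES = {
    "Homeschool": {
        "header": re.compile(r"homeschool transcript", re.IGNORECASE),
        "sources": {
            "StudentName": re.compile(r"Student:\s*(.+)"),
            "GPA": re.compile(r"Cumulative GPA:\s*([\d.]+)"),
        },
    },
    "Doe": {
        "header": re.compile(r"doe high school", re.IGNORECASE),
        "sources": {
            "StudentName": re.compile(r"Name of Student\s*-\s*(.+)"),
            "GPA": re.compile(r"GPA\s*=\s*([\d.]+)"),
        },
    },
}


def extract(text):
    """Pick the first alternative whose header matches, then fill its blocks.

    Returns (layout_name, {block_name: value or None}).
    Returns (None, {}) when no header matches.
    """
    for layout_name, layout in ALTERNATIVES.items():
        if layout["header"].search(text):
            blocks = {}
            for block_name, pattern in layout["sources"].items():
                match = pattern.search(text)
                blocks[block_name] = match.group(1).strip() if match else None
            return layout_name, blocks
    return None, {}


if __name__ == "__main__":
    doe = "Doe High School\nName of Student - Jane Roe\nGPA = 3.8"
    home = "Homeschool Transcript\nStudent: John Doe\nCumulative GPA: 3.5"
    print(extract(doe))
    print(extract(home))
```

Because both layouts return the same block names, anything downstream (rules, workflow, export) can stay identical, which is the benefit the video describes.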

ABBYY FlexiCapture 12 Video – Creating Tables in FlexiLayout (Basic)

Watch our video to learn how to create a basic table in the ABBYY FlexiCapture FlexiLayout Studio application and the power of this innovative solution.

Today, I’d like to show you how we use FlexiLayout Studio and our FlexiLayout templates to extract tables or tabular information from documents. Now I’m going to show you my samples real quick. I have a document that has two pages’ worth of tables. We have a part number, a description or a name, and then we have a price, and it’s two pages long. And then my last sample here is another table where we have a name or description and we have a price, but we don’t have a part number. We want to extract this information from the documents and make sure that we process them accordingly. So what I’m going to do, and this is probably one of the most important first steps, is add a block. This is going to be a table block. What we’ll do here is create our columns. As I shared with you, we have a part number, we have a name or a description, and then we have a price. Okay, so we’re simply going to define our fields here, and just to keep this video a little bit speedier, we’ve already assumed that we know the headers, or how a document starts in our case. But let’s go ahead and add a table element. Now when we add a table element, we can of course give it an intelligent name and we can tell the software here about the columns. But notice that it’s actually looking for a block. That’s the block that we created in the first step. So we’re just simply going to tell the software about that block. Now of course we’d want to name this something intelligent, like a table block, or something similar to that. So you can see here we have part number, name and description, and a price.

If you look at the properties of these, you can see we can get very, very specific. Now for ease of our first table, we’re going to keep it very simple and we’ll just use keywords as part of the name. But you can see we can reference other elements of a document to tell us when a table starts and even when it stops. So there’s a lot of different configuration we can do here. I won’t do it for ease of this demo, but obviously there’s a lot of flexibility here when we’re extracting tables. For this demo, I know that the word “description” is sometimes found or the word “name” is sometimes found, so I’m just simply going to tell the software that this can be name or description by using our pipe symbol there.
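The pipe-separated keyword idea can be illustrated with a small Python sketch. This is not the FlexiLayout Studio property format. Only the “Name|Description” alternative comes from the demo; the other keyword lists, the column identifiers, and the function name are assumptions. Each column has a pipe-separated list of header words, and a header cell matches the column if it equals any one of them.

```python
import re

# Pipe-separated header keywords per column. "Name|Description" follows the
# demo; the "Part Number|Part No" alternatives are made up for illustration.
COLUMN_KEYWORDS = {
    "PartNumber": "Part Number|Part No",
    "Name": "Name|Description",
    "Price": "Price",
}


def match_header(header_cells):
    """Return {column: index} for header cells that equal any keyword alternative."""
    found = {}
    for index, cell in enumerate(header_cells):
        for column, keywords in COLUMN_KEYWORDS.items():
            # Escape each alternative so it is matched literally, then rejoin
            # with "|" so the regex engine treats the pipe as "or".
            alternatives = [re.escape(k.strip()) for k in keywords.split("|")]
            if re.fullmatch("|".join(alternatives), cell.strip(), re.IGNORECASE):
                found.setdefault(column, index)  # keep the first match only
    return found


if __name__ == "__main__":
    print(match_header(["Part Number", "Description", "Price"]))
    print(match_header(["Name", "Price"]))
```

The second call shows the sample without a part number column: the column is simply absent from the result, and whether that is acceptable is a separate required-or-optional decision.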

We have other columns that are pretty relevant to us. If there’s a fixed column order, we want to tell the software that, and I’ll just double click here so you can see what we do, but it’s really just defining your own order. You can have an array of these, so if a document table can come in multiple different ways, you can of course set multiple different column orders for that. Here we’ll tell the software to use a header, and we can even tell the software to look for a footer, which is optional by default, and we can tell the software how to detect rows. Now you can understand that for this demo I’m keeping it very simple and these are very simple tables, but I want you to understand the amount of flexibility you have here. As you just saw in those three tabs, there are a lot of different options that we can use, with source elements and other search elements, to get our tables zoned in here for us.
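The “array of column orders” idea can be sketched in a few lines of Python. Again this is an illustration, not the tool’s configuration format: the `ALLOWED_ORDERS` list and column identifiers are assumptions based on the two sample tables in this video, one with a part number column and one without.

```python
# Hypothetical list of accepted column orders, one per way the table can arrive.
ALLOWED_ORDERS = [
    ("PartNumber", "Name", "Price"),
    ("Name", "Price"),
]


def matching_order(detected_columns):
    """Return the allowed order that the detected columns follow, or None."""
    detected = tuple(detected_columns)
    for order in ALLOWED_ORDERS:
        if detected == order:
            return order
    return None


if __name__ == "__main__":
    print(matching_order(["PartNumber", "Name", "Price"]))
    print(matching_order(["Price", "Name"]))  # not an accepted order
```

Listing the accepted orders explicitly means a table whose columns appear in an unexpected sequence is rejected rather than silently misread.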

So once again, I have our columns here applied. I’m just going to go ahead and set these up to be processed. I’m just going to right click on our first item here and match. And what I’ll show you here is that we were able to extract the table. Now you can of course double click on the table and see some more of the specifics here. And you can see we’ve got our part number column, our name slash description column, and our price column. And then here on my other sample, I will go ahead and match this one as well. By clicking on the table element, I once again have the name, description, and the price, and notice how in my demo here I don’t have a part number field. There’s configuration, of course, that can either let that be optional or force it to be required.

But this is pretty awesome in how we extract a table. It’s very easy and it’s very flexible in the way we do it; a lot of different ways that we can configure it to do what we want it to do. But once I’ve extracted the table here, just as our usual practice, we can export this to our AFL file and then upload that to a project. And now we’re processing tables accordingly. I hope you enjoyed this video. Please let me know if you have any questions and thank you so much!

ABBYY FlexiCapture 12 Video – Creating Tables in FlexiLayout Studio (Advanced)

Watch our video in order to learn a more advanced method for creating tables in the ABBYY FlexiCapture FlexiLayout Studio by utilizing repeating groups.

Hello. Today I’m going to explain to you how we create tables, but in a more advanced way. Sometimes we get in situations where there’s repeating information or tabular information on a document, but we can’t use a table element as referenced in the first, basic video on table extraction. So we have to use a strategy using repeating groups. Repeating groups give us the ability to grab that repeating information, but with maybe a little bit more intelligence or a little bit more complexity. So as you can see on some of my samples here, I have documents that have information in tables. This first page is pretty easy, but then we start getting into situations like what you see here, where we have some information at the top, and then there’s some white space. And then there’s another section and some additional repeating information towards the bottom.

And although in some situations we can use tables for this, it’s probably more appropriate to use some sort of repeating group element so that we can tell the software how to find a row and repeat itself as the rows continue. So let’s walk through this. I’m just going to go back to our first page here. Prior to this video, I set up the ability for us to ignore a header on these documents. Every single page of mine has a header, and to keep this a very simple, very pointed video, I’ll skip over how we did that. Really all we’re doing is we have our group here where we’re ignoring the top header, so that it doesn’t add any complexity to our video here on using repeating groups. So the very first thing I’m going to do is figure out how we can map a table, and when it comes to a table we want to anchor the table by its columns initially.

So we know that name is a very good anchor, and as I go across this I can see that name isn’t always listed, but if I can find the name column once, then I can tell where it is. In this case, this price list, I can see where that column is referenced on each of these subsequent samples here. You can see here is another document with a name, and then of course it goes down. And then the last document here also has a name column. They’re also structured very similarly. In this case what I’m going to do is find the name column. So I’m just going to create a search element, and it’s going to be static text. I’m going to say go find me the name column. So I can even call this our static name, and I’m going to tell the software to find the name column. Once again, when you create a FlexiLayout, your documents will be different than mine and some of your concepts may be a little different than mine, but the way we attack them is probably very similar. So use your own names and your own types of elements. But in my case, I want to anchor off the names, so I’m going to tell the software here to always find it on the first page.

And then I’m also going to tell it to always go find the one at the top. So if we do see the word “name” multiple times in a document, which is probably fairly common in this type of document, we’ll tell it to always give me the one that’s nearest to the top. In other words, go find the one that’s at the very top of the document. Here is the very top name column. So there we’re going to get the name column. Now it’s very important that we find a field like price to make that our anchor per row. Once again, we use the name field as our anchor per table. But now we want the price field to be our anchor per row, because that’s very consistent. Now just because I know how the software works, I’m going to create what we call a separator, because I’m going to have the software repeatedly find me this row, but I need to somehow find the separator here, and that separator will tell me where the price begins. So I’m just going to create an element and it’s going to be a separator. A separator is simply a line, and I’m just going to give it a name, and it’s going to be a vertical separator.

And then we have our relations here. I’m just going to tell it to use the name field and go to the right so that we find the separator to the right. So go find the name field and we’re going to say to the right and I’ll probably give it some sort of offset. And once again your documents may be a little different, but I’m just going to let it kind of push that over to the right there a little bit. So we have some room. And lastly what I’ll do here is say go find me the one nearest to the name. Now this is making the assumption that I have a good solid barrier here or a line in between the name and the price column. So I’m just going to match this document so you can kind of see what we did here.

And that’s the column. So we found name and we found a separator. Now that we know where name is and we know how to section off the page because we have the separator, we can get into some intelligence. So what I’m going to do is create a repeating group, and now we’re going to start building this table. When we create the group, we’re just going to call it our table field. And we’re going to create some intelligence here. So we’re going to tell the software to ignore, in my case, the header, and this is just once again special to my documents here.

So we’re just going to tell the software to ignore all instances of this header so we don’t get any confusion. And I’m going to tell the software that it’s going to find these below that name field. So we’re just going to say below. And so the software is going to say, okay, now I have repeating information that’s going to be below the name field. Pretty simple here. We have our table. Now let’s focus on price, because once again, price is our anchor. If I can find a price, I can find everything else related to this row. So what I’m going to do is create an element, and in this case we’re going to consider it a character string. This is going to be our anchoring field. So I’m just going to call it cs price. Cs stands for character string. And I’m going to add my own alphabet here because I know we have some common characters that we find. And let’s just add common things that we see in documents or in prices here.

And then lastly, I’m gonna create a new relation. I’m going to say, okay, now that you can find these characters, go find it to the right of the name of this separator here.

I’m just going to go ahead and apply this and just so you can see it take place, I’m going to go ahead and walk through this. So here’s our name column. You can see it highlighted, here’s our separator. And then lastly you can see our price and we’re able to capture all of these currencies. And then if I of course match another page, it’ll be very similar because I have a name and I have a separator and I have the prices as well. So now we have price and that’s going to be our anchor. That’s our row anchor. So what we would commonly like to do as a best practice is we create a group and we’re gonna call this group our row. And this will be how we define the full row. So in this case, our row, we actually have properties on a given row and we’re just going to tell the software, don’t find the row if you don’t find the price. In other words, you don’t even attempt to grab the row if we don’t have a price. And maybe something else we’ll do is we’ll say go find this information if it’s below the price name.

We’re just going to give it some room here to create its own square so that we can tell the software how to anchor in and kind of rectangle in this given row. We’re going to say go get the price and let’s say it’s below the top of the price. Maybe even give it some offset here, a negative offset. So that just gives us a little bit taller of a rectangle and then we’ll say it’s above the bottom in a similar fashion of price.

Now we have a group, and that group defines that whole row. What we’ll do here is just go ahead and run one. Now if I go into my table in my repeating group and I want to find the row there, you can see in gray how the software has now mapped out the row. So it found the price, and now that we’ve found price we’ve structured it to actually find the whole row itself, and now we can be more intelligent within the group without adding a lot more complexity. We’re just going to go ahead and add what we call a character string, and we’re going to call this the name column, and we’re just going to tell it to grab any character it finds. But the important part about the name is that we’re going to grab the one that’s to the left of price. So in this case we’ll just simply tell the software to grab price and go find me this to the left of it.

And just to be a little bit more intelligent here, we’re going to find it nearest to the right edge of the page.

Now what we’re going to do is add an element for the part number and once again, that’ll be a character string. And we’ll call this cs part number. Once again, we’ll grab any characters we can, we can add a little bit more intelligence because now that we’ve found the name, we can say it’s to the left of the name.

And also just to be careful, we’ll say go get me the character string closest to the left of the page. That’ll just give us a little bit more insurance that we’re grabbing the right fields. So what I’ll do now is go ahead and match that first one and I’ll show you here. We’ll dive into some of the rows. As you can see, not only am I grabbing the price, but we’re also grabbing the part name and a part number. The cool part about this is that as I match a whole document, I’ll show you here all of the elements. You can see we’re grabbing all of these elements off the table, even when the document spans multiple pages. So this is a very cool way to do it. And of course we can process these through the other samples. At this point, what we will do is create a repeating group block, and that will pass the information back to the FlexiCapture application. So I’ll call this our repeating group table. We’ll give it a source element here, and you can see we’re going to tell the software to grab it top to bottom. That’s very helpful because typically when we read a table, we want to read it as we’re seeing it on the screen. We will then add the additional fields.
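The row-anchoring strategy walked through above can be sketched in Python over a list of text fragments with coordinates. This is a simplified model and not FlexiLayout code: the `Fragment` class, the price pattern, the coordinate values, and the rule “rightmost fragment left of the separator is the name, leftmost is the part number” are assumptions that approximate the relations described in the video. Each price found to the right of the separator and below the header anchors one row, and a small offset widens the row’s rectangle above and below the price.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A recognized character string with its bounding box (arbitrary units)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


# Assumed price alphabet: optional currency sign, digits, commas, two decimals.
PRICE = re.compile(r"^[$€£]?\d[\d,]*\.\d{2}$")


def extract_rows(fragments, separator_x, header_bottom, offset=2.0):
    """Build one row per price found right of the separator, top to bottom."""
    prices = sorted(
        (f for f in fragments
         if f.left > separator_x and f.top > header_bottom and PRICE.match(f.text)),
        key=lambda f: f.top,
    )
    rows = []
    for price in prices:
        # The row rectangle: the price's vertical extent, widened by the offset.
        band_top, band_bottom = price.top - offset, price.bottom + offset
        left_side = sorted(
            (f for f in fragments
             if f.top >= band_top and f.bottom <= band_bottom
             and f.right <= separator_x),
            key=lambda f: f.left,
        )
        # Name: the fragment nearest the separator. Part number: the fragment
        # nearest the left page edge, if it is a different fragment.
        name = left_side[-1].text if left_side else None
        part_number = left_side[0].text if len(left_side) > 1 else None
        rows.append({"part_number": part_number, "name": name, "price": price.text})
    return rows


if __name__ == "__main__":
    page = [
        Fragment("Name", 100, 10, 140, 18),          # column header (table anchor)
        Fragment("A-100", 10, 30, 50, 38),
        Fragment("Widget", 100, 30, 150, 38),
        Fragment("$4.50", 220, 30, 260, 38),
        Fragment("Large gear", 100, 50, 170, 58),    # row with no part number
        Fragment("12.00", 220, 50, 255, 58),
        Fragment("Total due soon", 100, 70, 190, 78),  # no price, so no row
    ]
    for row in extract_rows(page, separator_x=200, header_bottom=20):
        print(row)
```

Because a row is only created where a price exists, lines of text without a price never become rows, which matches the rule of not attempting the row when the price is not found.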

And now we have our blocks created. At this point, what we would do is save our results, go to file and export, and generate that AFL file that we’re familiar with when we are working with FlexiLayouts. From here what we would do is create a document definition, as you will know if you’ve done this before. I’m actually going to show you something pretty cool about repeating groups that I think is very helpful, especially for a lot of the ways businesses read a repeating group or a table. So what I’m going to do is go ahead and create a new document definition. I’ll load that FlexiLayout just so that we have it convenient to us.

And you can see here we have our table and I have a test sample. I’m just gonna go ahead and run this test sample just so you can see typically how a repeating group looks and you can see we have each row and it’s outlined here and as I click, the software will highlight for me where it’s at. Now this is a default way of how a repeating group looks. Now, sometimes we like repeating groups to look like tables. So in this case, and this is a very neat feature, you can right click and say, show as table. So now instead of breaking them out into separate, repeating groups with repeating rows, it will actually format it as a table. So now when we run a test here, you can see it looks like a table. It feels like a table just as if the user was reading it on the documents. So this is a very cool way and flexible way to extract information from tables, especially with repeating groups, because we have a lot of control over how we structure it. And sometimes that gives us an advantage instead of using a table element. I hope you have enjoyed this video. If you have any questions, please feel free to leave a comment for us. Thank you so much!