ABBYY FlexiCapture Video – Invoice Splitting

Watch our video to learn how to configure ABBYY FlexiCapture to split invoices, one of the most common requests we see in AP automation.

Hello. Today, I’m going to discuss one of the most popular questions we receive when it comes to invoice automation, and that is, how do we configure ABBYY FlexiCapture to split invoices? Well, this is a very tricky question, because everybody wants to do this with their own different logic, so we need to remember that FlexiCapture provides a workflow engine that may be used to split these invoices. Now, we control the logic of splitting that invoice, whether it be by page number or maybe we’re looking for an instance of an invoice number on every page. We can use a combination of pieces of logic using the workflow engine within FlexiCapture to determine this.

Now, what I’m going to do is actually show you how I set up a workflow to do this within my project. Once again, the way you implement it and some of the logic you use for automatic splitting may differ, but nonetheless, the product does support it. So what I have is a batch type; I call it my invoice splitting batch type. If you have done any investigating into ABBYY, you’ll know that batch types are a way for us to keep a single project but use multiple different workflows, segregating documents by giving each batch type its own workflow. So I have an invoice splitting batch type. And what you’ll notice here, very quickly, is that we have a custom workflow. This custom workflow has three custom options that we’ve added. Two of these are scripts, those are the user types, and then we use a recognition step, which is an out-of-the-box type.

Now, the first thing we’re going to do is split the batch into single-page documents. The next thing we’ll do is tell the software to recognize every single-page document. Now, what it’s going to do here is obviously look for the vendor and those types of details, but it’s also going to look for the common fields that we find on an invoice, whether that be invoice number, dates and those sorts of things. Next, we’re going to analyze every single page and determine when we can group those documents together. Here’s a special hint: this is where the custom logic of the application comes into play. You may do your invoice splitting differently than me, and that’s okay. Neither of us is right or wrong, but that’s the power behind the engine here. And then the last step, which is an out-of-the-box step here, is a recognition step, and you’ll see this actually happens a second time, because we did it earlier. Now that the pages have been merged together, this recognition step has the software recognize each document again, and therefore, we get good, clean invoice extraction results.
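To make the order of those four stages concrete, here is a minimal sketch in plain Python. It is not FlexiCapture script and does not use the FlexiCapture API: pages are represented as plain strings, the recognition stage is a stand-in that looks for an invoice number with a hypothetical regular expression, and every function name here is an assumption made for illustration.

```python
import re
from itertools import groupby

# Hypothetical pattern, for illustration only; real invoices vary by vendor.
INVOICE_NO = re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)", re.IGNORECASE)


def split_into_pages(batch_text_pages):
    """Stage 1: create one single-page document per page."""
    return [{"pages": [text]} for text in batch_text_pages]


def recognize(document):
    """Stages 2 and 4: stand-in for recognition; keeps the first invoice number found."""
    document["invoice_number"] = None
    for text in document["pages"]:
        match = INVOICE_NO.search(text)
        if match:
            document["invoice_number"] = match.group(1)
            break
    return document


def merge_by_invoice_number(documents):
    """Stage 3: merge neighbouring single-page documents that share an invoice number."""
    merged = []
    for _, group in groupby(documents, key=lambda d: d["invoice_number"]):
        pages = [page for doc in group for page in doc["pages"]]
        merged.append({"pages": pages})
    return merged


def run_workflow(batch_text_pages):
    """Split, recognize each page, merge, then recognize the merged documents again."""
    singles = [recognize(d) for d in split_into_pages(batch_text_pages)]
    return [recognize(d) for d in merge_by_invoice_number(singles)]


batch = [
    "Invoice No: 1001 page 1",
    "Invoice No: 1001 page 2",
    "Invoice No: 1002 page 1",
]
invoices = run_workflow(batch)
print([(d["invoice_number"], len(d["pages"])) for d in invoices])
```

The first recognition pass exists only to give the merge stage something to compare; the second pass runs on the merged documents, which is where the final extraction results would come from.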

So, that’s what we have going on here. I’m going to just show you quickly how things happen within the invoice separation stage. I have a script. This is not rocket science, and I’m not going to go into every single line of the script, but I want you to know that we have a script that separates each page and then creates a document from the page, which is why we have this term, ‘create document from page’ here in the script. Remember, the recognition step then actually does the out-of-the-box recognition for every single page. And then lastly, we determine how documents are combined. Now that happens through a pretty advanced script. You’ll see that script here, and once again, I’m not going to go through it line by line, because this is custom, and it’s really up to you how you implement this. But nonetheless, there’s a custom script here that we use for batch processing that determines when documents are merged. And then once they’re merged, we will go into recognition.
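The merge script itself is not shown line by line in the video, so the following is only a sketch of one possible merge rule, written in plain Python rather than against the FlexiCapture API. It assumes each single-page document has already been recognized and reduced to its invoice number, with `None` standing for a page where no number was found; the function name `group_pages` and that representation are assumptions for illustration.

```python
from typing import List, Optional


def group_pages(invoice_numbers: List[Optional[str]]) -> List[List[int]]:
    """Return one list of page indexes per merged invoice.

    A page starts a new invoice when it is the first page of the batch or when
    it carries an invoice number that differs from the current one. Pages with
    no recognized number (None) are treated as continuation pages.
    """
    groups: List[List[int]] = []
    current_number: Optional[str] = None
    for index, number in enumerate(invoice_numbers):
        starts_new = not groups or (number is not None and number != current_number)
        if starts_new:
            groups.append([index])
            current_number = number
        else:
            groups[-1].append(index)
    return groups


# Five pages: a two-page invoice whose second page has no number,
# a two-page invoice, and a one-page invoice.
print(group_pages(["INV-1", None, "INV-2", "INV-2", "INV-3"]))
```

Treating unnumbered pages as continuations is a design choice, not the only option: a terms-and-conditions page would be attached correctly, but a new invoice whose number failed to recognize would be merged into the previous one.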

Let me just show you how this works. What I’m going to do is create a new batch, which I’ve already done. And I have this batch of the invoice splitting batch type. I have a document that has four invoices here. Actually, let me just show this to you very quickly. All right. And you can notice that they’re separate invoices because I’ve purposely made the invoice number different for every single one of them. And remember that in my logic, that’s how I’m determining when to separate documents. And you’ll see, there’s a different invoice number there on that document, a different invoice number there, and then a different invoice number there.

I’m just going to go ahead and load this and we’ll let the software perform the separation, the merging and then the final recognition so that we have this invoice split. Now, what you’ve seen is that we had four different invoices in a single PDF, and now the software has taken that batch and actually separated it here into four separate invoice documents. And we can see here that the invoice number is extracted properly for each document, and that’s really the field that I made different on this specific sample. You’ll see that as I click around, the software has recognized that invoice number separately for each of those documents there.

Now once again, this is my example of how we can do invoice splitting. The logic really comes down to the scripts and the conditional statements that you put in place, which determine whether you use common header fields, like invoice number, or whether you use page number fields. It’s really up to you how you implement it, but the software absolutely can support it. It is a very common best practice, however, that even though the software performs the splitting, you add what would be called an invoice separation verification queue, where a user verifies that the software did actually perform a correct separation. That would be considered a best practice, if you’re able to implement it within your organization. That way, if there was a piece of logic missing for a given invoice or a given vendor, you’d at least be able to catch it before the document goes to a true OCR verification, and then of course, to the ERP or accounting system thereafter.
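One way to decide which merged documents should go to a separation verification queue is sketched below in plain Python. This is an illustration under stated assumptions, not a FlexiCapture feature or API: each merged document is represented by the invoice numbers read from its pages (`None` where nothing was read), and the queue names `separation_verification` and `data_verification` are hypothetical.

```python
from typing import Dict, List, Optional


def needs_separation_review(page_invoice_numbers: List[Optional[str]]) -> bool:
    """Flag a merged document when its pages do not all agree on one invoice number."""
    distinct = {n for n in page_invoice_numbers if n is not None}
    has_unread_page = any(n is None for n in page_invoice_numbers)
    return has_unread_page or len(distinct) != 1


def route(documents: Dict[str, List[Optional[str]]]) -> Dict[str, str]:
    """Map each document name to a (hypothetical) queue name."""
    return {
        name: "separation_verification" if needs_separation_review(numbers) else "data_verification"
        for name, numbers in documents.items()
    }


queues = route({
    "doc1": ["1001", "1001"],   # every page agrees: skip the separation check
    "doc2": ["1002", None],     # one page had no number: ask a user to confirm
})
print(queues)
```

A rule like this sends only the doubtful splits to a person. Whether that is acceptable, or whether every split should be reviewed, depends on how much you trust the splitting logic for your vendors.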

I hope you enjoyed this video. If you have any questions for us, please reach out to us. Look forward to working with you. Bye now.

10 thoughts on “ABBYY FlexiCapture Video – Invoice Splitting”

  1. Hi,

    Just found your script and it seems that in the first splitting stage there is a small error. I’ve tested it and when processing a document with 10 pages (5 invoices – 2 pages per invoice) it didn’t split it correctly.

    I’ve made some changes and for now it’s working as it should.

    Here is the code:


    foreach (IDocument document in Documents)
    {
        Processing.ReportMessage("Total pages in original document: " + document.Pages.Count.ToString());
        int iPageCount = document.Pages.Count;
        for (int i = 0; i < iPageCount; i++)
        {
            //Processing.ReportMessage("Value of i: " + i);
            //Processing.ReportMessage("Pages in the document: " + document.Pages.Count.ToString());
            // Create a new document from the current first page of the original document.
            Batch.CreateDocumentFromPage( document.Index+i, document.Pages[0] );
        }
    }


  2. I am using the R3 version and have added a user type processing stage to my project workflow, but I am not able to see the Script tag in there. So I am not sure where I need to place the code for splitting the document.

