ABBYY FlexiCapture Web Services API Example

Introduction

ABBYY FlexiCapture solves business challenges by capturing documents, extracting information from them, and using that information to drive business processes. Advanced automatic document classification and learning features, along with out-of-the-box solutions for common business challenges such as accounts payable, have helped to establish ABBYY FlexiCapture as one of the most cutting-edge tools that companies can use to transform their business.

ABBYY FlexiCapture provides built-in tools for capturing documents from a variety of sources such as file servers, e-mail, or even FTP sites. Information may then be sent in a variety of formats to databases, file servers, document management systems, or SharePoint servers. A typical ABBYY FlexiCapture configuration consists of one or more servers that collect documents from these sources. A workflow-driven system establishes the rules that determine how the documents are processed and provides access to those documents that require manual verification. Web-based as well as thick client verification stations are used by the employees who perform these optional verification steps. The data that is collected is then sent to the desired destination systems, along with searchable copies of the original document images if desired.

ABBYY FlexiCapture provides a web service API that can be used to submit documents for processing and to retrieve the results of the processing. It can also alter the natural routing of documents through the document processing workflow. The purpose of this document is to present a use case for the ABBYY FlexiCapture web service API in which documents are submitted to ABBYY FlexiCapture from within an application and the results are returned to the same application immediately. A sample ABBYY FlexiCapture project will be used for this demonstration. A TIFF or PDF document will be submitted. Data will be extracted from the sample document and the results will be returned in XML format.

Introducing the ABBYY FlexiCapture Web Service API

The ABBYY FlexiCapture web service API is a Microsoft ASP.NET web service. It is provided with the distributed version of ABBYY FlexiCapture. It is not provided for the standalone version of the product. Some of the features provided are as follows:

  • Access to ABBYY FlexiCapture system information
    • Batch types
    • Projects
    • Document definition names, IDs, versions, and state of enablement
    • Processing stages
    • Users and user roles
    • Workflow information such as queue names and IDs
  • Access to batch information
    • Lists of batches, batches at workflow stages, or single batches retrieved by their ID
    • Ability to participate in ABBYY FlexiCapture workflow through access to batches, tasks, and queues if the workflow has been configured properly
    • Access to properties on each batch such as name, batch type, and properties
    • Ability to change the batch properties
    • Read/write access to attachments for the batch as well as for the documents it contains
  • Batch submission
    • Submission of batch pages from disk files or I/O streams
    • Creation of documents within batches and the ability to override assembly rules
    • Image filters from third-party imaging tools can be applied to clean up document images before they are submitted. This may be critical for applications that receive fuzzy images from sources such as mobile phones
  • Ability to delete some objects
    • Batches, documents, and pages
  • User statistics
    • Access to some basic information such as batch, document, and page counts by user
  • Processing Results
    • Access to XML-based index information for documents in workflow Verification queues
    • Access to exported document results
    • Extracted document metadata as text, Microsoft Excel, XML, etc.
    • Searchable versions of the original documents
    • Documents in long term archival formats such as PDF/A
    • Exports must be set up properly for each document type

Limitations

The ABBYY FlexiCapture web service provides access to XML-based information for documents that are currently in workflow Verification queues as long as the option to enable web stations is enabled for that queue. This information is limited to what would be necessary to display a batch with a set of documents and their corresponding fields. The information for each field is limited to the verification state of the field, the value, the value expressed as text, the page and block information, and a 0/1 mask for each character position in the extracted field value. There is additional information available for each field that is not available in the XML. Additional workflow steps may be created which use scripts to access this information and save to a web service, database, text-based source, or even attach it to the batch as attachments which could then be retrieved using the API.

The ABBYY FlexiCapture web service does provide a technique for updating document indexing field values, blocks, and pages using the FileService handler. That will be demonstrated in a future whitepaper. These operations are performed using HTTP POST operations. It is possible to record field-based training information using this service. This whitepaper demonstrates how to submit documents for processing and how to obtain the full processing results in XML. This is done by establishing an XML export for each of the document definitions that are desired to be processed. This XML does contain a complete set of field-based information versus the subset of information that is available through the use of the FileService handler and a POST request.

The main ABBYY FlexiCapture 11 web service is SOAP-based. A REST architectural style of service is not offered for that version of the product. ABBYY provides access to a WSDL for the web service that describes the available web service calls along with the complex types that they utilize as well as the breakdown of the complex types into simple SOAP data types. SOAP uses XML by definition.

ABBYY FlexiCapture 12 also provides a JSON web service. That will be documented and demonstrated in a subsequent whitepaper. There are only minor differences between the FlexiCapture 11 and FlexiCapture 12 web service APIs.

Novel Usages

The ABBYY FlexiCapture web service can be called from any application that is capable of calling a web service. A complex Java J2EE enterprise application could use the web service to submit documents for processing and even retrieve the document results interactively.

Documents that require very careful handling due to security concerns, including military or medical documents, could be submitted to FlexiCapture for processing without being copied to a shared file server folder or transmitted through a potentially insecure email channel. This helps to meet security standards such as those imposed by the US DOD or by laws such as HIPAA, where violations could result in expensive lawsuits, jail time, or, for security-related documents, even wars.

The ABBYY FlexiCapture web service may also be called from within an ABBYY FlexiCapture workflow. This provides an ingenious way to reprocess documents that have been improperly classified. The documents could be forced into a different FlexiCapture project or batch type so that they will be recognized properly. They could also be sent to a completely different ABBYY FlexiCapture server.

Many document and content systems provide built-in triggers, renditioning, or content crawling tools. If these tools can call a web service then they could submit documents to ABBYY FlexiCapture, extract field data from the documents, save the extracted field data, and also return a searchable version of the original document to the repository.

Mobile applications could be created that allow images from mobile cameras to be processed by ABBYY FlexiCapture. The processing results can be returned directly to the user of the mobile application while they wait. Alternatively, the application can return an acknowledgement that the document has been received and then publish the extracted document information to a cloud-based accounting system or other system.

About the ABBYY FlexiCapture Web Service API

The ABBYY FlexiCapture web service API is published under the “Server” virtual directory in Microsoft Internet Information Server (IIS) beneath the top level FlexiCapture 11 or FlexiCapture 12 directory.

The URL for the web service for ABBYY FlexiCapture 11 is: http://localhost/FlexiCapture11/Server/WebServices.dll?Handler=Version3

Replace “localhost” with the proper server name or DNS name. If a site certificate is being used, swap “http” for “https.” If the default port has been changed from the standard port, enter the port after the server or DNS name, separated by a colon (server:port).
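The substitutions described above can be sketched as a small helper. This is only an illustration: the function name BuildServiceUrl and the host name fcserver.example.com are hypothetical and are not part of the FlexiCapture API, and the path shown is the FlexiCapture 11 path quoted in this document.

```vbnet
Imports System

Module EndpointExample
    ' Hypothetical helper (not part of the FlexiCapture API): assembles the
    ' web service address from the server name, protocol, and optional port.
    Function BuildServiceUrl(host As String, useHttps As Boolean, Optional port As Integer = 0) As String
        Dim scheme As String = If(useHttps, "https", "http")
        ' Only append ":port" when a non-default port was supplied
        Dim authority As String = If(port > 0, host & ":" & port.ToString(), host)
        Return scheme & "://" & authority & "/FlexiCapture11/Server/WebServices.dll?Handler=Version3"
    End Function

    Sub Main()
        ' Default installation on the local machine
        Console.WriteLine(BuildServiceUrl("localhost", False))
        ' Assumed example: a named server with a certificate on port 8443
        Console.WriteLine(BuildServiceUrl("fcserver.example.com", True, 8443))
    End Sub
End Module
```

The resulting string can be assigned to the Url property of the generated web service proxy.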

The web service uses the same authentication methods as the ABBYY FlexiCapture thick client tools such as the project setup station or web-based verification station. The remainder of this document will assume that the appropriate security has been established for each user that will be accessing the ABBYY FlexiCapture web service. If Windows authentication is used then generally a domain or computer name will be required to be passed in the login string.

Deploy the Sample Project

This whitepaper demonstrates the use of the ABBYY FlexiCapture web service using a specific example project. The sample project is available for download using a link that is available after you register on this web site. The sample project extracts several fields from the United States Food and Drug Administration 3662 form. Unzip the example project. There will be two folders – FlexiLayout and Project. The “FlexiLayout” folder contains the ABBYY FlexiCapture flexilayout that was used to create the document definition that is used in the example project. The ABBYY FlexiCapture project is located in the “Project” folder then in the subfolder named “FDA3662.” Use the ABBYY FlexiCapture Project Setup Station to open the project file that is named “FDA3662.fcproj.” Upload the FDA3662 project to the ABBYY FlexiCapture application server by using the menu option “Upload Project to Application Server” that is located in the file menu. When prompted to do so, open the newly published project.

Use the ABBYY FlexiCapture Administration and Monitoring Console to establish the desired security for the new project.

This project has been configured so that the processing results will be stored on the FlexiCapture application server. That is done by providing a blank Export root path in the project properties as shown here.

web services api figure 1

Also, the export settings in the “FDA3662” document definition have been established so that the export path contains no UNC path information. If the export path contains a UNC path, the FlexiCapture server will export directly to that path and the exported document results will not be directly accessible from the web service. The export format has been set to XML with the options shown below. Notice that the XML schema option was selected.

web services api figure 2

web services api figure 3

Getting Started with the Web Service API

Obtain the WSDL using the following address:

FlexiCapture 11: http://localhost/FlexiCapture11/Server/WebServices.dll?Handler=Version3WSDL

Either browse to the WSDL URL from the desired development environment, or save the WSDL to a local file from a browser and then import that file. For this example, the WSDL was saved locally and a VB.NET wrapper class was created using the wsdl.exe tool that is provided with Visual Studio. This command may be launched from the Visual Studio developer command prompt. The first parameter should be the path to the WSDL file. Then use the /language option to select the language (cs for C#, vb for Visual Basic, js for JScript, vjs for Visual J#, or cpp for C++). Finally, use the /out option to specify the full path to the output class module. Put double quotes around paths in case they contain spaces. Example:

wsdl.exe "C:\Development\ABBYYFCWSDEMO\FlexiCaptureWebService.wsdl" /language:vb /out:"C:\Development\ABBYYFCWSDEMO\FlexiCaptureWebService.vb"

Create a new VB.NET project and add the generated file to it. Because the wrapper class was generated in VB.NET, the project must be a VB.NET project of some type, such as a console application or, as in the example shown here, a Windows Forms application. The top portion of the generated file is shown below.

 

'----------------------------------------------------------------------------------------------

' <auto-generated>
'             This code was generated by a tool.
'             Runtime Version:4.0.30319.42000
'
'             Changes to this file may cause incorrect behavior and will be lost if
'             the code is regenerated.
' </auto-generated>
'----------------------------------------------------------------------------------------------

Option Strict Off
Option Explicit On

Imports System
Imports System.ComponentModel
Imports System.Diagnostics
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Xml.Serialization

'
'This source code was auto-generated by wsdl, Version=4.6.1055.0.
'

'''<remarks/>
<System.CodeDom.Compiler.GeneratedCodeAttribute("wsdl", "4.6.1055.0"),  _
 System.Diagnostics.DebuggerStepThroughAttribute(),  _
 System.ComponentModel.DesignerCategoryAttribute("code"),  _
 System.Web.Services.WebServiceBindingAttribute(Name:="FlexiCaptureWebServiceSoap", [Namespace]:="urn:http://www.abbyy.com/FlexiCapture"),  _

 

Steps to Submit a Document to the ABBYY FlexiCapture Web Service API

For ease of example, add a button to the new Windows Forms application main form. The following sections will guide you through the steps that are necessary to submit a document for processing and to return the results. The steps shown could be implemented in a class or other module and do not present every option that is available with the web service API. The structure names such as “InputFile” refer to the definitions that were supplied by the web service through the WSDL which was consumed in the project. This example has been kept as simple as possible.

The steps that will be demonstrated are as follows:

a.      Instantiate the web service and connect to the server

b.      Find the project ID for the desired project

c.      Create a new batch and load a file from disk

d.      Submit the batch

e.      Wait for the batch to be processed

f.       Access the XML results

g.      Interpreting the results

h.      Example utilization of the results

The example assumes this import:

 Imports System.IO

 

More details are shown for each step below.

Detailed Steps With Sample Code

a. Instantiate the web service and connect to the server

The OpenSession method requires a role type and a workstation type. This example uses the predefined role type 12, which means “User Station Operator,” and the station type 10, which represents an external user station. The predefined role and workstation types are documented in the ABBYY FlexiCapture developer’s help. The URL will vary based on the version of FlexiCapture and the type of authentication used (if using FlexiCapture 12).

FlexiCapture 11: http://localhost/FlexiCapture11/Server/WebServices.dll?Handler=Version3

The host name, domain, username, and password values should be changed.

 

Dim objWebService As New FlexiCaptureWebServiceApiVersion3
Dim objCredentials As System.Net.ICredentials = Nothing
Dim iSessionID As Integer = 0

' Point the proxy at the server and supply the login credentials
objWebService.Url = "http://localhost/FlexiCapture11/Server/WebServices.dll?Handler=Version3"
objCredentials = New System.Net.NetworkCredential("domain\username", "password")
objWebService.Credentials = objCredentials
' Role type 12 = User Station Operator; station type 10 = external user station
iSessionID = objWebService.OpenSession(12, 10)

b. Find the project ID for the desired project

To submit a document for processing, the numeric ID number of the desired FlexiCapture project must be found. This will be saved to an integer variable. The web service will be called in order to fetch the full set of projects that has been installed in the server. Then the proper project will be located based on looking for the name “FDA3662” for this example.

Dim objProjects() As Project = {}
Dim iProjectID As Integer

' Fetch every project on the server, then pick the sample project by name
objProjects = objWebService.GetProjects()
For Each objProject In objProjects
    If objProject.Name = "FDA3662" Then
        iProjectID = objProject.Id
        Exit For
    End If
Next

c. Create a new batch and load a file from disk

Replace “path to file” with the full path to the image or other file that is to be processed. For the sample to work, either use the “FilledOutSample.tif” file or create a new sample by filling out the FDA 3662 form. The example shown here attaches the file to the batch. The web service API also allows a document to be created and the file to be attached to that document. For this example, the document assembly rules that are in place in the project will be used.

The batch type for the new batch is left as the default. The web service API provides a method named “GetBatchTypes” that can be used to fetch the complete set of batch types (name and ID) that are available. If the batch type on the batch needs to be changed, set the BatchTypeId property on the batch to the desired batch type ID that was obtained by looking for the name in the set of available batch types.

The name of the batch is being changed from the default to the name of the document being submitted in this example for demonstration purposes.

 

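Dim objBatch As New Batch
Dim objFCFile As New File
Dim strPath As String = "path to file"

' Name the batch after the file being submitted
objBatch.Name = System.IO.Path.GetFileName(strPath)

' Assumption: the batch must exist on the server before an image can be added.
' AddNewBatch and OpenBatch are used here for that; confirm both signatures
' against the proxy class that was generated from the WSDL.
objBatch.Id = objWebService.AddNewBatch(iSessionID, iProjectID, objBatch, 0)
objWebService.OpenBatch(iSessionID, objBatch.Id)

' Read the whole file into the web service File structure and attach it
objFCFile.Name = strPath
objFCFile.Bytes = System.IO.File.ReadAllBytes(strPath)
objWebService.AddNewImage(iSessionID, objBatch.Id, objFCFile)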

d. Submit the batch

The OwnerID property on the batch is forced to zero so that the batch is not owned by a single user and therefore hidden from other users. Notice that the batch is updated on the server after the property value has been changed locally; this is required due to the nature of SOAP web services. The update must be performed before the batch is closed and then submitted for processing. Each of these steps is important. If the OwnerID property is not updated on the server, the batch will be hidden from the users. If the batch is not closed before it is submitted, it will be stuck in a locked “scanning” state and will not be processed until it has been forced to the next state in the workflow.

 

The session (the connection to the server) is still alive at this point. It is left open so that the results of the processing may be retrieved.

objBatch.OwnerId = 0

objWebService.UpdateBatch(iSessionID, objBatch)
objWebService.CloseBatch(iSessionID, objBatch.Id)
objWebService.ProcessBatch(iSessionID, objBatch.Id)

e. Wait for the batch to be processed

The batch will be processed according to the steps that have been built in the workflow. The sample project has been set up so that there are no workflow steps enabled between the document recognition and the export step (see the image below). The example code below demonstrates how to wait for the batch to reach the stage named “Export.”

web services api figure 4

There could be situations that occur where the batch does not reach the export stage in a reasonable amount of time so that should be considered. Production program code would never be written to create a potential infinite loop. This is code meant as an example only.

There are web service calls available to find the list of batches that are waiting at each stage. A production system would most likely have a separate process that watches for batches to be processed and then picks up the processing results. This example, however, is intended to be an interactive sample that waits for the results to be returned.

The processing stage list is fetched once and the external ID number for the Export stage is found. Then a simple wait is done for the batch to reach the stage by refreshing the batch repeatedly and comparing the external ID number of the Export stage to the current external stage ID on the batch.

Dim objProcessStages As ProcessingStage()
Dim objStage As ProcessingStage
Dim iStageExternalId As Integer
objProcessStages = objWebService.GetProcessingStages(iProjectID, 0, 0, "")
For Each objStage In objProcessStages
   If objStage.Name = "Export" Then
     iStageExternalId = objStage.ExternalId
     Exit For
   End If
Next
objBatch = objWebService.GetBatch(objBatch.Id)
While objBatch.StageExternalId <> iStageExternalId
   Threading.Thread.Sleep(1000)
   Application.DoEvents()
   objBatch = objWebService.GetBatch(objBatch.Id)
End While
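One way to avoid the potential infinite loop mentioned above is to bound the wait with a timeout. The following is a minimal sketch only; the WaitUntil helper is hypothetical (it is not part of the FlexiCapture API), and the timeout and polling interval are arbitrary values chosen for illustration. The Main routine uses a simple counter in place of a real batch so that the sketch can run on its own.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module PollingExample
    ' Hypothetical helper: calls isDone repeatedly until it returns True
    ' (result True) or until the timeout elapses (result False).
    Function WaitUntil(isDone As Func(Of Boolean), timeout As TimeSpan, pollInterval As TimeSpan) As Boolean
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Do
            If isDone() Then Return True
            If watch.Elapsed >= timeout Then Return False
            Thread.Sleep(pollInterval)
        Loop
    End Function

    Sub Main()
        ' Stand-in condition: "finished" on the third check
        Dim calls As Integer = 0
        Dim reached As Boolean = WaitUntil(
            Function()
                calls += 1
                Return calls >= 3
            End Function,
            TimeSpan.FromSeconds(5),
            TimeSpan.FromMilliseconds(10))
        Console.WriteLine(reached)   ' prints True
    End Sub
End Module
```

In the example above, the condition passed to the helper would refresh the batch with GetBatch and compare its StageExternalId to the external ID of the Export stage. When the helper returns False, the application can report that the batch is still being processed rather than waiting forever.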

f. Access the XML results

The web service provides access to the results that were produced for each document. For this example, only the results for the first document are retrieved, although the loop in the example code shows how to extend the retrieval to every document. The XML results of the processing are written to a file in the temporary directory, and the filename is saved to the string strXMLFileName. The XML schema is also written as an XSD file in the temporary directory. Because these files are returned as byte arrays, it is not necessary to write them to disk. They could be used directly in memory.

Dim objDocs() As Document
Dim strDocFileNames() As String
Dim strXMLFileName As String = ""
Dim strXSDFileName As String = ""
Dim objFile As File

objDocs = objWebService.GetDocuments(objBatch.Id)
For iDoc As Integer = 0 To UBound(objDocs)
    ' List the exported result files that are available for this document
    strDocFileNames = objWebService.GetDocumentResultsList(objDocs(iDoc).Id)
    For iExportDoc As Integer = 0 To UBound(strDocFileNames)
        Select Case System.IO.Path.GetExtension(strDocFileNames(iExportDoc)).ToUpper
            Case ".XML"
                ' Extracted field data
                objFile = objWebService.LoadDocumentResult(objDocs(iDoc).Id, strDocFileNames(iExportDoc))
                strXMLFileName = System.IO.Path.GetTempFileName + ".xml"
                System.IO.File.WriteAllBytes(strXMLFileName, objFile.Bytes)
            Case ".XSD"
                ' Schema that describes the XML results
                objFile = objWebService.LoadDocumentResult(objDocs(iDoc).Id, strDocFileNames(iExportDoc))
                strXSDFileName = System.IO.Path.GetTempFileName + ".xsd"
                System.IO.File.WriteAllBytes(strXSDFileName, objFile.Bytes)
        End Select
    Next
    ' Only the first document is retrieved; remove this line to process every document
    Exit For
Next
' The results have been retrieved, so the session can be closed
objWebService.CloseSession(iSessionID)
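As noted above, the results do not have to be written to disk. A minimal sketch of parsing the returned bytes in memory follows. The ParseResult helper is hypothetical, and the byte array in Main is a made-up stand-in for the Bytes property of the File structure returned by LoadDocumentResult.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml.Linq

Module InMemoryExample
    ' Hypothetical helper: parses the result bytes without creating a temp file.
    Function ParseResult(resultBytes As Byte()) As XDocument
        Using stream As New MemoryStream(resultBytes)
            Return XDocument.Load(stream)
        End Using
    End Function

    Sub Main()
        ' Stand-in data; a real call would pass objFile.Bytes instead
        Dim sample As Byte() = Encoding.UTF8.GetBytes("<Documents><Doc>ok</Doc></Documents>")
        Dim doc As XDocument = ParseResult(sample)
        Console.WriteLine(doc.Root.Name.LocalName)
    End Sub
End Module
```

Keeping the results in memory also avoids leaving extracted data behind in the temporary directory, which may matter for the sensitive documents discussed earlier.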

g. Interpreting the results

The XML schema that was obtained from the processing should be used to interpret the results of the processing and to deserialize the XML results. The results of processing the sample document are shown here. The top-level node is “Documents.” This is followed by one child for each document (in this case a single document). The document type is “FDA3662” and the values returned are shown. The SuspiciousSymbols attributes provide a means of identifying data that was not recognized with perfect confidence. Each position in the mask holds “0” or “1,” with “0” meaning perfect confidence and “1” meaning that the character in that position of the result was questionable and should be verified. Position information is returned for each field. It includes the index of the page and the top, bottom, left, and right pixel coordinates.

 

<form:Documents xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:addData="http://www.abbyy.com/FlexiCapture/Schemas/Export/AdditionalFormData.xsd"
                xmlns:form="http://www.abbyy.com/FlexiCapture/Schemas/Export/FormData.xsd">
  <_FDA3662:_FDA3662 xmlns:_FDA3662="http://www.abbyy.com/FlexiCapture/Schemas/Export/FDA3662.xsd">
    <_Document_Section_1 addData:BlockRef="0001">
      <_Accession addData:BlockRef="0002" addData:SuspiciousSymbols="000000">123109</_Accession>
      <_Manufacturer addData:BlockRef="0003" addData:SuspiciousSymbols="0000000000000000000000000000000100000000000000000000000000000000000000" addData:RecognizedValue="Jones Manufacturing Corporation. Inc 123 Main Street Anytown, FL 32966">Jones Manufacturing Corporation. Inc 123 Main Street Anytown, FL 32966</_Manufacturer>
      <_Submitter addData:BlockRef="0004" addData:SuspiciousSymbols="000000000000011000000000000000000000000000000000000" addData:RecognizedValue="Joseph Jones 126 East Main Street Anytown, FL 32966">Joseph Jones 126 East Main Street Anytown, FL 32966</_Submitter>
      <_EmailAddress addData:BlockRef="0005" addData:SuspiciousSymbols="0001000100">[email address obscured in the published copy]</_EmailAddress>
      <_Official addData:BlockRef="0006" addData:SuspiciousSymbols="0000000000">Big cheese</_Official>
      <_TestingAddress addData:BlockRef="0007" addData:SuspiciousSymbols="00000000000000000000000000000001000000000000000000000000000000000000000000000000" addData:RecognizedValue="Jones Manufacturing Corporation. Inc. 12 West Industrial Drive Anytown, FL 32966">Jones Manufacturing Corporation. Inc. 12 West Industrial Drive Anytown, FL 32966</_TestingAddress>
    </_Document_Section_1>
    <addData:AdditionalInfo>
      <addData:BlocksInfo>
        <addData:Blocks Id="0001"><addData:Block PageIndex="1"><addData:Rect Bottom="3300" Top="0" Right="2550" Left="0"/></addData:Block></addData:Blocks>
        <addData:Blocks Id="0002"><addData:Block PageIndex="1"><addData:Rect Bottom="732" Top="700" Right="899" Left="779"/></addData:Block></addData:Blocks>
        <addData:Blocks Id="0003"><addData:Block PageIndex="1"><addData:Rect Bottom="1124" Top="946" Right="1161" Left="521"/></addData:Block></addData:Blocks>
        <addData:Blocks Id="0004"><addData:Block PageIndex="1"><addData:Rect Bottom="1459" Top="1283" Right="874" Left="523"/></addData:Block></addData:Blocks>
        <addData:Blocks Id="0005"><addData:Block PageIndex="1"><addData:Rect Bottom="1636" Top="1600" Right="809" Left="522"/></addData:Block></addData:Blocks>
        <addData:Blocks Id="0006"><addData:Block PageIndex="1"><addData:Rect Bottom="1816" Top="1782" Right="705" Left="522"/></addData:Block></addData:Blocks>
        <addData:Blocks Id="0007"><addData:Block PageIndex="1"><addData:Rect Bottom="2297" Top="2122" Right="1164" Left="520"/></addData:Block></addData:Blocks>
      </addData:BlocksInfo>
    </addData:AdditionalInfo>
  </_FDA3662:_FDA3662>
</form:Documents>
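The SuspiciousSymbols mask can be turned into a list of character positions that need verification. The sketch below is illustrative only: the SuspiciousPositions helper is hypothetical, the field in Main is a shortened version of the Submitter field from the results above, and the sketch assumes that each mask position lines up with the character at the same position in the field value.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml.Linq

Module SuspiciousSymbolsExample
    ' Hypothetical helper: returns the zero-based positions marked "1" in a mask.
    Function SuspiciousPositions(mask As String) As List(Of Integer)
        Dim positions As New List(Of Integer)
        If mask Is Nothing Then Return positions
        For i As Integer = 0 To mask.Length - 1
            If mask(i) = "1"c Then positions.Add(i)
        Next
        Return positions
    End Function

    Sub Main()
        ' Namespace of the addData attributes, copied from the results above
        Dim addData As XNamespace = "http://www.abbyy.com/FlexiCapture/Schemas/Export/AdditionalFormData.xsd"
        ' Shortened sample field: thirteen zeros, then "11", then "0"
        Dim xml As String = "<_Submitter xmlns:addData=""" & addData.NamespaceName &
            """ addData:SuspiciousSymbols=""0000000000000110"">Joseph Jones 126</_Submitter>"
        Dim field As XElement = XElement.Parse(xml)
        Dim mask As String = field.Attribute(addData + "SuspiciousSymbols").Value

        For Each pos As Integer In SuspiciousPositions(mask)
            Console.WriteLine("Verify character '" & field.Value(pos) & "' at position " & pos.ToString())
        Next
    End Sub
End Module
```

For the shortened sample, the flagged positions fall on the “1” and “2” of the street number, which are the same two characters flagged in the full Submitter field above.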

h. Example Utilization of the results

The picture below shows the results of a program that was written in Visual Studio to interpret and then display the results of the processing. Any questionable characters are highlighted in black and red. The XML results provide every piece of information that is necessary to use these results in a fashion such as this:

web services api figure 5

Conclusion

The ABBYY FlexiCapture web service API may be used to process documents and return the results immediately with a few lines of code.

How User Friendly Consulting Can Help

Contact us regarding purchasing ABBYY FlexiCapture, watching a live demo, or even obtaining a copy of the software for testing on your own server.

We provide consulting services for ABBYY FlexiCapture, including the installation and administration of the product as well as assisting with or performing development using the web service API. Please reach out to us if you would like us to solve your document conversion or data extraction challenge, or just for help with properly configuring your existing ABBYY FlexiCapture system. We provide a wide range of consulting options, including ABBYY trained and certified personnel, as well as a wide range of training options to get your employees up to speed on the product very quickly. We also distribute other ABBYY products, such as ABBYY Recognition Server, that are geared towards bulk document conversion or language translation.

Information about the Author
Joe Hill
About Me
Joe is the chief technologist for UFC, Inc. He guides the decisions on which products UFC offers as well as research on new software applications under the Jovation and MuWave trademarks. Joe began his career at the former Michigan Bell, now AT&T. He earned a bachelor of science in computer science engineering at Western Michigan University. Joe's personal interests include volunteering as an emergency medical technician, serving in the volunteer fire rescue service, and leading worship in his church.