
Glossary of Document Capture and Computer Programming Terms

Document Capture and Content Management Terms

Terms about document capture, enterprise content management and forms processing software.



ABBYY FlexiCapture (FlexiCapture)

ABBYY FlexiCapture is a forms processing and document capture system produced by ABBYY, USA. The system supports fixed forms, semi-structured forms, and unstructured forms. The latter two categories are handled by creating a FlexiLayout template, which is then imported into a tool called the Document Definition Editor. The Document Definition Editor runs inside the administration console and is the only tool required to configure fixed form processing.



ActiveX

ActiveX is a programming framework used to create reusable software objects that are independent of the programming language. ActiveX controls are often used to deploy web components through an installation action in Microsoft Internet Explorer.




Application Programming Interface (API)

An application programming interface (API) is a source code-based specification intended to be used as an interface by software components to communicate with each other. An API may include specifications for routines, data structures, object classes, and variables. An API specification can take many forms, including an international standard such as POSIX, vendor documentation such as the Microsoft Windows API, or the libraries of a programming language, e.g., the Standard Template Library in C++ or the Java API. An API differs from an ABI (Application Binary Interface) in that the former is source code based while the latter is a binary interface.




ASCII

ASCII stands for the American Standard Code for Information Interchange. It is a character-encoding scheme originally based on the English alphabet; ASCII codes represent text in computers, communications equipment, and other devices that use text. Most modern character-encoding schemes are based on ASCII, though they support many additional characters.
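As a small illustration of the mapping between characters and ASCII codes, the following Python sketch converts a sample string to its codes and back. The sample text and the helper name `is_ascii` are assumptions made for this example only.

```python
# Convert characters to ASCII codes and back again.
text = "Batch 42"
codes = [ord(ch) for ch in text]            # character -> numeric code
restored = "".join(chr(c) for c in codes)   # numeric code -> character

# UTF-8 is one of the ASCII-based encodings: pure ASCII text
# produces exactly the same bytes under both encodings.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")


def is_ascii(s: str) -> bool:
    """Return True when every character is in the 7-bit ASCII range (0-127)."""
    return all(ord(ch) < 128 for ch in s)
```

Text containing characters outside the 7-bit range, such as accented letters, fails the `is_ascii` check and needs one of the extended encodings.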



Batch

Batch is a term used in the document capture field for the highest-level “container” of a group of documents being scanned or imported. It is the parent of images, documents, and indexes. The properties of a batch may include: 1. batch profile and batch name, 2. scan date and time, and 3. scan user name, logon ID, or email address.
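One way to picture the batch as a parent container is the following Python sketch. It is not the data model of any particular capture product; the class and field names (`Batch`, `Document`, `scan_user`, and so on) are assumptions chosen to mirror the properties listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Document:
    """Hypothetical child of a batch: a named group of page image files."""
    name: str
    image_files: List[str] = field(default_factory=list)


@dataclass
class Batch:
    """Top-level container carrying the batch properties listed above."""
    profile: str
    name: str
    scanned_at: datetime
    scan_user: str  # user name, logon ID, or email address
    documents: List[Document] = field(default_factory=list)

    def page_count(self) -> int:
        """Total number of page images across all documents in the batch."""
        return sum(len(doc.image_files) for doc in self.documents)


batch = Batch("AP Invoices", "Batch-0001", datetime(2012, 3, 3, 9, 30), "jsmith")
batch.documents.append(Document("invoice-1", ["p1.tif", "p2.tif"]))
batch.documents.append(Document("invoice-2", ["p3.tif"]))
```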


Batch Separator Sheet

A batch separator sheet is a printed form designed to provide a form family with a fast and accurate method of entering control information for a given batch of forms. In addition to human-readable printed information about the batch category, it frequently contains a bar code that can be recognized during the scanning process. This greatly simplifies scanning because these pre-printed separator sheets can be added as the stack of documents is being prepared for scanning, so that multiple batches can be completed at one time.
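The effect of separator sheets on a scanned stack can be sketched in a few lines of Python. This is only an illustration: the barcode value `BATCH-SEP` and the function name `split_batches` are hypothetical, and the barcode reading itself is assumed to have already happened.

```python
from typing import List, Optional

SEPARATOR = "BATCH-SEP"  # hypothetical barcode value printed on the separator sheet


def split_batches(barcodes: List[Optional[str]]) -> List[List[int]]:
    """Group page numbers into batches.

    `barcodes` holds the barcode value read from each scanned page, or None
    when the page has no barcode. A separator page opens a new batch and is
    not kept as a document page.
    """
    batches: List[List[int]] = []
    current = None
    for page_number, barcode in enumerate(barcodes):
        if barcode == SEPARATOR:
            current = []              # separator sheet starts a new batch
            batches.append(current)
        elif current is not None:
            current.append(page_number)
        # pages scanned before the first separator are ignored in this sketch
    return batches


scanned = [SEPARATOR, None, None, SEPARATOR, None, "INV-123", None]
result = split_batches(scanned)
```

Here the seven scanned pages are split into two batches, with the separator pages themselves dropped.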

You can print your own batch separator pages online for free using UFC's MuWave Barcode printing application. Visit our barcode printer application online at: Print Barcode Pages Online. We also market a commercial version of this excellent product. Read more about MuWave Barcode Generator.

Be patient, the server on which this product is running is pokey slow. Sorry about that, it's a free service.  

Commit Phase

The last phase of batch processing prior to data output, where output files (e.g., TXT, CSV, XML, PDF) and archive images are written to the appropriate directories, databases, or ECM system.
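A minimal sketch of what a commit step might do is shown below, assuming the index values for a batch are already available as a list of dictionaries. The function name `commit_batch`, the field names, and the XML element names are all assumptions for illustration; real capture products define their own output formats.

```python
import csv
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def commit_batch(batch_name, index_rows, output_dir):
    """Write one CSV file and one XML file for the batch; return both paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # CSV output: one row per document, one column per index field.
    csv_path = out / f"{batch_name}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(index_rows[0].keys()))
        writer.writeheader()
        writer.writerows(index_rows)

    # XML output: one <document> element per row, one <index> per field.
    root = ET.Element("batch", name=batch_name)
    for row in index_rows:
        doc = ET.SubElement(root, "document")
        for key, value in row.items():
            ET.SubElement(doc, "index", name=key).text = str(value)
    xml_path = out / f"{batch_name}.xml"
    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)
    return csv_path, xml_path


rows = [{"invoice_number": "INV-1", "total": "10.00"}]
csv_file, xml_file = commit_batch("Batch-0001", rows, tempfile.mkdtemp())
```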

Document Capture Workflow

At a high level, document capture is the method of obtaining an electronic document and performing the classification, data capture, and processing needed for entry into ECM, ERP, accounting, or other back-end systems. A document capture workflow allows document processing to be automated from the moment paper enters an organization. Quillix provides a graphical capture workflow design tool that offers an intuitive method for designing capture workflows.

Find out more about how UFC can deploy a cost effective and powerful capture solution for you by visiting our web page: Quillix Web Capture

Enterprise Content Management (ECM)

Enterprise Content Management (ECM) is a formalized means of organizing and storing an organization's documents, and other content, that relate to the organization's processes. The term encompasses strategies, methods, and tools used throughout the lifecycle of the content.

ECM includes much more than just storing a document. It includes all of the properties and methods necessary to index documents, provide version control, retrieve documents, add annotations and comments, enforce security access controls, keep audit trails, and a myriad of other things.


Enterprise Resource Planning (ERP)

Enterprise Resource Planning systems "integrate internal and external management information across an entire organization, embracing finance/accounting, manufacturing, sales and service, customer relationship management, etc. ERP systems automate this activity with an integrated software application. Their purpose is to facilitate the flow of information between all business functions inside the boundaries of the organization and manage the connections to outside stakeholders. ERP systems can run on a variety of computer hardware and network configurations, typically employing a database as a repository for information." [Definition courtesy of AIIM]


Fixed Form

A fixed form is a document in which the data fields are located in the same place from page to page. This makes it possible to configure a template consisting of X and Y coordinates, width, and height. OCR or ICR software can then be used to extract information from each of the zones defined on the template. 
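The idea of a coordinate template can be sketched as follows. To keep the example self-contained, the "page" is a grid of characters standing in for pixels, and joining the cropped characters stands in for the OCR or ICR step; the zone names and coordinates are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Zone:
    """A rectangular region on the form: origin plus size."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical template: the same zones apply to every page of this form.
TEMPLATE: Dict[str, Zone] = {
    "invoice_number": Zone(x=3, y=0, width=3, height=1),
    "total": Zone(x=6, y=1, width=4, height=1),
}


def crop(page: List[str], zone: Zone) -> List[str]:
    """Return the part of the page grid covered by the zone."""
    rows = page[zone.y : zone.y + zone.height]
    return [row[zone.x : zone.x + zone.width] for row in rows]


def extract(page: List[str], template: Dict[str, Zone]) -> Dict[str, str]:
    """Read every zone; a real system would run OCR/ICR on each cropped region."""
    return {name: "".join(crop(page, zone)) for name, zone in template.items()}


page = [
    "No:123....",
    "Total:9.50",
]
fields = extract(page, TEMPLATE)
```

Because the fields sit at the same coordinates on every page, the same `TEMPLATE` can be applied to each page of the form without searching for the data.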



Image

An image is a logical container for files. It is a parent to image files and universal files. An image consists of one or more files: image 1, image 2, etc. Properties of an image include Filename, Image Type, Has Image, Has Universal, and Original Filename. A universal file is a file in something other than an image format. Image format files are files ending with identifiers such as TIFF, JPG, or GIF.

Image over Text

Image over text is a technical term for a specific format of electronic document generally associated with the PDF specification. An image over text PDF is a clever method of embedding searchable text behind the scanned image of a document. This handy type of PDF document is created by first scanning the document, then running it through an OCR engine. Next, a mapping is created from each word in the OCR text to the zone on the scanned image where that word was found. As a result, when the PDF document is displayed it can be searched for words and phrases. When a search term is located within a PDF viewer such as Adobe Reader, the location of the search term within the document can be displayed. Perhaps one of the most useful attributes of the image over text PDF is that the textual data from within the document can be added to the index of an enterprise document system or content search engine. This makes all of the text within the scanned image available for searching by users trying to locate a document.
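The word-to-zone mapping described above can be illustrated with a short Python sketch. It does not produce a PDF; it only shows how OCR words paired with their image coordinates support both search highlighting and full-text indexing. The `Word` class, the sample words, and the coordinates are assumptions for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height on the scanned image


@dataclass
class Word:
    """One OCR word together with the zone where it was found."""
    text: str
    box: Box


def find_term(words: List[Word], term: str) -> List[Box]:
    """Return the zone of every word matching the term, ignoring case."""
    needle = term.lower()
    return [w.box for w in words if w.text.lower() == needle]


def full_text(words: List[Word]) -> str:
    """Plain text suitable for adding to a search engine index."""
    return " ".join(w.text for w in words)


ocr_words = [
    Word("Invoice", (40, 30, 120, 24)),
    Word("Total", (40, 400, 80, 24)),
    Word("250.00", (130, 400, 90, 24)),
]
hits = find_term(ocr_words, "total")
```

A viewer would use the returned zones to highlight the match on top of the scanned image, while `full_text` supplies the text handed to the content index.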



Index

An index is what is known as metadata, meaning “data which describes data.” An index can exist at multiple levels, such as a document index or a batch index. The properties of an index can include attributes such as: Name, Value, Source (OCR, Typed, Barcode, etc.), Location (Page, X, Y, Length, Width), and Barcode Type.

Intelligent Character Recognition (ICR)

ICR stands for intelligent character recognition. It is a handwriting recognition system that allows fonts and different styles of handwriting to be learned by a computer in order to generate a textual value from a scanned section of handwritten text. ICR is most frequently used to decode predefined areas on fixed forms. ICR is not frequently applied to decode an entire page of handwritten text, and almost never applied to analyze a page of mixed machine-printed and handwritten text.



Lookup

Lookup is a utility available within the Quillix Web scanning client, which runs in Internet Explorer. MuWave Lookup from UFC is a tool used inside the Index tab that helps the user index information faster. MuWave Lookup can be configured so that you enter one index value from a page (such as an invoice number) and all the other index values are automatically filled in with information from a database (such as company name, invoice total, etc.).
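The general pattern of filling index values from a database can be sketched with Python's built-in `sqlite3` module. This is not how MuWave Lookup is implemented or configured; the table name, column names, and sample row are assumptions used only to show the idea of keying on one value and retrieving the rest.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical invoice row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "invoice_number TEXT PRIMARY KEY, company_name TEXT, invoice_total TEXT)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        ("INV-1001", "Acme Corp", "250.00"),
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, invoice_number: str) -> dict:
    """Return the remaining index values for an invoice number, or {} if unknown."""
    row = conn.execute(
        "SELECT company_name, invoice_total FROM invoices WHERE invoice_number = ?",
        (invoice_number,),
    ).fetchone()
    if row is None:
        return {}
    return {"company_name": row[0], "invoice_total": row[1]}


conn = build_demo_db()
filled = lookup(conn, "INV-1001")
```

The operator keys only the invoice number; the other index fields come back from the query, and an unknown number returns an empty result so the fields can be typed manually.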


Mark Sense

Data confined to one or more selections in a series, as in a survey. The data is selected by checking a box or filling in a bubble. For example, a survey may include gender information where a respondent fills in the bubble next to an 'M' or 'F' on the survey to indicate his or her gender.

A related term is OMR or optical mark recognition, which is the software used to extract values from this type of marking. 


Illustration: Example of Mark Sense Marking Type 


Metadata

Metadata is, technically, "data about data." In the capture world, metadata is the transaction, system, and document data captured during scanning and passed to a data capture solution for further processing, including the document set, batch number, operator ID, bar code(s), and more.


MuWave

MuWave, pronounced "mew" (like the sound a cat makes) + "wave," is a trademarked line of products developed by UFC, Inc. The current product line is arranged primarily around the Quillix software suite but also includes several standalone products. One of the newest products is an OCR product which offers non-per-page pricing.


MuWave Script

MuWave Script is a VB scripting language built into MuWave QSX modules. This functionality was developed to extend the capability of the MuWave QSX products without the need to customize each product for unique customer needs. MuWave Script allows a programmer to write straightforward script commands to perform system operations inside the QSX module. This method of providing built-in customization capability is unique to UFC's QSX modules.


Optical Character Recognition (OCR)

OCR is an acronym for optical character recognition. It is a computer system for the translation of scanned images of typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. Source: Adapted from Wikipedia, 3-3-2012.

Optical Mark Recognition (OMR)

Optical mark recognition is the process of reading which options were selected from a list on a document, based on the presence or absence of a mark next to the item(s) on that list. See also "mark sense."
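A very simplified sketch of presence-or-absence detection is shown below. Each bubble is represented as a small grid of pixels (1 for dark, 0 for light), and a bubble counts as marked when enough of it is dark. The 0.5 threshold, the function names, and the sample grids are assumptions; production OMR engines use more robust methods.

```python
from typing import Dict, List

Region = List[List[int]]  # pixel grid: 1 = dark, 0 = light


def fill_ratio(region: Region) -> float:
    """Fraction of dark pixels inside a bubble region."""
    total = sum(len(row) for row in region)
    dark = sum(sum(row) for row in region)
    return dark / total if total else 0.0


def read_marks(bubbles: Dict[str, Region], threshold: float = 0.5) -> List[str]:
    """Return the labels of bubbles filled at or above the threshold."""
    return [label for label, region in bubbles.items()
            if fill_ratio(region) >= threshold]


bubbles = {
    "M": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # a stray speck, not a mark
    "F": [[1, 1, 1], [1, 1, 0], [1, 1, 1]],  # mostly filled in
}
selected = read_marks(bubbles)
```

Using a fill threshold rather than "any dark pixel" keeps stray specks and scanner noise from being read as a selection.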


Page

In the document capture world, a page refers to a single image from either a scan or an electronic document. If an electronic document such as a Word doc is printed to an image format, each of the resulting images becomes a logical "page." This can have interesting ramifications, for example when an Excel spreadsheet isn't properly formatted for printing. When a robot in the Quillix system prints the spreadsheet, it uses an Office API call to complete the process, so if the margins and page size are set incorrectly the printed result may not be very desirable! There are really no good workarounds for this problem, so be sure to format these electronic documents properly for printing when you check them into a document management system!

Patch Code

A parallel pattern of alternating black bars separated by spaces and placed near the leading edge of a paper document. Sometimes used to separate documents and batches or to perform identification. Patch codes are a set of 6 distinct barcode patterns (1, 2, 3, 4, 6 and T) that are typically used as document separators when scanning.



Quillix

Quillix is a document capture system produced by Prevalent Software of Colorado Springs, Colorado. This document capture system is an application that electronically "grabs" documents from a variety of sources, in both printed and electronic formats. It is a "distributed" capture process management system, meaning that it allows the capture of documents from non-centralized locations through the Internet. The servers themselves are centrally located, although web servers may be deployed at each scanning location in order to increase the speed at which documents are processed.


Quillix Server Extension (QSX)

QSX is short for Quillix Server Extension and was coined by Prevalent Software as a means of adding plug-in capability for their Quillix Web Capture product. MuWave QSX modules are those offered by UFC, Inc. under the MuWave brand. All QSX modules are developed to run on the Windows server platform and add substantial functionality to Quillix. All of the MuWave QSX modules include a set of common features including MuWave Script and built-in support for the MuWave Reports product.

UFC provides a wide variety of QSX modules:


Remote Capture

A means by which an organization can scan documents remotely, either from branch offices around the world or simply from another floor. The scanned images are transmitted via a secure Internet connection for data capture processing at a centralized location, such as corporate headquarters. Advantages include eliminating the delay and expense of postage and faxing. Quillix is a remote capture system.

Semi-Structured Document

Semi-structured documents contain common data elements, but the location of the data differs from document to document. For example, nearly every invoice contains data such as a P.O. number and an invoice total, but it is in a different location on each invoice depending on the vendor. Because of these location differences, it is not feasible to use fixed-position templates to capture data from semi-structured documents.

Structured Document

Structured documents are standardized forms that come in the exact same format or layout every time. Examples of structured documents include credit applications, surveys, and order forms. The data to be captured is always located in the same place on the form. To eliminate the need to manually enter the data from structured forms, a template is created to define each of the individual data fields to capture, like name, address, or Social Security Number. Document capture or forms processing software can then capture that information at the same location every time.

Two Way Match of AP Invoice and Purchase Order

Two-way matching is a process performed by an AP clerk or analyst to match the invoice being paid (a bill) with the goods originally ordered (a purchase order). In the most typical case, a purchase order has already been created by the ordering company, and an invoice has then been sent by the supplying vendor and received by the ordering company for payment. When the AP clerk begins the process from a scanned invoice, they have an image of the invoice on, say, their right-hand computer screen and the purchase order from their ERP or accounting system on their left. They open the PO and evaluate all of the line items on the invoice for possible payment, including quantity, unit price, and total. If the line items on the invoice match the line items on the purchase order, and the clerk has the receiving document (receipt) showing that all of these items were actually received (a three-way match), then the invoice can be posted and paid, meaning the bill can be created and posted in the accounting or ERP system. Frequently an automated system like ABBYY FlexiCapture is used to extract data from the invoice and provide the user with an easy-to-use comparison display showing values in conflict or the results of other validation or workflow rules defined during configuration of the AP invoice processing software.
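The line-by-line comparison at the heart of a two-way match can be sketched in Python as follows. This is an illustration only, not the logic of ABBYY FlexiCapture or any ERP system; the function name `two_way_match` and the field names (`item`, `quantity`, `unit_price`) are assumptions.

```python
from decimal import Decimal
from typing import Dict, List


def two_way_match(invoice_lines: List[Dict], po_lines: List[Dict]) -> List[str]:
    """Compare invoice lines with purchase order lines; return the discrepancies.

    An empty list means every invoice line agrees with the purchase order.
    """
    problems: List[str] = []
    po_by_item = {line["item"]: line for line in po_lines}
    for inv in invoice_lines:
        po = po_by_item.get(inv["item"])
        if po is None:
            problems.append(f"{inv['item']}: not on purchase order")
            continue
        for field in ("quantity", "unit_price"):
            # Decimal avoids floating-point surprises when comparing money values.
            if Decimal(str(inv[field])) != Decimal(str(po[field])):
                problems.append(f"{inv['item']}: {field} {inv[field]} != {po[field]}")
    return problems


purchase_order = [
    {"item": "A", "quantity": 10, "unit_price": "2.50"},
    {"item": "B", "quantity": 1, "unit_price": "99.00"},
]
invoice = [
    {"item": "A", "quantity": 10, "unit_price": "2.50"},
    {"item": "B", "quantity": 2, "unit_price": "99.00"},
]
conflicts = two_way_match(invoice, purchase_order)
```

A three-way match would add the same kind of comparison against the lines of the receiving document before the invoice is released for payment.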

Unstructured Document

Unstructured documents are forms and documents where the desired data can be located in varying positions on the page of the same document type. An example of an unstructured document is an EOB (Explanation of Benefits) document.
