By Jim Hill
Summary: Anyone new to data capture faces an immediate decision: whether to take a fixed or a semi-structured approach to extracting data from a form. What constitutes a fixed form versus a semi-structured one, and what are some guidelines for distinguishing them in ABBYY FlexiCapture?
One day, my office phone rang. The caller had seen information about our products on our company website. As a new salesperson tasked with selling document capture, I was used to answering these types of calls, but this time the caller threw in an interesting spin: they needed ballpark pricing right now. They went on to explain that they needed to extract data from a particular document and export it to a database. I began to ask about their document and how the information was laid out on the page. Was the information always in the same position on the page, or did it move around from document to document? Were there multiple versions of the document, and what annual page volume did they expect? These qualifying questions were necessary because the product I was going to recommend depended very closely on the type and location of the data to be extracted from the form. Once I determined that they were most likely looking at a form in which the data moved around from document to document, I was able to provide the ballpark pricing they required. Without that information, a verbal estimate would have been impossible. Why? As you will learn in this article, the structure of a document is extremely important in determining the type of technology used to extract data from it.