Document Capture Software and Content Management System Tips

Whitepaper from UFC

By Joe Hill

Summary:

ABBYY Recognition Server converts paper or electronic documents into compressed, searchable, archive-compliant files. It can also extract text and barcodes from paper or electronic documents and return the results in XML format. The results include not only the textual content that was found but also more detailed information, such as paragraph and line locations and text formatting. This information may be used to build additional applications that perform operations such as document indexing and searching.

This document shows how the ABBYY Recognition Server web service API can be used to convert a document into an archival format. It also shows how to extract text and barcodes from a document and how to use this information to power a document search application. The example is presented in the context of Microsoft Visual Studio .NET, but the concepts apply to any development and runtime environment that can call a web service, such as Java. The VB.NET code is presented for illustration only.
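To make the indexing idea concrete, here is a minimal Python sketch of how line-level text and positions from an XML recognition result could feed a simple search index. The XML element and attribute names (document, page, line, left, top) and the helper functions are assumptions made up for illustration; they are not the actual ABBYY Recognition Server schema or API, and no web service call is shown.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical recognition result. The element and attribute names are
# illustrative only and are NOT the real ABBYY Recognition Server schema.
SAMPLE_XML = """
<document name="invoice-001.tif">
  <page number="1">
    <line left="120" top="80">Invoice number 4417</line>
    <line left="120" top="140">Total amount due</line>
  </page>
  <page number="2">
    <line left="100" top="60">Payment due in 30 days</line>
  </page>
</document>
"""


def build_index(xml_text):
    """Map each lowercased word to the (document, page, left, top) spots where it occurs."""
    root = ET.fromstring(xml_text.strip())
    doc_name = root.get("name")
    index = defaultdict(list)
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for line in page.iter("line"):
            position = (doc_name, page_no, int(line.get("left")), int(line.get("top")))
            # Split the line text into words and record where each one was found.
            for word in re.findall(r"\w+", (line.text or "").lower()):
                index[word].append(position)
    return index


def search(index, term):
    """Return every recorded position for a single search term (case-insensitive)."""
    return index.get(term.lower(), [])


index = build_index(SAMPLE_XML)
for hit in search(index, "due"):
    print(hit)
```

Because each hit carries the page number and line coordinates, a search application built this way could highlight the matching line on the page image rather than only naming the document.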

Whitepaper from UFC

By Jim Hill

Summary: Anyone new to data capture faces an immediate decision: whether to take a fixed or a semi-structured approach to extracting data from a form. What constitutes a fixed form versus a semi-structured one, and what are some guidelines for distinguishing them in ABBYY FlexiCapture?

One day, my office phone rang. The caller had seen information about our products on our company website. As a new salesperson tasked with selling document capture, I was used to answering these types of calls, but this time the caller threw in an interesting spin: they needed ballpark pricing right now. They went on to explain that they needed to extract data from a particular document and export it to a database. I began to ask them about their document and how the information was laid out on the page. Was the information always in the same position on the page, or did it move around from document to document? Were there multiple versions of the document, and what annual page volume did they expect? These qualifying questions were necessary because the product I was going to recommend depended very closely on the type and location of the data to be extracted from the form. Once I determined that they were most likely looking at a form in which the data moved around from document to document, I was able to provide the ballpark pricing they required. Without that information, a verbal estimate would have been impossible. Why? As you will learn in this article, the structure of a document is extremely important in determining the type of technology used to extract data from it.

Whitepaper from UFC

By Jim Hill

Introduction to ABBYY FlexiCapture

ABBYY FlexiCapture is a comprehensive forms processing and document capture solution. It includes standard import features for scanning documents, receiving documents from email, processing images from a watched network folder, and manually processing images. Standard export features include output of files to a wide variety of formats, enterprise content management systems, and Microsoft SharePoint. The product is available in two architectures, depending on your needs. The standalone system is for a single user processing smaller volumes of documents from a single workstation. The distributed version is a scalable client-server application that runs in Microsoft IIS. It supports large-scale processing of high document volumes spread across multiple servers for load balancing and failover in a Microsoft Server cluster. Within both the standalone and distributed installation types there are additional considerations, including whether there is a need to process fixed forms, semi-structured forms, or unstructured forms. The last two types require the FlexiLayout Studio product and licensing. Fixed (structured) forms can be processed by the standard versions of FlexiCapture, standalone and distributed. Both versions include a tool called FormDesigner for creating "scannable" paper forms.

UFC Whitepaper

White Paper: Data Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

Customizing Your Document Management System – Be Careful What You Wish For

By Jim Hill

If I had a dollar for every time that I have heard another consultant or myself say to a client, "Try to customize your system as little as possible," I would be a rich man. But if I had another dollar for every time that a client failed to take that advice, I would be a filthy rich man. On the surface, it is easy to say that you will not customize your document management system, or any system for that matter. But in reality, no system will match your processes exactly, and the clamor of users to go back to the processes and functionality they are used to will often win out over keeping your document management system non-customized. So what is a systems project manager or IT department head to do?

UFC Whitepaper

White Paper: Data Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

Make Sure to Monitor Server RAM and Hard Disc Fragmentation for Optimal Data Capture and Document Management System Performance
by Joe Hill 


Somewhere in the world today, several IT analysts have less hair because their data capture or document management system slowed to a grinding halt. A critical part of running any data capture or document management system is managing server performance. For any system that is live and successfully in production, some key aspects need to be monitored to maintain optimal server performance over the lifetime of the system. At the beginning of a system's existence, life is good: performance is usually at its peak, with freshly loaded operating data and relatively low usage of a newly adopted system. However, an inadequately monitored system or, worse, a 'set it and forget it' attitude toward the server will result in a quick fall from that peak and lead to performance degradation. In other words, life will no longer be so good.

UFC Whitepaper

White Paper: Data Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

How to Calculate an ROI for a Document Management System

By Jim Hill

Most people in the business world know that ROI stands for Return on Investment. In the simplest terms it means "what am I going to get in return for the money that I am about to spend". Every business, and every person for that matter, wants to know what they are going to get in return for spending money. But putting together a project's ROI can sometimes seem like a daunting task. For that reason, we provide a simple ROI calculator at the end of this article. Obtaining it is as simple as going to our website.
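As a rough illustration of the idea, the common textbook form of ROI is net gain divided by cost. The short Python sketch below applies that formula to made-up figures. It is not the calculator offered with this article, and the function name and dollar amounts are assumptions chosen only for the example.

```python
def simple_roi(total_benefit, total_cost):
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures: a system costing $50,000 that returns $65,000
# in savings over the evaluation period.
cost = 50_000
benefit = 65_000
print(f"ROI: {simple_roi(benefit, cost):.0%}")  # prints "ROI: 30%"
```

A real project ROI would also need to settle what counts as a benefit (labor saved, storage reclaimed, faster retrieval) and over what time period, which this sketch leaves out.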

UFC Whitepaper

White Paper: Data Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

Choosing the Right Image Resolution – Part 2
by Travis Spangler

In Part 1 of Choosing the Right Image Resolution, you learned why storage sizing and image processing are important when dealing with image resolution and your scanning operation. In Part 2, you will discover how document transmittal time and scanning time must also be taken into account if you want your image resolution and scanning system to be in sync.

UFC Whitepaper

White Paper: Data Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

Choosing the Right Image Resolution – Part 1
by Travis Spangler


When you think of the word 'resolution', what do you think of? Maybe you think of New Year's resolutions, or maybe you think of resolution in the context of business law or mathematics. Or maybe you think of the British Royal Navy ships built in the 1600s and 1700s that bore the name HMS Resolution. In the capture and document management world, however, most of us think of resolution as an image measurement that describes the detail an image holds. Simply stated, a higher resolution means more image detail. But determining how much detail is enough, and understanding that there are trade-offs between detail, cost, and system performance, are important points to consider when you are dealing with data capture and integrated document management systems.
Typically measured in DPI (Dots per Inch), resolution is a measure of how many pixels are contained in a given inch of linear space. A pixel is a single point in an image, composed of varying values of red, green and blue. To put it simply, a pixel is one point of color. For example, a 300x300 DPI image contains 300 pixels per inch both horizontally and vertically, for a total of 90,000 pixels per square inch of image. The higher the pixel density, or DPI, of an image, the more detail that image contains and, generally speaking, the better it will look. As a side note, it can be easy to confuse DPI with image size, such as the numbers often quoted for computer displays. A figure such as 1024x768 for a computer display is not an expression of its pixel density, but rather its dimensions in pixels.
But enough of the technical image resolution jargon. What does all of this mean to you and your scanning operation? It means that determining the proper image resolution is not straightforward and that there are several things to consider: storage sizing, accuracy of processes such as Optical Character Recognition (OCR), document transmittal time, scanning time, and end user viewing quality. With each of these factors affecting the others, it can be difficult to strike a balance.
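The pixel arithmetic above, and its effect on storage sizing, can be sketched in a few lines of Python. The page size (8.5 x 11 inches), the 24-bit color depth, and the absence of compression are assumptions chosen for illustration. Real scanned files are usually compressed and much smaller, so treat the numbers as an upper bound that shows how quickly size grows with DPI.

```python
def pixels_per_square_inch(dpi):
    """Pixel count in one square inch at the same DPI in both directions."""
    return dpi * dpi


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) size in bytes of a page scanned at the given DPI."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


print(pixels_per_square_inch(300))  # 90000, matching the figure in the text

# Assumed page: 8.5 x 11 inches, 24-bit color, no compression.
for dpi in (150, 200, 300, 600):
    size = uncompressed_bytes(8.5, 11, dpi, 24)
    print(f"{dpi} DPI: {size / 1_000_000:.1f} MB uncompressed")
```

Doubling the DPI quadruples the pixel count, which is why a modest increase in resolution can have an outsized effect on storage, transmittal time, and processing time.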

UFC Whitepaper

White Paper: Document Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

Understanding and Effectively Using Document Indexing In Document Capture Software
By Joe Hill, CTO


All of us, at one time or another, have been frustrated looking for documents that we know we filed in a safe place on our computer but could not find. Or maybe you have wanted to reuse data or information from a document and couldn't, because you could not recall where the document, article, or file that contained it was located. Those frustrations can be largely eliminated through web-based scanning and effective document indexing in a content management system. But there is more behind the document indexing curtain than one might first imagine, and pulling back the curtain can expose both opportunities and challenges.

UFC Whitepaper

White Paper: Data Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

When Purchasing a Data Capture System Focus on the Journey Rather Than the Destination

By Joe Hill

Here is an old but good philosophical question: what is more important to you, the journey or the destination? Some people, after thinking for a while, will answer the destination, while others, after some pondering, will answer the journey. I'm not sure that either answer is right or wrong; it depends on your perspective. But in the data capture world, it appears that too many people worry about the destination rather than the journey. While both are important, I have always thought that concerning yourself more with the journey will lead you to the right destination. The analogy is that many organizations focus on the export of data and its resulting file format (the destination) rather than on how that data is generated in the first place (the journey). This can be dangerous, because it can result in too much focus on how the data will be formatted and too little on how accurate that data is in the first place.

UFC Whitepaper

White Paper: Data Capture and Document Management Systems - 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

By: Joe Hill, CTO, UFC, Inc.

When you think about a data capture system, what goes through your mind? Maybe you think about the reduction in paper you will realize, easier access to content, or more space due to fewer filing cabinets. While these are worthy thoughts, a feature-rich and correctly utilized data capture system can be so much more.


Capture More Than Paper
When most people think of data capture, they immediately think of capturing paper documents. While capturing paper documents brings real benefits (fewer filing cabinets, less clutter, a 'green' initiative), capturing documents other than paper is increasingly just as important. For instance, capturing emails, which have proliferated in recent years, has become a segment of enterprise content management (ECM) unto itself.

UFC Whitepaper

White Paper: Data Capture and Document Management Systems – 10 Tips and Information Nuggets That Will Save You Time, Money, and Hair

Tip 1: Buying a Data Capture or Document Management System on Price Alone Can be Dangerous to the Health of Your Business
by Joe Hill 


Today's economy has everyone, consumers and businesses alike, pinching their pennies and spending only when absolutely necessary. Everyone is watching their cash, regardless of the size of the purchase. Technology products are not immune to the economic downturn or to the purchasing behaviors of consumers and businesses either. But letting today's down economy drive a long-term business decision, such as buying a data capture or document management system, could leave you in an unhealthy position when the economy turns around. For example, you may be tempted to purchase a cheaper document management system now that just meets your present needs at your current volumes but doesn't scale well or offer advanced features like workflow. In that case we caution 'Buyer beware', because in one to two years you may be stuck trying to explain to management and users why your system can't handle the increase in volume, or why you can't make the business more efficient because your system lacks advanced web-enabled capabilities and business process management functionality.