ABBYY FineReader Server – How to Perform a Document Repository Audit

Learn how to perform a document repository audit with ABBYY FineReader Server to find and count the files that need to be recognized, converted, or changed to a supported format.

Hello. Today I’d like to show you a new feature in ABBYY FineReader Server that lets us perform what we call a document repository audit. The audit finds and counts the files that need to be recognized, the files that need to be converted, and even the files in unsupported formats, so that we can estimate how many pages processing a given directory would involve.

To start, we create a new repository audit workflow by clicking this button here. I can give the workflow a name and point it at a directory. I could also point it at an FTP folder, a secure FTP folder, or SharePoint, but for today’s demo we’ll keep it simple and use a local directory. The software can also locate duplicate files, so I’ll turn that option on and click “Okay”. To run the audit workflow, all we need to do is right-click it and choose “Start”. The software then begins processing and auditing the directory.
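
To make the idea of an audit pass concrete, here is a minimal Python sketch of the kind of tally an audit produces: it walks a directory tree and counts files per category. This is not how FineReader Server works internally. The extension-to-category mapping (`CATEGORIES`) and the function name `audit_directory` are hypothetical, and a real product most likely inspects file contents rather than relying on extensions alone.

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-category mapping, for illustration only.
# Anything not listed here is counted as "unsupported".
CATEGORIES = {
    ".tif": "image", ".tiff": "image", ".jpg": "image", ".jpeg": "image",
    ".png": "image", ".bmp": "image",
    ".pdf": "pdf",
    ".doc": "office", ".docx": "office", ".xls": "office", ".xlsx": "office",
}


def audit_directory(root):
    """Walk `root` recursively and count files per category."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively, so "scan.TIF" counts as an image.
            ext = Path(name).suffix.lower()
            counts[CATEGORIES.get(ext, "unsupported")] += 1
    return counts
```

Calling `audit_directory("/path/to/repository")` returns a `Counter` keyed by category. Page counts and text-layer checks are deliberately left out, because they require opening each PDF with a PDF library.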

Once the audit completes, the audit details show a status of “Finished”, and we can see the statistics for this folder: which files need OCR conversion, which are images and which are PDFs, which already have a text layer that we would consider good quality, and the different document formats we want to extract. Other details are listed here as well.

It’s also useful to know whether any duplicate files would be processed during this run. If we open the report here, you can see a duplicate files report, in which the software lists the files it has identified as potential duplicates. At this point we have access to all of our audit and duplicate reports, so we can analyze a repository and see the impact in terms of both page count and number of files. That helps us understand the effect on licensing and on how long it could take to process these formats. I hope you enjoyed this video. Please reach out to us if you have any questions. Thank you.
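
As an illustration of how a duplicate files report can be built, here is a minimal Python sketch that groups files first by size and then by SHA-256 content hash. It detects byte-identical copies only. The video does not say which criteria FineReader Server uses to flag duplicates, so treat this as an assumption-based sketch. The names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under `root` whose contents are identical."""
    # First pass: group by file size, which is cheap to obtain.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a byte-identical twin
        # Second pass: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means that most files are never hashed at all. This matters for a large repository, where reading every byte is the expensive part.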


Travis Spangler

Travis writes articles dealing with various technical aspects of document capture and forms processing. He is fluent in Microsoft .NET and holds several certifications, including ABBYY FlexiCapture and IRISXTract. As general manager and sales director, he oversees daily operations and manages customer accounts to ensure both customers and prospects are receiving the very best from UFC, Inc. Travis has many years of experience with document capture software and content management systems. His wide areas of expertise also include custom functions in ABBYY FlexiCapture, email APIs, Microsoft SQL Server Reporting Services, and many other applications and platforms. He has integrated Amazon Web Services EC2 instances with several applications, including the company's CRM system.
