Watch our video to learn how to perform a document repository audit with ABBYY FineReader Server. Find and calculate files that need to be recognized, converted, and/or changed to a supported format.
Hello. Today I’d like to show you a new feature in ABBYY FineReader Server, which allows us to do what we call a document repository audit. This audit allows us to find and count files that need to be recognized, files that need to be converted, or even files that are in an unsupported format, so that we can understand the page count that may go into processing these files or a given directory. So the next thing we will do is create a new repository audit workflow, and that’s by clicking this button here. I can, of course, give it a workflow name and I will point it at a directory. Or I can even point it at an FTP folder, a secured FTP folder, or SharePoint. But for today’s demo, we’ll keep it simple. We’re going to point it to a given directory and we have the ability for the software to locate duplicate files as well. So I’ll go ahead and turn that on. Next thing I will do is click “OK”. And at this point, if we want to process and run this audit workflow, all we need to do is right-click and go to “Start”. The software will then start its processing and auditing of this directory.
Now once completed, you’ll see that the audit details have a status of “Finished”, and we can see the statistics of this specific folder. We can see which files need OCR conversion. We can see which ones are images versus PDFs, which ones have a text layer on them already that we would consider of good quality. We can see the different formats of documents that we want to extract. And we can see the other details here. Sometimes it’s also nice to understand if there are duplicate files that could potentially be processed throughout this run. And what we’ll do is we’ll open up our report here and you can see a duplicate files report. So the software will come through and document which files could potentially be reported as duplicates. So at this point, we have access to all of our auditing, all of our duplicate reports, and we have the ability to really analyze a repository, to see the impact on pages, the impact on the number of files. And we can do all sorts of things. We could understand the impact that that may have on licensing or that may have on the amount of time it could take to process these given formats. I hope you enjoyed this video. Please reach out to us if you have any questions. Thank you.
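To make the idea of an audit concrete, here is a minimal Python sketch of the kind of counting the video describes: walking a directory, tallying files by broad type, and grouping byte-identical files as duplicate candidates. It is not ABBYY's implementation. The function name `audit_directory`, the extension list, and the use of SHA-256 for duplicate detection are all assumptions made for illustration, and the sketch does not try to detect whether a PDF already has a text layer.

```python
import hashlib
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical grouping of extensions; the product's own categories may differ.
IMAGE_EXTS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp"}


def audit_directory(root):
    """Count files by broad type and group byte-identical files by SHA-256."""
    counts = Counter()
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        if ext in IMAGE_EXTS:
            counts["image"] += 1
        elif ext == ".pdf":
            counts["pdf"] += 1
        else:
            counts["other"] += 1
        # Files with the same digest have identical bytes (assumed duplicate rule).
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(path)
    duplicates = [paths for paths in by_hash.values() if len(paths) > 1]
    return counts, duplicates
```

Calling `audit_directory` on a folder returns the per-type counts and a list of duplicate groups, which is roughly the information the audit statistics and duplicate files report present in the demo.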
Watch our video to learn how to use the ABBYY FineReader Server Document Conversion Service to submit documents, run OCR on them, and receive the output.
Hello. Today I’d like to show you the FineReader Server Document Conversion Service. This is a really neat tool so that end users have the ability to use ABBYY FineReader to drag and drop documents, get the OCR performed on a document and get the output result. So as you can see I’ve navigated to our Document Conversion Service and I’m going to simply drag and drop some documents into the service window. The software’s going to show me those documents and then I can click next. The next thing it will do is it will allow me to choose output settings and you can see we have the available output settings that you can document in the software and of course any recognition language that we have available within the software. The next step is we will get our results and when we get our results, the software will simply process those asynchronously and as we process those, the software will report back which ones are completed.
Now once completed, you’ll see that we have our file names, we have them in the format or formats that we’ve requested. And of course I can download the sample file here just to kind of show you that we’ve performed an OCR on that given sample. So there’s our final product there. This is a really awesome way for us to allow end users to access what has previously been known as kind of a server-based OCR, but this lets the end user do it on demand.
Now the one administrative thing I will discuss in today’s video is that the software uses by default a workflow that is set up within the software. So in my demo today we’ve used the default workflow, which if I open up the administrative console, you’ll see I have a default workflow. If you would prefer to use another default workflow or another workflow, generally you can control those settings using the program files. And you can see that here I’m actually going to navigate to our settings file and I can open this file and you’ll notice here that we have different settings that we can control, so that the site obeys whichever workflow and whichever formats we want to allow. So administratively, that might be nice if we want to allow only certain file formats or types of documents to be uploaded and recognized using this Document Conversion Service. I hope you’ve enjoyed today’s demo. If you have any questions, please reach out to us. Thank you so much.
Watch our short video in order to learn how to make non-searchable documents searchable with ABBYY FineReader Server.
Hello. Today I’d like to show you our ABBYY FineReader Server product. Now this product is a fantastic solution if you’re looking to make non-searchable documents into searchable documents and perhaps you even have a variety of different formats that you’re dealing with as well. Now as you can see in front of us, the FineReader Server product comes with an administration console, which is where we administer how documents are coming in, what happens during processing, and also what happens as they’re exported. Everything that happens here is done via a workflow and in essence this workflow is really where the nitty gritty of the solution takes place. So I’m going to show you an example workflow and there are seven tabs. There’s a general tab, which is where of course we put a name to the workflow. Also we determine the input source.
So is it a folder? Are we grabbing information from emails? Or is there something like a SharePoint system where we need to grab them from a document repository? But in today’s demo, I’m just going to quickly show you how we use a traditional hot folder to grab documents from. The input tab tells us where to grab those documents and in this case I’m using a shared folder, but we also can use FTP here. Although there are a lot of other settings here that I won’t necessarily go into the specifics of, I want to give you confidence that this software has been used to process billions of pages, so there are a lot of different settings you can use to kind of tweak it and tune it to work in your environment here. The next step is processing and this is where we determine the languages and also the OCR quality that we expect to receive out of the application.
The next step is how do we separate a document? Do we keep individual files? Do we want to use bar codes? Do we want to be able to use blank pages as a separator? There are a lot of different ways that organizations separate documents, but you can see here we have the ability to handle a variety of those different methods. Quality control tells us when and if we want a human involved in the process as part of a traditional OCR process. It’s important that we understand there’s a confidence level that the software provides and if for some reason there is low confidence for a given character or given document, we may want a human or a staff member to review the OCR results and therefore, any low confidence information is corrected before it’s exported into its final format. And so this tab on the quality control gives us the ability to set some of those settings here.
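The confidence-based review rule described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's logic: the `RecognizedChar` type, the 0.0 to 1.0 confidence scale, and both threshold values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    char: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def needs_verification(chars, min_confidence=0.8, max_low_ratio=0.05):
    """Return True when the share of low-confidence characters exceeds max_low_ratio."""
    if not chars:
        return True  # nothing was recognized, so send the document to a person
    low = sum(1 for c in chars if c.confidence < min_confidence)
    return low / len(chars) > max_low_ratio
```

A document for which `needs_verification` returns True would be held for a staff member to review before export; everything else would pass straight through.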
The indexing tab gives us the ability to set up different document types and therefore different fields for each document type. It also allows for a human to interact with the software and provide the ability to manually index a document. And then lastly we have an output tab. This output tab tells us what kind of format we want to standardize as the output. And you can see here I’m exporting a document as both a PDF and I’m exporting the text to a text file as well. We support a lot of different output formats. You control all of them and actually each of them has its own settings and destination locations that you can customize. So I won’t go into each one of those individually. But understand, once again being used over billions of pages, there are a lot of ways that you can customize and make sure you’re getting the proper tuning and output format that you’re expecting.
So as you can see from a workflow perspective, it’s a very, very easy process for us. We set up these tabs, we’ve set up our settings, we hit OK and we move on. And actually what we’re going to do is show you this specific workflow in action. So I’m going to provide some documents to an input folder. Remember the input folder is where we’ve told the software to grab the documents that we want OCR performed on. So I have some documents that are on my clipboard. I’m just going to drop them into our input folder and the software is going to monitor. As you see it just did that. It grabbed those documents and now it’s performing OCR on them. And now if we go look at an output folder, you will see that we will have both our PDF versions of these as well as a text version.
So the original documents that I sent in for OCR were JPEG formats, but now we are sending them out as PDFs and I also have access to all of the text on the documents. So now I can use Ctrl+F to find different information on these documents. And just for fun, I did have the software provide the text file of each document so that you can rest assured that we’re extracting all the information on each file as well and grabbing that text. So ABBYY FineReader Server is a fantastic solution in this bulk OCR scenario where we want to make a non-searchable document or documents searchable and therefore be able to use them in downstream systems. Hope you enjoyed this video. If you have any questions, feel free to reach out to us. Thank you so much!
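The hot folder pattern shown in this demo (files appear in an input folder, get processed, and disappear once the output is written) can be sketched as a single polling pass in Python. This is a hedged illustration only: `process_hot_folder_once` is a hypothetical name, and the `convert` callback is a stand-in for the OCR step, which this sketch does not perform.

```python
from pathlib import Path


def process_hot_folder_once(input_dir, output_dir, convert):
    """Process every file currently in input_dir, then remove it from the hot folder."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    produced = []
    for src in sorted(Path(input_dir).iterdir()):
        if not src.is_file():
            continue
        text = convert(src)  # placeholder for the real OCR step
        target = out / (src.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        src.unlink()  # the file leaves the hot folder once its output exists
        produced.append(target)
    return produced
```

A long-running service would call a function like this in a loop with a short sleep between passes, which is why dropped files seem to vanish from the input folder almost immediately.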
Watch this video on how to configure ABBYY FineReader Server (formerly Recognition Server) to submit documents to SharePoint Online. The process is simple, efficient, and flexible!
Hello, in this video I am going to show you how ABBYY Recognition Server integrates with SharePoint Online. Now this is a very neat integration. It’s very quick and it obtains high quality OCR results for us. What you see on the left here is what we call an input folder, or a hot folder. This is where we’re gonna simply drag and drop files that we want to be OCRed, and then output, here on the right, into SharePoint Online. Now the cool part that I’m going to show you is that I’m going to be dragging and dropping what we call TIFF image files into the hot folder. In the process of OCRing them, we will also convert them to a searchable PDF. And that document will be stored in SharePoint as a PDF, also searchable, so that we can find the content at a later time.
So all I’m going to do is copy and paste some files into the input folder. They will not stay there long. You can see now they are already gone. What the software is doing right now is converting those to PDF files, and once again making them searchable for us. So, if we go over here and we refresh our SharePoint Online site, you can see that I now have three PDF files. In fact, if I maximize this here, you will see I have those files right here, and I can simply click on them. And if I zoom in here just a little bit, you can see that I can highlight the text. Meaning that we do have searchable content and that SharePoint will be able to crawl and index that content for us.
So, it’s really that simple. All I did is drag and drop into a hot folder and now they are there in SharePoint. Now I wanna show you a little bit behind the scenes because I want you to understand how easy this is. This is what we call the administration console in Recognition Server. And what I did is, I just completed this document workflow and you can see here, I’ll just run through the steps very quickly. On the first input, we just tell the system what are the files that we’re going to capture. Where are they located and which ones do we want to process. For this one, that you just saw, we’re just saying, “Hey, we want every file in there.” We can tell the OCR results how well we want them to be captured. Do we want high quality results, or do we want high speed, or do we want somewhere in the middle.
And also we can target the language in the software here as well. If we have barcodes and things like that, we would also process them here. We can tell the software how we want to separate the jobs, and for this one we just said, “Hey, for every file going into that hot folder, we want you to create a job, or a file in the output.” Now we can look at quality control, so for example if we wanted staff to be involved before it ended up in SharePoint, we have the ability to stop and require what we call verification in the software. And we can do that based on the criteria that you see here, whether it’s on all documents, or if it’s just based on a certain range there of low-confidence characters.
We can also handle exceptions and things, just different ways in which you want to control it there. If we wanna index the document, for example I have invoice documents that we process. Maybe I wanna index them by invoice number or even invoice date or vendor. We have the ability to allow a user to do that. In this case we did not, we bypassed indexing, but we can stop the process and require an AP clerk or another clerk, just randomly processing other documents, to provide the indexing information. And then you can see here, we have our output. The output here is as simple as saying, “Look, I want a PDF document,” and if I hit Edit, you’ll see a little bit of this information showing up here for us. You can say, I want a PDF document. You can say, I wanna save this in the SharePoint library. And then we simply provide the URL of where we’re going to have the documents live.
What library, what folder and those kind of things. We also have full control, down here, of the name you [inaudible 00:03:54] of the documents. So we can even use index information that we captured to process those for us. And it’s really that simple, this is how easy it is. Honestly, setting up a SharePoint Online OCR process this simple is probably no more than a 15-minute process, and that includes installing the software. So, a very simple and easy to use interface here from an administration perspective. And the cool part is then, we have all of our searchable content in SharePoint Online in the cloud, so it’s accessible by any distributed staff. And that’s ABBYY Recognition Server, I hope you enjoyed it. I hope you understand it. Such a neat and easy to use, and easy to implement product. And please contact us today to learn more.
Watch this video to see how your enterprise may process both structured and unstructured documents quickly and cost effectively using award winning ABBYY FineReader Server (formerly Recognition Server).
Good afternoon, this is Jim Hill from UFC. This afternoon I’m going to show you ABBYY Recognition Server processing some unstructured documents. These types of documents will include things like letters, contracts, and we will also process some structured documents including invoices. Let me introduce ABBYY Recognition Server, talk about some of the features, the advantages and disadvantages of using Recognition Server to process unstructured documents.
This is one alternative. There are many different alternatives for processing unstructured documents. Let’s look at the demo. What I’m going to show you is the ABBYY Recognition Server. I’m just going to show you the quick … This is what the server looks like. This is where you set up your workflows. Let me just quickly show you the workflow. From left to right, the name of the workflow, where the documents are coming from, in this case a shared folder, what you want done to the documents, what the language is, the level of OCR you want, for example, would you like to optimize for quality or speed, here I have it set in the middle, how are you going to separate the documents. Many times it’s simply one document per file. If you’re processing a PDF file, that can be how it’s done. If you’re scanning documents, most frequently we’d use a barcode separator sheet, which we would provide to you when you want to scan documents.
Quality control talks about if you want to verify the data coming off, which is verifying that the OCR engine produced the correct data. Indexing, now here’s where you set up the different document types. I’ve got three document types set up, different index values for each. There is a contract. We have an agreement number and we have a date for the contract. It takes a couple minutes to set those up. What do you want to do with the output file, that’s the last tab. We have the output data going to Microsoft Word. There’s a variety of options. We can set it to XML files, which I also have selected here. We can save the original images. We will show you that as we run some sample batches.
What I’m going to do right now is I’m going to show you some samples. Here’s the folder that is being watched. As I move in some sample documents, I have these sample documents here in this other folder. Let me just show you what one of those documents looks like. Here’s a contract. We’re going to extract the agreement number off of this contract. And some other forms, an invoice, there is a letter. These are all going to be processed together in one big blob as we would expect in an unstructured document processing system. I’m just going to select these documents, copy them over to the watched folder. You notice they disappeared quickly. ABBYY has already processed those documents.
Now what’s happened is we go to our verify, or actually our document indexing station, since I did not tell the system to do any verify functions. What I’m doing here is just entering in index information. This index information will be put into an XML file. It will also be carried in variables that can be used for programming purposes. I’m also putting that index information into the file name. What I have set here, I have these documents set to go to Microsoft SharePoint so that you can have access to these documents anywhere that you have access to your SharePoint server. In this particular case, it’s Office 365 SharePoint, but any SharePoint system will work.
I can clearly see that this is an invoice document. It’s just asking me now what is the invoice, what’s the vendor. I’m just going to say it’s JL Smith, obviously that’s the vendor, and the date, so I’m going to find the invoice date. A caution on the date, it says month, month, day, day, year, year. You need to be careful that this is set correctly. In this case what I can do is just click and key the date, and then just move it around a little bit, accept the document. That’s pretty quick.
Now it moves on and it’s going to recognize another document pending, a sales agreement. I know that’s the sales agreement. I’m just going to go ahead now and select that field for the agreement number. There’s the agreement number. Okay, contract seller signature date. I’m going to jump to the last page, obviously that’s where the signature is going to be. Now again, I grabbed a little extra data and that format needs to be in month, month, day, day, year, so there’s a little manual intervention required here. What will happen is we can program this to strip out characters once they’re processed. I’m just going to go ahead and put it into that correct format, accept the document. Now the system is not learning how to process these documents. Each and every time I process these documents this is how I’m going to have to do it. You can see that this is a letter, and it’s looking for the sender. Well I’m going to have to go to the last page to find out who sent it. There’s the sender. Who’s the addressee, we can see there’s the addressee and there’s the letter date again with our format correct, 12/09/2006, accept the document. Then that is it for the batch.
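The remark about programming the system to strip out extra characters around a captured date can be illustrated with a short Python sketch. It is not the product's built-in scripting, and the details are assumptions: the function name `normalize_date` is hypothetical, the target format is taken to be MM/DD/YYYY (as in the 12/09/2006 example), and two-digit years are assumed to fall in the 2000s.

```python
import re

# Month, day, year separated by "/", "-" or "."; the year may have 2 or 4 digits.
_DATE = re.compile(r"(\d{1,2})[/\-.](\d{1,2})[/\-.](\d{4}|\d{2})")


def normalize_date(raw):
    """Find the first month/day/year date in raw and return it as MM/DD/YYYY, or None."""
    match = _DATE.search(raw)
    if not match:
        return None
    month, day, year = (int(group) for group in match.groups())
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None  # reject values that cannot be a month/day pair
    return f"{month:02d}/{day:02d}/{year:04d}"
```

With a rule like this, a field captured with stray text around it, such as "Date: 3/15/2006.", would be cleaned up without the operator retyping it.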
Now what we will do is go to our SharePoint folder where we expect the document results to come up, and there’s the document results. We see a contract, a letter, and an invoice. Let’s just pull up one of those documents. Here’s the contract document showing the exported data coming out of the contract. You see what it did, it just grabbed all the text of the document. If I need to move this data into another system, I can copy and paste it into that system. Now of course we could program the system to put this data somewhere. It will require a little bit of custom programming in the system. Scripting is built into the system. Also, the index data that we entered here is written into the file name. ABC123 was the sales agreement number, and the date of execution was 3/15/2006 as you see there. Similarly for the invoice, you’re going to get the invoice data. It does a very good job of extracting the detail of the invoice, and same thing for the letter.
Well that concludes the ABBYY Recognition Server demo. I just want to say that there are many different alternatives that we can offer. This is one cost-effective way to process unstructured data. We will also show ABBYY FlexiCapture, which is a much more powerful system, but does require more complex setup and is much more expensive. Please get back with me if you have any questions, but we wanted to show this so that you would be aware of the unstructured processing capabilities in ABBYY Recognition Server. Thank you very much.
Watch this video from ABBYY USA to learn how ABBYY FineReader Server (formerly Recognition Server) provides network enabled, high-volume capture and OCR processing of images. This document capture solution processes images through high quality OCR while providing flexibility for export of images and metadata. See more at https://www.ufcinc.com/Data-Capture-and-OCR/abbyy-recognition-server