Have you ever looked into Optical Character Recognition (OCR) and noticed that everything you come across says you need to know the difference between fixed and semi-structured forms? “I just want to find the right software! Why does this have to be so complicated?” you might think to yourself. (If you are just starting your OCR search, here are some more things you should be prepared to know.) If so, you’ve come to the right place. By the end of this article you will know the difference between these two types of forms, understand why everybody asks which one you have, and be better prepared to move forward with your OCR search.
So, what is the difference between these two types of forms? It all comes down to where the information you want to extract is located.
Fixed Forms: As the name implies, the fields for the data you want to collect are in the exact same place every time. Think of a federal tax form. For a given tax year, each answer is in the same location no matter what state you’re in when you pick up the form. This doesn't mean that the document has to be the same nationwide or even from year to year. The fields just have to be in the same location on every document scanned for your specific project. Another example of a fixed form would be a survey that a school or company sends out to its students or employees. Every copy of that survey that is sent out and returned looks identical, which makes the scanned document a fixed form.
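To make the idea concrete, here is a minimal sketch of fixed-form extraction in Python. It assumes the OCR step has already produced a list of words with page coordinates. The `Word` class, the `ZONES` table, and the coordinates are all hypothetical and chosen for illustration; they do not come from any particular OCR product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position of the word's center on the page
    y: float  # vertical position of the word's center on the page


# Hypothetical zones for one specific survey: field -> (left, top, right, bottom).
# On a fixed form these rectangles are defined once and reused for every scan.
ZONES = {
    "name": (100, 40, 300, 60),
    "student_id": (100, 80, 300, 100),
}


def extract_fixed(words, zones):
    """Collect the words that fall inside each field's fixed rectangle."""
    result = {}
    for field, (left, top, right, bottom) in zones.items():
        inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
        inside.sort(key=lambda w: w.x)  # left-to-right order for a single-line field
        result[field] = " ".join(w.text for w in inside)
    return result


# Made-up OCR output for one returned survey.
page = [
    Word("Name:", 50, 50), Word("Ada", 120, 50), Word("Lovelace", 180, 50),
    Word("ID:", 50, 90), Word("12345", 120, 90),
]
print(extract_fixed(page, ZONES))
```

Notice that the code never reads the printed labels. It only looks inside the rectangles, which is why this approach works only when every scanned document has the same layout.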
Semi-Structured Forms: As you might have gathered, with these types of forms the data field isn’t in the exact same place every time (if it were, that would make it a fixed form). So, what makes these forms “semi-structured”? While they may not have the same field locations every time, they do have the same field “indicators”. What’s a field indicator? Think of it as the title of the field. Some common ones are “Name”, “Address”, and “Account Number”. Think of any invoice you have seen or job application you’ve filled out. Each of those documents asks for the same kinds of information and has a field for each response, but those fields aren't necessarily in the same location from one document to the next.
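As a rough illustration of how field indicators can be used, the Python sketch below searches the recognized text for each indicator and takes whatever follows it on the same line. It assumes the OCR step returns one string per line of text and that each line holds a single field. The function name, the indicator list, and the sample invoices are hypothetical, and real products use far more robust matching than this.

```python
import re

# Field indicators to look for; these are the examples used in this article.
INDICATORS = ["Name", "Address", "Account Number"]


def extract_by_indicator(lines, indicators):
    """Search every line for each indicator and return the text that follows it."""
    result = {}
    for label in indicators:
        # The label, an optional colon, then the value on the rest of the line.
        pattern = re.compile(
            r"\b" + re.escape(label) + r"\s*:?\s*(.+)", re.IGNORECASE
        )
        for line in lines:
            match = pattern.search(line)
            if match:
                result[label] = match.group(1).strip()
                break  # keep the first occurrence of this indicator
    return result


# Two made-up invoices with the same fields in different places.
invoice_a = ["Name: Ada Lovelace", "Address: 12 Main St", "Account Number: 998-17"]
invoice_b = ["INVOICE", "Account Number 998-17", "", "Address: 12 Main St", "Name: Ada Lovelace"]

print(extract_by_indicator(invoice_a, INDICATORS))
print(extract_by_indicator(invoice_b, INDICATORS))
```

Both invoices produce the same fields even though the layout differs, because the search is driven by the indicator text rather than by a position on the page.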
So, now that you know the difference between the two, why do people keep asking which type you intend to scan? It all comes down to the intelligence of the software. We tend to think that computers are geniuses with all the answers, so what do I mean by intelligence? Believe it or not, not all software is created equal. Each product is designed for specific tasks. If the software only needs to look in the same spot every time to collect data, that’s a much easier task than looking through an entire page. Scanning an entire page (or multiple pages) for certain keywords (like “Name” or “Address”) and then working out which nearby text is the value requires more intelligent software (that is, more complicated logic). So, when somebody asks you what type of document you’re looking to extract data from, what they’re really trying to find out is how intelligent your software needs to be, so that you end up with the right software for the job.
We hope this article shed some light on the topic. If you’d like a more in-depth look at fixed vs. semi-structured forms and how each interacts with ABBYY FlexiCapture (an OCR software product we offer), read our White Pages article here. If you happen to be in the market for OCR software (or just need a consultation), please feel free to contact us and let us help you make sure you’re getting the most out of your OCR experience.