History and Future of ABBYY FlexiCapture
The Emergence of ABBYY FlexiCapture
ABBYY FlexiCapture inspired other manufacturers to develop the next generation of Data Capture software. It was the first Data Capture solution built on an entirely new architecture, and it matched the spirit of the times, when ever better Data Capture results were expected from our technical efforts. The extraction of “unstructured data” is a compelling example. What exactly is “unstructured data,” and where does the innovation lie?
Before diving deeper into the innovative world of unstructured document recognition, a brief look back at the Data Capture software solutions of the 1990s is in order.
Let’s examine Data Capture applications in office automation environments, such as invoice recognition, as an example. This was already a significant Data Capture software application in the 1990s. At the time, “Regular Expressions” (Regex) were used to extract invoice data.
Regular Expressions Explained
A Regular Expression, often abbreviated as “RegEx,” is a short pattern definition (the expression). Put simply, you can think of a RegEx as a mix between computer programming code and a mathematical formula, like the ones you learned in school: a brief text with variables indicating what you are searching for, in other words the condition that must be met to accept a particular result.
At its core, it works quite simply: you draw a zone around the field to be recognized, for example the date field on an invoice. Then, in a script box provided by the program, you define the Regex that applies to that specific date field. In this way you can extract data from documents such as invoices. That was Data Capture in the 1990s: data extraction enabled primarily by Regular Expressions.
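To make this concrete, here is a minimal Python sketch of the kind of expression that might be entered for a date zone. It assumes the zone’s OCR text is already available as a string and that dates are written as day, month, year with one consistent separator; the pattern and the function name `extract_date` are illustrative, not FormReader syntax.

```python
import re
from datetime import date
from typing import Optional

# Illustrative pattern: DD-MM-YYYY, DD/MM/YYYY or DD.MM.YYYY.
# The backreference \2 forces both separators to be the same character.
DATE_PATTERN = re.compile(r"\b(\d{2})([-/.])(\d{2})\2(\d{4})\b")


def extract_date(zone_text: str) -> Optional[date]:
    """Return the first valid calendar date in a zone's OCR text, or None."""
    for match in DATE_PATTERN.finditer(zone_text):
        day, _, month, year = match.groups()
        try:
            return date(int(year), int(month), int(day))
        except ValueError:
            # Matches the pattern but is not a real date, e.g. 31.02.2024
            continue
    return None


print(extract_date("Invoice date: 15-03-1998"))  # 1998-03-15
```

The pattern only describes the shape of the text; the extra calendar check shows why a RegEx on its own was often not the whole answer.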
ABBYY FormReader and Regular Expressions
The turbulent 1990s, when office automation and related initiatives sought demonstrable efficiency, marked the origin and emergence of EasyData.
EasyData engineers were involved early on in implementing Data Capture software products. ABBYY was already a leading manufacturer at the time, known primarily for offering the best OCR (Optical Character Recognition) products on the market. Initially, ABBYY coupled its OCR technology with FormReader, a package that allowed specific field extractions based on Regular Expressions (there they are again). In this period Data Capture was finding new possibilities, and customers were already pleased if we could retrieve basic document data and archive it correctly. Building on this functionality, you could say that document classification was already being achieved with FormReader!
Unstructured Data
In summary, ABBYY FormReader was a product that Data Capture engineers could work with exceptionally well. The package allowed engineers to define individual zones, with Regular Expressions as an additional technique.
The goal was to extract the correct value from scanned documents. Even then, EasyData distinguished itself with smart data extractions based on ABBYY FormReader and the applied RegEx formulas. We structured unstructured data primarily through field recognition. By analyzing what we encountered on documents, we could automatically determine whether it was a shipping note, invoice, contract, or another type of document.
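Deriving the document type from what is found on the page can be sketched as a simple keyword score. The keyword lists and the `min_hits` threshold below are hypothetical examples, not the rules EasyData actually used with FormReader.

```python
# Hypothetical keyword rules per document type; real projects derive these
# from the fields and phrases that actually occur on the client's documents.
DOCUMENT_RULES = {
    "invoice": ["invoice number", "vat", "amount due"],
    "shipping_note": ["delivery address", "consignment", "packing list"],
    "contract": ["hereinafter", "party", "signature"],
}


def classify(ocr_text: str, min_hits: int = 2) -> str:
    """Return the type with the most distinct keywords found, or 'unknown'."""
    text = ocr_text.lower()
    scores = {
        doc_type: sum(1 for keyword in keywords if keyword in text)
        for doc_type, keywords in DOCUMENT_RULES.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    # Too few hits means the page does not clearly match any known type
    return best_type if best_score >= min_hits else "unknown"
```

For example, `classify("Invoice number: 123\nVAT 21%\nAmount due: 100")` returns `"invoice"`, while a page with no known keywords is left as `"unknown"` for an operator to decide.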
This approach changed entirely with the arrival of ABBYY FlexiCapture!
ABBYY FlexiCapture Architecture
As a natural consequence of ICT advancements, ABBYY FormReader evolved into the groundbreaking ABBYY FlexiCapture around 2005. Once again, this was a progressive development for engineers working on Data Capture challenges. These engineers continuously refine their skills to intelligently meet the ambitious demands of daily incoming requests. The new ABBYY FlexiCapture technology provides additional functionality for projects where no automated solution was previously conceivable. Unstructured data is now within reach of capable Data Capture engineers.
ABBYY Technology Makes the Difference
The key difference from FormReader was, first and foremost, the elimination of dependency on zonal OCR recognition. With ABBYY FlexiCapture, the entire document is initially processed using ABBYY’s superior FineReader OCR technology. Only afterward does the recognition of unstructured data come into play. Once the full document has undergone OCR text recognition, ABBYY FlexiCapture can immediately attempt a preliminary document classification. For a Data Capture engineer tackling your document challenges, this is a time-saving feature—especially when the underlying document profiles already describe the different document types. From its introduction, FlexiCapture has stood out as a highly accurate Data Capture product.
Let’s Search for Invoice Numbers!
Suppose, for example, that an invoice recognition task requires us to find the invoice number. FlexiCapture first searches the page for a keyword such as “Invoice number” and then looks for the value in the area surrounding that keyword. This approach gives engineers a significant advantage in delivering a flawless FlexiCapture project. Because the document definition describes the field relative to its keyword rather than as a fixed zone, invoice recognition remains robust when the layout changes. Recognize the invoice, regardless of its layout. Has the invoice layout changed? The invoice number will still be found, because it no longer matters where on the page, or in which layout, the keyword appears.
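The anchor-and-neighborhood idea can be sketched in Python. This is not FlexiCapture’s algorithm or API: it assumes OCR output as text blocks with the top-left coordinates of each block, and the `TextBlock` type, the invoice-number pattern and the distance tolerances are hypothetical. The value is looked for to the right of the keyword on the same line, or below it, and the nearest candidate wins, so one rule covers two different layouts.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBlock:
    text: str
    x: float  # left edge of the block on the page
    y: float  # top edge of the block on the page


def find_value_near_anchor(
    blocks: list[TextBlock],
    anchor: str,
    value_pattern: str,
    line_tolerance: float = 10.0,
    max_distance: float = 300.0,
) -> Optional[str]:
    """Return the value closest to the anchor keyword, to its right or below it."""
    pattern = re.compile(value_pattern)
    anchors = [b for b in blocks if anchor.lower() in b.text.lower()]
    best_distance, best_value = math.inf, None
    for a in anchors:
        for candidate in blocks:
            if candidate is a or not pattern.fullmatch(candidate.text.strip()):
                continue
            dx, dy = candidate.x - a.x, candidate.y - a.y
            same_line = dx > 0 and abs(dy) <= line_tolerance
            below = dy > 0 and dx >= -line_tolerance
            distance = math.hypot(dx, dy)
            if (same_line or below) and distance <= max_distance and distance < best_distance:
                best_distance, best_value = distance, candidate.text.strip()
    return best_value


INVOICE_NO = r"INV-\d{4}-\d{4}"

# Layout A: the value sits to the right of the keyword
layout_a = [
    TextBlock("Invoice number:", 400, 100),
    TextBlock("INV-2023-0042", 520, 102),
    TextBlock("Order number:", 400, 130),
    TextBlock("PO-7781", 520, 130),
]
# Layout B: the keyword has moved and the value sits below it
layout_b = [
    TextBlock("INV-1999-0001", 900, 50),  # unrelated reference elsewhere on the page
    TextBlock("Invoice number", 50, 600),
    TextBlock("INV-2023-0042", 50, 620),
]
print(find_value_near_anchor(layout_a, "invoice number", INVOICE_NO))  # INV-2023-0042
print(find_value_near_anchor(layout_b, "invoice number", INVOICE_NO))  # INV-2023-0042
```

Because the rule is expressed relative to the keyword, moving the label and its value elsewhere on the page does not change the result, and an unrelated number that is not near the keyword is ignored.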
Take Control of Your Unstructured Data
ABBYY FlexiCapture stands out as a precise Data Capture product, scalable for any document imaging and data extraction application. Since its introduction, FlexiCapture has set the standard for software solutions that structure data. EasyData immediately adopted the ABBYY FlexiCapture SDK and integrated this technology into EasySeparate, where it was supported until 2020. After 2020, EasySeparate transitioned to Machine Learning technology: yet another technological leap forward!
ABBYY FineReader OCR for FlexiCapture
The OCR engine powering ABBYY FlexiCapture is FineReader. This recognition technology automatically processes documents with superior OCR quality, converting text—regardless of document structure, language, or content—into usable and accessible data. The extracted data is made available for any downstream application, with FlexiCapture offering versatile export capabilities.
And where export limitations might arise, EasyData proves its added value.
Self-Learning Classification with FlexiForms
Some ABBYY partners refer to this technology as “self-learning.” This self-learning aspect lies in ABBYY FlexiCapture’s ability to search for data beyond rigidly defined OCR zones. However, EasyData avoids using the term “self-learning.” Our impression is that while this technology represents a significant improvement, it still requires human expertise in the process.
With the advent of Machine Learning, EasyData’s opinion on “self-learning” has shifted. In practice, we now see that self-learning technology has indeed become viable.
ABBYY FlexiCapture vs. Machine Learning
Machine Learning is not always the best way to achieve results. To apply it effectively, you need training examples: the system must have something to learn from. When sufficient document samples are not available, FlexiCapture technology remains the better choice.
ABBYY FlexiCapture ScanStation
FlexiCapture ScanStation simplifies scanning via any TWAIN-, ISIS-, or WIA-compatible device. The ScanStation module is available in both thick- and thin-client versions.
ScanStation features individual scan profiles, allowing predefined settings to be applied to specific documents before scanning.
When scanning a new batch of documents, users only need to select the appropriate profile from a drop-down menu.
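A scan profile is essentially a named bundle of predefined settings. The sketch below models that idea; the fields (`dpi`, `color_mode`, `duplex`) and the profile names are assumptions for illustration, since ScanStation profiles are configured in its user interface rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    name: str
    dpi: int
    color_mode: str  # "bw", "gray" or "color"
    duplex: bool


# Hypothetical profiles; in ScanStation they are set up in the user interface
PROFILES = {
    profile.name: profile
    for profile in (
        ScanProfile("Invoices", dpi=300, color_mode="bw", duplex=False),
        ScanProfile("Contracts", dpi=300, color_mode="gray", duplex=True),
        ScanProfile("ID documents", dpi=600, color_mode="color", duplex=True),
    )
}


def settings_for(profile_name: str) -> ScanProfile:
    """Return the predefined settings for the profile chosen in the drop-down menu."""
    try:
        return PROFILES[profile_name]
    except KeyError:
        raise ValueError(f"Unknown scan profile: {profile_name!r}") from None
```

Keeping the settings in one named profile is what lets the operator start a batch with a single choice instead of adjusting each setting by hand.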
Web ScanStation
FlexiCapture also supports browser-based document scanning—a major advancement. In the past, scanning was only possible locally, with the scanner directly connected to a computer. Today, online scanning is fully viable thanks to modern technology.
Now, you can securely scan documents from your scanner directly to the internet. An added benefit is that you no longer need to run all follow-up applications on your own computer. EasyData has even developed custom scanning enhancements for several clients in this area. If you require web scanning functionality, we can advise you on the available options. Better yet, with EasyData’s licensing approach, online document scanning becomes both technically feasible and cost-effective.
Image Enhancement
Preloaded or scanned images can now be easily enhanced before further document processing. Scan enhancement technology, combined with the latest Machine Learning techniques, continues to evolve. EasyData is constantly working to refine scan preparation and image improvement.
To learn more about intelligent image enhancement and how EasyData provides solutions in this area, follow this link.
Unstructured Data: Transforming Recognition into Structure
The recognition phase begins with classification—what type of document needs to be identified? This document classification is first handled by ABBYY FlexiCapture technology. If that doesn’t suffice, we step in with practical solutions like Regular Expressions.
After document classification comes the field recognition stage. Based on the FlexiCapture document definition, individual documents and their corresponding pages are identified, and once classification is complete we also know where specific fields are located. For document fields that are not immediately recognizable, FlexiCapture technology lends a helping hand.
In summary:
- Text and data extraction form the foundation for document classification.
- Document classification enables automated validation.
- Documents meeting the criteria defined in the FlexiForm never require manual review by the FlexiCapture operator.
All these stages run one after another in a single automated workflow.
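The sequence of classification, extraction and validation can be sketched as a small pipeline. The one-entry `DEFINITIONS` table is a hypothetical stand-in for a FlexiCapture document definition; a document goes to export only when it was classified and every required field was found, and is otherwise routed to manual verification.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified document definition: per document type, a keyword
# that identifies it and regexes for the fields to extract.
DEFINITIONS = {
    "invoice": {
        "keyword": "invoice",
        "fields": {
            "invoice_number": r"Invoice number:\s*(\S+)",
            "total": r"Total:\s*([\d.,]+)",
        },
    },
}


@dataclass
class Document:
    text: str
    doc_type: str = "unknown"
    fields: dict[str, str] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)


def process(doc: Document) -> str:
    """Classify, extract and validate a document; return where it is routed."""
    # 1. Classification based on the extracted (OCR) text
    for doc_type, definition in DEFINITIONS.items():
        if definition["keyword"] in doc.text.lower():
            doc.doc_type = doc_type
            break
    else:
        doc.errors.append("unclassified")
        return "verification"
    # 2. Field extraction using the definition of the detected class
    for name, pattern in DEFINITIONS[doc.doc_type]["fields"].items():
        match = re.search(pattern, doc.text)
        if match:
            doc.fields[name] = match.group(1)
        else:
            doc.errors.append(f"missing {name}")
    # 3. Validation: only documents without errors skip manual review
    return "export" if not doc.errors else "verification"
```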
Document Classification
ABBYY FlexiCapture can automatically assemble multi-page documents from a mix of individually scanned pages. Document separation during scanning can be indicated using separators—such as page numbering, dedicated separator sheets, or advanced ABBYY classification algorithms that detect which pages belong to different documents.
This ensures seamless processing, even for complex, unstructured data.
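Separation by separator sheets or page numbering can be illustrated with a short sketch. It assumes each page is available as OCR text, that separator sheets carry the hypothetical text `SEPARATOR SHEET`, and that page numbers are printed as “Page 1 of N”; ABBYY’s classification-based separation is not modeled here.

```python
import re

SEPARATOR_TEXT = "SEPARATOR SHEET"  # hypothetical text printed on separator sheets
FIRST_PAGE = re.compile(r"\bpage\s+1\s+of\s+\d+\b", re.IGNORECASE)


def split_batch(pages: list[str]) -> list[list[str]]:
    """Split a batch of page texts into documents.

    A separator sheet ends the current document and is itself discarded;
    a page numbered "Page 1 of N" starts a new document.
    """
    documents: list[list[str]] = []
    current: list[str] = []
    for page in pages:
        if SEPARATOR_TEXT in page:
            if current:
                documents.append(current)
            current = []
            continue
        if FIRST_PAGE.search(page) and current:
            documents.append(current)
            current = []
        current.append(page)
    if current:
        documents.append(current)
    return documents
```

Either signal is enough to close a document, so a batch can mix both methods.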
Terminology for Unstructured Data
Here, we use the terms FlexiForms and document definition interchangeably—both refer to the same FlexiTemplate. A FlexiTemplate defines the text to search for on scanned pages and the unstructured data to extract from them. While this might sound a bit complex, creating a FlexiTemplate can indeed be a sophisticated process.
Scanned Document Verification
FlexiCapture offers multiple methods for efficient document verification. Covering all options in a single article would compromise readability, but we certainly don’t underestimate the importance of explaining these verification features to our readers.
The goal of this article is to provide a clear explanation of how ABBYY FlexiCapture approaches document verification—ensuring accuracy while maintaining an intuitive workflow.
Group Verification
Group verification is ideal for checkmarks and numerical values appearing across all documents in a batch. Identical characters from an entire document batch are displayed together for streamlined validation.
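Group verification amounts to grouping recognized characters by their value across the batch, so that a misrecognized character stands out among its neighbors. A minimal sketch, assuming each recognized character comes with a (hypothetical) document ID and position:

```python
from collections import defaultdict


def group_characters(
    recognized: list[tuple[str, int, str]],
) -> dict[str, list[tuple[str, int]]]:
    """Group recognized characters by value.

    `recognized` holds (document_id, position, character) triples; the result
    maps each character to every place in the batch where it was recognized.
    """
    groups = defaultdict(list)
    for document_id, position, character in recognized:
        groups[character].append((document_id, position))
    # Sorted by character so the operator reviews one value at a time
    return dict(sorted(groups.items()))
```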
Field Verification
Field verification checks individually defined data fields one by one. Fields can be linked to specific databases, making it easier for operators to select correct values. Based on the extracted data, the system can either auto-select the proper value or provide input suggestions to operators.
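Suggesting values from a linked database can be approximated with fuzzy string matching. This sketch uses Python’s standard `difflib.get_close_matches`; the supplier list and the 0.6 similarity cutoff are hypothetical, and FlexiCapture’s own database lookup may work differently.

```python
import difflib
from typing import Optional

# Hypothetical reference data; in practice this comes from the linked database
SUPPLIERS = ["Acme Logistics B.V.", "Beta Office Supplies", "Gamma Print & Scan"]


def verify_field(recognized: str, reference: list[str]) -> tuple[Optional[str], list[str]]:
    """Auto-select an exact (case-insensitive) match, otherwise return suggestions."""
    by_key = {value.casefold(): value for value in reference}
    exact = by_key.get(recognized.strip().casefold())
    if exact is not None:
        return exact, []
    # No exact match: offer up to three similar values to the operator
    suggestions = difflib.get_close_matches(recognized, reference, n=3, cutoff=0.6)
    return None, suggestions
```

An OCR result such as `"Acme Logistlcs B.V."` is not auto-selected, but the correct supplier appears among the suggestions.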
Verification in Document Window
Recognition results for all required fields are displayed simultaneously alongside the original scanned image. Incorrectly recognized information (like handwritten text or notes) can be manually corrected in the fields.
Automated Export of Scanned Documents
FlexiCapture can be configured to auto-export processed documents. In the document definition properties dialog, enable Automatic Export under Export Settings. This eliminates the need for operators to manually trigger exports.
While FlexiCapture provides standard export options, the possibilities aren’t limited to these. Using the ABBYY Export Component, EasyData can implement custom exports via:
- Our Business Connector
- FlexiCapture’s built-in export scripting capabilities
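As an illustration of what a custom export does, the sketch below writes extracted field values to XML with Python’s standard `xml.etree.ElementTree`. It is standalone and does not use the FlexiCapture export scripting API or the ABBYY Export Component; the element and attribute names form a hypothetical schema.

```python
import xml.etree.ElementTree as ET


def to_xml(doc_type: str, fields: dict[str, str]) -> str:
    """Serialize extracted field values to a simple XML string (hypothetical schema)."""
    root = ET.Element("document", type=doc_type)
    for name, value in fields.items():
        # One <field> element per extracted value; ElementTree escapes special characters
        ET.SubElement(root, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")


print(to_xml("invoice", {"invoice_number": "A-17", "total": "121.00"}))
```

A target system that expects a different format only needs a different serialization step; the extracted values stay the same.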
Web-Based Management & Monitoring Console
FlexiCapture includes a 24/7 web-based console for remote oversight. Administrators can:
- Manage user permissions
- Review event logs
- Access standard reports or generate custom performance analytics
ABBYY FlexiCapture Email Alerts
Administrators can configure email notifications for critical events, including:
- Errors and system failures
- License expirations
- Page limit thresholds
- Database overflow warnings
- Low disk space alerts
- Access permission requests
- Failed login attempts
This ensures proactive system management and quick response to operational issues.
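FlexiCapture configures these notifications itself; the sketch below only illustrates the kind of threshold logic behind such alerts. The `SystemStatus` fields and all threshold values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds
MIN_FREE_DISK_GB = 10.0
PAGE_LIMIT_WARNING = 0.9  # warn at 90% of the licensed page count
LICENSE_WARNING_DAYS = 30
MAX_FAILED_LOGINS = 5


@dataclass
class SystemStatus:
    free_disk_gb: float
    pages_used: int
    page_limit: int
    days_until_license_expiry: int
    failed_logins_last_hour: int


def alerts_for(status: SystemStatus) -> list[str]:
    """Return the alert messages that would trigger an e-mail for this status."""
    alerts = []
    if status.free_disk_gb < MIN_FREE_DISK_GB:
        alerts.append("Low disk space")
    if status.pages_used >= PAGE_LIMIT_WARNING * status.page_limit:
        alerts.append("Page limit threshold reached")
    if status.days_until_license_expiry <= LICENSE_WARNING_DAYS:
        alerts.append("License expires soon")
    if status.failed_logins_last_hour >= MAX_FAILED_LOGINS:
        alerts.append("Repeated failed login attempts")
    return alerts
```

Warning before a hard limit is reached (90% of the page count, 30 days before expiry) is what turns these alerts into proactive management.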
Use Case of ABBYY FlexiCapture
ABBYY FlexiCapture is primarily a data extraction (Data Capture) software application for organizations. It gives engineers the freedom to apply optical character recognition (OCR) functionality more intelligently, while operators benefit from an intuitive way of working. FlexiCapture automatically extracts information from submitted documents, pulling unstructured data based on predefined Regular Expressions (RegEx). These rules are an excellent first step toward smart field recognition and a successful Data Capture project!
Utilize a Keyword Directory
A next step for further optimization, for example, is implementing a keyword directory tailored to the project. Such a keyword directory should naturally align with the Data Capture task. With this keyword directory, the system can then actively search for the location of data on a page.
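A keyword directory can be as simple as a mapping from each field to the labels that may precede it, for example in several languages. The entries below are hypothetical; the function returns the line on which each field’s label was found, which is where the search for the value would start.

```python
# Hypothetical keyword directory: each field lists the labels that may precede it
KEYWORD_DIRECTORY = {
    "invoice_number": ["invoice number", "invoice no", "factuurnummer"],
    "invoice_date": ["invoice date", "factuurdatum"],
}


def locate_fields(lines: list[str]) -> dict[str, int]:
    """Return, per field, the index of the first line containing one of its keywords."""
    found: dict[str, int] = {}
    for field_name, keywords in KEYWORD_DIRECTORY.items():
        for index, line in enumerate(lines):
            lowered = line.lower()
            if any(keyword in lowered for keyword in keywords):
                found[field_name] = index
                break
    return found
```

The same directory then serves both Dutch and English invoices without a separate rule per language.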
For ABBYY FlexiCapture, ready-made modular solution packages are available, such as FlexiCapture for Invoices and FlexiCapture for Mailrooms.
Features of ABBYY FlexiCapture
The core functionalities of this technology can be summarized under the following key concepts:
Document Classification – A document’s class determines which fields are recognized. Aligning these document classes with the corresponding data extraction rules is the challenge we gladly take on.
Exporting Data – Once the data is located and recognized, it must be exported somewhere. Exporting to a database, XML file, or Microsoft Excel are obvious options.
Comparison Capabilities – ABBYY FlexiCapture’s document classification features can be used to extract field values from document sets and then compare them.
Flexi Project Example
A loan application consists of several individual documents. Separate rules can be configured to extract the relevant value for each field from every document and then compare the values. Any discrepancies are presented to the operator during the document verification phase.
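The comparison step can be sketched as follows. It assumes the values have already been extracted per document into a dictionary; the document and field names are hypothetical, and values are compared after normalizing case and whitespace so that purely cosmetic differences are not reported.

```python
def normalize(value: str) -> str:
    """Collapse whitespace and ignore case for comparison purposes."""
    return " ".join(value.split()).casefold()


def find_discrepancies(extracted: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return fields whose values differ between documents.

    `extracted` maps document name -> {field name -> value}; the result maps
    each conflicting field to its original value per document.
    """
    values_per_field: dict[str, dict[str, str]] = {}
    for doc_name, fields in extracted.items():
        for field_name, value in fields.items():
            values_per_field.setdefault(field_name, {})[doc_name] = value
    return {
        field_name: per_doc
        for field_name, per_doc in values_per_field.items()
        if len({normalize(v) for v in per_doc.values()}) > 1
    }


application = {
    "application_form": {"applicant": "J. de Vries", "income": "4200"},
    "salary_slip": {"applicant": "J. DE VRIES", "income": "4100"},
    "id_card": {"applicant": "J.  de Vries"},
}
print(find_discrepancies(application))
```

Only the income is reported here; the applicant name differs only in case and spacing.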
In the image below, the components of the FlexiCapture Server are graphically displayed.
Customization and Integration Capabilities
EasyData tailors ABBYY FlexiCapture to fit any workflow. Among other technologies, we leverage ABBYY’s advanced solutions. If you’ve read the article, you now know that EasyData also offers a wide range of capabilities for recognizing scanned documents.
Additionally, EasyData is recognized as a technical leader in image enhancement technology.
Image enhancement and document recognition can be adapted to individual workflows and processing scenarios. Scripts executed between processing phases enable FlexiCapture to adjust document handling to virtually any requirement a client may have.
Scripting allows the extension of standard workflows by enabling:
- Custom processing logic
- Advanced data validation
- Seamless integration with external systems
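As an example of advanced data validation, the sketch below checks that the net amount plus VAT equals the invoice total. It is written in Python for illustration only and is not a FlexiCapture script; the function name, the 0.01 tolerance and the assumption that amounts use a decimal comma or point without thousands separators are all hypothetical.

```python
from decimal import Decimal, InvalidOperation


def check_invoice_totals(net: str, vat: str, total: str, tolerance: str = "0.01") -> list[str]:
    """Hypothetical validation rule: net amount + VAT must equal the total."""
    try:
        # Assumes a decimal comma or point and no thousands separators
        net_d, vat_d, total_d = (Decimal(v.replace(",", ".")) for v in (net, vat, total))
    except InvalidOperation:
        return ["amount is not a valid number"]
    if abs(net_d + vat_d - total_d) > Decimal(tolerance):
        return [f"net {net_d} + VAT {vat_d} != total {total_d}"]
    return []
```

`Decimal` is used instead of `float` so that amounts such as 121.00 are compared exactly; a non-empty result would send the document to an operator.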
ABBYY FlexiCapture Engineers On Demand
EasyData has the in-house expertise to assist ABBYY Partners individually in strengthening their market proposition. Suppose you’re handling a project that requires specialized ABBYY knowledge—EasyData provides the solution. Our ABBYY specialists collaborate globally with other ABBYY Partners to ensure successful implementations.
Have questions about the possibilities? We’d be happy to provide further details!
Your successful implementations contribute to the success of us all!