applies to version: 8.2.x; author: Filip Jawień
WEBCON BPS 8.2 significantly improves automatic document recognition and registration.
It introduces a completely new recognition mechanism based on neural networks.
A basic use case for this mechanism is scanning a company’s physical documents and transferring their information into electronic WEBCON BPS forms. Working together with an OCR engine, the system converts the scanned images into text and then uses neural networks to recognize specific data and phrases.
- WEBCON BPS 8.2
- ABBYY FineReader 11 with an active license – the component responsible for generating the text layer. For recognition to work as intended, the text layer must be generated by ABBYY FineReader 11.
- OCR AI Project provided by WEBCON – the component responsible for recognizing and reading text, then entering the retrieved data into the configured form fields.
To complete the installation, you need the folder containing the ABBYY FineReader 11 installation files. Place these files in a subfolder named ABBYY inside the WEBCON BPS installation folder.
After launching the WEBCON Business Process Suite Install wizard, choose the advanced installation, select the necessary WEBCON BPS elements, and additionally select ABBYY FineReader Engine 11.0.
When all WEBCON BPS components have been installed successfully, an additional window for ABBYY FineReader 11 will appear on the screen.
To correctly install the FineReader 11 component, a product key is required.
There are two types of key:
- Stand-alone – a USB key which must be plugged into the machine on which FineReader will be installed. This key supports one file processing service.
- Network license – a USB key which must be plugged into the license server. This key supports multiple services.
Enter a valid product key, as well as the install path for FineReader 11.
Additionally, for the full installation, select the “Install license server” and “Install hardlock key drivers” checkboxes.
The “Use network license” checkbox should be selected only when using a network license. It enables the bottom-most field, where the license server name must be entered.
Check that all fields have been filled out correctly, then click “OK” to proceed with the installation process.
During the installation you will be asked to assign privileges for FineReader 11. If you agree to assign them now, the DCOM application will open. At the very least, privileges must be given to the service account. This is mandatory for FineReader 11 and WEBCON BPS to work together properly.
The license should activate automatically. If you need to manage FineReader 11 license keys manually, follow the steps below.
After FineReader 11 finishes installing, go to the folder where it was installed:
C:\Program Files\FineReader 11\Bin64
Then open the LicenseManager application.
Check whether the installed license is activated. If the licenses field is empty, activate the license by clicking the “Activate license…” button.
Choose the license type again and enter the product serial number. If the number is correct, the activated license should be visible in the License window.
How to configure WEBCON BPS to work with FineReader 11
After successfully installing FineReader 11, it must be correctly configured to work with WEBCON BPS.
In Designer Studio, navigate to the System settings tab, then click Services configuration and select OCR component – FineReader.
Afterwards, click the Services configuration node, and then the Services node to expand the list of all services. Find the desired service and open its configuration, then select the OCR AI checkbox under Service roles.
The Configuration tab is used to define the number of threads for each individual OCR function. For maximum effectiveness, it is recommended to set the number of threads to one more than the number of processor cores.
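The recommendation above can be worked out with a short script. This is only a sketch: the function name is hypothetical, and Python's os.cpu_count() reports logical processors, which may be higher than the number of physical cores on machines with hyper-threading, so the result should be treated as a starting point.

```python
import os


def recommended_ocr_threads(core_count=None):
    """Return the thread count suggested by the article: cores + 1."""
    if core_count is None:
        # Assumption: logical processors stand in for cores here.
        # os.cpu_count() may return None, so fall back to 1.
        core_count = os.cpu_count() or 1
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return core_count + 1


# Example: a 4-core processor gives 5 threads per OCR function.
print(recommended_ocr_threads(4))
```

Passing the core count explicitly is useful when the script is run on a different machine than the one hosting the service.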
To fully utilize the OCR AI functionality, an OCR AI project has to be added. To add one, go to the OCR AI Projects tab in System settings and choose the folder containing the project.
An OCR AI project is the component responsible for recognizing specific document fields. It is a kind of template that shows the system where to look for the chosen fields.
In order to run OCR on documents other than invoices, a new OCR AI project has to be created. If you wish to create a new project, please contact WEBCON.
If the project loaded correctly, click the Save button. Once the configuration is saved, the system is ready to run the recognition process on files using OCR AI.
Sample OCR WorkFlow
The sample workflow consists of 7 steps:
- Registration – start step. Used to fill in basic information and add the attachments that are intended for processing.
- Awaiting for text layer – system step. Waits for the FineReader component to process the files (add the text layer). When the processing is finished, the system will automatically move the workflow to the next step.
- Awaiting for TAX ID recognition – system step. Waits for files to be processed by the OCR AI component. In this particular case, the system looks only for the distinguisher (here set as Seller TAX ID). Based on the recognized value, the system decides whether a custom OCR AI project has to be used in the following step. When the processing is finished, the system will automatically move the workflow to the next step.
- Awaiting for OCR AI – system step, waits for files to be processed by OCR AI. When the processing is finished, the system will automatically move the workflow to the next step.
- Verification – OCR verification step. Allows the user to validate the results retrieved by the OCR AI actions.
- Awaiting for OCR AI learn – system step, waits for the learning process to complete. When this process is concluded, the system will automatically move the workflow to the next step.
- Archive – positive final step
Generate text layer
This action adds files to the Text layer queue. In the example given below, the action processes all supported files (jpg, pdf and tiff) attached to the workflow document. After the text layer is generated, it overwrites the original files.
The action can also be limited to selected files only. Files can be chosen by their type, by the category they belong to, or by a regular expression filter.
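The selection rules described above can be pictured with a small sketch. It is not WEBCON BPS code: the Attachment class and the select_attachments function are hypothetical, the regular expression is assumed to be matched against the file name, and combining the three criteria as optional filters is also an assumption.

```python
import re
from dataclasses import dataclass

# File types named in the article as supported by the action.
SUPPORTED_EXTENSIONS = {"jpg", "pdf", "tiff"}


@dataclass
class Attachment:
    """Hypothetical stand-in for a file attached to a workflow document."""
    name: str
    category: str = ""


def select_attachments(attachments, extensions=SUPPORTED_EXTENSIONS,
                       category=None, name_pattern=None):
    """Return the attachments that pass every filter that was supplied."""
    regex = re.compile(name_pattern) if name_pattern else None
    selected = []
    for att in attachments:
        # Take the text after the last dot as the file type.
        ext = att.name.rsplit(".", 1)[-1].lower() if "." in att.name else ""
        if ext not in extensions:
            continue
        if category is not None and att.category != category:
            continue
        if regex is not None and not regex.search(att.name):
            continue
        selected.append(att)
    return selected


files = [
    Attachment("invoice_01.pdf", "Invoices"),
    Attachment("notes.docx", "Invoices"),
    Attachment("scan.TIFF", "Other"),
]
print([a.name for a in select_attachments(files, category="Invoices")])
```

With no category or pattern given, the function falls back to the behavior of the example configuration and keeps every supported file.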
After the text layer is generated, the action can overwrite the files, create new versions of them, or create new attachments, for which several properties such as name, description and category can be configured.
This action also allows a priority to be set for the files. Priority values range from 1 to 10, where 1 is the highest priority. If the Night OCR option is selected, the priority is set to 11. Such files will only be processed at night or after hours. The exact time window is configured in the schedules section of the system settings.
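The priority rules can be illustrated with a minimal queue sketch. The TextLayerQueue class, its method names, and the night_window_open flag are hypothetical and say nothing about how WEBCON BPS implements its queue. Only the values 1 to 10 and the Night OCR value 11 come from the description above.

```python
import heapq
import itertools

NIGHT_OCR_PRIORITY = 11  # value assigned when the Night OCR option is selected


class TextLayerQueue:
    """Hypothetical queue: a lower number means a higher priority."""

    def __init__(self):
        self._heap = []
        # The counter keeps insertion order among files of equal priority.
        self._counter = itertools.count()

    def add(self, file_name, priority=5, night_ocr=False):
        if night_ocr:
            priority = NIGHT_OCR_PRIORITY
        elif not 1 <= priority <= 10:
            raise ValueError("priority must be between 1 and 10")
        heapq.heappush(self._heap, (priority, next(self._counter), file_name))

    def next_file(self, night_window_open):
        """Return the next file to process, or None if nothing is eligible."""
        if not self._heap:
            return None
        priority, _, file_name = self._heap[0]
        # Night OCR files sort last, so if one is on top only night files remain.
        if priority == NIGHT_OCR_PRIORITY and not night_window_open:
            return None
        heapq.heappop(self._heap)
        return file_name


queue = TextLayerQueue()
queue.add("a.pdf", priority=5)
queue.add("b.pdf", priority=1)
queue.add("c.pdf", night_ocr=True)
print(queue.next_file(night_window_open=False))  # the priority 1 file comes first
```

Because 11 is larger than any regular priority, Night OCR files naturally wait behind all daytime files and are released only when the scheduled window is open.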
Seller TAX ID recognition
This action searches for a “distinguisher” field (in this case: Seller TAX ID). The value of the distinguisher field then tells the system which custom OCR AI project to use on the file for best results.
For this action, we leave the distinguisher field empty, because the distinguisher itself should be recognized by a single custom project.
You may now map the recognized fields to the desired BPS form fields. If the TAX ID field is checked, every non-numerical character will be removed from the TAX ID number.
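The effect of the TAX ID option can be shown with a one-line cleanup function. This is only an illustration of the described behavior. The function name and the sample number are made up.

```python
import re


def normalize_tax_id(raw_value):
    """Remove every character that is not an ASCII digit (0-9)."""
    return re.sub(r"[^0-9]", "", raw_value)


# A made-up number with a country prefix, spaces and dashes.
print(normalize_tax_id("PL 123-456-78-90"))  # 1234567890
```

Stripping prefixes and separators this way makes the same seller produce the same value regardless of how the number is printed on the document.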
This workflow uses two identical actions. The first one leads from the “Awaiting text layer” step to the “TAX ID recognition” step. The second one is used to assess the effectiveness of OCR AI learning and leads from the “Awaiting for Teach OCR AI” step to the “Awaiting for OCR AI recognition” step.
OCR AI recognition step
This action recognizes all the remaining fields. The distinguisher is set as Seller TAX ID, which has already been retrieved by the previous action (TAX ID recognition). The system chooses a custom OCR AI project based on the value of the distinguisher field.
In this case, an OCR AI project that includes all the remaining fields should be used.
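Conceptually, the distinguisher works like a lookup key. The sketch below assumes a simple mapping from Seller TAX ID to a project name. The function, the project names, and the fallback to a general project when no custom project matches are all hypothetical and not taken from WEBCON BPS.

```python
def choose_ocr_project(distinguisher_value, custom_projects, default_project):
    """Pick the OCR AI project for a file from its distinguisher value."""
    # Assumption: when no custom project matches, a general project is used.
    return custom_projects.get(distinguisher_value, default_project)


# Hypothetical mapping from Seller TAX ID to a custom project name.
custom_projects = {"1234567890": "Invoices - Supplier A"}

print(choose_ocr_project("1234567890", custom_projects, "Invoices - general"))
print(choose_ocr_project("0000000000", custom_projects, "Invoices - general"))
```

This is why the TAX ID has to be recognized in a separate, earlier action: the key must be known before the project for the remaining fields can be selected.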
OCR AI learning actions
The sample workflow uses two OCR AI learning actions:
- Learn OCR AI – used to teach the full OCR AI project in order to improve recognition effectiveness.
- Learn TAX ID – used to teach the Seller TAX ID project.
Every teaching action in this example is an “on path” action. To choose the learning mode, the user has to choose the path on which the corresponding action is configured.
OCR Verification step
The OCR Verification step is used to verify and correct data recognized by the OCR AI system.
To use the OCR Verification step, go to the step configuration screen, open the Properties window in the General tab, and check the “OCR Verification step” field.
OCR Verification mode is available only when the attached file has been processed by OCR AI. If OCR AI recognition was omitted, verification mode will not be available.
OCR Verification form
The OCR verification form offers additional features, such as a preview of each fragment of recognized text and checkboxes for choosing the fields to be used for learning.
After you click the preview of the recognized text, the graphic viewer automatically shows that fragment of text enclosed in a red frame. This view lets you judge whether the recognized text is correct (not only the characters, but also the format and spacing).
Revision of recognized data
If the system doesn’t find the correct fragment of text, or finds an incorrect one, the values can be modified.
To enter a missing value, click on the desired word. The text will be automatically entered into the most recently selected form field.
After revising the fields, the user can specify which fields need relearning or improving.
Checkboxes will only be available after entering new values to form fields.
If the configuration is correctly completed, the system is ready to work with OCR AI components.