Portal search engine – how does it work?


Introduction

WEBCON BPS 2019 Portal introduces a brand new method of searching for instances in the database. Search is based on the SOLR engine, which allows quick searches even in very large data collections.

 

Creating search phrases

Because search is based on SOLR technology, the user can influence how the engine searches the indexed data and, as a result, which results are returned. Using the appropriate operators in a query produces more precise results.

 

Below are the basic operators that can be used when creating a search query.

 

 

Action name | Example | How it works
--- | --- | ---
Basic search | Invoice | Returns all instances containing the phrase Invoice, including its linguistic inflections. This means that the results will also include instances with words such as invoices, invoiced, etc.
Word group search | Cost invoice | Returns all instances containing the words Cost and invoice, including their linguistic inflections. Depending on the “All words” option, the results include instances containing all of the provided words (option checked) or at least one of them (option unchecked).
Complete phrase search | "Cost invoice" | Enclosing the phrase in quotation marks returns instances containing that phrase, although linguistic inflections of each of its words are still taken into account. This means that the results will also include phrases such as “Cost invoices”, etc.
Any character string | Doc* | The asterisk (*) replaces any string of characters in the searched phrase. Results can include, for example: Documents, DocReceived, Docertainthings.
Required phrase | +Invoice | The “+” symbol forces the phrase that follows it to be present in the searched data. The results will always include the phrase Invoice.
Excluded phrase | -Cost +Invoice | The “-” symbol forces the phrase that follows it to be absent from the searched data. The results will not include the phrase Cost but will include the phrase Invoice. To use this operator, turn off the “All words” option.

 

The +, - and * operators can be used only outside quotation marks and only with the “All words” option turned off. If a search string starts and ends with quotation marks, for example "cost invoice", the search engine does not interpret special characters between the quotation marks as operators and treats them as values to search for.
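To make the operator rules above concrete, here is a minimal, hypothetical sketch of how a query string could be split into required (+), excluded (-), prefix (*), quoted and plain terms, and then matched against a piece of text. It is not how SOLR or WEBCON BPS parse queries: it ignores linguistic inflections and stopwords, uses plain word comparison, and does not enforce the rule that +, - and * require “All words” to be off. The names `parse_query` and `matches` are illustrative only.

```python
import re
from dataclasses import dataclass, field

# A token is either a quoted phrase or a run of non-space characters.
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')


def words_of(text: str) -> list[str]:
    """Lower-case word tokens; no stemming or stopword removal (toy)."""
    return re.findall(r"\w+", text.lower())


@dataclass
class ParsedQuery:
    required: list[str] = field(default_factory=list)  # +term
    excluded: list[str] = field(default_factory=list)  # -term
    prefixes: list[str] = field(default_factory=list)  # term*
    phrases: list[str] = field(default_factory=list)   # "quoted phrase"
    optional: list[str] = field(default_factory=list)  # plain term


def parse_query(text: str) -> ParsedQuery:
    q = ParsedQuery()
    for m in TOKEN_RE.finditer(text):
        phrase, token = m.group(1), m.group(2)
        if phrase is not None:
            # Characters inside quotes are kept as literal text, not operators.
            q.phrases.append(phrase.lower())
            continue
        token = token.lower()
        if token.startswith("+") and len(token) > 1:
            q.required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            q.excluded.append(token[1:])
        elif token.endswith("*") and len(token) > 1:
            q.prefixes.append(token[:-1])
        else:
            q.optional.append(token)
    return q


def matches(q: ParsedQuery, text: str, all_words: bool = False) -> bool:
    words = words_of(text)
    vocab = set(words)
    padded = " " + " ".join(words) + " "
    if any(t not in vocab for t in q.required):
        return False
    if any(t in vocab for t in q.excluded):
        return False
    # Quoted phrases must appear as consecutive words.
    if any(" " + " ".join(words_of(p)) + " " not in padded for p in q.phrases):
        return False
    checks = [t in vocab for t in q.optional]
    checks += [any(w.startswith(p) for w in words) for p in q.prefixes]
    if not checks:
        return True
    # "All words" checked: every plain/prefix term must match; unchecked: any one.
    return all(checks) if all_words else any(checks)
```

For example, `matches(parse_query("-Cost +Invoice"), "Cost invoice 2019")` is `False`, because the excluded word cost is present.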

The complete list of operators supported by the engine can be found in the SOLR technical documentation: https://lucene.apache.org/solr/guide/7_5/the-standard-query-parser.html
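WEBCON BPS Portal builds its SOLR queries internally and this article does not describe how. For readers who want to try the standard query parser syntax against a SOLR instance of their own, the sketch below builds a `/select` request URL. The base URL, the core name `instances`, the default field `text` and the mapping of “All words” to SOLR's `q.op` parameter (AND when checked, OR when unchecked) are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, user_query: str,
                     all_words: bool, rows: int = 20) -> str:
    """Build a SOLR /select URL that uses the standard query parser."""
    params = {
        "q": user_query,        # passed through unchanged: +, -, * and quotes keep their meaning
        "defType": "lucene",    # the standard query parser
        "df": "text",           # hypothetical default field name
        "q.op": "AND" if all_words else "OR",  # assumed mapping of "All words"
        "rows": rows,
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


print(build_select_url("http://localhost:8983/solr", "instances",
                       "+Invoice -Cost", all_words=False))
# http://localhost:8983/solr/instances/select?q=%2BInvoice+-Cost&defType=lucene&df=text&q.op=OR&rows=20
```

Note that `urlencode` percent-encodes the `+` operator as `%2B`; an unencoded `+` in a URL would be read as a space.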

 

Search mechanics in WEBCON BPS 2019 Portal

Because of the way WEBCON BPS works, search results are pre-processed before they are displayed to the user.

 

Extended search

One of the operating principles of the WEBCON BPS 2019 Portal search engine is the so-called “extended search”. If the query entered by the user returns no results, the system automatically repeats the query, broadening it with the * operator (any character string) appended to the end of the original query. As a result, the user is presented with the results that most closely match what they were looking for.

 

Example A:

The Invoice query returns no results. The system automatically creates a new query, Invoice*, and runs it.

The user is presented with the results of the Invoice* query.

 

Example B:

The Invoice query returns results. The system does not create an additional query.

The user is presented with the results of the Invoice query.
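The extended search fallback in examples A and B can be sketched as follows. `extended_search` takes any search function, runs the query and, only if nothing is found, retries once with a trailing *. The in-memory `make_toy_search` is a hypothetical stand-in for the real engine: it matches whole words, or prefixes for `term*`, and ignores inflections.

```python
from typing import Callable, List, Tuple

SearchFn = Callable[[str], List[str]]


def extended_search(search: SearchFn, query: str) -> Tuple[str, List[str]]:
    """Return (query actually used, results), retrying once with a trailing *."""
    results = search(query)
    if results or query.endswith("*"):
        return query, results          # Example B: no additional query
    broadened = query + "*"            # Example A: Invoice -> Invoice*
    return broadened, search(broadened)


def make_toy_search(titles: List[str]) -> SearchFn:
    """Hypothetical in-memory search: exact word match, or prefix match for term*."""
    def search(query: str) -> List[str]:
        q = query.lower()
        hits = []
        for title in titles:
            words = title.lower().split()
            if q.endswith("*"):
                if any(w.startswith(q[:-1]) for w in words):
                    hits.append(title)
            elif q in words:
                hits.append(title)
        return hits
    return search


print(extended_search(make_toy_search(["Invoices 2019"]), "Invoice"))
# ('Invoice*', ['Invoices 2019'])
```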

 

Ignoring “stopwords”

“Stopwords” are words that the search engine ignores. Words such as conjunctions are not taken into account when searching for phrases.

 

Example:

Either of the queries Invoice as a cost or our cost invoice will return results containing the phrase Cost invoice.

The “stopwords” list for English:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
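A minimal sketch of stopword removal using exactly the English list above. It only shows the filtering step; the real engine applies it as part of its own text analysis, which this article does not detail.

```python
import re

# English stopword list as given in this article.
ENGLISH_STOPWORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def remove_stopwords(query: str) -> list[str]:
    """Lower-case the query, split it into words and drop stopwords."""
    return [w for w in re.findall(r"\w+", query.lower()) if w not in ENGLISH_STOPWORDS]


print(remove_stopwords("Invoice as a cost"))  # ['invoice', 'cost']
```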

 

Search relevance

A query entered by the user searches for all instances that match the specified words. The following instance details are taken into account during the search:

  • Instance ID
  • Instance number
  • System elements names (process name, workflow name, step name, form name)
  • Form field names
  • Form field and item list values
  • Text content of the attachments

Each of these values has a relevance factor assigned to it (instance ID: highest, attachment content: lowest), and the final list of search results is sorted by this factor. If the searched phrase is found in several places (for example, in a form field value and in an attachment's text), the relevance factors are added together.

As a result, the user receives results sorted so that the instances that best match the search query come first.


Example:

The query 456839 can return a number of instances.

The first one on the list will be the instance with ID 456839.

The second will be the instance with number 456839.

Next on the list will be an instance with a form field whose value is 456839.

The last will be an instance with an attached text document containing the number 456839.
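The summed relevance factors and the resulting order in this example can be sketched like this. The weight values are hypothetical: the article gives only their order (instance ID highest, attachment text lowest). Matching is a plain substring check, unlike SOLR's tokenized matching.

```python
# Hypothetical weights: the article gives only the order (instance ID highest,
# attachment text lowest), not the actual values.
WEIGHTS = {
    "instance_id": 6,
    "instance_number": 5,
    "system_element_name": 4,
    "form_field_name": 3,
    "form_field_value": 2,
    "attachment_text": 1,
}


def score(instance: dict, term: str) -> int:
    """Sum the weight of every source in which the term occurs."""
    needle = term.lower()
    total = 0
    for source, weight in WEIGHTS.items():
        values = instance.get(source, [])
        if any(needle in str(v).lower() for v in values):
            total += weight
    return total


def rank(instances: list[dict], term: str) -> list[dict]:
    """Drop non-matching instances and sort the rest by descending score."""
    scored = [(score(inst, term), inst) for inst in instances]
    matched = [pair for pair in scored if pair[0] > 0]
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [inst for _, inst in matched]


instances = [
    {"label": "attachment", "instance_id": [1001], "attachment_text": ["Order ref. 456839"]},
    {"label": "field value", "instance_id": [1002], "form_field_value": ["456839"]},
    {"label": "number", "instance_id": [1003], "instance_number": ["456839"]},
    {"label": "ID", "instance_id": [456839]},
]
print([i["label"] for i in rank(instances, "456839")])
# ['ID', 'number', 'field value', 'attachment']
```

An instance where 456839 appears both in a form field value and in an attachment would score 2 + 1 = 3 and rank above one that matches only in a form field.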

 

Search result display

Search results are displayed as a list, with an option to preview a specific instance or its attachments.

The search results view also lets the user change search parameters, refine the search phrase, narrow down the results, or sort them by search accuracy, creation date or last modification date.

Results are automatically grouped by key tags common to the instances found, and these groups are displayed as filters. Each filter displays the 10 groups containing the highest number of instances. Global choice fields can also appear as filters if they were used on the instances' forms. By clicking a group, the user can narrow down the results, for example to instances created by specific persons or in a specific application. As the results are narrowed down, the system creates further groups, allowing a more precise choice of specific criteria.
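The grouping behaviour described above resembles faceting: count the found instances per tag value, keep the 10 largest groups, and recompute the groups after the user clicks one. A minimal sketch, assuming each instance is a dictionary and using hypothetical tag names `author` and `application`:

```python
from collections import Counter


def build_facets(instances: list[dict], facet_fields: list[str], limit: int = 10) -> dict:
    """Count instances per value of each facet field, keeping the top `limit` groups."""
    facets = {}
    for name in facet_fields:
        counts = Counter(inst[name] for inst in instances if name in inst)
        facets[name] = counts.most_common(limit)
    return facets


def narrow(instances: list[dict], facet_field: str, value: str) -> list[dict]:
    """Keep only instances in the clicked group."""
    return [inst for inst in instances if inst.get(facet_field) == value]


found = [
    {"author": "Anna", "application": "Invoices"},
    {"author": "Anna", "application": "Contracts"},
    {"author": "Tom", "application": "Invoices"},
]
print(build_facets(found, ["author", "application"]))
# {'author': [('Anna', 2), ('Tom', 1)], 'application': [('Invoices', 2), ('Contracts', 1)]}
print(build_facets(narrow(found, "author", "Anna"), ["application"]))
# {'application': [('Invoices', 1), ('Contracts', 1)]}
```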

 

Linguistic inflections

The search engine takes linguistic inflections into account, so the search phrase does not have to be exact. For example, the phrase Invoice will also return results containing words such as invoices, invoiced, etc.
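One common way to make inflected forms match is stemming: reducing each word to a shared stem before comparing. The sketch below is a deliberately naive suffix stripper, shown only to illustrate the idea; it is not the analyzer SOLR or WEBCON BPS uses (the article does not say which is configured), and real stemmers handle far more cases.

```python
# A deliberately naive English suffix stripper, for illustration only.
SUFFIXES = ("ing", "ed", "es", "s", "e")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def same_word_family(a: str, b: str) -> bool:
    return toy_stem(a) == toy_stem(b)


print([toy_stem(w) for w in ["Invoice", "invoices", "invoiced", "invoicing"]])
# ['invoic', 'invoic', 'invoic', 'invoic']
```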

Currently, the system supports inflections for the following languages: Polish, English, German, Spanish, French, Hungarian and Russian.

When searching for values in form fields or instance attachments, inflections are applied according to the database language configured during system installation. However, when searching for system objects (process, workflow or step names), inflections of all the languages listed above are taken into account.
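The rule above can be written down as a small lookup that decides which languages' inflections apply to each kind of searched value. The language codes and source names are hypothetical labels chosen for this sketch.

```python
# Hypothetical language codes for the languages listed in this article.
SUPPORTED_LANGUAGES = ("pl", "en", "de", "es", "fr", "hu", "ru")

SYSTEM_OBJECTS = {"process_name", "workflow_name", "step_name"}
CONTENT_SOURCES = {"form_field_value", "attachment_text"}


def inflection_languages(source: str, database_language: str) -> tuple[str, ...]:
    """Which languages' inflection rules apply to a given kind of searched value."""
    if database_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported database language: {database_language}")
    if source in SYSTEM_OBJECTS:
        return SUPPORTED_LANGUAGES          # all supported languages
    if source in CONTENT_SOURCES:
        return (database_language,)         # language chosen at installation
    raise ValueError(f"unknown source: {source}")
```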

 

Indexing

Data entered into the system by users is indexed in real time, which means it can be searched for almost immediately after it appears in the system. However, a few seconds may pass between the user entering values (saving a form field) and the modified instance being indexed in the SOLR database.
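Because of this short indexing delay, an automated check that saves a value and immediately searches for it may not find it yet. A common workaround is to poll for a few seconds. The sketch assumes a hypothetical `search` callable (the article does not describe a search API); the 10-second default is arbitrary, and `sleep` and `clock` are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, List


def wait_until_searchable(
    search: Callable[[str], List[str]],
    query: str,
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll `search` until it returns results or `timeout` seconds have passed.

    Returns the last result list (empty if the value never became searchable).
    """
    deadline = clock() + timeout
    while True:
        results = search(query)
        if results or clock() >= deadline:
            return results
        sleep(interval)
```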

An index is created for the values of all form fields and item list columns. All text documents attached to forms are also indexed. Apart from text formats such as TXT, RTF, XML, HTML and SQL, documents in the following formats are also indexed: Word (DOC, DOCX, ODT), Excel (XLSX, XLSM, XLS), PDF files with a text layer, and e-mail message files (EML, EMLX, MSG, OFT, MBOX, TNEF).
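As a quick reference, the formats listed above can be expressed as a lookup by file extension. This is only a sketch built from the list in this article: the text group is not exhaustive ("such as"), and whether a PDF has a text layer cannot be determined from its name.

```python
from pathlib import PurePath
from typing import Optional

# Extensions listed in this article, grouped as the article groups them.
INDEXED_FORMATS = {
    "text": {"txt", "rtf", "xml", "html", "sql"},  # examples only, not exhaustive
    "word": {"doc", "docx", "odt"},
    "excel": {"xlsx", "xlsm", "xls"},
    "pdf": {"pdf"},  # only PDFs with a text layer; cannot be told from the name
    "e-mail": {"eml", "emlx", "msg", "oft", "mbox", "tnef"},
}


def attachment_format(filename: str) -> Optional[str]:
    """Return the format group for a file name, or None if it is not listed."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    for group, extensions in INDEXED_FORMATS.items():
        if ext in extensions:
            return group
    return None


print(attachment_format("Invoice_2019.DOCX"))  # word
print(attachment_format("scan.jpg"))           # None
```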
