Using regular expressions

Applies to version 8.1.x; Author Maciej Jarzębkowski

Brief introduction to RegEx

Regular expressions (RegEx for short) are a very precise way of filtering text. They are often used in scripts and for filtering logs, as well as for selecting multiple values from a list. A RegEx search is often much faster, and shorter to write, than a similar search coded by hand in a language such as JavaScript.

A RegEx is a pattern describing a string of characters, which makes it useful for finding known, recurring character patterns. In WEBCON BPS, RegEx can be used to filter e-mail attachments in HotMailBoxes.


 

System settings

Depending on how we want to control the flow of messages, we can change the settings for every mailbox individually.

Go to the HotMailBoxes configuration in System settings.


 

In Connection settings, define connection parameters to MS Exchange and to the chosen inbox. For presentation purposes I’ve used my own inbox.


 

It is a good idea to create 3 separate folders (Source, Error and Archive). In certain circumstances, not having these 3 folders may cause issues, such as messages being processed in a loop and flooding the inbox.


 

Next, go to the section that defines the Working mode of this inbox. We can choose one of the following modes:

  • Start one workflow per email
  • Start one workflow per attachment
  • Join to element by barcode found in attachment
  • Join to element with ID found in email content

Now we can use RegEx to select which attachments will be processed.

 

Syntax Basics

We will use RegEx to select attachments. The default value, All, is represented by the following expression: .*\..*

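This somewhat odd formula is interpreted as follows:
.   – Any single character
*  – Zero or more repetitions of the preceding element, so .* means any number of characters (including none)

\. – Exactly one period (full stop) character; the backslash makes the period literal instead of meaning "any character". In this context, it requires the file name to contain a period, i.e. some sort of file extension.

Then .* is used again to match the extension itself. Because .* also matches an empty string, this does not strictly force the extension to have any characters (a name ending in a period is accepted); to require at least one character, use .*\..+ instead. Names without any period, such as folder names, are not matched at all. On Windows, files almost always have an extension (even if Explorer hides it), but files without an extension are possible, and more common on other systems. Such files will not pass the default filter, which may be a potential cause of problems.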
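To see what the default filter actually accepts, the short check below uses Python's re module. Python is only a stand-in here: WEBCON BPS is a .NET product and presumably uses the .NET regex engine, which may differ in edge cases. The sketch also assumes (an assumption, not documented behavior) that HotMailBoxes requires the pattern to match the whole attachment name, which is what re.fullmatch does; the names DEFAULT_ALL, NON_EMPTY_EXT and accepts() are hypothetical.

```python
import re

DEFAULT_ALL = r".*\..*"     # the default "All" filter
NON_EMPTY_EXT = r".*\..+"   # variant: at least one character must follow a period


def accepts(pattern: str, name: str) -> bool:
    """True if `pattern` matches the whole file name (assumed HotMailBoxes behavior)."""
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    for name in ["report.pdf", "photo.tar.gz", "README", "archive."]:
        print(f"{name:14} default={accepts(DEFAULT_ALL, name)} "
              f"non_empty_ext={accepts(NON_EMPTY_EXT, name)}")
```

A name such as "archive." passes the default filter but not the .*\..+ variant, while "README" (no period) passes neither.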

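RegEx string – Notes
Your text – Matches exactly that text
[a-z] – Any one lower case letter (a range of characters)
[A-Z] – Any one upper case letter
[0-9] – Any one digit
{n}, {n,m}, {n,} – Number of repetitions of the preceding element: exactly n, from n to m, or at least n; e.g. [0-9]{2,5} means from 2 to 5 digits. To allow either 2 to 5 or exactly 9 digits, combine two alternatives: ([0-9]{2,5}|[0-9]{9})
. – Any one character
* – Zero or more repetitions of the preceding element; .* matches any string of characters, including an empty one
+ – One or more repetitions of the preceding element; .+ matches any string of at least one character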
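The repetition rules above can be tried out in Python's re module, used here only as a stand-in for the engine WEBCON BPS uses. The helper full() is a hypothetical name that checks whether a pattern matches the whole string. Note that a range of repetitions is written with a comma, as in {2,5}; with a hyphen, as in {2-5}, Python's re (like many other engines) no longer sees a quantifier and treats the braces as ordinary characters.

```python
import re


def full(pattern: str, text: str) -> bool:
    """True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


CASES = [
    (r"[0-9]{2,5}", "123"),                     # 2 to 5 digits: True
    (r"[0-9]{2,5}", "123456"),                  # 6 digits: False
    (r"(?:[0-9]{2,5}|[0-9]{9})", "123456789"),  # 2-5 digits OR exactly 9: True
    (r"[a-z]+", "invoice"),                     # + needs at least one letter: True
    (r"[a-z]+", ""),                            # ...so an empty string fails: False
    (r"[a-z]*", ""),                            # * also accepts no characters: True
    (r"[0-9]{2-5}", "12"),                      # hyphen is not range syntax: False
    (r"[0-9]{2-5}", "1{2-5}"),                  # the braces became literal text: True
]

if __name__ == "__main__":
    for pattern, text in CASES:
        print(f"{pattern!r:30} {text!r:12} {full(pattern, text)}")
```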


More information about complex combinations can be found online, e.g. at http://www.regexr.com/.


RegEx examples

We have the following list of attachments:

image005.jpg
image008.jpg
image009.jpg
image010.jpg
image011.jpg
Webcon_Some_Mail.eml
2015_09_029_INVOICE_MULTI.pdf

 

Here are some examples of RegEx and what they will filter:

image[0-9]{3}\.... – Selects files containing “image” followed by 3 digits, a period and exactly 3 more characters (\. defines exactly one period, and ... defines 3 characters of any kind). It will therefore reject files with 4-character extensions, such as .jpeg.
image[0-9]{3}\..* – Selects files containing “image” followed by 3 digits. Accepts any file extension, even an empty one.
image[0-9]{3,8}\..+ – Selects files containing “image” followed by between 3 and 8 digits. Accepts any file extension, as long as it is at least 1 character long.
.*\.eml – Selects all attached e-mails, regardless of their names.
.*\.pdf – Selects all attached PDF files, regardless of their names.
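The sketch below applies these patterns to the attachment list above using Python's re module, written with the standard {3,8} range syntax and \..+ for "at least one extension character". It assumes, as the descriptions imply, that a pattern must match the whole file name (re.fullmatch), and it adds two hypothetical names, image005.jpeg and image12345.png, purely to show where the patterns differ. Matching here is case-sensitive, so a file named SCAN.PDF would not be selected by .*\.pdf; whether HotMailBoxes ignores case is not stated in this article.

```python
import re

# Attachment names from the example above, plus two hypothetical extra names
# (image005.jpeg, image12345.png) added only to show where the patterns differ.
ATTACHMENTS = [
    "image005.jpg", "image008.jpg", "image009.jpg", "image010.jpg", "image011.jpg",
    "Webcon_Some_Mail.eml", "2015_09_029_INVOICE_MULTI.pdf",
    "image005.jpeg", "image12345.png",
]

PATTERNS = [
    r"image[0-9]{3}\....",   # "image", 3 digits, a period, exactly 3 more characters
    r"image[0-9]{3}\..*",    # "image", 3 digits, a period, any extension (even empty)
    r"image[0-9]{3,8}\..+",  # "image", 3-8 digits, a period, at least 1 more character
    r".*\.eml",              # any attached e-mail
    r".*\.pdf",              # any PDF (case-sensitive: ".PDF" would not match)
]


def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names the pattern matches in full (assumed HotMailBoxes behavior)."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]


if __name__ == "__main__":
    for p in PATTERNS:
        print(f"{p:22} -> {select(p, ATTACHMENTS)}")
```

The first pattern selects only the five .jpg images, the second also accepts image005.jpeg, and only the third accepts image12345.png.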

 

Testing your RegEx formula

One of the easiest ways of testing the effectiveness of your filter is to try it on http://www.regexr.com/. The site also contains interesting examples, e.g. how to identify palindromes. If you prefer to test locally on your own computer, you can try your regular expressions in Notepad++.
Poorly constructed regular expressions, typically ones with nested quantifiers such as (a+)+, can take exponentially long to evaluate on certain inputs (so-called catastrophic backtracking), causing an effect similar to a DoS (Denial-of-Service) attack. Microsoft SDL RegEx Fuzzer is a useful tool for checking your RegEx for such vulnerabilities.

 

What else can we configure?

We can set a certain filename requirement for the .eml (e-mail content) file to be uploaded.

 

If we selected the “Start one workflow per email” mode, we can define a naming scheme for the created attachments. For example, we may want the name to contain the date and time of creation. It is always a good idea to add %FileName% at the end, if only to make sure that the attachment keeps the correct file extension.
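Why %FileName% belongs at the end can be shown with a small model of template expansion. This is a sketch, not the HotMailBoxes implementation: %FileName% is the placeholder named above, while %DateTime%, its timestamp format and the function build_attachment_name() are hypothetical and exist only for this illustration.

```python
from datetime import datetime
from pathlib import PurePath


def build_attachment_name(template: str, file_name: str, created: datetime) -> str:
    """Expand a naming template (hypothetical model of the HotMailBoxes setting).

    %FileName% is the placeholder mentioned in the article; %DateTime% is a
    made-up placeholder used only for this sketch.
    """
    stamp = created.strftime("%Y%m%d_%H%M%S")
    return template.replace("%DateTime%", stamp).replace("%FileName%", file_name)


if __name__ == "__main__":
    created = datetime(2015, 9, 29, 10, 15, 0)
    with_name = build_attachment_name("%DateTime%_%FileName%", "invoice.pdf", created)
    without_name = build_attachment_name("Invoice_%DateTime%", "invoice.pdf", created)
    print(with_name, repr(PurePath(with_name).suffix))        # extension preserved
    print(without_name, repr(PurePath(without_name).suffix))  # extension lost
```

Ending the template with the original file name carries its extension (.pdf) over; a template without it produces a name with no extension at all.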

If more than one attachment is detected during workflow startup, we can either add all of them to the started element or return an error, in which case the email is redirected to the (…)/Error directory (which can be a separate HotFolder). Emails sent to the Error folder always keep all of their attachments.

 

Advanced settings

We can also set some restrictions in the Advanced settings tab. From here we can use a data source to limit which e-mails are accepted, e.g. only those from the company's clients.
Limit of emails to download per day – A daily limit; e.g. if only the first 10 people to reply win a prize draw, a limit of 10 ensures that only their e-mails are processed.

Limit of emails to download in single iteration – This parameter helps solve performance issues (faster servers can handle a larger value). As with workflow elements, it is sometimes better to start a few at a time, so that the servers don't run out of memory.

Number of days from the sent date after which emails will be ignored – e.g. e-mails older than 30 days will be ignored.

Setting a parameter to zero means no limits.
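To make the interplay of these three limits concrete, here is a small Python model. It is a sketch under stated assumptions, not how the HotMailBoxes service is implemented: the names Email and pick_batch() are hypothetical, e-mails are assumed to be processed in inbox order, and the count of e-mails already downloaded today is passed in explicitly. As in the settings, 0 means no limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Email:
    subject: str
    sent: datetime


def pick_batch(inbox, downloaded_today, per_day, per_iteration, max_age_days, now):
    """Return the e-mails a single polling iteration would download.

    Hypothetical model of the three limits; 0 means "no limit" for each of them.
    """
    candidates = list(inbox)
    if max_age_days:
        # Ignore e-mails sent more than `max_age_days` days ago
        cutoff = now - timedelta(days=max_age_days)
        candidates = [m for m in candidates if m.sent >= cutoff]
    count = len(candidates)
    if per_day:
        # Only what is left of today's quota may still be downloaded
        count = min(count, max(per_day - downloaded_today, 0))
    if per_iteration:
        # Cap the batch size to keep memory usage under control
        count = min(count, per_iteration)
    return candidates[:count]


if __name__ == "__main__":
    now = datetime(2015, 10, 1, 12, 0)
    inbox = [Email(f"entry {i}", now - timedelta(hours=i)) for i in range(15)]
    inbox.append(Email("old entry", now - timedelta(days=45)))

    # Prize draw: 10 e-mails per day, 4 per iteration, ignore anything older than 30 days
    pending, downloaded = list(inbox), 0
    while batch := pick_batch(pending, downloaded, 10, 4, 30, now):
        print([m.subject for m in batch])
        downloaded += len(batch)
        pending = [m for m in pending if m not in batch]
    print("downloaded today:", downloaded)
```

With these settings the demo downloads batches of 4, 4 and 2 e-mails and then stops at the daily limit of 10; the 45-day-old e-mail is never picked up.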

Similarly, we can define the behavior for the various “Main modes” and decide what should happen when an error occurs.

 

Implementing changes

Remember that the service must be restarted every time changes are made; this loads and applies the new HotMailBoxes settings.


The easiest way to do this is via Control Panel -> Administrative Tools -> Services.

 
