Using word boundaries in rules: Avoid matching text that is part of another word
Answer ID 2851   |   Last Review Date 12/18/2018

A string that I wish to match in a regular expression is also part of another word that I do not wish to match.

Environment:

Business Rules
All versions

Resolution:

You may want to use rules to find a character string and then perform some action (e.g. queue routing, assignment, or setting a status).  A regular expression operator is highly recommended over the "Contains" operator for this.  For instance, if you were looking for the words "jump", "jumped", or "jumping" in incoming emails, you could put the string 'jump' into the regular expression condition and find all of them.
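The behavior of an unanchored pattern can be sketched outside the product. The example below uses Python's `re` module purely as a stand-in; it is not the RightNow engine, and the sample subject lines are made up for illustration.

```python
import re

# Unanchored, case-insensitive pattern: it matches "jump" anywhere in the
# text, including inside longer words such as "jumped" or "jumping".
pattern = re.compile(r"jump", re.IGNORECASE)

# Hypothetical incoming subject lines, for illustration only.
subjects = [
    "Jump to it",
    "He jumped the queue",
    "Stop jumping",
    "No match here",
]

# Keep every subject in which the pattern is found at least once.
matched = [s for s in subjects if pattern.search(s)]
print(matched)
```

Only the last subject is left out, because it does not contain the four letters "jump" in any form.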

In other cases (e.g. spam or profanity rules), the string that you wish to filter also shows up as part of another word that you do not wish to filter.  For instance, the words "assistance" or "assume" would be filtered if a profanity rule used a regular expression containing only the first three letters of these words.  This type of spam/profanity rule is common and causes many valid emails (containing the words "assistance" or "assume") to be filtered.

How do we get around this?  The regular expression needs a word boundary around the string.  With a word boundary, the regex engine looks for the string that you specify, but matches it only if it is bounded on both sides by a "non-word character" (e.g. a space, a dash, or a special character such as !$%^.<>*|()?/\ : ;) or by the start or end of the text.  Please note that alphanumeric characters and underscores are considered word characters.

A word boundary allows you to effectively match only the regular expression string, but not match other words that contain that character string (e.g. not match "assistance").
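As a rough illustration of the difference a word boundary makes, the sketch below compares a bare pattern with a bounded one. It uses Python's `re` module and its `\b` token as an analogue; this is not the RightNow syntax, and the term "cat" and the sample texts are hypothetical.

```python
import re

term = "cat"  # hypothetical blocked term, for illustration only

# Bare pattern: matches the letters anywhere, even inside other words.
unbounded = re.compile(re.escape(term), re.IGNORECASE)

# Bounded pattern: \b is Python's word-boundary token (an analogue of the
# product's boundary markers, not the same syntax).
bounded = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

samples = [
    "Wrong category selected",   # term is part of a longer word
    "My cat-sitter cancelled",   # a dash is a non-word character
    "cat",                       # start and end of text count as boundaries
    "cat_food order",            # underscore is a word character
]

unbounded_hits = [s for s in samples if unbounded.search(s)]
bounded_hits = [s for s in samples if bounded.search(s)]

print(unbounded_hits)
print(bounded_hits)
```

The bare pattern hits all four samples, while the bounded one keeps only the dash-separated and stand-alone cases; "category" and "cat_food" are skipped because the term is followed by a word character.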

The format of a word boundary for the RightNow regular expressions engine looks as follows:

                [[:<:]]word_here[[:>:]]

For users of the May 2017 version of the product, the word boundary is:

               \<word_here\>

NOTE: There are no spaces within or on either side of the word.

The RightNow product uses the "ereg -i" regular expression engine.  The -i option makes regular expression pattern matching case insensitive (i.e. upper and lower case match the same).  This means that the regular expression [[:<:]]jump[[:>:]] matches Jump, JUMP, JuMp, etc.  Because of the word boundary tags around jump, it would not match "jumped": in that word, the 'p' in jump is immediately followed by the word character 'e'.  In the profanity example above, the words "assistance" or "assume" would not be filtered if you put the first three letters inside a word boundary in a profanity rule.
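For a quick check away from the product, one option is to translate the boundary markers into an approximate Python equivalent. This is only a sketch: the helper `to_python_pattern` and the two constants are hypothetical names, Python's `re` module is not the "ereg -i" engine, and the two engines may differ on other syntax, so results here do not replace testing the rule itself.

```python
import re

# Zero-width lookarounds that approximate start-of-word and end-of-word.
START_OF_WORD = r"(?<!\w)(?=\w)"
END_OF_WORD = r"(?<=\w)(?!\w)"


def to_python_pattern(rule_regex: str) -> re.Pattern:
    """Hypothetical helper: rewrite the boundary markers for Python's re."""
    translated = (
        rule_regex
        .replace("[[:<:]]", START_OF_WORD)
        .replace("[[:>:]]", END_OF_WORD)
        .replace(r"\<", START_OF_WORD)
        .replace(r"\>", END_OF_WORD)
    )
    # IGNORECASE mimics the -i option; ASCII limits \w to letters, digits
    # and underscore (an assumption about what the engine treats as a word).
    return re.compile(translated, re.IGNORECASE | re.ASCII)


pattern = to_python_pattern("[[:<:]]jump[[:>:]]")

# Map each sample text to whether the bounded pattern finds a match in it.
results = {
    text: bool(pattern.search(text))
    for text in ["Jump", "JUMP,JuMp", "jumped", "jumping"]
}
print(results)
```

Under these assumptions, the first two samples match and "jumped" and "jumping" do not, which mirrors the behavior described above.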

To help ensure that you are not filtering valid emails/updates unintentionally, it is highly recommended that you test your regular expressions once you have put them in place.  Additionally, it is very important to have an address in the "Send Rejected Messages to" field on the incoming tab of the mailbox settings.  More on this can be found in the following answer:

Answer ID 1976: Accessing discarded email messages

For more information on regular expressions, please refer to Answer ID 861: Rule is matching an incorrect expression.