
Assembling a regular expression

 

Prerequisite

  • Access to the File > Recognition profile (CAP004) menu.

 

Introduction

When configuring recognition profile indexes, it is possible to use regular expressions.

A regular expression is a notation for describing a pattern of characters. It is used to validate data input or to search for and extract information from text. For example, to verify whether an entered value is a number from 0.00 to 9.99, it is possible to use the regular expression ^\d\.\d\d$, in which the \d symbol is a wildcard character that matches one digit and \. matches a literal period (an unescaped . would match any character).

The ^ and $ special characters indicate, respectively, where the string must start and end; without them, the numbers 10.00 or 100.123 would also be accepted, because each of them contains a sequence of characters that matches the regular expression.
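The effect of the anchors can be checked with a short sketch. It uses Python's re module purely for illustration; the engine behind the recognition profile is not stated in this article and may differ in details. The helper name is_valid is hypothetical, and a period is assumed as the decimal separator.

```python
import re

# Anchored: the whole string must be one digit, a period, and two digits.
anchored = re.compile(r"^\d\.\d\d$")
# Unanchored: the same pattern may match anywhere inside the string.
unanchored = re.compile(r"\d\.\d\d")


def is_valid(pattern, text):
    """Return True if the pattern is found somewhere in the text."""
    return pattern.search(text) is not None


for value in ["0.00", "9.99", "10.00", "100.123"]:
    print(value, is_valid(anchored, value), is_valid(unanchored, value))
```

With the anchors, only 0.00 and 9.99 are accepted; without them, all four values are accepted.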

A metacharacter is a character or a sequence of characters with special meaning in the regular expressions. Metacharacters can be separated according to their use.

In regular expressions, the verb 'match' means that a piece of text fits the pattern described by the expression.

 

Specifiers

They specify the set of characters that can be matched in a position.

Metacharacter Description

.

Wildcard: Matches any character except the \n line break.

[…]

Set: Matches any single character listed in the set. For example: 
• [a-z] matches one lowercase character between 'a' and 'z', while [A-Z] matches one uppercase character between 'A' and 'Z'. 
• [abcABC] matches one of the characters 'a', 'b', 'c', 'A', 'B' or 'C'. 
• [123] matches one of the characters '1', '2' or '3'. 
• [0-9] matches one character between '0' and '9'.

[^...]

Denied set: Matches any character that is not included in the set.

\d

Digit: the same as [0-9].

\D

Non-digit: the same as [^0-9].

\s

Whitespace character: space, line break, tab, etc.; the same as [ \t\n\r\f\v].

\S

Non-whitespace character: the same as [^ \t\n\r\f\v].

\w

Alphanumeric: the same as [a-zA-Z0-9_] (but may include Unicode characters).

\W

Non-alphanumeric: the complement of \w.

\

Escape: cancels the special meaning of a metacharacter; for example, \. represents only a literal period, and not the wildcard character.
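The specifiers in the table above can be tried one character at a time. This is a minimal sketch in Python's re module, used here only for illustration (the product's own engine may differ, for example in how \w treats Unicode); the helper name matches and the sample characters are hypothetical.

```python
import re


def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


checks = [
    (r".", "x"),       # wildcard: any single character...
    (r".", "\n"),      # ...except a line break, so this one fails
    (r"[a-z]", "q"),   # set: one lowercase letter
    (r"[a-z]", "Q"),   # fails: uppercase is outside the set
    (r"[^0-9]", "7"),  # denied set: fails because '7' is a digit
    (r"\d", "7"),      # digit
    (r"\D", "7"),      # non-digit: fails
    (r"\s", " "),      # whitespace includes the space character
    (r"\w", "_"),      # the underscore counts as a word character
    (r"\W", "-"),      # complement of \w
    (r"\.", "x"),      # escaped period: fails, only a literal '.' matches
    (r"\.", "."),
]

results = [matches(pattern, text) for pattern, text in checks]
for (pattern, text), ok in zip(checks, results):
    print(repr(pattern), repr(text), ok)
```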
 
 

Quantifiers

They define how many times the element immediately before them may be repeated.

Metacharacter Description

{n}

Allows exactly n occurrences. For example: 
• [abc]{3}: Accepts strings of exactly 3 characters, each one being 'a', 'b' or 'c', such as: aaa, abc, acb, bba, etc. 
• [0-9]{5}: Accepts strings of exactly 5 characters between '0' and '9', such as: 11111, 12345, 15973, etc.

{n,m}

Allows at least n occurrences and at most m. For example: 
• [abc]{3,5}: Accepts strings of 3 to 5 characters, each one being 'a', 'b' or 'c', such as: aaaaa, acbca, abc, acba, etc. 
• [0-9]{5,6}: Accepts strings of 5 or 6 characters between '0' and '9', such as: 12345, 123456, 01030, 000000, etc.

{n,}

Allows at least n occurrences. For example: 
• [abc]{2,}: Accepts strings of at least 2 characters, each one being 'a', 'b' or 'c', such as: aa, abc, ccc, abcabc, etc. 
• [0-9]{2,}: Accepts strings of at least 2 characters between '0' and '9', such as: 12, 123, 987654321, etc.

?

Allows 0 or 1 occurrence; the same as {0,1}.

+

Allows 1 or more occurrences; the same as {1,}.

*

Allows 0 or more occurrences; the same as {0,}.
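The quantifiers above can be verified with whole-string checks. The sketch below again assumes Python's re module as a stand-in for the product's engine; the helper name accepts and the sample word "colour" are hypothetical.

```python
import re


def accepts(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(accepts(r"[abc]{3}", "abc"))        # exactly 3 characters from the set
print(accepts(r"[abc]{3}", "abcd"))       # 4 characters: rejected
print(accepts(r"[0-9]{5,6}", "123456"))   # 5 or 6 digits
print(accepts(r"[0-9]{5,6}", "1234"))     # too short: rejected
print(accepts(r"[abc]{2,}", "abcabc"))    # at least 2 characters
print(accepts(r"colou?r", "color"))       # ? makes the 'u' optional
print(accepts(r"colou?r", "colour"))
print(accepts(r"\d+", ""))                # + needs at least one digit
print(accepts(r"\d*", ""))                # * also accepts an empty string
```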
 
 

Anchors

They establish reference positions for matching the remainder of the regular expression. Notice that these metacharacters do not match characters in the text, but rather positions before, after, or between characters.

Metacharacter Description

^

Matches the beginning of a string.

$

Matches the end of a string; does not capture the \n at the end of the text or line.

\A

Beginning of the text.

\Z

End of the text.

\b

Boundary position: Matches at the beginning or end of a word, that is, between a \w character and a \W character (or the start or end of the text).

\B

Non-boundary position: Matches at any position that is not a word boundary.
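Because anchors match positions rather than characters, their effect is easiest to see by listing where matches start. This is an illustrative sketch in Python's re module (an assumption; the product's engine may treat \Z and line breaks slightly differently), with a made-up sample text.

```python
import re

text = "cat concatenate cat"

# \b marks a word boundary, so only the standalone words are found.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B requires a non-boundary position on both sides,
# so only the "cat" inside "concatenate" is found.
inside_words = [m.start() for m in re.finditer(r"\Bcat\B", text)]

# ^ and \A anchor at the beginning; $ and \Z anchor at the end.
starts_with_cat = re.search(r"^cat", text) is not None
starts_with_cat_A = re.search(r"\Acat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None
ends_with_cat_Z = re.search(r"cat\Z", text) is not None
ends_with_dog = re.search(r"dog$", text) is not None

print(whole_words, inside_words)
print(starts_with_cat, starts_with_cat_A, ends_with_cat, ends_with_cat_Z, ends_with_dog)
```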
 
 

Grouping

It defines groups or alternatives.

Metacharacter Description

(...)

Defines a group, for the purpose of applying a quantifier, alternative or later extraction or reuse.

...|...

Alternative; matches either the regular expression on its left or the one on its right.

\«n»

Backreference: matches the same text that was matched by the nth group; for example, \1 refers to the first group.
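The three grouping constructs can be sketched as follows. Python's re module is assumed only for illustration, and the sample strings (a month/year value and a sentence with a repeated word) are hypothetical.

```python
import re

# Group with a quantifier: "ab" repeated exactly twice.
repeated = re.fullmatch(r"(ab){2}", "abab") is not None

# Alternative: either side of | may match.
either = [w for w in ["cat", "dog", "cow"] if re.fullmatch(r"cat|dog", w)]

# Groups for later extraction: each pair of parentheses captures a part.
m = re.fullmatch(r"(\d{2})/(\d{4})", "07/2024")
month, year = m.group(1), m.group(2)

# Backreference: \1 must repeat whatever the first group matched,
# which finds a word typed twice in a row.
dup = re.search(r"\b(\w+) \1\b", "this is is a test")
duplicated_word = dup.group(1)

print(repeated, either, month, year, duplicated_word)
```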
 
 

 

Examples of regular expressions

The following examples illustrate the concepts above, each with a brief explanation:

Example 1

Regular expression:

\d{5}-\d{3}

 

The pattern of a zip code like 05432-001: 5 digits, a - (hyphen), and 3 more digits. 

The sequence \d is a metacharacter, a wildcard character that matches a digit (0 to 9). 

The sequence {5} is a quantifier: it indicates that the previous pattern must be repeated 5 times, so \d{5} is the same as \d\d\d\d\d.
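As a sketch of how this pattern can be used to search and extract, the snippet below pulls zip codes out of a sentence. Python's re module is an assumption made for illustration, and the sample sentence and its second zip code are invented.

```python
import re

zip_pattern = re.compile(r"\d{5}-\d{3}")

# Hypothetical sample text containing two zip codes.
text = "Ship to 05432-001 or, as a fallback, to 01310-100."
found = zip_pattern.findall(text)

# The quantified form and the written-out form accept the same input.
written_out = re.fullmatch(r"\d\d\d\d\d-\d\d\d", "05432-001") is not None

print(found, written_out)
```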

 
 

Example 2

Regular expression:

[012]\d:[0-5]\d

 

It matches the hour and minute format, such as 03:10 or 23:59. 

The sequence between brackets [012] defines a set. In this case, the set specifies that the first character must be 0, 1, or 2.

Inside the [], the hyphen indicates a range of characters; that is, [0-5] is a short form for the set [012345]. The set that represents all the digits, [0-9], is the same as \d.

Notice that this regular expression also accepts the text 29:00, which is not a valid time.
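The limitation can be reproduced, and one possible way around it sketched, as shown below. The stricter pattern is not part of this article; it is a hypothetical variant that uses grouping and an alternative to limit the hour to 00 through 23. Python's re module is assumed for illustration.

```python
import re

loose = re.compile(r"[012]\d:[0-5]\d")

# Hypothetical stricter variant (an assumption, not from the article):
# hours 00-19 via [01]\d, or hours 20-23 via 2[0-3].
strict = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def check(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None


for value in ["03:10", "23:59", "29:00", "24:00"]:
    print(value, check(loose, value), check(strict, value))
```

Both patterns accept 03:10 and 23:59, but only the loose one accepts 29:00 and 24:00.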

 
 

Example 3

Regular expression:

[A-Z]{3}-\d{4}

 

It is the standard for a license plate in Brazil: three letters from A to Z, followed by a - (hyphen), followed by four digits, such as CKD-4592.
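A validation sketch for this pattern follows. It assumes Python's re module and a hypothetical helper named is_plate; note that [A-Z] is case-sensitive here, so lowercase letters are rejected unless the engine is configured to ignore case.

```python
import re

plate = re.compile(r"[A-Z]{3}-\d{4}")


def is_plate(text):
    """Return True if the whole text has the form AAA-9999."""
    return plate.fullmatch(text) is not None


for value in ["CKD-4592", "ckd-4592", "CKD4592", "CK-45921"]:
    print(value, is_plate(value))
```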

 
 

 

Conclusion

After assembling the regular expression, it is possible to apply it to the profile index configurations.

