In Chain Builder, you can use regular expression (regex) operators to match characters in text strings, for example to define patterns for:
- Mapping transformation rules for a Data Prep connector pipeline
- The File Utilities connector's Find, Find and replace, and Split file commands
- The Tabular Transformation connector's Column filter, Filter rows, Find and replace, Join columns, and Smart filter rows commands
Common operators
To define patterns to match, you can use these common operators:
Operator | Description | Example | Returns |
---|---|---|---|
`^` | Matches the beginning of a string | `^abc` | `abc`, `abcdef...`, `abc123` |
`$` | Matches the end of a string | `abc$` | `my:abc`, `123abc`, `theabc` |
`.` | Matches any single character as a wildcard | `a.c` | `abc`, `asc`, `a1c` |
`\|` | An OR character | `abc\|xyz` | `abc` or `xyz` |
`(...)` | Captures values in the parentheses | `(a)b(c)` | `a` and `c` |
`[...]` | Matches any one character within the brackets | `[abc]` | `a`, `b`, or `c` |
`[a-z]` | Matches lowercase characters between a and z | `[b-z]` | `bc`, `mind`, `xyz` |
`[0-9]` | Matches any digit between 0 and 9 | `[0-3]` | `3201` |
`{x}` | The exact number of times to match | `(abc){2}` | `abcabc` |
`{x,}` | The minimum number of times to match | `(abc){2,}` | `abcabcabc` |
`*` | Matches the character before the `*` zero or more times, as a "greedy" match | `ab*c` | `ac`, `abc`, `abbc` |
`+` | Matches the character before the `+` one or more times | `a+c` | `ac`, `aac`, `aaac` |
`?` | Matches the character before the `?` zero or one times, or a "non-greedy" match | `ab?c` | `ac`, `abc` |
`\` | Escapes the character after the `\`, or creates an escape sequence | `a\.c` | `a.c`, with the `\.` matching a literal period |
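
As a rough illustration of the capture operator from the table, the sketch below uses Python's `re` module. This is an assumption made for demonstration only: Chain Builder's own regex engine may differ in details, and the sample string is hypothetical.

```python
import re

# Assumption: Python's re module stands in for Chain Builder's regex engine.
match = re.search(r"(a)b(c)", "xxabcxx")

# groups() returns the text captured by each pair of parentheses, in order.
captured = match.groups() if match else ()
print(captured)  # ('a', 'c')
```

The `b` between the two groups must be present for the pattern to match, but it is not captured.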
To use an operator's literal character within a pattern, not as regex:
- For a circumflex (`^`), period (`.`), open bracket (`[`), dollar sign (`$`), open or close parenthesis (`(` or `)`), pipe (`|`), asterisk (`*`), plus sign (`+`), question mark (`?`), open brace (`{`), or backslash (`\`), precede it with the escape operator (`\`).
- For an end bracket (`]`) or end brace (`}`), make it the first character within the brackets, with or without an opening `^`.
- For a dash (`-`), make it the first or last character within the brackets, or the second endpoint of a range.
Tip: All characters within brackets are taken literally, and not as regex operators. For example, `[*\+?{}.]` matches any of the literal characters within the brackets.
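
The difference between an operator and its escaped literal can be sketched as follows. Python's `re` module is used here as a stand-in (an assumption, not Chain Builder's engine), and the sample strings are hypothetical.

```python
import re

# Hypothetical sample text
text = "Total (USD): 1.50+tax"

# Unescaped, "." is a wildcard, so the pattern 1.50 also matches "1x50".
loose = re.search(r"1.50", "1x50") is not None

# Escaped with a backslash, "\." and "\+" match only the literal characters.
strict_hit = re.search(r"1\.50\+tax", text) is not None
strict_miss = re.search(r"1\.50", "1x50") is None

# Inside brackets, these operators are taken literally.
literals = re.findall(r"[*\+?{}.]", "a*b+c?d{e}f.")

print(loose, strict_hit, strict_miss)  # True True True
print(literals)                        # ['*', '+', '?', '{', '}', '.']
```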
Match start or end of string (`^` and `$`)
To match patterns at the beginning or end of the string, use the operators `^` and `$`, respectively. For example:
Example | Matches |
---|---|
`^The` | Any string that starts with `The` |
`of despair$` | Any string that ends with `of despair` |
`^abc$` | A string that starts and ends with `abc`, an exact match |
Tip: If neither `^` nor `$` is used, the pattern matches any string that contains the characters specified. For example, `notice`, with no `^` or `$`, returns any string that contains `notice`.
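
The anchoring behavior described above can be checked with a short sketch. It assumes Python's `re.search` behaves like Chain Builder's matching for these simple patterns; `is_match` is a hypothetical helper and the sample strings are invented.

```python
import re

def is_match(pattern: str, text: str) -> bool:
    """Return True when pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

print(is_match(r"^The", "The end"))                # True: starts with "The"
print(is_match(r"^The", "At The end"))             # False: "The" is not at the start
print(is_match(r"of despair$", "pit of despair"))  # True: ends with "of despair"
print(is_match(r"^abc$", "abcd"))                  # False: not an exact match
print(is_match(r"notice", "take notice now"))      # True: no anchors, so "contains"
```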
Match characters (`*`, `+`, and `?`)
To match patterns based on a specific character, follow the character with the operator `*`, `+`, or `?`. These operators indicate the number of times the character should occur for a match: zero or more, one or more, or zero or one, respectively. For example:
Example | Matches |
---|---|
`ab*` | A string that contains `a`, followed by zero or more `b`s: `ac`, `abc`, or `abbc` |
`ab+` | A string that contains `a`, followed by one or more `b`s: `abc` or `abbc`, but not `ac` |
`ab?` | A string that contains `a`, followed by zero or one `b`: `ac` or `abc`; in `abbc`, the match covers only `ab` |
`a?b+$` | A string that ends with one or more `b`s, with or without a preceding `a`; for example, `ab`, `abb`, `b`, or `bb`. In `aab` or `aabb`, the first `a` is not part of the match |
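
To see which part of a string each quantifier actually covers, here is a minimal sketch in Python's `re` module (an assumption; Chain Builder's engine is not shown in this document). The helper `first_match` is hypothetical.

```python
import re

def first_match(pattern, text):
    """Return the text covered by the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

samples = ["ac", "abc", "abbc"]
results = {p: [first_match(p, s) for s in samples] for p in ("ab*", "ab+", "ab?")}

print(results["ab*"])  # ['a', 'ab', 'abb']   zero or more b's
print(results["ab+"])  # [None, 'ab', 'abb']  at least one b is required
print(results["ab?"])  # ['a', 'ab', 'ab']    at most one b is consumed

# Without a leading ^, only the tail of "aab" is matched.
tail = first_match(r"a?b+$", "aab")
print(tail)  # ab
```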
Match characters' frequency (`{...}` or `(...)`)
To match a pattern based on how often a single character occurs, follow it with the number or range of instances, wrapped in braces (`{...}`). For example:
Example | Matches |
---|---|
`ab{2}` | A string that contains `a`, followed by exactly two `b`s: `abb` |
`ab{2,}` | A string that contains `a`, followed by at least two `b`s: `abb`, `abbbb`, etc. |
`ab{3,5}` | A string that contains `a`, followed by three to five `b`s: `abbb`, `abbbb`, or `abbbbb` |
Tip: Always specify the first number of a range: `{0,2}`, not `{,2}`. Instead of the ranges `{0,}`, `{1,}`, or `{0,1}`, you can use the operators `*`, `+`, or `?`, respectively.
To match a pattern based on how often a sequence of characters occurs, wrap it in parentheses (`(...)`). For example, `a(bc){1,5}` matches a string that contains `a`, followed by one to five instances of `bc`.
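
The counted forms can be exercised with a small sketch. It assumes Python's `re` module as a stand-in engine and uses `re.fullmatch`, which requires the whole string to match (roughly what wrapping the pattern in `^` and `$` does), so that extra characters are not silently ignored. `whole` is a hypothetical helper.

```python
import re

def whole(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"ab{2}", "abb"))         # True: exactly two b's
print(whole(r"ab{2}", "abbb"))        # False: three b's
print(whole(r"ab{2,}", "abbbb"))      # True: at least two b's
print(whole(r"ab{3,5}", "abbbbbb"))   # False: six b's exceeds the range
print(whole(r"a(bc){1,5}", "abcbc"))  # True: the group "bc" occurs twice
print(whole(r"a(bc){1,5}", "a"))      # False: "bc" must occur at least once
```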
Match one of multiple patterns (`|`)
To match one of multiple patterns, such as `this` OR `that`, use the OR operator `|`. For example:
Example | Matches |
---|---|
`hi\|hello` | A string that contains either `hi` or `hello` |
`(b\|cd)ef` | A string that contains either `bef` or `cdef` |
`(a\|b)*c` | A string that contains zero or more `a`s and `b`s in any order, followed by `c` |
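
A brief sketch of the OR operator, again using Python's `re` module as an assumed stand-in for Chain Builder's engine, with hypothetical sample strings:

```python
import re

# Each alternative is tried at every position in the text.
greetings = re.findall(r"hi|hello", "hi there, hello world")
print(greetings)  # ['hi', 'hello']

# Parentheses limit the alternatives to b or cd; "ef" must follow either one.
stem = re.search(r"(b|cd)ef", "abcdefg")
print(stem.group(0))  # cdef

# (a|b)* allows any mix of a's and b's, including none at all, before the c.
mixed = [s for s in ("abbac", "c", "abxc") if re.fullmatch(r"(a|b)*c", s)]
print(mixed)  # ['abbac', 'c']
```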
Match any character (`.`)
To represent any character in a pattern to match, use the wildcard operator `.`. For example:
Example | Matches |
---|---|
`a.[0-9]` | A string that contains `a`, followed by any character and a digit |
`^.{3}$` | Any string of exactly three characters |
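
The two wildcard examples can be tried out as follows. This is a sketch under the assumption that Python's `re` module behaves like Chain Builder's engine for these patterns; the inputs are invented.

```python
import re

# "a", then any one character, then a digit.
hit = re.search(r"a.[0-9]", "xa-7")
print(hit.group(0))  # a-7

# Anchored at both ends, .{3} accepts only strings of exactly three characters.
three = [s for s in ("ab", "abc", "abcd") if re.search(r"^.{3}$", s)]
print(three)  # ['abc']
```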
Match character position (`[...]`)
To match a pattern based on the position of a character, use brackets (`[...]`). For example:
Example | Matches |
---|---|
`[ab]` | A string that contains either `a` or `b`; equivalent to `a\|b` |
`[a-d]` | A string that contains a lowercase letter from `a` to `d` |
`^[a-zA-Z]` | A string that starts with any letter, regardless of case |
`[0-9]%` | A string that contains any single digit followed by a percent sign |
`,[a-zA-Z0-9]$` | A string that ends with a comma followed by any letter or digit |
Note: All characters within brackets are taken literally, and not as regex operators. For example, `[*\+?{}.]` matches any of the literal characters within the brackets.
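
The bracket examples above can be sketched in Python's `re` module (an assumption used for illustration; the sample strings are hypothetical):

```python
import re

# [a-d] picks out each character in the range a through d.
in_range = re.findall(r"[a-d]", "badge")
print(in_range)  # ['b', 'a', 'd']

# A single digit immediately followed by a percent sign.
percents = re.findall(r"[0-9]%", "up 5% then 12%")
print(percents)  # ['5%', '2%']

# The first character must be a letter of either case.
starts_with_letter = [s for s in ("Zebra", "9lives") if re.search(r"^[a-zA-Z]", s)]
print(starts_with_letter)  # ['Zebra']

# The string must end with a comma and then one letter or digit.
ends_ok = [s for s in ("x,7", "x,!") if re.search(r",[a-zA-Z0-9]$", s)]
print(ends_ok)  # ['x,7']
```

Note that `[0-9]%` matches `2%` rather than `12%` in the second example, because the brackets stand for exactly one character.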
Match unwanted characters (`[^...]`)
To match a pattern that excludes certain characters, start the sequence with a `^` operator, and wrap it in brackets. For example, `%[^a-zA-Z]%` matches a string with any non-letter character between two percent signs.
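
A minimal sketch of a negated bracket pattern, assuming Python's `re` module as a stand-in for Chain Builder's engine and using invented inputs:

```python
import re

pattern = r"%[^a-zA-Z]%"

digit_between = re.search(pattern, "rate: %7%") is not None
letter_between = re.search(pattern, "rate: %x%") is not None
print(digit_between, letter_between)  # True False

# The uppercase range must end at Z. In ASCII, a range written as A-z also
# covers [ \ ] ^ _ and the backtick, so an underscore would count as excluded.
underscore_upper_z = re.search(r"%[^a-zA-Z]%", "%_%") is not None
underscore_lower_z = re.search(r"%[^a-zA-z]%", "%_%") is not None
print(underscore_upper_z, underscore_lower_z)  # True False
```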