A regular expression (regex) is a pattern used to search text for specific sequences of characters.
For example, a regex that validates usernames could accept john_doe, jo-hn_doe, and john12_as, but not Jo, as it contains an uppercase letter and is too short.
The regex cat means: the letter c, followed by the letter a, followed by the letter t.
"cat" => The cat sat on the mat
The regex 123 matches the string "123". A regex is matched against the input string by comparing each character in the regex with each character in the input string, one after another.
Regexes are normally case-sensitive, so the regex Cat does not match the string "cat".
"Cat" => The cat sat on the Cat
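As a minimal sketch of literal, case-sensitive matching, the snippet below uses Python's standard re module (the choice of Python is an assumption; the text above does not name an engine). The variable names are illustrative only.

```python
import re

text = "The cat sat on the Cat"

# Each literal character in the pattern must appear, in order, in the text.
lower_matches = re.findall(r"cat", text)  # only the lowercase word
upper_matches = re.findall(r"Cat", text)  # only the capitalized word

print(lower_matches)  # ['cat']
print(upper_matches)  # ['Cat']

# Matching is case-sensitive by default, so "Cat" does not match "cat".
no_match = re.search(r"Cat", "cat")
print(no_match)  # None
```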
Metacharacter | Description |
. | Matches any character except a line break. |
[ ] | Character class, which matches any character enclosed in square brackets. |
[^ ] | Negated character class, which matches any character not enclosed in square brackets. |
* | Matches zero or more repetitions of the preceding subexpression. |
+ | Matches one or more repetitions of the preceding subexpression. |
? | Matches zero or one repetition of the preceding subexpression or specifies a non-greedy qualifier. |
{n,m} | Braces, which match the preceding element at least n times but not more than m times. |
(xyz) | Capturing group, which matches the characters "xyz" in that exact order. |
| | Alternation, which matches either the expression before or the expression after the symbol. |
\ | Escape character, which lets you match reserved characters literally: [ ] ( ) { } . * + ? ^ $ \ | |
^ | Matches the beginning of the input. |
$ | Matches the end of the input. |
The period . is the simplest metacharacter. It matches any single character except a line break. For example, the regex .ar means: any character, followed by the letter a, followed by the letter r.
".ar" => The car parked in the garage.
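The behavior of the period can be checked with a short Python sketch (Python's re module is assumed here; re.DOTALL is a Python-specific flag that the text above does not cover).

```python
import re

text = "The car parked in the garage."

# Any single character followed by "ar".
dot_matches = re.findall(r".ar", text)
print(dot_matches)  # ['car', 'par', 'gar']

# By default the period does not match a line break ...
no_newline = re.search(r".ar", "\nar")
# ... unless the Python-specific re.DOTALL flag is set.
with_dotall = re.search(r".ar", "\nar", flags=re.DOTALL)
```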
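Square brackets define a character set, which matches any one of the characters enclosed in them. For example, the regex [Tt]he means: the uppercase letter T or the lowercase letter t, followed by the letter h, followed by the letter e.
"[Tt]he" => The car parked in the garage.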
Inside a character set, a period is a literal period. The regex ar[.] means: the lowercase letter a, followed by the letter r, followed by a period.
"ar[.]" => A garage is a good place to park a car.
In general, the caret ^ represents the start of the input, but when it appears immediately after the opening square bracket, it negates the character set. For example, the regex [^c]ar means: any character except the letter c, followed by the letter a, followed by the letter r.
"[^c]ar" => The car parked in the garage.
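A short Python sketch of the negated character set (Python's re module is an assumption). Note that a negated set still has to consume one character, so it cannot match at a position where no character is present.

```python
import re

negated_matches = re.findall(r"[^c]ar", "The car parked in the garage.")
print(negated_matches)  # ['par', 'gar']

# [^c] must match some character, so a bare "ar" is not enough.
bare_ar = re.search(r"[^c]ar", "ar")
print(bare_ar)  # None
```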
The metacharacters +, *, and ? specify how many times a subpattern can occur. These metacharacters act differently in different situations.
The symbol * matches zero or more repetitions of the preceding element. For example, the regex a* means: zero or more repetitions of the lowercase letter a. If * appears after a character set, it repeats the whole character set. For example, the regex [a-z]* means: any number of lowercase letters in a row.
"[a-z]*" => The car parked in the garage #21.
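Because * also allows zero repetitions, [a-z]* can match the empty string. The Python sketch below (an assumed engine) filters out those empty matches to show only the runs of lowercase letters.

```python
import re

text = "The car parked in the garage #21."

# findall also reports empty matches wherever no lowercase letter is present;
# they are filtered out here to keep only the non-empty runs.
runs = [m for m in re.findall(r"[a-z]*", text) if m]
print(runs)  # ['he', 'car', 'parked', 'in', 'the', 'garage']

# Zero repetitions are allowed, so a* matches the empty string.
empty_ok = re.fullmatch(r"a*", "") is not None
print(empty_ok)  # True
```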
The symbol * can be combined with the metacharacter . to form .*, which matches any string of characters. It can also be combined with the whitespace character \s to match a string of whitespace characters.
For example, the regex \s*cat\s* means: zero or more whitespace characters, followed by the lowercase letter c, followed by the lowercase letter a, followed by the lowercase letter t, followed by zero or more whitespace characters.
"\s*cat\s*" => The fat cat sat on the cat.
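A minimal Python sketch of this example (Python is an assumed engine). The surrounding whitespace is part of each match, which is visible in the result.

```python
import re

cat_matches = re.findall(r"\s*cat\s*", "The fat cat sat on the cat.")
print(cat_matches)  # [' cat ', ' cat']

# .* matches any string that contains no line break, including the empty string.
any_string = re.fullmatch(r".*", "anything at all") is not None
print(any_string)  # True
```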
The symbol + matches one or more repetitions of the preceding element. For example, the regex c.+t means: the lowercase letter c, followed by at least one character, followed by the lowercase letter t. The t here is the last t in the sentence, because + matches as many characters as possible.
"c.+t" => The fat cat sat on the mat.
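The greedy behavior of + can be observed in a short Python sketch (an assumed engine): the match starts at the first c and extends to the last t in the sentence.

```python
import re

plus_matches = re.findall(r"c.+t", "The fat cat sat on the mat.")
print(plus_matches)  # ['cat sat on the mat']

# At least one character is required between c and t.
too_short = re.search(r"c.+t", "ct")
print(too_short)  # None
```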
The symbol ? makes the preceding element optional: it matches zero or one repetition of it.
For example, the regex [T]?he means: an optional uppercase letter T, followed by the lowercase letter h, followed by the lowercase letter e.
"[T]he" => The car is parked in the garage.
"[T]?he" => The car is parked in the garage.
Braces specify how many times the preceding element can be repeated. For example, the regex [0-9]{2,3} means: match at least 2 digits but not more than 3 digits (characters in the range of 0 to 9).
"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0.
The second number can be left out. For example, the regex [0-9]{2,} means: match 2 or more digits. If the comma is also removed, the regex [0-9]{2} means: match exactly 2 digits.
"[0-9]{2,}" => The number was 9.9997 but we rounded it off to 10.0.
"[0-9]{2}" => The number was 9.9997 but we rounded it off to 10.0.
A capturing group is a group of subpatterns written inside parentheses (...). If a quantifier is placed after a character, it repeats that character.
However, if a quantifier is placed after a capturing group, it repeats the whole capturing group.
For example, the regex (ab)* matches zero or more repetitions of the string "ab". The alternation metacharacter | can also be used inside a capturing group. For example, the regex (c|g|p)ar means: the lowercase letter c, g, or p, followed by the letter a, followed by the letter r.
"(c|g|p)ar" => The car is parked in the garage.
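A minimal Python sketch of capturing groups (an assumed engine). In Python, re.findall returns the captured group rather than the whole match when the pattern contains one group, so re.finditer is used here to show both.

```python
import re

text = "The car is parked in the garage."

# group(0) is the whole match; findall returns only the captured letter.
whole_matches = [m.group(0) for m in re.finditer(r"(c|g|p)ar", text)]
captured = re.findall(r"(c|g|p)ar", text)

print(whole_matches)  # ['car', 'par', 'gar']
print(captured)       # ['c', 'p', 'g']

# A quantifier after the group repeats the whole group.
repeated_group = re.fullmatch(r"(ab)*", "ababab") is not None
print(repeated_group)  # True
```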
The vertical bar | defines alternation, which works like an OR condition between multiple expressions. Alternation may seem to work in the same way as a character set.
The key difference is that alternation works at the expression level, while a character set works at the character level.
For example, the regex (T|t)he|car means: either the uppercase letter T or the lowercase letter t, followed by h, followed by e; or the letter c, followed by a, followed by r.
"(T|t)he|car" => The car is parked in the garage.
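The alternation example can be sketched in Python (an assumed engine); re.finditer is used so that the whole match is returned instead of the captured group.

```python
import re

text = "The car is parked in the garage."

alternation_matches = [m.group(0) for m in re.finditer(r"(T|t)he|car", text)]
print(alternation_matches)  # ['The', 'car', 'the']
```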
The backslash \ escapes the next character, which lets you match reserved characters literally, including { } [ ] / \ + * . $ ^ | ?. To use a special character as a matching character, prepend \ to it.
For example, the regex . matches any character except a line break. To match a literal . in the input string, use \. instead. The regex (f|c|m)at\.? means: the lowercase letter f, c, or m, followed by the lowercase letter a, followed by the lowercase letter t, followed by an optional . character.
"(f|c|m)at\.?" => The fat cat sat on the mat.
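A short Python sketch of escaping (an assumed engine). Raw strings (r"...") are used so that the backslash reaches the regex engine unchanged.

```python
import re

text = "The fat cat sat on the mat."

escaped_matches = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]
print(escaped_matches)  # ['fat', 'cat', 'mat.']

# An unescaped period matches any character; an escaped one only a period.
unescaped_hits = re.fullmatch(r"a.c", "abc") is not None   # True
escaped_hits = re.fullmatch(r"a\.c", "abc") is not None    # False
```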
Anchors check whether the matching character is the first or the last character of the input string. There are two types of anchors: ^ (which checks whether the matching character is the start character of the input string) and $ (which checks whether the matching character is the end character).
The symbol ^ checks whether a matching character is the first character of the input string. If the regex ^a (a must be the starting symbol) is applied to the string abc, it matches a.
However, the regex ^b does not match anything, because "b" is not the start character of the string abc.
The regex ^(T|t)he means: the uppercase letter T or the lowercase letter t must be the starting symbol of the input string, followed by the letter h, followed by the lowercase letter e.
"(T|t)he" => The car is parked in the garage.
"^(T|t)he" => The car is parked in the garage.
The symbol $ checks whether a matching character is the last character of the input string. For example, the regex (at\.)$ means: the lowercase letter a, followed by the lowercase letter t, followed by the character ., and the match must be at the end of the string.
"(at\.)" => The fat cat. sat. on the mat.
"(at\.)$" => The fat cat. sat. on the mat.
Shorthand | Description |
. | Matches any character except a line break |
\w | Matches alphanumeric characters and the underscore: [a-zA-Z0-9_] |
\W | Matches non-alphanumeric characters: [^\w] |
\d | Matches digits: [0-9] |
\D | Matches non-digits: [^\d] |
\s | Matches whitespace characters: [\t\n\f\r\p{Z}] |
\S | Matches non-whitespace characters: [^\s] |
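The shorthand classes can be tried out with a short Python sketch (an assumed engine; the sample string is made up for illustration). Note that in Python 3 these shorthands are Unicode-aware for str patterns, so they can match more than the ASCII ranges listed in the table unless re.ASCII is passed, and that Python's re module does not support the \p{Z} notation.

```python
import re

text = "Room 42, floor 7"  # illustrative sample input

digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
non_digits = re.findall(r"\D+", text)
spaces = re.findall(r"\s", text)

print(digits)      # ['42', '7']
print(words)       # ['Room', '42', 'floor', '7']
print(non_digits)  # ['Room ', ', floor ']
print(spaces)      # [' ', ' ', ' ']
```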
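Lookarounds are used when a pattern must be preceded or followed by another pattern that is not itself part of the match. For example, to get all numbers, including the . character, that are preceded by the character $ in the input string "$4.44 and $10.88", the regex (?<=\$)[0-9\.]* can be used.
Below are the lookarounds used in regexes:
Symbol | Description |
?= | Positive lookahead |
?! | Negative lookahead |
?<= | Positive lookbehind |
?<! | Negative lookbehind |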
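The price example can be sketched in Python (an assumed engine). The $ sign is checked by the lookbehind but is not part of the returned matches.

```python
import re

text = "$4.44 and $10.88"

# (?<=\$) requires a "$" immediately before the match without consuming it.
prices = re.findall(r"(?<=\$)[0-9.]*", text)
print(prices)  # ['4.44', '10.88']
```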
A positive lookahead is written as (?=...). The lookahead expression is written after the equal sign inside the parentheses. The match includes only the text matched by the part of the regex before the lookahead.
For example, the regex (T|t)he(?=\sfat) means: the uppercase letter T or the lowercase letter t, followed by the letter h, followed by the lowercase letter e.
The positive lookahead in parentheses tells the regex engine to match only a The or the that is followed by a whitespace character and the word fat.
"(T|t)he(?=\sfat)" => The fat cat sat on the mat.
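A minimal Python sketch of the positive lookahead (an assumed engine). The span shows that the lookahead text is checked but not included in the match.

```python
import re

text = "The fat cat sat on the mat."

lookahead_hits = [(m.group(0), m.span())
                  for m in re.finditer(r"(T|t)he(?=\sfat)", text)]
print(lookahead_hits)  # [('The', (0, 3))]
```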
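A negative lookahead is written like a positive lookahead, but with an exclamation mark ! instead of the equal sign =, such as (?!...). It is used to find matches that are not followed by a certain pattern.
For example, the regex (T|t)he(?!\sfat) means: get all the words The or the in the input string that are not followed by a whitespace character and the word fat.
"(T|t)he(?!\sfat)" => The fat cat sat on the mat.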
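The negative lookahead can be sketched in Python (an assumed engine). Only the second "the" matches, because the first "The" is followed by " fat".

```python
import re

text = "The fat cat sat on the mat."

negative_hits = [(m.group(0), m.start())
                 for m in re.finditer(r"(T|t)he(?!\sfat)", text)]
print(negative_hits)  # [('the', 19)]
```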
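A positive lookbehind is written as (?<=...). It is used to find matches that are preceded by a certain pattern. For example, the regex (?<=(T|t)he\s)(fat|mat) means: get all the words fat and mat that come after the word The or the in the input string.
"(?<=(T|t)he\s)(fat|mat)" => The fat cat sat on the mat.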
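A negative lookbehind is written as (?<!...). It is used to find matches that are not preceded by a certain pattern. For example, the regex (?<!(T|t)he\s)(cat) means: get all the cat words that do not come after the word The or the in the input string.
"(?<!(T|t)he\s)(cat)" => The cat sat on cat.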
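Both lookbehind forms can be sketched in Python (an assumed engine). Python's re module only accepts lookbehind patterns of fixed width, which holds here because every alternative of (T|t)he\s is four characters long.

```python
import re

# Positive lookbehind: "fat" or "mat" directly after "The " or "the ".
behind_hits = [m.group(0) for m in
               re.finditer(r"(?<=(T|t)he\s)(fat|mat)", "The fat cat sat on the mat.")]
print(behind_hits)  # ['fat', 'mat']

# Negative lookbehind: "cat" not directly after "The " or "the ".
not_behind_hits = [(m.group(0), m.start()) for m in
                   re.finditer(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat.")]
print(not_behind_hits)  # [('cat', 15)]
```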
Flag | Description |
i | Case-insensitive: Sets matching to be case-insensitive. |
g | Global search: Searches for all the matches throughout the input string. |
m | Multiline match: Makes the anchors ^ and $ match at the start and end of each line of the input string. |
The flag i performs case-insensitive matching. For example, the regex /The/gi means: the uppercase letter T, followed by the lowercase letter h, followed by the lowercase letter e.
At the end of the regex, the flag i tells the regex engine to ignore case. The flag g is also used, so that matches are searched for in the whole input string.
"The" => The fat cat sat on the mat.
"/The/gi" => The fat cat sat on the mat.
The flag g performs a global match (finds all matches rather than stopping after the first match).
For example, the regex /.(at)/g means: any character except a line break, followed by the lowercase letter a, followed by the lowercase letter t.
Because the flag g is used at the end of the regex, it finds all matches in the input string.
".(at)" => The fat cat sat on the mat.
"/.(at)/g" => The fat cat sat on the mat.
The flag m performs multiline matching. As discussed earlier, the anchors ^ and $ check whether the matched character is at the beginning or end of the input string. To have the anchors work on each line, the flag m should be used.
For example, the regex /.at(.)?$/gm means: any character except a line break, followed by the lowercase letter a, followed by the lowercase letter t, optionally followed by one more character that is not a line break. Because the flag m is at the end of the regex, the regex engine matches the pattern at the end of each line in the string.
"/.at(.)?$/" => The fat
                cat sat
                on the mat.
"/.at(.)?$/gm" => The fat
                  cat sat
                  on the mat.
Type | Expression |
Positive integer | ^\d+$ |
Negative integer | ^-\d+$ |
Phone number | ^\+?[\d\s]{3,}$ |
Phone code | ^\+?[\d\s]+\(?[\d\s]{10,}$ |
Integer | ^-?\d+$ |
Username | ^[\w\d_.]{4,16}$ |
Alphanumeric characters | ^[a-zA-Z0-9]*$ |
Alphanumeric characters with whitespace | ^[a-zA-Z0-9 ]*$ |
Password | ^(?=^.{6,}$)((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.*$ |
Email | ^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})*$ |
IPv4 address | ^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*$ |
Lowercase letters | ^([a-z])*$ |
Uppercase letters | ^([A-Z])*$ |
URL | ^(((http|https|ftp):\/\/)?([[a-zA-Z0-9]\-\.])+(\.)([[a-zA-Z0-9]]){2,4}([[a-zA-Z0-9]\/+=%&_\.~?\-]*))*$ |
Visa credit card number | ^(4[0-9]{12}(?:[0-9]{3})?)*$ |
Date (MM/DD/YYYY) | ^(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}$ |
Date (YYYY/MM/DD) | ^(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])$ |
Mastercard credit card number | ^(5[1-5][0-9]{14})*$ |
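As a sketch of how patterns like these might be used for validation, the snippet below wraps three of them in a small Python helper. The PATTERNS dictionary, the is_valid function, and the sample inputs are hypothetical names introduced for illustration; they are not part of any library.

```python
import re

# Hypothetical lookup table built from three rows of the table above.
PATTERNS = {
    "integer": r"^-?\d+$",
    "username": r"^[\w\d_.]{4,16}$",
    "visa": r"^(4[0-9]{12}(?:[0-9]{3})?)*$",
}

def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the pattern registered under kind."""
    return re.search(PATTERNS[kind], value) is not None

print(is_valid("integer", "-42"))             # True
print(is_valid("integer", "4.2"))             # False
print(is_valid("username", "john_doe"))       # True
print(is_valid("username", "jo"))             # False
print(is_valid("visa", "4111111111111111"))   # True
print(is_valid("visa", "5111111111111111"))   # False
```

Patterns that wrap everything in (...)* also accept the empty string, so is_valid("visa", "") returns True here; a separate check for empty input may be needed.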