Regular Expressions/Syntaxes
There are several variants of regular expressions. These variants differ not only in their concrete syntax but also in their capabilities. Individual tools that support regular expressions also have their own peculiarities.
- Simple Regular Expressions - widely used for backwards compatibility, but deprecated on POSIX-compliant systems.
- Basic Regular Expressions - used by some Unix shell tools
- Perl-Compatible Regular Expressions - used by Perl and some application programs
- POSIX Basic Regular Expressions - provides extensions for consistency between utility programs. These extensions are not supported by some traditional implementations of Unix tools.
- POSIX-Extended Regular Expressions - may be supported by some Unix utilities via the -E command line switch
- Non-POSIX Basic Regular Expressions - provides additional character classes not supported by POSIX
- Emacs Regular Expressions - used by the Emacs editor
- Shell Regular Expressions - a limited form of regular expression used for pattern matching and filename substitution
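To make the difference between two of these variants concrete, the following sketch contrasts a shell-style pattern with an equivalent regular expression. It uses Python's standard `fnmatch` and `re` modules purely for illustration. The file names are made up, and Python's `re` flavour is its own, not one of the POSIX variants listed above.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration.
names = ["notes.txt", "notes.md", "a.txt.bak"]

# Shell-style pattern: "*" on its own stands for any run of characters.
shell_matches = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regular expression: "*" is a quantifier applied to the preceding ".",
# and the literal dot before "txt" has to be escaped.
regex_matches = [n for n in names if re.fullmatch(r".*\.txt", n)]

print(shell_matches)
print(regex_matches)
```

Both lists contain only `notes.txt`. The two patterns select the same names but are written differently, which is why a pattern written for one variant usually cannot be pasted unchanged into a tool that expects another.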
Greedy expressions
Quantifiers such as * and + match as much as they can: they are greedy. This is not always what you want. For example, assume you want to find the first string enclosed in quotation marks in the following text:
- These words include "cat", "mat", and "pat".
The pattern ".*" matches everything from the first quotation mark to the last one, that is, "cat", "mat", and "pat", instead of the desired "cat".
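The greedy behaviour can be reproduced with a short sketch. Python's `re` module is used here only as one convenient flavour; the variable names are illustrative.

```python
import re

text = 'These words include "cat", "mat", and "pat".'

# Greedy: .* first consumes the rest of the line, then backtracks
# only as far as the LAST quotation mark.
greedy = re.search(r'".*"', text)

print(greedy.group(0))  # "cat", "mat", and "pat"
```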
To fix this, some flavours of regular expressions provide non-greedy (lazy) quantifiers such as *?, +?, and {m,n}?. In PHP, appending the "U" modifier to the regexp, as in /".*"/U, makes its quantifiers non-greedy. In flavours that support neither option, you can instead specify what is not to be matched, as in ("[^"]*"), which fixes the example discussed above. This workaround breaks down when the closing delimiter is longer than one character: the bracketed pattern (\[\[[^\]]*\]\]) fails to match A B C [[D E] F G]], because [^\]]* cannot get past the single ] that follows E.
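The sketch below tries the lazy quantifier and the negated character class on the quotation example, and then both approaches on the bracketed text. It assumes a flavour that supports *?, here Python's `re` module; the variable names are illustrative only.

```python
import re

text = 'These words include "cat", "mat", and "pat".'

# Lazy quantifier: stops at the first closing quotation mark.
lazy = re.search(r'".*?"', text).group(0)

# Negated character class: works even without lazy quantifiers.
negated = re.search(r'("[^"]*")', text).group(1)

bracketed = 'A B C [[D E] F G]]'

# The negated class cannot cross the single "]" after "D E",
# so no match is found at all.
negated_brackets = re.search(r'(\[\[[^\]]*\]\])', bracketed)

# A lazy quantifier handles the two-character closing delimiter.
lazy_brackets = re.search(r'\[\[.*?\]\]', bracketed)

print(lazy)                    # "cat"
print(negated)                 # "cat"
print(negated_brackets)        # None
print(lazy_brackets.group(0))  # [[D E] F G]]
```

The negated class is the more portable workaround, but it only models a one-character delimiter. For delimiters such as ]], a lazy quantifier, where available, is the simpler choice.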
Comparison table
A comparison table showing which regular expression features and flavours each tool or programming language supports is available from regular-expressions.info.