Regular Expressions/Syntaxes

From Wikibooks, open books for an open world

There are several variants of regular expressions. These variants differ not only in their concrete syntax but also in their capabilities. Individual tools that support regular expressions also have their own peculiarities.

Greedy expressions

Quantifiers such as * and + match as much as they can: they are greedy. For some uses, this greediness is unwanted. For example, let us assume you want to find the first string enclosed in quotation marks in the following text:

These words include "cat", "mat", and "pat".

The pattern ".*" matches everything from the first quotation mark to the last one, that is, "cat", "mat", and "pat" instead of the desired "cat".
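The greedy behaviour described above can be reproduced with a minimal sketch, assuming Python's standard `re` module as the regex flavour (the variable names are illustrative only):

```python
import re

text = 'These words include "cat", "mat", and "pat".'

# Greedy: .* first consumes the rest of the line, then backtracks
# only as far as the last quotation mark.
greedy = re.search(r'".*"', text)

print(greedy.group(0))  # "cat", "mat", and "pat"
```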

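To fix this, some flavours of regular expressions provide non-greedy (lazy) quantifiers such as *?, +?, and }? (the last being the lazy form of a counted quantifier, as in {2,5}?). In PHP, adding the "U" modifier after the closing delimiter inverts greediness, so that quantifiers are non-greedy by default, as in /".*"/U. In flavours that support neither of the two options, you can specify what is not to be matched, as in ("[^"]*") to fix the discussed example. However, this workaround relies on the closing delimiter being a single character. With double-bracketed expressions, (\[\[[^\]]*\]\]) fails to match on A B C [[D E] F G]], because [^\]]* stops at the first ], which is not followed by a second ].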
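As a rough illustration of these alternatives, the sketch below again assumes Python's `re` module, which supports lazy quantifiers; the PHP "U" modifier has no direct counterpart there and is not shown. The variable names are made up for the example.

```python
import re

text = 'These words include "cat", "mat", and "pat".'

# Lazy quantifier: *? stops at the earliest closing quotation mark.
lazy = re.search(r'".*?"', text).group(0)

# Negated character class: works without lazy quantifiers.
negated = re.search(r'"[^"]*"', text).group(0)

print(lazy)     # "cat"
print(negated)  # "cat"

bracketed = 'A B C [[D E] F G]]'

# The negated class stops at the first ], which is followed by a space
# rather than a second ], so the overall match fails.
strict = re.search(r'\[\[[^\]]*\]\]', bracketed)
print(strict)   # None

# A lazy quantifier copes with the two-character closing delimiter.
lazy_brackets = re.search(r'\[\[.*?\]\]', bracketed).group(0)
print(lazy_brackets)  # [[D E] F G]]
```

The negated-class form is usually the more portable choice for single-character delimiters, while the lazy form is the simpler way to handle a multi-character delimiter such as ]].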

Comparison table

A comparison table showing which regular expression features and flavors are supported by which tools and programming languages is available from regular-expressions.info.
