Regular Expressions/Simple Regular Expressions

From Wikibooks, open books for an open world

The Simple Regular Expression syntax is widely used on Unix-based systems for backwards compatibility. Most regular-expression-aware Unix utilities, such as grep and sed, use it by default and support extended regular expressions through command-line options (see below). This syntax is deprecated on POSIX-compliant systems and should not be used by new utilities.

When simple regular expression syntax is used, most characters, except metacharacters, are treated as literal characters and match only themselves (for example, "a" matches "a", "(bc" matches "(bc", etc.).
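The literal rule can be illustrated with a rough sketch. Python's `re` module is used here only as a stand-in and is not named by this page; it follows a different syntax in which a bare "(" is special, so the pattern has to be translated first. The helper `bre_to_python` is a hypothetical name, and it covers only escaped and unescaped grouping and repetition characters, not the full syntax.

```python
import re

def bre_to_python(pattern):
    """Translate a simple-syntax pattern into Python `re` syntax (rough sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # "\(" and "\)" mark a subexpression; Python writes them unescaped.
            # Any other escaped character (such as "\1") is passed through.
            out.append(nxt if nxt in "()" else ch + nxt)
            i += 2
        elif ch in "()+?|{}":
            # Literal in the simple syntax, but special in Python: escape it.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# A bare parenthesis stays a literal character.
literal = bre_to_python("(bc")
literal_found = re.search(literal, "a(bcd") is not None

# Escaped parentheses become a marked subexpression.
grouped = bre_to_python(r"\(a.\)c\1*")
```

Under these assumptions, `literal` is the Python pattern `\(bc`, which finds "(bc" inside "a(bcd", and `grouped` is `(a.)c\1*`.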

Operators
Operator Effect
. The dot operator matches any single character.
[ ] Boxes enable a single character to be matched against a character list or character range.
[^ ] A complement box enables a single character not within a character list or character range to be matched.
^ A caret anchor matches the start of the line (or of any line, when applied in multiline mode).
$ A dollar anchor matches the end of the line (or of any line, when applied in multiline mode).
\( \) Escaped parentheses are used to define a marked subexpression. The matched text can be recalled later. Unescaped parentheses are literal characters.
\n Where n is a digit from 1 to 9; matches what the nth marked subexpression matched. This irregular construct has not been adopted in the extended regular expression syntax.
* A single-character expression followed by "*" matches zero or more copies of the expression. For example, "ab*c" matches "ac", "abc", "abbbc", etc. "[xyz]*" matches "", "x", "y", "zx", "zyx", and so on.
  • \n*, where n is a digit from 1 to 9, matches zero or more iterations of what the nth marked subexpression matched. For example, "\(a.\)c\1*" matches "abcab" and "abcabab" but not "abcac".
  • An expression enclosed in "\(" and "\)" followed by "*" is deemed to be invalid. In some cases (e.g. /usr/bin/xpg4/grep of SunOS 5.8), it matches zero or more iterations of the string that the enclosed expression matches. In other cases (e.g. /usr/bin/grep of SunOS 5.8), it matches what the enclosed expression matches, followed by a literal "*".
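
The operators in the table can be tried out with a short sketch. It uses Python's `re` module as a stand-in, which is an assumption of this example and not a tool this page names. In `re`, a marked subexpression is written with unescaped parentheses, while ".", "[ ]", "[^ ]", "*" and "\1" behave as described above for these patterns. The helper `matches_whole` is a hypothetical name; it tests whether a pattern matches an entire string, which is the sense in which the table says a pattern "matches" a string.

```python
import re

def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Dot: exactly one character of any kind.
dot_hits = [s for s in ["cat", "cot", "ct"] if matches_whole("c.t", s)]

# Box and complement box: one character inside or outside the list.
box_hits = [s for s in ["hat", "cat", "bat"] if matches_whole("[hc]at", s)]
complement_hits = [s for s in ["hat", "cat", "bat"] if matches_whole("[^hc]at", s)]

# Star: zero or more copies of the preceding single-character expression.
star_hits = [s for s in ["ac", "abc", "abbbc", "adc"] if matches_whole("ab*c", s)]
class_star_hits = [s for s in ["", "x", "zyx", "xa"] if matches_whole("[xyz]*", s)]

# Back-reference followed by a star. The simple-syntax pattern \(a.\)c\1*
# is written (a.)c\1* in Python.
backref_hits = [
    s for s in ["abcab", "abcabab", "abcac"] if matches_whole(r"(a.)c\1*", s)
]
```

In this sketch "abcac" is rejected because the text after "c" has to repeat exactly what the marked subexpression matched ("ab"), not merely fit the same pattern.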

Examples

  • "^[hc]at"
    • Matches "hat" and "cat", but only at the beginning of a line.
  • "[hc]at$"
    • Matches "hat" and "cat", but only at the end of a line.
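
The two anchored patterns can be checked with a minimal sketch. Python's `re` module is again an assumption used for illustration; the anchors are written the same way there. The sample strings are made up for this example.

```python
import re

lines = ["hat trick", "the cat", "cat", "chat"]

# Caret anchor: the match must begin at the start of the line.
starts = [s for s in lines if re.search(r"^[hc]at", s)]

# Dollar anchor: the match must finish at the end of the line.
ends = [s for s in lines if re.search(r"[hc]at$", s)]

# With several lines in one string, the caret matches only at the very
# start unless multiline mode is switched on.
text = "a bat\ncat nap"
single_mode = re.search(r"^[hc]at", text)
multi_mode = re.search(r"^[hc]at", text, re.MULTILINE)
```

Here "chat" is found by the second pattern but not the first, because it ends in "hat" without beginning with it. In multiline mode the caret also matches after the line break, so "cat" on the second line is found.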

Use in Tools

Tools and languages that utilize this regular expression syntax include: