
PHP Programming/Regular expressions

From Wikibooks, open books for an open world

Syntax

Usual regular expressions
Character Type Explanation
. Dot any character (except a newline, by default)
[...] Brackets character class: any one of the enumerated characters
[^...] Brackets and circumflex negated class: any one character except the enumerated ones
^ Circumflex string or line start
$ Dollar string or line end
| Pipe alternation
(...) Parentheses capture group: also used to limit the scope of an alternation
* Asterisk 0 or more occurrences
+ Plus 1 or more occurrences
? Question mark 0 or 1 occurrence
POSIX character classes[1]
Class Meaning
[[:alpha:]] any letter
[[:digit:]] any digit
[[:xdigit:]] any hexadecimal digit
[[:alnum:]] any letter or digit
[[:space:]] any whitespace character
[[:punct:]] any punctuation character
[[:lower:]] any lowercase letter
[[:upper:]] any uppercase letter
[[:blank:]] space or tab
[[:graph:]] visible characters (printable, excluding the space)
[[:cntrl:]] control characters
[[:print:]] printable characters (including the space, excluding control characters)
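
A minimal sketch of how these classes are used with preg_match(): a POSIX class is only valid inside a bracket expression, hence the doubled brackets. The sample strings and variable names are invented for the illustration.

```php
<?php
// [[:digit:]] is the class [:digit:] placed inside a bracket expression [...].
$onlyDigits = preg_match('/^[[:digit:]]+$/', '2024');     // 1: digits only
$hasUpper   = preg_match('/[[:upper:]]/', 'wikibooks');   // 0: no uppercase letter
$isHex      = preg_match('/^[[:xdigit:]]+$/', 'c0ffee');  // 1: hexadecimal digits only

var_dump($onlyDigits, $hasUpper, $isHex);
```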
Unicode regex[2]
Expression Meaning
\A String start
\b Word boundary (start or end of a word)
\d Digit
\D Non-digit
\s Whitespace character
\S Non-whitespace character
\w Letter, digit or underscore
\W Any character other than a letter, digit or underscore
\X Unicode extended grapheme cluster (a character together with its combining marks)
\z String end

Debugger: https://regex101.com/

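  • ?:: makes a group non-capturing, so it is skipped when the capture groups are numbered. Ex: ((?:ignored_substring|other).)
  • ?!: negative lookahead: matches only if the given pattern does not follow. Ex: ((?!excluded_substring).)
  • $1: result of the first capture group.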
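
The three constructs above can be sketched as follows. The patterns and sample strings are invented for the illustration and are not taken from the references.

```php
<?php
// (?:...) groups the alternatives without capturing them,
// so the digits are stored in group 1.
preg_match('/(?:version|v) ?(\d+)/', 'PHP version 8', $m);
// $m is ['version 8', '8']

// (?!...) is a negative lookahead: "foo" matches only when "bar" does not follow.
$followedByBar = preg_match('/foo(?!bar)/', 'foobar'); // 0
$followedByBaz = preg_match('/foo(?!bar)/', 'foobaz'); // 1

// In a replacement string, $1 stands for the text captured by group 1.
$wrapped = preg_replace('/(\d+)/', '[$1]', 'PHP 8');   // 'PHP [8]'

var_dump($m, $followedByBar, $followedByBaz, $wrapped);
```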

Attention: to search for a literal dollar sign, "\$" does not work: in a double-quoted PHP string, \$ is an escape sequence that produces a bare $, which the regex engine then reads as the end-of-string anchor. Use single quotes instead: '\$'.
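
A short sketch of the difference between the two quoting styles, using an invented price string. The third call shows one possible way to keep double quotes, by escaping the backslash itself.

```php
<?php
$subject = 'Price: $25';

// Single quotes: the backslash reaches the regex engine, so \$ is a literal dollar.
$single = preg_match('/\$\d+/', $subject, $m);    // 1, $m[0] is '$25'

// Double quotes: PHP turns "\$" into "$", so the engine sees the anchor $
// followed by \d+, which can never match.
$double = preg_match("/\$\d+/", $subject);        // 0

// Double quotes with the backslash escaped: the engine receives \$ again.
$escaped = preg_match("/\\\$\d+/", $subject);     // 1

var_dump($single, $m, $double, $escaped);
```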

In PHP, a regex pattern must always be enclosed in delimiters. We generally use the grave accent (`), but / and # are also common.

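In addition, options (pattern modifiers) can be added after the closing delimiter:

i case-insensitive matching
m multiline: ^ and $ also match at the start and end of each line
s the "." also matches newlines
x ignore whitespace in the pattern
o not supported by PHP's preg functions (it is a Perl modifier); to stop at the first match, use preg_match() rather than preg_match_all()
u treat the pattern and the subject as UTF-8, so that a multi-byte character counts as one character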
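
The effect of the modifiers can be checked with a small sketch such as the one below. It relies on PCRE's documented behavior, where m changes how ^ and $ work and s lets the dot match a newline. The two-line sample text is invented, and the "\u{e9}" escape (the letter é) assumes PHP 7 or later.

```php
<?php
$text = "first line\nSecond line";

$i   = preg_match('/second/i', $text);         // 1: case is ignored
$noM = preg_match('/^Second/', $text);         // 0: ^ matches only at the string start
$m   = preg_match('/^Second/m', $text);        // 1: ^ also matches after a newline
$noS = preg_match('/line.Second/', $text);     // 0: the dot does not match "\n"
$s   = preg_match('/line.Second/s', $text);    // 1: the dot matches "\n"
$x   = preg_match('/ first \s line /x', $text); // 1: pattern whitespace is ignored
$noU = preg_match('/^.$/', "\u{e9}");          // 0: é is two bytes without u
$u   = preg_match('/^.$/u', "\u{e9}");         // 1: é is one UTF-8 character

var_dump($i, $noM, $m, $noS, $s, $x, $noU, $u);
```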

Search


The function ereg(), which performed regex searches, has been deprecated since PHP 5.3 and was removed in PHP 7.0; use preg_match() instead.

preg_match()


The function preg_match[3] is the main regex search function[4]. It returns 1 if the pattern matches, 0 if it does not, and false on error. It takes two mandatory parameters: the regex pattern and the string to scan.

The third parameter is the variable that receives the array of results.

Finally, the fourth accepts a PHP flag that modifies the function's default behavior.

  • Minimal example:
<?php
$string = 'PHP regex test for the English Wikibooks.';

if (preg_match('`.*Wikibooks.*`', $string)) {
    print('This text talks about Wikibooks');
} else {
    print('This text doesn\'t talk about Wikibooks');
}
?>
  • Advanced example:
<?php
$string = 'PHP regex test for the English Wikibooks.';
$flag = PREG_OFFSET_CAPTURE; // also store the offset of each match

if (preg_match('`.*Wikibooks.*`', $string, $results, $flag)) {
    var_dump($results);
} else {
    print('This text doesn\'t talk about Wikibooks');
}
?>

Flag examples:[5]

  • PREG_OFFSET_CAPTURE: also returns the position (byte offset) of each matched substring in the string.
  • PREG_GREP_INVERT: used with preg_grep() only; returns the elements that do not match the pattern.
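
As a minimal sketch of PREG_OFFSET_CAPTURE, the call below reuses the sample sentence of the previous examples with a simpler pattern chosen for the illustration. Each entry of the results array becomes a pair made of the matched text and its byte offset.

```php
<?php
$string = 'PHP regex test for the English Wikibooks.';

preg_match('/Wikibooks/', $string, $results, PREG_OFFSET_CAPTURE);

// $results[0] is ['Wikibooks', 31]: the match and the byte offset where it starts.
print_r($results);
```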

preg_grep()


This function searches an array and returns the elements that match the pattern[6].
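
A minimal sketch with an invented list of file names. The keys of the input array are preserved in the result, and the PREG_GREP_INVERT flag returns the elements that do not match instead.

```php
<?php
$files = ['index.php', 'style.css', 'config.php', 'README'];

// Elements ending in ".php"; the original keys are kept.
$php = preg_grep('/\.php$/', $files);
// [0 => 'index.php', 2 => 'config.php']

// With PREG_GREP_INVERT, the elements that do NOT match are returned.
$notPhp = preg_grep('/\.php$/', $files, PREG_GREP_INVERT);
// [1 => 'style.css', 3 => 'README']

print_r($php);
print_r($notPhp);
```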

preg_match_all()


To get all the matches in one array, replace preg_match with preg_match_all[7], and print with print_r.

Example extracting every parenthesized substring from a file's contents:

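// $filename must hold the path of the file to scan
$regex = '/\(([^)]*)\)/'; // captures the text between each pair of parentheses
preg_match_all($regex, file_get_contents($filename), $matches);
print_r($matches); // $matches[0]: full matches, $matches[1]: captured text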
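
The same pattern can be tried without a file. This sketch uses an invented sentence and shows the default layout of the results: $matches[0] holds the full matches and $matches[1] the content of the first capture group. The return value is the number of full matches.

```php
<?php
$text = 'PHP (Hypertext Preprocessor) uses PCRE (Perl Compatible Regular Expressions).';

$count = preg_match_all('/\(([^)]*)\)/', $text, $matches);

// $count is 2
// $matches[0] is ['(Hypertext Preprocessor)', '(Perl Compatible Regular Expressions)']
// $matches[1] is ['Hypertext Preprocessor', 'Perl Compatible Regular Expressions']
print_r($matches);
```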

Replacement


preg_replace()


The function preg_replace takes three mandatory parameters: the pattern to search for, the replacement, and the string to process.

<?php
// Replace spaces with underscores
$string = "PHP regex test for the English Wikibooks.";
$sortedString = preg_replace('`( )`', '_', $string);
echo $sortedString;
?>

preg_filter()


Same as preg_replace(), but it returns only the subjects in which a replacement took place.
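
The difference is easiest to see with an array of subjects. In this sketch (sample values invented), preg_replace() returns every element while preg_filter() drops the one where nothing matched and keeps the original keys.

```php
<?php
$subjects = ['apple 1', 'banana', 'cherry 3'];

// preg_replace() returns every subject, changed or not.
$replaced = preg_replace('/\d/', '#', $subjects);
// ['apple #', 'banana', 'cherry #']

// preg_filter() returns only the subjects where the pattern matched.
$filtered = preg_filter('/\d/', '#', $subjects);
// [0 => 'apple #', 2 => 'cherry #']

print_r($replaced);
print_r($filtered);
```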

preg_split()


Splits a string into an array, using a regex as the separator.
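
A minimal sketch with invented input strings. The optional third parameter limits the number of pieces, and the PREG_SPLIT_NO_EMPTY flag discards empty pieces.

```php
<?php
// Split on any run of whitespace or commas.
$words = preg_split('/[\s,]+/', 'PHP, regex  test,Wikibooks');
// ['PHP', 'regex', 'test', 'Wikibooks']

// A limit of 2 stops after the first separator.
$pair = preg_split('/,/', 'a,b,c', 2);
// ['a', 'b,c']

// An empty pattern splits between characters; the flag removes the empty ends.
$chars = preg_split('//', 'abc', -1, PREG_SPLIT_NO_EMPTY);
// ['a', 'b', 'c']

print_r($words);
print_r($pair);
print_r($chars);
```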

References
