Guide to Unix/Explanations/awk
The name 'awk' is derived from the names of the three people who originally developed it: Aho, Weinberger and Kernighan. It is a programming language built around pattern-action statements that transform input into output. It reads the input (usually a file of data) one line at a time and tests each line against the given pattern. The action is applied to every line that matches, and the result forms the output. A line that does not match is ignored.
Each input line is divided into fields by a separator (by default, any run of spaces or tabs). Fields are referenced with a $ prefix: $1 is field 1, $2 is field 2 and so on, while $0 means the entire input line. Patterns can be matched against individual fields as well as against the whole line.
If no pattern is specified, all input lines are selected. If no action is specified, the default action is to print the entire line. So to print a subset of the input, you only need to supply a pattern that selects the lines you want; awk prints them unchanged.
You can also choose which fields are output by naming them in the action, e.g. print $1.
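A simple example (the -F: option sets the field separator to a colon, because the fields in /etc/passwd are colon-separated):
awk -F: '$1 ~ /A/ { print $2 " " $3 }' /etc/passwd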
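As an illustrative sketch of the defaults described above, the following shows an action with no pattern, a pattern with no action, and the -F option for a non-default separator. The sample records and the shell variable names are made up for this example:

```shell
# Two made-up input lines; fields are separated by runs of blanks.
input='alice 30 admin
bob   25 user'

# No pattern: the action runs for every line.
fields=$(printf '%s\n' "$input" | awk '{ print $2 ": " $1 }')
printf '%s\n' "$fields"
# 30: alice
# 25: bob

# No action: lines matching the pattern are printed unchanged ($0).
matched=$(printf '%s\n' "$input" | awk '$3 == "admin"')
printf '%s\n' "$matched"
# alice 30 admin

# -F sets the field separator, here a colon (hypothetical sample record).
name=$(printf 'root:x:0:0\n' | awk -F: '{ print $1 }')
printf '%s\n' "$name"
# root
```

Note that the extra blanks in the second input line do not create empty fields: with the default separator, a run of blanks counts as one separator.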
Program Structure
awk programs consist of a sequence of one or more pattern-action statements:
pattern { action }
pattern { action }
...
awk scans input lines of data and performs actions on those lines that match any of the specified patterns.
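A small sketch may help show how several pattern-action statements interact: every statement is tested against every input line, in the order written, so one line can trigger more than one action and another line none at all. The three input numbers and the variable name rules are assumptions made for this example:

```shell
# Hypothetical input: one number per line.
rules=$(printf '5\n8\n12\n' | awk '
$1 > 10     { print $1 " is greater than 10" }
$1 % 2 == 0 { print $1 " is even" }
')
printf '%s\n' "$rules"
# 8 is even
# 12 is greater than 10
# 12 is even
```

The line 5 matches neither pattern and produces no output, 8 matches only the second statement, and 12 matches both.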
Running AWK
Here we call awk from a shell script awk1.sh:
#!/bin/bash
# awk1.sh
awk ' { print } ' "$1"
There is no pattern, so every line fed into awk is matched and the action is invoked. As a result, every line of the file is printed on the screen, and awk1.sh behaves much like cat.
To demonstrate, create the file numeric.dat with the contents:
1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x
Run awk1.sh on numeric.dat (make the script executable first, for example with chmod +x awk1.sh):
./awk1.sh numeric.dat
1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x
(Notice the ./ prefix: it tells the shell to run the script from the current directory, which is normally not on the command search path.)
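A shell wrapper is only one way to run awk. As a sketch of an alternative, the awk program can be kept in its own file and passed with the standard -f option. The file names print.awk and sample.dat and the scratch directory are assumptions made for this example, and mktemp -d is widely available but not part of POSIX:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# print.awk is a hypothetical file name holding the awk program itself.
cat > print.awk <<'EOF'
{ print }
EOF

# A two-line sample data file (also hypothetical).
printf '1 one i\n2 two ii\n' > sample.dat

# -f reads the program from a file instead of the command line.
from_file=$(awk -f print.awk sample.dat)
printf '%s\n' "$from_file"
# 1 one i
# 2 two ii
```

Keeping the program in a file avoids shell quoting problems once the awk code grows beyond a line or two.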
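Expressions
If the first field is equal to 1, print the entire line. Inside the single quotes, $1 is awk's first field; the final $1, outside the quotes, is the shell's first argument (the name of the input file):
#!/bin/sh
# awk1.sh
awk ' $1 == 1 { print $0 } ' "$1"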
Results in:
1 one i
If the second field is equal to "two" then print the entire line:
$2 == "two" { print $0 }
Results in:
2 two ii
If the first field is greater than 5 then print the third field:
$1 > 5 { print $3 }
Results in:
vi
vii
viii
ix
x
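Expressions are not limited to simple comparisons: arithmetic can appear both in the pattern and in the action. The following sketch, which uses the first four lines of numeric.dat as inline input and a made-up variable name squares, selects the even-numbered lines and prints the second field next to the square of the first:

```shell
# % is the remainder operator; the comma in print inserts a single space.
squares=$(printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' |
  awk '$1 % 2 == 0 { print $2, $1 * $1 }')
printf '%s\n' "$squares"
# two 4
# four 16
```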
Regular Expressions
Print the input line if the pattern "ix" is matched anywhere in the line:
/ix/ { print $0 }
Results in:
6 six vi
9 nine ix
Print the input line if the pattern "ix" is matched in the third field:
$3 ~ /ix/ { print $0 }
Results in:
9 nine ix
Print the input lines that do not contain the pattern "x":
$0 !~ /x/ { print }
Results in:
1 one i
2 two ii
3 three iii
4 four iv
5 five v
7 seven vii
8 eight viii
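Two details of regular-expression patterns can be sketched briefly. A bare /regex/ is shorthand for $0 ~ /regex/, so !/x/ selects the same lines as $0 !~ /x/, and the anchors ^ and $ restrict a match to the whole field rather than any part of it. The three input lines are taken from numeric.dat; the variable names are made up for this example:

```shell
data='6 six vi
7 seven vii
9 nine ix'

# Short and long forms of "lines that do not contain x".
short=$(printf '%s\n' "$data" | awk '!/x/')
long=$(printf '%s\n' "$data" | awk '$0 !~ /x/ { print }')
printf '%s\n' "$short"
# 7 seven vii

# ^ and $ anchor the match, so only a third field of exactly "vi" qualifies.
exact=$(printf '%s\n' "$data" | awk '$3 ~ /^vi$/ { print $1 }')
printf '%s\n' "$exact"
# 6
```

Without the anchors, $3 ~ /vi/ would also match "vii".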
Compound expressions
Print lines where the third field matches the pattern "x" OR the first field is less than or equal to 3.
$3 ~ /x/ || $1 <= 3 { print $0 }
Results in:
1 one i
2 two ii
3 three iii
9 nine ix
10 ten x
Print lines where the third field matches the pattern "vi" AND the second field begins with the letter "s".
$3 ~ /vi/ && $2 ~ "^s" { print $0 }
Results in:
6 six vi
7 seven vii
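Compound expressions can also be grouped with parentheses and negated with the ! operator. As a sketch, using the first five lines of numeric.dat as inline input and a made-up variable name picked, the following selects lines whose first field lies outside the range 2 to 3 and whose second field does not begin with "fi":

```shell
picked=$(printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' |
  awk '($1 < 2 || $1 > 3) && !($2 ~ /^fi/) { print $0 }')
printf '%s\n' "$picked"
# 1 one i
# 4 four iv
```

Line 5 passes the numeric test but is rejected because "five" begins with "fi".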
Ranges
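A range consists of two patterns separated by a comma. It selects every line from one that matches the first pattern through the next line that matches the second pattern, inclusive. Print the lines from the one whose second field equals "three" through the one whose third field equals "vii":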
$2 == "three", $3 == "vii" { print $0 }
Results in:
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
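Two further properties of ranges can be sketched with made-up input: a range can be triggered more than once, and a range whose end pattern never matches runs to the end of the input. The marker words START and END and the variable name ranged are assumptions made for this example:

```shell
# The first range closes at END; the second never closes and runs to the last line.
ranged=$(printf '%s\n' a START b END c START d | awk '/START/,/END/')
printf '%s\n' "$ranged"
# START
# b
# END
# START
# d
```

The lines a and c fall outside both ranges and are not printed.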
BEGIN and END
BEGIN is a special pattern that matches before the first input line is read. Similarly, END matches after the last input line has been processed.
BEGIN { print "start at 3..." }
$2 == "three", $2 ~ /^e/ { print $1 }
END { print "...and end at eight" }
Results in:
start at 3...
3
4
5
6
7
8
...and end at eight
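A common use of BEGIN and END is to initialise a value, accumulate it over every line, and report it once at the end. The following is a minimal sketch using the first three lines of numeric.dat as inline input; the variable names total and summary are made up for this example, and NR is awk's built-in count of lines read so far:

```shell
summary=$(printf '%s\n' '1 one i' '2 two ii' '3 three iii' | awk '
BEGIN { total = 0 }                      # runs once, before any input
      { total += $1 }                    # runs for every input line
END   { print NR " lines, sum " total }  # runs once, after the last line
')
printf '%s\n' "$summary"
# 3 lines, sum 6
```

The BEGIN statement here is optional, since an unset awk variable already behaves as 0 in arithmetic, but it makes the starting value explicit.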