An Awk Primer/Variables
Types and Initialization
As already mentioned, Awk supports both user-defined variables and its own predefined variables. Any identifier that begins with a letter or underscore and consists of letters, digits, or underscores (_) can be used as a variable name, provided it does not conflict with Awk's reserved words or the names of its built-in functions. Obviously, spaces are not allowed within a variable name. Beware: accidentally using a reserved word is a common bug when building Awk programs, so if a program blows up on a seemingly inoffensive word, try changing it to something more unusual and see if the problem goes away.
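To see these rules in action, the sketch below runs two tiny BEGIN-only programs, assuming a POSIX-compatible awk on the PATH. The names line_count2 and _tmp are arbitrary examples of legal names. The name index was chosen because it looks harmless but is a built-in function name, so awk rejects it with a syntax error and a nonzero exit status (the exact message varies between implementations).

```shell
#!/bin/sh
# Legal names: a letter or underscore first, then letters, digits, or underscores.
ok=$(awk 'BEGIN { line_count2 = 5; _tmp = 1; print line_count2 + _tmp }')
echo "$ok"

# "index" is the name of a built-in function, so awk refuses to use it
# as a variable: it reports a syntax error and exits with a nonzero status.
if awk 'BEGIN { index = 1 }' 2>/dev/null; then
  status="accepted"
else
  status="rejected"
fi
echo "$status"
```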
There is no need to declare variables, and in fact it can't be done, though in an elaborate Awk program it is a good idea to initialize variables in the BEGIN clause to make them obvious and to make sure they have proper initial values. Relying on default values is a bad habit in any programming language, although in Awk every variable starts out as zero when used as a number and as the empty string when used as a string. The fact that variables aren't declared in Awk can also lead to some odd bugs: for example, misspelling the name of a variable silently creates a second, different variable that is out of sync with the rest of the program.
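Here is a small sketch of the misspelling bug, assuming a POSIX awk; total and totl are hypothetical names chosen for the demonstration. Initializing total in BEGIN does not help here, because the typo quietly creates a second variable.

```shell
#!/bin/sh
# "totl" is a typo for "total": awk silently creates a second variable,
# so the explicitly initialized total never changes.
out=$(printf '10\n20\n30\n' | awk '
  BEGIN { total = 0 }     # initialize in BEGIN so the variable is obvious
  { totl += $1 }          # typo: accumulates into a brand-new variable
  END { print total, totl }
')
echo "$out"
```

The program runs without complaint and prints 0 60, which is exactly the kind of silent failure described above.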
Also as mentioned, Awk is weakly typed. Variables have no data type, so they can store either string or numeric values; string operations on variables give a string result and numeric operations give a numeric result. In a numeric operation, a string that doesn't begin with a number is simply regarded as 0 (a string with a numeric prefix, such as "12abc", uses that prefix and is regarded as 12). This can cause confusion, so it is important for the programmer to remember it and avoid possible traps. For example:
var = 1776
var = "1776"
Both examples load the value 1776 into the variable named var. In either case it can be treated as a numeric value in calculations, and string operations can be performed on it as well. If var is loaded with a text string of the form:
var = "somestring"
String operations can be performed on it, but it will evaluate to a 0 in numeric operations. If this example is changed as follows:
var = somestring
Now var gets the value of an uninitialized variable: 0 in numeric operations and the empty string in string operations, because Awk treats somestring without quotes as the name of a variable that has never been assigned. Incidentally, an uninitialized variable can be tested for a value of 0:
var == 0
This tests "true" if var hasn't been initialized. Printing an uninitialized variable, however, produces nothing, because its string value is the empty string. For example:
print something
This simply prints a blank line, whereas:
something = 0; print something
This prints a "0".
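The following sketch, assuming a POSIX-compatible awk such as gawk or mawk, collects the behaviors described above in one BEGIN block; never_set is a hypothetical name that is deliberately never assigned. The last line shows one place where 1776 and "1776" do differ: when a string value is compared with a number, POSIX awk compares them as strings, so "1776" < 200 comes out true.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
  n = 1776; s = "1776"
  print n + 1, s + 1                   # both act as numbers
  print length(n), length(s)           # both act as strings
  print "somestring" + 0, "12abc" + 0  # no numeric prefix gives 0; a prefix is used
  u_num = (never_set == 0); u_str = (never_set == "")
  print u_num, u_str                   # an uninitialized variable passes both tests
  print "[" never_set "]"              # its string value is empty
  cmp_s = (s < 200); cmp_n = (n < 200)
  print cmp_s, cmp_n                   # string vs number compares as strings
}')
echo "$out"
```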
Arrays and Strings
Unlike many other languages, an Awk string variable is not represented as a one-dimensional array of characters. However, the substr() function can be used to access the characters within a string. More information about arrays and string-handling functions will come later.
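As a small illustration, assuming a POSIX awk, this sketch walks through a string one character at a time with substr(string, start, length), whose positions start at 1.

```shell
#!/bin/sh
# Pull out one character per iteration and join them with hyphens.
out=$(awk 'BEGIN {
  s = "Awk"
  n = length(s)
  for (i = 1; i <= n; i++)
    printf "%s%s", substr(s, i, 1), (i < n ? "-" : "\n")
}')
echo "$out"
```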
Built-In Variables
Awk's built-in variables include the field variables $1, $2, $3, and so on ($0 is the entire line), which break a line of text into individual words or pieces called fields. Soon, we will see how slightly more advanced Awk programs can manipulate multi-line data, such as a list of mailing addresses.
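A quick sketch of the field variables, assuming a POSIX awk and the default field separator: leading and repeated blanks do not create empty fields, while $0 keeps the line exactly as it was read. The sample line is made up for the demonstration.

```shell
#!/bin/sh
# $1 and $3 are individual fields; $0 is the whole, unmodified line.
out=$(echo "  alpha   beta gamma" | awk '{ print $1 "|" $3 "|" $0 }')
echo "$out"
```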
Besides the field variables, Awk has several other built-in variables. Some of these can be changed with the assignment operator. For example, writing FS=":" changes the field separator to a colon. The new separator takes effect when the next record is read, which is why FS is usually set in the BEGIN clause; from then on, the field variables refer to the colon-separated parts of each line.
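For example, this sketch (assuming a POSIX awk; the colon-separated sample lines are made up) sets FS in BEGIN so that it is already in effect when the first line is read, then prints the first and third fields.

```shell
#!/bin/sh
# FS is assigned before any input is read, so every line splits on colons.
out=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk 'BEGIN { FS = ":" } { print $1, $3 }')
echo "$out"
```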
NR: Keeps a current count of the number of input records read so far. Remember that records are usually lines; Awk performs the pattern/action statements once for each record in a file.
NF: Keeps a count of the number of fields within the current input record. Remember that fields are whitespace-separated words by default, but they are essentially the "columns" of data if your input file is formatted like a table. The last field of the input line can be accessed with $NF.
FILENAME: Contains the name of the current input file.
FS: Contains the field separator used to divide fields on the input line. The default is "white space": fields are separated by runs of spaces and tabs, and leading and trailing white space is ignored. FS can be reassigned to another character (typically in BEGIN) to change the field separator.
RS: Stores the current record separator. Since, by default, an input line is the input record, the default record separator is a newline. By setting FS to a newline and RS to the empty string (RS=""), which makes one or more blank lines separate records, you can process multi-line data. This would be used for, say, a list of addresses (with each address taking several lines).
OFS: Stores the output field separator, which separates the fields when Awk prints them. The default is a single space. Whenever print has several parameters separated by commas, it prints the value of OFS between each pair of parameters.
ORS: Stores the output record separator, which separates the output lines when Awk prints them. The default is a newline character. print automatically outputs the contents of ORS at the end of whatever it is given to print.
OFMT: Stores the format print uses for non-integer numeric output. The default format is "%.6g", which will be explained when printf is discussed.
ARGC: The number of command-line arguments present, counting ARGV[0].
ARGV: The array of command-line arguments, ARGV[0] through ARGV[ARGC-1]. ARGV[0] is usually "awk"; the program text and Awk's own options are not included.
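The sketch below exercises several of these variables at once, assuming a POSIX awk; the input lines and the arguments first and second are made up for the demonstration. Because the second program has only a BEGIN clause, awk never tries to open first and second as files.

```shell
#!/bin/sh
# NR counts records, NF counts fields, $NF is the last field.
# OFS goes between comma-separated print arguments; ORS ends each print.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-"; ORS = ";" } { print NR, NF, $NF }')
echo "$out"

# ARGC counts the arguments (including ARGV[0]); ARGV[1] onward hold the ones given.
args=$(awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' first second)
echo "$args"
```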
Changing Variables
By the way, values can be loaded into field variables; they aren't read-only. For example:
$2 = "NewText"
This changes the second field of the current input line to "NewText", and Awk rebuilds $0 from the fields, joining them with OFS. It does not modify the input files, so don't worry. This can be used as a trick to modify the lines of an input file and then simply print them using print without any parameters.
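A minimal sketch of field assignment, assuming a POSIX awk: after $2 is replaced, a bare print outputs the rebuilt $0, with the fields joined by OFS. The sample line is made up.

```shell
#!/bin/sh
# Default OFS (a space) is used when $0 is rebuilt.
out=$(echo "one two three" | awk '{ $2 = "NewText"; print }')
echo "$out"

# Same edit with OFS set to a comma: the rebuilt line changes accordingly.
out2=$(echo "one two three" | awk 'BEGIN { OFS = "," } { $2 = "NewText"; print }')
echo "$out2"
```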
Again, all variables can be modified, although some of the built-in variables will not produce the expected effect. You can, for instance, change the value of FILENAME, but it will not load a new file: Awk simply continues normally, but if you read FILENAME the new value will be there. Assigning to NR changes only the counter, which Awk keeps incrementing from the new value. NF behaves differently: in most modern implementations, including gawk, assigning to NF changes the current record itself, dropping trailing fields (or adding empty ones) and rebuilding $0.
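The sketch below, assuming gawk, mawk, or another modern awk, shows both behaviors. The NF part is hedged on purpose: some older awks do not rebuild $0 when NF is lowered, so treat that result as implementation-dependent.

```shell
#!/bin/sh
# Assigning NR resets the counter; awk keeps incrementing from the new value.
out=$(printf 'a\nb\nc\n' | awk 'NR == 1 { NR = 10 } { print NR }')
echo "$out"

# Lowering NF discards trailing fields and rebuilds $0 (modern awks only).
out2=$(echo "a b c d" | awk '{ NF = 2; print }')
echo "$out2"
```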
Practice
- Write the address book program. You'll need to set FS to a newline and RS to the empty string (RS=""), so that blank lines separate the records. Your program should read the multi-line input and output it in single-line format.
- Write a program that reads a list of numbers and outputs them in a different format. Each input line should begin with a character such as a comma or hyphen, followed by a space and then up to five numbers (space-separated). Your program should output these numbers with the new separator (given at the beginning of the line) in between the numbers. You'll have to modify OFS for each line of input.
Continue to the next page to learn about Awk's powerful associative arrays.