An Awk Primer/Arrays
Introduction
Awk also permits the use of arrays. Those who have programmed before are already familiar with arrays. For those who haven't, an array is simply a single variable that holds multiple values, similar to a list. The naming convention is the same as it is for variables, and, as with variables, the array does not have to be declared.
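Awk arrays can only have one dimension. Array elements are identified by an index, contained in square brackets. By convention, numeric indexes start at 1 (the built-in split function numbers its results that way), although Awk itself does not enforce any starting index. For example: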
some_array[1] = "Hello"
some_array[2] = "Everybody"
some_array[3] = "!"
print some_array[1], some_array[2], some_array[3]
The number inside the brackets is the index. It selects a specific entry, or element, within the array, which can then be created, accessed, or modified just like an ordinary variable.
Associative Arrays
This is where people familiar with C-style arrays learn something new. Awk arrays are associative arrays: the indexes are strings, so they work more like a dictionary than a numbered list. For example, an array could be used to tally the money owed by a set of debtors, as follows:
debts["Kim"] = 50
debts["Roberto"] += 70
debts["Vic"] -= 30
print "Vic paid 30 dollars, but still owes", debts["Vic"]
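There are a few differences between C-style arrays (which behave like a numbered list) and associative arrays (which behave like a dictionary):
C-Style Array | Associative Array |
---|---|
Indexes are integers | Indexes are strings |
Size is usually fixed when the array is created | Grows and shrinks as elements are added and deleted |
Elements occupy consecutive positions in a fixed order | Elements may be sparse and have no particular order |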
Awk only has associative arrays, never C-style arrays. This comparison is only for people who have learned arrays from a different programming language.
Array Details
Let's review some of the more specific features of Awk's arrays.
Variable Length
An array in Awk can grow and shrink throughout the course of the program. Whenever you access an index that Awk hasn't seen before, the new entry gets created automatically. There is no need to let Awk know how many elements you plan on using.
message[1]="Have a nice"
message[2]="day."
print message[1], message[2], message[3]
In this example, elements 1 and 2 within the array message are created the moment we assign a value to them. In the last line, message[3] is accessed even though it hasn't been created yet. Awk will create the element message[3] and initialize it to the empty string (so nothing appears).
Furthermore, elements can be deleted from an array. The following is an extension to the above example:
delete message[2]
message[3]="night."
print message[1], message[2], message[3]
Now, message[2] no longer exists. It's as if you never gave it a value in the first place, so Awk treats a reference to it as an empty string. If you ran both examples together, the result would be:
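Have a nice day. 
Have a nice  night.
(Notice how the commas within the print statement add spaces between the array elements, even when an element is empty. That is why a trailing space follows "day." and two spaces appear before "night." The separator can be changed by setting the built-in variable OFS.)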
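The difference between reading an element and testing for it can be seen in a short sketch. The shell function name show_autocreate is only an illustrative wrapper and not part of Awk; the sketch assumes any POSIX-style awk is installed under the name awk.

```sh
show_autocreate() {
  awk 'BEGIN {
      message[1] = "Have a nice"

      # The "in" operator tests for an index without creating it.
      if (!(3 in message)) print "before: 3 is not in message"

      # Merely reading message[3] creates it, holding an empty string.
      value = message[3]
      if (3 in message) print "after read: 3 is in message"

      # delete removes the entry again.
      delete message[3]
      if (!(3 in message)) print "after delete: 3 is not in message"
  }'
}
show_autocreate
```

If a program needs to know whether an element really exists, testing with in is therefore safer than comparing the element to the empty string.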
Deletion
Some implementations, such as gawk and mawk, also let the programmer delete a whole array rather than individual elements. After
delete message
the array message is empty: it no longer contains any elements.
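For an awk that does not accept delete on a whole array, a commonly used alternative is to call split with an empty string, which leaves the target array with no elements. The following is a minimal sketch of that idiom; the wrapper name clear_array is hypothetical.

```sh
clear_array() {
  awk 'BEGIN {
      message[1] = "Have a nice"
      message[2] = "day."

      # split() first clears its target array; an empty input string
      # then adds nothing back, so the array ends up empty.
      split("", message)

      # Count whatever is left by walking the array.
      count = 0
      for (i in message) count++
      print "elements left:", count
  }'
}
clear_array
```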
String Index
You've already seen that Awk arrays use strings to select each element of an array, much like a dictionary.
translate["table"] = "mesa"
translate["chair"] = "silla"
translate["good"] = "bueno"
However, numbers are perfectly acceptable. As always, Awk simply converts the numbers into a string when necessary.
translate[1] = "uno"
translate[5] = "cinco"
Things can get tricky, however, when arrays are accessed with decimal numbers.
problems[ (1/3) ] = "one third"
Could you access this element with problems[0.333]? No. When a number is used as an index, Awk converts it to a string using the format held in the built-in variable CONVFMT (very old versions of Awk used OFMT for this). The default format is "%.6g", which keeps six significant digits, so the element above is stored under the index "0.333333". In general, try to avoid indexes with decimal values, unless you are very careful to use the correct format (which can be changed).
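The conversion can be observed directly. This sketch assumes an awk that follows the POSIX rule of using CONVFMT, with its default of "%.6g", for number-to-string conversion of indexes; the wrapper name show_index_format and the second array name short are illustrative only.

```sh
show_index_format() {
  awk 'BEGIN {
      problems[1/3] = "one third"

      # Show the string that was actually used as the index.
      for (key in problems) print "stored index:", key

      if (!(0.333 in problems))   print "0.333 not found"
      if ("0.333333" in problems) print "0.333333 found"

      # A different CONVFMT produces a different index for the same number.
      CONVFMT = "%.3g"
      short[1/3] = "one third"
      for (key in short) print "with CONVFMT=%.3g:", key
  }'
}
show_index_format
```

Because the index depends on the format in effect at the moment of the access, changing CONVFMT partway through a program can make existing decimal-indexed elements hard to find again.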
Sparseness and Lack of Order
Awk arrays are sparse, meaning that you can have element 1 and element 3 without having element 2. This follows naturally from the design: Awk uses string indexes, so it makes no distinction about numbered elements.
More importantly, the elements in an associative array are not stored in any particular order.
There are two useful commands that allow you to check the elements within an array. We will learn more about them in the upcoming chapters, but for now let's look at some examples.
if( "Kane" in debts ) print "Kane owes", debts["Kane"]
for( person in debts ) print person, "owes", debts[person]
Looking back to the introduction (where a debts array was created to associate people's names with an amount of money), we can see that Awk provides some useful commands to access arrays. The first one, if combined with the in operator, lets us check if a particular element has been defined, then execute code based on that result. The second one, for in, lets us create a temporary variable (called person in this example) and repeat a statement for every element within an array.
Play around with these examples to see how Awk doesn't necessarily maintain a specific order within its arrays. Fortunately, this is rarely a problem: when the order matters, you can use numeric indexes and step through them with an ordinary counting loop.
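When the order of the output does matter, one simple approach is to index the array by a counter and loop over the counter rather than using for in. The sketch below stores each input line's first field under the line number NR; the function name ordered_names and the sample input are made up for illustration.

```sh
ordered_names() {
  awk '{
      # Index by input line number so the original order is recoverable.
      names[NR] = $1
  }
  END {
      # for (i in names) could visit the indexes in any order;
      # counting from 1 to NR always follows the input order.
      for (i = 1; i <= NR; i++) print i, names[i]
  }'
}
printf 'Kim\nRoberto\nVic\n' | ordered_names
```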
Dimensions
Awk arrays are only single-dimensional. That means there is exactly one index. However, there is a built-in variable called SUBSEP, which equals "\034" (a non-printing control character that is unlikely to appear in real data) unless you change it. If you wish to create a multi-dimensional array, in which there are two or more indexes for each element, you can separate them with a comma.
array["NY", "capital"] = "Albany"
array["NY", "big city"] = "New York City"
array["OR", "capital"] = "Salem"
array["OR", "big city"] = "Portland"
Awk joins the indexes into a single string, with the value of SUBSEP between them. If SUBSEP were set to "@", these lines of code would be exactly equal to:
array["NY@capital"] = "Albany"
array["NY@big city"] = "New York City"
array["OR@capital"] = "Salem"
array["OR@big city"] = "Portland"
This is just a quick demo of multi-dimensional arrays. As you can see, these aren't really multi-dimensional; rather they are single-dimensional with a special separator. Multi-dimensional arrays won't be explored any further here because there are several technicalities that must be understood. You can still write useful Awk programs without them, but if you are curious about multi-dimensional arrays, feel free to consult your Awk manual (or just play around and see what works).
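For the curious, here is a small sketch of how the combined index behaves with the default SUBSEP. It shows the parenthesized form of the in test, access through the joined string, and split used to recover the parts. The wrapper name show_subsep and the array name part are illustrative.

```sh
show_subsep() {
  awk 'BEGIN {
      array["NY", "capital"] = "Albany"
      array["OR", "capital"] = "Salem"

      # A multi-part index is tested with parentheses around the parts.
      if (("NY", "capital") in array)
          print "NY capital:", array["NY", "capital"]

      # The same element is reachable through the joined string index.
      key = "OR" SUBSEP "capital"
      print "joined:", array[key]

      # split() with SUBSEP as the separator recovers the individual parts.
      n = split(key, part, SUBSEP)
      print "parts:", n, part[1], part[2]
  }'
}
show_subsep
```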
Functions
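Standard Awk has few functions that work with arrays; the main one is split, which fills an array with the pieces of a string. However, gawk offers three functions that operate on whole arrays (A and B are supposed to be arrays; length(A) is also accepted by several other implementations):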
- length(A) returns the number of elements in A.
- asort(A[,B]) - if B is not given, sorts the values of A. The indices of A are replaced by sequential integers starting with 1. If B is given, copies A to B, then sorts B as above, while A remains unchanged. Returns the number of elements in A.
- asorti(A[,B]) - if B is not given, discards the values of A and sorts its indices. The sorted indices become the new values, and sequential integers starting with 1 become the new indices. As in the previous case, if B is given, copies A to B, then sorts B's indices as above, while A remains unchanged. Returns the number of elements in A.
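Where gawk's asort is not available, values can still be sorted by hand. The following is only a sketch of one possible substitute, a plain insertion sort over an array indexed from 1 to NR; it is not how asort itself is implemented, and the function name sort_values and the sample input are hypothetical.

```sh
sort_values() {
  awk '{ value[NR] = $1 }
  END {
      # Insertion sort: move each element left until it is in place.
      for (i = 2; i <= NR; i++) {
          current = value[i]
          # The j >= 1 test comes first so value[0] is never touched.
          for (j = i - 1; j >= 1 && value[j] > current; j--)
              value[j + 1] = value[j]
          value[j + 1] = current
      }
      for (i = 1; i <= NR; i++) print value[i]
  }'
}
printf 'pear\napple\nfig\n' | sort_values
```

Insertion sort is easy to read but slow on large inputs; for big data sets, piping the output through the system sort command is usually the more practical choice.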
Practice
- Update the "coins" program that we wrote in the beginning of this book. Use arrays to keep a tally of the number of coins by country. Display the results along with the summary.
- Write the debtors program. It should scan a log file that lists transactions like "Jim owes 50" and "Kim paid 30". Using an associative array, keep a running total of all the money that people have borrowed and paid. Make sure that a person can appear several times within the file, and their debt will be updated appropriately. At the END, list everyone and their total.
- Improve the program from the previous exercise to delete a person's name from the array if they have paid everything that they owe. This way, the results won't be cluttered with people who owe zero dollars.
The next page gives a quick review of all the operators Awk has to offer.