A Quick Introduction to Unix/My First Shell Script
What is a shell script?
So far we have been issuing Unix commands at the shell prompt. This is a very straightforward way of working, but in some circumstances it isn't ideal. Suppose you have a file that you process in a particular, complex way, subjecting its contents to a string of different Unix processes. You can do this using pipes and redirects at the command line, but if you make a mistake you may have to start again. (Of course, when trying out a complex procedure for the first time you were working on a copy of the data file, weren't you?) It would also be irritating if you needed to do this often, maybe regularly, and had to go through the process a command at a time, with all the opportunities for typos that this would provide.
Fortunately, Unix provides quite a simple way to avoid these situations. You can create a text file containing Unix commands, give it a name (by convention one with the extension .sh), and then execute the whole bunch of commands by invoking this filename at the command prompt. Let's start with a very simple example.
Creating a script
In the pico editor, create a file containing the following text, exactly as it appears here.
#!/usr/bin/bash
ls -l .*
Save this file as hid.sh in your home directory. We are going to use this script at the command line, just like a command. It will list all the files and directories (and the contents of those directories) whose names begin with a dot, which is to say the hidden files and directories. The first line of the script tells Unix which shell should execute the file. (The exclamation mark in this context is called a "bang", so the line begins with "hash bang", also called a "shebang".)
Before we run it we have to deal with the permissions on this file so that it can be executed. Unix does not allow files to be executed by default (and this is a very good thing). The command to make the file executable is
% chmod 755 hid.sh
(I've used a shorthand here, 755, to set the permissions: read, write and execute for the owner (7), and read and execute for group and other (5 each).)
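If the numeric shorthand feels opaque, the sketch below may help. It sets the same permissions twice, once with 755 and once with the symbolic form u=rwx,go=rx, and compares the results. The file demo.sh and the scratch directory are stand-ins I have made up for the demonstration, and I have used /bin/sh as the shell purely for portability.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# demo.sh is a made-up stand-in for hid.sh; its contents do not matter here.
printf '#!/bin/sh\necho hello\n' > demo.sh

# Numeric form: 7 = rwx for the owner, 5 = r-x for group, 5 = r-x for other.
chmod 755 demo.sh
octal_perms=$(ls -l demo.sh | cut -c1-10)

# Take the permissions away again, then set them in symbolic form.
chmod 600 demo.sh
chmod u=rwx,go=rx demo.sh
symbolic_perms=$(ls -l demo.sh | cut -c1-10)

echo "numeric:  $octal_perms"
echo "symbolic: $symbolic_perms"
```

Both lines should show the same permission string, which is the first column that ls -l prints for the file.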
Now you can execute the commands in the file just by invoking the filename at the prompt:
% ./hid.sh
(I have to type ./ because the current directory isn't normally on the shell's search path, which is held in the PATH variable. For the moment I want to ignore this complication - it has nothing to do with shell scripting and everything to do with Unix environment variables.)
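For the curious, here is a rough sketch of what the search path does. It prints the directories the shell searches, runs a script by its explicit path, and then runs it by bare name after adding its directory to PATH for the current session only. The script name hello.sh and the scratch directory are my own inventions for the example.

```shell
# Show the directories the shell searches for commands, one per line.
echo "$PATH" | tr ':' '\n'

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# hello.sh is a made-up script used only for this demonstration.
printf '#!/bin/sh\necho "hello from hello.sh"\n' > hello.sh
chmod 755 hello.sh

# An explicit path (./) says exactly where the file is, so PATH is not searched.
explicit_output=$(./hello.sh)
echo "$explicit_output"

# Adding the directory to PATH (for this session only) lets the bare name work.
PATH="$PATH:$workdir"
bare_output=$(hello.sh)
echo "$bare_output"
```

The change to PATH disappears when the shell session ends, so this is safe to try.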
A more useful script
To illustrate a rather more interesting shell script, I'm going to process a file called science.txt. I created this file by stripping all the images and formatting from the Wikipedia article on science. You are of course welcome to try the same.
Scenario
From the point of view of "real Unix scripting", what I do next is rather unnatural. Unix power users would not create a shell file of the kind I do below but would definitely use the piping and redirecting of grep output directly on the command line. But my aim here is to dazzle, hopefully inspire, and teach a little. So, learn and pass on.
Let us imagine that you are a historian of ideas. You want to know how Wikipedia presents the development of the idea of science. To begin with we'll just look at the lines of the Wikipedia article on science that actually contain the word science (as I mentioned above, I'm using a file that I created that only contains unformatted text from the article). How can we find these lines using what we know about Unix? The answer, as I'm sure you knew, is with grep. To find all the lines that contain the word science we would issue the command
% grep 'science' science.txt
So now create a text file with this as the first line after the shebang directive. You can call it scisearch.sh. When you have saved your file and changed the permissions, test it. Does it do what you expected? If it doesn't, correct it; if it does, carry on.
This could be quite interesting. But rather than just display the results on screen, it would be more useful if they were saved to a file. We can do this with a redirect. Open the file scisearch.sh and change it so that it reads
grep 'science' science.txt > scioutput.txt
When you have made this change, test your file and, if necessary, amend it again.
Now, this is already an interesting file and it illustrates something about shell scripting, but we can improve it. At the moment the search is case sensitive, so amend it to read
grep -i 'science' science.txt > scioutput.txt
so that it finds not just science but Science. As usual you should test. You probably could get away without testing this last change but in real life it really is a good idea to test a script after each change so that you fix problems quickly before they get too difficult to untangle - or debug as the jargon has it.
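To see what the -i option changes, you can count matching lines with and without it. This is only a sketch: sample.txt is three lines of text I have invented to stand in for science.txt, and -c asks grep to print the number of matching lines rather than the lines themselves.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt is an invented three-line stand-in for science.txt.
printf '%s\n' \
  'Science is a systematic enterprise.' \
  'The history of science is long.' \
  'Alchemy preceded chemistry.' > sample.txt

# -c prints the number of matching lines instead of the lines themselves.
case_sensitive=$(grep -c 'science' sample.txt)
case_insensitive=$(grep -ic 'science' sample.txt)

echo "lines matched without -i: $case_sensitive"   # 1
echo "lines matched with -i:    $case_insensitive" # 2
```

Without -i only the second line matches; with -i the first line, which begins with a capital S, is found as well.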
One final amendment suggests itself. Let's add line numbers so that if we want to check a reference to our search term in context we can easily find it.
grep -in 'science' science.txt > scioutput.txt
(Of course, you could now use a copy of the original with the lines numbered. You could make one with cat -n and a redirect of the output.) Now you should check the file scioutput.txt, using less or opening it with pico, to see that the contents are what you expect.
This is a fairly simple shell script. Its only real purpose is to illustrate the general principle of creating script files. However, I do think it's worthwhile reflecting for a moment on how you might go about doing this in Microsoft Windows.
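Here is a small sketch of both ideas together: a numbered copy made with cat -n, and the search output with line numbers from grep -in. Once again sample.txt is invented text standing in for science.txt, and the output file names are made up to match.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt is an invented three-line stand-in for science.txt.
printf '%s\n' \
  'Science is a systematic enterprise.' \
  'Alchemy preceded chemistry.' \
  'The history of science is long.' > sample.txt

# A copy of the original with every line numbered.
cat -n sample.txt > sample_numbered.txt

# The same search as in the script: ignore case, prefix each match with its line number.
grep -in 'science' sample.txt > sample_output.txt

cat sample_output.txt
```

Each line of the output starts with the line number and a colon, so a match reported as 3: can be looked up as line 3 of the numbered copy.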
Generalising the script with variables
Our script is fine as it is but it's rather specialised. Suppose that I wanted to carry out a similar process one day on a file about religion. One approach would be to create a new shell script with different files and search terms. This is not optimal though. A better approach would be to parametrize or generalise the existing script.
The shell provides you with a number of variable names to represent positional parameters. These are values that can be substituted into your script from the command line based on the order they are typed in. The variable $0 is reserved for the name of your script. We don't need that right now. Instead we are going to use three numbered variables to represent the search string, input file and output file for our data processing. They will be called by the names $1, $2 and $3 respectively in the script.
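If you would like to watch the positional parameters being filled in, the following sketch writes a tiny script that simply reports them. The name showargs.sh is hypothetical, and the three arguments are just strings here; no files of those names need to exist.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# showargs.sh is a made-up script that only reports what it was given.
cat > showargs.sh <<'EOF'
#!/bin/sh
echo "name:   $0"
echo "first:  $1"
echo "second: $2"
echo "third:  $3"
EOF
chmod 755 showargs.sh

# The arguments are assigned to $1, $2 and $3 in the order they are typed.
args_output=$(./showargs.sh 'religion' religion.txt reloutput.txt)
echo "$args_output"
```

The first line of the output shows how the script was invoked, and the next three show the arguments in the order they appeared on the command line.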
Amend your script file so that it reads
grep -in "$1" "$2" > "$3"
So how do we use this new version? At the command line we substitute the terms for the variable names. We might type
% ./scisearch.sh 'religion' religion.txt reloutput.txt
which assumes that we are searching for the string religion in a file religion.txt and sending the output to reloutput.txt. In many flavours of Unix you could now go on to make your script more interesting by adding context to your output, for example capturing not just the single matching lines but those that precede and follow them as well, but we won't go into that here. It would be as well now to rename our script, since it no longer has anything particularly to do with science.
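For anyone who wants a taste of the context idea anyway, here is one possible sketch. It assumes your grep supports the -C option (GNU and BSD grep do, but it is not guaranteed on every Unix), and both the script name textsearch.sh and the sample file are inventions of mine.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# textsearch.sh is a hypothetical new name for the generalised script.
cat > textsearch.sh <<'EOF'
#!/bin/sh
# Usage: textsearch.sh SEARCH_STRING INPUT_FILE OUTPUT_FILE
# -C 1 asks grep for one line of context before and after each match
# (assumes a grep that supports -C, such as GNU or BSD grep).
grep -in -C 1 "$1" "$2" > "$3"
EOF
chmod 755 textsearch.sh

# religion_sample.txt is invented text standing in for religion.txt.
printf '%s\n' \
  'Opening line.' \
  'Ritual is older than writing.' \
  'Religion shapes many cultures.' \
  'Philosophy asks related questions.' \
  'Closing line.' > religion_sample.txt

./textsearch.sh 'religion' religion_sample.txt reloutput.txt
cat reloutput.txt
```

Only line 3 matches, but the output file also holds lines 2 and 4 as context. Quoting the parameters keeps the script working when a search string or filename contains spaces.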
This new script still only introduces the most basic scripting idea, but you are perhaps now in a position to look at a more detailed introduction to Unix.
Learning More
You can learn more about scripting from the Bourne Again Shell Scripting book.