
Managing websites using Unix, Part Two

Copyright © 1999 Nik Clayton

This is the second of several articles explaining how to use the tools provided by Unix and clones (such as the free BSD implementations, and the various different Linux distributions) to manage the contents of a website, such as the free webspace that ISPs often give to their customers.

There is nothing about the techniques described here (and in future articles) that limits them to small, personal websites. The author has successfully used these approaches to manage sites with thousands of pages and a half dozen active webmasters working on the site.

In this article make(1) is introduced.


Introduction

In the previous article we started looking at the problem of managing a website using freely available software. The problem was broken down into a number of different areas, to be tackled one by one.

We then investigated CVS, and started seeing how it can be used to manage a directory structure full of files, and how you can use CVS to restore old revisions of files, or see what has changed between two different revisions. Using ``cvs diff'', and ``patch'' you also learned how to back out (or revert) changes that you have made.

The CVS basics covered so far are enough to get by, so this article will ignore CVS for the time being and concentrate on another essential tool in your Unix utility belt, make(1). Of course, the emphasis is on how you can use make(1) to help maintain your web site.


Pre-requisites

The sh(1) scripting language. While not essential, this will make it easier for you to follow some of the examples in this article.

The problem

You have created your web site in your work area (the mywebsite directory, if you followed the previous article). The work area will contain files and directories that you need to copy to your staging area (mystage) where you can test your site before finally uploading it to the machine that serves your website.

On the face of it, this seems to be a relatively simple task. After all, copying files from one directory to another is a simple application of cp(1), which is hardly rocket science. However, a little more thought reveals some interesting possibilities.

For example, consider graphics. Suppose for the moment that your site includes some high resolution images that you want to make available in JPEG format. At the same time, you would like to produce thumbnails of these images as GIF files so that visitors to your site can get an idea of what they will be seeing before they commit to downloading a large file.

Because the GIF files can be produced from the JPEG files there is no point in keeping both the JPEG and the GIF files in the repository. Instead, it would make more sense to just store the JPEG files, and create the GIF files from the JPEG files as necessary. This is something that would be useful to automate.

Or perhaps you have a Java applet to include? Again, it makes more sense to include the source code to the applet in the repository, and have the applet compiled from source when you install the site in to the staging area.

Or perhaps you include some large text files on your site for people to download, and you want to make them available in a number of different compression formats. Instead of storing zipped and gzipped versions of the documents in the repository, it would be smarter to include the uncompressed version in the repository, and have the compressed versions produced when you copy the files to the staging area.

As you can see, this is a little more complicated than simply copying files around.

Ideally you want a way of being able to express requests like

If foo.gif does not exist then create foo.gif from foo.jpeg
or
If foo.gif does exist, but foo.jpeg has been modified more recently then recreate foo.gif from foo.jpeg
or
If foo.txt exists, and foo.txt.gz and/or foo.txt.zip do not exist, then compress foo.txt to create the missing files.
or
Install everything in to the staging area, creating any files (such as foo.gif, or foo.txt.gz) that should exist but do not.
As you can see, we are starting to build up a list of rules that should be followed in order to create and populate the staging area. We are also implying an ordering to those rules: the install everything rule can not run until the build GIF files from JPEG files rule and the compress foo.txt rule have completed.

This is exactly the sort of problem that make(1) is designed to solve.
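
To get a feel for what is being asked for, the first two requests above can be written by hand in sh(1). This is only a sketch under some assumptions: foo.jpeg and foo.gif are the placeholder names from the list, cp(1) stands in for a real image converter, and the -nt (``newer than'') test is a widely supported extension to test(1) rather than something every shell is required to provide.

```shell
#!/bin/sh
# Hand-written version of "rebuild foo.gif if it is missing or out of date".
# Assumptions: foo.jpeg and foo.gif are placeholder names, and cp(1) stands
# in for a real JPEG-to-GIF converter.

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

rebuild_if_needed() {
    # $1 is the source file, $2 is the file generated from it.
    if [ ! -e "$2" ] || [ "$1" -nt "$2" ]; then
        cp "$1" "$2"              # placeholder for the real conversion
        echo "rebuilt $2"
    else
        echo "$2 is up to date"
    fi
}

touch -t 202001010000 foo.jpeg                # source last modified in 2020
first=$(rebuild_if_needed foo.jpeg foo.gif)   # foo.gif is missing

touch -t 202101010000 foo.gif                 # pretend the GIF dates from 2021
second=$(rebuild_if_needed foo.jpeg foo.gif)  # GIF is newer than its source

touch -t 202201010000 foo.jpeg                # source edited again in 2022
third=$(rebuild_if_needed foo.jpeg foo.gif)   # source is newer than the GIF

printf '%s\n' "$first" "$second" "$third"
```

Writing and maintaining a test like this for every generated file soon becomes tedious, and that bookkeeping is what make(1) takes over.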


make(1)?

make(1) was originally written to help support programming projects. The source code to all but the most trivial of programs is typically stored in more than one file (foo.c, bar.c, and so on for C source code). To build the executable, each file must be compiled to an object file (foo.o, bar.o, ...). Finally, these object files must be linked to form the executable file (the command that you would run).

If you have changed just one .c file, you do not want to have to recompile all of them, and then relink. It would be a great time-saver if you could express in some way ``Only recompile the .c files that have been modified since the .o file with the same base name[1] was created, and then relink the program.''

This is very similar to the requirements outlined above, and could be rewritten as ``Only convert the JPEG files that have been modified since the GIF file with the same base name was created.'' or ``Only re-compress the .txt files that have been modified since the .txt.gz file with the same base name was created.''

You express these requirements in rules contained within a text file that make(1) reads when it is run. This file is normally called makefile or Makefile.

Which make(1) are you using?: There are several different make(1) commands available, often subtly incompatible. Which one you run when you type make depends on your system.

The BSDs use Berkeley Make (bmake) as their standard make(1). This is also known as Parallel Make (pmake) because it can run several tasks at once.

Linux uses GNU Make (gmake). GNU Make is subtly different from Berkeley Make, and a Makefile written for Berkeley Make may not work with GNU Make.

Unless otherwise indicated, all these examples use Berkeley Make. If you are using a Linux system you may already have Berkeley Make installed. Try running bmake or pmake and see if that works. If you do not have Berkeley Make installed you can download and install it by following the simple instructions at http://www.quick.com.au/ftp/pub/sjg/help/bmake.html.


Your first Makefile

We will start by creating a very simple Makefile. This Makefile will contain two rules, called rule-one and rule-two. The body of each rule will simply print out the name of the rule when it is called.

Change to your web site directory (mywebsite) and create a new file called Makefile.

#
# Sample Makefile
#

rule-one:
        echo "This is rule-one"

rule-two:
        echo "This is rule-two"
Note: Now that you have created this file you may wish to use cvs add to add it to your CVS repository, and then use cvs commit to confirm the addition.

You will be editing this file throughout the tutorial, and you should feel free to commit the changes (with an appropriate log message) when you want to.

Some things should be immediately obvious.

First, lines that begin with a # mark are comments, and are ignored by make(1).

Secondly, rules are defined by starting a line in the first column, writing the rule name, and then placing a colon (:) after the rule name.

Thirdly, the body of the rule starts on the line immediately following the rule name. The body can contain multiple lines (although in this example each body only contains one line). Each line of the rule body must be indented by at least one tab character. Spaces will not suffice. make(1) is extraordinarily picky about this[2].

Run make(1), and include the name of a rule to process on the command line:

% make rule-one
echo "This is rule-one"
This is rule-one
As you can see, make(1) shows you the command before it runs it. This is why there are two lines of output and not one in this example.

If you do not want a command to be echoed to the screen before being run, place an @ symbol before the command in question. If you do that, your Makefile will now look like this:

#
# Sample Makefile
#

rule-one:
        @echo "This is rule-one"

rule-two:
        @echo "This is rule-two"
If you process this with make(1) again you should see the expected output:
% make rule-one
This is rule-one
% make rule-two
This is rule-two
If you do not tell make(1) which rule to run it will use whichever rule appears first in the Makefile.
% make
This is rule-one
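
If you want to see for yourself how picky make(1) is about tabs, the following sketch builds two copies of the rule, one indented with a tab and one with eight spaces, and runs each. MAKE_CMD is an illustrative variable of my own (set it to bmake or gmake if that is what your system calls the command), and the Makefile.tab and Makefile.spaces names are arbitrary. The exact error message differs between versions of make(1), so the script only reports the exit status.

```shell
#!/bin/sh
# Compare a rule body indented with a tab against one indented with spaces.
# MAKE_CMD is an illustrative variable; it defaults to plain "make".
MAKE_CMD=${MAKE_CMD:-make}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# \t in the printf format writes a real tab character.
printf 'rule-one:\n\t@echo "This is rule-one"\n' > Makefile.tab
# The same rule, but with the body indented by eight spaces.
printf 'rule-one:\n        @echo "This is rule-one"\n' > Makefile.spaces

tab_out=$($MAKE_CMD -f Makefile.tab rule-one)
tab_status=$?

# Discard the error text, since its wording depends on the make in use.
$MAKE_CMD -f Makefile.spaces rule-one >/dev/null 2>&1
spaces_status=$?

echo "tab:    exit status $tab_status, output: $tab_out"
echo "spaces: exit status $spaces_status"
```

The tab version should exit with status 0 and print the message, while the spaces version should exit with a non-zero status.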

Rules or targets?

Programs like make(1) often have their own vocabulary. In the jargon of make(1), a rule is called a target.

Edit Makefile, and replace every occurrence of the word rule with the word target. You should then be able to do the following;

% make target-one
This is target-one
% make target-two
This is target-two

Dependencies

Targets can depend on one another. If target-one is listed as a dependency of target-two then its body will be run before the body of target-two.

Listing dependencies

Edit Makefile and add target-one to the end of the target-two: line:
#
# Sample Makefile
#

target-one:
        @echo "This is target-one"

target-two: target-one
        @echo "This is target-two"
Now run make(1) processing target-two:
% make target-two
This is target-one
This is target-two
Adding target-one to the target-two: line has listed target-one as a dependency for target-two. When make(1) processed target-two it saw the dependency on target-one, and processed target-one first. This is why the output from target-one appears first.

A target can have more than one target in its dependency list. If this is the case then make(1) processes the dependencies in the order that they are listed.
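
A small experiment shows the ordering. In this sketch the target names first, second, both and swapped are made up for the illustration, and MAKE_CMD is an illustrative variable that defaults to make. It assumes make(1) is running one job at a time, which is the default.

```shell
#!/bin/sh
# Show that dependencies are processed in the order they are listed.
# All target names here are hypothetical; MAKE_CMD defaults to "make".
MAKE_CMD=${MAKE_CMD:-make}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Build the Makefile piece by piece; \t writes the required tab.
printf 'first:\n\t@echo "This is first"\n\n'              >  Makefile
printf 'second:\n\t@echo "This is second"\n\n'            >> Makefile
printf 'both: first second\n\t@echo "This is both"\n\n'   >> Makefile
printf 'swapped: second first\n\t@echo "This is swapped"\n' >> Makefile

both_out=$($MAKE_CMD both)        # first, then second, then both
swapped_out=$($MAKE_CMD swapped)  # second, then first, then swapped

echo "$both_out"
echo "---"
echo "$swapped_out"
```

The two targets have the same dependencies and differ only in the order in which they are listed, and the output follows that order.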

The template for a rule in a Makefile looks like

<target name> : <optional dependencies>
          <body of rule>
          ...
          <end of body>

Building chains of dependencies

Suppose that you wanted to introduce a new target, target-three. This new target should depend upon target-two. Since it depends upon target-two running, it will obviously depend upon target-one running as well.

You might think that this new target would be written like so:

target-three: target-one target-two
        @echo "This is target-three"
You could write this, and it will work. It is a maintenance problem though. Suppose you add another target to target-two's dependency list. You would have to update target-three's dependency list as well. This rapidly becomes difficult and time consuming when you have many targets.

Fortunately, make(1) does not require you to list all the dependencies like this. make(1) can infer the dependencies you omit.

Edit Makefile so it looks like this:

#
# Sample Makefile
#

target-one:
        @echo "This is target-one"

target-two: target-one
        @echo "This is target-two"

target-three: target-two
        @echo "This is target-three"
When you run this, you should see the following;
% make target-three
This is target-one
This is target-two
This is target-three
make(1) has tried to run target-three. In doing so, it has seen that target-three depends on target-two. So make(1) stops processing target-three and tries to process target-two. Then it sees that target-two depends on target-one. make(1) examines target-one, sees that it has no dependencies, and runs it. After running target-one, make(1) can go back to target-two, which can then complete, before finally running target-three.

So make(1) has worked out that target-three has a dependency on target-one without you needing to say so. The dependency on target-two was sufficient.


Explicit and implicit dependencies

In the previous example there were two types of dependencies involved, explicit and implicit.

An explicit dependency is one that you write into the Makefile. For example, the Makefile explicitly specifies a dependency of target-three upon target-two.

An implicit dependency is one that make(1) works out for itself. target-three implicitly depends upon target-one, because of the explicit dependency on target-two.


Targets are filenames!

The real power of make(1) comes when you realize that targets can also be filenames[3].

When make(1) is processing a target, it looks to see if a file with the same name as the target exists. If it does, and none of the target's dependencies are newer than that file, then make(1) does not have to do anything.

This is because make(1)'s reason for being is to make files. The examples so far have worked (and will continue to work) because there have been no files in the same directory as the Makefile called target-one, target-two, and target-three.

You can put this to the test yourself. First, create a file called target-one (using touch(1)) and then tell make(1) to execute the target-one target.

% touch target-one
% make target-one
make: `target-one' is up to date.
As you can see, make(1) has seen that a file with the same name as the target exists in the current directory, and so refuses to run the body of the rule. Instead, it prints a message explaining that the file is up to date.
Note: touch(1) is a very simple command. You provide it a filename as a parameter. touch(1) sets the ``last modified'' time of the named file to be the current time. If the file does not exist then touch(1) creates it. Read the manual page for more information about touch(1).
This is a very important concept to grasp. make(1) will only run a rule for a target if;
    No file with the same name as the target exists in the current directory, or

    One or more of the files listed as dependencies do not exist, or were modified more recently than the target file.

That is probably a little confusing, and an example would help.

First of all, edit Makefile, and add the line @touch targetname to the end of each rule, replacing targetname with the name of that rule's target. Your Makefile should look like this;

#
# Sample Makefile
#

target-one:
        @echo "This is target-one"
        @touch target-one

target-two: target-one
        @echo "This is target-two"
        @touch target-two

target-three: target-two
        @echo "This is target-three"
        @touch target-three
Remove the file target-one if you have not done so. Now run make target-three and examine the output. Is it what you expected?

You should have seen the same output as the last make target-three. Specifically:

This is target-one
This is target-two
This is target-three
If you use ls(1) to examine the directory you should also see that three zero-length files have been created, corresponding to the names of the targets.

If you run make target-three again, make(1) will print:

make: `target-three' is up to date.
Notice how this is different from every other time you have run make target-three. Previously, the target-three file did not exist, so make(1) was forced to re-run the body of target-three and hope that the body would create the target-three file. Previously, of course, it did not, and now it does.
Important: If the commands in the body of your target do not create a file with the same name as the rule then make(1) will always have to run your rule when you ask it to. If your rule does create a file with the same name as the rule, make(1) will not run your rule if the file exists---unless dependencies are newer than the target.
You can put this to the test. As you have seen, make target-three will not do anything.

However, target-three depends on target-two. If make(1) believes that target-two is newer than target-three, make(1) will have to run the rule for target-three in order to update the target-three file.

In other words, if the target-two file was modified more recently than the target-three file then make(1) will need to re-run the body of target-three.

Use touch(1) to update the time of target-two, and then rerun the make(1) command.

% touch target-two
% make target-three
This is target-three
As promised, make(1) has had to rerun target-three because target-two was newer.

Remember that target-three has two dependencies. It has an explicit dependency on target-two (because target-two is listed among the dependencies) and an implicit dependency on target-one (because target-two depends on target-one).

Therefore, if you were to touch(1) target-one, and then make target-three, you would expect both target-two and target-three to be run by make(1).

Test that, with:

% touch target-one
% make target-three
This is target-two
This is target-three
Finally, what happens if the target-one file is removed altogether and you make target-three?

make(1) will have to rerun the target-one target in order to recreate the file. This file will, in turn, be newer than target-two, so target-two will need to be run, prompting a rerun of target-three.

And the proof:

% rm target-one
% make target-three
This is target-one
This is target-two
This is target-three

Variables in a Makefile

You can define variables in your Makefile in order to simplify targets and dependencies. Variables are names given to strings of text. Wherever you use that string in the Makefile you can substitute a reference to the variable.

Defining variables

Variables are defined using the syntax VARIABLE=value, which defines a new variable (called VARIABLE) with the value value. Use VARIABLE+=value to append value to the existing definition of the variable.

A space is inserted between the two entries. So, if you had the two lines;

VARIABLE=foo
VARIABLE+=bar
then VARIABLE would be set to "foo bar", not "foobar".

Finally, VARIABLE?=value will assign value to VARIABLE, but only if VARIABLE has not already been defined.

Any spaces between the =, += or ?= and the value are ignored. The following two variable declarations are identical.

VARIABLE=value
VARIABLE=   value
Variables can also be defined on the command line. For example, to set the variable VARIABLE to foo you could run make(1) like so;
% make VARIABLE=foo target-name
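
The following sketch exercises ?= and the command line together. The variable GREETING and the target show are invented for the example, and MAKE_CMD is an illustrative variable that defaults to make. It relies on the fact that make(1) imports environment variables as make(1) variables before it reads the Makefile.

```shell
#!/bin/sh
# Try out ?= with no prior value, with a command line value, and with a
# value inherited from the environment.
# GREETING and "show" are hypothetical names; MAKE_CMD defaults to "make".
MAKE_CMD=${MAKE_CMD:-make}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Make sure nothing is inherited from the caller's environment.
unset GREETING

printf 'GREETING?= hello\n\nshow:\n\t@echo ${GREETING}\n' > Makefile

default_out=$($MAKE_CMD show)                   # nothing set: ?= applies
cmdline_out=$($MAKE_CMD GREETING=goodbye show)  # command line value wins
env_out=$(GREETING=howdy $MAKE_CMD show)        # already set: ?= is skipped

printf '%s\n' "$default_out" "$cmdline_out" "$env_out"
```

Run as written, the three calls should print hello, goodbye and howdy in turn.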

Referring to variables

After you have defined a variable, reference it as ${VARIABLE}.

This Makefile,

#
# Variable test
#

FOO=  foo
FOO+= bar

target:
        @echo ${FOO}
produces this output;
% make
foo bar
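You can also reference variables using $(VARIABLE) (round brackets instead of curly ones). Omitting the brackets only works for variables whose names are a single character: make(1) reads $VARIABLE as ${V} followed by the text ARIABLE. I recommend using ${VARIABLE}.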
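
If you are unsure how your make(1) treats each form of reference, a short test settles it. In this sketch NAME and show are invented names, MAKE_CMD is an illustrative variable that defaults to make, and it is assumed that no variable called N is defined anywhere.

```shell
#!/bin/sh
# Compare ${NAME}, $(NAME) and $NAME inside a Makefile.
# NAME and "show" are hypothetical names; MAKE_CMD defaults to "make".
MAKE_CMD=${MAKE_CMD:-make}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Keep stray environment variables from influencing the result.
unset N NAME

printf 'NAME= world\n\n'             >  Makefile
printf 'show:\n'                     >> Makefile
printf '\t@echo "curly: ${NAME}"\n'  >> Makefile
printf '\t@echo "round: $(NAME)"\n'  >> Makefile
printf '\t@echo "bare: $NAME"\n'     >> Makefile

out=$($MAKE_CMD show)
echo "$out"
```

The first two lines should end in world. The third should end in AME, because make(1) takes only the N after the $ as the variable name and N is empty.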

A Makefile to install your web pages

That was a very brief introduction to make(1). There are many features that have not been covered, and this article has only skimmed the surface of the features that we have covered. However, it is sufficient to let you write a Makefile that you can use to install your web pages from the working directory to the staging area.

Rather than present you with the finished product, the Makefile will be built up piece by piece so you can see the stages it went through, and how they build upon one another.


The simplest possible site, just one page

We will start with the simplest possible web site, consisting of just one page.

In your work area, create a file called index.html[4]. This is the file that will be installed by make(1). The content of this file can be anything that you like, it is not important for this demonstration.

Now, consider what the Makefile will need to contain. Obviously, a target of some kind. Convention dictates that since this target will be used to install files it should be called install. This target will have no dependencies, and the rule for this target should copy index.html from the current directory in to the staging area.

This target will not create a file called install, so make(1) will always have to run the body of the target when you run make install.

Note: If you have followed the previous article then your work area is called ~/www/mywebsite/ and your staging area is called ~/www/mystage/. Adjust these examples if you have used different directory names.
The simplest Makefile could then look like this;
#
# Sample Makefile to install one file
#

install:
        cp index.html ../mystage/index.html
Test that:
% make install
cp index.html ../mystage/index.html
and verify that index.html has been copied to ../mystage. Notice that because the cp line did not start with an @ sign the line was echoed to the screen before make(1) executed it. This can be useful when you are writing and debugging your Makefile, to ensure that commands are being run when you expect, and with the options you expect.

There are many ways in which this Makefile can be improved.

First of all, the staging area is hard coded into the body of the install target. If it were made a variable it would be easier to change should the staging area ever change. It would also allow the path to the staging area to be entered on the command line if you ever wanted to test a new staging area.

So, what to call this variable? You might suppose that STAGE_DIR or STAGEDIR might be appropriate. These are good choices.

However, BSD heritage suggests that the directory to which the install target installs files is controlled by the DESTDIR variable, so I have chosen to use that for these examples. Add a line to your Makefile to define this variable, and alter the body of install to use it.

The new Makefile should look similar to this:

#
# Sample Makefile to install one file
#

DESTDIR?=  ../mystage

install:
        cp index.html ${DESTDIR}/index.html
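Note that DESTDIR has been defined using ?=, so a value that already exists when the Makefile is read (for example, one set in the environment) is kept. A value given on the command line takes precedence over the Makefile in either case. Also, see how the variable is used in the cp(1) command.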

As before, you can test this with:

% make install
cp index.html ../mystage/index.html
Notice that the echoed command includes the value of the variable, rather than the variable's name.

You can now override the destination directory (staging area) defined in the Makefile via a command line parameter. To install the file in to /var/tmp, you can do:

% make DESTDIR=/var/tmp install
cp index.html /var/tmp/index.html

A site with more than one page

Your current website (or the website you are planning on building) will almost certainly consist of more than one file. So how do you get the install target to handle multiple files?

A simple solution is just to replicate the cp line, one for each page that will be installed;

install:
          cp index.html ${DESTDIR}/index.html
          cp foo.html ${DESTDIR}/foo.html
          cp bar.html ${DESTDIR}/bar.html
          ...
A more elegant approach is to put the names of the files to install in a variable, and then use some shell programming to iterate over the list of names to install (a for loop) and install them one by one.

A for loop in the sh(1) scripting language looks like this;

for variable-name in list-of-items; do
    code that forms the body of the loop
done
Put the following lines into a file called test.sh
for var in val1 val2 val3; do
    echo $var
done
Now, process this file using /bin/sh;
% /bin/sh test.sh
val1
val2
val3
As you can see, the body of the loop has been executed three times, each time $var was set to the next item in the list.

Transferring this to a Makefile is relatively simple. You will need a variable that lists the names of the HTML files to install, and a loop in the install target that carries out the installation.

Edit the Makefile so it looks like this;

#
# Sample Makefile to install multiple files
#

HTML=  index.html about.html

DESTDIR?= ../mystage

install:
        for htmlfile in ${HTML}; do \
            cp $$htmlfile ${DESTDIR}/$$htmlfile; \
        done
Note: If you recall the discussion on defining variables in a Makefile, you will remember the += construct. You could use that in this Makefile, and replace the line
HTML=  index.html about.html
with the two lines
HTML=  index.html
HTML+= about.html
To make(1), these are identical.
There are two important points that this Makefile illustrates that have not been touched upon yet.

Firstly, this shows the use of two different variable types. HTML and DESTDIR are make(1) variables as described earlier. $$htmlfile is a sh(1) variable. If you look back to test.sh you will see that only one $ symbol was needed in the body of the loop. Why do you need to use two in the Makefile?

make(1) parses the Makefile before it runs sh(1). One of the things that make(1) has to do is expand all make(1) variable references within the Makefile. To make(1), $htmlfile would look like a reference to a make(1) variable (strictly, to a variable called h, followed by the text tmlfile), and make(1) would replace the reference with the value of that variable (likely to be the empty string, ``'').

However, if make(1) sees two $ symbols together, it replaces them with one, and then carries on. So $$htmlfile becomes $htmlfile by the time sh(1) sees it.

The use of lowercase for sh(1) variables and uppercase for make(1) variables is not required, but I find it a useful convention to follow which helps me avoid mistakes.
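
You can watch the difference between $ and $$ with a Makefile that runs the same loop both ways. This is a sketch: the target names double and single are made up, echo(1) stands in for the cp(1) command so that nothing is copied, MAKE_CMD is an illustrative variable that defaults to make, and it is assumed that no make(1) variable called h exists.

```shell
#!/bin/sh
# Run the same sh(1) loop from a Makefile with $$htmlfile and with $htmlfile.
# "double" and "single" are hypothetical targets; MAKE_CMD defaults to "make".
MAKE_CMD=${MAKE_CMD:-make}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Keep a stray environment variable from being picked up as ${h}.
unset h

# Single quotes stop this shell from touching the $ signs; they reach the
# Makefile exactly as written.
printf 'double:\n\t@for htmlfile in index.html about.html; do echo "copy $$htmlfile"; done\n\n' > Makefile
printf 'single:\n\t@for htmlfile in index.html about.html; do echo "copy $htmlfile"; done\n' >> Makefile

double_out=$($MAKE_CMD double)   # sh(1) sees $htmlfile and expands it
single_out=$($MAKE_CMD single)   # make(1) expands $h first, leaving "tmlfile"

echo "$double_out"
echo "---"
echo "$single_out"
```

With two $ symbols the loop prints each filename. With one, make(1) has already swallowed the $h, so sh(1) only ever sees the literal text tmlfile.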

Secondly, note the \ characters at the end of the first two lines of the loop.

When make(1) processes the Makefile, it treats each line in the rule as a separate command, and calls a different copy of sh(1) for each line.

Without the \ characters, therefore, sh(1) would be called three times in our example; the first time with for htmlfile in index.html about.html; do as the command, the second time with cp $htmlfile ../mystage/$htmlfile; as the command, and the third time with done as the command.

Because these are three different calls to sh(1), each one has no knowledge of the other two---this means the variables are not set and the loop does not run.

However, a \ at the end of each line instructs make(1) to treat the following line as a continuation of this line. Adding \ to all but the last line will force make(1) to treat these as one very long line to be executed by sh(1), so sh(1) sees the entire loop.
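
The effect of the separate copies of sh(1) is easy to demonstrate without a loop. In this sketch the targets separate and joined are made-up names, x is an ordinary sh(1) variable, and MAKE_CMD is an illustrative variable that defaults to make. It assumes make(1) is running in its default mode, one job at a time.

```shell
#!/bin/sh
# Show that each line of a rule body gets its own sh(1), unless the lines
# are joined with a trailing backslash.
# "separate" and "joined" are hypothetical targets; MAKE_CMD defaults to "make".
MAKE_CMD=${MAKE_CMD:-make}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Keep a stray environment variable from hiding the effect.
unset x

# Two separate lines: x is set in one shell and read in another.
printf 'separate:\n\t@x=1\n\t@echo "x is [$$x]"\n\n' > Makefile
# One logical line: \\ in the printf format writes a single backslash.
printf 'joined:\n\t@x=1; \\\n\techo "x is [$$x]"\n' >> Makefile

separate_out=$($MAKE_CMD separate)
joined_out=$($MAKE_CMD joined)

echo "separate: $separate_out"
echo "joined:   $joined_out"
```

The separate target should print empty brackets, because the shell that ran x=1 has already exited by the time echo runs. The joined target should print 1 between the brackets.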

As before, test this Makefile. First you will need to create about.html. Either copy in one of your files from your website (if you have one), or use touch(1) to create it. Then run the install target.

% touch about.html
% make install
You should see the entirety of the for command echoed to the screen, and then index.html and about.html will be copied to ../mystage. If you do not want the entire line to be echoed to the screen you can place an @ symbol immediately before the for. You do not need to place one before the cp or done because, as already discussed, make(1) will treat these three lines as one long line because of the \ characters, so only one @ is needed.

Remember you can still override the directory that files will be installed to by including the DESTDIR parameter on the command line;

% make DESTDIR=/var/tmp install
will install both files in to /var/tmp.

Making sure that the installed files are readable

There is little point in putting up files on your web site if no one else can read them.

It is possible that you will occasionally create files in your work area that are only readable by you (i.e., their permissions are 600, which appears as -rw------- in an ls -l listing). When these files are copied to the staging area they will retain these permissions.

This is wrong.

In fact, even if the files in your work area all have the correct permissions (typically 644, or -rw-r--r--, i.e., you can read and write to them, everyone else can read them) this is still wrong for files in the staging area. One of the reasons for enforcing the distinction between the work area and the staging area is that files in the staging area should not be editable. If a file needs to be changed it should be checked out of the repository in to the work area, changed, and then installed[5].

The solution is to alter the install target so that an explicit chmod(1) is performed on all installed files to set the permissions to read-only for everyone.

Alter the install target so that it looks like this;

install:
          for htmlfile in ${HTML}; do \
              cp $$htmlfile ${DESTDIR}/$$htmlfile; \
              chmod 444 ${DESTDIR}/$$htmlfile; \
          done
As you can see, the chmod(1) command will be run after the file has been installed. Notice that it still needs a trailing \ character to indicate that this line continues up until the final done.

As before, test this yourself.


What now?

We have not finished exploring the possibilities offered by make(1), but this article is in danger of turning in to a full size tutorial. The next article will show how to use make(1) to install whole directories of files, instead of just the one directory we have done so far. It will also explain how to define dependencies to convert images from the JPEG format to the GIF format before installing them.

Meanwhile, you should experiment more with make(1). Add some more files to the directory, add them to the HTML variable, and confirm that the install target still works.

You could also investigate the install(1) command, which has features from cp(1), mv(1), chown(1), and chmod(1). In particular, try replacing the cp(1) and chmod(1) commands in Makefile with one call to install(1).
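
As a starting point, here is a sketch of install(1) on its own, outside any Makefile. The work and stage directory names are placeholders for your own work and staging areas. The -c flag asks for a copy (on some systems this is already the default) and -m sets the permissions of the installed file. Check the manual page on your system before relying on either.

```shell
#!/bin/sh
# Copy a file and set its permissions in one step with install(1).
# "work" and "stage" are placeholder directories created just for this test.

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir work stage
echo '<html></html>' > work/index.html

# -c: copy rather than move; -m 444: make the copy read-only for everyone.
install -c -m 444 work/index.html stage/index.html

# The first ten characters of an ls -l line are the file type and permissions.
mode=$(ls -l stage/index.html | cut -c1-10)
echo "$mode"
```

The permissions shown for the installed copy should be -r--r--r--, which is what the separate cp(1) and chmod(1) calls produced between them.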

Remember that as long as you have committed your working changes to the Makefile, you can restore the Makefile at any time just by doing;

% rm Makefile
% cvs update Makefile
One last thing to try; make(1) variables are often used to store the paths to commands. That way, if you are using the Makefile on an OS where certain commands are stored in different places you only need to change the definition of the variable, rather than search and replace for each occurrence of the command name.

Try something like;

CP=     /bin/cp
CHMOD=  /bin/chmod

install:
        for htmlfile in ${HTML}; do \
            ${CP} -f $$htmlfile ${DESTDIR}/$$htmlfile; \
            ${CHMOD} 444 ${DESTDIR}/$$htmlfile; \
        done
and experiment. See if you can work out why the -f parameter to cp(1) is required...

Notes

[1] The ``base name'' is the part of the filename without the last part of the extension. With foo.c, foo is the base name, .c is the extension. With foo.tar.gz, foo.tar is the base name, .gz is the extension. 
[2] The error message is not very enlightening either, typically Need an operator
[3] In fact, this is why make(1) was written. The previous examples are a slightly unorthodox usage of make(1) as no files were created by the body of the targets. This can still be useful, as the install target will shortly demonstrate. 
[4] Or use the index.html created in the previous article. 
[5] This might seem a bit draconian, and there will be occasions where you want to get a change on to your live site as soon as possible, and then backport the changes to the repository copy of the site. You can still do this, but by making sure that files in the staging area are read-only, changing them requires a conscious effort to make the files read-write, and is therefore harder to do automatically. 

Nik Clayton, nik@freebsd.org