DæmonNews: News and views for the BSD community

October 2001

The Answer Man

Gary Kline <kline@thought.org>, David Leonard <leonard+answerman@mail.csee.uq.edu.au>, Dirk Myers <dirkm@teleport.com>

Who's Afraid of the CLI?

There are ways of solving problems that call for some serious C or C++ coding, or maybe just scripting with /bin/sh or perl; and then there are equally good solutions that use the strengths of the toolset included with Unix by default. In case you think solving a particular problem means having to write a script, we are going to show you how to use some of the power tools that come with your BSD operating system.

For our examples we're going to use sort, grep, uniq, awk, sed, and /bin/sh from the command line. We'll use ls, ps and mv as supporting players. Finally, we'll dip into perl as a command-line text processing utility.

Basic Plumbing

The most important thing to understand about the command line is the idea of a pipe. A pipe is simply an easy way to connect two programs together so that output from one program becomes input to another program. It's a simple, powerful concept.

Let's consider some simple commands that can be "fastened together" with pipes to produce some interesting results. The ls command lists the files in a directory. The grep command searches for text that matches a pattern, either in a file or on standard input. Both of these commands are useful without pipes, but they're even more powerful when they're connected.

The -l flag to ls produces long format listings. Among other things, these listings always show directories as entries beginning with a 'd':

  -r-xr-xr-x   1 root  wheel   30632 Aug 27 15:56 CCLEngine
  -r-xr-xr-x   1 root  wheel   28484 Aug 27 15:56 CrashReporter
  dr-xr-xr-x   3 root  wheel     264 May  1 15:17 MiniTerm.app
  -r-xr-xr-x   1 root  wheel   14940 Aug 27 15:54 atrun
  -r-xr-xr-x   1 root  wheel  142620 Aug 27 15:56 bootpd
  -r-sr-xr-x   1 root  wheel   24716 Aug 27 15:54 chkpasswd
  -r-xr-xr-x   1 root  wheel   14668 Aug 27 15:54 comsat
  -rwxr-xr-x   1 root  wheel    6903 Apr 26 13:49 create_nidb
  -rwxr-xr-x   1 root  wheel  125792 Aug 27 15:55 dnskeygen
  drwxr-xr-x   3 root  wheel     264 Feb 23  2001 emacs
  -r-xr-xr-x   1 root  wheel   14580 Aug 27 15:55 fingerd
  -r-xr-xr-x   1 root  wheel   80200 Aug 27 15:54 ftpd
  -r-xr-xr-x   1 root  wheel   17984 Aug 27 15:54 getNAME
  -r-xr-xr-x   1 root  wheel   24624 Aug 27 15:54 getty
  -r-xr-xr-x   1 root  wheel   22464 Mar  1 07:27 hdicompressd
  drwxr-xr-x  38 root  wheel    1248 May  8 17:13 httpd
  -r-xr-xr-x   1 root  wheel   24476 Aug 27 15:54 identd
  -rwxr-xr-x   1 root  wheel  101032 Feb 16  2001 kadmind
  -rwxr-xr-x   1 root  wheel   95044 Feb 16  2001 kadmind4
  ...

You can use the grep command and a pipe to produce a listing which shows only directories.

  % /bin/ls -l | grep '^d'

This command runs ls -l and connects its output to the standard input of the grep command. The grep command prints each line that begins with a 'd'.

  dr-xr-xr-x   3 root  wheel     264 May  1 15:17 MiniTerm.app
  drwxr-xr-x   3 root  wheel     264 Feb 23  2001 emacs
  drwxr-xr-x  38 root  wheel    1248 May  8 17:13 httpd

(Obviously, you can season the ls command with other flags and produce other results. ls -lt and ls -la come to mind.)
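If you'd like to try the directory filter without depending on what happens to be in your current directory, here is a minimal sketch that builds a scratch directory first. The names alpha, beta and notes.txt are made up for the illustration, and grep -c (count matching lines) is added only so the result is easy to check.

```shell
# Build a throwaway directory holding two subdirectories and one plain file.
# $(...) captures a command's output, here the path that mktemp created.
demo=$(mktemp -d)
mkdir "$demo/alpha" "$demo/beta"
touch "$demo/notes.txt"

# Long listing piped to grep: keep only lines whose first character is 'd'.
ls -l "$demo" | grep '^d'

# The same pipeline, with grep -c counting the matching lines instead.
count=$(ls -l "$demo" | grep -c '^d')
echo "directories found: $count"

rm -rf "$demo"
```

The plain file and the "total" line that ls -l prints first are both dropped, because neither starts with a 'd'.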

Match, Trim, Sort

Whenever you need to sort through a program's output for lines that match a pattern, an easy way to do it is to use a pipe to grep. For example,

  % from | grep joe 

will let you know if mail from joe@hotnewthing.com is waiting in your queue.

  From joe@hotnewthing.com  Wed Aug 29 22:07:19 2001

Similarly,

  % ps -axuw | grep sendmail 

  root    225  12.0  0.1     1340    288 std  R+     0:00.05 grep sendmail
  root    220   0.0  0.2     1728    336  ??  Ss     0:00.06 sendmail -bd -q30

will tell you if sendmail is active.

The following tweak may make the above output easier to read by getting rid of the grep process that sometimes, not always, clutters up the output.

  % ps -axuw | grep sendmail | grep -v grep
  root    220   0.0  0.2     1728    336  ??  Ss     0:00.07 sendmail -bd -q30

This pipeline simply finds all the running processes which match 'sendmail', then takes that list of processes and uses the -v option to invert the match and show only lines that don't contain 'grep'.
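Since ps output differs from machine to machine, the sketch below replays the two lines from the listing above through the same filters. The fake_ps function is a made-up stand-in for ps -axuw, not a real command.

```shell
# Canned stand-in for "ps -axuw" output, copied from the listing above,
# so the result is the same on every machine.
fake_ps() {
cat <<'EOF'
root    225  12.0  0.1     1340    288 std  R+     0:00.05 grep sendmail
root    220   0.0  0.2     1728    336  ??  Ss     0:00.06 sendmail -bd -q30
EOF
}

# The first grep keeps lines mentioning sendmail (both of them);
# the second grep drops the line that belongs to the grep process itself.
matches=$(fake_ps | grep sendmail | grep -v grep)
echo "$matches"
```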

You can trim the output even further to find exactly the data you're interested in. The awk language provides easy ways to manipulate the output from a pipeline. For example, if a line of output is separated into columns, awk will assign each column in a line to a corresponding variable -- the first column in $1, the second column in $2, and so on.

So, to find the process ID number of sendmail, we add a simple awk command to the end of the pipeline.

  %  ps -axuw |  grep sendmail | grep -v grep | awk '{print $2}'
  220

The awk command tells awk to print the second column of each line of input. Since the process ID number is in the second column, the pipeline above gives us the process ID# of sendmail. [1]
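To see the column splitting on its own, here is a small sketch that feeds awk a single canned line (the sendmail line from the listing above) rather than live ps output.

```shell
# One canned line of "ps -axuw" output, taken from the listing above.
line='root    220   0.0  0.2     1728    336  ??  Ss     0:00.06 sendmail -bd -q30'

# awk splits each input line on runs of whitespace:
# $1 is the first column (the owner), $2 the second (the process ID).
pid=$(echo "$line" | awk '{print $2}')
owner=$(echo "$line" | awk '{print $1}')
echo "pid=$pid owner=$owner"
```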

Likewise, since the owner is in the first column, this

  % ps -axuw |  grep sendmail | grep -v grep | awk '{print $1}'

will tell you who owns the process. Of course, you know the owner of the sendmail daemon is root. But who owns all the instances of vi or emacs among the users on the system you are on? Substitute vi or emacs for sendmail and you'll know. Or httpd to learn who owns the http daemon.

  % ps -axuw | grep vi | grep -v grep | awk '{print $1}' 
  dirkm
  dirkm
  dirkm
  root
  dirkm
  dirkm
  dirkm
  dirkm
  dirkm
  root
  dirkm
  dirkm

The drawback with the command above is that you get repeated names if someone owns more than one process running the same program. There's an easy way to delete the duplicates. The uniq program compares adjacent lines, and only keeps one copy of lines that are identical.

  % ps -axuw | grep vi | grep -v grep | awk '{print $1}' | uniq        
  dirkm
  root
  dirkm
  root
  dirkm

That's almost what we need, but it isn't quite enough. Since the ps program sorts its output by process ID number, it's quite likely that the processes owned by a given user won't show up together. To solve this problem, we use the sort command to sort the output from awk. That way, every instance of a specific username will be together, and uniq can do its thing.

  % ps -axuw | grep vi | grep -v grep | awk '{print $1}' | sort | uniq 
  dirkm
  root
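The difference between uniq alone and sort followed by uniq is easy to reproduce with canned data. In this sketch the owners function is a made-up helper that prints the twelve names in the order shown earlier; uniq -c, which prefixes each name with the number of lines it replaced, is an extra that the pipelines above don't use.

```shell
# Canned owner names in the order the earlier pipeline printed them.
owners() {
  printf '%s\n' dirkm dirkm dirkm root dirkm dirkm dirkm dirkm dirkm root dirkm dirkm
}

# uniq alone only collapses neighbouring duplicates, so names still repeat.
unsorted=$(owners | uniq | wc -l | tr -d ' ')

# Sorting first brings identical names together, so each appears once.
sorted=$(owners | sort | uniq)

echo "lines from uniq alone: $unsorted"
echo "$sorted"

# uniq -c also reports how many lines were collapsed into each name.
owners | sort | uniq -c
```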

Last in this series of simple command line glue-togethers, let's use du and sort. For the sake of this example, we assume that you have a development subdirectory off your home directory. If your name is John Q. Smith and your login is jqs, the path to your development directory might be

  /home/jqs/devel

You've been doing a lot of program development and documentation in your ~/devel tree, and lately you've noticed that it must hold lots of junk files (objects, binary and other data files, and probably miscellaneous core files).

Being a rational and common-sense type, you would like to find the largest subdirectories under ~/devel quickly so you know where the junk files are. The following command line hackery will find the most likely directories quickly.

  % du -k /home/jqs/devel | sort -rn

This calculates the sizes of the directories under /home/jqs/devel in kilobytes and sorts the directories in reverse numerical order, so that the largest directories are at the top of the listing. (We ask for plain kilobyte counts rather than the human-readable sizes of du -h because sort -n compares only the leading number, and would rank 900K above 2.0G.)
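Here is a sketch of the same idea on a scratch tree, using du -k so that every size is a plain kilobyte count that sort -n can compare. The directory names big and small and the file names core and note are invented for the example.

```shell
# Scratch tree: one directory with a 200 KB file, one with a tiny file.
tree=$(mktemp -d)
mkdir "$tree/big" "$tree/small"
dd if=/dev/urandom of="$tree/big/core" bs=1024 count=200 2>/dev/null
echo hello > "$tree/small/note"

# du -k prints one "size path" line per directory, sizes in kilobytes;
# sort -rn orders those lines by the leading number, largest first.
report=$(du -k "$tree" | sort -rn)
echo "$report"

# The first line is the tree itself; the second is its largest subdirectory.
largest=$(echo "$report" | sed -n '2p' | awk '{print $2}')
echo "largest subdirectory: $largest"

rm -rf "$tree"
```

The exact kilobyte figures depend on the filesystem, so only the ordering is worth relying on.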

Text Processing on the Command Line

For a more involved example of what command line programming can help you achieve, let's say that in /home/jqs/devel/sh.scripts are many Bourne-flavor shell scripts that you have written over the years.

Further, these scripts were quickly written. You used a simple colon (":") instead of the "proper" "#!/bin/sh" to tell the system that the files were meant to be executable. But now that you're cleaning up your system and putting things in order, you've decided to replace the initial ":" byte with "#!/bin/sh".

There are obviously many ways to do this, but here is a first shot. The steps required for this substitution are fairly clear:

  1. Identify every file that starts with a colon

  2. Get the list of these files.

  3. Make a backup of each file before changing it.

  4. Change the : to "#!/bin/sh".

The following will list the files that have a line consisting of a single colon:

  % grep -l "^:$" *

Now we need a means of feeding this command line output to a substitution program like sed or perl. Prior experience tells us that two grave marks (aka "backticks") do this. The shell first runs the command inside the backticks, and then substitutes the output of that command into the command line in place of the backticks. [2]

   % `grep -l "^:$" *`
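To watch the substitution happen, it helps to put a harmless command such as echo in front of the backticks. The sketch below does that in a scratch directory; the file names first, second and third are made up.

```shell
# Scratch directory: two scripts start with a lone colon, one does not.
work=$(mktemp -d)
cd "$work"
printf ':\necho one\n' > first
printf ':\necho two\n' > second
printf '#!/bin/sh\necho three\n' > third

# The shell runs grep first, then pastes the file names it printed
# into the echo command line in place of the backticks.
echo files to fix: `grep -l '^:$' *`

# The same substitution can also be captured in a variable.
found=`grep -l '^:$' *`

cd - >/dev/null
rm -rf "$work"
```

Because the shell splits the substituted text on whitespace, this only behaves well when the file names contain no spaces.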

The syntax for a simple substitution using sed is

  sed 's/oldpattern/newtext/'

but, since we want to replace a ":" with "#!/bin/sh" we need to use other pattern delimiters than the slashes. The sed command interprets the first character after the 's' as the delimiter, so we can pick any character we want. Since neither the pattern nor the new text use the percent-sign, that's as good a character to use as any. Our sed command becomes:

  % sed 's%:%#!/bin/sh%'

Pasting on the grep, we have:

  % sed 's%:%#!/bin/sh%' `grep -l "^:$" *`

Running the above command in the ~/devel/sh.scripts directory, we find that it does indeed work. Since sed writes its changes to standard output (to the screen), we see that the colons have been replaced by "#!/bin/sh" in each file. However, we may also discover a problem with this command. The pattern isn't tied to any particular line, so the first colon on every line that contains one is changed to "#!/bin/sh". That's what we want if the line is the first line in the file, but we need to guarantee that no other lines are changed.
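A one-line experiment shows how far an unanchored pattern reaches. The PATH assignment below is a hypothetical line from one of the scripts; anchoring the pattern with ^ and $ is one possible tightening, shown only for comparison.

```shell
# A line from a hypothetical script that merely contains a colon.
line='PATH=/bin:/usr/bin'

# Unanchored: the first colon anywhere on the line is replaced.
loose=$(echo "$line" | sed 's%:%#!/bin/sh%')

# Anchored with ^ and $: only a line that is exactly ":" would match,
# so this line passes through untouched.
strict=$(echo "$line" | sed 's%^:$%#!/bin/sh%')

echo "$loose"
echo "$strict"
```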

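With sed, this is simple to do. Sed lets you put an address, either a single line number or a range of lines, in front of each command. For this case, since we only want to run the command on the first line, we simply add a 1 in front of the command, and run the command again. One caveat: sed numbers lines cumulatively across all the files named on its command line, so when several files are listed the address 1 matches only the first line of the first file. To reach the first line of every file, sed has to be run once per file.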

  % sed '1s%:%#!/bin/sh%' `grep -l "^:$" *`
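One thing worth checking before relying on the 1 address with several files is how sed counts lines. The sketch below assumes standard sed behaviour, where line numbers keep counting from one file into the next; the scratch files a and b are made up, and the pattern is anchored only to keep the example tidy.

```shell
# Two scratch scripts that each start with a lone colon.
work=$(mktemp -d)
printf ':\necho a\n' > "$work/a"
printf ':\necho b\n' > "$work/b"

# One sed run over both files: line numbers keep counting across files,
# so address 1 matches only the first line of the first file.
together=$(sed '1s%^:$%#!/bin/sh%' "$work/a" "$work/b" | grep -c '^#!/bin/sh$')

# One sed run per file: every file gets its own line 1.
separate=$(for f in "$work/a" "$work/b"; do sed '1s%^:$%#!/bin/sh%' "$f"; done | grep -c '^#!/bin/sh$')

echo "one run: $together, per-file runs: $separate"
rm -rf "$work"
```

Running sed once per file is the approach that treats every file's first line alike.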

At this point, we could write a simple shell script that iterates over the output of the grep, and uses sed on each file:

  for i in `grep -l '^:$' *`
  do
    cp "${i}" "${i}.bak"
    sed '1s%:%#!/bin/sh%' < "${i}.bak" > "${i}"
  done

The interesting part of this little script happens on the fourth line. The shell provides the backup file to sed as input (the '< "${i}.bak"' part of the line) and writes the output from sed to the original file (the '> "${i}"' part of the line).

This works, but it's a lot of typing. However, perl provides convenient command line switches that can automatically update a file while saving a backup copy. Even better, the perl syntax for doing the substitution matches the sed syntax. So, we'll switch to perl.

Because the grep command is in backticks, it will feed a series of filenames to perl. We'll use the -p flag to tell perl to loop over the lines in each file and print each one after processing it. We want to change the file in place, so we'll use the -i flag for that, and add a .bak to the end of the flag to tell perl to save backups in files that end with ".bak". The last thing is the -e that tells perl to take the program from the command line rather than from a script file.

When we add these flags, we have a perl command line that, simply enough, reads:

  perl -p -i.bak -e 's%:%#!/bin/sh%' `grep -l "^:$" *`

There's a problem here, though (the first sed command above had the same problem). The grep command matches any file that has a line consisting of a single colon, but the perl substitution replaces the first colon on every line that contains one. That's what we want if the line is the first line in the file, but how can we guarantee that a line somewhere else in the file isn't inadvertently changed?

What we need to do is add a little bit of logic to the perl expression so that the substitution is only run against the first line of the file. For this situation, it would be ideal if we could test whether the command is processing the first line in the file. Perl doesn't have an easy way to do that in this situation. However, perl does provide the keyword eof, which is true whenever perl is processing the last line in a file. So, we solve the problem by approaching it from the other direction. We set up a condition that's true for the first line in the first file. After that line, we make the condition false. When perl tells us that it's on the last line of the file we make the condition true again -- so that the condition is true for the first line of the next file.

We create this bit of logic:

  if (!$a) { s%^:$%#!/bin/sh% }
  $a++; 
  if (eof) { $a = 0 }

First, perl checks to see if $a is false. In perl, 0 is false, and any other number is true, so if $a equals 0, then perl runs the substitution. (Since perl treats a variable it has never seen before as 0, $a will be properly false for the first line of the first file.) The ^ and $ anchors restrict the substitution to a line that consists of nothing but a colon. Perl then adds one to $a. Finally, if the line being processed is the last line in the file, perl sets $a to 0.

The final, as-safe-as-we-know-how-to-make-it one liner is [3]

  % perl -pi.bak -e 'if(!$a){ s%^:$%#!/bin/sh%};$a++;if(eof){$a=0}' `grep -l "^:$" *`

A few seconds after we execute this command in the ~/devel/sh.scripts directory, all 51 files have been transformed; the originals are saved in files with ".bak" appended to their names. These can easily be removed whenever you wish.

As a final touch, you can rename the shell scripts in ~/devel/sh.scripts that do not end in .sh so that they have that suffix. This is an almost trivial command line one-liner.

  % for f in `ls | grep -v '\.sh$'`; do mv "${f}" "${f}.sh" ; done
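A quick rehearsal in a scratch directory shows what the rename loop does. This sketch uses the pattern '\.sh$', which matches only names ending in a literal ".sh", and it assumes file names without spaces; backup, cleanup.sh and rotate are made-up names.

```shell
# Scratch directory with one script already named properly and two that are not.
work=$(mktemp -d)
cd "$work"
touch backup cleanup.sh rotate

# grep -v keeps every name that does not end in ".sh";
# the loop then appends the suffix to each of those names.
for f in `ls | grep -v '\.sh$'`; do mv "${f}" "${f}.sh"; done

result=$(ls)
echo "$result"

cd - >/dev/null
rm -rf "$work"
```

If the .bak files from the previous step are still around, remove them first, or they'll be renamed too.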

Final Words

As usual, we've only shown you a small sampling of the utilities available on the command line. Still, you can get an amazing amount of work done with the commands you use every day combined with simple pipelines, the looping the shell provides, and a few special-purpose utilities like sed. For more complicated processing, call on awk or perl.

1. Of course, we're only doing things this way for demonstration purposes. Sendmail saves its process id number in a file, typically /var/run/sendmail.pid. It's far more efficient to simply

  % head -1 /var/run/sendmail.pid

2. Almost like a pipe in reverse. One difference is that, in a pipeline, all the commands in the pipeline are started at once. The data moves through the pipeline piece by piece. With backticks, the shell keeps track of the entire output before passing the output to the next command. This is much less efficient. It doesn't matter much for small commands, but using backticks for a command that will generate a large amount of data is a bad idea.

3. In fact, perl does this so efficiently that the grep statement isn't actually necessary. It does a great job of weeding out files that we couldn't possibly be interested in, but every file in the directory is processed once with grep, and then again with perl. Instead, if we don't mind the possibility of getting backups of files that don't have any chance of changing, we could just run:

  % perl -pi.bak -e 'if(!$a){ s%^:$%#!/bin/sh%};$a++;if(eof){$a=0}' *

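Or, even the slightly shorter (but somewhat more difficult to explain) version below. Note that it only resets $a on the second and later lines of a file, so the file that follows a one-line file will not have its first line changed: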

  % perl -pi.bak -e 'if($a++){$a=0 if eof}else{s%^:$%#!/bin/sh%}' *

About the Authors

Gary Kline has been porting code since the late 1970's. When he isn't hacking code, he's hacking prose or pretend poetry, or listening to jazz radio and slurping down espresso.

For four years he has been writing the software equivalent of a mind-machine, dubbed Muuz, and has already released some alpha code for FreeBSD. Check the FreeBSD ports tree if you are interested. A new release is due in the first quarter of the new century...with luck!

His most recent adventures include an ISDL link to the net, including the thrills of learning about the Domain Name System and network and mail administration.


David Leonard is a PhD student in the Department of Computer Science and Electrical Engineering at the University of Queensland, Brisbane, Australia.

His area of research is QoS-adaptive component software architectures, and in his spare time he is a developer for the OpenBSD project. That said, David enjoys living the quiet life with his wife, Kylie, and cat, Mu. He especially enjoys frequenting Moreton Bay's many fabulous places to eat. Mmmmm!


Dirk Myers does things with words, perl, and Unix.





Author maintains all copyrights on this article.
Images and layout Copyright © 1998-2004 Dæmon News. All Rights Reserved.