11 April 2007

Software Tools

Shortly after I started working for Gary at Lupfer & Long, Peter lent me a copy of Software Tools, by Brian W. Kernighan and P.J. Plauger. Published in 1976, the book remains a classic text: beginning with programming in the small, the authors work up to the implementation of an automatic programming language translator—all in the span of fewer than 350 pages. Along the way, they illustrate the philosophy of the Unix operating system from the viewpoint of the applications programmer.

Granted, today's readers may find some of the material quaint, and be baffled by the absence of object-oriented concepts. The book was written at a time when structured programming and top-down design were key concepts of the curriculum, when programmers had to be cautioned against goto statements. (Dijkstra's seminal letter had been published only eight years before; it wasn't until decades later that the industry solved the goto-problem by devising widely-accepted languages that lacked the statement.) The book's power is that it accomplishes so much with nothing more than simple data structures, good algorithm design, and careful separation of function.

Kernighan and Plauger use a vendor-neutral language dubbed Ratfor (short for Rational Fortran) for their code examples. Influenced by the then-new language C (p. 318), Ratfor is a cleaned-up version of the various Fortran dialects of the era. In its lack of line-ending punctuation, it bears a slight resemblance to Python. Here is the word-counting program from page 15 (extracted from the source code, still online, astonishingly):

# wordcount - count words in standard input
character getc
character c
integer inword, wc

wc = 0
inword = NO
while (getc(c) ¬= EOF)
if (c == BLANK | c == NEWLINE | c == TAB)
inword = NO
else if (inword == NO) {
inword = YES
wc = wc + 1
call putdec(wc, 1)
call putc(NEWLINE)

The book introduces the concept of programs as filters—again, something familiar to us but rather revolutionary at the time—very early, in Chapter 2, with examples like entab, which replaces strings of blanks with tab characters; translit; and crypt; as well as other analogues of Unix command shell utilities. The shell syntax for a pipeline caps off this chapter. And from there, the material ramifies. The full table of contents:

  • Preface
  • Introduction
  • 1. Getting Started
  • 2. Filters
  • 3. Files
  • 4. Sorting
  • 5. Text Patterns
  • 6. Editing
  • 7. Formatting
  • 8. Macro Processing
  • 9. A Ratfor-Fortran Translator
  • Epilogue
  • Appendix: Primitives and Symbolic Constants
  • Index of First Lines
  • Index

Those of us who subscribe to the "Code is Poetry" philosophy appreciate the felicitously-named "Index of First Lines." In this case, since the first line of every program is a comment describing its function, the index is a synopsis of all the code in the book.

For modern audiences, the book functions not so much as a general reference as it does as a guide to thinking about how to approach a programming project. (And yet, for those of us today who rely on class libraries to do our sorting for us, Chapter 4, with its functioning implementations of Bubble Sort, Shell Sort, and Quicksort, may be the only sorting reference we ever need.) Everything is explained in a clear, simple, no-hype style, one that infuses the several titles that Kernighan has co-authored. Here's an example from very early in the book (pp. 14-15), dealing with the chewy problem of static analysis and testing:

How should we test linecount to make sure it really works? When bugs occur, they usually arise at the "boundaries" or extremes of program operation. These are the "exceptions that prove the rule." Here the boundaries are fairly simple: a file with no lines and a file with one line. If the code handles these cases correctly, and the general case of a file with more than one line, it is probable that it will handle all inputs properly.

So we feed linecount an empty file. The while test fails the first time and the body is never obeyed. No lines are counted when none are input. Fine.

If we feed linecount one line, the while is satisfied for every character on the line; the if is satisfied when the NEWLINE is seen, and the line is counted. Then the test is repeated and the while loop exits. Again fine.

A multi-line file. Same behavior as for one line, only now we observe that after each line, the program ends up at the test part of the while—the proper place to begin handling the next line, or EOF. The program checks out.

This may seem like excruciating detail for such a simple program, but it's not. There are common coding blunders which could have caused any one of those three tests to fail, sometimes even while the other two tests succeed. You should learn to think of boundary tests as you code each piece of a program—try them mentally as you write and then physically on the finished product. In practice, the tests go much quicker than we can talk about them and cost but a little additional effort. It is effort that is well repaid.

The intended audience for this book was originally undergraduates taking a second course in CS, but anyone who has moved beyond Logo or Excel macros would benefit from the book, at least its early chapters. And all of us can use a gentle reminder about boundary conditions.

The volume is put together without distracting book designer tricks to pad out the page count: goofy icons, space-wasting figures. It's nothing but text and code, with exercises and bibliographic notes at the end of each chapter. It probably helps that the authors set the text themselves using a PDP-11/45-driven phototypesetter.

A version of this book with Pascal as the coding language appeared in 1981. The newer book perhaps makes it easier to get up and running with the code examples, but it lacks that nifty Ratfor translator. If you've ever thought that it would be cool to build TECO from scratch, pick up a copy of this book.

No comments: