17 April 2008

Fun with 88's: Part 1

As programming tools have evolved, each step has let me, as an application developer, operate at a higher level of abstraction. A clickable onscreen button is a lot easier to work with than its corresponding region in video memory (my first hello-world assignment in C consisted of poking the letter "A" into a particular location on the 24x80 screen); an editable, sortable grid of data, bound to a database table and hosted in a web browser window, is in turn a huge advance over the HTML forms and tables it is replacing.

The workhorse business data processing languages of the 1960s and 1970s (COBOL, PL/I, FORTRAN to some extent) offered their own high-level abstractions. From today's vantage point, most of these technical advances appear quaint, crude, or worse. But some, like COBOL's named conditions, also known as 88-levels, provided a tidy solution in code to common processing problems, one that was not replicated by subsequent mainstream languages. It's easily my favorite feature of COBOL.

In this and subsequent posts, I'll take a little trip down memory lane to describe named conditions and how to use them. I'm a little bit rusty: when I was last actively using COBOL, the predominant standard was COBOL-85. That standard allowed lowercase keywords and user-defined names, but COBOL just seems more COBOL-y in uppercase.

In a COBOL program, procedural code and data specs are strictly segregated into the PROCEDURE DIVISION and the DATA DIVISION. Local variables are described in the WORKING-STORAGE SECTION of the DATA DIVISION.

To me, the fundamental unit of storage in COBOL is the 01-level group item, which corresponds to a C struct: any number of elementary items arranged in a hierarchy. Within the hierarchy, a prefixing level number indicates what nests where; a PICTURE clause and a USAGE clause provide most of the physical typing information (how many bytes, what kind of data it can hold). An 80-column punchcard that holds name and address information might be represented in memory like this:



DATA DIVISION.
WORKING-STORAGE SECTION.
1 CARD-IMAGE.
3 NAME-AND-ADDRESS.
5 NAME.
7 FIRST-NAME PICTURE IS XXXXXXXXXX USAGE IS DISPLAY.
7 LAST-NAME PICTURE IS X(10) USAGE IS DISPLAY.
5 FILLER PICTURE IS XXX USAGE IS DISPLAY.
5 ADDRESS.
7 ADDRESS-LINE-1 PICTURE IS X(15) USAGE IS DISPLAY.
7 ADDRESS-LINE-2 PICTURE IS X(15) USAGE IS DISPLAY.
7 CITY PICTURE IS X(10) USAGE IS DISPLAY.
7 STATE PICTURE IS XX USAGE IS DISPLAY.
7 FILLER PICTURE IS X(8) USAGE IS DISPLAY.
7 ZIP-CODE PICTURE IS 99999 USAGE IS DISPLAY.
3 FILLER PICTURE IS XX USAGE IS DISPLAY.



Well, this first example demonstrates how COBOL got its reputation for being excessively verbose and hard to understand. So the first point to be made is that no practitioner would actually code it this way. COBOL syntax defines some noise words like IS and has lots of synonyms and shortcuts, such as PIC for PICTURE. The USAGE IS DISPLAY clause (which specifies character data, stored one character per byte) is the default, so it can be disposed of altogether. You can code vendor-specific variations on the USAGE IS COMPUTATIONAL clause to specify binary integer or floating-point data; however, most arithmetic is done with fixed-point decimal data, so there's rarely a need for floating point.
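
To make the point about decimal arithmetic concrete, here is a small sketch in C, the language the post already uses for comparisons. It assumes money is held as a whole number of cents, a scaled-integer stand-in for a fixed-point decimal item such as a hypothetical PIC 9(5)V99 field (V marks an implied decimal point). The helper names are made up for illustration; COBOL compilers do not necessarily work this way internally. Adding ten dimes stays exact with scaled integers, while the same sum in binary floating point does not come out to exactly 1.0.

```c
#include <stdio.h>

/* Hypothetical stand-in for a fixed-point decimal amount:
   the value is a whole number of cents, so 0.10 is stored exactly as 10. */
typedef long cents_t;

/* Add the same amount n times using scaled-integer (fixed-point) arithmetic. */
static cents_t sum_cents(cents_t amount, int n)
{
    cents_t total = 0;
    for (int i = 0; i < n; i++)
        total += amount;
    return total;
}

/* The same loop in binary floating point, where 0.10 has no exact representation. */
static double sum_double(double amount, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += amount;
    return total;
}

/* Print a non-negative cents value as dollars and cents, e.g. 100 -> "1.00". */
static void print_cents(cents_t c)
{
    printf("%ld.%02ld\n", c / 100, c % 100);
}
```

For example, print_cents(sum_cents(10, 10)) prints 1.00, whereas sum_double(0.10, 10) == 1.0 is false because each 0.10 is rounded to the nearest binary fraction and the errors accumulate.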

The PICTURE clause is an early attempt at coding-by-example. Ten X's mean ten bytes of alphanumeric data; five 9's mean five bytes of numeric data (one digit per byte) that can be used in arithmetic. A repetition factor in parentheses means what you'd expect: X(10) is the same as ten X's. There are lots of other fancy symbols that can be used in PICTURE clauses, to specify a decimal point or to automatically insert commas and currency signs; it's not unlike a printf() format string.
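
The coding-by-example idea is simple enough to mimic. The sketch below is a hypothetical C helper, not anything a COBOL compiler exposes: it computes how many bytes a USAGE DISPLAY picture occupies, handling only the X and 9 symbols and the parenthesized repetition factor. It deliberately rejects editing symbols (V, Z, commas, currency signs) rather than guessing at their rules.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: storage size in bytes of a simple USAGE DISPLAY
   PICTURE string made only of X and 9 symbols, each optionally followed by
   a repetition factor in parentheses, e.g. "X(10)", "999", "X(3)9(2)".
   Each X or 9 occupies one byte. Returns -1 for anything it does not model. */
static long pic_length(const char *pic)
{
    long total = 0;
    const char *p = pic;

    while (*p != '\0') {
        char sym = (char)toupper((unsigned char)*p);
        if (sym != 'X' && sym != '9')
            return -1;              /* editing symbols are not modelled here */
        p++;

        long count = 1;
        if (*p == '(') {            /* repetition factor: X(10) means ten X's */
            char *end;
            count = strtol(p + 1, &end, 10);
            if (end == p + 1 || *end != ')' || count <= 0)
                return -1;          /* malformed or non-positive factor */
            p = end + 1;
        }
        total += count;
    }
    return total;
}
```

Both spellings used for the name fields in the first example agree: pic_length("XXXXXXXXXX") and pic_length("X(10)") each return 10, and pic_length("99999") returns 5.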

Group items in the hierarchy can always be treated like an alphanumeric string of their component characters. So ADDRESS can be used anywhere you'd want to use the 55 characters that comprise it.
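
Following the comparison to a C struct, the layout of CARD-IMAGE can be mirrored with nested structs of char arrays. This is an analogy under stated assumptions: the member names are hypothetical, C has no anonymous array members so each FILLER gets a made-up name, and the size checks rely on char arrays getting no alignment padding, which holds on common compilers but is not promised by the C standard. The address_as_text helper shows the group-as-string idea by copying the whole ADDRESS group out as one 55-character string.

```c
#include <string.h>

/* Illustrative C mirror of the CARD-IMAGE layout. Fields hold space-padded
   characters with no terminating NUL, like PIC X / PIC 9 USAGE DISPLAY items. */
struct name_part {
    char first_name[10];              /* FIRST-NAME     PIC X(10) */
    char last_name[10];               /* LAST-NAME      PIC X(10) */
};

struct address_part {
    char address_line_1[15];          /* ADDRESS-LINE-1 PIC X(15) */
    char address_line_2[15];          /* ADDRESS-LINE-2 PIC X(15) */
    char city[10];                    /* CITY           PIC X(10) */
    char state[2];                    /* STATE          PIC XX    */
    char filler_1[8];                 /* FILLER         PIC X(8)  */
    char zip_code[5];                 /* ZIP-CODE       PIC 99999 */
};

struct card_image {                   /* 1 CARD-IMAGE */
    struct {
        struct name_part name;        /* 5 NAME */
        char filler_1[3];             /* 5 FILLER PIC XXX */
        struct address_part address;  /* 5 ADDRESS */
    } name_and_address;               /* 3 NAME-AND-ADDRESS */
    char filler_2[2];                 /* 3 FILLER PIC XX */
};

/* Compile-time checks that no padding crept in on this compiler. */
_Static_assert(sizeof(struct address_part) == 55, "ADDRESS is 55 bytes");
_Static_assert(sizeof(struct card_image) == 80, "CARD-IMAGE is one 80-column card");

/* Treat the ADDRESS group as one plain string of its component characters.
   out must have room for 55 characters plus a terminating NUL. */
static void address_as_text(const struct card_image *card, char out[56])
{
    size_t n = sizeof card->name_and_address.address;
    memcpy(out, &card->name_and_address.address, n);
    out[n] = '\0';
}
```

Because the fields sit back to back, CITY starts 30 bytes into the address text, right after the two 15-byte address lines.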

It's a convention that level numbers are indented to reflect the hierarchy, but it's not a requirement. Also, most smart coders leave some gaps in the level numbers, so that if future maintenance calls for an intermediate level (for instance, a city-state-zip group item), it's easy to add. Leading zeroes in the level numbers are also used by convention to make the layout nicer.
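
In C terms, adding an intermediate level amounts to wrapping adjacent members in a nested struct, and the bytes don't move. The sketch below wraps CITY, STATE, the FILLER, and ZIP-CODE in a hypothetical city_state_zip group and uses offsetof to check that ZIP-CODE sits 50 bytes into ADDRESS either way. It assumes structs of char arrays get no padding, true on common compilers but not guaranteed. What it can't show is the COBOL-specific part: there the nesting is expressed by the level numbers themselves, which is why leaving gaps in them matters.

```c
#include <stddef.h>

/* ADDRESS as originally laid out: the last four fields sit directly under it. */
struct address_flat {
    char address_line_1[15];
    char address_line_2[15];
    char city[10];
    char state[2];
    char filler[8];
    char zip_code[5];
};

/* Hypothetical new intermediate group around the last four fields. */
struct city_state_zip {
    char city[10];
    char state[2];
    char filler[8];
    char zip_code[5];
};

/* ADDRESS after the maintenance change: same bytes, one more level of naming. */
struct address_grouped {
    char address_line_1[15];
    char address_line_2[15];
    struct city_state_zip csz;
};

/* Byte offset of ZIP-CODE within ADDRESS before the change. */
static size_t zip_offset_flat(void)
{
    return offsetof(struct address_flat, zip_code);
}

/* Byte offset of ZIP-CODE within ADDRESS after the change. */
static size_t zip_offset_grouped(void)
{
    return offsetof(struct address_grouped, csz)
         + offsetof(struct city_state_zip, zip_code);
}
```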

FILLER designates anonymous storage, and its name is usually extra-indented to make it visually disappear. COBOL doesn't handle namespaces particularly well, so it's conventional practice to prefix every element with an abbreviated version of the name of the 01-level group item.

So a more realistic version, one that we might see in an actual program, would look like this:



DATA DIVISION.
WORKING-STORAGE SECTION.
01  CARD-IMAGE.
    03  CARD-NAME-AND-ADDRESS.
        05  CARD-NAME.
            07  CARD-FIRST-NAME         PIC X(10).
            07  CARD-LAST-NAME          PIC X(10).
        05      FILLER                  PIC X(3).
        05  CARD-ADDRESS.
            07  CARD-ADDRESS-LINE-1     PIC X(15).
            07  CARD-ADDRESS-LINE-2     PIC X(15).
            07  CARD-CITY               PIC X(10).
            07  CARD-STATE              PIC X(2).
            07      FILLER              PIC X(8).
            07  CARD-ZIP-CODE           PIC 9(5).
    03      FILLER                      PIC X(2).
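
To show the layout in use, the sketch below (a C analogy with hypothetical names) copies an 80-character card record into a struct laid out like CARD-IMAGE, roughly what reading a card into the 01-level item accomplishes, and then extracts individual fields. The group levels are flattened for brevity, the CARD- prefix convention is kept, and each FILLER gets a made-up name because C has no anonymous array members. Alphanumeric fields are space-padded rather than NUL-terminated, so field_text trims trailing spaces when copying a field out. The size check assumes no padding between char arrays, as on common compilers.

```c
#include <string.h>

/* C analogue of CARD-IMAGE with the group levels flattened.
   Fields are space-padded and not NUL-terminated. */
struct card_image {
    char card_first_name[10];     /* 07 CARD-FIRST-NAME     PIC X(10) */
    char card_last_name[10];      /* 07 CARD-LAST-NAME      PIC X(10) */
    char filler_1[3];             /* 05 FILLER              PIC X(3)  */
    char card_address_line_1[15]; /* 07 CARD-ADDRESS-LINE-1 PIC X(15) */
    char card_address_line_2[15]; /* 07 CARD-ADDRESS-LINE-2 PIC X(15) */
    char card_city[10];           /* 07 CARD-CITY           PIC X(10) */
    char card_state[2];           /* 07 CARD-STATE          PIC X(2)  */
    char filler_2[8];             /* 07 FILLER              PIC X(8)  */
    char card_zip_code[5];        /* 07 CARD-ZIP-CODE       PIC 9(5)  */
    char filler_3[2];             /* 03 FILLER              PIC X(2)  */
};

_Static_assert(sizeof(struct card_image) == 80, "one 80-column card");

/* Copy a card record into the layout. Input shorter than 80 characters is
   padded on the right with spaces, as a MOVE into an alphanumeric item pads;
   anything beyond 80 characters is ignored. */
static void load_card(struct card_image *card, const char *record)
{
    size_t n = strlen(record);
    if (n > sizeof *card)
        n = sizeof *card;
    memset(card, ' ', sizeof *card);
    memcpy(card, record, n);
}

/* Copy a space-padded field into a NUL-terminated string, dropping
   trailing spaces. out must have room for len + 1 bytes. */
static void field_text(const char *field, size_t len, char *out)
{
    while (len > 0 && field[len - 1] == ' ')
        len--;
    memcpy(out, field, len);
    out[len] = '\0';
}
```

For a card whose columns 11 through 20 hold HOPPER followed by four spaces, field_text(card.card_last_name, sizeof card.card_last_name, buf) leaves "HOPPER" in buf.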



Next: level numbers that aren't level numbers.

Update: 18 April, followed standard COBOL terminology
