29 April 2008

It's about time

Sweet marjoram! As Scott Gilbertson reports, David Hyatt and Daniel Glazman have prepared a proposal, suitable for discussion in the Working Group, for extending Cascading Style Sheets to support variables. In a gem of an understatement, Hyatt and Glazman write:
We expect CSS Variables to receive a very positive feedback from... the Web authors' community...

28 April 2008

Modulo treasure

Andrew Binstock interviews Donald Knuth, who has recently released volume 4, fascicle 0 of his life's work, The Art of Computer Programming.
...software methodology has always been akin to religion. With the caveat that there’s no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development, let me just say that almost everything I’ve ever heard associated with the term "extreme programming" sounds like exactly the wrong way to go...with one exception. The exception is the idea of working in teams and reading each other’s code. That idea is crucial, and it might even mask out all the terrible aspects of extreme programming that alarm me.

(Link via The Code Project.)

22 April 2008

Fun with 88's: Part 2

My previous post introduced some of the syntax for defining local variables in COBOL. Now, we'll look at some procedural logic.

The PROCEDURE DIVISION consists of paragraphs of code, optionally organized into sections. Each paragraph consists of one or more statements and ends with a period (yes, a period: remember that the syntax was designed to resemble English). Paragraphs act like open subroutines in that control can fall into a paragraph and all program variables are accessible, so COBOL makes it easy to tangle your code into linguine—but you don't have to do it that way.

The EVALUATE statement is COBOL's switch, and the PERFORM statement executes a paragraph or a range of paragraphs. Common practice is to prefix paragraph names with a 4- or 5-digit number, which indicates where in the source code each is defined and how it fits into the execution hierarchy. As a result, programs tend to read top-down through the hierarchy rather than (my preference, which I adopted from Wirth) bottom-up.

A fragment of code for processing our name and address data from the previous post might be:



PROCEDURE DIVISION.
* * *
1200-LABEL-STATE.
    EVALUATE CARD-STATE
        WHEN "PA"
        WHEN "KY"
        WHEN "VA"
        WHEN "MA"
            PERFORM 1210-LABEL-AS-COMMONWEALTH
        WHEN "DC"
            PERFORM 1220-LABEL-AS-DISTRICT
        WHEN OTHER
            PERFORM 1290-LABEL-AS-STATE
    END-EVALUATE.

1210-LABEL-AS-COMMONWEALTH.
* * *
1220-LABEL-AS-DISTRICT.
* * *
1290-LABEL-AS-STATE.
* * *



That's fairly readable, maintainable code, but it's a little brittle should we need this classification scheme somewhere else in the program, because the list of state codes would have to be repeated there. Named conditions will help us out here.

The special level number 88 identifies a value or values that an elementary item might hold and a name for the condition that indicates that the item currently holds the value. Turning back to our example variable definitions:



* * *
01  CARD-IMAGE.
    03  CARD-NAME-AND-ADDRESS.
        05  CARD-NAME.
            07  CARD-FIRST-NAME      PIC X(10).
            07  CARD-LAST-NAME       PIC X(10).
        05      FILLER               PIC X(3).
        05  CARD-ADDRESS.
            07  CARD-ADDRESS-LINE-1  PIC X(15).
            07  CARD-ADDRESS-LINE-2  PIC X(15).
            07  CARD-CITY            PIC X(10).
            07  CARD-STATE           PIC X(2).
                88  CARD-IS-DISTRICT      VALUE "DC".
                88  CARD-IS-COMMONWEALTH  VALUE "PA", "KY", "VA", "MA".
            07      FILLER           PIC X(8).
            07  CARD-ZIP-CODE        PIC 9(5).
    03      FILLER                   PIC X(2).



Then our procedural code simplifies to:



PROCEDURE DIVISION.
* * *
1200-LABEL-STATE.
    EVALUATE TRUE
        WHEN CARD-IS-COMMONWEALTH
            PERFORM 1210-LABEL-AS-COMMONWEALTH
        WHEN CARD-IS-DISTRICT
            PERFORM 1220-LABEL-AS-DISTRICT
        WHEN OTHER
            PERFORM 1290-LABEL-AS-STATE
    END-EVALUATE.

1210-LABEL-AS-COMMONWEALTH.
* * *
1220-LABEL-AS-DISTRICT.
* * *
1290-LABEL-AS-STATE.
* * *



Now, if we find that requirements change, for instance that data for Mexico has to be supported, we have only one place that has to be updated to accommodate "DF" for the Distrito Federal:



07  CARD-STATE           PIC X(2).
    88  CARD-IS-DISTRICT      VALUE "DC", "DF".
    88  CARD-IS-COMMONWEALTH  VALUE "PA", "KY", "VA", "MA".



(We'd have to make some other changes, too, but that's not my point here.)
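For readers who don't read COBOL, the single-point-of-change idea can be approximated in Python. This is only a rough analogue, not how COBOL implements 88-levels: the class name `Card`, the property names, and the `label_state` function are all hypothetical, chosen to mirror the example above.

```python
from dataclasses import dataclass

# Hypothetical value sets mirroring the VALUE clauses of the two 88-levels.
DISTRICT_CODES = frozenset({"DC", "DF"})
COMMONWEALTH_CODES = frozenset({"PA", "KY", "VA", "MA"})


@dataclass
class Card:
    state: str  # plays the role of CARD-STATE PIC X(2)

    @property
    def is_district(self) -> bool:
        # Like an 88-level, this is derived from the field, not stored.
        return self.state in DISTRICT_CODES

    @property
    def is_commonwealth(self) -> bool:
        return self.state in COMMONWEALTH_CODES


def label_state(card: Card) -> str:
    # Counterpart of EVALUATE TRUE: the first true condition wins.
    if card.is_commonwealth:
        return "Commonwealth"
    if card.is_district:
        return "District"
    return "State"
```

As with the named conditions, supporting another district code means editing `DISTRICT_CODES` and nothing in `label_state`.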

88-level conditions can use sets of values that overlap, and ranges can be specified with the keyword THRU. Returning to the original example:



07  CARD-STATE           PIC X(2).
    88  CARD-IS-DISTRICT      VALUE "DC".
    88  CARD-IS-COMMONWEALTH  VALUE "PA", "KY", "VA", "MA".
    88  CARD-IS-13-ORIGINAL   VALUE "MA", "NH", "RI", "CT", "NY",
                                    "NJ", "PA", "DE", "MD", "VA", "NC",
                                    "SC", "GA".



THRU is generally more useful with numeric data. Consider this contrived example of a tax calculation. The S in the PICTURE indicates the sign, and the V an implicit decimal point.



07  TAXABLE-INCOME       PIC S9(6)V9(2).
    88  BRACKET-IS-10-PCT     VALUE 0 THRU 10000.
    88  BRACKET-IS-15-PCT     VALUE 10000.01 THRU 20000.
* * *



Aside: the "." symbol in the BRACKET-IS-15-PCT line is used in two very different ways: as a terminating period, as we've seen before, and as a decimal point in the numeric literal 10000.01. Getting these two right is one of the masochistic joys of machine-translating COBOL.
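The bracket conditions above can be mimicked in Python with `decimal.Decimal`, which keeps the two-decimal-place arithmetic exact. This is a sketch under assumptions: the `BRACKETS` table and the `bracket_for` function are hypothetical names, and the truncation to two places stands in for what a COBOL MOVE without ROUNDED would do.

```python
from decimal import Decimal, ROUND_DOWN

# Hypothetical bracket table mirroring the two 88-levels:
# (condition name, low bound, high bound), both bounds inclusive like THRU.
BRACKETS = [
    ("BRACKET-IS-10-PCT", Decimal("0"), Decimal("10000")),
    ("BRACKET-IS-15-PCT", Decimal("10000.01"), Decimal("20000")),
]


def bracket_for(taxable_income: Decimal):
    # Keep two decimal places, like the V9(2) in the PICTURE.
    # ROUND_DOWN truncates toward zero (an assumption about the MOVE).
    amount = taxable_income.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    for name, low, high in BRACKETS:
        if low <= amount <= high:
            return name
    return None  # none of the named conditions is true
```

Because the field holds exactly two decimal places, no value can fall between 10000.00 and 10000.01, which is why the two ranges can be written without a gap or an overlap.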

Next: to REDEFINE the union.

17 April 2008

Fun with 88's: Part 1

As programming tools have evolved, at each step I as an application developer am able to operate at a higher level of abstraction. A clickable onscreen button is a lot easier to work with than its corresponding region in video memory (my first hello-world assignment in C consisted of poking the letter "A" into a particular location on the 24x80 screen); an editable, sortable grid of data, bound to a database table and hosted in a web browser window, is in turn a huge advance over the HTML forms and tables it is replacing.

The workhorse business data processing languages of the 1960s and 1970s (COBOL, PL/I, FORTRAN to some extent) offered their own high-level abstractions. From today's vantage point, most of these technical advances appear quaint, crude, or worse. But some, like COBOL's named conditions, also known as 88-levels, provided a tidy solution in code to common processing problems, one that was not replicated by subsequent mainstream languages. It's easily my favorite feature of COBOL.

In this and subsequent posts, I'll take a little trip down memory lane to describe named conditions and how to use them. I'm a little bit rusty: when I was last actively using COBOL the predominant standard was COBOL85. That standard allowed lower case keywords and user-defined names, but COBOL just seems more COBOL-y in upper case.

In a COBOL program, procedural code and data specs are strictly segregated into the PROCEDURE DIVISION and the DATA DIVISION. Local variables are described in the WORKING-STORAGE SECTION of the DATA DIVISION.

To me, the fundamental unit of storage in COBOL is an 01-level group item, and it corresponds to a C struct. It is any number of individual elementary items arranged in a hierarchy. Within the hierarchy, a prefixing level number indicates what nests where; a PICTURE clause and a USAGE clause provide most of the physical typing information (how many bytes, what kind of data it can hold). An 80-column punchcard that holds name and address information might be represented in memory like this:



DATA DIVISION.
WORKING-STORAGE SECTION.
1 CARD-IMAGE.
3 NAME-AND-ADDRESS.
5 NAME.
7 FIRST-NAME PICTURE IS XXXXXXXXXX USAGE IS DISPLAY.
7 LAST-NAME PICTURE IS X(10) USAGE IS DISPLAY.
5 FILLER PICTURE IS XXX USAGE IS DISPLAY.
5 ADDRESS.
7 ADDRESS-LINE-1 PICTURE IS X(15) USAGE IS DISPLAY.
7 ADDRESS-LINE-2 PICTURE IS X(15) USAGE IS DISPLAY.
7 CITY PICTURE IS X(10) USAGE IS DISPLAY.
7 STATE PICTURE IS XX USAGE IS DISPLAY.
7 FILLER PICTURE IS X(8) USAGE IS DISPLAY.
7 ZIP-CODE PICTURE IS 99999 USAGE IS DISPLAY.
3 FILLER PICTURE IS XX USAGE IS DISPLAY.



Well, this first example demonstrates how COBOL got its reputation for being excessively verbose and hard to understand. So the first point to be made is that no practitioner would actually code it this way. COBOL syntax defines some noisewords like IS and has lots of synonyms and shortcuts. The USAGE IS DISPLAY clause (which specifies alphanumeric data) can be disposed of altogether. You can code vendor-specific variations on the USAGE IS COMPUTATIONAL clause to specify integer data; however, most arithmetic is done with fixed-precision decimal data, so there's rarely a need to specify floating point data.

The PICTURE clause is an early attempt at coding-by-example. Ten X's mean ten bytes of alphanumeric, five 9's mean five bytes of decimal data that can do arithmetic. A repetition factor in parentheses means what you'd expect. There are lots of other fancy symbols that can be used in PICTURE clauses, to specify a decimal point or to automatically insert commas and currency signs; it's not unlike a printf() format string.
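To make the repetition-factor rule concrete, here is a small Python sketch that expands a PICTURE string and counts its character positions. The function names are hypothetical, and it handles only the X, 9, S, and V symbols that appear in these posts; it assumes DISPLAY usage, where S and V take up no storage of their own.

```python
import re


def expand_picture(picture: str) -> str:
    # Turn repetition factors such as X(10) into ten literal X's.
    return re.sub(
        r"(.)\((\d+)\)",
        lambda match: match.group(1) * int(match.group(2)),
        picture,
    )


def display_length(picture: str) -> int:
    # Count only X and 9 positions. S (sign) and V (implied decimal
    # point) are skipped, on the assumption that neither adds a byte.
    expanded = expand_picture(picture.upper())
    return sum(1 for symbol in expanded if symbol in "X9")
```

So `X(10)` and `XXXXXXXXXX` describe the same ten-byte item, and the editing symbols mentioned above would need more rules than this sketch has.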

Group items in the hierarchy can always be treated like an alphanumeric string of their component characters. So ADDRESS can be used anywhere you'd want to use the 55 characters that comprise it.
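One way to picture this is to treat the card image as an 80-character string and a group item as a slice of it. The Python sketch below is only an analogue: the layout table, the numbered FILLER names, and the `group` helper are hypothetical, with the lengths taken from the PICTURE clauses in the example.

```python
# Hypothetical layout table: (name, length), in card-image order.
CARD_LAYOUT = [
    ("FIRST-NAME", 10),
    ("LAST-NAME", 10),
    ("FILLER-1", 3),
    ("ADDRESS-LINE-1", 15),
    ("ADDRESS-LINE-2", 15),
    ("CITY", 10),
    ("STATE", 2),
    ("FILLER-2", 8),
    ("ZIP-CODE", 5),
    ("FILLER-3", 2),
]


def field_offsets(layout):
    # Map each elementary item to its (start, end) position on the card.
    offsets, position = {}, 0
    for name, length in layout:
        offsets[name] = (position, position + length)
        position += length
    return offsets


OFFSETS = field_offsets(CARD_LAYOUT)


def group(card: str, first: str, last: str) -> str:
    # A group item is the run of characters from its first member
    # through its last member, fillers included.
    return card[OFFSETS[first][0]:OFFSETS[last][1]]
```

Here `group(card, "ADDRESS-LINE-1", "ZIP-CODE")` plays the part of ADDRESS and returns its 55 characters, including the 8-byte filler between the state and the zip code.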

It's a convention that level numbers are indented to reflect the hierarchy, but it's not a requirement. Also, most smart coders leave some gaps in the level numbers, so that if future maintenance calls for an intermediate level (for instance, a city-state-zip group item), it's easy to add. Leading zeroes in the level numbers are also used by convention to make the layout nicer.

FILLER designates anonymous storage, and its name is usually extra-indented to make it disappear. COBOL isn't particularly good at syntax for managing namespaces, so it's conventional practice to prefix all elements with an abbreviated version of the name of the 01-level group item.

So a more realistic example, one that we might see in a real program, would look like:



DATA DIVISION.
WORKING-STORAGE SECTION.
01  CARD-IMAGE.
    03  CARD-NAME-AND-ADDRESS.
        05  CARD-NAME.
            07  CARD-FIRST-NAME      PIC X(10).
            07  CARD-LAST-NAME       PIC X(10).
        05      FILLER               PIC X(3).
        05  CARD-ADDRESS.
            07  CARD-ADDRESS-LINE-1  PIC X(15).
            07  CARD-ADDRESS-LINE-2  PIC X(15).
            07  CARD-CITY            PIC X(10).
            07  CARD-STATE           PIC X(2).
            07      FILLER           PIC X(8).
            07  CARD-ZIP-CODE        PIC 9(5).
    03      FILLER                   PIC X(2).



Next: level numbers that aren't level numbers.

Update, 18 April: revised to follow standard COBOL terminology.

16 April 2008

Teletype 33

Click through to page 4 of this photo essay by Michael Shamberg for Life's January 1970 number. This is totally me in ninth grade, learning BASIC (my first significant assignment was finding roots of polynomials by simple iteration). Especially the paper roll spilling onto the floor.

Shamberg visits the Rodman family of Ardmore, Pa., who have installed a Teletype 33 in their home and signed up for timesharing access to a mainframe in New Jersey. General Electric provides the service. The Rodmans apparently have access to permanent storage on the mainframe, whereas we students used paper tape to save our work. You can see the paper tape reader/punch attached to the Teletype unit, in the right side of the page 4 picture. The acoustic coupler modulator-demodulator is in the left side of the pic.

“For me, the main physical effect of having a computer at home is that I’m able to spend a lot more time with my family,” says Dr. Rodman, who is a lung specialist on the faculty of Temple University medical school in Philadelphia. “For all of us the real impact is mental. Programming a computer is like thinking in a foreign language. It forces you to approach problems with a high degree of logic. Because we always have a computer handy, we turn to it with problems we never would have thought of doing on one before.”


(Link via Boing Boing.)

02 April 2008

Zero bombs away

Coding Horror reminds us that Core War is still around.

Back in the mid 80's when I got my first Mac (a 512K model, a loaner from my company) and I was looking for a programming project, I noodled around with building a Redcode emulator, but nothing ever came of it.