IEFBR14: 2008

29 December 2008

Linkages

I have a new personal/professional profile, one to take the place of my old TypeKey profile. For a small fee, the folks at Hover provide the redirection.

23 December 2008

For future reference: Joel Spolsky points to Scott Schiller's post that explains how to ask Internet Explorer 7 to use a sensible image resampling algorithm. The code snippet to go in your stylesheet is


img { -ms-interpolation-mode:bicubic; }

The result: smoother resized images.

17 December 2008

Recycled clockwork

James Randerson embeds video of Michael Wright's working replica of the Antikythera Mechanism.

Jo Marchant's book about the efforts to understand the clump of metal found in the Mediterranean a century ago, Decoding the Heavens, is scheduled for publication early next year.

(Link via Wired.)

16 December 2008

Not as cute as our receptionist

And here I thought they were only cupholders: Emmanuel Florac improvises a door-opening robot.

03 December 2008

Decisions, decisions

Tak Cheung Lam et al. explore the details of syntactic parsing of XML under four technologies: the well-known DOM and SAX and the lesser-known StAX (Streaming API for XML) and VTD-XML (virtual token descriptor)—with a preparatory exploration of character decoding and lexical analysis, processing steps that are common to all XML analyzers.

Since neither SAX nor StAX creates an in-memory representation of the complete document, they are not well suited to applications that must transform the document, but they can be effective for simple streaming applications. StAX uses a "pull" model that puts the processing loop in the application, so many developers will find it easier to use than SAX.
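The push/pull distinction is easier to see in code than in prose. Here is a minimal sketch using Python's standard library rather than the Java APIs the article covers: xml.sax stands in for SAX (the parser drives and calls back), and xml.dom.pulldom stands in for a StAX-style pull loop (the application drives). The sample document and the names ItemHandler, items_push, and items_pull are made up for illustration.

```python
import xml.sax
from xml.dom import pulldom

DOC = "<order><item>tea</item><item>milk</item></order>"


class ItemHandler(xml.sax.ContentHandler):
    """Push model: the parser owns the loop and calls these methods."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_item:
            self._buf.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buf))
            self._in_item = False


def items_push(doc):
    handler = ItemHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.items


def items_pull(doc):
    """Pull model: the application owns the loop and asks for each event."""
    items = []
    events = pulldom.parseString(doc)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            events.expandNode(node)  # read ahead to the matching end tag
            items.append("".join(child.data for child in node.childNodes))
    return items
```

Both functions return the same list of item texts; the pull version keeps its state in ordinary local variables, which is the convenience the article attributes to StAX.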

By contrast, DOM and VTD are the tools to use if you need to rewrite the XML. As compared to DOM, VTD does not construct its in-memory representation with an object tree, but rather with lightweight arrays of 64-bit integers, and the article gives a one-figure sketch of how this works. The authors estimate VTD to be 5 to 8 times faster than DOM and to take up 20% of the space that DOM does, especially for incremental updates (but they don't back up these calculations with empirical measurements). They also speculate that VTD is a good candidate for hardware acceleration.
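To make the "arrays of 64-bit integers" idea concrete, here is a rough sketch of packing one token descriptor into a single integer. The field widths (32-bit offset, 20-bit length, 8-bit depth, 4-bit type) and the names pack, unpack, and text_of are my own assumptions for illustration; they are not taken from the article or from the VTD-XML specification.

```python
from array import array

# Hypothetical token types, for illustration only.
START_TAG, TEXT = 0, 1


def pack(offset, length, depth, kind):
    """Pack one token descriptor into a 64-bit integer (assumed layout)."""
    if not (0 <= offset < 1 << 32 and 0 <= length < 1 << 20
            and 0 <= depth < 1 << 8 and 0 <= kind < 1 << 4):
        raise ValueError("field out of range for the assumed layout")
    return (kind << 60) | (depth << 52) | (length << 32) | offset


def unpack(record):
    """Return (offset, length, depth, kind) from a packed record."""
    return (record & 0xFFFFFFFF,
            (record >> 32) & 0xFFFFF,
            (record >> 52) & 0xFF,
            (record >> 60) & 0xF)


def text_of(doc, record):
    """The token is never copied: it is a slice of the original document."""
    offset, length, _, _ = unpack(record)
    return doc[offset:offset + length]


DOC = "<a><b>hi</b></a>"
# One unsigned 64-bit slot per token, instead of one object per node.
tokens = array("Q", [pack(4, 1, 1, START_TAG), pack(6, 2, 1, TEXT)])
```

The point of the design is that the document stays in memory as-is and the index is a flat array, so there is no per-node object overhead.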

01 December 2008

Steampunk reader

Shorpy has a fabulous photographic image from 1917 of a punchcard tabulating machine in service at the Census Bureau. More images are promised!

25 November 2008

I detect a pattern here

I've had Coding Horror's post on regular expressions bookmarked for a while now, just waiting for the chance to take a few minutes to type "Right on!" For certain validation problems, a regex is the only way to go. At Vovici, I used them with the RegularExpressionValidator control to ensure that a text box was, say, filled in with a valid e-mail address or with a URL from a particular domain. And about once a quarter my colleague Cap would IM me with a request for a quick regex consult.

You can also use a regex to make sure that a text box is filled in with a valid date (in MM-DD-YY format, for instance), but in this case you're usually better off using a specialized date picker: one that presents a pop-up monthly calendar, so that all the user has to do is click a number.
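One reason a bare pattern falls short for dates: it can check the shape of the input, but not whether the day exists. A small sketch in Python (not the ASP.NET validator mentioned above; the function names are made up for illustration):

```python
import re
from datetime import datetime

# Shape only: two digits, hyphen, two digits, hyphen, two digits.
DATE_SHAPE = re.compile(r"\d{2}-\d{2}-\d{2}")


def has_date_shape(text):
    return DATE_SHAPE.fullmatch(text) is not None


def is_valid_date(text):
    """Shape check first, then let the date library decide if the day exists."""
    if not has_date_shape(text):
        return False
    try:
        datetime.strptime(text, "%m-%d-%y")
    except ValueError:
        return False
    return True
```

Here "02-30-08" passes the shape check but fails the calendar check, which is exactly the kind of input a date picker never produces.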

The big problem with regular expressions is the proliferation of implementations and all the bells and whistles that come with them. For example, we found a particularly useful pattern at RegExLib.com to match e-mail addresses that include the display-name part (as in "User, Joe" <joe.user@example.com>), but the pattern wasn't useful for client-side validation because it used features that depended on a browser-specific regex engine. So a reference book like Jeffrey Friedl's Mastering Regular Expressions is really handy to help you keep track of platform-specifics. By all means, use the contributed patterns from a site like RegExLib.com, but don't put a pattern into production that you don't understand yourself.

Another tool that you may find useful is Ivaylo Badinov's test harness for regular expressions, REGex TESTER.

Just to amplify a couple of Jeff Atwood's points:

Do not try to do everything in one uber-regex. I know you can do it that way, but you're not going to. It's not worth it. Break the operation down into several smaller, more understandable regular expressions, and apply each in turn. Nobody will be able to understand or debug that monster 20-line regex, but they might just have a fighting chance at understanding and debugging five mini regexes.

This is good advice for smaller patterns, too. If you're trying to recognize U.S. telephone numbers, for instance, start with a pattern that recognizes area codes (something like /\d{3}/) and one that recognizes exchange and number body (/\d{3}-\d{4}/) and then put the two patterns together (into /(\d{3}-)?\d{3}-\d{4}/).
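The same build-up, sketched in Python so that each piece keeps a name and can be tested on its own. The constant and function names are made up for illustration, and the pattern is the simplified one from the paragraph above, not a complete phone-number validator.

```python
import re

AREA_CODE = r"\d{3}"            # e.g. 202
LOCAL_NUMBER = r"\d{3}-\d{4}"   # exchange and number body, e.g. 555-1212

# Compose the small patterns; the area code and its hyphen are optional.
PHONE = re.compile(rf"(?:{AREA_CODE}-)?{LOCAL_NUMBER}")


def is_phone(text):
    """True if the whole string is a phone number in one of the two forms."""
    return PHONE.fullmatch(text) is not None
```

If the area-code rule changes later (parentheses, say), only AREA_CODE needs to be touched.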

Regular expressions are not Parsers. Although you can do some amazing things with regular expressions, they are weak at balanced tag matching. Some regex variants have balanced matching, but it is clearly a hack—and a nasty one. You can often make it kinda-sorta work, as I have in the sanitize routine. But no matter how clever your regex, don't delude yourself: it is in no way, shape or form a substitute for a real live parser.

Exactly. Regular expressions are good for problems that call for a bounded degree of nesting: breaking up a file of XML into tokens that represent the element and attribute names, for instance. These problems are what the language translation people would call lexical analysis. For problems that permit arbitrarily deep nesting, like parsing the stream of XML tokens into a document tree, ensuring that each tag is properly closed and nested, you're doing syntactic analysis, and you need a tool like yacc.
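Here is a toy illustration of that division of labor, in Python: a regex does the lexical work of finding tags, and a stack (the simplest stand-in for a real parser) checks the nesting. It is a sketch under loud assumptions: it ignores comments, CDATA sections, and attribute values that contain angle brackets, and the names TAG, tag_tokens, and is_well_nested are made up.

```python
import re

# Lexical analysis: recognize one tag at a time. No nesting is involved.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def tag_tokens(text):
    """Yield ("open" | "close" | "empty", name) for each tag in the text."""
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            yield ("empty", name)
        elif closing:
            yield ("close", name)
        else:
            yield ("open", name)


def is_well_nested(text):
    """Syntactic analysis: a stack remembers arbitrarily deep open tags."""
    stack = []
    for kind, name in tag_tokens(text):
        if kind == "open":
            stack.append(name)
        elif kind == "close":
            if not stack or stack.pop() != name:
                return False
    return not stack
```

The stack is the part that a single regular expression cannot replace: its depth is unbounded, while the regex only ever looks at one tag.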

18 November 2008

Fan belt?

The Australian Computer Museum Society has agreed to lend a 1960s-era IBM tape drive to the cause of recovering data on lunar dust that was collected on the Apollo XI, XII, and XIV missions, reports Nic MacBean. The 7-track IBM 729 Mark V drive is described as "in need of tender love and care."

(Link via Risks Digest.)

11 November 2008

A freebie

IEEE/Computer Society has announced a new benefit to members, to be available in December: free access to 600 titles in the O'Reilly Safari library. Depending on what is made available, this could save me some slots in my current 10-slot subscription.

And what do I get for the $15/month that I'm paying? Right now, I have these volumes on my virtual bookshelf:

Duthie and MacDonald, ASP.NET in a Nutshell, 2/e
Meyer, CSS: The Definitive Guide, 3/e
Bergsten, JavaServer Pages, 3/e: I checked this out for a specific project, and will release it soon
Pogue, Mac OS X Leopard: The Missing Manual
Friedl, Mastering Regular Expressions, 3/e
Snell et al., MCPD Self-Paced Training Kit (Exam 70-547): this goes back once I pass the exam
Northrup et al., MCTS Self-Paced Training Kit (Exam 70-536): ditto
Pogue et al., Windows XP Pro Edition: The Missing Manual, 2/e

I can swap out anything after holding it for 30 days. Most important, I can change up to the next edition of a book without having to pulp the old one.

05 November 2008

Exam prep: 7

Well, there's been lots of excitement these past few weeks, in and out of the office, but I'm trying to keep my feet moving in my studies for the MCTS exam 70-536. This week I started chapter 8 of the standard prep guide: application domains and services.

30 October 2008

Letters, we get letters

All sorts of unexpected news and correspondence this week! I got a note from someone who had read my review of Herding Cats requesting other recommendations for readings in project management for someone aspiring to be a team lead. I took the opportunity to plug three of my favorite authors. I wrote (edited slightly):

Different shops call for different management styles, so YMMV. But take a look at Becoming a Technical Leader by Gerald M. Weinberg. For that matter, just about anything Weinberg has written about programming and the psychology behind it is worth reading.

You're probably familiar with Steve McConnell's work. His Rapid Development provides a survey of the management techniques you can use to improve the delivery of good software; some of the topics in Code Complete are also relevant.

Finally, DeMarco and Lister's Peopleware is good for helping you identify aspects of your office environment that are making you and your team unproductive.

You may have noticed that two of these four titles are from Dorset House publishing. There's lots more good stuff to be found there.

16 October 2008

XSRF and me

Security is not my long and strong suit. But recent postings by Scott Gilbertson on clickjacking and by Jeff Atwood on strategies to counteract cross-site request forgeries (XSRF) caught my attention.

While there aren't any good countermeasures against clickjacking yet, there are practices that you can follow to mitigate XSRF attacks. But doesn't ASP.NET take care of all that for me? Not really. Todd Miranda demonstrates, in a 20-minute video, how the exploit works against an ASP.NET site and shows some basic techniques to cope.

09 October 2008

Toolable?

Naomi Hamilton continues her randomly alphabetical interviews of language architects with a visit with Anders Hejlsberg, leader of C# development for Microsoft:

[I also learnt to] design the language to be well-toolable. This does impact the language in subtle ways – you’ve got to make sure the syntax works well for having a background compiler, and statement completion. There are actually some languages, such as SQL, where it’s very hard to do meaningful statement completion as things sort of come in the wrong order.

08 October 2008

Need a hint?

Nice set of tutorial brain teasers at Project Euler. Some of them would be simple enough to use as screening questions in a technical interview.

(Link via The Daily WTF.)

07 October 2008

Geezerbox

I took a long side trip from my family business in the Sacramento area to visit the Computer History Museum in Mountain View, spang in Silicon Valley. The donation-funded museum was relocated a few years ago from digs in Massachusetts.

Until early next year, the highlight of the collection is Difference Engine No. 2, constructed from Charles Babbage's plans for Nathan Myhrvold and on loan to the museum. Like everything else in the museum, this machine is volunteer-powered, one staffer taking a turn at the crank while the other explains the workings. Though the gear is equipped for printing (see detail at right), that part of its operation is not part of the demo, as it takes four hours to clean up every time.

Most of the equipment is hands-off, but you can have a seat on this Cray-1, located just outside the main exhibit hall.

Another highlight of the visit is the demonstration of a reconstructed PDP-1, Digital Equipment's first commercial system, docented by John Bohner and Peter Samson when I visited. The PDP-1, introduced in the early 1960s, was the first machine to feature a symbolic debugger, an amenity no doubt appreciated by Samson. As part of the restoration, he reverse-engineered paper-tape music files that had been serendipitously preserved in order to recreate a music synthesizer that he wrote while an undergraduate at MIT. The synthesizer resides in 4K of memory, which is also a good thing, because this model holds all of 12K 18-bit words.

Most all of the other boxes are not powered up, but rather are displayed warehouse-style in the main hall. (Imagine the heat generated by all of these boxes were they all running!) My graduate school days were brought back by the sight of a DECSystem-10 (at left). Those panels of switches are perhaps the only attractive industrial design to come out of the 1970s. And most of us, in one way or another, have crossed paths with an IBM System/360 (at right).

There are lots of smaller, newer, and older items, as well: a rack of HP calculators, Herman Hollerith's tabulation equipment, a rack of tubes from ENIAC, some game consoles, a Sage air-defense system (tube-based and inexplicably still in service through 1983), a Norden bombsight, an Enigma machine.

Except for a side exhibit of computer chess (and the PDP-1 demo), there isn't a lot of emphasis on software; for now, the museum is largely a repository of hardware. But, we hope, forthcoming fundraising will increase the level of interactivity at this gem of a museum.

30 September 2008

Spare some cycles?

Researchers at UCLA led by Edson Smith have announced the identification of the first Mersenne prime with more than 10 million digits, 2^43,112,609-1, as Thomas H. Maugh II reports.
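You don't need to compute the number itself to check the "more than 10 million digits" claim. Since 2^p is never a power of ten, 2^p - 1 has the same number of decimal digits as 2^p, namely floor(p * log10(2)) + 1. A quick sketch in Python (the function name is made up):

```python
import math


def mersenne_digit_count(p):
    """Number of decimal digits in 2**p - 1, without building the number."""
    return math.floor(p * math.log10(2)) + 1


print(mersenne_digit_count(43112609))  # prints 12978189
```

So the new prime runs to just under 13 million digits.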

22 September 2008

240 GLOC

Michael Swaine's "Is Your Next Language COBOL?":

To say that COBOL is widespread is an understatement. In 1997 the Gartner Group estimated that there were 240 billion lines of COBOL code in active apps. Something like 90 percent of financial transactions are processed by COBOL code, and 75 percent of all business data processing is COBOL. Merril Lynch reports that 70 percent of its business runs on COBOL apps.... One estimate puts the value of current running COBOL code at $2 trillion.

19 September 2008

BAL

Dan Wohlbruck continues his story of learning systems and programming in the 1960s. He learns IBM assembler using a new teaching device, "programmed instruction," something I haven't seen since I used it to teach myself a little calculus early in high school.

There were six or seven of us from the previous class that had been chosen to learn BAL and when we arrived at the Education Center, we were directed to our new classroom. The room had four rows of tables, enough chairs for the students, but no lectern for an instructor. Promptly at 9:00, two gentlemen, one from IBM and one from Bell Tell, arrived and explained that we were to be part of an experiment called "programmed instruction." We would be given paper-bound text books, Assembler Language coding pads, and pencils, but otherwise left on our own to learn a new generation of computer architecture and the language used to program it. Every 90 minutes an IBM expert would join us and ask us if we had questions. After a brief discussion with the expert, we would take a break.

17 September 2008

Finally

I don't know whether I'll participate much in Atwood and Spolsky's new collaboration site for software developers, Stack Overflow, but I'll say this much for it: it's the first site that I've walked up to and all I needed was an OpenID to register.

A workshop

We're looking into piloting the use of agile methods on some of our upcoming projects, so my director arranged a one-day workshop for the entire unit, including product management and documentation. It was led by Jeff Neilsen and John B. of local consultancy Stelligent. We got an overview of the agile approach to software development—about what you'd get from a few well-written articles and book chapters—and then dug deeper into the practice of user stories. A few of my notes and take-aways:

Short development cycles push people to find ways to be more efficient.

One of Jeff's clients calls refactoring "entropy reduction."

A rule of thumb for how big a user story should be: small enough to build six to twelve of them in a one- to two-week iteration. Of course, how much work this is depends on how many people you have on the development team.

Many of the practices suggested by agile practitioners seem counter-productive—scrum rooms, for example, with multiple conversations going on at the same time. Try it anyway: if it doesn't work for you, then drop it.

Agile's strength is that it expects requirements to change, and it explicitly provides for this, at the end of each iteration.

These techniques are best applied to domains where the cost of failure is low: think shoestring-capitalized dot-com startups, not avionics.

I haven't seen much from the literature on applying agile methods to projects that are largely integration of third-party packages, nor to projects that are building APIs or frameworks with no user interface.

But what impressed me most about Jeff's presentation was his effective use of PowerPoint. To linger on a key point, generally he used a slide that consisted of a stock photo, full bleed, with oversized type reversed out of the image, something like those Miller Beer ads from a few years ago. (It was a photo of a tray of Krispy Kreme donuts labelled in Chinese that caused me to take notice.) His slides use little or no chrome—by that I mean those distracting standardized frames that corporate messaging departments insist on. Some of his slides reproduce a very small Stelligent logo in the lower left corner. About the only consistent design element is the oversized sans serif typeface that Jeff used; it looked something like Tahoma. This meant that he could incorporate disparate graphic elements from a lot of different sources (diagrams, mostly, and some Dilberts), of different qualities, and the design maintained unity. The effect was engaging without being too slick.

06 September 2008

Pretty pictures

Anne Eisenberg reports on the IBM Many Eyes project, a contributor-driven site for data visualization.

28 August 2008

Progress report

Cyrus Farivar reports on the accomplishments, intended and otherwise, of the One Laptop Per Child project.

27 August 2008

Memory lane

Deirdre Blake reposts a status report from 1995 on the draft COBOL 97 standard, which introduced object-oriented features into the language.

22 August 2008

753

Not pretty, but gets the job done: I passed Microsoft certification exam 70-528.

19 August 2008

Top box

I'm in the process of migrating articles from an internal wiki to a new platform (I much prefer the one we're leaving, and I prefer MediaWiki to both, but it's not my call). Anyway, I stumbled over an entry that I wrote a few months ago about top box and bottom box statistics. Top box analysis, as far as I can tell, is fairly popular in the market research industry. It's often used with measures of customer satisfaction. It's a simple tool, but it hasn't received a lot of rigorous academic attention, so there isn't a lot of information available online. And, unfortunately, the only way to search for it is with "top box -office -set".
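For the record, here is the calculation as I understand the term to be commonly used: the top box score is the share of respondents who chose the highest point on the rating scale (or the highest two, for "top two box"), and bottom box is the mirror image. This is a sketch under that assumption, in Python; the function names, the 1-to-5 scale, and the sample ratings are made up.

```python
def top_box(ratings, scale_max=5, boxes=1):
    """Share of ratings in the top `boxes` points of the scale."""
    threshold = scale_max - boxes + 1
    return sum(1 for r in ratings if r >= threshold) / len(ratings)


def bottom_box(ratings, scale_min=1, boxes=1):
    """Share of ratings in the bottom `boxes` points of the scale."""
    threshold = scale_min + boxes - 1
    return sum(1 for r in ratings if r <= threshold) / len(ratings)


# Ten made-up satisfaction ratings on a 1-to-5 scale.
ratings = [5, 4, 4, 3, 5, 2, 1, 5, 4, 5]
```

With these ratings, top box is 0.4, top two box is 0.7, and bottom box is 0.1.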

08 August 2008

Pre-360

Dan Wohlbruck has started a series of columns on his early days as a programmer in the 1960s, starting with training on the IBM 1401 system. The 1401 used a model 1402 80-column card reader-punch for input and a 1403 printer for output.

Once the card was read, where in memory are the 80-columns of data placed, you wonder? In positions 1 through 80, of course. The 1401 mapped the first 333 positions of memory for card input (1-80), card output (101-180), and a print line (201-332). The 333rd position of memory could be used for printer channel control. If you are scheming how to use those leftover positions from 81-100 and 181-200, you are ahead of the game.
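The memory map in that passage is small enough to write down and check. A sketch in Python, using only the position ranges quoted above; the names AREAS and unassigned_runs are made up.

```python
# 1-based memory positions, as quoted: each range is inclusive of both ends.
AREAS = {
    "card input": range(1, 81),      # 1-80
    "card output": range(101, 181),  # 101-180
    "print line": range(201, 333),   # 201-332
}


def unassigned_runs(areas, last=332):
    """Return (first, last) runs of positions not claimed by any area."""
    used = set()
    for positions in areas.values():
        used.update(positions)
    runs = []
    for position in range(1, last + 1):
        if position in used:
            continue
        if runs and position == runs[-1][1] + 1:
            runs[-1][1] = position  # extend the current run
        else:
            runs.append([position, position])
    return [tuple(run) for run in runs]
```

Calling unassigned_runs(AREAS) gives the two leftover gaps, 81-100 and 181-200: forty positions of scratch space for the scheming programmer.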

05 August 2008

Exam prep: 6

Microsoft has reopened the window for its Second Shot program. This time around, there's no limit on the number of exams you can take.

19 July 2008

Windows XP and Samba

I bought a network hard drive for my home setup, largely so that I would have something for backing up my laptop. The laptop (named Mulligan) is an HP Pavilion ze2000 running Windows XP Professional at Service Pack 3, while the workhorse machine on my home network is a dual-G5 Mac running Leopard (Mac OS 10.5), named Dedalus.

Over the holiday weekend, I brought the hard drive home (an Iomega Home Network 500GB), plugged it in, and configured it over the web interface from Dedalus. I named the drive Boylan, did a Go > Connect to Server... from Finder, and It Just Worked. The network drive shares files with the SMB (Server Message Block) protocol, which Samba implements, and Mac OS speaks fluent SMB.

Not quite so easy connecting from the Windows machine. I tried using the Iomega-provided Discovery Tool Home to mount the shared folders. The tool could find the drive and folders, but when I picked a folder and clicked the button, the tool popped up a dialog with the rather unhelpful message "Exception Error."

I fiddled with the port settings on the firewall that I have running on Mulligan (Norton), but no luck. I disabled the Norton firewall and went back to Windows Firewall, but no joy again.

I started a chat session with a support rep from Iomega and learned the first secret: the Discovery Tool is just a GUI for Tools > Map Network Drive..., because that's all that the rep used to work the problem. Unfortunately, Map Network Drive... was equally unable to mount the shared folders, and provided no clue as to what was not configured correctly. He asked me to check some settings on the router, and at that moment I had a glitch logging into the router, so I had to close the chat session.

The next day I had some free time to do some searching, and I turned up a page of David Lechnyr's Unofficial Samba HOW TO that taught me the second secret: Tools > Map Network Drive... is just a GUI for the console command net use. So I tried


net use e: \\192.168.0.101\public

and I got back the message "System error 1231 has occurred." Finally a clue! More searching, and I found a nicely comprehensive list of System Errors and troubleshooting tips. The page for 1231 reads, in part:

...when we checked the Properties of the LAN, we found the Client for Microsoft Networks and File and Printer Sharing for Microsoft Networks were disabled.

I bounced off to the Properties sheet for my network connection, and indeed, Client for Microsoft Networks was disabled. I checked that box, rebooted, and much satisfied mounting of folders was mine. Furthermore, now both Boylan (the hard drive) and Mulligan (the laptop) show up in My Network Places under Microsoft Windows Network and the node for the workgroup.

So, while File and Printer Sharing for Microsoft Networks enables other devices to access resources shared from the local machine, Client for Microsoft Networks works in the other direction: it enables the local machine to access external resources.

17 July 2008

Yet another yacc post

Naomi Hamilton jumps way out of alphabetical order to interview Stephen C. Johnson, developer of the yacc parser generator, for her A-Z series.

YACC made it possible for many people who were not language experts to make little languages (also called domain-specific languages) to improve their productivity.

And that is exactly what happened in my case: with no prior academic preparation, I used yacc to build a translator for a COBOL-like reporting and OLTP application language called XPL. (That is, I used yacc, lex, and every trick I could find in a couple of the seminal books on compilers that were available in the early 1990s: the 1986 edition of Aho et al. ["the dragon book"] and Schreiner and Friedman's Introduction to Compiler Construction with Unix, which we called "the unicorn book.")

I did this work for a company called Magna Software, now defunct, which has left few online traces of its existence. I've lost my conversancy with language translation and compilers, so I don't know how it's done any more; I doubt that yacc could support the interpret-on-the-fly behavior that Visual Studio gives me.

14 July 2008

My Toolbox

Back when I was a wee graduate student tapping away at a borrowed Texas Instruments Silent 700 terminal for beer money, the scope of my development environment was rather small. One language to code in (SPSS), one text editor (SOS—I tried to teach myself TECO and failed), the SPSS executive, and the DEC-10 command shell—DCL, I think it was. I have more gear to manage now.

Languages

My assignments in this Microsoft shop call for declarative and/or imperative code in the following languages:

C#: we're using version 2.0 of the .NET framework
SQL: both DDL and DML; we do most of our development against SQL Server, but we also support Oracle
HTML: the app itself uses XHTML 1.0 Transitional, and the generated surveys are HTML 4.0 Transitional
JavaScript: we don't rely heavily on JavaScript, at least not directly, but we do use it as a glue language
CSS
XML

We use the little languages regular expressions and XPath expressions. ASP.NET markup to declare a server control or to code a @Page directive is a little language too, but isn't it interesting that it doesn't have its own name?

We have legacy code that uses Delphi and XSL/T.

Compilers, etc.

At any given moment, my desktop may have windows open for the following tools. In the course of a week, it's almost certain that I will use all of these.

Visual Studio 2005
SQL Server Management Studio
Perforce Windows Client, for access to the source code repository

Perforce's means for coding client specs is its own perplexing little language. We depend a lot on the context menu pick Create/Replace Using as Template...
Firefox 2, equipped with ColorZilla, Firebug, Flashblock, View Source Chart, and Web Developer
Internet Explorer 7, equipped with IE7Pro and IE Developer Toolbar
a VMware workstation running Internet Explorer 6, for more browser compatibility testing
Remote Desktop Connections to other servers
a paint program for managing screen shots: my GUI team lead likes Jasc Paint Shop Pro
UltraEdit: clunky like a Swiss Army knife with too many blades, but it handles .csv files that Excel can't, and it can make XML readable
any of two or three different file compression utilities
a command line

We're starting to use Team City for automated builds. Some people on the team swear by ReSharper, but I find that it just slows down my compile and test cycle. Every once in a while I'll need to pop open the Windows Services manager, or the IIS Manager to set up a new web site. My boss likes Beyond Compare, so it's installed on my machine, but I'm lazy and never bothered to learn how to use it; I depend on the default comparison tool provided by Perforce. Now that I look in there, I see that there's a lot of stuff in my Start Menu that I've never used. And sometimes I'll have a pattern-search chore that I can't use Visual Studio or Windows Explorer's Search for (maybe wading through web server log files), and then I'm glad that I have a copy of Windows Grep.

Third Party Controls and Libraries

For everything from managing Ajax interactions, to business graphics, to file-format translations, to fancy web controls, we license tools from ComponentArt, Dundas, Aspose, Telerik, and Steema Software. And, as I posted earlier, we built the Community Builder Module with DotNetNuke.

Communication

When I log in, Microsoft Outlook and AOL Instant Messenger automatically launch. I treat IM software as a necessary evil, as most of the rest of the team likes to use it.

We use FogBugz for bug tracking, and we've just started using its integrated wiki for documenting procedures, tracking issues that are more complicated than individual software defects, and advance planning. I really prefer wiki software that puts you in control of the final product (like MediaWiki, which powers Wikipedia), but we chose a tool that could be picked up by a broad base of users.

Lest we forget, for project specs and planning: Microsoft Office, primarily Word, Excel, and Project, with a smattering of Visio.

The Desktop

Sitting on my desk, holding up my telephone, is a PC running Windows Server 2003. The shelf over my monitors is holding about a dozen books, but I only pull out three or four of them. The one I go for most is Spainhour & Eckstein's Webmaster in a Nutshell. For more details, I rely on my Safari subscription. And, perhaps most importantly, a big desk pad loaded with paper.

02 July 2008

Some assembly required

Diomidis Spinellis constructs an emulator of the Antikythera Mechanism with the Squeak EToys multimedia authoring environment, and a lot of overlapping polygons.

Having the gears as polygons makes modeling their interactions child's play. Etoys has a built-in primitive to locate overlapping objects. Thus, on each time step, I simply look for overlapping polygons and rotate them in the appropriate direction until they no longer overlap.

EToys runs, among other places, on the XO laptop of the OLPC initiative. A download of the emulator's project file is available from the author.

25 June 2008

Exam (re)prep: 5

So, the short answer is that I did not pass the certification exam that I sat a couple of weeks ago. As far as I can tell, I scored acceptably high on the web section, did okay on the foundation material, but I tanked on the analysis and design section (70-547).

I'm going to pass up the free-retake offer, which expires at the end of this month. I just don't have time to prep that much material. Rather, I'm going to take the three exams separately--probably 70-528 first, in August, and 70-536 in September. Those two together are good for the MCTS credential, and then we'll see about advancing to the MCPD.

I gave a presentation to my development workgroup (about 16 guys) about my current experiences and generally about how the program works. I was a little surprised that more than a couple guys were interested enough to ask questions.

02 June 2008

A sharp, simple tool

Naomi Hamilton interviews Alfred V. Aho, co-creator of AWK.

AWK has inspired many other languages, as you've already mentioned. Why do you think this is?

What made AWK popular initially was its simplicity and the kinds of tasks it was built to do. It has a very simple programming model. The idea of pattern-action programming is very natural for people. We also made the language compatible with pipes in Unix....

Another advantage of AWK is that the language is stable. We haven't changed it since the mid-1980s. And there are also lots of other people who've implemented versions of AWK on different platforms such as Windows.

According to Aho, AWK is still one of the 30 most popular programming languages. Just a couple of weeks ago, I pulled out my copy of The AWK Programming Language (wow! the book is much more expensive now) and wrote a 3-line program to generate a 5000-row CSV file of test data.

(Link via CodeProject.)

27 May 2008

Exam prep: 4

So I've finished my read-through of the Microsoft training material, and I have my exam scheduled for Friday, so that I can take advantage of the Second Shot program. A 4-hour exam, and material I've been trying to absorb for several months: I'm pretty sure that I will need to exercise my free retest option.

08 May 2008

A directory: 1

Via wood s lot, Jessica Hupp lists her top 100 blogs in user-centered web design.

06 May 2008

Fun with 88's: Part 3

Continuing my series of posts on tips and tricks with COBOL named conditions (part 1, part 2), let's look at the REDEFINES keyword, which works like a C union, and the OCCURS keyword, which defines an array.

Let's say that requirements for our toy name and address processing application have changed again, and that Canadian addresses and post codes must be supported. We'll put a record type indicator in the unused space at the end of the card, and lay out the city-state-zip storage differently depending on the record type. Remember how we left some space in the level numbers for maintenance? Here's a case where the practice comes in handy:



01  CARD-IMAGE.
    03  CARD-NAME-AND-ADDRESS.
        05  CARD-NAME.
            07  CARD-FIRST-NAME          PIC X(10).
            07  CARD-LAST-NAME           PIC X(10).
        05      FILLER                   PIC X(3).
        05  CARD-ADDRESS.
            07  CARD-ADDRESS-LINE-1      PIC X(15).
            07  CARD-ADDRESS-LINE-2      PIC X(15).
          06  CARD-USA-AREA.
            07  CARD-CITY                PIC X(10).
            07  CARD-STATE               PIC X(2).
                88  CARD-IS-DISTRICT     VALUE "DC".
                88  CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
            07      FILLER               PIC X(8).
            07  CARD-ZIP-CODE            PIC 9(5).
          06  CARD-CANADA-AREA
                                         REDEFINES CARD-USA-AREA.
            07      FILLER               PIC X(10).
            07  CARD-PROVINCE            PIC X(2).
            07      FILLER               PIC X(6).
            07  CARD-POST-CODE           PIC X(7).
    03  CARD-RECORD-TYPE                 PIC X(2).
        88  CARD-IS-USA                  VALUE "US".
        88  CARD-IS-CANADA               VALUE "CA".

Granted, there is opportunity for the record type indicator to disagree with the way the storage is used: object-oriented languages do have something to offer here.

We don't have to provide a separate name for the city part of CARD-CANADA-AREA: we can use CARD-CITY to refer to characters 54 through 63, irrespective of record type.
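
For readers more at home in a scripting language, here is a rough Python analogue of what REDEFINES is doing: the same 25 characters of the card are sliced two different ways, depending on the record type. The names parse_region, USA_LAYOUT, and CANADA_LAYOUT and the dictionary keys are my own inventions for illustration; only the field widths come from the PIC clauses above. Treat it as a sketch, not as a faithful model of COBOL storage.

```python
# A sketch of REDEFINES: one 25-character region, two interpretations.
# Each layout is a list of (field name, width); None stands in for FILLER.
USA_LAYOUT = [("city", 10), ("state", 2), (None, 8), ("zip_code", 5)]
CANADA_LAYOUT = [("city", 10), ("province", 2), (None, 6), ("post_code", 7)]


def parse_region(region, record_type):
    """Slice the city-state-zip region according to the record type."""
    if len(region) != 25:
        raise ValueError("region must be exactly 25 characters")
    layout = {"US": USA_LAYOUT, "CA": CANADA_LAYOUT}[record_type]
    fields, offset = {}, 0
    for name, width in layout:
        if name is not None:  # skip the anonymous FILLER storage
            fields[name] = region[offset:offset + width]
        offset += width
    return fields


usa = parse_region("Washington" + "DC" + " " * 8 + "20001", "US")
canada = parse_region("Ottawa    " + "ON" + " " * 6 + "K1A 0B1", "CA")
```

As in the COBOL, nothing stops a caller from passing a record type that disagrees with what is actually stored in the region.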

Now, let's say that we want to print the city and state part of the card image, separated by a comma, with the trailing whitespace squeezed out, for example, "New York, NY". (The STRING statement can also be used to do this, but that's a post for another day.) We can use OCCURS to treat the characters of CARD-CITY as an array (1-based) of 10 characters, and similarly for CARD-STATE.



* * * 
            07  CARD-CITY.
                09  CARD-CITY-CHAR       PIC X(1)
                                         OCCURS 10 TIMES.
                    88  CARD-CITY-CHAR-IS-SPACE
                                         VALUE " ".
            07  CARD-STATE.
                88  CARD-IS-DISTRICT     VALUE "DC".
                88  CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
                09  CARD-STATE-CHAR      PIC X(1)
                                         OCCURS 2 TIMES.

TIMES is another noise word that usually isn't coded. We'll need an output area and a couple of indexes:



01  OUTPUT-STRING.
    03  OUTPUT-CHAR                      PIC X(1)
                                         OCCURS 13.
01  IEND                                 PIC S9(4) USAGE COMP.
01  IFROM                                PIC S9(4) USAGE COMP.
01  ITO                                  PIC S9(4) USAGE COMP.

Now we're ready to write some procedural logic. PERFORM... VARYING makes a counted for loop. MOVE is the workhorse assignment statement: notice that the "left hand side" is actually coded on the right.



PROCEDURE DIVISION.
* * *
*   Initialize result and its indexer
    MOVE SPACES TO OUTPUT-STRING
    MOVE ZERO TO ITO
*   Scan from the end of the city area until 
*   a nonspace character is found
    PERFORM VARYING IEND FROM 10 BY -1
        UNTIL IEND < 1
        OR NOT CARD-CITY-CHAR-IS-SPACE(IEND)

        CONTINUE
    END-PERFORM

*   [Some exception-handling logic for the case 
*   in which the city portion is completely blank
*   could be written here.]

*   Copy the city, one character at a time, to the output
    PERFORM VARYING IFROM FROM 1 BY 1
        UNTIL IFROM > IEND
        
        ADD 1 TO ITO
        MOVE CARD-CITY-CHAR(IFROM) TO OUTPUT-CHAR(ITO)
    END-PERFORM
*   Copy the comma
    ADD 1 TO ITO
    MOVE "," TO OUTPUT-CHAR(ITO)
*   Copy the state
    PERFORM VARYING IFROM FROM 1 BY 1
        UNTIL IFROM > 2

        ADD 1 TO ITO
        MOVE CARD-STATE-CHAR(IFROM) TO OUTPUT-CHAR(ITO)
    END-PERFORM

There are all sorts of things we could do to improve and simplify this logic. One change would be to add code to the WORKING-STORAGE SECTION to define the entire card image as an array of 80 characters. It's common that more experienced COBOL programmers write proportionally more code in the DATA DIVISION than they do in the PROCEDURE DIVISION.

A warning, once again: all of the above code is from memory, and hasn't been subjected to compilation or testing. In particular, I don't remember for certain whether 88's can be applied to group-level items as I did above with CARD-STATE.
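
Since I can't vouch for the COBOL, here is the same scan-and-copy logic as a small Python sketch that can actually be run. The function name city_state_label is hypothetical, and the loop variables are named after the COBOL indexes. Like the procedural code above, it copies the comma directly after the city with no following space, and unlike OUTPUT-STRING the result is not padded out to 13 characters.

```python
def city_state_label(city, state):
    """Mimic the PERFORM loops: scan back from the end of the city for the
    last nonspace character, then copy the city, a comma, and the state."""
    iend = len(city)
    # Subscripts are kept 1-based, as in the COBOL, hence the "- 1" below.
    while iend >= 1 and city[iend - 1] == " ":
        iend -= 1
    output = []
    for ifrom in range(1, iend + 1):
        output.append(city[ifrom - 1])
    output.append(",")
    for ifrom in range(1, len(state) + 1):
        output.append(state[ifrom - 1])
    return "".join(output)


label = city_state_label("New York  ", "NY")
```

In everyday Python you would write city.rstrip() + "," + state and be done; the character-at-a-time version is here only to mirror the OCCURS-based approach.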

05 May 2008

Community Builder

Version 3.1 of EFM Community has been released. Along with a heapin' helpin' of bug fixes, this version incorporates a Community Builder module, which

...will allow organizations to quickly and cost effectively create and manage online community panels and provide a voice to customers, employees and other constituents.

01 May 2008

Happy birthday, BASIC

John Kemeny and Thomas Kurtz did the first load and go of a BASIC compiler on this day in 1964 (DTSS timesharing and an interpreter were to come shortly thereafter), as Randy Alfred reports.

Exam prep: 3

I've completed my read-thru of the training material for MCTS 70-528. I still have all the security and crypto material from MCTS 70-536 to read. I'm planning to schedule the exam for the end of this month. Pass or fail, I'm doing a 5-minute presentation at an upcoming monthly group meeting on the Microsoft certification process.

29 April 2008

It's about time

Sweet marjoram! As Scott Gilbertson reports, David Hyatt and Daniel Glazman have prepared a proposal, suitable for discussion in the Working Group, for extending Cascading Style Sheets to support variables. In a gem of an understatement, Hyatt and Glazman write:

We expect CSS Variables to receive a very positive feedback from... the Web authors' community...

28 April 2008

Modulo treasure

Andrew Binstock interviews Donald Knuth, who has recently released volume 4, fascicle 0 of his life work The Art of Computer Programming.

...software methodology has always been akin to religion. With the caveat that there’s no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development, let me just say that almost everything I’ve ever heard associated with the term "extreme programming" sounds like exactly the wrong way to go...with one exception. The exception is the idea of working in teams and reading each other’s code. That idea is crucial, and it might even mask out all the terrible aspects of extreme programming that alarm me.

(Link via The Code Project.)

22 April 2008

Fun with 88's: Part 2

My previous post introduced some of the syntax for defining local variables in COBOL. Now, we'll look at some procedural logic.

The PROCEDURE DIVISION consists of paragraphs of code, optionally organized into sections. Each paragraph consists of one or more statements and ends with a period (yes, a period: remember that the syntax was designed to resemble English). Paragraphs act like open subroutines in that control can fall into a paragraph and all program variables are accessible, so COBOL makes it easy to tangle your code into linguine—but you don't have to do it that way.

The EVALUATE statement is COBOL's switch, and the PERFORM statement executes a paragraph or a range of paragraphs. Common practice is to prefix paragraph names with a 4- or 5-digit number, which indicates where in the source code each is defined and how it fits into the execution hierarchy. As a result, programs tend to read top-down through the hierarchy rather than (my preference, which I adopted from Wirth) bottom-up.

A fragment of code for processing our name and address data from the previous post might be:



PROCEDURE DIVISION.
* * *
1200-LABEL-STATE.
    EVALUATE CARD-STATE
        WHEN "PA"
        WHEN "KY"
        WHEN "VA"
        WHEN "MA"
            PERFORM 1210-LABEL-AS-COMMONWEALTH
        WHEN "DC"
            PERFORM 1220-LABEL-AS-DISTRICT
        WHEN OTHER
            PERFORM 1290-LABEL-AS-STATE
    END-EVALUATE.

1210-LABEL-AS-COMMONWEALTH.
* * *
1220-LABEL-AS-DISTRICT.
* * *
1290-LABEL-AS-STATE.
* * *

Fairly readable, maintainable code, but a little brittle, should we need this classification scheme somewhere else in the program. Named conditions will help us out here.

The special level number 88 identifies a value or values that an elementary item might hold and a name for the condition that indicates that the item currently holds the value. Turning back to our example variable definitions:



* * *
01  CARD-IMAGE.
    03  CARD-NAME-AND-ADDRESS.
        05  CARD-NAME.
            07  CARD-FIRST-NAME          PIC X(10).
            07  CARD-LAST-NAME           PIC X(10).
        05      FILLER                   PIC X(3).
        05  CARD-ADDRESS.
            07  CARD-ADDRESS-LINE-1      PIC X(15).
            07  CARD-ADDRESS-LINE-2      PIC X(15).
            07  CARD-CITY                PIC X(10).
            07  CARD-STATE               PIC X(2).
                88  CARD-IS-DISTRICT     VALUE "DC".
                88  CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
            07      FILLER               PIC X(8).
            07  CARD-ZIP-CODE            PIC 9(5).
    03      FILLER                       PIC X(2).

Then our procedural code simplifies to:



PROCEDURE DIVISION.
* * *
1200-LABEL-STATE.
    EVALUATE TRUE
        WHEN CARD-IS-COMMONWEALTH
            PERFORM 1210-LABEL-AS-COMMONWEALTH
        WHEN CARD-IS-DISTRICT
            PERFORM 1220-LABEL-AS-DISTRICT
        WHEN OTHER
            PERFORM 1290-LABEL-AS-STATE
    END-EVALUATE.

1210-LABEL-AS-COMMONWEALTH.
* * *
1220-LABEL-AS-DISTRICT.
* * *
1290-LABEL-AS-STATE.
* * *

Now, if we find that requirements change, for instance that data for Mexico has to be supported, we have only one place that has to be updated to accommodate "DF" for the Distrito Federal:



            07  CARD-STATE               PIC X(2).
                88  CARD-IS-DISTRICT     VALUE "DC", "DF".
                88  CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".

(We'd have to make some other changes, too, but that's not my point here.)
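
To make the single-point-of-change idea concrete for non-COBOL readers, here is a loose Python analogue. The class Card, its properties, and the function label_state are hypothetical names of mine; Python has no direct equivalent of an 88-level, so read-only properties stand in for the named conditions.

```python
class Card:
    """Hypothetical stand-in for CARD-IMAGE; only the state field is modeled."""

    # The value lists live in one place, like the VALUE clauses on the 88s.
    DISTRICTS = frozenset({"DC", "DF"})
    COMMONWEALTHS = frozenset({"PA", "KY", "VA", "MA"})

    def __init__(self, state):
        self.state = state

    @property
    def is_district(self):
        return self.state in self.DISTRICTS

    @property
    def is_commonwealth(self):
        return self.state in self.COMMONWEALTHS


def label_state(card):
    """Counterpart of 1200-LABEL-STATE, written against the named conditions."""
    if card.is_commonwealth:
        return "commonwealth"
    if card.is_district:
        return "district"
    return "state"
```

Adding "DF" touched only the DISTRICTS set; label_state and any other code that asks card.is_district did not have to change.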

88-level conditions can use sets of values that overlap, and ranges can be specified with the keyword THRU. Returning to the original example:



            07  CARD-STATE               PIC X(2).
                88  CARD-IS-DISTRICT     VALUE "DC".
                88  CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
                88  CARD-IS-13-ORIGINAL  VALUE "MA", "NH", "RI", "CT", "NY",
                                         "NJ", "PA", "DE", "MD", "VA", "NC",
                                         "SC", "GA".

THRU is generally more useful with numeric data. Consider this contrived example of a tax calculation. The S in the PICTURE indicates the sign, and the V an implicit decimal point.



            07  TAXABLE-INCOME           PIC S9(6)V9(2).
                88  BRACKET-IS-10-PCT    VALUE 0 THRU 10000.
                88  BRACKET-IS-15-PCT    VALUE 10000.01 THRU 20000.
* * *

Aside: the "." symbol in the previous line is used in two very different ways: as a terminator period, as we've seen before, and as a decimal point in the numeric literal 10000.01. Getting these two right is one of the masochistic joys of machine-translating COBOL.
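
A rough Python rendering of the two THRU ranges may help show why the brackets leave no gap: the item has exactly two decimal places, so 10000.01 is the very next value after 10000.00. The function name bracket and the returned labels are assumptions for illustration, and I use Decimal to stay close to COBOL's fixed-point arithmetic rather than binary floating point.

```python
from decimal import Decimal


def bracket(taxable_income):
    """Return the bracket label for an income, or None if no condition holds.

    THRU ranges are inclusive at both ends, so the comparisons use <=.
    """
    income = Decimal(taxable_income)
    if Decimal("0") <= income <= Decimal("10000"):
        return "10-PCT"
    if Decimal("10000.01") <= income <= Decimal("20000"):
        return "15-PCT"
    return None
```

Passing the amounts as strings, for example bracket("10000.01"), keeps the values exact.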

Next: to REDEFINE the union.

17 April 2008

Fun with 88's: Part 1

As programming tools have evolved, at each step I as an application developer am able to operate at a higher level of abstraction. A clickable onscreen button is a lot easier to work with than its corresponding region in video memory (my first hello-world assignment in C consisted of poking the letter "A" into a particular location on the 24x80 screen); an editable, sortable grid of data, bound to a database table and hosted in a web browser window, is in turn a huge advance over the HTML forms and tables it is replacing.

The workhorse business data processing languages of the 1960s and 1970s (COBOL, PL/1, FORTRAN to some extent) offered their own high-level abstractions. From today's vantage point, most of these technical advances appear quaint, crude, or worse. But some, like COBOL's named conditions, also known as 88-levels, provided a tidy solution in code to common processing problems, one that was not replicated by subsequent mainstream languages. It's easily my favorite feature of COBOL.

In this and subsequent posts, I'll take a little trip down memory lane to describe named conditions and how to use them. I'm a little bit rusty: when I was last actively using COBOL the predominant standard was COBOL85. That standard allowed lower case keywords and user-defined names, but COBOL just seems more COBOL-y in upper case.

In a COBOL program, procedural code and data specs are strictly segregated into the PROCEDURE DIVISION and the DATA DIVISION. Local variables are described in the WORKING-STORAGE SECTION of the DATA DIVISION.

To me, the fundamental unit of storage in COBOL is an 01-level group item, and it corresponds to a C struct. It is any number of individual elementary items arranged in a hierarchy. Within the hierarchy, a prefixing level number indicates what nests where; a PICTURE clause and a USAGE clause provide most of the physical typing information (how many bytes, what kind of data it can hold). An 80-column punchcard that holds name and address information might be represented in memory like this:



DATA DIVISION.
WORKING-STORAGE SECTION.
1  CARD-IMAGE.
   3  NAME-AND-ADDRESS.
   5  NAME.
   7  FIRST-NAME PICTURE IS XXXXXXXXXX USAGE IS DISPLAY.
   7  LAST-NAME PICTURE IS X(10) USAGE IS DISPLAY.
   5  FILLER PICTURE IS XXX USAGE IS DISPLAY.
   5  ADDRESS.
   7  ADDRESS-LINE-1 PICTURE IS X(15) USAGE IS DISPLAY.
   7  ADDRESS-LINE-2 PICTURE IS X(15) USAGE IS DISPLAY.
   7  CITY PICTURE IS X(10) USAGE IS DISPLAY.
   7  STATE PICTURE IS XX USAGE IS DISPLAY.
   7  FILLER PICTURE IS X(8) USAGE IS DISPLAY.
   7  ZIP-CODE PICTURE IS 99999 USAGE IS DISPLAY.
   3  FILLER PICTURE IS XX USAGE IS DISPLAY.

Well, this first example demonstrates how COBOL got its reputation for being excessively verbose and hard to understand. So the first point to be made is that no practitioner would actually code it this way. COBOL syntax defines some noise words like IS and has lots of synonyms and shortcuts. The USAGE IS DISPLAY clause (which specifies alphanumeric data) can be disposed of altogether. You can code vendor-specific variations on the USAGE IS COMPUTATIONAL clause to specify integer data; however, most arithmetic is done with fixed-precision decimal data, so there's rarely a need to specify floating point data.

The PICTURE clause is an early attempt at coding-by-example. Ten X's mean ten bytes of alphanumeric, five 9's mean five bytes of decimal data that can do arithmetic. A repetition factor in parentheses means what you'd expect. There are lots of other fancy symbols that can be used in PICTURE clauses, to specify a decimal point or to automatically insert commas and currency signs; it's not unlike a printf() format string.
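
As a quick illustration of how the symbols and repetition factors add up, here is a small Python sketch that counts the character positions in a simple PICTURE string. The function picture_length is hypothetical, it handles only the symbols X, A, 9, S, and V, and it assumes USAGE DISPLAY with no separate sign, so that S and V take up no storage of their own. Editing symbols such as commas and currency signs are out of scope.

```python
import re

# One PICTURE symbol, optionally followed by a repetition factor: X(10), 9, V
_TOKEN = re.compile(r"([XA9SV])(?:\((\d+)\))?")


def picture_length(picture):
    """Count the character positions described by a simple PICTURE string."""
    length, pos = 0, 0
    text = picture.upper()
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported PICTURE symbol at position {pos}")
        symbol, repeat = match.group(1), int(match.group(2) or 1)
        if symbol in "XA9":  # S and V are assumed to occupy no storage
            length += repeat
        pos = match.end()
    return length
```

With this, picture_length("XXXXXXXXXX") and picture_length("X(10)") give the same answer, which is the point of the repetition factor.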

Group items in the hierarchy can always be treated like an alphanumeric string of their component characters. So ADDRESS can be used anywhere you'd want to use the 55 characters that comprise it.

It's a convention that level numbers are indented to reflect the hierarchy, but it's not a requirement. Also, most smart coders leave some gaps in the level numbers, so that if future maintenance calls for an intermediate level (for instance, a city-state-zip group item), it's easy to add. Leading zeroes in the level numbers are also used by convention to make the layout nicer.

FILLER designates anonymous storage, and its name is usually extra-indented to make it disappear. COBOL isn't particularly good at syntax for managing namespaces, so it's conventional practice to prefix all elements with an abbreviated version of the name of the 01-level group item.

So a more realistic example, one that we might see in a real program, would look like:



DATA DIVISION.
WORKING-STORAGE SECTION.
01  CARD-IMAGE.
    03  CARD-NAME-AND-ADDRESS.
        05  CARD-NAME.
            07  CARD-FIRST-NAME          PIC X(10).
            07  CARD-LAST-NAME           PIC X(10).
        05      FILLER                   PIC X(3).
        05  CARD-ADDRESS.
            07  CARD-ADDRESS-LINE-1      PIC X(15).
            07  CARD-ADDRESS-LINE-2      PIC X(15).
            07  CARD-CITY                PIC X(10).
            07  CARD-STATE               PIC X(2).
            07      FILLER               PIC X(8).
            07  CARD-ZIP-CODE            PIC 9(5).
    03      FILLER                       PIC X(2).

Next: level numbers that aren't level numbers.

Update: 18 April, followed standard COBOL terminology

16 April 2008

Teletype 33

Click through to page 4 of this photo essay by Michael Shamberg for Life's January 1970 number. This is totally me in ninth grade, learning BASIC (my first significant assignment was finding roots of polynomials by simple iteration). Especially the paper roll spilling onto the floor.

Shamberg visits the Rodman family of Ardmore, Pa., who have installed a Teletype 33 in their home and signed up for timesharing access to a mainframe in New Jersey. General Electric provides the service. The Rodmans apparently have access to permanent storage on the mainframe, whereas we students used paper tape to save our work. You can see the paper tape reader/punch attached to the Teletype unit, in the right side of the page 4 picture. The acoustic coupler modulator-demodulator is in the left side of the pic.

"For me, the main physical effect of having a computer at home is that I’m able to spend a lot more time with my family,” says Dr. Rodman, who is a lung specialist on the faculty of Temple University medical school in Philadelphia. “For all of us the real impact is mental. Programming a computer is like thinking in a foreign language. It forces you to approach problems with a high degree of logic. Because we always have a computer handy, we turn to it with problems we never would have thought of doing on one before.”

(Link via Boing Boing.)

02 April 2008

Zero bombs away

Coding Horror reminds us that Core War is still around.

Back in the mid 80's when I got my first Mac (a 512K model, a loaner from my company) and I was looking for a programming project, I noodled around with building a Redcode emulator, but nothing ever came of it.

17 March 2008

Touch me

Eric Burke explains simplicity in three sketches. I've recently worked with a scary app that makes Burke's #3 look crystalline by comparison.

13 March 2008

WTF/min

Thom Holwerda proposes a new code quality metric.

(Link via Scott Rosenberg's Wordyard.)

07 March 2008

Exam prep: 2

So I've finished my first read-through of the training book for MCPD 70-547. I'm making progress a little more slowly than I'd like, as my notes indicate that I started on January 17. Here's hoping that I can get through MCTS 70-528 and what I have left to read in MCTS 70-536 more quickly: I'd really like to sit for the exam by the end of the second quarter. And I'd like to free up those slots in my Safari subscription.

21 February 2008

Herding Cats

I wrote a book review for IEEE Software a few years ago (published online only) of Herding Cats: A Primer for Programmers Who Lead Programmers by J. Hank Rainwater. I had bookmarked it with TinyURL.com because IEEE uses a CMS (content mangling system) to run its web site, but I can't find the old bookmark. So here it is again.

[Obligatory Mark Twain reference]

Maybe there will still be work for me when I'm ready to move into semi-retirement. Tam Harbert reports on demand and supply for COBOL programmers (for Computerworld, naturally).

Some 75% of the world's businesses data is still processed in Cobol, and about 90% of all financial transactions are in Cobol, according to Arunn Ramadoss, head of the academic connections program at Micro Focus International PLC...

15 February 2008

DotNetNuke: Early returns

We did our first code drop to QA last week on our first project that employed DotNetNuke. DotNetNuke is gaining popularity because it allows developers and administrators to take a Chinese menu approach to web site design. It's a framework for developing apps within ASP.NET: developers code modules of functionality (traditionally in VB.NET, but C# works just fine) which can be rearranged on web site pages without recoding.

Typically, a module is written as three user controls: a View, an Edit, and a configuration Settings combination view/edit. The View control might present a grid or list of entities (blog posts, say), and then the Edit control would be used to edit one existing entity or create a new one. Or the View module might just be a widget that encapsulates content from another source, like a stock ticker or a Google map, and there would be no Edit module.

DotNetNuke (ugly name, that) is open source, and the weak API documentation reflects this fact. The search engine is often overloaded, so I've taken to using Google to search within the site. And it's usually the case that a method or property just isn't described in sufficient detail. Sometimes the only hits are someone's unanswered question in a forum: "Does anybody know how this thing works?" There is a book from Wrox of middling quality, Professional DotNetNuke 4.

One of the services that the DotNetNuke framework provides is caching of module content. If your module is serving semi-static text (like a Welcome box) or slowly-changing data like automobile traffic reports, you probably don't need the content to be refreshed second-to-second, and the admin can configure an appropriate cache timeout. But if your module provides any interactivity and validation (for instance, a new user registration control that collects name and address), you really don't want anything to be cached. When you, the developer, prepare a module for deployment, you can specify a default cache timeout of 0 seconds. But you can't count on that value being used in all deployments. I predict that a lot of our support calls will center on caching issues.

We've also noticed some problems, not well-described, with ASP.NET validator controls running client-side, and so we've specified EnableClientScript="false" for everything.

04 February 2008

Screening

Jeff Atwood of Coding Horror annotates Steve Yegge's Five Essential Phone-Screen Questions, originally posted in 2004. The kernel of Yegge's post is that the phone interview should actually do some screening:

Please understand: what I'm looking for here is a total vacuum in one of these areas. It's OK if they struggle a little and then figure it out. It's OK if they need some minor hints or prompting. I don't mind if they're rusty or slow. What you're looking for is candidates who are utterly clueless, or horribly confused, about the area in question.

Yegge's five questions, or rather categories of questions, are:

Coding
Object-oriented Programming
Scripting and Regular Expressions
Data Structures
Bits and Bytes

Yegge's example coding questions use C++ and Java, and would be appropriate for C# and JavaScript. For our own purposes, I'd be inclined to include some questions that called for HTML and perhaps some SQL.

11 January 2008

A/K/A timesharing

The Economist brings its customary skeptical perspective to the software-as-a-service market:

The biggest doubt is whether you can make much money selling software this way. Vendors of conventional enterprise software made a killing by requiring customers to pay a high licensing fee upfront and then charging them for maintenance. Web-based firms, by contrast, have to make do with subscription fees. This means they are not able to grow as quickly: both NetSuite and Salesforce have been around for almost a decade. They have had to invest a lot in attracting customers and building data centres to supply their services. As a result, NetSuite has never posted a profit; at the end of September, its accumulated deficit amounted to nearly $242m. Salesforce is barely profitable and boasts an otherworldly price-to-earnings ratio of around 660.

The article includes some forecasts (courtesy of Gartner) for the size of the SaaS market through 2011.