29 December 2008
Linkages
23 December 2008
A question of scale
img { -ms-interpolation-mode:bicubic; }
The result: smoother resized images.
17 December 2008
Recycled clockwork
Jo Marchant's book about the efforts to understand the clump of metal found in the Mediterranean a century ago, Decoding the Heavens, is scheduled for publication early next year.
(Link via Wired.)
16 December 2008
Not as cute as our receptionist
03 December 2008
Decisions, decisions
Since neither SAX nor StAX creates an in-memory representation of the complete document, they are not well-suited to applications that must transform the document, but they can be effective for simple streaming applications. StAX uses a "pull" model that puts the processing loop in the application, so many developers will find it easier to use than SAX.
By contrast, DOM and VTD are the tools to use if you need to rewrite the XML. As compared to DOM, VTD does not construct its in-memory representation with an object tree, but rather with lightweight arrays of 64-bit integers, and the article gives a one-figure sketch of how this works. The authors estimate VTD to be 5 to 8 times faster than DOM and to take up 20% of the space that DOM does, especially for incremental updates (but they don't back up these calculations with empirical measurements). They also speculate that VTD is a good candidate for hardware acceleration.
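The push/pull distinction can be sketched with Python's standard library. This is only an analogy, not the Java SAX or StAX APIs: xml.sax stands in for the push model, and ElementTree's iterparse stands in for the pull model. The sample document and the function names are made up for illustration.

```python
import io
import xml.sax
from xml.etree.ElementTree import iterparse

# Hypothetical sample document, used only for this illustration.
DOC = b"<orders><order id='1'/><order id='2'/><note/></orders>"


class OrderCounter(xml.sax.ContentHandler):
    """Push style: the parser owns the loop and calls back into our handler."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "order":
            self.count += 1


def count_orders_push(data):
    handler = OrderCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_orders_pull(data):
    """Pull style: the application owns the loop and asks for the next event."""
    count = 0
    for _event, elem in iterparse(io.BytesIO(data), events=("start",)):
        if elem.tag == "order":
            count += 1
    return count
```

In the pull version the state (the counter) lives in an ordinary local variable inside an ordinary loop, which is the convenience the article attributes to StAX.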
01 December 2008
Steampunk reader
25 November 2008
I detect a pattern here
You can also use a regex to make sure that a text box is filled in with a valid date (in MM-DD-YY format, for instance), but in this case you're usually better off using a specialized date picker: one that presents a pop-up monthly calendar, say, so that all the user has to do is click a number.
The big problem with regular expressions is the proliferation of implementations and all the bells and whistles that come with them. For example, we found a particularly useful pattern at RegExLib.com to match e-mail addresses that include the display-name part (as in "User, Joe" <joe.user@example.com>), but the pattern wasn't useful for client-side validation because it used features that depended on a browser-specific regex engine. So a reference book like Jeffrey Friedl's Mastering Regular Expressions is really handy to help you keep track of platform-specifics. By all means, use the contributed patterns from a site like RegExLib.com, but don't put a pattern into production that you don't understand yourself.
Another tool that you may find useful is Ivaylo Badinov's test harness for regular expressions, REGex TESTER.
Just to amplify a couple of Jeff Atwood's points:
Do not try to do everything in one uber-regex. I know you can do it that way, but you're not going to. It's not worth it. Break the operation down into several smaller, more understandable regular expressions, and apply each in turn. Nobody will be able to understand or debug that monster 20-line regex, but they might just have a fighting chance at understanding and debugging five mini regexes.
This is good advice for smaller patterns, too. If you're trying to recognize U.S. telephone numbers, for instance, start with a pattern that recognizes area codes (something like /\d{3}/) and one that recognizes exchange and number body (/\d{3}-\d{4}/) and then put the two patterns together (into /(\d{3}-)?\d{3}-\d{4}/).
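Here is a minimal sketch of that composition in Python. The constant and function names are invented for the example, and the combined pattern uses a non-capturing group where the pattern above uses a capturing one.

```python
import re

# The two small patterns from the text, each understandable on its own.
AREA_CODE = r"\d{3}"
LOCAL_NUMBER = r"\d{3}-\d{4}"

# Put together: an optional area code and hyphen, then exchange and number body.
PHONE = re.compile(rf"(?:{AREA_CODE}-)?{LOCAL_NUMBER}")


def is_phone(text):
    """Return True if the whole string looks like a U.S. telephone number."""
    return PHONE.fullmatch(text) is not None
```

Each piece can be tested by itself before it is combined, which is the point of building up from small patterns.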
Regular expressions are not Parsers. Although you can do some amazing things with regular expressions, they are weak at balanced tag matching. Some regex variants have balanced matching, but it is clearly a hack—and a nasty one. You can often make it kinda-sorta work, as I have in the sanitize routine. But no matter how clever your regex, don't delude yourself: it is in no way, shape or form a substitute for a real live parser.
Exactly. Regular expressions are good for problems that call for a bounded degree of nesting: breaking up a file of XML into tokens that represent the element and attribute names, for instance. These problems are what the language translation people would call lexical analysis. For problems that permit arbitrarily deep nesting, like parsing the stream of XML tokens into a document tree, ensuring that each tag is properly closed and nested, you're doing syntactic analysis, and you need a tool like yacc.
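The division of labor can be sketched in a few lines of Python: a regex does the lexical analysis, and a stack (the simplest possible stand-in for a real parser) checks the nesting. This is a toy under stated assumptions, not an XML tokenizer: it ignores comments, CDATA sections, namespace prefixes, and attribute values that contain angle brackets. All of the names are made up.

```python
import re

# One tag: optional leading slash, a name, anything up to an optional
# trailing slash, and the closing angle bracket.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def tokenize(text):
    """Lexical analysis: yield ('open' | 'close' | 'empty', name) per tag."""
    for match in TAG.finditer(text):
        closing, name, empty = match.groups()
        if closing:
            yield ("close", name)
        elif empty:
            yield ("empty", name)
        else:
            yield ("open", name)


def is_well_nested(text):
    """Syntactic analysis: a stack tracks arbitrarily deep nesting."""
    stack = []
    for kind, name in tokenize(text):
        if kind == "open":
            stack.append(name)
        elif kind == "close":
            if not stack or stack.pop() != name:
                return False
    return not stack
```

The regex never has to know how deep the nesting goes; only the stack does.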
18 November 2008
Fan belt?
(Link via Risks Digest.)
11 November 2008
A freebie
And what do I get for the $15/month that I'm paying? Right now, I have these volumes on my virtual bookshelf:
- Duthie and MacDonald, ASP.NET in a Nutshell, 2/e
- Meyer, CSS: The Definitive Guide, 3/e
- Bergsten, JavaServer Pages, 3/e: I checked this out for a specific project, and will release it soon
- Pogue, Mac OS X Leopard: The Missing Manual
- Friedl, Mastering Regular Expressions, 3/e
- Snell et al., MCPD Self-Paced Training Kit (Exam 70-547): this goes back once I pass the exam
- Northrup et al., MCTS Self-Paced Training Kit (Exam 70-536): ditto
- Pogue et al., Windows XP Pro Edition: The Missing Manual, 2/e
I can swap out anything after holding it for 30 days. Most important, I can change up to the next edition of a book without having to pulp the old one.
05 November 2008
Exam prep: 7
30 October 2008
Letters, we get letters
Different shops call for different management styles, so YMMV. But take a look at Becoming a Technical Leader by Gerald M. Weinberg. For that matter, just about anything Weinberg has written about programming and the psychology behind it is worth reading.
You're probably familiar with Steve McConnell's work. His Rapid Development provides a survey of the management techniques you can use to improve the delivery of good software; some of the topics in Code Complete are also relevant.
Finally, DeMarco and Lister's Peopleware is good for helping you identify aspects of your office environment that are making you and your team unproductive.
You may have noticed that two of these four titles are from Dorset House publishing. There's lots more good stuff to be found there.
16 October 2008
XSRF and me
While there aren't any good countermeasures against clickjacking yet, there are practices that you can follow to mitigate XSRF attacks. But doesn't ASP.NET take care of all that for me? Not really. Todd Miranda demonstrates, in a 20-minute video, how the exploit works against an ASP.NET site and shows some basic techniques to cope.
09 October 2008
Toolable?
[I also learnt to] design the language to be well-toolable. This does impact the language in subtle ways – you’ve got to make sure the syntax works well for having a background compiler, and statement completion. There are actually some languages, such as SQL, where it’s very hard to do meaningful statement completion as things sort of come in the wrong order.
08 October 2008
Need a hint?
(Link via The Daily WTF.)
07 October 2008
Geezerbox
Until early next year, the highlight of the collection is Difference Engine No. 2, constructed from Charles Babbage's plans for Nathan Myhrvold and on loan to the museum. Like everything else in the museum, this machine is volunteer-powered, one staffer taking a turn at the crank while the other explains the workings. Though the gear is equipped for printing (see detail at right), that part of its operation is not part of the demo, as it takes four hours to clean up every time.
Most of the equipment is hands-off, but you can have a seat on this Cray-1, located just outside the main exhibit hall.
Another highlight of the visit is the demonstration of a reconstructed PDP-1, Digital Equipment's first commercial system, docented by John Bohner and Peter Samson when I visited. The PDP-1, introduced in the early 1960s, was the first machine to feature a symbolic debugger, an amenity no doubt appreciated by Samson. As part of the restoration, he reverse-engineered paper-tape music files that had been serendipitously preserved in order to recreate a music synthesizer that he wrote while an undergraduate at MIT. The synthesizer resides in 4K of memory, which is also a good thing, because this model holds all of 12K 18-bit words.
Most all of the other boxes are not powered up, but rather are displayed warehouse-style in the main hall. (Imagine the heat generated by all of these boxes were they all running!) My graduate school days were brought back by the sight of a DECSystem-10 (at left). Those panels of switches are perhaps the only attractive industrial design to come out of the 1970s. And most of us, in one way or another, have crossed paths with an IBM System/360 (at right).
There are lots of smaller, newer, and older items, as well: a rack of HP calculators, Herman Hollerith's tabulation equipment, a rack of tubes from ENIAC, some game consoles, a Sage air-defense system (tube-based and inexplicably still in service through 1983), a Norden bombsight, an Enigma machine.
Except for a side exhibit of computer chess (and the PDP-1 demo), there isn't a lot of emphasis on software; for now, the museum is largely a repository of hardware. But, we hope, forthcoming fundraising will increase the level of interactivity at this gem of a museum.
30 September 2008
Spare some cycles?
22 September 2008
240 GLOC
To say that COBOL is widespread is an understatement. In 1997 the Gartner Group estimated that there were 240 billion lines of COBOL code in active apps. Something like 90 percent of financial transactions are processed by COBOL code, and 75 percent of all business data processing is COBOL. Merrill Lynch reports that 70 percent of its business runs on COBOL apps.... One estimate puts the value of current running COBOL code at $2 trillion.
19 September 2008
BAL
There were six or seven of us from the previous class that had been chosen to learn BAL and when we arrived at the Education Center, we were directed to our new classroom. The room had four rows of tables, enough chairs for the students, but no lectern for an instructor. Promptly at 9:00, two gentlemen, one from IBM and one from Bell Tell, arrived and explained that we were to be part of an experiment called "programmed instruction." We would be given paper-bound text books, Assembler Language coding pads, and pencils, but otherwise left on our own to learn a new generation of computer architecture and the language used to program it. Every 90 minutes an IBM expert would join us and ask us if we had questions. After a brief discussion with the expert, we would take a break.
17 September 2008
Finally
A workshop
- Short development cycles push people to find ways to be more efficient.
- One of Jeff's clients calls refactoring "entropy reduction."
- A rule of thumb for how big a user story should be: small enough to build six to twelve of them in a one- to two-week iteration. Of course, how much work this is depends on how many people you have on the development team.
- Many of the practices suggested by agile practitioners seem counter-productive—scrum rooms, for example, with multiple conversations going on at the same time. Try it anyway: if it doesn't work for you, then drop it.
- Agile's strength is that it expects requirements to change, and it explicitly provides for this, at the end of each iteration.
- These techniques are best applied in domains where the cost of failure is low: think shoestring-capitalized dot-com startups, not avionics.
- I haven't seen much from the literature on applying agile methods to projects that are largely integration of third-party packages, nor to projects that are building APIs or frameworks with no user interface.
But what impressed me most about Jeff's presentation was his effective use of PowerPoint. To linger on a key point, generally he used a slide that consisted of a stock photo, full bleed, with oversized type reversed out of the image, something like those Miller Beer ads from a few years ago. (It was a photo of a tray of Krispy Kreme donuts labelled in Chinese that caused me to take notice.) His slides use little or no chrome—by that I mean those distracting standardized frames that corporate messaging departments insist on. Some of his slides reproduce a very small Stelligent logo in the lower left corner. About the only consistent design element is the oversized sans serif typeface that Jeff used; it looked something like Tahoma. This meant that he could incorporate disparate graphic elements from a lot of different sources (diagrams, mostly, and some Dilberts), of different qualities, and the design maintained unity. The effect was engaging without being too slick.
06 September 2008
Pretty pictures
28 August 2008
Progress report
27 August 2008
Memory lane
22 August 2008
19 August 2008
Top box
08 August 2008
Pre-360
Once the card was read, where in memory are the 80-columns of data placed, you wonder? In positions 1 through 80, of course. The 1401 mapped the first 333 positions of memory for card input (1-80), card output (101-180), and a print line (201-332). The 333rd position of memory could be used for printer channel control. If you are scheming how to use those leftover positions from 81-100 and 181-200, you are ahead of the game.
05 August 2008
Exam prep: 6
29 July 2008
Choose Me
A Choose One question is usually rendered with radio buttons, though sometimes designers use a dropdown.
What is your favorite color?
Red Orange Yellow Green Blue
While a Choose Many (a/k/a Choose All That Apply) uses checkboxes.
What Metro line(s) do you ride regularly?
Red Orange Yellow Green Blue
These two question types look to be so similar, and yet there are subtle differences in the way the data for them is collected and analyzed, and I've seen more than one analyst stumble over the differences.
Let's say that you're architecting an application that lets people design and administer surveys. What are the pitfalls? Here I'm focusing on how to model the actual survey responses that are collected, rather than how to model the metadata that constitutes the survey design.
A single database attribute is sufficient to store the respondents' answers to a Choose One. For our example question above, assign the values 1 through 5 to the five color choices. Then you can store the survey respondents' answers in a relational database column typed as an integer of some suitable size, 8 or 16 bits, perhaps. It depends on how many possible choices you want to provide for in your survey software app.
On the other hand, for a Choose Many, you need as many database attributes as there are possible choices. One way to do it is to use one boolean database column for each choice: five columns in our example above. Or you could pack the answers into a bitfield, although querying and filtering on individual choices would then be a problem. Whatever path you take as the software architect, the key point is that the respondents' answers will be represented differently than for Choose One questions.
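To make the difference concrete, here is a rough sketch of the two layouts using Python's sqlite3 module and an in-memory database. The table names, column names, and helper functions are invented for the example; a real survey app would generate its schema from the survey metadata.

```python
import sqlite3

# Hypothetical codes for the Choose One example: 1 through 5.
COLORS = {1: "Red", 2: "Orange", 3: "Yellow", 4: "Green", 5: "Blue"}
# Hypothetical choice list for the Choose Many example.
LINES = ("red", "orange", "yellow", "green", "blue")


def make_db():
    db = sqlite3.connect(":memory:")
    # Choose One: a single integer column holds the answer.
    db.execute(
        "CREATE TABLE favorite_color ("
        "respondent INTEGER PRIMARY KEY, choice INTEGER NOT NULL)"
    )
    # Choose Many: one boolean-like column per possible choice.
    cols = ", ".join(f"rides_{line} INTEGER NOT NULL DEFAULT 0" for line in LINES)
    db.execute(f"CREATE TABLE metro_lines (respondent INTEGER PRIMARY KEY, {cols})")
    return db


def record(db, respondent, color, lines):
    """Store one respondent's answers to both questions."""
    db.execute("INSERT INTO favorite_color VALUES (?, ?)", (respondent, color))
    flags = [int(line in lines) for line in LINES]
    db.execute(
        "INSERT INTO metro_lines VALUES (?, ?, ?, ?, ?, ?)", (respondent, *flags)
    )
```

With separate columns, filtering on an individual choice is a plain WHERE clause (WHERE rides_red = 1), which is the query that gets awkward with a packed bitfield.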
Generally, Choose Ones are just easier to deal with—a generic design is more accommodating. This fact probably explains why almost all of the free online polls that you see (hosted by outfits like polldaddy) are Choose Ones.
If your survey app allows designers to change a survey's design by adding more choices after the survey has been released to the world, you can see that adding a choice to a Choose One (just another possible value to be stored) is a lot simpler than adding a choice to a Choose Many (maybe a new database column).
What is your favorite color?
Red Orange Yellow Green Blue Indigo Violet
What Metro line(s) do you ride regularly?
Red Orange Yellow Green Blue Silver Purple
Even trickier is enabling a survey designer to rewrite a Choose One as a Choose Many, or vice versa.
What are your favorite color(s)?
Red Orange Yellow Green Blue
What Metro line do you ride most often?
Red Orange Yellow Green Blue
In this case, you're not simply extending a relational schema by adding columns or possible values, but rather you've got to convert one column type to another.
Reporting and analyzing survey responses for the two question types is likewise different.
For Choose One questions, you can do a frequency analysis that adds up to 100%:
What is your favorite color? (N of respondents=100)
- 45% Red
- 12% Orange
- 10% Yellow
- 17% Green
- 16% Blue
And if your corresponding numerical scale makes sense, you can compute means and other statistics. (Granted, this makes more sense for computing something like average customer satisfaction than for computing an average favorite color.) Analysis like this can be presented effectively with a pie chart.
But for Choose Many questions, the numbers never add up.
What Metro line(s) do you ride regularly? (N of respondents=100)
- 45% Red
- 22% Orange
- 65% Yellow
- 22% Green
- 40% Blue
An unstacked bar chart is probably the best way to present this information graphically. At least you can depend on the longest/tallest bar being no more than 100%. Also notice that you don't have a numerical scale, just booleans, so you can't compute means and standard deviations for Choose Many questions.
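The arithmetic behind the two kinds of report can be sketched like this. The function names are made up, and the answers are assumed to be already loaded into plain Python lists.

```python
from collections import Counter


def choose_one_shares(answers):
    """answers: one choice per respondent. The shares sum to 100."""
    counts = Counter(answers)
    n = len(answers)
    return {choice: 100 * count / n for choice, count in counts.items()}


def choose_many_shares(answers):
    """answers: a set of choices per respondent. Each share is a percentage
    of respondents, so the shares need not sum to 100."""
    counts = Counter(choice for picked in answers for choice in picked)
    n = len(answers)
    return {choice: 100 * count / n for choice, count in counts.items()}
```

In both cases the denominator is the number of respondents, not the number of boxes checked. That is why the Choose Many shares can total more than 100% while no single share ever exceeds it.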
I haven't delved into some of the more fiddly bits of designing a survey app that supports Choose One and Choose Many questions. Things like support for "none of the above," "I don't know," "not applicable to me," or "I'd rather not say" choices, or requiring that the respondent check at least one box or radio button, or at most three checkboxes.
19 July 2008
Windows XP and Samba
Over the holiday weekend, I brought the hard drive home (an Iomega Home Network 500GB), plugged it in, and configured it over the web interface from Dedalus. I named the drive Boylan, did a Go > Connect to Server... from Finder, and It Just Worked. The network drive shares files with the SMB protocol (Server Message Block, the protocol that Samba implements), and Mac OS speaks fluent SMB.
Not quite so easy connecting from the Windows machine. I tried using the Iomega-provided Discovery Tool Home to mount the shared folders. The tool could find the drive and folders, but when I picked a folder and clicked the button, the tool popped up a dialog with the rather unhelpful message "Exception Error."
I fiddled with the port settings on the firewall that I have running on Mulligan (Norton), but no luck. I disabled the Norton firewall and went back to Windows Firewall, but no joy again.
I started a chat session with a support rep from Iomega and learned the first secret: the Discovery Tool is just a GUI for Tools > Map Network Drive..., because that's all that the rep used to work the problem. Unfortunately, Map Network Drive... was equally unable to mount the shared folders, and provided no clue as to what was not configured correctly. He asked me to check some settings on the router, and at the moment I had a glitch logging into the router, so I had to close the chat session.
The next day I had some free time to do some searching, and I turned up a page of David Lechnyr's Unofficial Samba HOW TO that taught me the second secret: Tools > Map Network Drive... is just a GUI for the console command net use. So I tried
net use e: \\192.168.0.101\public
and I got back the message "System error 1231 has occurred." Finally a clue! More searching, and I found a nicely comprehensive list of System Errors and troubleshooting tips. The page for 1231 reads, in part:
...when we checked the Properties of the LAN, we found the Client for Microsoft Networks and File and Printer Sharing for Microsoft Networks were disabled.
I bounced off to the Properties sheet for my network connection, and indeed, Client for Microsoft Networks was disabled. I checked that box, rebooted, and much satisfied mounting of folders was mine. Furthermore, now both Boylan (the hard drive) and Mulligan (the laptop) show up in My Network Places under Microsoft Windows Network and the node for the workgroup.
So, while File and Printer Sharing for Microsoft Networks enables other devices to access resources shared by the local machine, Client for Microsoft Networks works in the other direction: it enables the local machine to access external resources.
17 July 2008
Yet another yacc post
YACC made it possible for many people who were not language experts to make little languages (also called domain-specific languages) to improve their productivity.
And that is exactly what happened in my case: with no prior academic preparation, I used yacc to build a translator for a COBOL-like reporting and OLTP application language called XPL. (That is, I used yacc, lex, and every trick I could find in a couple of the seminal books on compilers that were available in the early 1990s: the 1986 edition of Aho et al. ["the dragon book"] and Schreiner and Friedman's Introduction to Compiler Construction with Unix, which we called "the unicorn book.")
I did this work for a company called Magna Software, now defunct, which has left few online traces of its existence. I've lost my conversancy with language translation and compilers, so I don't know how it's done any more; I doubt that yacc could support the interpret-on-the-fly behavior that Visual Studio gives me.
14 July 2008
My Toolbox
Languages
My assignments in this Microsoft shop call for declarative and/or imperative code in the following languages:
- C#: we're using version 2.0 of the .NET framework
- SQL: both DDL and DML; we do most of our development against SQL Server, but we also support Oracle
- HTML: the app itself uses XHTML 1.0 Transitional, and the generated surveys are HTML 4.0 Transitional
- JavaScript: we don't rely heavily on JavaScript, at least not directly, but we do use it as a glue language
- CSS
- XML
We use the little languages regular expressions and XPath expressions. ASP.NET markup to declare a server control or to code a @Page directive is a little language too, but isn't it interesting that it doesn't have its own name?
We have legacy code that uses Delphi and XSL/T.
Compilers, etc.
At any given moment, my desktop may have windows open for the following tools. In the course of a week, it's almost certain that I will use all of these.
- Visual Studio 2005
- SQL Server Management Studio
- Perforce Windows Client, for access to the source code repository
  Perforce's means for coding client specs is its own perplexing little language. We depend a lot on the context menu pick Create/Replace Using as Template...
- Firefox 2, equipped with ColorZilla, Firebug, Flashblock, View Source Chart, and Web Developer
- Internet Explorer 7, equipped with IE7Pro and IE Developer Toolbar
- a VMware workstation running Internet Explorer 6, for more browser compatibility testing
- Remote Desktop Connections to other servers
- a paint program for managing screen shots: my GUI team lead likes Jasc Paint Shop Pro
- UltraEdit: clunky like a Swiss Army knife with too many blades, but it handles .csv files that Excel can't, and it can make XML readable
- any of two or three different file compression utilities
- a command line
We're starting to use Team City for automated builds. Some people on the team swear by ReSharper, but I find that it just slows down my compile and test cycle. Every once in a while I'll need to pop open the Windows Services manager, or the IIS Manager to set up a new web site. My boss likes Beyond Compare, so it's installed on my machine, but I'm lazy and never bothered to learn how to use it; I depend on the default comparison tool provided by Perforce. Now that I look in there, I see that there's a lot of stuff in my Start Menu that I've never used. And sometimes I'll have a pattern-search chore that I can't use Visual Studio or Windows Explorer's Search for (maybe wading through web server log files), and then I'm glad that I have a copy of Windows Grep.
Third Party Controls and Libraries
For everything from managing Ajax interactions, to business graphics, to file-format translations, to fancy web controls, we license tools from ComponentArt, Dundas, Aspose, Telerik, and Steema Software. And, as I posted earlier, we built the Community Builder Module with DotNetNuke.
Communication
When I log in, Microsoft Outlook and AOL Instant Messenger automatically launch. I treat IM software as a necessary evil, as most of the rest of the team likes to use it.
We use FogBugz for bug tracking, and we've just started using its integrated wiki for documenting procedures, tracking issues that are more complicated than individual software defects, and advance planning. I really prefer wiki software that puts you in control of the final product (like MediaWiki, which powers Wikipedia), but we chose a tool that could be picked up by a broad base of users.
Lest we forget, for project specs and planning: Microsoft Office, primarily Word, Excel, and Project, with a smattering of Visio.
The Desktop
Sitting on my desk, holding up my telephone, is a PC running Windows Server 2003. The shelf over my monitors is holding about a dozen books, but I only pull out three or four of them. The one I go for most is Spainhour & Eckstein's Webmaster in a Nutshell. For more details, I rely on my Safari subscription. And, perhaps most importantly, a big desk pad loaded with paper.
02 July 2008
Some assembly required
Having the gears as polygons makes modeling their interactions child's play. Etoys has a built-in primitive to locate overlapping objects. Thus, on each time step, I simply look for overlapping polygons and rotate them in the appropriate direction until they no longer overlap.
Etoys runs, among other places, on the XO laptop of the OLPC initiative. A download of the emulator's project file is available from the author.
25 June 2008
Exam (re)prep: 5
I'm going to pass up the free-retake offer, which expires at the end of this month. I just don't have time to prep that much material. Rather, I'm going to take the three exams separately--probably 70-528 first, in August, and 70-536 in September. Those two together are good for the MCTS credential, and then we'll see about advancing to the MCPD.
I gave a presentation to my development workgroup (about 16 guys) about my current experiences and generally about how the program works. I was a little surprised that more than a couple guys were interested enough to ask questions.
02 June 2008
A sharp, simple tool
AWK has inspired many other languages, as you've already mentioned. Why do you think this is?
What made AWK popular initially was its simplicity and the kinds of tasks it was built to do. It has a very simple programming model. The idea of pattern-action programming is very natural for people. We also made the language compatible with pipes in Unix....
Another advantage of AWK is that the language is stable. We haven't changed it since the mid-1980s. And there are also lots of other people who've implemented versions of AWK on different platforms such as Windows.
According to Aho, AWK is still one of the 30 most popular programming languages. Just a couple of weeks ago, I pulled out my copy of The AWK Programming Language (wow! the book is much more expensive now) and wrote a 3-line program to generate a 5000-row CSV file of test data.
(Link via CodeProject.)
27 May 2008
Exam prep: 4
08 May 2008
06 May 2008
Fun with 88's: Part 3
struct, and the OCCURS keyword, which defines an array. Let's say that requirements for our toy name and address processing application have changed again, and that Canadian addresses and post codes must be supported. We'll put a record type indicator in the unused space at the end of the card, and lay out the city-state-zip storage differently depending on the record type. Remember how we left some space in the level numbers for maintenance? Here's a case where the practice comes in handy:
01 CARD-IMAGE.
03 CARD-NAME-AND-ADDRESS.
05 CARD-NAME.
07 CARD-FIRST-NAME PIC X(10).
07 CARD-LAST-NAME PIC X(10).
05 FILLER PIC X(3).
05 CARD-ADDRESS.
07 CARD-ADDRESS-LINE-1 PIC X(15).
07 CARD-ADDRESS-LINE-2 PIC X(15).
06 CARD-USA-AREA.
07 CARD-CITY PIC X(10).
07 CARD-STATE PIC X(2).
88 CARD-IS-DISTRICT VALUE "DC".
88 CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
07 FILLER PIC X(8).
07 CARD-ZIP-CODE PIC 9(5).
06 CARD-CANADA-AREA
REDEFINES CARD-USA-AREA.
07 FILLER PIC X(10).
07 CARD-PROVINCE PIC X(2).
07 FILLER PIC X(6).
07 CARD-POST-CODE PIC X(7).
03 CARD-RECORD-TYPE PIC X(2).
88 CARD-IS-USA VALUE "US".
88 CARD-IS-CANADA VALUE "CA".
Granted, there is opportunity for the record type indicator to disagree with the way the storage is used: object-oriented languages do have something to offer here.
We don't have to provide a separate name for the city part of CARD-CANADA-AREA: we can use CARD-CITY to refer to characters 54 through 63, irrespective of record type.
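For readers who don't read COBOL, the same overlay idea can be sketched in Python with fixed-width slicing. This is an analogy rather than a translation: the function name and dictionary keys are made up, and the offsets are simply the running totals of the PIC lengths in CARD-IMAGE above.

```python
def parse_card(card):
    """Split an 80-character card image laid out like CARD-IMAGE.

    The 25 characters at offsets 53..77 are read one way for "US" records
    and another way for "CA" records, which is what REDEFINES expresses.
    """
    if len(card) != 80:
        raise ValueError("a card image is exactly 80 characters")
    area = card[53:78]         # shared by CARD-USA-AREA and CARD-CANADA-AREA
    record_type = card[78:80]  # CARD-RECORD-TYPE
    fields = {
        "first_name": card[0:10].rstrip(),
        "last_name": card[10:20].rstrip(),
        "city": area[0:10].rstrip(),  # same 10 characters in both layouts
    }
    if record_type == "US":
        fields["state"] = area[10:12]
        fields["zip_code"] = area[20:25]
    elif record_type == "CA":
        fields["province"] = area[10:12]
        fields["post_code"] = area[18:25]
    else:
        raise ValueError(f"unknown record type {record_type!r}")
    return fields
```

Unlike the COBOL layout, this version refuses a record type that it doesn't recognize, but nothing stops the indicator from disagreeing with what was actually punched in the area.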
Now, let's say that we want to print the city and state part of the card image, separated by a comma, with the trailing whitespace squeezed out, for example, "New York, NY". (The STRING statement can also be used to do this, but that's a post for another day.) We can use OCCURS to treat the characters of CARD-CITY as an array (1-based) of 10 characters, and similarly for CARD-STATE.
* * *
07 CARD-CITY.
09 CARD-CITY-CHAR PIC X(1)
OCCURS 10 TIMES.
88 CARD-CITY-CHAR-IS-SPACE
VALUE " ".
07 CARD-STATE.
88 CARD-IS-DISTRICT VALUE "DC".
88 CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
09 CARD-STATE-CHAR PIC X(1)
OCCURS 2 TIMES.
TIMES is another noise word that usually isn't coded. We'll need an output area and a couple of indexes:
01 OUTPUT-STRING.
03 OUTPUT-CHAR PIC X(1)
OCCURS 13.
01 IEND PIC S9(4) USAGE COMP.
01 IFROM PIC S9(4) USAGE COMP.
01 ITO PIC S9(4) USAGE COMP.
Now we're ready to write some procedural logic. PERFORM... VARYING makes a counted for loop. MOVE is the workhorse assignment statement: notice that the "left hand side" is actually coded on the right.
PROCEDURE DIVISION.
* * *
* Initialize result and its indexer
MOVE SPACES TO OUTPUT-STRING
MOVE ZERO TO ITO
* Scan from the end of the city area until
* a nonspace character is found
PERFORM VARYING IEND FROM 10 BY -1
UNTIL IEND < 1
OR NOT CARD-CITY-CHAR-IS-SPACE(IEND)
CONTINUE
END-PERFORM
* [Some exception-handling logic for the case
* in which the city portion is completely blank
* could be written here.]
* Copy the city, one character at a time, to the output
PERFORM VARYING IFROM FROM 1 BY 1
UNTIL IFROM > IEND
ADD 1 TO ITO
MOVE CARD-CITY-CHAR(IFROM) TO OUTPUT-CHAR(ITO)
END-PERFORM
* Copy the comma
ADD 1 TO ITO
MOVE "," TO OUTPUT-CHAR(ITO)
* Copy the state
PERFORM VARYING IFROM FROM 1 BY 1
UNTIL IFROM > 2
ADD 1 TO ITO
MOVE CARD-STATE-CHAR(IFROM) TO OUTPUT-CHAR(ITO)
END-PERFORM
There's all sorts of things we could do to improve and simplify this logic. One change would be to add code to the WORKING-STORAGE SECTION to define the entire card image as an array of 80 characters. It's common that more experienced COBOL programmers write proportionally more code in the DATA DIVISION than they do in the PROCEDURE DIVISION.
A warning, once again: all of the above code is from memory, and hasn't been subjected to compilation or testing. In particular, I don't remember for certain whether 88's can be applied to group-level items as I did above with CARD-STATE.
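As a cross-check on the procedural logic above, here is a line-for-line mirror of it in Python. It keeps the 1-based indexes and the 13-character output area so that it follows the COBOL closely; idiomatic Python would be much shorter. The function name is made up. Like the COBOL, it copies only the comma, so a space after the comma would need a 14th output position.

```python
def city_state_label(city, state):
    """city: the 10 characters of CARD-CITY; state: the 2 of CARD-STATE."""
    output = [" "] * 13   # MOVE SPACES TO OUTPUT-STRING
    ito = 0               # MOVE ZERO TO ITO

    # Scan from the end of the city area until a nonspace character is found.
    iend = len(city)
    while iend >= 1 and city[iend - 1] == " ":
        iend -= 1

    # Copy the city, one character at a time, to the output.
    for ifrom in range(1, iend + 1):
        ito += 1
        output[ito - 1] = city[ifrom - 1]

    # Copy the comma.
    ito += 1
    output[ito - 1] = ","

    # Copy the state.
    for ifrom in range(1, 3):
        ito += 1
        output[ito - 1] = state[ifrom - 1]

    return "".join(output)
```

Given a city of "New York" padded to 10 characters and a state of "NY", the result is "New York,NY" followed by two spaces of padding.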
05 May 2008
Community Builder
...will allow organizations to quickly and cost effectively create and manage online community panels and provide a voice to customers, employees and other constituents.
01 May 2008
Happy birthday, BASIC
Exam prep: 3
29 April 2008
It's about time
We expect CSS Variables to receive a very positive feedback from... the Web authors' community...
28 April 2008
Modulo treasure
...software methodology has always been akin to religion. With the caveat that there’s no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development, let me just say that almost everything I’ve ever heard associated with the term "extreme programming" sounds like exactly the wrong way to go...with one exception. The exception is the idea of working in teams and reading each other’s code. That idea is crucial, and it might even mask out all the terrible aspects of extreme programming that alarm me.
(Link via The Code Project.)
22 April 2008
Fun with 88's: Part 2
The PROCEDURE DIVISION consists of paragraphs of code, optionally organized into sections. Each paragraph consists of one or more statements and ends with a period (yes, a period: remember that the syntax was designed to resemble English). Paragraphs act like open subroutines in that control can fall into a paragraph and all program variables are accessible, so COBOL makes it easy to tangle your code into linguine—but you don't have to do it that way.
The EVALUATE statement is COBOL's switch, and the PERFORM statement executes a paragraph or a range of paragraphs. Common practice is to prefix paragraph names with a 4- or 5-digit number, which indicates where in the source code each is defined and how it fits into the execution hierarchy. As a result, programs tend to read top-down through the hierarchy rather than (my preference, which I adopted from Wirth) bottom-up.
A fragment of code for processing our name and address data from the previous post might be:
PROCEDURE DIVISION.
* * *
1200-LABEL-STATE.
EVALUATE CARD-STATE
WHEN "PA"
WHEN "KY"
WHEN "VA"
WHEN "MA"
PERFORM 1210-LABEL-AS-COMMONWEALTH
WHEN "DC"
PERFORM 1220-LABEL-AS-DISTRICT
WHEN OTHER
PERFORM 1290-LABEL-AS-STATE
END-EVALUATE.
1210-LABEL-AS-COMMONWEALTH.
* * *
1220-LABEL-AS-DISTRICT.
* * *
1290-LABEL-AS-STATE.
* * *
Fairly readable, maintainable code, but a little brittle, should we need this classification scheme somewhere else in the program. Named conditions will help us out here.
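Before moving on, here is roughly how that EVALUATE reads in Python, for anyone who finds that easier to follow. This is only an analogy: the function label_state and its string return values are invented stand-ins for the three PERFORMed paragraphs, whose bodies aren't shown.

```python
def label_state(card_state: str) -> str:
    """Rough analogue of paragraph 1200-LABEL-STATE."""
    # The literal values live in the procedural code, which is the brittle
    # part: a second use of this classification would repeat the list.
    if card_state in ("PA", "KY", "VA", "MA"):
        return "Commonwealth"  # stands in for 1210-LABEL-AS-COMMONWEALTH
    elif card_state == "DC":
        return "District"      # stands in for 1220-LABEL-AS-DISTRICT
    else:
        return "State"         # stands in for 1290-LABEL-AS-STATE


print(label_state("KY"))  # prints Commonwealth
print(label_state("DC"))  # prints District
print(label_state("MD"))  # prints State
```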
The special level number 88 identifies a value or values that an elementary item might hold and a name for the condition that indicates that the item currently holds the value. Turning back to our example variable definitions:
* * *
01 CARD-IMAGE.
03 CARD-NAME-AND-ADDRESS.
05 CARD-NAME.
07 CARD-FIRST-NAME PIC X(10).
07 CARD-LAST-NAME PIC X(10).
05 FILLER PIC X(3).
05 CARD-ADDRESS.
07 CARD-ADDRESS-LINE-1 PIC X(15).
07 CARD-ADDRESS-LINE-2 PIC X(15).
07 CARD-CITY PIC X(10).
07 CARD-STATE PIC X(2).
88 CARD-IS-DISTRICT VALUE "DC".
88 CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
07 FILLER PIC X(8).
07 CARD-ZIP-CODE PIC 9(5).
03 FILLER PIC X(2).
Then our procedural code simplifies to:
PROCEDURE DIVISION.
* * *
1200-LABEL-STATE.
EVALUATE TRUE
WHEN CARD-IS-COMMONWEALTH
PERFORM 1210-LABEL-AS-COMMONWEALTH
WHEN CARD-IS-DISTRICT
PERFORM 1220-LABEL-AS-DISTRICT
WHEN OTHER
PERFORM 1290-LABEL-AS-STATE
END-EVALUATE.
1210-LABEL-AS-COMMONWEALTH.
* * *
1220-LABEL-AS-DISTRICT.
* * *
1290-LABEL-AS-STATE.
* * *
Now, if we find that requirements change, for instance that data for Mexico has to be supported, we have only one place that has to be updated to accommodate "DF" for the Distrito Federal:
07 CARD-STATE PIC X(2).
88 CARD-IS-DISTRICT VALUE "DC", "DF".
88 CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
(We'd have to make some other changes, too, but that's not my point here.)
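The closest thing to a named condition in a language like Python is probably a read-only property defined next to the data. The sketch below is an analogy under that assumption, not a translation: the class Card, its attribute names, and the returned labels are all hypothetical.

```python
class Card:
    """Stand-in for the CARD-IMAGE group item; only CARD-STATE is modeled."""

    # Assumption: these sets play the role of the VALUE lists on the 88-levels.
    DISTRICT_VALUES = frozenset({"DC", "DF"})
    COMMONWEALTH_VALUES = frozenset({"PA", "KY", "VA", "MA"})

    def __init__(self, state: str) -> None:
        self.state = state

    @property
    def is_district(self) -> bool:
        # Like 88 CARD-IS-DISTRICT: true whenever the item holds a listed value.
        return self.state in self.DISTRICT_VALUES

    @property
    def is_commonwealth(self) -> bool:
        # Like 88 CARD-IS-COMMONWEALTH.
        return self.state in self.COMMONWEALTH_VALUES


def label_state(card: Card) -> str:
    # Mirrors EVALUATE TRUE: the first condition that holds wins.
    if card.is_commonwealth:
        return "Commonwealth"
    if card.is_district:
        return "District"
    return "State"


card = Card("DF")
print(label_state(card))  # prints District
card.state = "KY"         # like MOVE "KY" TO CARD-STATE
print(label_state(card))  # prints Commonwealth
```

As with the 88-levels, the procedural code never mentions "DC" or "DF"; supporting another district means touching only the data definition.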
88-level conditions can use sets of values that overlap, and ranges can be specified with the keyword THRU. Returning to the original example:
07 CARD-STATE PIC X(2).
88 CARD-IS-DISTRICT VALUE "DC".
88 CARD-IS-COMMONWEALTH VALUE "PA", "KY", "VA", "MA".
88 CARD-IS-13-ORIGINAL VALUE "MA", "NH", "RI", "CT", "NY",
"NJ", "PA", "DE", "MD", "VA", "NC",
"SC", "GA".
THRU is generally more useful with numeric data. Consider this contrived example of a tax calculation. The S in the PICTURE indicates the sign, and the V an implicit decimal point.
07 TAXABLE-INCOME PIC S9(6)V9(2).
88 BRACKET-IS-10-PCT VALUE 0 THRU 10000.
88 BRACKET-IS-15-PCT VALUE 10000.01 THRU 20000.
* * *
Aside: the "." symbol in the BRACKET-IS-15-PCT line above is used in two very different ways: as a terminator period, as we've seen before, and as a decimal point in the numeric literal 10000.01. Getting these two right is one of the masochistic joys of machine-translating COBOL.
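To make the implied decimal point and the inclusive THRU ranges concrete, here is a small Python sketch using the standard decimal module. The helper names are invented, and passing the sign as a separate flag is a simplification of mine; how a real compiler stores the sign is not covered in this post.

```python
from decimal import Decimal


def decode_s9_6_v9_2(digits: str, negative: bool = False) -> Decimal:
    """Read 8 stored digits as PIC S9(6)V9(2): the point is implied, not stored."""
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError("expected exactly 8 digits")
    value = Decimal(int(digits)).scaleb(-2)  # shift two places for the V
    return -value if negative else value


def bracket(taxable_income: Decimal) -> str:
    """Mimic the two 88-levels; THRU includes both endpoints."""
    if Decimal("0") <= taxable_income <= Decimal("10000"):
        return "BRACKET-IS-10-PCT"
    if Decimal("10000.01") <= taxable_income <= Decimal("20000"):
        return "BRACKET-IS-15-PCT"
    return "NEITHER"


income = decode_s9_6_v9_2("01000001")
print(income)           # prints 10000.01
print(bracket(income))  # prints BRACKET-IS-15-PCT
```

Because the item holds only two decimal places, no value can fall in the gap between 10000 and 10000.01, so the two ranges meet without overlapping.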
Next: to REDEFINE the union.
17 April 2008
Fun with 88's: Part 1
The workhorse business data processing languages of the 1960s and 1970s (COBOL, PL/I, FORTRAN to some extent) offered their own high-level abstractions. From today's vantage point, most of these technical advances appear quaint, crude, or worse. But some, like COBOL's named conditions, also known as 88-levels, provided a tidy solution in code to common processing problems, one that was not replicated by subsequent mainstream languages. It's easily my favorite feature of COBOL.
In this and subsequent posts, I'll take a little trip down memory lane to describe named conditions and how to use them. I'm a little bit rusty: when I was last actively using COBOL the predominant standard was COBOL85. That standard allowed lower case keywords and user-defined names, but COBOL just seems more COBOL-y in upper case.
In a COBOL program, procedural code and data specs are strictly segregated into the PROCEDURE DIVISION and the DATA DIVISION. Local variables are described in the WORKING-STORAGE SECTION of the DATA DIVISION.
To me, the fundamental unit of storage in COBOL is an 01-level group item, and it corresponds to a C struct. It is any number of individual elementary items arranged in a hierarchy. Within the hierarchy, a prefixing level number indicates what nests where; a PICTURE clause and a USAGE clause provide most of the physical typing information (how many bytes, what kind of data it can hold). An 80-column punchcard that holds name and address information might be represented in memory like this:
DATA DIVISION.
WORKING-STORAGE SECTION.
1 CARD-IMAGE.
3 NAME-AND-ADDRESS.
5 NAME.
7 FIRST-NAME PICTURE IS XXXXXXXXXX USAGE IS DISPLAY.
7 LAST-NAME PICTURE IS X(10) USAGE IS DISPLAY.
5 FILLER PICTURE IS XXX USAGE IS DISPLAY.
5 ADDRESS.
7 ADDRESS-LINE-1 PICTURE IS X(15) USAGE IS DISPLAY.
7 ADDRESS-LINE-2 PICTURE IS X(15) USAGE IS DISPLAY.
7 CITY PICTURE IS X(10) USAGE IS DISPLAY.
7 STATE PICTURE IS XX USAGE IS DISPLAY.
7 FILLER PICTURE IS X(8) USAGE IS DISPLAY.
7 ZIP-CODE PICTURE IS 99999 USAGE IS DISPLAY.
3 FILLER PICTURE IS XX USAGE IS DISPLAY.
Well, this first example demonstrates how COBOL got its reputation for being excessively verbose and hard to understand. So the first point to be made is that no practitioner would actually code it this way. COBOL syntax defines some noise words like IS and has lots of synonyms and shortcuts. The USAGE IS DISPLAY clause (which specifies data stored as characters) can be disposed of altogether, since DISPLAY is the default. You can code vendor-specific variations on the USAGE IS COMPUTATIONAL clause to specify integer data; however, most arithmetic is done with fixed-precision decimal data, so there's rarely a need to specify floating point data.
The PICTURE clause is an early attempt at coding-by-example. Ten X's mean ten bytes of alphanumeric data; five 9's mean five bytes of decimal data that can be used in arithmetic. A repetition factor in parentheses means what you'd expect. There are lots of other fancy symbols that can be used in PICTURE clauses, to specify a decimal point or to automatically insert commas and currency signs; it's not unlike a printf() format string.
Group items in the hierarchy can always be treated like an alphanumeric string of their component characters. So ADDRESS can be used anywhere you'd want to use the 55 characters that comprise it.
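As a quick check on that arithmetic, here is a Python sketch that counts the character positions described by a simple PICTURE string. It handles only X and 9 with optional repetition factors, which is all the card layout uses; the function name picture_length is made up, and the other PICTURE symbols are deliberately rejected rather than guessed at.

```python
import re


def picture_length(picture: str) -> int:
    """Count positions in a PICTURE made of X's and 9's, e.g. 'XXX' or 'X(10)'."""
    # Reject anything outside this sketch's tiny subset (S, V, commas, ...).
    if not re.fullmatch(r"(?:[X9](?:\(\d+\))?)+", picture):
        raise ValueError(f"unsupported PICTURE: {picture!r}")
    length = 0
    for _symbol, count in re.findall(r"([X9])(?:\((\d+)\))?", picture):
        # A bare symbol occupies one position; X(15) occupies fifteen.
        length += int(count) if count else 1
    return length


# The elementary items under ADDRESS, in order.
ADDRESS_PICTURES = ["X(15)", "X(15)", "X(10)", "XX", "X(8)", "99999"]

print(picture_length("XXXXXXXXXX"))                        # prints 10
print(sum(picture_length(p) for p in ADDRESS_PICTURES))   # prints 55
```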
It's a convention that level numbers are indented to reflect the hierarchy, but it's not a requirement. Also, most smart coders leave some gaps in the level numbers, so that if future maintenance calls for an intermediate level (for instance, a city-state-zip group item), it's easy to add. Leading zeroes in the level numbers are also used by convention to make the layout nicer.
FILLER designates anonymous storage, and its name is usually extra-indented to make it disappear. COBOL isn't particularly good at syntax for managing namespaces, so it's conventional practice to prefix all elements with an abbreviated version of the name of the 01-level group item.
So a more realistic example, one that we might see in a real program, would look like:
DATA DIVISION.
WORKING-STORAGE SECTION.
01 CARD-IMAGE.
03 CARD-NAME-AND-ADDRESS.
05 CARD-NAME.
07 CARD-FIRST-NAME PIC X(10).
07 CARD-LAST-NAME PIC X(10).
05 FILLER PIC X(3).
05 CARD-ADDRESS.
07 CARD-ADDRESS-LINE-1 PIC X(15).
07 CARD-ADDRESS-LINE-2 PIC X(15).
07 CARD-CITY PIC X(10).
07 CARD-STATE PIC X(2).
07 FILLER PIC X(8).
07 CARD-ZIP-CODE PIC 9(5).
03 FILLER PIC X(2).
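If it helps to see the same layout from the outside, this Python sketch slices an 80-character card image into the named fields. The field names and widths come straight from the listing; the function parse_card, the dictionary it returns, and the sample data are my own illustration.

```python
# (name, width) pairs in card order; widths are taken from the PIC clauses.
CARD_LAYOUT = [
    ("CARD-FIRST-NAME", 10),
    ("CARD-LAST-NAME", 10),
    ("FILLER", 3),
    ("CARD-ADDRESS-LINE-1", 15),
    ("CARD-ADDRESS-LINE-2", 15),
    ("CARD-CITY", 10),
    ("CARD-STATE", 2),
    ("FILLER", 8),
    ("CARD-ZIP-CODE", 5),
    ("FILLER", 2),
]


def parse_card(card_image: str) -> dict:
    """Split an 80-column card image into its elementary items."""
    if len(card_image) != 80:
        raise ValueError("a card image is exactly 80 characters")
    fields = {}
    offset = 0
    for name, width in CARD_LAYOUT:
        if name != "FILLER":  # FILLER is anonymous storage: skip it
            fields[name] = card_image[offset:offset + width]
        offset += width
    return fields


sample = (
    "JOE".ljust(10) + "USER".ljust(10) + " " * 3
    + "1 MAIN ST".ljust(15) + " " * 15 + "RESTON".ljust(10)
    + "VA" + " " * 8 + "20190" + " " * 2
)
card = parse_card(sample)
print(card["CARD-STATE"])     # prints VA
print(card["CARD-ZIP-CODE"])  # prints 20190
```

The big difference is that COBOL needs no parsing step at all: moving 80 characters into CARD-IMAGE makes every field addressable by name immediately.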
Next: level numbers that aren't level numbers.
Update: 18 April, followed standard COBOL terminology
16 April 2008
Teletype 33
Shamberg visits the Rodman family of Ardmore, Pa., who have installed a Teletype 33 in their home and signed up for timesharing access to a mainframe in New Jersey. General Electric provides the service. The Rodmans apparently have access to permanent storage on the mainframe, whereas we students used paper tape to save our work. You can see the paper tape reader/punch attached to the Teletype unit, in the right side of the page 4 picture. The acoustic coupler modulator-demodulator is in the left side of the pic.
"For me, the main physical effect of having a computer at home is that I’m able to spend a lot more time with my family,” says Dr. Rodman, who is a lung specialist on the faculty of Temple University medical school in Philadelphia. “For all of us the real impact is mental. Programming a computer is like thinking in a foreign language. It forces you to approach problems with a high degree of logic. Because we always have a computer handy, we turn to it with problems we never would have thought of doing on one before.”
(Link via Boing Boing.)
02 April 2008
Zero bombs away
Back in the mid 80's when I got my first Mac (a 512K model, a loaner from my company) and I was looking for a programming project, I noodled around with building a Redcode emulator, but nothing ever came of it.
17 March 2008
Touch me
13 March 2008
07 March 2008
Exam prep: 2
21 February 2008
Herding Cats
[Obligatory Mark Twain reference]
Some 75% of the world's businesses data is still processed in Cobol, and about 90% of all financial transactions are in Cobol, according to Arunn Ramadoss, head of the academic connections program at Micro Focus International PLC...
15 February 2008
DotNetNuke: Early returns
Typically, a module is written as three user controls: a View, an Edit, and a configuration Settings combination view/edit. The View control might present a grid or list of entities (blog posts, say), and then the Edit control would be used to edit one existing entity or create a new one. Or the View module might just be a widget that encapsulates content from another source, like a stock ticker or a Google map, and there would be no Edit module.
DotNetNuke (ugly name, that) is open source, and the weak API documentation reflects this fact. The search engine is often overloaded, so I've taken to using Google to search within the site. And it's usually the case that a method or property just isn't described in sufficient detail. Sometimes the only hits are someone's unanswered question in a forum: "Does anybody know how this thing works?" There is a book from Wrox of middling quality, Professional DotNetNuke 4.
One of the services that the DotNetNuke framework provides is caching of module content. If your module is serving semi-static text (like a Welcome box) or slowly-changing data like automobile traffic reports, you probably don't need the content to be refreshed second-to-second, and the admin can configure an appropriate cache timeout. But if your module provides any interactivity and validation (for instance, a new user registration control that collects name and address), you really don't want anything to be cached. When you, the developer, prepare a module for deployment, you can specify a default cache timeout of 0 seconds. But you can't count on that value being used in all deployments. I predict that a lot of our support calls will center on caching issues.
We've also noticed some problems, not well-described, with ASP.NET validator controls running client-side, and so we've specified EnableClientScript="false" for everything.
04 February 2008
Screening
Please understand: what I'm looking for here is a total vacuum in one of these areas. It's OK if they struggle a little and then figure it out. It's OK if they need some minor hints or prompting. I don't mind if they're rusty or slow. What you're looking for is candidates who are utterly clueless, or horribly confused, about the area in question.
Yegge's five questions, or rather categories of questions, are:
- Coding
- Object-oriented Programming
- Scripting and Regular Expressions
- Data Structures
- Bits and Bytes
Yegge's example coding questions use C++ and Java, and would be appropriate for C# and JavaScript. For our own purposes, I'd be inclined to include some questions that called for HTML and perhaps some SQL.
11 January 2008
A/K/A timesharing
The biggest doubt is whether you can make much money selling software this way. Vendors of conventional enterprise software made a killing by requiring customers to pay a high licensing fee upfront and then charging them for maintenance. Web-based firms, by contrast, have to make do with subscription fees. This means they are not able to grow as quickly: both NetSuite and Salesforce have been around for almost a decade. They have had to invest a lot in attracting customers and building data centres to supply their services. As a result, NetSuite has never posted a profit; at the end of September, its accumulated deficit amounted to nearly $242m. Salesforce is barely profitable and boasts an otherworldly price-to-earnings ratio of around 660.
The article includes some forecasts (courtesy of Gartner) for the size of the SaaS market through 2011.