22 December 2009

Dusty decks

A couple of archive sites from my bookmarks: First (via things magazine), the bitsavers project, an archive of software and documentation "for minicomputers and mainframes, from the 50's to the 80's." Doesn't look like the archive is being curated and catalogued at this point.

Hmm. I have some old listings in my storage unit. I wonder whether I can coax Gary and Greg to place SPREAD (a report writer for timesharing services, minicomputers—really, anything with a FORTRAN compiler) in the public domain.

Second, Karl Kleine's Historic Documents in Computer Science, a mix of archived material and links elsewhere. Some examples: The first FORTRAN manual, by John Backus and others, from 1956; Dennis Ritchie's C reference manual from about 1976. Unfortunately, the archive's last update was in October 2003, which may explain why it's missing some key material, like (ahem) anything on BASIC or COBOL.

18 December 2009

Regex mystery

One small piece of the app that I've been working on for the past several months is an HTML text box where the editor/producer can enter a relative URL that identifies an image file. In practice, though, the editor often has an absolute URL to paste into the text box (maybe context-clicking to grab a URL from elsewhere on the web site), so one bit of the JavaScript processing is to remove the scheme and server name from the URL. My client has multiple media servers within its client.org domain. So I wrote this tiny function, which in essence is nothing but



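// Optional scheme, then zero or more host labels (each with an optional
// trailing dot), then the client.org domain.
var URL_REGEX = /(http:\/\/)?(\w+\.{0,1})*client\.org/i;

function stripUrlPrefix(url) {
    return url.replace(URL_REGEX, '');
}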



stripUrlPrefix() removes the scheme, if present, and the server name, and it usually works like a champ.

However, Tony on the testing team found that the following input string (a real path name from one of our servers) sends the regex engines in IE 7 and Firefox 3 completely out to lunch:



/images/ap//AP_News_Wire:_World_News/3_Australia_Thirsty_Camels.sff_300.jpg



On my middle-of-the-line Windows XP laptop, IE 7 takes about 10 minutes to execute stripUrlPrefix(), given this input string; Firefox just pegs the CPU and never does return. Jason is going to give this code a spin on Chrome to see what happens.

I have somehow stumbled into some kind of backtracking morass with a regex that looks pretty vanilla to me, and an input string that's likewise not too gnarly.
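
One plausible reading of the slowdown, offered as a sketch rather than a diagnosis: because the dot in (\w+\.{0,1})* is optional, the group can carve a single run of word characters into pieces in many different ways, and a backtracking engine may try all of them before giving up on finding client.org. The helper countSplits() below is hypothetical (it is not part of the app); it just counts those carvings for a run of n word characters, and the count doubles with every added character.

```javascript
// Hypothetical helper, for illustration only: counts the ways that
// (\w+\.{0,1})* can split a run of n word characters (no dots) into pieces.
function countSplits(n) {
    // ways[k] = number of ways to split the first k characters
    var ways = [1];
    for (var k = 1; k <= n; k++) {
        ways[k] = 0;
        // the last piece can start at any earlier position j
        for (var j = 0; j < k; j++) {
            ways[k] += ways[j];
        }
    }
    return ways[n];
}

// The longest run of word characters in the troublesome path
var run = '3_Australia_Thirsty_Camels';
console.log(run.length, countSplits(run.length));
```

In the real path the optional dot also lets the group keep going through .sff_300.jpg, so the number of possibilities there is presumably larger still.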

It turns out that we can fix the problem by trimming leading whitespace from the input and adding a beginning-of-string anchor to the regular expression, thus:



var URL_REGEX = /^(http:\/\/)?(\w+\.{0,1})*client\.org/i;
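
Put together, the fix might look like the sketch below. The names stripUrlPrefixWith, ANCHORED_URL_REGEX, and STRICT_URL_REGEX, and the media.client.org host, are made up for illustration. The trim uses a plain replace() on the assumption that older browsers such as IE 7 lack String.prototype.trim(). The second pattern is not part of the fix described here; it is an assumed variant that makes the dot after each host label mandatory, which leaves only one way to split a run of word characters.

```javascript
// The anchored pattern from the post.
var ANCHORED_URL_REGEX = /^(http:\/\/)?(\w+\.{0,1})*client\.org/i;

// Assumption, not from the post: requiring the dot after each host label
// leaves only one way to split a run of word characters.
var STRICT_URL_REGEX = /^(http:\/\/)?(\w+\.)*client\.org/i;

// Hypothetical helper: trim leading whitespace, then strip scheme and server.
function stripUrlPrefixWith(regex, url) {
    var trimmed = url.replace(/^\s+/, '');
    return trimmed.replace(regex, '');
}

var absolute = '  http://media.client.org/images/logo.jpg';
var path = '/images/ap//AP_News_Wire:_World_News/3_Australia_Thirsty_Camels.sff_300.jpg';

console.log(stripUrlPrefixWith(ANCHORED_URL_REGEX, absolute)); // /images/logo.jpg
console.log(stripUrlPrefixWith(STRICT_URL_REGEX, absolute));   // /images/logo.jpg
console.log(stripUrlPrefixWith(ANCHORED_URL_REGEX, path) === path); // true
```

The anchor helps with this particular path because it begins with a slash, so the match fails at the first character and the engine never tries later starting positions. An input that begins with a long run of word characters and does not contain client.org could presumably still be slow with the anchored pattern. Note that the strict variant no longer accepts a host such as fooclient.org, which the original pattern would strip.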



I haven't checked to see whether explicitly using the RegExp class would make a difference.
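
For reference, the constructor form of the anchored pattern would look something like the sketch below (the sample host is made up). A regex literal and a RegExp object built from the same pattern are compiled by the same engine, so the backtracking behavior presumably would not differ; that is an assumption, not a measurement.

```javascript
// Literal form, as in the post.
var literal = /^(http:\/\/)?(\w+\.{0,1})*client\.org/i;

// Constructor form: backslashes are doubled inside the string literal,
// and the forward slashes need no escaping.
var constructed = new RegExp('^(http://)?(\\w+\\.{0,1})*client\\.org', 'i');

var sample = 'http://media.client.org/images/logo.jpg'; // made-up host
console.log(sample.replace(literal, ''));
console.log(sample.replace(constructed, ''));
```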

15 December 2009

All your database are belong to us

The current number of IEEE Annals of the History of Computing is loaded with tasty pieces: the theme is early DBMSs, and there are articles on the roots of Adabas, IDS, Total, IDMS, System 2000, IMS -- all those pre-relational acronyms that once filled the job listings for programmer/analysts. I once had limited familiarity with Adabas and IMS: in each case I was writing application code in COBOL and I used some glue-layer code (macros called Adamints, IIRC; and proprietary code written by AMS) to talk to the DB.

Plus, for dessert, Dan Murphy writes about the origins of TECO. Back in graduate school, the guy who handed over to me the project supporting the marketing research study tried to teach me TECO, but I bailed out and made do with SOS.