30 November 2010

Regex mystery, once more

I spent some quality time with Friedl's Mastering Regular Expressions, and I'm beginning to get a better understanding of JavaScript's NFA regular expression engine. I dimly understand how the case I described in an earlier post sent the engine into backtracking perdition. I tested (but did not commit to the codebase) this version. It addresses the specific bad data case, but I don't think it's a comprehensive solution.



var PROBE_REGEX = /client\.org/;
var URL_REGEX = /^(http:\/\/)?(\w+\.?)*client\.org(.*)$/i;

stripUrlPrefix = function (url) {
var regex = new RegExp(URL_REGEX);
var result = jQuery.trim(url);
if (result.search(PROBE_REGEX) == -1) {
return result;
}
var matches = regex.exec(result);
if (matches) {
return matches[3];
} else {
return result;
}
}



It's one or both of the quantified expressions (\w+\.?)* and (.*) that are responsible for the performance issues.

Bronze metaphor

New research by James Evans suggests that the artisans who built the Antikythera Mechanism suggested epicycles to the Greek astronomers as an explanation of planetary motions, and not the other way around. Jo Merchant reports.

11 November 2010

Friendly is good

Joanne Garlow reports on a just-released increment of software to which I contributed, broad support of semantic URLs at npr.org.