30 November 2010

Regex mystery, once more

I spent some quality time with Friedl's Mastering Regular Expressions, and I'm beginning to get a better understanding of JavaScript's NFA regular expression engine. I dimly understand how the case I described in an earlier post sent the engine into backtracking perdition. I tested (but did not commit to the codebase) this version. It addresses the specific bad data case, but I don't think it's a comprehensive solution.



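// Cheap pre-check: only run the expensive pattern when the domain is present.
var PROBE_REGEX = /client\.org/;
var URL_REGEX = /^(http:\/\/)?(\w+\.?)*client\.org(.*)$/i;

// Returns whatever follows "client.org", or the trimmed input if nothing matches.
var stripUrlPrefix = function (url) {
    var result = jQuery.trim(url);
    if (result.search(PROBE_REGEX) === -1) {
        return result;
    }
    // URL_REGEX has no "g" flag, so it can be reused without copying it.
    var matches = URL_REGEX.exec(result);
    if (matches) {
        return matches[3];
    } else {
        return result;
    }
};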



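One or both of the quantified expressions (\w+\.?)* and (.*) are responsible for the performance issues. Of the two, the nested quantifier in (\w+\.?)* looks like the likelier culprit: because the dot is optional, the engine can split a run of word characters into groups in many different ways, and it tries every one of them before giving up. The probe only helps when client.org is missing entirely; input that contains client.org but still fails the full pattern can trigger the same backtracking.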
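To illustrate the point about the quantified expressions, here is a minimal sketch of one way the ambiguity might be removed. It is not the version I tested above. It assumes every subdomain label ends with a dot, so the optional \.? becomes a required \. and each label can only be matched one way. The names SAFE_URL_REGEX and stripUrlPrefixSafe are made up for this example, and it uses the native trim() instead of jQuery so it runs on its own.

```javascript
// Hypothetical alternative pattern, not the version tested above.
// Assumption: each subdomain label is followed by a dot ("www.", "mail.").
// Since \w never matches ".", a label can be matched in only one way,
// so the engine has nothing to re-partition when the match fails.
var SAFE_URL_REGEX = /^(?:http:\/\/)?(?:\w+\.)*client\.org(.*)$/i;

// Hypothetical name; same contract as stripUrlPrefix.
function stripUrlPrefixSafe(url) {
    var result = String(url).trim();
    var matches = SAFE_URL_REGEX.exec(result);
    // Only one capturing group is left, so the remainder is matches[1].
    return matches ? matches[1] : result;
}

// A long run of word characters followed by a character that breaks the
// match: it contains "client.org", so a probe would let it through.
var badInput = new Array(41).join('a') + ' client.org';

console.log(stripUrlPrefixSafe('http://www.client.org/foo'));
console.log(stripUrlPrefixSafe(badInput) === badInput);
```

One behavioral difference to be aware of: the original pattern accepts something like fooclient.org (no dot before client), and this one does not. Whether that matters depends on the data.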
