22 February 2010

Another regex issue

For a bit of code designed to reject any input string that contains an HTML tag, I picked up the following regular expression from Friedl's Mastering Regular Expressions, 3/e:



<("[^"]*"|'[^']*'|[^'">])*>



As in:



if (inputString.match(/<("[^"]*"|'[^']*'|[^'">])*>/)) {
    CLIENT.Utilities.addValidationMessage($(this), 'No HTML tags, please.');
    isValid = false;
}



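Unfortunately, colleague Jared points out that ill-formed HTML tags will pass this validation, and colleague Jason demonstrated that at least some browsers, under certain conditions, will more or less render the ill-formed HTML. The pattern only matches a tag when every quote inside it is closed, so a stray quote keeps the tag from matching at all. Jared's example, with a stray quote in each tag: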



<a href="bad link" attr'>click me</a attr=">
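
To make the failure concrete, here is a minimal sketch that runs the pattern against a well-formed tag and against Jared's example. The helper name containsHtmlTag and the sample strings other than Jared's are assumptions for illustration, not part of the original validation code.

```javascript
// Pattern from the post: a "<", then any run of double-quoted strings,
// single-quoted strings, or characters that are neither a quote nor ">",
// and finally a closing ">".
const TAG_PATTERN = /<("[^"]*"|'[^']*'|[^'">])*>/;

// Hypothetical helper (not in the original code): true when the pattern
// finds something it considers a tag.
function containsHtmlTag(input) {
  return TAG_PATTERN.test(input);
}

const wellFormed = '<a href="good link">click me</a>';
// Jared's example: the lone ' in the first tag and the lone " in the
// second tag can never be paired up, so neither tag matches.
const illFormed = '<a href="bad link" attr\'>click me</a attr=">';

console.log(containsHtmlTag(wellFormed)); // true: the input is rejected
console.log(containsHtmlTag(illFormed));  // false: the input slips through
```

The unpaired quote is the whole trick: a quote character can only be consumed as part of a complete quoted string, so once one is left dangling, the pattern cannot reach the closing ">".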



Since this code is used for an internal app where users aren't actively trying to clobber things, we've chosen to live with the fact that fishy markup can slip through.