27 October 2007

SurveyNOW

The current project to which I contributed, SurveyNOW, has been launched into the webby wide world.

11 October 2007

The Pragmatic Programmer

When I was just starting out in the field, the best way to get all of us wound up (at happy hour, say) was to restart the argument, "Is software development an art or a science?" And after all of us had talked ourselves out, my team leader Larry would quietly smile and say, "But of course, it's neither: it's a craft." Larry would find lots of common ground with Andrew Hunt and David Thomas, authors of The Pragmatic Programmer: From Journeyman to Master. Their instructive, enjoyable, at times even inspiring book, published in 2000, is a blend of theory and practice. Dedicated to software engineering principles yet refusing to become enslaved by formal methods, the authors believe that it is possible to carry out a tradition of craftsmanship within an engineering discipline. In a tidy 321 pages, they have assembled a philosophy of software design and construction, illustrated with current industry best practices.

Thomas and Hunt's target reader is an object-oriented programmer in a command line environment, but the bulk of their advice applies equally well to the developer in a legacy language (and, as they point out in a footnote, "All software becomes legacy as soon as it's written") or to the practitioner in the latest whizbang development studio. Theirs is one of the most literate works in the field: they have a gift for metaphor ("tracer bullets," "broken windows," "orthogonality," "rubber ducking") and a penchant for epigraphs from sources ranging from Wittgenstein to Oscar Wilde, Santayana to Arlo Guthrie.

The authors are perhaps best known for promulgating the DRY principle—Don't Repeat Yourself:

Every piece of knowledge must have a single, unambiguous, authoritative representation in a system.


Violations of this principle, in other words, duplication, are multivalent—redundant bits of code, documentation that must explain difficult code (or worse, that contradicts it), the exposure of class members to direct manipulation rather than through accessor methods—and Hunt and Thomas recommend code generators and automated builds, configuration files, MVC techniques, and vigorous and proactive refactoring as ways to forestall duplication.

They also advocate tight interlacing of documentation and code (à la Knuth's Literate Programming). In much the same way, the book is constructed from interlocking elements: 70 pithy tips and eleven checklists (indexed at the back of the book), section cross-references, and sidebars. This material isn't book-designer fluff, but rather the essence of the book. For use in a course, the book offers chapter exercises; as a stepping stone to further knowledge, Thomas and Hunt annotate books and other resources in a bibliography.

The companion website for the book also links to the authors' more recent publishing and training endeavors.

Table of contents:


  • Foreword
  • Preface
  • 1 A Pragmatic Philosophy

    • 1. The Cat Ate My Source Code
    • 2. Software Entropy
    • 3. Stone Soup and Boiled Frogs
    • 4. Good-Enough Software
    • 5. Your Knowledge Portfolio
    • 6. Communicate!

  • 2 A Pragmatic Approach

    • 7. The Evils of Duplication
    • 8. Orthogonality
    • 9. Reversibility
    • 10. Tracer Bullets
    • 11. Prototypes and Post-It Notes
    • 12. Domain Languages
    • 13. Estimating

  • 3 The Basic Tools

    • 14. The Power of Plain Text
    • 15. Shell Games
    • 16. Power Editing
    • 17. Source Code Control
    • 18. Debugging
    • 19. Text Manipulation
    • 20. Code Generators

  • 4 Pragmatic Paranoia

    • 21. Design by Contract
    • 22. Dead Programs Tell No Lies
    • 23. Assertive Programming
    • 24. When to Use Exceptions
    • 25. How to Balance Resources

  • 5 Bend, or Break

    • 26. Decoupling and the Law of Demeter
    • 27. Metaprogramming
    • 28. Temporal Coupling
    • 29. It's Just a View
    • 30. Blackboards

  • 6 While You Are Coding

    • 31. Programming by Coincidence
    • 32. Algorithm Speed
    • 33. Refactoring
    • 34. Code That's Easy to Test
    • 35. Evil Wizards

  • 7 Before the Project

    • 36. The Requirements Pit
    • 37. Solving Impossible Puzzles
    • 38. Not Until You're Ready
    • 39. The Specification Trap
    • 40. Circles and Arrows

  • 8 Pragmatic Projects

    • 41. Pragmatic Teams
    • 42. Ubiquitous Automation
    • 43. Ruthless Testing
    • 44. It's All Writing
    • 45. Great Expectations
    • 46. Pride and Prejudice

  • Appendix A: Resources

    • Professional Societies
    • Building a Library
    • Internet Resources
    • Bibliography

  • Appendix B: Answers to Exercises
  • Index

05 October 2007

Fonts for coders: 2

Jeff Atwood gives an update on monospace fonts for software development. He presents screen shots of the same piece of specimen code viewed with ten different fonts, popular and more obscure. His favorite is Consolas, and I have to admit that I could be tempted to give up Bitstream Vera Sans Mono for it. Consolas slashes zeroes and it puts an aggressive hook on the comma so that you can separate colons from semicolons.

17 September 2007

Cheapo productions surfing

I was shuffling reference books about, generally moving the less frequently-used ones to the shelves in the basement, and I picked up Shishir Gundavaram's CGI Programming on the World Wide Web (O'Reilly, 1996). Definitely one to move to the archive shelves. And yet—there is a sticky note on page 373, and it's there to mark a passage that describes a low-tech way to check on a web server using telnet:

% telnet www.google.com 80
Trying 64.233.169.104...
Connected to www.l.google.com.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.0 200 OK
Cache-Control: private
Content-Type: text/html; charset=ISO-8859-1
Server: gws
Date: Tue, 18 Sep 2007 02:16:00 GMT
Connection: Close

<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title>
...
</body></html>Connection closed by foreign host.
%

You can use this technique to see the unvarnished HTML payload without doing a View>Page Source, as well as the HTTP headers. Best way to find out who's running Apache, who's running IIS, who's running something custom.
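The same check can be scripted. What follows is a minimal Python sketch of the exchange shown in the telnet session, not anything taken from Gundavaram's book: the function names `build_request`, `split_response`, and `fetch` are made up for illustration, and the code assumes the server closes the connection once it has answered an HTTP/1.0 request, as both servers in this post did.

```python
import socket

def build_request(path="/"):
    """Return what gets typed by hand in telnet: a GET line, then a blank line."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")

def split_response(raw):
    """Split a raw HTTP response into (status line, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

def fetch(host, port=80, path="/", timeout=10.0):
    """Open a plain TCP connection, send the request, read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Something like `status, headers, body = split_response(fetch("www.google.com"))` followed by `headers.get("server")` would then report what the server says it is running. Like the telnet session, this sends no Host header, which some servers may insist on.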

It's easy to run telnet from a Mac OS Terminal window: just remember to hit the enter key twice after you type the GET line. I was less successful running telnet from a Windows Command Prompt window.


% telnet www.microsoft.com 80
Trying 207.46.19.254...
Connected to toggle.www.ms.akadns.net.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 302 Found
Cache-Control: private
Content-Type: text/html; charset=utf-8
Location: /en/us/default.aspx
Server: Microsoft-IIS/7.0
X-AspNet-Version: 2.0.50727
P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
X-Powered-By: ASP.NET
Date: Tue, 18 Sep 2007 02:26:39 GMT
Connection: keep-alive
Content-Length: 136

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="/en/us/default.aspx">here</a>.</h2>
</body></html>
Connection closed by foreign host.
%


When I hit the server at Amazon.com, the last line of HTML consisted of the comment string <!-- MEOW -->. Go figure.

29 August 2007

DUCET not dulcet

So I was working on a little piece of code that was responsible for sorting a list of names—titles of surveys, to be specific. And I noticed that my test data was sorting a little funny. I had a couple of surveys named "Case 4310-1" and "Case 4310/2" and I saw that the former sorted after the latter, even though the crib sheet that I always keep in my Day-Timer says that "-" is ASCII 45 (decimal, hex 2D) and "/" is ASCII 47 (hex 2F). How could this be?

Well, the first thing that I did was explicitly specify the method that I wanted to use to perform the sorting. We're developing in .NET 2.0, and I'm using a generic SortedList to build up the list of survey titles. If I chose, I could specify a strict binary character-by-character sort with this constructor


SortedList sortedList = new SortedList(StringComparer.Ordinal);


and I would get the sort that I "expected." But an ordinal comparer is case-sensitive, and I really want to be able to sort "case 10" and "Case 10" together. Now, fortunately, this code runs only on our servers and there is no provision in the app (at present!) to allow a user to specify a culture (what we called a "locale" in my old UNIX days) for sorting: one size fits all. So instead I wrote


SortedList sortedList = new SortedList(StringComparer.InvariantCultureIgnoreCase);


and I was back to the odd behavior that initially puzzled me. Furthermore, inserting space around the punctuation changed the sort order: "Case 4310 - 1" sorts ahead of "Case 4310 / 2".
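For comparison, here is a small sketch of the two code-point-based sorts, written in Python rather than C# so that it runs on its own. Python's default string comparison works code point by code point, which is roughly a strict ordinal sort, and folding case first approximates a case-insensitive ordinal sort. It does not reproduce .NET's culture-sensitive comparison, so the odd hyphen behavior will not show up here. The survey titles are the test data mentioned in this post.

```python
titles = ["case 10", "Case 4310/2", "Case 4310-1", "Case 10"]

# Ordinal: compare code points one by one. "-" (45) precedes "/" (47),
# but uppercase ASCII letters all precede lowercase ones, so "Case 10"
# and "case 10" end up at opposite ends of the list.
ordinal = sorted(titles)

# Case-insensitive ordinal: fold case first, then compare code points.
# Titles that differ only in case now sit next to each other.
ordinal_ignore_case = sorted(titles, key=str.casefold)

print(ordinal)
# ['Case 10', 'Case 4310-1', 'Case 4310/2', 'case 10']
print(ordinal_ignore_case)
# ['case 10', 'Case 10', 'Case 4310-1', 'Case 4310/2']
```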

It was clear that some culture-based behavior was in play—some kind of special treatment of a hyphen in certain cases—and I was perfectly happy to ship the app this way: all we really cared about was getting the names sorted into some usable order. But I was curious: what is the sort order for the "invariant culture"? I prowled around the Microsoft documentation and found little more than this explanation:

InvariantCulture retrieves an instance of the invariant culture. It is associated with the English language but not with any country/region.


and the hand-waving

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them; for example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases; therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.


and a pointer to the Unicode docs.

So I opened up Unicode Technical Standard #10: Unicode Collation Algorithm, and OMG life is so much more complicated than the old POSIX days when about all you had to know was that "ll" sorted after "l" in Spanish. Consider this tidbit from the introduction:

For example, Swedish and French have clear and different rules on sorting ä (either after z or as an accented character with a secondary difference from a), but neither defines the ordering of other characters such as Ж, ש, ♫, ∞, ◊, or ⌂.


I was taken back to the days when I worked in the music library, where the librarians had to figure out how to shelve a score whose title was in Russian.

I also learned about the concept of equivalence: different sequences of Unicode characters that can be treated exactly the same for collation purposes. For instance, there are three different ways to represent the angstrom symbol, a capital A with a ring.
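The angstrom case is easy to try out. This is a sketch in Python (not the .NET code from the project) using the standard `unicodedata` module; it shows canonical equivalence through normalization, which is one piece of what the collation algorithm relies on, not the collation algorithm itself.

```python
import unicodedata

angstrom_sign = "\u212B"   # ANGSTROM SIGN
a_with_ring = "\u00C5"     # LATIN CAPITAL LETTER A WITH RING ABOVE
a_plus_ring = "A\u030A"    # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

forms = [angstrom_sign, a_with_ring, a_plus_ring]

# As raw code point sequences, the three strings are all different...
distinct_raw = len(set(forms))

# ...but normalizing to NFC maps each of them to the same sequence.
normalized = {unicodedata.normalize("NFC", s) for s in forms}

print(distinct_raw)                 # 3
print(normalized == {a_with_ring})  # True
```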

UTS #10 points to the Default Unicode Collation Element Table (DUCET), a huge text file that provides, for one collation, all the data to the sorting algorithm for all Unicode characters and their combinations. Here's a snip of what I think is the relevant data for my question:


002A ; [*02FB.0020.0002.002A] # ASTERISK
002B ; [*04B8.0020.0002.002B] # PLUS SIGN
002C ; [*0232.0020.0002.002C] # COMMA
002D ; [*0222.0020.0002.002D] # HYPHEN-MINUS
002E ; [*0266.0020.0002.002E] # FULL STOP
002F ; [*02FF.0020.0002.002F] # SOLIDUS
003A ; [*0241.0020.0002.003A] # COLON
003B ; [*023E.0020.0002.003B] # SEMICOLON



With a little more patience, and some stumbling through the algorithm, I may be able to derive an explanation for the sorting behavior I've observed. But it's too bad that it can't just be reduced to a brief explanation in words, like "ignore a hyphen when it's followed by another alphanumeric character." The problem is just too elaborate now.
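As a first step, the entries quoted above can at least be parsed and lined up by their first (primary) weight. The sketch below is Python, not part of the .NET project, and the names `ENTRY` and `parse_entries` are invented for illustration. It handles only simple lines like the ones in the snip (one code point, one collation element), and it does not implement the collation algorithm. In particular, the leading `*` marks an element as variable, and how variable elements are treated is exactly the part that this sketch leaves out.

```python
import re

DUCET_SNIP = """\
002A ; [*02FB.0020.0002.002A] # ASTERISK
002B ; [*04B8.0020.0002.002B] # PLUS SIGN
002C ; [*0232.0020.0002.002C] # COMMA
002D ; [*0222.0020.0002.002D] # HYPHEN-MINUS
002E ; [*0266.0020.0002.002E] # FULL STOP
002F ; [*02FF.0020.0002.002F] # SOLIDUS
003A ; [*0241.0020.0002.003A] # COLON
003B ; [*023E.0020.0002.003B] # SEMICOLON
"""

# code point ; [ variable-flag weights ] # name
ENTRY = re.compile(
    r"^(?P<cp>[0-9A-F]{4,6}) ; \[(?P<var>[.*])(?P<weights>[0-9A-F.]+)\] # (?P<name>.+)$"
)

def parse_entries(text):
    """Return (character, primary weight, is_variable, name) for each simple entry."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip anything that is not a single-element entry
        primary = int(m.group("weights").split(".")[0], 16)
        char = chr(int(m.group("cp"), 16))
        entries.append((char, primary, m.group("var") == "*", m.group("name")))
    return entries

by_primary = sorted(parse_entries(DUCET_SNIP), key=lambda e: e[1])
for char, primary, variable, name in by_primary:
    print(f"{primary:04X} {char!r} {name}")
```

Ordered this way, the hyphen-minus has the smallest primary weight of the eight characters and the solidus comes next to last, and all eight are flagged as variable.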