23 December 2011

So long for now

[Photo: The car for D.S. would pull up here.]

My engagement here is complete, but they haven't asked me to return my badge just yet.

20 December 2011

Quick with a Sharpie

This is why I have loved working with this client for nearly three years: everyone here lives at the intersection of unstinting accuracy (I am acquainted with the copy editor who marked up the milk carton) and the willingness to share a joke. This is what Russell Baker called being serious, not solemn.

19 December 2011

Links roundup 2

More interesting links in the read-and-file pile.

16 December 2011

Soft launch

We're rolling out a new look for live events (music now, but next year, who knows?). The Tiny Desk Concerts are in the process of being migrated to this new wide-screen theater-like experience. As usual, my contributions were down in the gritty works, rather than up front. But I was very happy to be on the team.

13 December 2011

Links roundup

Lots of interesting material accumulating in my Instapaper account that I need to read and/or shuffle into my bookmarks repository and/or link to here.
  • Swizec Teller and his commenters have been working on coding a Turing machine in JavaScript in as little source code as possible. (I was about to write "as compactly as possible," but optimization in space and time of this little beastie is a project for another day.)
  • Man, I need me a Directive 1.
  • Wonderful vintage video of LEO, Lyons Electronic Office, placed into service 17 November 1951. LEO was built, not by a business machines manufacturer, but by J. Lyons & Co., a large British baking firm and chain of tea shops.
    LEO was such a success that Lyons set up a commercial subsidiary to sell spare time on LEO to other businesses, including the Ford Motor Company, which used it to process the payroll for the thousands of workers at its U.K. plant. Later, Lyons also built entirely new LEOs and sold them to other blue-chip companies of the era. In total, more than 70 LEOs were built, with the last remaining in service until the 1980s....
  • Peter Norvig gives a balanced appraisal of Christopher Strachey's "System Analysis and Programming," written for the September 1966 issue of Scientific American. In the original article (available online), Strachey walks through the process of analyzing, designing, and coding a program to play checkers. Unfortunately, Strachey probably never compiled (by hand: at the time, his high-level CPL language had no compiler, nor even a complete formal description) and executed his demonstration program, as it has typos and bugs. But the trick (borrowed from Arthur Samuel) that he uses to number the squares of a checkerboard is quite clever.
(Links via @NPRTechTeam, The Code Project, and others.)
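Norvig doesn't spell out the numbering trick, so here is a minimal sketch of the general idea. The layout is my own choice and is not necessarily the exact numbering that Strachey or Samuel used. The 32 playable squares are numbered row by row, four per row, but one "ghost" number is skipped after every second row. Because of that gap, every forward diagonal move adds either 4 or 5. A move that would run off the side of the board lands on a ghost number (or past the end) instead of on a real square, so the move generator needs no special edge tests. The function names are hypothetical.

```javascript
// Sketch of ghost-square checkerboard numbering (layout is an assumption).
// Row 0 is the near side; dark squares sit at odd columns in even rows
// and at even columns in odd rows.
function squareNumber(row, col) {
  if (row < 0 || row > 7 || col < 0 || col > 7) return null; // off the board
  if ((row + col) % 2 === 0) return null;                    // light square
  // Four squares per row, plus one skipped ("ghost") number per two rows.
  return 1 + row * 4 + Math.floor(col / 2) + Math.floor(row / 2);
}

// Every number that names a real square: 1..35 minus the ghosts 9, 18, 27.
const VALID = new Set();
for (let r = 0; r < 8; r++) {
  for (let c = 0; c < 8; c++) {
    const n = squareNumber(r, c);
    if (n !== null) VALID.add(n);
  }
}

// Forward moves are pure arithmetic: add 4 or 5 and keep only the results
// that name real squares (ghosts and numbers past 35 are off the board).
function forwardMoves(n) {
  return [n + 4, n + 5].filter(m => VALID.has(m));
}
```

Backward moves are the mirror image (subtract 4 or 5). The highest number used is 35, which leaves room for one bit per square in a 36-bit word.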

07 December 2011

dmr

Nice recap by Warren Toomey of the genesis of the ultimate under-the-radar project, Unix.
But [Ken] Thompson and the others helping him knew that the PDP‑7, which was already obsolete, would not be able to sustain their skunkworks for long. They also knew that the lab's management wasn't about to allow any more research on operating systems.

So Thompson and [Dennis] Ritchie got creative. They formulated a proposal to their bosses to buy one of DEC's newer minicomputers, a PDP-11, but couched the request in especially palatable terms. They said they were aiming to create tools for editing and formatting text, what you might call a word-processing system today. The fact that they would also have to write an operating system for the new machine to support the editor and text formatter was almost a footnote.

05 November 2011

This week's mystery

I'm building a little interface between my client's CMS and Flickr. The new tool will harvest images that the client (and its photographers on assignment) has posted to Flickr and will incorporate them into the CMS, with editors adding further metadata after publication.

First let me say how impressed I am overall with the Flickr API. The services are comprehensive; names of things are consistent and parallel; each service has its own web-based test harness so you can try it out. And the API Just Works.

Now, to get the full-size images originally posted to Flickr, my interface app has to authenticate to Flickr using the OAuth protocol. This is somewhat fiddly code, but Flickr's documentation explains the process flow quite well, and with the help of the Scribe library from Pablo Fernandez, I had my app working in short order.
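The fiddliest part of OAuth 1.0a is signing each request. Scribe handles this in Java; what follows is a minimal Node.js sketch of the HMAC-SHA1 signing step as RFC 5849 describes it. It is not Scribe's code or Flickr's. It assumes the request parameters (the oauth_* values plus any query parameters) arrive as a flat object with no repeated keys. The function names, endpoint URL, and parameter values in the test are placeholders.

```javascript
const crypto = require('crypto');

// Percent-encode per RFC 3986, as OAuth 1.0a requires. encodeURIComponent
// leaves ! ' ( ) * unescaped, so those five are escaped by hand.
function percentEncode(value) {
  return encodeURIComponent(value).replace(/[!'()*]/g,
    ch => '%' + ch.charCodeAt(0).toString(16).toUpperCase());
}

// Signature base string: METHOD & encoded base URL & encoded parameter string.
// The parameter string is the encoded key=value pairs sorted by key and joined
// with '&' (an object has no duplicate keys, so sorting by key suffices).
function signatureBaseString(method, baseUrl, params) {
  const paramString = Object.keys(params)
    .map(k => [percentEncode(k), percentEncode(String(params[k]))])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .map(([k, v]) => k + '=' + v)
    .join('&');
  return [method.toUpperCase(), percentEncode(baseUrl), percentEncode(paramString)].join('&');
}

// HMAC-SHA1 over the base string, keyed by "consumerSecret&tokenSecret".
// The token secret is empty when requesting the initial request token.
function hmacSha1Signature(baseString, consumerSecret, tokenSecret = '') {
  const key = percentEncode(consumerSecret) + '&' + percentEncode(tokenSecret);
  return crypto.createHmac('sha1', key).update(baseString).digest('base64');
}
```

The resulting signature goes into the oauth_signature parameter of the request. Keeping percent-encoding in one helper matters, because a single character encoded differently on the two ends is enough to make the server reject the signature.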

Until this past week, that is. A few days ago I was actively working on my app and doing a lot of debugging and banging through the user authorization page. And then, sporadically, instead of the expected "[David's app] wants to link to your Flickr account." the page would display this message: "Oops! Flickr doesn't recognise the "oauth_token" this application is trying to use." Long story short, I tried just about everything, wiggling all the wires that I could, logging out of Flickr, restarting my Tomcat and Apache—and then sometimes the right page would appear and sometimes it wouldn't.

It turns out that other developers are experiencing the same problem. See, for instance, this discussion thread on the Flickr API group, as well as this post to Scribe's support board. No replies from Flickr's tech team as to what's going on. I suspect that Flickr is silently throttling the number of requests to the authorization page. This may not be a problem for my client when this tool goes into production, but it makes it dang hard to code and test.

28 October 2011

two Eds

Ed the fish, Betta sp., graciously agreed to pose with me and one of the honors the team has received recently, a 2011 Edward R. Murrow Award granted by the Radio Television Digital News Association. The award recognizes NPR.org in the Radio-Network Market for excellence as a Broadcast Affiliated Website.

24 October 2011

5 pages

Sometimes a photo of a whiteboard isn't such a creative, cheerful thing. The horror! the horror! As nearly as I can read it, one of the tables in this query is called BEST_MURDER.

Two for one

With Zach Brand as co-author (and curator of penguin images) and with editing by Kim Bryant and Wright Bryan, I put together a post that describes the team's recent forays into low-impact skunkworks, known as Serendipity Days. Siteworx graciously accepted the material as a cross-post.

20 October 2011

Compact

Lea Verou explains some simple, clever ways to do your own bitpacking in JavaScript, including a sneaky way to break up text into 16-character chunks with a regular expression match.
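I haven't reproduced Verou's code here. This is my own minimal sketch of the two ideas, with hypothetical function names. Boolean flags can be packed into the bits of a single integer with JavaScript's bitwise operators, which work on 32-bit integers, so a number holds at most 32 flags. A global regular-expression match with a {1,16} quantifier chops text into pieces of at most 16 characters.

```javascript
// Pack up to 32 booleans into the bits of one integer (flag i -> bit i).
function packFlags(flags) {
  if (flags.length > 32) throw new RangeError('at most 32 flags fit in one 32-bit integer');
  return flags.reduce((bits, flag, i) => (flag ? bits | (1 << i) : bits), 0);
}

// Recover `count` booleans from a packed integer (>>> avoids sign trouble at bit 31).
function unpackFlags(bits, count) {
  return Array.from({ length: count }, (_, i) => ((bits >>> i) & 1) === 1);
}

// Break text into chunks of at most 16 characters. [\s\S] (rather than .)
// also matches newlines; match() returns null for an empty string.
function chunk16(text) {
  return text.match(/[\s\S]{1,16}/g) || [];
}
```

"Characters" here means UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, say) can be split across two chunks.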

18 October 2011

What if...?

Registration is still open for the 2011 Grace Hopper Celebration of Women in Computing. This year's conference will be held in Portland, Ore., in November. Career fair, workshops, and more.

25 September 2011

STEM outreach

Ari Levey profiles Maria Klawe, president of Harvey Mudd College (one of the Claremont colleges). Since her arrival in 2006, the percentage of female computer science majors at Mudd has more than tripled, to 42%.

(Link via Felicia Day.)

Clear as a bell

Saul Stahl explains how the Gaussian distribution, friend and bane of every new statistics student, came to be, well, normal. It's far from obvious that the best estimator of some unknown (but presumably fixed, measurable) quantity is the one that minimizes the squared errors of all the imperfect observations, but that's what we do when we compute a mean. And indeed, in the 17th and 18th centuries some scientists (like Robert Boyle of the Royal Society) argued that averaging all observations was a bad way to summarize data:

... experiments ought to be estimated by their value, not their number; ... a single experiment... may as well deserve an entire treatise.... As one of those large and orient pearls... may outvalue a very great number of those little... pearls, that are to be bought by the ounce...


Read as much or as little of the math in Stahl's paper as you care to.

(Link via The Endeavour.)
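The step from least squares to the mean takes one line of calculus. For observations x_1 through x_n, pick the estimate m that minimizes the sum of squared errors S(m):

```latex
S(m) = \sum_{i=1}^{n} (x_i - m)^2, \qquad
S'(m) = -2 \sum_{i=1}^{n} (x_i - m) = 0
\;\Longrightarrow\; n\,m = \sum_{i=1}^{n} x_i
\;\Longrightarrow\; m = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
```

Because S''(m) = 2n > 0, this stationary point is a minimum. Note that every observation enters with equal weight, which is exactly what Boyle objected to.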

09 September 2011

31 August 2011

Wish I had learned this

John D. Cook reviews Peteris Krumins's new e-book, Awk One-Liners Explained. The need for one-line programs accounts for the long-tail popularity of an old command-line language like Awk. As Cook writes,

If something takes more than one line of awk, I probably don’t want to use awk.


My copy of Aho, Kernighan, and Weinberger has this one-liner pencilled on the back page. I've never found a simpler way to find the longest pathnames in a directory tree:


find . | awk '{print length($0) " " $0}' | sort -r -n


As I remember, I once prototyped a COBOL pretty-printer in Awk. That was more than a one-liner.

26 August 2011

Talking to the duck

Harry Roberts offers coding conventions for writing CSS. His tip about pattern-matching selectors with regular expressions is nifty.

11 August 2011

Not many happy campers

Erin Griffith surveys media companies and finds very few of them that are successful with the CMS (content management system) they use—be it open source, proprietary, or purpose-built.

“When you try to build a product that works for everybody, it works for nobody,” a former AOL employee says.


(Link via The Morning News.)

27 July 2011

Geek girls return

Anna Lewis interviews Fog Creek Software's Leah Hanson, currently that organization's only woman intern.

Q: As you know, Fog Creek would like to attract and hire more developers who are women. Is there anything you’d recommend we do in our recruiting process to attract more women?

Leah: ...one of the things that happens is that women don’t even think they’re qualified for something because it’s advertised in competitive language. The language of competition not only doesn’t appeal to many women, it actually puts them off. Google advertises their Summer of Code with very competitive language. In 2006, GNOME received almost two hundred GSoC applicants – all male. When GNOME advertised an identical program for women, but emphasizing the opportunities for mentorship and learning, they received over a hundred highly qualified female applicants for the three spots they were able to fund.




Lewis leads the post with an excerpt from the April 1967 Cosmopolitan, which makes the point that programming is very Cosmo girl, especially when you get to use the cool light pen. And for a time, women were attracted to the field, but the proportion of female CS majors peaked in the mid-1980s, when Ronald Reagan was president and hacking moved to the desktop.



(Link via The Code Project.)

Just in time for Adam Mansbach's 15 minutes

Ladies and gentlemen, may I present the new NPR Books home page (with links to landing pages for each book and author featured)? As usual on my projects for this client, I was on the team that built back-end tools to manage the content (and automate extraction of it from third-party sources), while other specialists did the work to make the content look good to the outside world.

20 July 2011

Antikythera recap

Michael Edmunds and Tony Freeth review the computational tools and techniques used to analyze the Antikythera Mechanism, ranging from the exotic (Tom Malzbender's polynomial texture maps) through the imaginative (DNA sequence matching tools) to the mundane (Excel macros).

13 July 2011

Go geek girls!

Congratulations to the three winners of Google's first worldwide science fair:

* Lauren Hodge (age 13-14): Hodge studied the effect of different marinades on the level of potentially harmful carcinogens in grilled chicken
* Naomi Shah (age 15-16): Shah endeavored to prove that making changes to indoor environments that improve indoor air quality can reduce people's reliance on asthma medications
* Shree Bose (age 17-18): Bose discovered a way to improve ovarian cancer treatment for patients when they have built up a resistance to certain chemotherapy drugs

Survey design 101

Back when I worked at Vovici, I saw my share of badly designed online surveys. Great sweeping masses of matrix questions were always popular, alas. One of the services we provided was consulting with our customers to make their surveys more sensible and thereby to improve completion rates.

Because I once had this professional interest, and because I actually have a graduate degree with a concentration in marketing and market research, I try to respond to solicitations to take a survey. And if I see that it's poorly designed, I have no compunction about bailing out after the first page. Forcing an answer to a question where my response is really "don't know/don't care" especially peeves me.

DCist's 2011 readership survey, therefore, comes as a pleasant surprise. It's short and to the effing point (three pages plus the thank-you page); that is, it's focused on getting business intelligence in just a few areas. All the intrusive (for some people) demographic questions are on the last page, where they should be, so the respondent can skip them if he chooses to. This is not the sort of survey you usually see hosted by the free service SurveyMonkey. The only quibble I have is that the demographic questions are required-response.

But it's this sequence of "don't care" response alternatives that tickles me.


8. What device(s) do you use to get your TV programming? Check all that apply.


  • Computer

  • A television

  • Apple TV/Google TV/Boxee

  • Game Console

  • Smartphone

  • Tablet (iPad, Playbook, etc.)

  • I don't watch TV

  • Other (please specify)



9. What service(s) do you use to access your TV shows? Check all that apply.


  • Local network TV

  • Cable/Satellite/FIOS TV

  • Network sites (NBC.com, ABC.com)

  • Free streaming video sites (Hulu, Veoh)

  • Premium streaming video sites (Hulu, Netflix)

  • Subscriber exclusive apps (HBO Go, TWCable App, etc.)

  • I don't watch TV

  • Other (please specify)



10. What kind of TV programming do you enjoy? Check all that apply.


  • Drama

  • News

  • Sci-fi

  • Sports

  • Movies

  • Educational

  • Reality

  • Comedy

  • Food & Home

  • I told you I don't watch TV, dammit!

  • Other (please specify)


28 June 2011

Quick reaction

We have a little corner of the application that's a low-volume, lightweight workflow and e-mail notification system. On demand, the web app assembles an e-mail message containing a summary of the article that the user has written; it then bundles that message into a mailto: URL, attaches the URL to a hidden <a> tag on the page, and simulates a click of the tag's hyperlink. The browser routes the request to the appropriate e-mail client (usually Outlook), the user adds any other desired information to the message, and sends it from the e-mail client.

We found that we had to fudge things a bit so that the various browsers we support don't complain about this indirection. The production JavaScript that was working smoothly up until last week follows. (The method name is a little funky, but I think I had a good reason for it at the time.)



//////
//
// Open a new window for an e-mail message with the specified properties.
//
// Arguments:
// subject, body, toAddress: expected to be URI encoded, special characters scrubbed
// mailtoLink: DOM object (a hidden <a> element) to attach the mailto: link to
//
//////
CLIENT.Utilities.openEmailBaseWindow = function (subject, body, toAddress, mailtoLink) {
    try {
        var url = 'mailto:' + toAddress +
            '?subject=' + subject +
            '&body=' + body;
        if (mailtoLink.click) {
            // IE path: avoids the 'sending by e-mail' warning
            mailtoLink.href = url;
            mailtoLink.click();
        } else {
            // Firefox path
            var form = document.forms['mailto'];
            form.action = url;
            form.submit();
        }
    } catch (e) {
        // write error message to the log; show error to the user;
        // drop hints that the browser may not be properly config'd for mailto:
    }
};




This code uses the presence of the .click() method to detect browser capabilities. But this month's release of Firefox 5, which apparently caught some people unprepared, broke this JavaScript. Firefox 5 now defines a .click() method for anchor tags. So, in the code passage above, Firefox 5 takes the true path on the if statement (commented as the Internet Explorer path). Not a problem in itself, except that in this context, the .click() method doesn't do anything—more specifically, it does not cause the browser to navigate to the new URL.

Our current workaround is to disable the if test and to always use the form submission technique. Using the form, Internet Explorer 8 under Windows 7 produces two separate warning popups, but we can live with this wart: few of our users depend on IE.
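In code, the workaround amounts to something like this minimal sketch. It is written as standalone functions rather than as the production CLIENT.Utilities method, and the names buildMailtoUrl and openEmailViaForm are hypothetical. As in the original, the arguments are assumed to be URI-encoded already. The form is an optional parameter, defaulting to document.forms['mailto'], so the logic can be exercised outside a browser.

```javascript
// Assemble the mailto: URL. As in the original method, the arguments are
// assumed to be URI-encoded already, with special characters scrubbed.
function buildMailtoUrl(toAddress, subject, body) {
  return 'mailto:' + toAddress + '?subject=' + subject + '&body=' + body;
}

// Workaround: no .click() feature test; every browser takes the form path.
// `form` defaults to the page's hidden form named "mailto".
function openEmailViaForm(subject, body, toAddress, form) {
  form = form || document.forms['mailto'];
  form.action = buildMailtoUrl(toAddress, subject, body);
  form.submit();
}
```

The broader lesson is that a method's existence says nothing about what it does in a given context. A capability test like `if (mailtoLink.click)` can start giving the wrong answer the day a browser adds the method.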

16 June 2011

Murrow Awards

npr.org is the recipient of the Radio Television Digital News Association's 2011 award for best web site in the network radio grouping, along with awards for audio reporting: hard news, audio investigative reporting, and (with Youth Radio) audio news series.

15 June 2011

Content management ecosystems

Patrick Cooper talks to Matt Thompson in a good piece that paints the big picture of which a news organization's CMS is just one element.
We’ve finally begun to accept that no single CMS can handle all of a digital news organization’s content functions. A good content management system today is designed to interact with lots of other software. There’s now a genuine expectation that a CMS will play nicely with videos stored on YouTube, or comments managed by Disqus, or live chats embedded from CoverItLive. Other environments such as Facebook, Twitter and Tumblr come with their own suites of tools. And increasingly, what we call a “content management system” is actually a combo of multiple tightly-integrated systems.

31 May 2011

No argument here

Andrew Binstock rues that, 33 years after the first edition of The C Programming Language, Brian Kernighan and Dennis Ritchie's approach to teaching a language is still the gold standard.

The second thing K&R omits is spoon feeding. You have to think as you work through it. All the information is there, but you're forced to engage the language through the examples to get what you need. The authors expect you to be an attentive reader. As a result, you can move quickly through the language because the book supports you, rather than forcing you to read pages that add little to your comprehension.

30 May 2011

A great day in Arlington

[Photo: reunion]

Though it lacks a lot of the swing of the original, 'twill serve. Gary Long and Greg Lupfer rounded up two dozen of their former staff for a reunion. These are folks who worked for Lupfer & Long, Inc. and/or L&L Software and/or L&L Products from the late 1970s into the mid-1980s. The various companies operated out of McLean, Va., and Hanover, N.H., providing professional services and off-the-shelf software products for accounting and general database applications. Our computing platforms included the mainframes of the day under timesharing; in the 1980s, we moved onto minicomputers from Digital Equipment, Prime, Wang, and Hewlett-Packard. Remnants of the companies made the transition to desktop computing and PCs toward the end of the decade.

Top row, L-R: Louise, Ceil, Peter (seated); next row: Steve, Ken, Anne, Susan; next row: Eric, John, Jenny, Gary, Hao; next row: ?, Joanne, Amy, Dave J., Donna, Bill; next row: Elizabeth, Julia, David G.; bottom left: Greg; bottom right: Aubrey.

29 May 2011

Shades of gray

Marcin Kozak proposes a hybrid design for scatterplots, using the best ideas of Edward Tufte and William Cleveland. Kozak's improvement really shines when applied to multipanel plots.

27 May 2011

Fonts for coders: 3

Dan Benjamin updates his top-ten list of monospaced fonts for coding work. I rather like the swingin' look of Monofur.

Benjamin is more interested in how fonts work at small point sizes than I am. The smallest type that my tired old eyes can tolerate, for reading the excessively wide log files that this project generates, is Bitstream Vera Sans Mono at 8 points.

(Link via The Code Project.)

18 May 2011

What? no man pages?

Fabrice Bellard has built a Linux emulator in JavaScript. One dependency: the draft Typed Arrays feature, supported by Firefox 4 and Chrome 11.

(Link via ReadWriteWeb.)

16 May 2011

Supercalifragilisticexpialidocious

Colleague Todd directed our attention to the CSS word-wrap property. On newer browser platforms, it solves the problem of long unhyphenated words set in a narrow inset column or sidebar.

09 May 2011

Not to mention Rosalind and Grace

...you don't become great by trying to be great. You become great by wanting to do something, and then doing it so hard that you become great in the process.

xkcd offers Mother's Day wishes.

05 May 2011

Not just slinging rivets

Steven Cherry interviews LeAnn Erickson, director of Top Secret Rosies: The Female Computers of World War II. The difference engines that were used to compute artillery tables were cranked by workers of the feminine gender. Something as regrettable as waging war, throwing projectiles at people, at least had the favorable side effect of providing a few brain-job opportunities for women.

15 April 2011

Just sayin'

When the exec for your client's unit brags about how productive the team is, you have to link to it: Kinsey Wilson talks to Andrew Phelps of the Nieman Journalism Lab:
NPR’s digital team works on a bold schedule: Programmers work on two-week coding cycles to encourage rapid development. These so-called sprints encourage both failure and innovation. It’s what allowed NPR to develop its iPad app in one month, or two sprints, just in time for the iPad’s launch in April 2010. (That app just surpassed one million downloads.)

I asked Wilson, who used to run digital operations at USA Today, why it can be so difficult for other large organizations to churn out new projects — and how he’s able to do it now. “From my perspective, it comes from long, hard experience doing it badly,” he said. “Resources are always tight and so there’s probably a fear of burning too many cycles on something that…doesn’t go right.” But he said the rapid-release schedule encourages unconventional projects like I Heart NPR, and very few ideas are swatted down.

“The digital media staff here is about half the size of the one I had at USA Today and probably produces twice the output,” he said.


(Link via Javaun.)

08 April 2011

Scribing

More secrets of the Antikythera Mechanism revealed: last year James Evans et al. published evidence that subtle asymmetries in the zodiacal dial markings match slight differences in the sun's yearly apparent motion across the sky. Lisa Hopkins explains.

(Link via ReadWriteWeb.)

31 March 2011

Inklings

For the watch-later pile: Klint Finley rounds up several presentations on R. Hmm. I wonder what sort of project I could make with R for the statistical package and the Bird Phenology Project's data set.

28 March 2011

Might not have helped with the scores, though

Man, if we'd had this tool when I had a work-study job with the university music library... well, I wouldn't have had a job. Audrey Watters points to a demo by Matt Hodges of a prototype AR application for Android tablets. It automates a time-consuming, tedious, knee-wrinkling task: shelf reading.

01 March 2011

O tempora

Alas, my scrum team, once named for a Saturday morning cartoon series that I don't remember, is now named Jedi Squirrels, in honor of a bit of internet-dispersed Photoshoppery (that also turned into a troublesome test case for the CMS).

09 February 2011

"A high degree of chattiness"

Daniel Jacobson of Netflix explores strategies for coping with the explosive growth in API requests.
...we will also be looking at ways to handle partial response through the API. Our goal in this approach will be to conceptualize the API as a database.... We want the API to be able to answer questions with the same degree of variability that SQL can for a database.
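Jacobson doesn't show code, so here is a purely hypothetical minimal sketch of the simplest form of partial response. The client names the fields it wants (say, in a fields query parameter), and the server returns only those instead of the whole record. The function name, the parameter format, and the sample record in the test are illustrative assumptions, not Netflix's API.

```javascript
// Hypothetical partial-response filter: return only the requested fields.
// fieldsParam is a comma-separated list such as "title,year"; if it is
// missing or empty, the full record is returned.
function partialResponse(record, fieldsParam) {
  if (!fieldsParam) return record;
  const wanted = fieldsParam.split(',').map(f => f.trim()).filter(Boolean);
  const result = {};
  for (const field of wanted) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      result[field] = record[field]; // unknown field names are silently skipped
    }
  }
  return result;
}
```

This is only the projection half of "SQL-like" variability. Filtering and nested selection would need a richer query syntax.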

03 February 2011

What did I do?

Wright Bryan is generous with the credits for Wednesday's update to the Shots blog. I did some back-end work, but I also (for once) worked on a module that you can see in the final product: I wrote the Perl script that scrapes links from Scott Hensley's Tweeted Times topics and drops them into the right-rail "popular news from the field" sections of the mega-category pages, like this one. Marc provided the insights; Todd and David W. provided the style.

01 February 2011

League tables

Andrew Binstock analyzes this year's release of the TIOBE Programming Community Index, a rough and ready measure of programming language popularity based on search engine hits. Bubbling under the hot 10, Ada and RPG are resurgent. Either that, or the methodology still needs some tweaking. At position 44 on the chart, PL/I is tanned, rested, and ready for its comeback.

31 January 2011

Fun with MySQL

The problem: Given a pair of tables in a header-detail relationship (or any one-to-many relationship), write a query that returns one row for each header row, including a column that is a comma-delimited vector of the subkeys that are found in the detail rows (for instance, line item numbers). For example:

Table INVOICE

Invoice  Date
-------  ----
AZ456    1-Jan-2010
AZ457    2-Jan-2010
AZ459    2-Jan-2010

Table INVOICE_LINE

Invoice  Line  Part
-------  ----  ----
AZ456    1     ZQX3
AZ456    2     WF612
AZ457    1     TY301
AZ459    3     TY301

Colleague Jared gave me this screwdriver from the MySQL toolbox: the GROUP_CONCAT() function.

SELECT I.Invoice, I.Date, GROUP_CONCAT(IL.Line ORDER BY IL.Line) AS Vector
FROM INVOICE I
INNER JOIN INVOICE_LINE IL ON IL.Invoice = I.Invoice
GROUP BY I.Invoice, I.Date

The results:

Invoice  Date        Vector
-------  ----        ------
AZ456    1-Jan-2010  1,2
AZ457    2-Jan-2010  1
AZ459    2-Jan-2010  3

In particular, we used this query to produce a table of photo image assets retrieved from the CMS (that met certain search criteria against the metadata like caption, photographer, agency). Each image asset has one or more crops in various aspect ratios (standard 4:3, square, wide 16:9). The search results table includes a vector of which crops are available for each image asset. In the UI, this vector is rendered as a nifty, compact row of gray and black icons designed by colleague Vincent.

This technique works for any other attribute of the detail table, not just keys; in that case you may want to add the DISTINCT keyword inside GROUP_CONCAT(). A SEPARATOR clause overrides the default comma delimiter, for example GROUP_CONCAT(DISTINCT IL.Part ORDER BY IL.Part SEPARATOR '; ').
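For readers who want to see the grouping spelled out, here is a minimal JavaScript sketch of what the query computes. It inner-joins the detail rows to their header rows and joins each group's line numbers into a comma-delimited string. It assumes the rows are already in memory as plain objects and that the concatenated field is numeric, because the values are sorted numerically. groupConcat is a hypothetical helper, not a MySQL API.

```javascript
// In-memory equivalent of
//   SELECT ..., GROUP_CONCAT(detail.field ORDER BY detail.field) ... INNER JOIN ... GROUP BY key
// Assumes `field` holds numbers (they are sorted numerically).
function groupConcat(headers, details, key, field, separator = ',') {
  const groups = new Map();
  for (const row of details) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row[field]);
  }
  return headers
    .filter(h => groups.has(h[key])) // INNER JOIN: headers without detail rows drop out
    .map(h => ({
      ...h,
      Vector: [...groups.get(h[key])].sort((a, b) => a - b).join(separator),
    }));
}

// The sample tables from above.
const INVOICE = [
  { Invoice: 'AZ456', Date: '1-Jan-2010' },
  { Invoice: 'AZ457', Date: '2-Jan-2010' },
  { Invoice: 'AZ459', Date: '2-Jan-2010' },
];
const INVOICE_LINE = [
  { Invoice: 'AZ456', Line: 1, Part: 'ZQX3' },
  { Invoice: 'AZ456', Line: 2, Part: 'WF612' },
  { Invoice: 'AZ457', Line: 1, Part: 'TY301' },
  { Invoice: 'AZ459', Line: 3, Part: 'TY301' },
];
```

Calling groupConcat(INVOICE, INVOICE_LINE, 'Invoice', 'Line') yields the same three rows as the query results shown above.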

14 January 2011

12 January 2011

Close enough

Michael Donohoe describes an update to the New York Times's Emphasis feature, which enables readers and bloggers to deep-link to individual paragraphs within a story.

What's most interesting is the solution that the team devised to automatically assign durable keys to paragraphs of text, in a dynamic news environment when grafs are being added, subtracted, and moved around post-publication.

Each six-character key refers to a specific paragraph. Keys are generated by breaking a paragraph into sentences. Then, using the first and last sentences (which are sometimes the same), the key-generation code takes the first character from the first three words of each sentence.

* * *

For example, let’s look at this paragraph — the relevant characters for key generation are [set off as code]:

Last summer, after 10 years of debate and interagency wrangling, a prestigious committee from the National Academy of Sciences gave highest priority among big space projects in the coming decade to a satellite telescope that would take precise measure of dark energy, as it is known, and also look for planets beyond our solar system. The proposed project goes by the slightly unwieldy acronym Wfirst, for Wide-Field Infrared Survey Telescope.


The result is this key: LsaTpp
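Here is a minimal JavaScript sketch of the key scheme as Donohoe describes it; it is not the Times's Emphasis source, and the function names are hypothetical. The sentence splitter is deliberately naive: a period, question mark, or exclamation point followed by whitespace ends a sentence. Abbreviations such as "Dr." would fool it, and a word that begins with a quotation mark would contribute the quotation mark to the key.

```javascript
// Naive sentence splitter: a sentence ends at ., ! or ? followed by whitespace.
function splitSentences(text) {
  return text.trim().split(/(?<=[.!?])\s+/).filter(s => s.length > 0);
}

// Partial key: the first character of each of the sentence's first three words.
function partialKey(sentence) {
  return sentence.split(/\s+/).filter(w => w.length > 0)
    .slice(0, 3).map(w => w[0]).join('');
}

// Paragraph key: partial key of the first sentence followed by that of the
// last sentence (the same sentence twice for a one-sentence paragraph).
function paragraphKey(text) {
  const sentences = splitSentences(text);
  if (sentences.length === 0) return '';
  return partialKey(sentences[0]) + partialKey(sentences[sentences.length - 1]);
}
```

Applied to the example paragraph above, paragraphKey reproduces the key LsaTpp.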


The matching process uses two means to perform a fuzzy equality match between the key derived from the current story text and the key on the hyperlink. If there is no perfect match, the 3-character partial keys for the first and last sentence are compared, in hopes of a match on either one. In the event that rewriting has changed the text that affects both parts of the keys, the Levenshtein distance between the linking key and the text's key is computed. If it's small enough, it's considered to be a match.

The Levenshtein metric makes perfect sense in this context, because the two keys have moved away from one another precisely because of insertion, deletion, and substitution. Andrew Hedges supplies a web app and JavaScript source for computing Levenshtein distances.
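Here is a minimal sketch of the distance computation and of the three-stage match described above. levenshtein() is the standard dynamic-programming algorithm that keeps only two rows of the table (not Hedges's code). keysMatch() and its default threshold of 2 edits are my assumptions, since the post doesn't say how small "small enough" is. The function compares a single pair of keys.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions, and substitutions that turn a into b.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical matcher: exact key, then either 3-character half, then edit distance.
function keysMatch(linkKey, textKey, maxDistance = 2) {
  if (linkKey === textKey) return true;
  if (linkKey.slice(0, 3) === textKey.slice(0, 3)) return true; // first sentence unchanged
  if (linkKey.slice(3) === textKey.slice(3)) return true;       // last sentence unchanged
  return levenshtein(linkKey, textKey) <= maxDistance;
}
```

For example, keysMatch('LsaTpp', 'LsbTpq') fails both half-key comparisons but succeeds on distance, because two substitutions separate the keys.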

My only qualm with this strategy is that the keys it generates are not evenly distributed. There will be a lot of 3-character partial keys that start with "T" and "A."