Judging the Rumble
Whew! In case you didn't know, last weekend was the Rails Rumble - an annual, 48-hour sprint in which teams of 1-4 people build complete, functioning web applications from scratch. I'd competed in the two prior Rumbles (winning the solo division both years </shameless plug>), but this year I participated in a different way: I was one of the expert panelists tasked with judging the submitted applications prior to public voting. Surprisingly, I found judging to be at least as tiring as competing had been the previous years - the overall quality of submissions this year was noticeably higher than it's been in the past, which made the evaluations that much more difficult.
As a panelist, I made a commitment to providing constructive comments on each of the applications I judged, as well as rating them on design and interface, completeness, innovativeness, and usefulness. I think I did a pretty good job with the comments; I've gotten thanks from several of the contestants for the feedback, which is nice. I wanted to clarify my particular approach to the ratings in this post, however, since it's the most opaque part of the process of judging. Each of us on the panel has a different background, so my interpretation of, say, usefulness might differ dramatically from some other judge's.
Design and Interface
When deciding what score to give for design and interface, I looked at several factors. Pleasant-looking sites rated higher for me than sloppier ones, and I gave a fair amount of weight to sites that went beyond a simple themed prototype look. Sites that used the standard error pages lost points, because that's an easy place to customize and make your site friendlier and more usable. I tried to go easy on minor flaws, since those are bound to occur in such a limited time period - but at the same time, I tried to reward sites that put on those finishing touches. More important than most of those factors, however, was message: if I couldn't understand what a site was for, then its design failed. I ran across several sites that lacked the appropriate contextual help to get me to the finish line, which was a problem.
Completeness
This is a hard category to judge, because who's to say if someone fulfilled their vision or not? Sometimes, it was easier - several teams made the mistake of talking about the features they wanted to implement but couldn't, or pointing out broken functionality. Here's a tip to future Rumble contestants: if you don't finish something, don't call attention to it. Fix it after the competition, but don't advertise your failures to the judges. Other applications had obvious problems - broken authentication and links, that sort of thing. Those were easier to pick out, but you always feel bad when that happens. At the same time, there were applications that gave me everything I could want and more.
Innovation
Here's another tough category, though it's not quite as bad as completeness. When evaluating innovativeness, I tried to figure out if the site represented something really new (either wholly original or a surprising revision, like putting multiplayer Asteroids on the web), or if it was Yet Another Twitter Filterer. The Rumble's interesting because there are usually a few categories of application that have four or five entries - this year, it was stuff-tracking apps (I loaned you X, give it back), secret crush apps, and movie night planning apps. I tend to dock points if you end up in one of those categories, or if you're a standard entry in an already crowded market. Even if you move into a popular niche, though, you can still score highly on innovation. The key to doing so is trumpeting your differences - why are you better? Too few apps do a good job of that, even when they have something worth marketing.
Usefulness
My primary metric for evaluating usefulness is myself: can I see myself using this app tomorrow? Next week? Frequently? Rarely? Obviously, this doesn't fit all sites, so when I'm not part of the target audience I try to project how its intended users might feel about the app - and you could have an app that only needs to be used once to prove its helpfulness. This is the category where I'm most flexible in my ratings - partly because of the odd nature of the category, and partly because I feel it somewhat unfairly penalizes games and other off-the-beaten-path sites.
So, that's how I did it. Hopefully this sheds some light on what can be an anxiety-producing process, and maybe even helps next year's competitors (who may have me to contend with again - I was very jealous of all the fun that seemed to be happening over the weekend!).
Update
Jim Minatel has also posted his thoughts on being an expert panelist over on the Wrox blog. It's nice to see how others were thinking, and I agree with a lot of what he said (though I'm of a different mind on the IE support).