15 May 2018

Judging expertise

One of the reasons researchers focus on experts instead of expertise is that it seems easier to find the one than the other — you can look to community-acclamation, or to the gold medalists, or to people with 2500+ Elo ratings and pick out “experts,” but expertise itself is harder to see — you can’t just rely on outcomes, because expertise is just one of the factors that influence performance.

But is there a way to judge expertise itself? Ideally, independently (to some extent) of outcomes? The expert-performance approach that Ericsson and Smith (1991) advocate, which looks for reliably superior performance on domain-representative tasks in a laboratory setting, is one attempt, but it’s far from clearly the best. Representative tasks are not the same as performance in the domain; solving a chess problem is not the same as staring down a game in progress against a skilled opponent, and predicting which way a soccer player will pass when a video is paused is not the same as setting yourself up to intercept that pass in the flow of a World Cup match.

So what would an ecologically valid approach look like? We might say that expertise is exhibited when you’ve got consistently superior performance in the domain across many contexts, so the golfer who wins only once in her career is judged to have less expertise than the one who wins year after year, on different courses, against different opponents, in different weather, etc. This seems like a more accurate approach, but it’s often not practical because the feedback cycles are too long; you can run subjects through scenarios in a lab much faster than you can run full chess or golf tournaments. Additionally, if you extend the evaluation time you aren’t really measuring expertise at any one point, since the subject may be learning as she goes.

Fundamentally, the problem with both approaches is that they’re based solely on outcomes. Ericssonian researchers look at judgements made and actions taken in the lab, measured against some ideal response (chess problems with a known best solution, for instance); ecological researchers look at actual games won and lost. What seems to be missing is a focus on how those outcomes are achieved. We don’t (usually) want to say that a chess player who gets into his opponent’s head and throws him off his game is more expert at chess than the person he beats (though that might be part of someone’s expertise in another domain, like the ability to put opponents on tilt in poker).

How, then, can we factor the methods and mechanisms people use to succeed into our judgements of expertise? Community recognition is one option: we look to the people identified as top performers by their peers, relying on the community of practice to capture the history of performances and, implicitly, to be able to identify the underlying factors, the expertise, that led to past successes. Looking to the community also lets us factor in performance across different contexts, which helps minimize the effect of confounding variables.

A major problem with community recognition, though, is that it is subject to the same stereotypes and biases that afflict our judgements of other people. You see this clearly in the professional sports scout community (who aren’t a peer group for the athletes they judge, but illustrate many of the same benefits and problems of this approach); scouts have a long history of experience watching the top performers in their sport, and of making judgements about the potential of young athletes — but they’re wrong a surprising amount of the time because a kid “doesn’t have the right look,” not seeing the true extent of the underlying expertise (or potential).

This might incline us to swing the pendulum back the other way, and focus more on observable outcomes instead of on the invisible, or “intangible,” qualities that, e.g., scouts look for. A pure focus on outcomes can be twisted around to exactly the opposite effect, though. For example: surgeons in some jurisdictions are given ratings based on their patients’ outcomes. The “best” surgeons are those with the lowest mortality, judged (by the press and the public) as the most expert. As it turns out, however, the simple binary of mortality combined with the publicity of the rankings means that the judgements are gameable. In such systems, cunning surgeons start turning away the most challenging patients for fear that their rating will fall, and the “best” surgeons (by other measures) end up taking more than their share of the hardest cases, which increases their mortality rates, which implies to the world at large that they’re less expert than their peers. This mismatch between judgement and reality can actively harm patients, which is the worst-case outcome.

Sadly, there doesn’t appear to be a universally-best option here. The expert-performance approach is (comparatively) easy; the ecological approach lets us average our judgements over a wide range of contexts; the community approach brings the possibility of understanding how results are achieved beyond just the final outcome.

I think the best option would start by exploring what expertise actually is for a given domain. What are the components — the knowledge, reflexes, and other factors acquired through training — that ground success in, say, playing the cello? And, once we identify those, how can we test for them (in or out of the lab)?