Better estimating: SPIES (Subjective Probability Interval Estimates). Instead of thinking about the two endpoints, you estimate the probability of several possible outcomes – several intervals within the entire range of possible values. You first set up the intervals to cover *all* possible outcomes. Then, one by one, you think about how likely each interval is and write down your estimate. Like this:

```
INTERVAL (Length of Project) | ESTIMATED LIKELIHOOD
—————————————————————————————|—————————————————————
Less than 1 month            | 0%
1-2 months                   | 5%
2-3 months                   | 35%
3-4 months                   | 35%
4-5 months                   | 15%
5-6 months                   | 5%
6-7 months                   | 2%
7-8 months                   | 3%
More than 8 months           | 0%
```

From these likelihood estimates, you can then derive a confidence interval. If you want 90 percent confidence, for example, you’d ignore the intervals that give you a total of 5 percent probability on the lower end (the less-than-1-month and the 1-2-months intervals), and you’d also ignore the intervals that give you a 5 percent total probability on the higher end (the 6-7 months, the 7-8 months, and the more-than-8-months intervals). Whatever is left is your 90 percent confidence interval: 2-6 months.
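This tail-trimming step can be sketched in Python. The likelihoods below follow the table, assuming a 3 percent estimate for the 7-8 months interval so that everything sums to 100 percent:

```python
# SPIES: derive a confidence interval from per-interval likelihood estimates
# by dropping roughly (1 - confidence)/2 of the probability from each tail.

intervals = [
    ("<1 month", 0.00),
    ("1-2 months", 0.05),
    ("2-3 months", 0.35),
    ("3-4 months", 0.35),
    ("4-5 months", 0.15),
    ("5-6 months", 0.05),
    ("6-7 months", 0.02),
    ("7-8 months", 0.03),  # assumed value; makes the probabilities sum to 1
    (">8 months", 0.00),
]

def confidence_interval(intervals, confidence=0.90):
    tail = (1.0 - confidence) / 2  # probability mass to drop at each end
    # Trim from the low end while the dropped mass stays within the tail.
    lo, acc = 0, 0.0
    while lo < len(intervals) and acc + intervals[lo][1] <= tail + 1e-9:
        acc += intervals[lo][1]
        lo += 1
    # Trim from the high end the same way.
    hi, acc = len(intervals) - 1, 0.0
    while hi >= 0 and acc + intervals[hi][1] <= tail + 1e-9:
        acc += intervals[hi][1]
        hi -= 1
    return intervals[lo][0], intervals[hi][0]

low, high = confidence_interval(intervals)
# low = "2-3 months", high = "5-6 months": the 90% interval is 2-6 months
```

The remaining intervals (2-3 through 5-6 months) carry 35 + 35 + 15 + 5 = 90 percent of the probability, which is exactly the 2-6 months answer in the text.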

TEPCO’s engineers worked in what psychologists call a wicked environment. In such environments, it’s hard to check how good our predictions and decisions are. It’s like trying to learn how to cook without being able to taste the food. Without feedback, experience doesn’t make us into better decision makers. We don’t develop the skill to tell whether adding a tablespoon of salt will yield a bland soup or a salty mess.

Other types of problems – those in so-called kind environments – provide frequent feedback on how decisions turn out. In these environments, people *do* develop the ability to recognize patterns to make effective snap judgments. […] These experts get feedback all the time. […] They get to taste the metaphorical soup that they’re making.

But people who work in wicked environments never get a chance to get feedback on the quality of their decisions.

Like the pre-Ottawa Ankle Rules doctors, we often use our gut instinct to make decisions with ad hoc, rather than predetermined, criteria. For example, think about how we usually pick who should run an important, high-risk project. We might consider a pool of potential project managers, intuitively compare them, and then make a choice. But that would be letting our gut feelings lead us astray in a wicked environment.

Instead, we should develop criteria based on *the project*. We first figure out the essential skills that the project manager will need to be successful. We then compare potential candidates along these criteria by simply scoring them with a 1, 0, or -1. If we’re working with a group to make the decision, we independently score each person and then average the results. This gives us a numerical representation of the overall strength of each person. Something like this:

```
SKILL (AVERAGE RATING)           | GARY  | ALICE | SU-MI
—————————————————————————————————|———————|———————|——————
Engineering understanding        | 1     | 1     | 0.25
Ability to connect with customer | -0.25 | 0.5   | 0.75
Ability to get internal buy-in   | 0.5   | 0.75  | 1
—————————————————————————————————|———————|———————|——————
Total score                      | 1.25  | 2.25  | 2
```

The process is simple, but it helps us avoid being blinded by a gregarious and personable employee who may lack the technical or organizational skills needed to succeed in the role (like Su-mi) or an engineering superstar who won’t be able to connect with customers (like Gary). And, of course, your list of criteria can be much longer, and you might weight some items more heavily than others.
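The score-and-average step can be sketched in Python. The two raters and their individual ratings below are made up for illustration; only the procedure (independent 1/0/-1 scores, averaged per skill, summed per candidate) comes from the text:

```python
# Each group member independently scores every candidate on every skill
# with 1, 0, or -1. Per-skill scores are averaged across raters, then
# summed into an overall strength per candidate.
# Raters and their ratings are illustrative, not from the book's table.

raters = {
    "rater_1": {"Gary": [1, 0, 1],  "Alice": [1, 1, 1],   "Su-mi": [0, 1, 1]},
    "rater_2": {"Gary": [1, -1, 0], "Alice": [1, 0, 0.5], "Su-mi": [1, 0.5, 1]},
}

def total_scores(raters):
    candidates = next(iter(raters.values()))
    totals = {}
    for name in candidates:
        # One column per skill, holding that skill's score from each rater.
        columns = zip(*(scores[name] for scores in raters.values()))
        per_skill_avg = [sum(col) / len(raters) for col in columns]
        totals[name] = sum(per_skill_avg)
    return totals

totals = total_scores(raters)
best = max(totals, key=totals.get)  # the candidate with the highest total
```

Averaging independent scores before discussing them keeps one loud opinion from anchoring the group, which is the point of scoring independently.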

After four futile months, the couple adopted a new approach. As the first step, they listed all the criteria that might matter to them, a dozen items ranging from the flow of the house to the quality of the neighborhood. Next, they prioritized the criteria using an online tool called a pairwise wiki survey. This tool randomly selected two criteria from the list, and each of them had to click on the item that they thought was more important

After making dozens of these choices, the tool calculated a score for each item, ranging from 0 (never preferred) to 100 (always preferred). “Good flow,” for example, had a score of 79, which meant that it had a 79 percent chance of being chosen when paired with a randomly selected item from the list of criteria. The couple used these scores to weight the criteria.
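One way to picture where such a score comes from is to treat it as an item’s win rate across head-to-head matchups. The sketch below simulates that idea; the criteria, their “true” preference strengths, and the win-rate scoring are all assumptions for illustration (the actual allourideas.org tool uses its own statistical model):

```python
# Illustrative simulation: an item's score is the percentage of pairwise
# matchups it wins against randomly chosen alternatives.
# Criteria names and preference strengths are hypothetical.
import random

random.seed(0)  # make the simulation repeatable

strengths = {"good flow": 8, "neighborhood": 6, "garden": 3, "garage": 1}

wins = {item: 0 for item in strengths}
appearances = {item: 0 for item in strengths}

for _ in range(5000):
    a, b = random.sample(list(strengths), 2)
    # The stronger item wins each matchup with proportionally higher odds.
    p_a = strengths[a] / (strengths[a] + strengths[b])
    winner = a if random.random() < p_a else b
    appearances[a] += 1
    appearances[b] += 1
    wins[winner] += 1

scores = {item: round(100 * wins[item] / appearances[item])
          for item in strengths}
# Strongly preferred items end up near 100, rarely chosen ones near 0.
```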

Then, when they saw a house, they independently gave a score of -1, 0, or +1 to each item. If they disagreed about a score, they took the average. The weighted sum gave them an overall score for each house. Here’s an excerpt from their spreadsheet showing their actual ratings for a few houses:

```
CRITERIA                              | WEIGHT | House D | House J | House T
——————————————————————————————————————|————————|—————————|—————————|————————
Crit 1                                | 89     | 1       | 1       | 0
Crit 2                                | 79     | 0.5     | 1       | 0.5
——————————————————————————————————————|————————|—————————|—————————|————————
Total weighted score (max: 168)       |        | 128.5   | 168     | 39.5
Total score as % of max               |        | 76%     | 100%    | 23.5%
```
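The weighted-sum arithmetic behind those rows can be sketched in Python, using the two criteria and ratings shown in the excerpt (the criterion names are placeholders, as in the spreadsheet):

```python
# Weighted scoring from the house-hunting example: each house gets a
# -1/0/+1 rating per criterion (averaged when the couple disagreed),
# multiplied by the criterion's pairwise-survey weight and summed.

weights = {"Crit 1": 89, "Crit 2": 79}

houses = {
    "House D": {"Crit 1": 1, "Crit 2": 0.5},
    "House J": {"Crit 1": 1, "Crit 2": 1},
    "House T": {"Crit 1": 0, "Crit 2": 0.5},
}

# Maximum possible score: every criterion rated +1.
max_score = sum(weights.values())  # 89 + 79 = 168

def house_score(ratings):
    return sum(weights[crit] * rating for crit, rating in ratings.items())

for name, ratings in houses.items():
    total = house_score(ratings)
    percent = 100 * total / max_score
    print(f"{name}: {total} ({percent:.1f}% of {max_score})")
# House D scores 89*1 + 79*0.5 = 128.5, matching the spreadsheet.
```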

You can create your own pairwise wiki survey at www.allourideas.org.