Fixing Performance Reviews for Good

Performance reviews are broken, but not for the reasons we thought. Here's why—and how we fixed them.

We’re hearing a lot lately about the return of performance reviews. At Confirm, we began testing our performance review software 18 months ago. Now we run many cycles for our customers every day.

Here's what we learned:

1. How we work isn't how we're measured

2. Employee performance follows a power law

3. Calibrations make bad manager ratings worse

How we work isn’t how we’re measured

During World War I, an industrial psychologist named Walter Dill Scott introduced a consistent rating scale for the U.S. military to evaluate recruits. In the 1920s, this model was brought into the workplace, and the manager review was born.

Back then, manufacturing was booming. Typical factory jobs were repetitive and solitary. A line worker had just a few stakeholders: their manager, and perhaps a coworker to their left or right. And their manager had near-perfect visibility into the work they were doing.

In the 1930s, Nazi military psychologists led by Max Simoneit found that incorporating peer feedback led to better officer selection. This method was adapted for American workplaces in the 1950s. After some refinement, and better branding, this method became the 360° review.

We’re still using these old methods to measure performance. But the way we work has changed. Today, anyone can log into Slack or Teams and message anyone else in the company. We form cross-functional teams to solve a problem. We connect from all over the world in Zoom rooms and Google Hangouts.

In short, we work in networks. But we still evaluate work in hierarchies.

We measure work in hierarchies. But we work in networks.

In the new world of work, the number of an employee’s touchpoints has gone up, but their manager’s visibility has actually gone down. Remote work makes it hard for managers to know what’s really going on. Zoom and Slack obscure conversations that used to happen in the open. Early research suggests that in hybrid and remote work, Dunbar’s number is much lower than it is in person.

At Confirm, we use organizational network analysis (ONA) to provide a quantitative view of performance based on every employee’s view of one another. ONA allows us to measure performance in the way it really happens: through networks.

Top performers impact dozens of their coworkers.

In our reviews, we find that top performers impact dozens of their coworkers. Not just their managers or the three or four peers who contribute to a 360° review. And that impact cuts across job functions, levels, and geographies.
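To make the idea concrete, here is a minimal sketch of one simple network measure: counting how many distinct coworkers name each employee as someone who had an impact on their work. This is an illustration only, not a description of Confirm's actual method. The survey data, the names, and the `impact_counts` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical survey responses: (respondent, coworker they say had an impact).
nominations = [
    ("ana", "tracy"), ("ben", "tracy"), ("cho", "tracy"), ("dev", "tracy"),
    ("ana", "michael"), ("tracy", "ben"), ("michael", "tracy"),
]

def impact_counts(pairs):
    """Count the distinct coworkers who named each employee (in-degree in the network)."""
    named_by = defaultdict(set)
    for respondent, nominee in pairs:
        if respondent != nominee:  # ignore self-nominations
            named_by[nominee].add(respondent)
    return {person: len(sources) for person, sources in named_by.items()}

counts = impact_counts(nominations)

# Rank by number of distinct nominators, breaking ties alphabetically.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
for person, count in ranked:
    print(f"{person}: named by {count} coworker(s)")
```

A manager review or a small 360° panel would only ever see a handful of these edges. A count like this uses all of them, regardless of team or reporting line.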

Traditional performance reviews weren’t built for this world of work. The more networked our jobs become, the more broken our reviews will feel.

Employee performance follows a power law

When companies run traditional performance cycles, they produce normally distributed bell curves of manager ratings like this one.

Many companies force-fit manager ratings to a bell curve.

A bell curve is what you’d expect to find when the underlying variables you’re measuring don’t affect each other. For example, the distribution of height, or IQ.

In early manufacturing, a normal distribution made sense. Employees only interacted with a supervisor and a few coworkers within physical proximity. And because no employee could work faster than the rest of the manufacturing line, the best workers were the ones who made the fewest mistakes. So the majority of workers became “average,” with tails of above- and below-average performers on either side.

Factory workers in the 1920s.
The way we work today is completely different.

In every performance cycle we’ve run at Confirm, we see employee performance follow a power law. Not just at the org level—within teams and job functions, too. It means that a small number of employees are driving disproportionate impact.

A power law of employee performance.
A small number of employees drive disproportionate impact.

Since we work in networks, this distribution is no surprise. The employees in the network do affect one another. Work isn’t solitary or repetitive the way it used to be. The factory workers above didn’t have a lot of say in how they assembled parts. But there’s a lot of variability in how employees plan a marketing campaign or design a new software feature.
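The difference between the two shapes is easy to see with synthetic numbers. The sketch below is not Confirm data. It assumes a normal distribution (mean 100, standard deviation 15) as a stand-in for the bell curve and a Pareto distribution with shape 1.16 as a stand-in for the power law, then compares how much of the total comes from the top 10% of people in each.

```python
import random

def top_decile_share(values):
    """Fraction of the total contributed by the top 10% of values."""
    ordered = sorted(values, reverse=True)
    top_n = max(1, len(ordered) // 10)
    return sum(ordered[:top_n]) / sum(ordered)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 10_000

# Assumption: bell-curve output modeled as a normal distribution (mean 100, sd 15).
bell = [max(0.0, rng.gauss(100, 15)) for _ in range(n)]

# Assumption: power-law output modeled as a Pareto distribution with shape 1.16,
# a value often used to illustrate the 80/20 pattern.
power = [rng.paretovariate(1.16) for _ in range(n)]

bell_share = top_decile_share(bell)
power_share = top_decile_share(power)
print(f"Top 10% share, bell curve: {bell_share:.0%}")
print(f"Top 10% share, power law:  {power_share:.0%}")
```

Under these assumptions, the top decile of the bell-curve population contributes only slightly more than its headcount share, while the top decile of the power-law population contributes a far larger portion of the total.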

Bill Gates understood this. He once said: “A great lathe operator commands several times the wage of an average lathe operator. But a great writer of software is worth 10,000 times the price of an average software writer.”

Today’s top performers aren’t trying to make the fewest mistakes. They’re trying to maximize impact. And the companies that can identify and retain them have a competitive advantage in the war for talent.

But most companies are stuck in the industrial past. So they squash the power law into a bell curve. The 10,000x employees that Bill Gates talked about will appear merely above average. And the impact of underperformers becomes overestimated.
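Here is a small sketch of that squashing effect. It assumes a five-point scale with fixed quotas of 10/20/40/20/10 percent, and the impact scores are invented for illustration. The `force_to_curve` helper is hypothetical and simply assigns ratings by rank.

```python
# Hypothetical impact scores for a ten-person team; "j" is the outlier.
scores = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5,
          "f": 6, "g": 7, "h": 9, "i": 12, "j": 300}

def force_to_curve(scores, quotas=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Assign ratings 1-5 by rank so each rating gets a fixed share of people."""
    ranked = sorted(scores, key=scores.get)  # lowest impact first
    n = len(ranked)
    ratings = {}
    start = 0
    cumulative = 0.0
    for rating, share in enumerate(quotas, start=1):
        cumulative += share
        # The last bucket always takes whoever is left.
        end = n if rating == len(quotas) else round(cumulative * n)
        for name in ranked[start:end]:
            ratings[name] = rating
        start = end
    return ratings

ratings = force_to_curve(scores)
for name in scores:
    print(f"{name}: impact {scores[name]:>3} -> rating {ratings[name]}")
```

In this made-up team, "j" has 25 times the impact of "i" but lands only one rating step higher. The scale has no room to express the gap.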

Compressing the power law into a bell curve creates unintended consequences.

But which top performers will get squashed down? That question will be answered using another broken practice: calibration.

Calibrations make bad manager ratings worse

If you’ve ever wondered why terrible performers hang on while top performers go unrecognized, calibration is often the answer.

During calibrations, managers meet to compare their proposed employee ratings with those of other managers. Conceptually, ratings are adjusted with the goal of creating consistency across the organization. But in practice, calibrations are Game of Thrones sessions that often defer to the loudest or most important manager in the room. They introduce bias and reward politics.

Employees aren't in control of their career destinies.

It’s through calibrations that companies hit quotas by increasing the number of “1s” (thereby forcing more employees onto PIPs) or reducing the number of “5s” (thereby rewarding fewer employees with performance pay). In neither of those cases is the employee’s actual performance an input.

The real problem is what companies are calibrating against: a bell curve. And the only data they have to calibrate biased manager ratings against are other biased manager ratings!

ONA offers a new baseline to calibrate against. Only this baseline is built from every employee’s view of one another. And the network around an employee often has a different opinion than their manager.

Networks reveal what managers can't see.

In this example, Tracy and Michael are doing the same job. Tracy is performing exceptionally well according to dozens of people around her. But the company had too many employees receiving “Exceeds Expectations” ratings. So she was calibrated down to the same “Meets Expectations” rating that Michael got from a different manager.

From manager ratings alone, you might think Tracy and Michael are equivalent. They’ll receive the same merit compensation and development opportunities. But the ONA data from Confirm says that Tracy and Michael are completely different performers.
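A check like this can be sketched in a few lines. The example below is an assumption-laden illustration, not Confirm's product logic. The endorsement counts (34 and 4), the `min_ratio` threshold, and the `same_rating_different_network` helper are all hypothetical.

```python
from itertools import combinations

# Hypothetical records; the endorsement counts are invented for illustration.
employees = {
    "Tracy": {"manager_rating": "Meets Expectations", "network_endorsements": 34},
    "Michael": {"manager_rating": "Meets Expectations", "network_endorsements": 4},
}

def same_rating_different_network(people, min_ratio=3.0):
    """Find pairs with the same manager rating but very different network signals."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(people.items(), 2):
        if a["manager_rating"] != b["manager_rating"]:
            continue
        high, low = sorted(
            [(name_a, a["network_endorsements"]), (name_b, b["network_endorsements"])],
            key=lambda pair: pair[1],
            reverse=True,
        )
        # max(..., 1) avoids treating a zero count as an infinite ratio.
        if high[1] >= min_ratio * max(low[1], 1):
            flagged.append((high[0], low[0]))
    return flagged

flagged = same_rating_different_network(employees)
for stronger, weaker in flagged:
    print(f"{stronger} and {weaker} share a rating, but the network sees them differently")
```

The point of the sketch is the comparison itself: identical ratings are treated as a prompt to look at a second, independent signal rather than as the final word.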

How often does this happen? Our research shows that managers over- or underestimate their employees’ performance nearly half the time. Academic research suggests that more than 60% of a performance rating can be attributed solely to the idiosyncrasies of the manager.

If you were Tracy, you might wonder, “What do I need to do to get ahead?” Sadly, for many unrecognized top performers, the answer is simple: leave.

What’s next for performance reviews?

We used to hear about the death of performance reviews. They were supposed to be replaced by “continuous feedback” and “career pathing.”

Those are both great. But they don’t help leaders struggling to decide who to promote or PIP. Only performance measurement can do that.

We used to think of performance reviews as a nuisance. Now we understand them as the symptom of a bigger problem: the nature of our work has changed faster than our ability to measure it.

What does it mean for this problem to go unsolved? An alternate question might be, what does it mean for human progress if we never unlock the full potential of our smartest, most creative contributors? For them to spend their energy grasping for recognition instead of making an impact?

We're optimistic that we've found a solution. We owe it to our employees. The status quo deserves a PIP.

Joshua Merrill and David Murray, cofounders of Confirm

Special thanks to Zubin Davar, who always exceeds expectations.
