Part 4 of 5 in our Modern Performance Management series.
Here's a finding that should make every HR leader uncomfortable: research suggests that up to 62% of the variance in performance ratings reflects the idiosyncrasies of the rater, that is, the manager doing the rating, rather than the actual performance of the employee being rated.
That's not a theoretical problem. When performance ratings drive compensation, promotion, and reduction-in-force decisions, rater bias shapes careers. The manager you happened to report to matters more than it should.
Performance calibration is the most direct tool available to fix this. When it's run well, it works. When it's run badly, it's a conference room full of managers defending their ratings rather than examining them.
Here's how to run it well.
What calibration actually is
A calibration session brings together a group of managers to compare their performance assessments before they become final. The goal is alignment: a "Meets Expectations" rating in Engineering should mean roughly the same thing as a "Meets Expectations" in Marketing, and a manager who consistently rates everyone highly should be checked against peers who see the same population differently.
It's not about forcing everyone to fit a bell curve. It's about surfacing inconsistencies and having the conversations needed to resolve them before they become pay decisions.
The common failure modes
| What goes wrong | Why it happens | Fix |
|---|---|---|
| Managers just defend their ratings | No shared framework for what ratings mean | Anchor ratings to behavioral examples before the session |
| HiPPO effect (the highest paid person's opinion dominates) | No structure for equal participation | Facilitate with a neutral HR leader; present ratings anonymously first |
| Sessions focus only on outliers | Pressure to finish quickly | Review the full distribution, including the middle, not just the top and bottom |
| Decisions don't stick | No documentation or accountability | Record all rating changes and rationale; revisit next cycle |
| Bias enters through the back door | Irrelevant information in the room (tenure, likability) | Keep the discussion to documented evidence of performance vs. expectations |
How to run a calibration session that works
Before the session
Prepare the data. Managers should bring documented evidence of performance: not just their overall assessment, but specific examples. What did this person deliver? How did they handle a difficult situation? What do peers say?
Organizational network analysis (ONA) data is particularly useful here. It shows collaboration patterns: who people work with, who relies on them, who they support. A manager may not know that their "Meets Expectations" employee is the informal mentor for three other team members and a critical node in the team's information flow. ONA surfaces this.
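As one hedged illustration of how that can surface, here's a minimal sketch that flags potential critical nodes from a collaboration edge list, assuming you can export one from your ONA tool. The file name, column names, and the 90th-percentile cutoff are assumptions for the example; betweenness centrality is just one common proxy for how central someone is to a team's information flow.

```python
# Sketch: flag potential "critical nodes" from collaboration data.
# Assumes a CSV edge list with "source" and "target" columns; the
# file name and the 90th-percentile cutoff are illustrative.
import networkx as nx
import pandas as pd

edges = pd.read_csv("collaboration_edges.csv")  # one row per working relationship
G = nx.Graph()
G.add_edges_from(edges[["source", "target"]].itertuples(index=False, name=None))

# Betweenness centrality: how often a person sits on the shortest
# paths between colleagues, a rough proxy for information brokerage.
centrality = nx.betweenness_centrality(G)

cutoff = pd.Series(centrality).quantile(0.9)
critical_nodes = sorted(
    (person for person, score in centrality.items() if score >= cutoff),
    key=centrality.get,
    reverse=True,
)
print("Worth raising in calibration:", critical_nodes)
```

A list like this isn't a rating. It's a prompt to ask whether the network role is reflected in the manager's assessment.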
Related reading: Step-by-Step Guide to Performance Calibration Sessions
During the session
- Start with the distribution, not the individuals. Show the group their aggregated ratings before discussing anyone specifically; a manager who rated 80% of their team "Exceeds Expectations" will see how that compares to peers. (A sketch of this step follows the list.)
- Discuss outliers in both directions. High ratings and low ratings both need justification. "She's just great" is not a justification; it's a feeling.
- Use behavioral language. "He delivers on time" is better than "he's reliable." Specific behavior is easier to discuss and more resistant to bias than personality assessments.
- Flag patterns explicitly. If one manager has never given a rating below "Meets Expectations" in five years, say so. Not as an accusation, but as a data point to examine.
- Seek confirmation from peers. Before finalizing a change, ask if other managers in the room have information that supports or contradicts the proposed rating.
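As promised in the first bullet, here's a minimal sketch of the distribution-first step, with a simple leniency flag for the pattern-spotting bullet as well. It assumes a flat ratings table with manager and rating columns; the file name, the rating labels, and the 60% cutoff are illustrative assumptions, not a standard.

```python
# Sketch: per-manager rating distributions and a simple leniency flag.
# Assumed columns: "manager", "rating". The 0.6 cutoff is arbitrary.
import pandas as pd

ratings = pd.read_csv("cycle_ratings.csv")

# Share of each rating level per manager, shown before any names come up.
distribution = pd.crosstab(ratings["manager"], ratings["rating"], normalize="index")
print(distribution.round(2))

# Flag managers whose top-bucket share stands out from their peers.
top_share = distribution.get("Exceeds Expectations")
if top_share is not None:
    lenient = top_share[top_share > 0.6]
    print("Discuss as a pattern, not an accusation:", list(lenient.index))
```

The output is a conversation starter, not a verdict: the point is to put the pattern on the table before individual names come up.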
After the session
Document every rating change and the rationale behind it. This creates accountability, since managers know their calibration decisions can be revisited, and it builds institutional knowledge for future cycles.
Communicate clearly to employees. If someone's rating changed in calibration, they shouldn't hear one number from their manager and then see a different one on record. Brief managers on any changes before they hold their final review conversations.
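If you don't have dedicated tooling for this, even a flat log is enough. Here's a minimal sketch of the fields worth capturing per change; the schema and file name are illustrative assumptions, not a prescribed format.

```python
# Sketch: a minimal calibration change-log entry. The fields are an
# illustrative assumption; any durable, queryable store works.
from dataclasses import dataclass, asdict
from datetime import date
import csv

@dataclass
class CalibrationChange:
    employee_id: str
    cycle: str          # e.g. "2025-H1"
    rating_before: str
    rating_after: str
    rationale: str      # the behavioral evidence discussed
    decided_by: str     # the calibration group, not one manager
    decided_on: date

change = CalibrationChange(
    employee_id="E1042",
    cycle="2025-H1",
    rating_before="Meets Expectations",
    rating_after="Exceeds Expectations",
    rationale="Informal mentor to three teammates; delivered project ahead of plan",
    decided_by="Engineering calibration group",
    decided_on=date.today(),
)

# Append to a running log so next cycle can revisit the decision.
with open("calibration_log.csv", "a", newline="") as f:
    csv.DictWriter(f, fieldnames=asdict(change).keys()).writerow(asdict(change))
```

Whatever the format, the test is whether next cycle's facilitator can answer three questions: what changed, why, and who agreed.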
The question calibration can't answer
Calibration aligns ratings across managers. It doesn't make ratings accurate in the first place. A group of managers can all agree that someone is a "3 out of 5" and still be wrong if they're all working from incomplete or biased information.
This is why calibration works best when it's paired with multiple data sources: peer feedback, 360s, project outcomes, and network data alongside the manager's own assessment. The more evidence in the room, the better the calibration.
Who should be in the room
- The managers being calibrated
- Their shared manager or director (as facilitator or observer)
- An HR business partner (as facilitator, not participant)
- No one who hasn't worked directly with the employees being reviewed
Keep the group small enough that everyone can speak. Sessions with more than 10–12 managers become unwieldy.
Next in this series: AI in Performance Management: Real Opportunities and Honest Pitfalls
Run fairer performance reviews with Confirm
Confirm brings ONA data, peer feedback, and calibration tooling into one place, so the evidence is in the room before decisions are made. See a demo →
