Part 4 of 5 in our Modern Performance Management series.
Here's a finding that should make every HR leader uncomfortable: research suggests that up to 62% of the variance in performance ratings reflects the idiosyncrasies of the rater, that is, the manager doing the rating, rather than the actual performance of the employee being rated.
That's not a theoretical problem. When performance ratings drive compensation, promotion, and reduction-in-force decisions, rater bias shapes careers. The manager you happened to report to matters more than it should.
Performance calibration is the most direct tool available to fix this. When it's run well, it works. When it's run badly, it's a conference room full of managers defending their ratings rather than examining them.
Here's how to run it well.
What calibration actually is
A calibration session brings together a group of managers to compare their performance assessments before they become final. The goal is alignment: a "Meets Expectations" rating in Engineering should mean roughly the same thing as a "Meets Expectations" in Marketing, and a manager who consistently rates everyone highly should be checked against peers who see the same population differently.
It's not about forcing everyone to fit a bell curve. It's about surfacing inconsistencies and having the conversations needed to resolve them before they become pay decisions.
The common failure modes
| What goes wrong | Why it happens | Fix |
|---|---|---|
| Managers just defend their ratings | No shared framework for what ratings mean | Anchor ratings to behavioral examples before the session |
| HiPPO effect (the highest paid person's opinion dominates) | No structure for equal participation | Facilitate with a neutral HR leader; present ratings anonymously first |
| Sessions focus only on outliers | Pressure to finish quickly | Review the full distribution, including the middle, not just the top and bottom |
| Decisions don't stick | No documentation or accountability | Record all rating changes and rationale; revisit next cycle |
| Bias enters through the back door | Irrelevant information in the room (tenure, likability) | Keep the discussion to documented evidence of performance vs. expectations |
How to run a calibration session that works
Before the session
Prepare the data. Managers should bring documented evidence of performance: not just their overall assessment, but specific examples. What did this person deliver? How did they handle a difficult situation? What do peers say?
Organizational network analysis (ONA) data is particularly useful here. It shows collaboration patterns: who people work with, who relies on them, who they support. A manager may not know that their "Meets Expectations" employee is the informal mentor for three other team members and a critical node in the team's information flow. ONA surfaces this.
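As one hedged illustration of how that can surface, here's a minimal sketch that flags potential critical nodes from a collaboration edge list, assuming you can export one from your ONA tool. The file name, column names, and the 90th-percentile cutoff are assumptions for the example; betweenness centrality is just one common proxy for how central someone is to a team's information flow.

```python
# Sketch: flag potential "critical nodes" from collaboration data.
# Assumes a CSV edge list with "source" and "target" columns; the
# file name and the 90th-percentile cutoff are illustrative.
import networkx as nx
import pandas as pd

edges = pd.read_csv("collaboration_edges.csv")  # one row per working relationship
G = nx.Graph()
G.add_edges_from(edges[["source", "target"]].itertuples(index=False, name=None))

# Betweenness centrality: how often a person sits on the shortest
# paths between colleagues, a rough proxy for information brokerage.
centrality = nx.betweenness_centrality(G)

cutoff = pd.Series(centrality).quantile(0.9)
critical_nodes = sorted(
    (person for person, score in centrality.items() if score >= cutoff),
    key=centrality.get,
    reverse=True,
)
print("Worth raising in calibration:", critical_nodes)
```

A list like this isn't a rating. It's a prompt to ask whether the network role is reflected in the manager's assessment.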
Related reading: Step-by-Step Guide to Performance Calibration Sessions
During the session
- Start with the distribution, not the individuals. Show the group their aggregated ratings before discussing anyone specifically; a manager who rated 80% of their team "Exceeds Expectations" will see how that compares to peers. (A sketch of this step follows the list.)
- Discuss outliers in both directions. High ratings and low ratings both need justification. "She's just great" is not a justification; it's a feeling.
- Use behavioral language. "He delivers on time" is better than "he's reliable." Specific behavior is easier to discuss and more resistant to bias than personality assessments.
- Flag patterns explicitly. If one manager has never given a rating below "Meets Expectations" in five years, say so. Not as an accusation, but as a data point to examine.
- Seek confirmation from peers. Before finalizing a change, ask if other managers in the room have information that supports or contradicts the proposed rating.
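As promised in the first bullet, here's a minimal sketch of the distribution-first step, with a simple leniency flag for the pattern-spotting bullet as well. It assumes a flat ratings table with manager and rating columns; the file name, the rating labels, and the 60% cutoff are illustrative assumptions, not a standard.

```python
# Sketch: per-manager rating distributions and a simple leniency flag.
# Assumed columns: "manager", "rating". The 0.6 cutoff is arbitrary.
import pandas as pd

ratings = pd.read_csv("cycle_ratings.csv")

# Share of each rating level per manager, shown before any names come up.
distribution = pd.crosstab(ratings["manager"], ratings["rating"], normalize="index")
print(distribution.round(2))

# Flag managers whose top-bucket share stands out from their peers.
top_share = distribution.get("Exceeds Expectations")
if top_share is not None:
    lenient = top_share[top_share > 0.6]
    print("Discuss as a pattern, not an accusation:", list(lenient.index))
```

The output is a conversation starter, not a verdict: the point is to put the pattern on the table before individual names come up.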
After the session
Document every rating change and the rationale behind it. This creates accountability, since managers know their calibration decisions can be revisited, and it builds institutional knowledge for future cycles.
Communicate clearly to employees. If someone's rating changed in calibration, they shouldn't hear one number from their manager and then see a different one on record. Brief managers on any changes before they hold their final review conversations.
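If you don't have dedicated tooling for this, even a flat log is enough. Here's a minimal sketch of the fields worth capturing per change; the schema and file name are illustrative assumptions, not a prescribed format.

```python
# Sketch: a minimal calibration change-log entry. The fields are an
# illustrative assumption; any durable, queryable store works.
from dataclasses import dataclass, asdict
from datetime import date
import csv

@dataclass
class CalibrationChange:
    employee_id: str
    cycle: str          # e.g. "2025-H1"
    rating_before: str
    rating_after: str
    rationale: str      # the behavioral evidence discussed
    decided_by: str     # the calibration group, not one manager
    decided_on: date

change = CalibrationChange(
    employee_id="E1042",
    cycle="2025-H1",
    rating_before="Meets Expectations",
    rating_after="Exceeds Expectations",
    rationale="Informal mentor to three teammates; delivered project ahead of plan",
    decided_by="Engineering calibration group",
    decided_on=date.today(),
)

# Append to a running log so next cycle can revisit the decision.
with open("calibration_log.csv", "a", newline="") as f:
    csv.DictWriter(f, fieldnames=asdict(change).keys()).writerow(asdict(change))
```

Whatever the format, the test is whether next cycle's facilitator can answer three questions: what changed, why, and who agreed.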
The question calibration can't answer
Calibration aligns ratings across managers. It doesn't make ratings accurate in the first place. A group of managers can all agree that someone is a "3 out of 5" and still be wrong if they're all working from incomplete or biased information.
This is why calibration works best when it's paired with multiple data sources: peer feedback, 360s, project outcomes, and network data alongside the manager's own assessment. The more evidence in the room, the better the calibration.
Who should be in the room
- The managers being calibrated
- Their shared manager or director (as facilitator or observer)
- An HR business partner (as facilitator, not participant)
- No one who hasn't worked directly with the employees being reviewed
Keep the group small enough that everyone can speak. Sessions with more than 10–12 managers become unwieldy.
Next in this series: AI in Performance Management: Real Opportunities and Honest Pitfalls
Run fairer performance reviews with Confirm
Confirm brings ONA data, peer feedback, and calibration tooling into one place, so the evidence is in the room before decisions are made. See a demo →
