Your leadership team has finished their performance reviews. Everyone worked through the ratings independently. Now you're realizing something: nobody used the same standard.
One manager rated their whole team 4s and 5s. Another handed out 2s like they were contagious. A third rated people based on who they liked. Three months later, you've got a salary and promotion system that's essentially random.
This is normal. But it's also fixable.
Calibration is the antidote. Get your managers in a room for 60 minutes. Look at how they rated the same people. Align on what "exceeds expectations" actually means. Do it once, and every review that comes after gets better.
This guide walks you through running your first calibration session. It's not theoretical. It's what works.
Why Calibration Matters
Without calibration, your ratings drift. Manager A thinks "exceeds" means "does the job well." Manager B thinks it means "exceptional performance." Manager C thinks it means "I feel good about them today."
You get the same word used three different ways. You get the same employee rated three different ways.
After calibration? Your managers have agreed definitions. "Exceeds expectations" means something. Your top performer doesn't end up rated as "meets" because their manager had a bad quarter. Your salary increases actually reflect work, not manager mood.
It's not magic. Bias still exists. But now it's visible, discussable, and correctable.
The Setup (Before the Meeting)
You need four things ready. People. Data. Definitions. Time.
Pick your participants (3–5 managers). Include the people who wrote the reviews you'll be looking at. Add their skip-level manager. Don't invite the whole company. Smaller groups move faster and people speak more openly.
Select 3–4 employees to calibrate. Pick people who were rated by different managers. Include a mix of ratings. If everyone got a 4, there's nothing to calibrate. If you have a 2, a 3, and two 4s, you've got real disagreement to work through.
Grab the review summaries. For each person, print or pull up:
- The rating each manager gave
- 2–3 key observations from their review
- One example of strong work and one of weaker work (if noted)
You need enough to understand why each manager rated them as they did. More is clutter.
Block 60 minutes on the calendar. Not 90. Not "as long as it takes." Sixty minutes. Constraints force you to move.
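While you're pulling the data together, it can help to spot outlier raters before the meeting so you know where disagreement is likely. Here's a minimal sketch, assuming your ratings export can be reduced to a simple manager-to-ratings mapping (the names, numbers, and 0.5 threshold below are illustrative, not from any particular tool):

```python
# Flag managers whose average rating drifts noticeably from the group's.
# Hypothetical data shape: {manager_name: [ratings they gave]}.
ratings = {
    "Marcus": [2, 3, 3, 2],
    "Priya":  [4, 4, 5, 4],
    "Dana":   [3, 4, 3, 3],
}

all_scores = [r for scores in ratings.values() for r in scores]
overall = sum(all_scores) / len(all_scores)

for manager, scores in ratings.items():
    avg = sum(scores) / len(scores)
    drift = avg - overall
    # 0.5 is a judgment call, not a standard; tune it to your scale.
    if abs(drift) >= 0.5:
        print(f"{manager}: avg {avg:.2f} ({drift:+.2f} vs. group) - worth discussing")
```

A flagged manager isn't wrong; as the scenarios below show, they may simply have seen different work. The script just tells you where to look first.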
The Meeting: 60-Minute Breakdown
Minutes 0–5: Set the Frame
Start here. Don't skip this.
"We've done our first round of reviews. Today we're making sure we're fair and consistent. We're going to look at a few people and check whether we all mean the same thing when we rate performance. This isn't about changing reviews. It's about building a system where the same work gets the same rating, no matter who manages the person."
That's it. Say it and move on.
Minutes 5–15: Define Your Anchor (Meets Expectations)
Every rating hangs off one definition: what does "meets expectations" actually mean?
Ask the group: "What does a 3 look like? Not in theory. In practice. Describe someone on your team who clearly meets expectations."
Someone will volunteer a name. Don't name them publicly; just describe the work. "Someone who completes their core work, doesn't need constant correction, takes initiative on small improvements, but occasionally needs a reminder on deadlines."
Write that down. Word for word.
Now ask: "Does everyone agree that's a 3?" If someone says "I'd rate that person higher," that's diagnostic. Ask why. "What's the gap?" You'll hear: "They also mentor other people." Good. Now you know mentoring is part of a 4 in that manager's mind.
Refine the 3 definition based on the conversation. This takes 10 minutes. Get it close enough. It doesn't need to be perfect.
Minutes 15–25: Define Exceeds (Your 4)
"Given what we said about a 3, what's the gap to a 4?"
Someone will say: "They own projects instead of just executing them." Or "They mentor people." Or "They find and fix problems without being asked."
Write down three characteristics that separate a 3 from a 4. This is your 4 definition.
Again: does everyone agree? If one manager says "I'd rate that as a 3, not a 4," ask them what they'd need to see to make it a 4. The conversation, not your answer, creates calibration.
Minutes 25–35: Test on Real People
Pick the first employee. Read out the ratings each manager gave them. Nothing else yet.
"Sarah got a 3 from Marcus and a 4 from Priya."
Now ask Marcus: "Walk me through what you saw."
Marcus explains: "She executed the project well, but I didn't see her taking ownership beyond what I asked. She executed, but I had to steer."
Now ask Priya: "What did you see?"
Priya: "On my projects, she identified problems I didn't see and fixed them before I found them. She owned it end-to-end."
Silence. Then someone usually says, "Oh. Different projects. Different contexts."
Facilitator: "Right. Sarah executes strongly and needs direction on strategy. Marcus directs strategy. Priya doesn't. Is that fair?"
Marcus: "Yeah, that's fair."
Facilitator: "So does she own work or not?"
Marcus: "For technical implementation, yes. For strategic direction, she asks me."
Facilitator: "Is that a 4 or a 3 given our definition?"
Priya: "The way we defined it, a 4 owns projects. She owns technical projects. That's a 4 for technical."
Marcus: "I still think she needs more strategic ownership. But I hear you."
Facilitator: "Do we move her to a 4 or stay at a 3?"
The group decides. You document it. Move on.
This takes 5 minutes per person. Three people = 15 minutes.
Minutes 35–55: Repeat and Watch Patterns Emerge
Do this twice more. By the third person, managers start self-correcting. They'll say things like: "Based on what we defined, I think I underweighted that."
If you see a real outlier (someone who rated a person very differently from everyone else), catch it: "I'm noticing Sarah rated everyone high and Marcus rated everyone lower. Help me understand. What's your threshold for a 3 vs. a 4?"
The conversation fixes it. You're not telling them. They're figuring it out.
Minutes 55–60: Summarize and Commit
Read back what you agreed on:
"Here's what we defined today:
- Meets expectations: [your 3 definition]
- Exceeds expectations: [your 4 definition]
- For these three people, we landed on [ratings]"
Ask: "Does everyone commit to using these definitions in the next round of reviews?"
Get a verbal yes. You're done.
What You're Listening For
As you run the session, tune into three things:
Explicit standards. The group moves from vague ("They're just strong") to specific ("They own projects end-to-end and mentor others"). That's progress.
Where disagreement actually lives. Sometimes two managers disagree because they value different things. Sometimes they disagree because they saw different work. Sometimes one manager is measuring against a different standard. You're trying to figure out which.
Who adjusts and who doesn't. If someone hears an argument and says "I see what you mean, that changes my rating," that's calibration working. If someone digs in with "No, that's a 3 for sure," don't fight. Note the dissent and move on.
Real Scenarios (What Actually Happens)
The Outlier Rater
Three managers rated Alex. Two said "meets." One said "exceeds."
Facilitator: "Sarah, you gave a 4. What did you see?"
Sarah: "Alex redesigned our onboarding flow. It cut time by 40%. That's exceeds."
Other manager: "I didn't know about the redesign."
Sarah: "She owns it. I just gave her the goal."
Facilitator: "So Alex took ownership of something beyond her core work?"
Sarah: "Yeah. That's why I rated her higher."
Facilitator: "Given our definition of exceeds, which includes taking ownership of projects, is that a 4?"
Consensus: Yes. The outlier wasn't wrong. They just saw different work.
The Grade Inflation Manager
Marcus has rated everyone a 4 or 5. Three rounds of calibration, same pattern.
Facilitator: "Marcus, you've rated everyone high. Help me understand what 'meets' looks like to you."
Marcus: "Someone who does their job well?"
Other manager: "Then everyone exceeds. We use exceeds for people doing exceptional work, owning projects, leading."
Marcus: "Oh. I thought exceeds meant doing well."
Facilitator: "For next cycle, exceeds is someone who owns projects and drives change. Meets is someone who executes their core role well. Does that change how you'd rate people?"
Marcus: "Yeah, a lot of these would be 3s."
Calibration win. Not a reflection on Marcus. He just had a different definition.
The Silent Rater
You invited a senior stakeholder who didn't interact much with the team. They're quiet.
Facilitator: "We haven't heard from you. What's your take on Alex?"
Stakeholder: "I don't work with her day-to-day, so I don't want to rate her."
Facilitator: "Fair. Did you observe anything about her work?"
Stakeholder: "Actually, yes. I saw her present Q1 strategy. She was really sharp about tradeoffs."
Facilitator: "That's useful. It's one data point from someone outside her day-to-day. Does that inform how you think about her ownership and judgment?"
Even limited perspective is useful. You're building a fuller picture.
After the Meeting
Document what you agreed on. Send a one-page summary to the group:
"We defined:
- Meets: [definition]
- Exceeds: [definition]
- Significantly exceeds: [definition if you covered it]
For the three people we reviewed, we landed on [ratings]."
Include one note about any patterns you noticed: "We noticed one manager rates higher on execution and lower on leadership. Another focuses on collaboration. Both are valid, but let's be intentional about what we weight."
That's your artifact. It becomes the basis for the next round of reviews.
What Success Looks Like
A successful first calibration session doesn't need perfection. You win if:
- Managers articulated why they rated people as they did
- The group landed on 3–4 explicit definitions
- At least one disagreement was discussed
- Someone said: "That changes how I'd rate going forward"
- You left with one page of agreed standards
That's it. You don't need consensus on everything. You need to have made your standards visible.
The Real Move
Calibration isn't a one-time event. It's the start of a system.
Do this every cycle. Each time, standards get sharper. Ratings get more consistent. Employees stop wondering why they got a 3 when their peer got a 4 for similar work.
The first session feels awkward. Your team might worry about changing ratings or defending themselves. That's normal. You're asking people to make something explicit that's usually implicit. That's uncomfortable.
But it's the only way to build fairness into your system.
Ready to Calibrate?
Print this framework. Block 60 minutes. Invite your managers. Walk through the steps above.
The first session is the hardest. By the second one, people anticipate the questions. By the third, they're using the same definitions without you prompting.
If you run a team and haven't calibrated your first performance cycle yet, this week is the time.
Fairness doesn't happen by accident. It happens because you made the standards explicit and everyone agreed to apply them.
Want to see how Confirm handles this? Request a demo — we'll walk you through the platform in 30 minutes.
