Standard Calibration Framework for Multi-Portfolio Talent Reviews

A practical framework for running consistent talent calibration across portfolio companies — rating definitions, session structure, bias detection, and portfolio-level governance.

Last updated: March 2026

Running talent calibration at one company is an HR project. Running it across a portfolio of 8 to 12 companies is an operating discipline.

The difference is a framework. Without a shared structure, every portfolio company runs its own process, uses its own rating definitions, and produces data that tells you how each company internally ranks people but tells you nothing about how Company A's VP of Sales compares to Company B's. You get 10 separate answers to 10 separate questions when what you need is one consistent view.

This article lays out the framework. It's designed for PE operating partners and portfolio CHROs who want talent calibration that scales.

The core requirement: shared definitions

The first thing that breaks cross-portfolio calibration is rating scale disagreement. One portfolio company uses a 5-point scale where "3" means "fully meets expectations." Another uses a 4-point scale where "3" means "exceeds." A third uses narrative-only reviews with no numeric ratings at all.

Before any multi-portfolio calibration program can work, the fund needs to define:

1. The rating scale. Use a single scale across all portfolio companies for the leadership population being assessed. A 4-level scale (Exceeds, Meets, Developing, Not Meeting) works well for senior roles. A 5-level scale is fine but introduces a middle "Meets" bucket that often becomes a catch-all.

2. Rating definitions. Define each level in behavioral terms, not aspirational terms. "Exceeds" is not "goes above and beyond." It is: "Consistently delivers results beyond what the role requires, proactively solves problems before they escalate, and develops the capability of the team around them." The more concrete the definition, the less room for rater interpretation.

3. The leadership criteria. Define the 6–8 competencies being rated. For PE operating environments, a standard set includes: financial acumen, talent development, strategic execution, change management, cross-functional collaboration, results orientation, communication, and customer/market focus. Adjust for the specific portfolio company type, but keep the core criteria consistent.

4. Evidence requirements. Specify what a rater needs to provide to support a rating. At minimum: one specific behavioral example per rating given in the top or bottom category. This prevents ratings from drifting based on recent events or personal impression.

Getting these four definitions agreed on at the fund level, before calibration starts, is 80 percent of the work.
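
To make the shared definitions concrete, here is a minimal sketch of how a fund might encode them in a single fund-level config. The scale labels, the "Exceeds" definition, and the competency list come from the points above; the dict structure itself is a hypothetical illustration, not the format of any particular tool.

```python
# Hypothetical fund-level config encoding the four shared definitions.
# Scale labels, the "Exceeds" definition, and the criteria list follow
# the article; the structure itself is illustrative.
CALIBRATION_FRAMEWORK = {
    "rating_scale": ["Exceeds", "Meets", "Developing", "Not Meeting"],
    "rating_definitions": {
        "Exceeds": (
            "Consistently delivers results beyond what the role requires, "
            "proactively solves problems before they escalate, and develops "
            "the capability of the team around them."
        ),
        # ...behavioral definitions for the remaining three levels...
    },
    "criteria": [
        "financial_acumen", "talent_development", "strategic_execution",
        "change_management", "cross_functional_collaboration",
        "results_orientation", "communication", "customer_market_focus",
    ],
    # Evidence rule: top- or bottom-category ratings need at least one
    # specific behavioral example.
    "evidence_required_for": {"Exceeds", "Not Meeting"},
}
```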

Session structure

Once the framework is defined, the calibration session structure is mostly the same across all portfolio companies. Here is the standard format:

Pre-session (1–2 weeks before)

  • Raters (typically: CEO, CHRO, and 1–2 operating partner representatives) complete initial ratings independently for each leader being reviewed
  • Each rater submits their ratings and supporting evidence before the session
  • CHRO or operating partner reviews submissions for obvious anchoring issues (all ratings at one level, evidence gaps) and flags them ahead of the session; a screening sketch follows this list
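
A minimal sketch of that pre-session screen, assuming each rater's submission arrives as a mapping of criterion to (rating, evidence) pairs. The function name and data shape are hypothetical.

```python
# Minimal pre-session screen, assuming each rater's submission arrives as
# {criterion: (rating, evidence)}. Names and shapes are hypothetical.
def flag_submission(ratings: dict[str, tuple[str, str]]) -> list[str]:
    flags = []
    # Anchoring check: every criterion rated at the same level.
    levels = {rating for rating, _ in ratings.values()}
    if len(levels) == 1:
        flags.append(f"anchoring: all ratings are '{levels.pop()}'")
    # Evidence check: top or bottom ratings must carry a behavioral example.
    for criterion, (rating, evidence) in ratings.items():
        if rating in {"Exceeds", "Not Meeting"} and not evidence.strip():
            flags.append(f"evidence gap: '{rating}' on {criterion} has no example")
    return flags
```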

Session (2–3 hours per 10–15 leaders reviewed)

  • Operating partner or CHRO facilitates
  • Each leader is reviewed in sequence: 5–10 minutes per person
  • Starting rater presents their rating and primary evidence
  • Other raters note agreement or disagreement, with evidence
  • Discussion continues until the group reaches a calibrated rating or surfaces a genuine disagreement requiring a CEO or operating partner decision
  • Final ratings are logged in real time

Post-session

  • Calibrated ratings are finalized and documented
  • Gaps are categorized: coaching need, role redefinition, or replacement
  • Action plan is assigned to named owners with timelines
  • Data is uploaded to the portfolio-level tracking system

The session itself is not where the real work happens. The real work is the pre-session evidence collection and the post-session action planning. Sessions without both become discussion groups rather than decision engines.
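
For the real-time logging and the upload to the portfolio-level tracking system to aggregate later, every company has to emit the same record shape. Here is a sketch of what one logged outcome might look like; the field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

# Hypothetical shape of one logged calibration outcome (one row per leader
# per criterion). Field names are illustrative; the point is that every
# portfolio company emits identical records.
@dataclass
class CalibrationRecord:
    company: str                      # portfolio company identifier
    leader_id: str                    # leader identifier
    level: str                        # e.g. "VP", "SVP", "C-suite"
    criterion: str                    # one of the shared competencies
    rating: str                       # one of the four shared scale levels
    evidence: str                     # behavioral example supporting the rating
    gap_category: str | None = None   # "coaching", "role redefinition", or "replacement"
    action_owner: str | None = None   # named owner of the follow-up action
    action_due: str | None = None     # timeline for the action (ISO date)
```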

The bias detection layer

PE-backed companies face DEI reporting pressure from LPs at a level that most privately held companies don't. Calibration data creates a record that can surface demographic patterns in rating distributions, which is both a risk and an opportunity.

The risk: if ratings skew systematically by gender, race, or age, that pattern creates legal and reputational exposure. The opportunity: catching those patterns during calibration, before compensation and promotion decisions are finalized, is exactly the right time to address them.

Standard bias detection checks in multi-portfolio calibration:

  • Distribution review. After calibration, review the rating distribution by demographic group at each company. If there is a statistically significant gap (e.g., women at VP level are rated "Meets" at a higher rate than men in comparable roles), that requires a second-look session before ratings are finalized.
  • Recency bias flag. If a leader's rating shifts dramatically from the previous cycle based on one recent event (positive or negative), flag it for discussion. Good calibration captures performance over the full period, not the last 30 days.
  • Anchoring check. If one rater consistently rates higher or lower than peers across the board, their ratings need discussion before being accepted. This is especially relevant in cross-portfolio sessions where calibrators from different companies have different standards.

Most calibration software can automate the distribution review. The recency and anchoring checks require human facilitation.
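
As a sketch of what that automation does, here is a minimal version of the distribution review. It assumes the CalibrationRecord sketch above is extended with a demographic `group` attribute, and it uses a chi-square test on a two-by-two contingency table; a real check must also handle small samples, where chi-square is unreliable.

```python
from scipy.stats import chi2_contingency

# Sketch of the automated distribution review: does the share of "Meets"
# ratings differ significantly between two demographic groups at one level?
# Assumes records carry a hypothetical `group` attribute.
def needs_second_look(records, group_a, group_b, level="VP", alpha=0.05):
    def meets_counts(group):
        pool = [r for r in records if r.level == level and r.group == group]
        meets = sum(1 for r in pool if r.rating == "Meets")
        return [meets, len(pool) - meets]  # ["Meets", everything else]
    table = [meets_counts(group_a), meets_counts(group_b)]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha  # True -> schedule a second-look session
```

An anchoring check can be computed the same way by comparing each rater's mean rating against the panel average, but as noted above, interpreting either flag still needs a human facilitator.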

Portfolio-level aggregation

Once calibration is complete at each portfolio company, the fund needs a way to aggregate the data. The output should answer:

  • What is the rating distribution across the full VP-and-above population, by portfolio company?
  • Which companies have the strongest leadership bench? Which have the most concentration of risk?
  • Are there talent movement opportunities — leaders at one portfolio company who would fill a gap at another?
  • What is the aggregate proportion of leaders in the "Not Meeting" or "Developing" categories who are on formal improvement plans vs. unaddressed?

This requires that the underlying calibration data uses the same rating scale, the same criteria definitions, and the same leadership levels. That is why the shared definitions are non-negotiable upfront.
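
Given records in that shared shape, the roll-up itself is straightforward. Here is a minimal sketch using pandas, assuming the records are the CalibrationRecord objects from the earlier sketch; the leadership-level labels are illustrative.

```python
from dataclasses import asdict
import pandas as pd

def portfolio_rollup(records):
    """Aggregate calibrated records (CalibrationRecord sketch above) fund-wide."""
    df = pd.DataFrame([asdict(r) for r in records])
    vp_plus = df[df["level"].isin(["VP", "SVP", "C-suite"])]

    # Rating distribution across the VP-and-above population, by company.
    # Note: rows are (leader, criterion) pairs; a real roll-up would first
    # collapse each leader to one overall rating.
    distribution = pd.crosstab(
        vp_plus["company"], vp_plus["rating"], normalize="index"
    )

    # Share of "Developing" / "Not Meeting" ratings with no action owner.
    at_risk = vp_plus[vp_plus["rating"].isin(["Developing", "Not Meeting"])]
    unaddressed = float(at_risk["action_owner"].isna().mean())
    return distribution, unaddressed
```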

The aggregation view is most valuable for the managing partner and fund leadership. It turns talent review from a company-by-company report into a portfolio-level risk and opportunity map.

Governance: who owns what

Multi-portfolio calibration programs fail when ownership is unclear. The standard governance model:

  • Operating partner: Framework standards, portfolio-level view, escalated decisions
  • Portfolio CHRO: Session facilitation, company-level execution, action plan tracking
  • Portfolio CEO: Participation in sessions, final rating decisions for direct reports
  • Fund HR lead: Cross-portfolio data aggregation, bias review, program improvement

When portfolio companies don't have a CHRO (common at smaller companies), the operating partner either facilitates directly or assigns a fund-level HR resource to the role. Leaving this position vacant means calibration won't happen consistently.

Common failure modes

Starting without defined criteria. Every multi-portfolio calibration program that fails does so because "we'll define what good looks like in the session." The session then consumes half its time on definitions and produces ratings that no one trusts.

Inconsistent facilitation. If the operating partner facilitates some sessions and delegates others to portfolio company HR teams without training, the process will produce incomparable data. Whoever facilitates must use the same process and standards.

No action plan ownership. Calibration that produces ratings but no assigned actions is documentation, not management. Every gap identified in calibration needs an owner, a defined action, and a timeline before the session closes.

Calibrating only once. Post-close calibration is the foundation. Annual calibration is the operating discipline. Funds that only run calibration at close find that by Year 2, they have no idea whether the leadership team has improved or deteriorated since the baseline.

Getting started

The fastest path to a working multi-portfolio calibration program:

  1. Month 1: Define the framework at the fund level. Rating scale, criteria definitions, evidence requirements. Get operating partner and fund HR alignment.

  2. Month 2: Pilot at one portfolio company. Run the full process. Document what worked and what didn't.

  3. Month 3: Refine the framework based on the pilot. Train portfolio CHROs at other companies.

  4. Months 4–6: Roll out to remaining portfolio companies on a staggered schedule. The stagger matters — running calibrations simultaneously at 8 companies overwhelms operating partner bandwidth.

  5. Ongoing: Annual calibration cycle at each company, aligned to performance review cycles. Quarterly portfolio-level talent review at the fund level.

The framework investment is front-loaded. Companies that get past month 3 consistently describe the program as one of the most useful operating tools in the portfolio.
