The KPI Framework for 5G NR
5G network performance is measured through Key Performance Indicators (KPIs) organized into three pillars: accessibility (can the user connect?), retainability (does the connection stay up?), and integrity (is the quality good enough?). This framework is standardized in 3GPP TS 28.554 (end-to-end KPI definitions) with the underlying PM counters defined in 3GPP TS 28.552 (NR performance measurements).
Unlike 4G, 5G introduces slice-aware KPIs, beam-level measurements, and dual-connectivity counters. An operator running NSA and SA simultaneously must track both NR and LTE-anchor KPIs.
Master KPI Table
The following table maps each KPI to its formula, the PM counters from TS 28.552, the typical tier-1 operator threshold, and the performance category.
| KPI | Formula | PM Counters (TS 28.552) | Threshold | Category |
|---|---|---|---|---|
| RRC Setup Success Rate (CSSR) | (RRC Setup Complete / RRC Setup Request) x 100 | RRC.ConnEstabSucc.sum / RRC.ConnEstabAtt.sum | >= 99.0% | Accessibility |
| NG Setup Success Rate | (NG Setup Success / NG Setup Attempts) x 100 | NGAP.ConnEstabSucc / NGAP.ConnEstabAtt | >= 99.5% | Accessibility |
| RRC Drop Rate | (RRC Abnormal Release / RRC Connected UEs) x 100 | RRC.ConnEstabFail.sum / RRC.ConnMean | <= 1.0% | Retainability |
| E-RAB/QoS Flow Setup Success Rate | (QoS Flow Setup Succ / QoS Flow Setup Att) x 100 | QosFlow.EstabSucc.sum / QosFlow.EstabAtt.sum | >= 98.5% | Accessibility |
| Session Drop Rate | (Abnormal QoS Flow Releases / Total QoS Flow Releases) x 100 | QosFlow.AbnormalRel.sum / QosFlow.Rel.sum | <= 0.5% | Retainability |
| DL User Throughput | Total DL PDCP SDU Volume / Total DL Active Time | DRB.PdcpSduVolumeDl / DRB.PdcpSduDelayTimeDl | >= 100 Mbps (mid-band) | Integrity |
| UL User Throughput | Total UL PDCP SDU Volume / Total UL Active Time | DRB.PdcpSduVolumeUl / DRB.PdcpSduDelayTimeUl | >= 20 Mbps (mid-band) | Integrity |
| Latency (DRB) | Mean DL PDCP SDU Delay | DRB.PdcpSduDelayDl (in ms) | <= 10 ms (eMBB) | Integrity |
| Intra-gNB Handover Success Rate | (Intra HO Succ / Intra HO Att) x 100 | HO.IntraGnbSucc / HO.IntraGnbAtt | >= 98.0% | Retainability |
| Inter-gNB Handover Success Rate | (Inter HO Succ / Inter HO Att) x 100 | HO.InterGnbSucc / HO.InterGnbAtt | >= 96.0% | Retainability |
| RACH Success Rate | (RACH Preamble Succ / RACH Preamble Att) x 100 | RACH.PreambleSucc / RACH.PreambleAtt | >= 99.0% | Accessibility |
| CQI Distribution | % of samples with CQI >= 10 | L1M.RS-SINR.BinX distribution | >= 70% above CQI 10 | Integrity |
| PRB Utilization (DL) | (Used DL PRBs / Available DL PRBs) x 100 | RRU.PrbUsedDl / RRU.PrbAvailDl | <= 70% (congestion warning) | Capacity |
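Most rows in the table share one shape: a success or usage counter divided by an attempt or availability counter, times 100. A minimal sketch of that calculation follows. The helper name `ratio_kpi` and the counter snapshot are illustrative assumptions, not values from any real network or OSS export.

```python
from typing import Optional


def ratio_kpi(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator/denominator as a percentage, or None when undefined."""
    if denominator <= 0:
        return None  # no attempts in the interval, so the KPI is undefined
    return 100.0 * numerator / denominator


# Hypothetical counter snapshot for one cell and one reporting interval.
counters = {
    "RRC.ConnEstabAtt.sum": 1200,
    "RRC.ConnEstabSucc.sum": 1191,
    "RRU.PrbUsedDl": 54,
    "RRU.PrbAvailDl": 100,
}

cssr = ratio_kpi(counters["RRC.ConnEstabSucc.sum"], counters["RRC.ConnEstabAtt.sum"])
prb_util = ratio_kpi(counters["RRU.PrbUsedDl"], counters["RRU.PrbAvailDl"])
print(f"CSSR: {cssr:.2f}%  DL PRB utilization: {prb_util:.1f}%")
```

Returning `None` for an empty interval, rather than 0% or 100%, keeps idle cells from skewing a dashboard in either direction.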
Accessibility vs Retainability vs Integrity
Understanding which category a KPI falls into determines escalation paths and troubleshooting focus.
Accessibility KPIs measure whether users can establish connections. Poor accessibility typically indicates RF coverage issues (low RSRP/SINR), RRC congestion (max connected UEs reached), or core network signaling failures (AMF overload, SCTP failures).
Retainability KPIs measure whether established connections stay active. Drops are caused by handover failures, radio link failure (RLF) due to weak coverage, transport congestion (F1/Xn backhaul saturation), or core network issues (GTP-U path failure).
Integrity KPIs measure quality of the active connection. Poor integrity manifests as low throughput (high PRB utilization, poor CQI), high latency (scheduling delays, HARQ retransmissions), or jitter (affecting voice/video QoE).
| Category | Focus | Typical Root Causes | Primary Counters |
|---|---|---|---|
| Accessibility | Can the user connect? | Poor RSRP, RRC congestion, AMF failures | RRC.ConnEstabAtt, RACH.PreambleAtt |
| Retainability | Does it stay connected? | RLF, HO failure, transport loss | RRC.ConnEstabFail, HO.InterGnbFail |
| Integrity | Is quality sufficient? | Low CQI, high PRB load, HARQ retx | DRB.PdcpSduVolumeDl, L1M.RS-SINR |
| Capacity | Is the network saturated? | PRB utilization, connected UE count | RRU.PrbUsedDl, RRC.ConnMean |
Worked Example 1: Calculating CSSR
Scenario: A gNB in downtown Seoul (SK Telecom) reports the following PM counter values for a busy hour (17:00-18:00):
- RRC.ConnEstabAtt.sum = 45,200
- RRC.ConnEstabSucc.sum = 44,870
- RRC.ConnEstabFail.sum = 330
```
CSSR = (RRC.ConnEstabSucc.sum / RRC.ConnEstabAtt.sum) x 100
CSSR = (44,870 / 45,200) x 100
CSSR = 99.27%
```
Assessment: This exceeds the 99.0% threshold. The 330 failures break down further by cause:
| Failure Cause | Counter | Count | % of Failures |
|---|---|---|---|
| T300 expiry (no RRC Setup) | RRC.ConnEstabFail.T300Expiry | 180 | 54.5% |
| Rejection due to overload | RRC.ConnEstabFail.Rej | 95 | 28.8% |
| Other causes | RRC.ConnEstabFail.Other | 55 | 16.7% |
The high T300 expiry count suggests weak coverage at cell edges. The overload rejections may indicate the gNB is hitting max RRC connected UE limits during peak hours. An RF optimization action (downtilt adjustment or SSB beam repointing) is warranted.
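The arithmetic in this example can be reproduced with a short script. This is a sketch using the counter values quoted above. The function names are illustrative, and the cause keys are shortened forms of the counter names in the table.

```python
def cssr_percent(attempts: int, successes: int) -> float:
    """RRC setup success rate as a percentage."""
    return 100.0 * successes / attempts


def failure_shares(failures_by_cause: dict) -> dict:
    """Share of each failure cause, in percent, rounded to one decimal."""
    total = sum(failures_by_cause.values())
    return {cause: round(100.0 * n / total, 1) for cause, n in failures_by_cause.items()}


attempts, successes = 45_200, 44_870
failures = {"T300Expiry": 180, "Rej": 95, "Other": 55}

# The per-cause failures should add up to attempts minus successes.
if sum(failures.values()) != attempts - successes:
    raise ValueError("failure causes do not reconcile with attempt/success counters")

cssr = cssr_percent(attempts, successes)
shares = failure_shares(failures)
print(f"CSSR = {cssr:.2f}%")
for cause, pct in shares.items():
    print(f"  {cause}: {pct}%")
```

The reconciliation check is worth keeping in production pipelines: a mismatch between the cause breakdown and the aggregate counters usually means a counter was missed or double-counted during collection.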
Worked Example 2: Drop Rate and Retainability Analysis
Scenario: An operator (T-Mobile US) monitors a cluster of 15 gNBs on n41 (2.5 GHz) in Chicago. Weekly aggregated counters:
- QosFlow.Rel.sum = 2,340,000 (total QoS flow releases)
- QosFlow.NormalRel.sum = 2,320,200 (normal releases -- user-initiated or inactivity)
- QosFlow.AbnormalRel.sum = 19,800 (abnormal releases)
```
Session Drop Rate = (QosFlow.AbnormalRel.sum / QosFlow.Rel.sum) x 100
Session Drop Rate = (19,800 / 2,340,000) x 100
Session Drop Rate = 0.846%
```
Assessment: This exceeds the 0.5% threshold and requires investigation. The abnormal releases decompose as:
- Radio link failure (RLF): 11,200 (56.6% of abnormal)
- Handover failure: 5,400 (27.3%)
- Transport layer failure: 3,200 (16.2%)
The dominant contributor is RLF. Cross-referencing with RSRP distribution shows 18% of samples below -110 dBm, indicating coverage holes. The handover failures correlate with inter-gNB X2/Xn delays exceeding 100 ms during congestion.
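The drop-rate calculation and the search for the dominant cause can be sketched as follows, using the weekly counters from this scenario. The variable names and the dictionary layout are illustrative assumptions. The 0.5% limit is the threshold from the master KPI table.

```python
total_rel = 2_340_000
normal_rel = 2_320_200
abnormal_rel = 19_800
abnormal_by_cause = {"RLF": 11_200, "HO failure": 5_400, "Transport failure": 3_200}

# Sanity checks: releases and causes must reconcile before any ratio is trusted.
if normal_rel + abnormal_rel != total_rel:
    raise ValueError("normal + abnormal releases do not match the total")
if sum(abnormal_by_cause.values()) != abnormal_rel:
    raise ValueError("cause breakdown does not match abnormal releases")

drop_rate = 100.0 * abnormal_rel / total_rel
threshold = 0.5  # percent, from the master KPI table
dominant = max(abnormal_by_cause, key=abnormal_by_cause.get)

print(f"Session drop rate = {drop_rate:.3f}% (threshold {threshold}%)")
print(f"Breach: {drop_rate > threshold}, dominant cause: {dominant}")
```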
Corrective actions: Increase SSB transmit power by 3 dB on affected cells, adjust A3 handover offset from 3 dB to 2 dB to trigger earlier handovers, and verify F1 transport link capacity.
Real NOC Dashboard Thresholds
Tier-1 operators configure their NOC dashboards with multi-level thresholds that trigger different escalation paths. Below is a representative configuration based on published data from Deutsche Telekom's 5G NOC operations and Reliance Jio's network monitoring framework.
| KPI | Green (Normal) | Yellow (Warning) | Orange (Minor) | Red (Critical) |
|---|---|---|---|---|
| CSSR | >= 99.0% | 98.0-99.0% | 96.0-98.0% | < 96.0% |
| Session Drop Rate | <= 0.5% | 0.5-1.0% | 1.0-2.0% | > 2.0% |
| DL Throughput (avg) | >= 100 Mbps | 50-100 Mbps | 20-50 Mbps | < 20 Mbps |
| Latency (user plane) | <= 10 ms | 10-20 ms | 20-50 ms | > 50 ms |
| HO Success Rate | >= 98.0% | 96.0-98.0% | 92.0-96.0% | < 92.0% |
| PRB Utilization | <= 60% | 60-75% | 75-85% | > 85% |
| RACH Success Rate | >= 99.0% | 97.0-99.0% | 95.0-97.0% | < 95.0% |
Deutsche Telekom reported that their 5G SA network in Germany maintained a monthly average CSSR of 99.4% and a session drop rate of 0.38% across approximately 18,000 gNBs in 2024. Jio's 5G SA network across India achieved CSSR of 99.1% with a DL throughput average of 220 Mbps on n78 (3.5 GHz) during the same period.
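The multi-level thresholds in the table above map naturally onto a small classifier. The sketch below is one possible encoding, not any vendor's dashboard logic. The `Bands` type and `classify` function are hypothetical names. Because adjacent ranges in the table share endpoints, the sketch assumes a boundary value belongs to the better band.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bands:
    green: float   # limit for Green (Normal)
    yellow: float  # limit for Yellow (Warning)
    orange: float  # limit for Orange (Minor); anything worse is Red
    higher_is_better: bool


# Limits copied from the dashboard threshold table.
BANDS = {
    "CSSR": Bands(99.0, 98.0, 96.0, True),
    "Session Drop Rate": Bands(0.5, 1.0, 2.0, False),
    "DL Throughput": Bands(100.0, 50.0, 20.0, True),
    "Latency": Bands(10.0, 20.0, 50.0, False),
    "HO Success Rate": Bands(98.0, 96.0, 92.0, True),
    "PRB Utilization": Bands(60.0, 75.0, 85.0, False),
    "RACH Success Rate": Bands(99.0, 97.0, 95.0, True),
}


def classify(kpi: str, value: float) -> str:
    """Return 'green', 'yellow', 'orange', or 'red' for a KPI value."""
    b = BANDS[kpi]
    for limit, level in ((b.green, "green"), (b.yellow, "yellow"), (b.orange, "orange")):
        within = value >= limit if b.higher_is_better else value <= limit
        if within:
            return level
    return "red"


print(classify("CSSR", 99.27))               # value from Worked Example 1
print(classify("Session Drop Rate", 0.846))  # value from Worked Example 2
```

Applied to the two worked examples, the CSSR of 99.27% lands in the Green band and the session drop rate of 0.846% lands in Yellow.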
Advanced KPI Considerations
Slice-Aware KPIs
5G introduces S-NSSAI-level KPI reporting per TS 28.554 Section 6. Each KPI can be broken down per network slice. For example, a factory automation slice (SST=1, SD=0x000002) may have a latency KPI threshold of 5 ms, while a consumer eMBB slice (SST=1, SD=0x000001) may tolerate 15 ms.
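Per-slice thresholds can be expressed as a lookup keyed by S-NSSAI. The following is a minimal sketch using the two example slices above. The `SNssai` type, the `latency_violations` helper, and the measured values are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, NamedTuple


class SNssai(NamedTuple):
    sst: int
    sd: int


# Latency limits in ms for the two example slices from the text.
LATENCY_LIMIT_MS: Dict[SNssai, float] = {
    SNssai(1, 0x000002): 5.0,   # factory automation slice
    SNssai(1, 0x000001): 15.0,  # consumer eMBB slice
}


def latency_violations(measured_ms: Dict[SNssai, float]) -> List[SNssai]:
    """Return the slices whose measured latency exceeds their own limit."""
    return sorted(
        s for s, ms in measured_ms.items()
        if s in LATENCY_LIMIT_MS and ms > LATENCY_LIMIT_MS[s]
    )


# Hypothetical measurements: the same 7.2 ms passes on eMBB but fails on the factory slice.
measured = {SNssai(1, 0x000002): 7.2, SNssai(1, 0x000001): 7.2}
print(latency_violations(measured))
```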
Beam-Level KPIs
Massive MIMO deployments require per-beam performance monitoring. The PM counter L1M.RS-SINR.BinX.BmIdx provides SINR distribution per SSB beam index. Operators use beam-level SINR to detect blocked beams (e.g., new construction obstructing a specific azimuth) and trigger beam reconfiguration.
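One way to flag a possibly blocked beam is to look at how much of its SINR histogram falls below a cutoff. The sketch below assumes a hypothetical bin layout, cutoff, and alert ratio. Real bin edges depend on the vendor's configuration of the SINR distribution counter, so every number here is a placeholder.

```python
from typing import Dict, List, Optional

# Hypothetical lower edges (dB) of the SINR bins; real layouts are vendor-specific.
BIN_LOWER_EDGES_DB = [-10, -5, 0, 5, 10, 15, 20]


def low_sinr_share(bin_counts: List[int], cutoff_db: float = 0.0) -> Optional[float]:
    """Fraction of samples in bins that start below cutoff_db, or None if no samples."""
    total = sum(bin_counts)
    if total == 0:
        return None
    low = sum(c for edge, c in zip(BIN_LOWER_EDGES_DB, bin_counts) if edge < cutoff_db)
    return low / total


def suspect_beams(per_beam_bins: Dict[int, List[int]],
                  cutoff_db: float = 0.0,
                  max_low_share: float = 0.5) -> List[int]:
    """Return SSB beam indices whose low-SINR share exceeds max_low_share."""
    flagged = []
    for beam, bins in sorted(per_beam_bins.items()):
        share = low_sinr_share(bins, cutoff_db)
        if share is not None and share > max_low_share:
            flagged.append(beam)
    return flagged


# Hypothetical per-beam histograms: beam 1 is dominated by low-SINR samples.
histograms = {
    0: [2, 5, 10, 30, 40, 10, 3],
    1: [40, 30, 15, 10, 3, 1, 1],
    2: [0, 0, 0, 0, 0, 0, 0],  # no samples, so no verdict
}
print(suspect_beams(histograms))
```

A beam with no samples is left unflagged rather than treated as blocked, since an idle beam and an obstructed beam need different follow-up.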
EN-DC KPIs (NSA Mode)
In NSA mode, the UE maintains dual connectivity to both LTE (master node) and NR (secondary node). Additional KPIs include:
- SgNB Addition Success Rate: (SgNB Add Succ / SgNB Add Att) x 100; threshold >= 95%
- SgNB Abnormal Release Rate: (SgNB Abnormal Rel / SgNB Rel) x 100; threshold <= 2%
- SCG Failure Rate: Tracks secondary cell group radio link failures
KPI Correlation Patterns
Experienced NOC engineers recognize correlations that accelerate troubleshooting:
- Low CSSR + High RACH failure: Points to an uplink coverage issue (UE cannot reach gNB). Check UL noise floor and RACH configuration.
- High drop rate + High HO failure: Mobility-related drops dominate. Check neighbor relations, A3/A5 event thresholds, and X2/Xn transport.
- Low throughput + High PRB utilization: Capacity exhaustion. Consider carrier aggregation, additional spectrum, or traffic offload.
- High latency + Normal PRB utilization: Scheduling inefficiency or transport bottleneck. Check PDCP/RLC retransmissions and backhaul RTT.
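The correlation patterns above can be written as simple rules. The sketch below is illustrative only: the `Snapshot` fields and the `diagnose` function are hypothetical, and the cut-offs reuse the Green limits from the dashboard table as an assumption about what counts as "low" or "high".

```python
from typing import List, NamedTuple


class Snapshot(NamedTuple):
    cssr: float          # %
    rach_success: float  # %
    drop_rate: float     # %
    ho_success: float    # %
    dl_tput_mbps: float
    prb_util: float      # %
    latency_ms: float


def diagnose(s: Snapshot) -> List[str]:
    """Return the troubleshooting hints whose KPI pattern matches the snapshot."""
    findings = []
    if s.cssr < 99.0 and s.rach_success < 99.0:
        findings.append("uplink coverage: check UL noise floor and RACH configuration")
    if s.drop_rate > 0.5 and s.ho_success < 98.0:
        findings.append("mobility: check neighbor relations, A3/A5 thresholds, X2/Xn transport")
    if s.dl_tput_mbps < 100.0 and s.prb_util > 60.0:
        findings.append("capacity: consider carrier aggregation, spectrum, or offload")
    if s.latency_ms > 10.0 and s.prb_util <= 60.0:
        findings.append("scheduling/transport: check PDCP/RLC retransmissions and backhaul RTT")
    return findings


# Hypothetical cell: low CSSR together with low RACH success, everything else healthy.
cell = Snapshot(cssr=98.5, rach_success=96.0, drop_rate=0.3, ho_success=99.0,
                dl_tput_mbps=150.0, prb_util=40.0, latency_ms=8.0)
for hint in diagnose(cell):
    print(hint)
```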
PM Counter Collection
PM counters are collected via the O1 interface (using NETCONF/YANG or file-based reporting) at configurable intervals -- typically 15-minute or 1-hour granularity per TS 28.550. The counters feed into OSS/BSS platforms (Ericsson ENM, Nokia NetAct, Huawei iManager U2020) where KPIs are computed and visualized.
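When 15-minute files are rolled up into an hourly KPI, the counters should be summed first and the ratio computed afterwards. Averaging the per-interval percentages gives quiet intervals the same weight as busy ones. The sketch below illustrates the difference with made-up interval values. The `att` and `succ` keys are hypothetical field names.

```python
from statistics import mean

# Hypothetical 15-minute counter pairs for one hour (attempts, successes).
intervals = [
    {"att": 12_000, "succ": 11_940},
    {"att": 15_000, "succ": 14_925},
    {"att": 10_000, "succ": 9_950},
    {"att": 200, "succ": 180},  # quiet interval with a poor ratio
]

# Correct roll-up: sum the counters, then divide.
total_att = sum(i["att"] for i in intervals)
total_succ = sum(i["succ"] for i in intervals)
hourly = 100.0 * total_succ / total_att

# Naive roll-up: average the four percentages, ignoring traffic volume.
naive = mean(100.0 * i["succ"] / i["att"] for i in intervals)

print(f"counter-weighted hourly KPI: {hourly:.2f}%")
print(f"average of interval KPIs:    {naive:.3f}%")
```

With these numbers, the quiet interval drags the naive average more than two points below the counter-weighted figure even though it carries well under 1% of the attempts.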
For real-time KPI monitoring, operators increasingly deploy streaming telemetry using gRPC-based interfaces that push counter updates every 1-10 seconds, enabling near-real-time anomaly detection and automated optimization loops.
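A very simple form of near-real-time anomaly detection is a rolling z-score over recent samples. The sketch below shows the idea only. The `RollingDetector` class, the window size, and the 3-sigma limit are assumptions, and the transport layer (gRPC or otherwise) is out of scope.

```python
from collections import deque
from statistics import mean, pstdev


class RollingDetector:
    """Flag samples that deviate strongly from a rolling window of recent values."""

    def __init__(self, window: int = 30, z_limit: float = 3.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)
        self.z_limit = z_limit
        self.min_samples = min_samples

    def update(self, value: float) -> bool:
        """Return True if value is anomalous relative to the window, then store it."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:
                anomalous = True
        # Simplification: anomalous samples are also stored in the window.
        self.samples.append(value)
        return anomalous


# Hypothetical stream of CSSR samples (%) arriving every few seconds.
stream = [99.2, 99.3, 99.1, 99.4, 99.2, 99.3, 95.0]
detector = RollingDetector()
flags = [detector.update(v) for v in stream]
print(flags)
```

In this run only the final sample, the drop to 95.0%, is flagged. A production detector would typically also account for time-of-day seasonality, which this sketch ignores.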
Key Takeaway: 5G KPIs are organized around accessibility, retainability, and integrity, each mapped to specific PM counters in TS 28.552. Mastering the formulas, understanding threshold levels, and knowing how to decompose aggregate KPIs into root causes are the core competencies of 5G network operations.