The KPI Framework for 5G NR

5G network performance is measured through Key Performance Indicators (KPIs) organized into three pillars: accessibility (can the user connect?), retainability (does the connection stay up?), and integrity (is the quality good enough?). This framework is standardized in 3GPP TS 28.554 (end-to-end KPI definitions) with the underlying PM counters defined in 3GPP TS 28.552 (NR performance measurements).

Unlike 4G, 5G adds slice-aware KPIs, beam-level measurements, and dual-connectivity counters. An operator running NSA and SA simultaneously must therefore track both NR and LTE-anchor KPIs.

Master KPI Table

The following table maps each KPI to its formula, the PM counters from TS 28.552, the typical tier-1 operator threshold, and the performance category.

| KPI | Formula | PM Counters (TS 28.552) | Threshold | Category |
|---|---|---|---|---|
| RRC Setup Success Rate (CSSR) | (RRC Setup Complete / RRC Setup Request) x 100 | RRC.ConnEstabSucc.sum / RRC.ConnEstabAtt.sum | >= 99.0% | Accessibility |
| NG Setup Success Rate | (NG Setup Success / NG Setup Attempts) x 100 | NGAP.ConnEstabSucc / NGAP.ConnEstabAtt | >= 99.5% | Accessibility |
| RRC Drop Rate | (RRC Abnormal Release / RRC Connected UEs) x 100 | RRC.ConnEstabFail.sum / RRC.ConnMean | <= 1.0% | Retainability |
| E-RAB/QoS Flow Setup Success Rate | (QoS Flow Setup Succ / QoS Flow Setup Att) x 100 | QosFlow.EstabSucc.sum / QosFlow.EstabAtt.sum | >= 98.5% | Accessibility |
| Session Drop Rate | (Abnormal QoS Flow Releases / Total QoS Flow Releases) x 100 | QosFlow.AbnormalRel.sum / QosFlow.Rel.sum | <= 0.5% | Retainability |
| DL User Throughput | Total DL PDCP SDU Volume / Total DL Active Time | DRB.PdcpSduVolumeDl / DRB.PdcpSduDelayTimeDl | >= 100 Mbps (mid-band) | Integrity |
| UL User Throughput | Total UL PDCP SDU Volume / Total UL Active Time | DRB.PdcpSduVolumeUl / DRB.PdcpSduDelayTimeUl | >= 20 Mbps (mid-band) | Integrity |
| Latency (DRB) | Mean DL PDCP SDU Delay | DRB.PdcpSduDelayDl (in ms) | <= 10 ms (eMBB) | Integrity |
| Intra-gNB Handover Success Rate | (Intra HO Succ / Intra HO Att) x 100 | HO.IntraGnbSucc / HO.IntraGnbAtt | >= 98.0% | Retainability |
| Inter-gNB Handover Success Rate | (Inter HO Succ / Inter HO Att) x 100 | HO.InterGnbSucc / HO.InterGnbAtt | >= 96.0% | Retainability |
| RACH Success Rate | (RACH Preamble Succ / RACH Preamble Att) x 100 | RACH.PreambleSucc / RACH.PreambleAtt | >= 99.0% | Accessibility |
| CQI Distribution | % of samples with CQI >= 10 | L1M.RS-SINR.BinX distribution | >= 70% above CQI 10 | Integrity |
| PRB Utilization (DL) | (Used DL PRBs / Available DL PRBs) x 100 | RRU.PrbUsedDl / RRU.PrbAvailDl | <= 70% (congestion warning) | Capacity |

Accessibility vs Retainability vs Integrity

Understanding which category a KPI falls into determines escalation paths and troubleshooting focus.

  • Accessibility KPIs measure whether users can establish connections. Poor accessibility typically indicates RF coverage issues (low RSRP/SINR), RRC congestion (max connected UEs reached), or core network signaling failures (AMF overload, SCTP failures).
  • Retainability KPIs measure whether established connections stay active. Drops are caused by handover failures, radio link failure (RLF) due to weak coverage, transport congestion (F1/Xn backhaul saturation), or core network issues (GTP-U path failure).
  • Integrity KPIs measure quality of the active connection. Poor integrity manifests as low throughput (high PRB utilization, poor CQI), high latency (scheduling delays, HARQ retransmissions), or jitter (affecting voice/video QoE).

| Category | Focus | Typical Root Causes | Primary Counters |
|---|---|---|---|
| Accessibility | Can the user connect? | Poor RSRP, RRC congestion, AMF failures | RRC.ConnEstabAtt, RACH.PreambleAtt |
| Retainability | Does it stay connected? | RLF, HO failure, transport loss | RRC.ConnEstabFail, HO.InterGnbFail |
| Integrity | Is quality sufficient? | Low CQI, high PRB load, HARQ retx | DRB.PdcpSduVolumeDl, L1M.RS-SINR |
| Capacity | Is the network saturated? | PRB utilization, connected UE count | RRU.PrbUsedDl, RRC.ConnMean |

Worked Example 1: Calculating CSSR

Scenario: A gNB in downtown Seoul (SK Telecom) reports the following PM counter values for a busy hour (17:00-18:00):
  • RRC.ConnEstabAtt.sum = 45,200
  • RRC.ConnEstabSucc.sum = 44,870
  • RRC.ConnEstabFail.sum = 330
CSSR Calculation:

CSSR = (RRC.ConnEstabSucc.sum / RRC.ConnEstabAtt.sum) x 100

CSSR = (44,870 / 45,200) x 100

CSSR = 99.27%

Assessment: This exceeds the 99.0% threshold. The 330 failures break down further by cause:

| Failure Cause | Counter | Count | % of Failures |
|---|---|---|---|
| T300 expiry (no RRC Setup) | RRC.ConnEstabFail.T300Expiry | 180 | 54.5% |
| Rejection due to overload | RRC.ConnEstabFail.Rej | 95 | 28.8% |
| Other causes | RRC.ConnEstabFail.Other | 55 | 16.7% |

The high T300 expiry count suggests weak coverage at cell edges. The overload rejections may indicate the gNB is hitting max RRC connected UE limits during peak hours. An RF optimization action (downtilt adjustment or SSB beam repointing) is warranted.
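The CSSR arithmetic and the per-cause breakdown above can be reproduced with a few lines of Python. This is a minimal sketch: the counter names and values come from the example, while the helper names `success_rate` and `cause_shares` are illustrative and not part of any vendor tool.

```python
# Counter values from Worked Example 1 (busy hour 17:00-18:00).
counters = {
    "RRC.ConnEstabAtt.sum": 45_200,
    "RRC.ConnEstabSucc.sum": 44_870,
}

# Per-cause failure counts from the breakdown table.
failure_causes = {
    "RRC.ConnEstabFail.T300Expiry": 180,
    "RRC.ConnEstabFail.Rej": 95,
    "RRC.ConnEstabFail.Other": 55,
}


def success_rate(successes: int, attempts: int) -> float:
    """Success rate in percent; NaN when there were no attempts."""
    if attempts == 0:
        return float("nan")
    return 100.0 * successes / attempts


def cause_shares(causes: dict) -> dict:
    """Share of each failure cause in percent of all failures."""
    total = sum(causes.values())
    if total == 0:
        return {name: 0.0 for name in causes}
    return {name: 100.0 * count / total for name, count in causes.items()}


cssr = success_rate(
    counters["RRC.ConnEstabSucc.sum"], counters["RRC.ConnEstabAtt.sum"]
)
shares = cause_shares(failure_causes)

print(f"CSSR = {cssr:.2f}%")
for name, share in shares.items():
    print(f"{name}: {share:.1f}% of failures")
```

Returning NaN for a zero denominator is a design choice for this sketch: an idle cell then shows up as "no data" instead of a misleading 0% or 100%.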

Worked Example 2: Drop Rate and Retainability Analysis

Scenario: An operator (T-Mobile US) monitors a cluster of 15 gNBs on n41 (2.5 GHz) in Chicago. Weekly aggregated counters:
  • QosFlow.Rel.sum = 2,340,000 (total QoS flow releases)
  • QosFlow.NormalRel.sum = 2,320,200 (normal releases -- user-initiated or inactivity)
  • QosFlow.AbnormalRel.sum = 19,800 (abnormal releases)
Session Drop Rate Calculation:

Session Drop Rate = (QosFlow.AbnormalRel.sum / QosFlow.Rel.sum) x 100

Session Drop Rate = (19,800 / 2,340,000) x 100

Session Drop Rate = 0.846%

Assessment: This exceeds the 0.5% threshold and requires investigation. The abnormal releases decompose as:
  • Radio link failure (RLF): 11,200 (56.6% of abnormal)
  • Handover failure: 5,400 (27.3%)
  • Transport layer failure: 3,200 (16.2%)

The dominant contributor is RLF. Cross-referencing with RSRP distribution shows 18% of samples below -110 dBm, indicating coverage holes. The handover failures correlate with inter-gNB X2/Xn delays exceeding 100 ms during congestion.

Corrective actions: Increase SSB transmit power by 3 dB on affected cells, adjust A3 handover offset from 3 dB to 2 dB to trigger earlier handovers, and verify F1 transport link capacity.
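To see how much each cause contributes to the 0.846% figure, the drop rate can be split into percentage-point contributions. The sketch below uses the counts from this example; the "what if RLF were eliminated" figure is a rough illustration that assumes the total number of releases stays unchanged, not a prediction.

```python
# Weekly aggregated values from Worked Example 2.
total_releases = 2_340_000  # QosFlow.Rel.sum
abnormal_by_cause = {
    "Radio link failure (RLF)": 11_200,
    "Handover failure": 5_400,
    "Transport layer failure": 3_200,
}
DROP_RATE_THRESHOLD_PCT = 0.5  # threshold from the master KPI table

abnormal_total = sum(abnormal_by_cause.values())  # QosFlow.AbnormalRel.sum
drop_rate = 100.0 * abnormal_total / total_releases

# Contribution of each cause to the drop rate, in percentage points.
contribution = {
    cause: 100.0 * count / total_releases
    for cause, count in abnormal_by_cause.items()
}

# Illustrative what-if: drop rate if RLF-related releases were eliminated,
# assuming the total number of releases does not change.
rlf = abnormal_by_cause["Radio link failure (RLF)"]
drop_rate_without_rlf = 100.0 * (abnormal_total - rlf) / total_releases

print(f"Session drop rate: {drop_rate:.3f}%")
for cause, points in contribution.items():
    print(f"  {cause}: {points:.3f} percentage points")
print(f"Without RLF: {drop_rate_without_rlf:.3f}%")
```

Under that assumption, removing RLF alone would bring the cluster below the 0.5% threshold, which is why the coverage fix is the first corrective action.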

Real NOC Dashboard Thresholds

Tier-1 operators configure their NOC dashboards with multi-level thresholds that trigger different escalation paths. Below is a representative configuration based on published data from Deutsche Telekom's 5G NOC operations and Reliance Jio's network monitoring framework.

| KPI | Green (Normal) | Yellow (Warning) | Orange (Minor) | Red (Critical) |
|---|---|---|---|---|
| CSSR | >= 99.0% | 98.0-99.0% | 96.0-98.0% | < 96.0% |
| Session Drop Rate | <= 0.5% | 0.5-1.0% | 1.0-2.0% | > 2.0% |
| DL Throughput (avg) | >= 100 Mbps | 50-100 Mbps | 20-50 Mbps | < 20 Mbps |
| Latency (user plane) | <= 10 ms | 10-20 ms | 20-50 ms | > 50 ms |
| HO Success Rate | >= 98.0% | 96.0-98.0% | 92.0-96.0% | < 92.0% |
| PRB Utilization | <= 60% | 60-75% | 75-85% | > 85% |
| RACH Success Rate | >= 99.0% | 97.0-99.0% | 95.0-97.0% | < 95.0% |
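The multi-level thresholds in the table above can be expressed as a small classifier. This is a sketch under two assumptions: the `Bands` structure and `classify` function are hypothetical names, and a value that sits exactly on a shared boundary (for example 98.0% CSSR) is assigned to the better level, since the table does not say which side the boundary belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bands:
    green: float   # limit for Green (Normal)
    yellow: float  # limit for Yellow (Warning)
    orange: float  # limit for Orange (Minor); anything worse is Red
    higher_is_better: bool


# Limits transcribed from the dashboard table.
DASHBOARD = {
    "CSSR": Bands(99.0, 98.0, 96.0, True),
    "Session Drop Rate": Bands(0.5, 1.0, 2.0, False),
    "DL Throughput (avg)": Bands(100.0, 50.0, 20.0, True),
    "Latency (user plane)": Bands(10.0, 20.0, 50.0, False),
    "HO Success Rate": Bands(98.0, 96.0, 92.0, True),
    "PRB Utilization": Bands(60.0, 75.0, 85.0, False),
    "RACH Success Rate": Bands(99.0, 97.0, 95.0, True),
}


def classify(kpi: str, value: float) -> str:
    """Return 'green', 'yellow', 'orange', or 'red' for a KPI value."""
    bands = DASHBOARD[kpi]
    levels = (
        ("green", bands.green),
        ("yellow", bands.yellow),
        ("orange", bands.orange),
    )
    for level, limit in levels:
        within = value >= limit if bands.higher_is_better else value <= limit
        if within:
            return level
    return "red"


print(classify("CSSR", 99.27))               # value from Worked Example 1
print(classify("Session Drop Rate", 0.846))  # value from Worked Example 2
```

With these limits, the CSSR from Worked Example 1 lands in green and the session drop rate from Worked Example 2 lands in yellow.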

Deutsche Telekom reported that their 5G SA network in Germany maintained a monthly average CSSR of 99.4% and a session drop rate of 0.38% across approximately 18,000 gNBs in 2024. Jio's 5G SA network across India achieved CSSR of 99.1% with a DL throughput average of 220 Mbps on n78 (3.5 GHz) during the same period.

Advanced KPI Considerations

Slice-Aware KPIs

5G introduces S-NSSAI-level KPI reporting per TS 28.554 Section 6. Each KPI can be broken down per network slice. For example, a factory automation slice (SST=1, SD=0x000002) may have a latency KPI threshold of 5 ms, while a consumer eMBB slice (SST=1, SD=0x000001) may tolerate 15 ms.
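A per-slice threshold lookup might look like the sketch below. The two S-NSSAI values and their latency limits are the ones quoted above; the measurement values, the fallback of 10 ms (the eMBB latency threshold from the master table), and all function names are assumptions made for illustration.

```python
# Latency thresholds in ms, keyed by (SST, SD), from the example above.
SLICE_LATENCY_THRESHOLD_MS = {
    (1, 0x000002): 5.0,   # factory automation slice
    (1, 0x000001): 15.0,  # consumer eMBB slice
}
# Assumed fallback for slices without an explicit entry.
DEFAULT_LATENCY_THRESHOLD_MS = 10.0


def snssai_label(sst: int, sd: int) -> str:
    """Format an S-NSSAI as 'SST-SD' with a 6-digit hexadecimal SD."""
    return f"{sst}-{sd:06X}"


def latency_breaches(measurements):
    """Return (slice label, measured ms, limit ms) for each slice over its limit.

    `measurements` is an iterable of (sst, sd, mean_delay_ms) tuples.
    """
    breaches = []
    for sst, sd, delay_ms in measurements:
        limit = SLICE_LATENCY_THRESHOLD_MS.get(
            (sst, sd), DEFAULT_LATENCY_THRESHOLD_MS
        )
        if delay_ms > limit:
            breaches.append((snssai_label(sst, sd), delay_ms, limit))
    return breaches


# Hypothetical mean DL delay per slice for one reporting interval.
measurements = [
    (1, 0x000002, 7.2),   # above the 5 ms limit for factory automation
    (1, 0x000001, 12.0),  # within the 15 ms limit for consumer eMBB
]
breaches = latency_breaches(measurements)
print(breaches)
```

The point of the example is that the same measured delay can be acceptable on one slice and a breach on another, so the threshold has to be resolved per S-NSSAI before the KPI is evaluated.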

Beam-Level KPIs

Massive MIMO deployments require per-beam performance monitoring. The PM counter L1M.RS-SINR.BinX.BmIdx provides SINR distribution per SSB beam index. Operators use beam-level SINR to detect blocked beams (e.g., new construction obstructing a specific azimuth) and trigger beam reconfiguration.
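One simple way to flag a possibly blocked beam is to compare each beam's share of low-SINR samples against the other beams of the same cell. The sketch below is only an illustration: the histogram values, the number of bins, the choice of which bins count as "low", and the 20-point margin are all hypothetical and not taken from TS 28.552 or any operator practice.

```python
from statistics import median

# Hypothetical per-beam SINR histograms: sample counts per bin, ordered
# from the lowest SINR bin to the highest. Keys are SSB beam indices.
sinr_histograms = {
    0: [5, 10, 30, 40, 15],
    1: [4, 12, 28, 41, 15],
    2: [40, 35, 15, 8, 2],
    3: [6, 9, 31, 39, 15],
}
LOW_BIN_COUNT = 2   # assumption: the first two bins represent poor SINR
MARGIN = 0.20       # assumption: flag beams 20 points above the median


def low_sinr_share(histogram):
    """Fraction of samples that fall into the low-SINR bins."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    return sum(histogram[:LOW_BIN_COUNT]) / total


def suspect_beams(histograms):
    """Beam indices whose low-SINR share exceeds the cell median by MARGIN."""
    shares = {beam: low_sinr_share(h) for beam, h in histograms.items()}
    baseline = median(shares.values())
    return sorted(b for b, s in shares.items() if s > baseline + MARGIN)


suspects = suspect_beams(sinr_histograms)
print(suspects)
```

Comparing against the cell's own median, instead of a fixed SINR limit, keeps the check usable on cells with generally weak coverage, where every beam would otherwise be flagged.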

EN-DC KPIs (NSA Mode)

In NSA mode, the UE maintains dual connectivity to both LTE (master node) and NR (secondary node). Additional KPIs include:

  • SgNB Addition Success Rate: (SgNB Add Succ / SgNB Add Att) x 100; threshold >= 95%
  • SgNB Abnormal Release Rate: (SgNB Abnormal Rel / SgNB Rel) x 100; threshold <= 2%
  • SCG Failure Rate: Tracks secondary cell group radio link failures

KPI Correlation Patterns

Experienced NOC engineers recognize correlations that accelerate troubleshooting:

  1. Low CSSR + High RACH failure: Points to uplink coverage issue (UE cannot reach gNB). Check UL noise floor and RACH configuration.
  2. High drop rate + High HO failure: Mobility-related drops dominate. Check neighbor relations, A3/A5 event thresholds, and X2/Xn transport.
  3. Low throughput + High PRB utilization: Capacity exhaustion. Consider carrier aggregation, additional spectrum, or traffic offload.
  4. High latency + Normal PRB utilization: Scheduling inefficiency or transport bottleneck. Check PDCP/RLC retransmissions and backhaul RTT.
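The four patterns above translate naturally into a small rule set. In this sketch the `Symptoms` flags and the `suggest_focus` function are hypothetical names; how each flag is derived from thresholds is left to the dashboard and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptoms:
    """Boolean KPI symptoms, each assumed to be derived from thresholds."""
    low_cssr: bool = False
    high_rach_failure: bool = False
    high_drop_rate: bool = False
    high_ho_failure: bool = False
    low_throughput: bool = False
    high_prb_utilization: bool = False
    high_latency: bool = False


def suggest_focus(s: Symptoms) -> list:
    """Map symptom combinations to the troubleshooting focus listed above."""
    hints = []
    if s.low_cssr and s.high_rach_failure:
        hints.append("uplink coverage: check UL noise floor and RACH configuration")
    if s.high_drop_rate and s.high_ho_failure:
        hints.append("mobility: check neighbor relations, A3/A5 thresholds, X2/Xn transport")
    if s.low_throughput and s.high_prb_utilization:
        hints.append("capacity: consider carrier aggregation, spectrum, or offload")
    if s.high_latency and not s.high_prb_utilization:
        hints.append("scheduling or transport: check PDCP/RLC retransmissions and backhaul RTT")
    return hints


example = Symptoms(high_drop_rate=True, high_ho_failure=True)
print(suggest_focus(example))
```

Several rules can fire at once, so the function returns a list instead of a single verdict.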

PM Counter Collection

PM counters are collected via the O1 interface (using NETCONF/YANG or file-based reporting) at configurable intervals -- typically 15-minute or 1-hour granularity per TS 28.550. The counters feed into OSS/BSS platforms (Ericsson ENM, Nokia NetAct, Huawei iManager U2020) where KPIs are computed and visualized.

For real-time KPI monitoring, operators increasingly deploy streaming telemetry using gRPC-based interfaces that push counter updates every 1-10 seconds, enabling near-real-time anomaly detection and automated optimization loops.
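When 15-minute counters are rolled up to an hourly KPI, the ratio should be computed from the summed counters, not by averaging the four per-interval percentages, because intervals with more attempts must weigh more. The sketch below illustrates the difference; the split into four intervals is hypothetical and was chosen only so that the totals match Worked Example 1.

```python
# Hypothetical 15-minute samples as (RRC.ConnEstabAtt.sum, RRC.ConnEstabSucc.sum).
# The totals match the busy hour in Worked Example 1 (45,200 and 44,870).
intervals = [
    (9_800, 9_760),
    (11_300, 11_240),
    (12_400, 12_260),
    (11_700, 11_610),
]

total_att = sum(att for att, _ in intervals)
total_succ = sum(succ for _, succ in intervals)

# Preferred: ratio of summed counters (weighted by attempts).
hourly_cssr = 100.0 * total_succ / total_att

# For comparison: unweighted mean of the per-interval percentages.
per_interval = [100.0 * succ / att for att, succ in intervals]
mean_of_ratios = sum(per_interval) / len(per_interval)

print(f"CSSR from summed counters: {hourly_cssr:.2f}%")
print(f"Mean of interval CSSRs:    {mean_of_ratios:.2f}%")
```

The gap is small here, but it grows when traffic is uneven across intervals, for example when a nearly idle interval with a handful of failed attempts is averaged with a busy one.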

Key Takeaway: 5G KPIs are organized around accessibility, retainability, and integrity, each mapped to specific PM counters in TS 28.552. Mastering the formulas, understanding threshold levels, and knowing how to decompose aggregate KPIs into root causes is the core competency of 5G network operations.