Why HOSR is the wrong KPI to start with
Handover Success Rate is a useful summary, but it is a terrible diagnostic. A cell with 98% HOSR can still have a specific neighbour pair that is dropping every other call. You need to break HOSR into its constituent parts: prepared, executed, completed, and the reason failures occur at each stage.
This article assumes Xn-based and N2-based handover (3GPP TS 38.300 Section 9.2.3 and TS 23.502). Conditional handover (CHO) and DAPS are mentioned where relevant.
The handover stages and where they fail
Measurement -> Decision -> Preparation -> Execution -> Completion
Each stage has its own failure mode. Engineers who only look at "HO Failure" without splitting by stage end up chasing the wrong root cause.
Preparation failures
The source gNB sends a HANDOVER REQUEST to the target over Xn, or a HANDOVER REQUIRED to the AMF over N2. Preparation fails when:
- Target gNB rejects with cause = no-radio-resources-available
- Target gNB rejects with cause = unknown-target-id (Xn config out of sync)
- Target gNB rejects with cause = invalid-qos-combination (5QI not supported on target)
- N2 path: AMF cannot find the target NG-RAN node
Preparation rejects are visible as HO_PREP_FAIL counters with reason breakdowns. If you see a spike on a single neighbour pair, it is almost always Xn config drift or target congestion.
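As a minimal sketch of that per-pair breakdown, the snippet below groups preparation rejects by source-target pair and reports the dominant cause for each. The record layout `(source_cell, target_cell, reject_cause)` and the cell names are assumptions for illustration; real exports depend on the vendor's counter or trace format.

```python
from collections import Counter, defaultdict

# Hypothetical records: (source_cell, target_cell, reject_cause).
prep_failures = [
    ("gNB1-C1", "gNB2-C3", "unknown-target-id"),
    ("gNB1-C1", "gNB2-C3", "unknown-target-id"),
    ("gNB1-C1", "gNB4-C2", "no-radio-resources-available"),
    ("gNB1-C2", "gNB2-C3", "no-radio-resources-available"),
]


def causes_by_pair(records):
    """Count preparation reject causes for each source-target pair."""
    by_pair = defaultdict(Counter)
    for source, target, cause in records:
        by_pair[(source, target)][cause] += 1
    return by_pair


def dominant_cause(records):
    """Return {pair: (cause, count)} with the most frequent cause per pair."""
    return {
        pair: counts.most_common(1)[0]
        for pair, counts in causes_by_pair(records).items()
    }


summary = dominant_cause(prep_failures)
for pair, (cause, count) in summary.items():
    print(pair, cause, count)
```

A pair dominated by unknown-target-id points at Xn config drift, while no-radio-resources-available points at target congestion.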
Execution failures (the common ones)
The source sends RRCReconfiguration to the UE, and the UE starts T304 on receiving it (3GPP TS 38.331). The UE then attempts random access on the target. Failures here:
- T304 expiry: UE never completes RA on target. Usually a coverage hole at the target cell edge or wrong target PCI.
- Random access failure on target: PRACH preamble failed, often because the assigned dedicated preamble (CFRA) collided or target PRACH config is wrong.
- RLF on source before HO command: late HO trigger, the source link died first.
- HO command lost: source RLC failed to deliver the RRCReconfiguration.
The distinction between T304 expiry and RA timeout matters. T304 is the UE's overall handover timer (typically 1000-2000 ms in NR); the RA limit is the lower-layer cap on PRACH attempts (preambleTransMax). If your traces show RA failures with attempts < preambleTransMax, the UE was still trying and T304 simply ran out first. Increase T304 cautiously: if it is too long, the UE clings to a dead link.
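The rule of thumb above can be sketched as a small classifier over per-handover trace fields. The function name, argument names, and returned labels are hypothetical; map them to whatever your trace tool exposes.

```python
def classify_exec_failure(elapsed_ms, t304_ms, ra_attempts, preamble_trans_max):
    """Label one failed handover execution from trace fields.

    elapsed_ms: time from HO command reception to failure (assumed field).
    ra_attempts: preamble transmissions observed on the target (assumed field).
    """
    if ra_attempts >= preamble_trans_max:
        # The lower-layer attempt limit was reached.
        return "ra-attempts-exhausted"
    if elapsed_ms >= t304_ms:
        if ra_attempts > 0:
            # UE was still trying; the overall timer ran out first.
            return "t304-expired-while-ra-ongoing"
        # UE never transmitted a preamble on the target at all.
        return "t304-expired-no-ra"
    return "other"


print(classify_exec_failure(1000, 1000, 4, 10))
print(classify_exec_failure(600, 2000, 10, 10))
```

Splitting the "no RA at all" case from the "RA ongoing" case is a design choice of this sketch: the first suggests the UE never found the target, the second that it found it but could not get through.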
Completion failures
The UE completes RA on the target and sends RRCReconfigurationComplete, but the network-side switch fails. On Xn-based HO, this is the Path Switch Request from the target to the AMF. On N2-based HO, the UE Context Release Command from the AMF never reaches the source. Symptoms:
- UE briefly works on target, then drops
- Source still holds the context (zombie UE)
- Path switch reject with cause multiple-PDU-session-id-instances
Common failure patterns
Late HO drop
The classic pattern: HO is triggered too late, the source link is already collapsing, and the UE never receives the HO command cleanly.
Indicators:
- High RLF count on source cells
- HO failures clustered on cells with steep coverage cliffs
- A3 event offset too high (UE waits too long to report)
Fix: lower a3-offset (typically from 3 dB to 2 dB), reduce timeToTrigger from 320 ms to 160 ms on fast-mobility cells, and verify hysteresis is not stacking with cell-individual offset.
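To see how hysteresis and cell-individual offset stack, it helps to write out the A3 entering condition from TS 38.331, Mn + Ofn + Ocn - Hys > Mp + Ofp + Ocp + Off. The sketch below evaluates it in dB; the function names and the example values are illustrative assumptions, not recommended settings.

```python
def a3_entering(mn, mp, off, hys, ocn=0.0, ocp=0.0, ofn=0.0, ofp=0.0):
    """A3 entering condition: Mn + Ofn + Ocn - Hys > Mp + Ofp + Ocp + Off.

    mn, mp: neighbour and serving measurements (dBm for RSRP).
    off: a3-Offset, hys: hysteresis, ocn/ocp: cell-individual offsets,
    ofn/ofp: frequency-specific offsets (all in dB).
    """
    return mn + ofn + ocn - hys > mp + ofp + ocp + off


def a3_margin_db(off, hys, ocn=0.0, ocp=0.0, ofn=0.0, ofp=0.0):
    """dB by which the neighbour must exceed the serving cell to enter A3."""
    return off + hys + (ofp - ofn) + (ocp - ocn)


# 3 dB offset + 1 dB hysteresis + a -2 dB CIO on the neighbour.
print(a3_margin_db(off=3.0, hys=1.0, ocn=-2.0))
print(a3_entering(mn=-95.0, mp=-100.0, off=3.0, hys=1.0, ocn=-2.0))
```

In this example the nominal 3 dB offset becomes an effective 6 dB margin, so a neighbour that is 5 dB stronger still does not trigger a report until the margin is reduced.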
Ping-pong
UE bounces between two cells. Visible in trace as alternating handovers within seconds. Causes:
- Hysteresis too low
- Time-to-trigger too short for the deployment
- Equal RSRP at the boundary (legitimate; needs CIO bias)
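A rough way to find ping-pong in a trace is to look for a handover that returns the UE to the cell it just left within a short window. The sketch below assumes a time-ordered list of `(timestamp_s, source, target)` tuples for one UE and a 5-second window; both the layout and the window are assumptions to tune per deployment.

```python
def find_ping_pongs(handovers, window_s=5.0):
    """Return consecutive handover pairs that bounce A -> B -> A within window_s.

    handovers: time-ordered list of (timestamp_s, source_cell, target_cell)
    for a single UE (hypothetical layout).
    """
    hits = []
    for prev, cur in zip(handovers, handovers[1:]):
        t0, src0, tgt0 = prev
        t1, src1, tgt1 = cur
        returned = src1 == tgt0 and tgt1 == src0
        if returned and (t1 - t0) <= window_s:
            hits.append((prev, cur))
    return hits


events = [
    (0.0, "A", "B"),
    (2.5, "B", "A"),
    (4.0, "A", "B"),
    (60.0, "B", "C"),
]
ping_pongs = find_ping_pongs(events)
print(len(ping_pongs))
```

Counting hits per cell pair, rather than per UE, shows which boundary needs a hysteresis, time-to-trigger, or CIO change.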
Wrong neighbour
UE reports a strong PCI that is not in the source's neighbour list, or the PCI maps to two different cells (PCI confusion). With a missing neighbour the HO never gets prepared because the source has no Xn link; with PCI confusion it can be prepared towards the wrong cell. ANR should pick this up, but ANR has its own failure modes: confirm the ANR-detected neighbour count is increasing and that automatic Xn setup is enabled.
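Both conditions can be checked offline against an exported neighbour list. The sketch below assumes neighbour entries of the form `(pci, arfcn, cell_id)` and UE reports of the form `(pci, arfcn)`; these layouts and the sample values are hypothetical.

```python
from collections import defaultdict


def pci_confusions(neighbours):
    """Return {(pci, arfcn): [cell_ids]} where one PCI maps to several cells."""
    cells = defaultdict(set)
    for pci, arfcn, cell_id in neighbours:
        cells[(pci, arfcn)].add(cell_id)
    return {key: sorted(ids) for key, ids in cells.items() if len(ids) > 1}


def unknown_pcis(reported, neighbours):
    """Return reported (pci, arfcn) pairs missing from the neighbour list."""
    known = {(pci, arfcn) for pci, arfcn, _ in neighbours}
    return sorted(set(reported) - known)


neighbours = [
    (101, 630000, "gNB2-C1"),
    (102, 630000, "gNB2-C2"),
    (101, 630000, "gNB9-C3"),
]
reported = [(101, 630000), (250, 630000)]

confusions = pci_confusions(neighbours)
missing = unknown_pcis(reported, neighbours)
print(confusions)
print(missing)
```

The check is keyed on PCI plus frequency because the same PCI on a different carrier is not a conflict.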
Beam-failure recovery interaction
In NR, BFR can mask incipient HO conditions. The UE recovers a beam on the source and never reports a measurement event that would have triggered HO. Then when BFR fails, the link drops. If you see RLF without preceding HO attempts on a beamformed cell, check BFR counter activity. Tightening beamFailureInstanceMaxCount shifts more events to HO instead of BFR.
Reading the PM counters
Generic counter mapping (vendor names differ):
| KPI Component | Typical Counter Name |
|---|---|
| HO attempts (intra-gNB) | pmHoExeAttIntraGnb |
| HO success (intra-gNB) | pmHoExeSuccIntraGnb |
| HO prep attempts Xn | pmHoPrepAttXn |
| HO prep failures Xn | pmHoPrepFailXn |
| HO exec failures (T304) | pmHoExeFailT304 |
| HO exec failures (RA) | pmHoExeFailRa |
Formulas:
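- HO Prep SR = pmHoPrepSucc / pmHoPrepAtt, where pmHoPrepSucc = pmHoPrepAtt - pmHoPrepFail
- HO Exec SR = pmHoExeSucc / pmHoExeAtt
- Total HOSR = HO Prep SR * HO Exec SR (exact only when every successful preparation leads to an execution attempt)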
Report per source-target cell pair, not per cell. A cell-level KPI hides the bad neighbour.
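A short sketch of the formulas, applied first per pair and then to the summed cell-level counters, shows how the aggregate hides a bad neighbour. The counter values and cell names are made up, and the argument names are simplified stand-ins for the vendor counters.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when there were no attempts."""
    return numerator / denominator if denominator else None


def hosr(prep_att, prep_fail, exe_att, exe_succ):
    """Return (prep_sr, exec_sr, total) for one set of counters."""
    prep_sr = ratio(prep_att - prep_fail, prep_att)
    exec_sr = ratio(exe_succ, exe_att)
    if prep_sr is None or exec_sr is None:
        return prep_sr, exec_sr, None
    return prep_sr, exec_sr, prep_sr * exec_sr


# Made-up counters for two neighbour pairs of the same source cell.
pairs = {
    ("C1", "C2"): dict(prep_att=980, prep_fail=0, exe_att=980, exe_succ=975),
    ("C1", "C7"): dict(prep_att=20, prep_fail=0, exe_att=20, exe_succ=10),
}

per_pair = {pair: hosr(**c)[2] for pair, c in pairs.items()}

# Cell-level view: sum the counters first, then apply the same formula.
keys = ("prep_att", "prep_fail", "exe_att", "exe_succ")
cell = {k: sum(c[k] for c in pairs.values()) for k in keys}
cell_total = hosr(**cell)[2]

print(per_pair)
print(cell_total)
```

With these numbers the cell-level figure is about 98.5% while the C1 to C7 pair succeeds only half the time.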
Diagnostic methodology
- Pull HOSR per cell pair for the last 24 hours.
- Identify pairs with HOSR < 95% and HO attempts > 50.
- Split failures into Prep, Exec-T304, Exec-RA, Completion.
- For Prep failures: check Xn association, neighbour config, target load.
- For T304 failures: check target coverage, neighbour relation accuracy.
- For RA failures: check target PRACH config, CFRA preamble allocation.
- For Completion failures: check AMF path switch logs, NGAP transport.
> Drive test or trace one failed HO end-to-end before changing parameters. Bulk parameter changes based on aggregate KPIs cause more problems than they solve.
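The filtering and splitting steps above can be sketched as a small triage function that only produces a shortlist of pairs to trace. The input layout (per-pair attempts, successes, and failures keyed by stage) and the stage keys are assumptions; the thresholds are the ones from the list.

```python
# What to check for each dominant failure stage, taken from the steps above.
CHECKS = {
    "prep": "Xn association, neighbour config, target load",
    "exec_t304": "target coverage, neighbour relation accuracy",
    "exec_ra": "target PRACH config, CFRA preamble allocation",
    "completion": "AMF path switch logs, NGAP transport",
}


def triage(pairs, min_attempts=50, hosr_threshold=0.95):
    """Return (source, target, hosr, stage, check) tuples, worst HOSR first."""
    findings = []
    for (source, target), c in pairs.items():
        attempts = c["attempts"]
        if attempts <= min_attempts:
            continue  # too few attempts to be meaningful
        rate = c["successes"] / attempts
        if rate >= hosr_threshold:
            continue
        # Pick the stage with the most failures for this pair.
        stage = max(CHECKS, key=lambda s: c["failures"].get(s, 0))
        findings.append((source, target, round(rate, 3), stage, CHECKS[stage]))
    return sorted(findings, key=lambda f: f[2])


pairs = {
    ("C1", "C2"): {"attempts": 400, "successes": 396, "failures": {"exec_ra": 4}},
    ("C1", "C7"): {"attempts": 120, "successes": 90,
                   "failures": {"prep": 2, "exec_t304": 25, "exec_ra": 3}},
    ("C3", "C4"): {"attempts": 30, "successes": 10, "failures": {"prep": 20}},
    ("C5", "C6"): {"attempts": 200, "successes": 180,
                   "failures": {"prep": 18, "exec_t304": 1, "exec_ra": 1}},
}

findings = triage(pairs)
for row in findings:
    print(row)
```

The output is a list of candidates for a trace or drive test, not a list of parameter changes.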
Conditional Handover (CHO) note
CHO (TS 38.300 Section 9.2.3.4) prepares multiple targets in advance. Failure modes shift: more preparation load, less execution failure, but new patterns of stale prepared contexts. Counter pmCondHoCancAtt is your friend: high cancellation rates mean you are over-preparing.