Why HOSR is the wrong KPI to start with

Handover Success Rate is a useful summary, but it is a terrible diagnostic. A cell with 98% HOSR can still have a specific neighbour pair that is dropping every other call. You need to break HOSR into its constituent parts: prepared, executed, completed, and the reason failures occur at each stage.

This article assumes Xn-based and N2-based handover (3GPP TS 38.300 Section 9.2.3 and TS 23.502). Conditional handover (CHO) gets a short note at the end; DAPS handover is not covered.

The handover stages and where they fail

Measurement -> Decision -> Preparation -> Execution -> Completion

Each stage has its own failure mode. Engineers who only look at "HO Failure" without splitting by stage end up chasing the wrong root cause.

Preparation failures

On Xn, the source gNB sends HANDOVER REQUEST directly to the target. On N2, it sends HANDOVER REQUIRED to the AMF, which then sends HANDOVER REQUEST to the target. Preparation fails when:

Target gNB rejects with cause = no-radio-resources-available
Target gNB rejects with cause = unknown-target-id (Xn config out of sync)
Target gNB rejects with cause = invalid-qos-combination or not-supported-5QI-value (the requested QoS flows cannot be admitted on the target)
N2 path: AMF cannot find the target NG-RAN node

Preparation rejects are visible as HO_PREP_FAIL counters with reason breakdowns. If you see a spike on a single neighbour pair, it is almost always Xn config drift or target congestion.
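To tell config drift from congestion on a spiking pair, group the rejects by neighbour pair and cause. A minimal sketch, assuming a hypothetical export of (source, target, cause) tuples and an arbitrary 80% threshold for "one cause dominates":

```python
from collections import Counter, defaultdict


def dominant_prep_cause(failures, min_share=0.8):
    """Return {(src, tgt): (cause, share)} for pairs where one reject cause dominates.

    failures: iterable of (source_cell, target_cell, cause) for each preparation failure.
    """
    by_pair = defaultdict(Counter)
    for src, tgt, cause in failures:
        by_pair[(src, tgt)][cause] += 1
    result = {}
    for pair, causes in by_pair.items():
        cause, count = causes.most_common(1)[0]
        share = count / sum(causes.values())
        if share >= min_share:
            result[pair] = (cause, share)
    return result


failures = (
    [("A", "B", "unknown-target-id")] * 9
    + [("A", "B", "no-radio-resources-available")]
    + [("A", "C", "no-radio-resources-available"), ("A", "C", "unknown-target-id")]
)
print(dominant_prep_cause(failures))  # {('A', 'B'): ('unknown-target-id', 0.9)}
```

A pair dominated by unknown-target-id points at Xn config drift; one dominated by no-radio-resources-available points at target load.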

Execution failures (the common ones)

The source sends RRCReconfiguration (containing reconfigurationWithSync) to the UE. On receiving it, the UE starts T304 (3GPP TS 38.331) and attempts random access on the target. Failures here:

T304 expiry: UE never completes RA on target. Usually a coverage hole at the target cell edge or wrong target PCI.
Random access failure on target: the UE exhausts its PRACH attempts, often because the dedicated (CFRA) preamble collided due to a preamble allocation problem, or because the target PRACH config is wrong.
RLF on source before HO command: the HO was triggered late and the source link died first.
HO command lost: source RLC failed to deliver the RRCReconfiguration.

The distinction between T304 expiry and RA failure matters. T304 is the UE's overall handover timer (typically 1000-2000 ms in NR); the RA limit is the lower-layer preamble budget set by preambleTransMax. If your traces show T304 expiring while the preamble count is still below preambleTransMax, the UE was still trying and T304 simply ran out first. Increase T304 cautiously: set it too long and the UE clings to a dead link.
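That triage rule can be written down as a tiny heuristic. This is a sketch only: it assumes your trace tool reports how many preambles the UE sent on the target before T304 expired and the configured preambleTransMax; the FailedHo record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailedHo:
    # Hypothetical fields decoded from a trace of one handover that hit T304 expiry.
    preamble_attempts: int   # preambles the UE sent on the target cell
    preamble_trans_max: int  # preambleTransMax configured on the target


def limiting_factor(ho: FailedHo) -> str:
    """Heuristic label for which limit ended a failed HO execution."""
    if ho.preamble_attempts >= ho.preamble_trans_max:
        # PRACH budget used up: suspect target PRACH config or CFRA allocation.
        return "ra-attempts-exhausted"
    # UE was still retrying when T304 ran out: suspect coverage or T304 length.
    return "t304-ran-out-first"


print(limiting_factor(FailedHo(preamble_attempts=4, preamble_trans_max=10)))   # t304-ran-out-first
print(limiting_factor(FailedHo(preamble_attempts=10, preamble_trans_max=10)))  # ra-attempts-exhausted
```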

Completion failures

The UE completes RA on the target and sends RRCReconfigurationComplete, but the core-network leg of the handover fails. On Xn-based HO, the target's PATH SWITCH REQUEST to the AMF fails or is rejected. On N2-based HO, the procedure stalls after the target's HANDOVER NOTIFY, and the AMF never sends UE CONTEXT RELEASE COMMAND to the source. Symptoms:

UE briefly works on target, then drops
Source still holds the context (zombie UE)
PATH SWITCH REQUEST FAILURE with cause multiple-PDU-session-ID-instances

Common failure patterns

Late HO drop

The classic pattern: HO is triggered too late, the source link is already collapsing, and the UE never receives the HO command cleanly.

Indicators:

High RLF count on source cells
HO failures clustered on cells with steep coverage cliffs
A3 event offset too high (UE waits too long to report)

Fix: lower a3-Offset (for example from 3 dB to 2 dB), reduce timeToTrigger from 320 ms to 160 ms on fast-mobility cells, and check that hysteresis and the cell-individual offset are not stacking into a larger effective margin than intended.
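The A3 entering condition in TS 38.331 is Mn + Ofn + Ocn - Hys > Mp + Ofp + Ocp + Off, and it must hold for timeToTrigger before the UE reports. Off and Hys both widen the margin the neighbour must beat, while the neighbour's CIO (Ocn) narrows it. The sketch below replays a synthetic RSRP ramp (an assumption, not field data) sampled every 40 ms to show how much earlier the report fires after the offset and TTT change. Real UEs evaluate on their own L3-filtered measurements, so treat the timings as illustrative.

```python
def a3_entering(mn, mp, off, hys, ocn=0.0, ocp=0.0, ofn=0.0, ofp=0.0):
    """A3 entering condition (TS 38.331): Mn + Ofn + Ocn - Hys > Mp + Ofp + Ocp + Off.

    mn/mp are neighbour/serving measurements in dBm; all offsets are in dB.
    """
    return mn + ofn + ocn - hys > mp + ofp + ocp + off


def a3_report_time_ms(samples, period_ms, ttt_ms, **offsets):
    """Return the time (ms) when A3 has held for ttt_ms, or None if it never does.

    samples: (serving_rsrp, neighbour_rsrp) pairs taken every period_ms.
    """
    held_since = None
    for i, (mp, mn) in enumerate(samples):
        t = i * period_ms
        if a3_entering(mn, mp, **offsets):
            if held_since is None:
                held_since = t
            if t - held_since >= ttt_ms:
                return t
        else:
            held_since = None  # condition broke, so timeToTrigger restarts
    return None


# Synthetic ramp: serving falls and neighbour rises by 1 dB every 40 ms.
ramp = [(-80 - i, -90 + i) for i in range(20)]
print(a3_report_time_ms(ramp, 40, 320, off=3, hys=1))  # 640
print(a3_report_time_ms(ramp, 40, 160, off=2, hys=1))  # 440
```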

Ping-pong

UE bounces between two cells. Visible in trace as alternating handovers within seconds. Causes:

Hysteresis too low
Time-to-trigger too short for the deployment
Equal RSRP at the boundary (legitimate; needs CIO bias)
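Ping-pongs are easy to count from per-UE handover records: an A-to-B handover followed by B-to-A shortly after. A minimal sketch, assuming a hypothetical (ue_id, timestamp_s, source_cell, target_cell) record format; the 5 s window is an arbitrary choice you should align with your own ping-pong definition.

```python
from collections import defaultdict


def find_ping_pongs(ho_events, window_s=5.0):
    """Flag A->B followed by B->A within window_s seconds for the same UE.

    ho_events: (ue_id, timestamp_s, source_cell, target_cell) tuples, any order.
    Returns (ue_id, cell_a, cell_b, t_first, t_back) tuples.
    """
    by_ue = defaultdict(list)
    for ue, t, src, tgt in ho_events:
        by_ue[ue].append((t, src, tgt))
    hits = []
    for ue, evs in by_ue.items():
        evs.sort()
        # Compare each handover with the next one for the same UE.
        for (t1, s1, d1), (t2, s2, d2) in zip(evs, evs[1:]):
            if s2 == d1 and d2 == s1 and t2 - t1 <= window_s:
                hits.append((ue, s1, d1, t1, t2))
    return hits


events = [
    ("u1", 0.0, "A", "B"), ("u1", 2.0, "B", "A"), ("u1", 20.0, "A", "B"),
    ("u2", 1.0, "A", "C"),
]
print(find_ping_pongs(events))  # [('u1', 'A', 'B', 0.0, 2.0)]
```

Aggregating the hits per cell pair shows which boundaries need more hysteresis, a longer TTT, or a CIO bias.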

Wrong neighbour

UE reports a strong PCI that is not in the source's neighbour list, or the PCI maps to two different cells (PCI confusion). The HO cannot be prepared toward the right cell because the source has no neighbour relation or Xn link for it. ANR should pick this up, but ANR has its own failure modes: confirm the ANR-detected neighbour count is increasing and that automatic Xn setup is enabled.
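Both conditions can be checked offline against a neighbour-list export and a sample of measurement reports. A sketch under assumed, hypothetical row layouts: neighbour rows as (source_cell, nr_cgi, pci, ssb_arfcn) and report rows as (source_cell, ssb_arfcn, pci). PCI confusion is keyed per source cell and frequency, since a PCI is only ambiguous among neighbours on the same carrier. The ARFCN and CGI values below are made up.

```python
from collections import defaultdict


def pci_confusion(neighbours):
    """neighbours: (source_cell, nr_cgi, pci, ssb_arfcn) rows from the neighbour list.

    Returns {(source_cell, ssb_arfcn, pci): [cgi, ...]} where one PCI maps to >1 cell.
    """
    cells = defaultdict(set)
    for src, cgi, pci, arfcn in neighbours:
        cells[(src, arfcn, pci)].add(cgi)
    return {key: sorted(cgis) for key, cgis in cells.items() if len(cgis) > 1}


def missing_neighbours(reports, neighbours):
    """reports: (source_cell, ssb_arfcn, pci) seen in measurement reports.

    Returns the reported keys that have no neighbour entry at all.
    """
    known = {(src, arfcn, pci) for src, _cgi, pci, arfcn in neighbours}
    return sorted({r for r in reports if r not in known})


nbrs = [("S1", "cgi-A", 101, 632628), ("S1", "cgi-B", 101, 632628), ("S1", "cgi-C", 202, 632628)]
reports = [("S1", 632628, 303), ("S1", 632628, 202), ("S1", 632628, 303)]
print(pci_confusion(nbrs))                # {('S1', 632628, 101): ['cgi-A', 'cgi-B']}
print(missing_neighbours(reports, nbrs))  # [('S1', 632628, 303)]
```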

Beam-failure recovery interaction

In NR, BFR can mask incipient HO conditions. The UE recovers a beam on the source and never reports a measurement event that would have triggered HO; when BFR finally fails, the link drops. If you see RLF without preceding HO attempts on a beamformed cell, check BFR counter activity. Tuning beamFailureInstanceMaxCount and beamFailureDetectionTimer changes how readily the UE declares beam failure, so validate any change against both BFR and HO counters.
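The "RLF without preceding HO attempts" check can be scripted over per-UE event logs. A minimal sketch, assuming a hypothetical (ue_id, t_s, kind) event format with kind one of "ho_attempt", "bfr" or "rlf", and an arbitrary 10 s lookback window:

```python
from collections import defaultdict


def rlf_without_ho(events, lookback_s=10.0):
    """Find RLFs with no HO attempt in the preceding lookback_s seconds.

    events: (ue_id, t_s, kind) tuples, kind in {"ho_attempt", "bfr", "rlf"}.
    Returns (ue_id, t_rlf, bfr_events_in_window) so BFR-masked drops stand out.
    """
    by_ue = defaultdict(list)
    for ue, t, kind in events:
        by_ue[ue].append((t, kind))
    suspects = []
    for ue, evs in by_ue.items():
        evs.sort()
        for t, kind in evs:
            if kind != "rlf":
                continue
            window = [k for tt, k in evs if t - lookback_s <= tt < t]
            if "ho_attempt" not in window:
                suspects.append((ue, t, window.count("bfr")))
    return suspects


events = [
    ("ue1", 1.0, "bfr"), ("ue1", 3.0, "bfr"), ("ue1", 5.0, "rlf"),  # BFR, then drop
    ("ue2", 2.0, "ho_attempt"), ("ue2", 4.0, "rlf"),              # HO was tried
]
print(rlf_without_ho(events))  # [('ue1', 5.0, 2)]
```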

Reading the PM counters

Generic counter mapping (vendor names differ):

KPI Component	Typical Counter Name
HO attempts (intra-gNB)	pmHoExeAttIntraGnb
HO success (intra-gNB)	pmHoExeSuccIntraGnb
HO prep attempts Xn	pmHoPrepAttXn
HO prep failures Xn	pmHoPrepFailXn
HO exec failures (T304)	pmHoExeFailT304
HO exec failures (RA)	pmHoExeFailRa

Formulas:

HO Prep SR = pmHoPrepSucc / pmHoPrepAtt
HO Exec SR = pmHoExeSucc / pmHoExeAtt
Total HOSR = HO Prep SR * HO Exec SR

Report per source-target cell pair, not per cell. A cell-level KPI hides the bad neighbour.
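A short illustration of why the pair-level view matters, applying the formulas above to made-up counts. The PairCounters fields are generic placeholders; map your vendor's counter names onto them.

```python
from dataclasses import dataclass, fields


@dataclass
class PairCounters:
    # Generic per-(source, target) counters; map your vendor's counter names onto these.
    prep_att: int
    prep_succ: int
    exe_att: int
    exe_succ: int


def ratio(num: int, den: int):
    """Success rate, or None when there were no attempts."""
    return num / den if den else None


def pair_kpis(c: PairCounters):
    """Return (HO Prep SR, HO Exec SR, Total HOSR) using the formulas above."""
    prep_sr = ratio(c.prep_succ, c.prep_att)
    exe_sr = ratio(c.exe_succ, c.exe_att)
    if prep_sr is None or exe_sr is None:
        return prep_sr, exe_sr, None
    return prep_sr, exe_sr, prep_sr * exe_sr


def cell_level(pairs):
    """Sum every pair of one source cell into a single cell-level counter set."""
    return PairCounters(*(sum(getattr(p, f.name) for p in pairs) for f in fields(PairCounters)))


pairs = {
    ("A", "B"): PairCounters(1000, 1000, 1000, 995),
    ("A", "C"): PairCounters(60, 60, 60, 30),  # the bad neighbour
}
print(pair_kpis(cell_level(pairs.values())))  # cell-level Total HOSR is about 0.967
for pair, c in pairs.items():
    print(pair, pair_kpis(c))                 # A->C shows 0.5
```

The cell-level figure looks healthy while one neighbour pair fails half its handovers.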

Diagnostic methodology

Pull HOSR per cell pair for the last 24 hours.
Identify pairs with HOSR < 95% and HO attempts > 50.
Split failures into Prep, Exec-T304, Exec-RA, Completion.
For Prep failures: check Xn association, neighbour config, target load.
For T304 failures: check target coverage, neighbour relation accuracy.
For RA failures: check target PRACH config, CFRA preamble allocation.
For Completion failures: check AMF path switch logs, NGAP transport.
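The filtering and stage split in these steps can be sketched as a small triage function. The per-pair dictionary layout (attempts, successes, and failure counts keyed by stage) is a hypothetical export format; the 95% and 50-attempt thresholds come from the steps above.

```python
# Where to look first for each failure stage, taken from the checklist above.
CHECKS = {
    "prep": "Xn association, neighbour config, target load",
    "exec_t304": "target coverage, neighbour relation accuracy",
    "exec_ra": "target PRACH config, CFRA preamble allocation",
    "completion": "AMF path switch logs, NGAP transport",
}


def triage(pairs, hosr_threshold=0.95, min_attempts=50):
    """pairs: {(src, tgt): {"attempts": n, "successes": n, <stage>: failures, ...}}.

    Returns (pair, hosr, dominant_stage, checks), worst HOSR first.
    """
    findings = []
    for pair, s in pairs.items():
        if s["attempts"] <= min_attempts:
            continue  # too few attempts to judge
        hosr = s["successes"] / s["attempts"]
        if hosr >= hosr_threshold:
            continue
        failures = {stage: s.get(stage, 0) for stage in CHECKS}
        worst = max(failures, key=failures.get)
        findings.append((pair, round(hosr, 3), worst, CHECKS[worst]))
    return sorted(findings, key=lambda f: f[1])


pairs = {
    ("A", "B"): {"attempts": 200, "successes": 180, "prep": 2, "exec_t304": 15, "exec_ra": 3},
    ("A", "C"): {"attempts": 40, "successes": 10, "prep": 30},  # below attempt floor
    ("A", "D"): {"attempts": 500, "successes": 495, "exec_ra": 5},
}
print(triage(pairs))
```

Only A->B is flagged, with T304 expiry as its dominant stage; that pair is the one to trace end-to-end.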

> Drive test or trace one failed HO end-to-end before changing parameters. Bulk parameter changes based on aggregate KPIs cause more problems than they solve.

Conditional Handover (CHO) note

CHO (TS 38.300 Section 9.2.3.4) prepares multiple targets in advance. Failure modes shift: more preparation load and fewer execution failures, but new patterns of stale prepared contexts. A cancellation counter such as pmCondHoCancAtt (names vary by vendor) is your friend: high cancellation rates mean you are over-preparing.
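Two ratios make over-preparation visible: cancellations per prepared candidate, and prepared candidates per executed CHO. A minimal sketch; which vendor counters feed these inputs, and what threshold counts as "high", are assumptions you need to settle for your network.

```python
def cho_prep_efficiency(prepared_candidates: int, executed: int, cancelled: int):
    """Return (cancel_ratio, candidates_per_execution) for one cell over a period."""
    cancel_ratio = cancelled / prepared_candidates if prepared_candidates else 0.0
    per_exec = prepared_candidates / executed if executed else float("inf")
    return cancel_ratio, per_exec


print(cho_prep_efficiency(prepared_candidates=300, executed=100, cancelled=180))  # (0.6, 3.0)
```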

Takeaway: Diagnose handovers per cell pair and per failure stage. Cell-level HOSR will hide the one bad neighbour that is killing your KPI.

5G Handover Failures: Root Causes and How to Diagnose Them