Why "the throughput is bad" is not a layer

A user sees a stalled video. The KPI dashboard shows degraded DL throughput. Where do you look? PDCP, RLC, MAC, and the physical layer all have their own retransmission and discard mechanisms, and they fail in distinct ways. Knowing which layer is misbehaving is the difference between fixing the problem in five minutes and chasing it for two days.

This is the mental model I use when debugging NR radio protocols (3GPP TS 38.321 MAC, TS 38.322 RLC, TS 38.323 PDCP).

How each layer fails

MAC: HARQ retransmissions

MAC HARQ is fast (4-8 slot RTT in NR), operates per transport block (TB), and is invisible to the layers above MAC. Failures show up as:

  • High BLER (block error rate) before HARQ
  • High residual BLER after HARQ (target is < 1% for eMBB)
  • HARQ NACK ratio rising without RSRP/SINR explanation

HARQ does not retransmit forever. After the maximum number of HARQ transmissions (maxHARQ-Tx, typically 4 for DL and 8 for UL), the TB is dropped and recovery falls to RLC, provided the bearer runs in AM. Look for:

  • DL HARQ NACK rate > 10% with stable SINR — possible interference burst or wrong MCS/CQI mapping
  • UL HARQ NACK with stable PUSCH SINR — likely UL grant size vs payload mismatch
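
A back-of-envelope model helps relate the pre-HARQ BLER to the residual BLER. The sketch below assumes every HARQ transmission fails independently with the same probability. That ignores soft-combining gain and correlated fading, so treat the result as an order-of-magnitude estimate. The function name and parameters are illustrative, not taken from any tool.

```python
def residual_bler(per_tx_bler: float, max_tx: int) -> float:
    """Probability that a TB fails all max_tx HARQ transmissions.

    Assumption: each transmission fails independently with the same
    probability (no combining gain, no correlated fading).
    """
    if not 0.0 <= per_tx_bler <= 1.0:
        raise ValueError("per_tx_bler must be in [0, 1]")
    if max_tx < 1:
        raise ValueError("max_tx must be at least 1")
    return per_tx_bler ** max_tx


# Compare a few operating points for a 4-transmission limit.
for bler in (0.1, 0.3, 0.5):
    print(f"per-tx BLER {bler:.0%} -> residual {residual_bler(bler, 4):.4%}")
```

Under this model a 10% per-transmission BLER leaves a residual of 0.01% after four transmissions, while 30% leaves about 0.8%, which is already close to the 1% target.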

RLC: ARQ retransmissions (AM mode)

RLC AM retransmits PDUs based on STATUS PDU feedback. RTT is much longer than HARQ — tens to hundreds of milliseconds. RLC failure modes:

  • t-PollRetransmit firing repeatedly (sender suspects loss)
  • t-Reassembly expiring (receiver stops waiting for a missing SN; in AM this triggers a STATUS report, in UM the incomplete SDU is discarded)
  • Maximum retransmission count (maxRetxThreshold) reached -> RLF declared (TS 38.331)

A cell with a rising RLC retransmission ratio but stable HARQ BLER points to bursty losses in which several HARQ processes failed at the same time. Often the cause is co-channel interference or a handover transient.
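
The path from repeated NACKs to RLF can be sketched as a per-SN counter. The code below is loosely modeled on the RETX_COUNT handling described in TS 38.322 and is a simplification: it ignores segmentation and polling. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AmRetxTracker:
    """Tracks per-SDU retransmission counts on the RLC AM transmit side."""

    max_retx_threshold: int  # stands in for the RRC parameter maxRetxThreshold
    retx_count: Dict[int, int] = field(default_factory=dict)

    def on_nack(self, sn: int) -> bool:
        """Record that `sn` was NACKed in a STATUS PDU.

        Returns True once the count reaches the threshold, which is the
        point where RLC would report "max retransmissions reached" to RRC.
        """
        if sn not in self.retx_count:
            self.retx_count[sn] = 0      # first time considered for retx
        else:
            self.retx_count[sn] += 1     # every further NACK increments
        return self.retx_count[sn] >= self.max_retx_threshold


tracker = AmRetxTracker(max_retx_threshold=4)
outcomes = [tracker.on_nack(sn=7) for _ in range(5)]
print(outcomes)  # the last NACK is the one that reaches the threshold
```

In a trace, the useful signal is the same SN appearing in successive STATUS PDUs. A single SN climbing toward the threshold is a different problem from many SNs each retransmitted once.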

PDCP: integrity, ciphering, duplicate detection, discard

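PDCP does not retransmit in normal operation. It resends unacknowledged data only on AM bearers during re-establishment or data recovery, and PDCP duplication (used for URLLC) sends copies up front rather than retransmitting. PDCP failures: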

  • Integrity check failure -> packet dropped; usually a sign of COUNT (HFN) desync or a key mismatch
  • discardTimer expiry -> SDU dropped, never reached the air
  • Out-of-window SN -> duplicate or stale, dropped silently

PDCP integrity failures on an SRB are catastrophic: TS 38.331 treats an integrity check failure on SRB1 or SRB2 as a trigger for RRC re-establishment, the same recovery path as RLF. Seeing one in a trace means the security context is broken (key derivation mismatch) or someone is replaying packets.
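
The three PDCP drop reasons listed above can be told apart with two small checks, one on the receive side and one on the transmit side. This is a minimal sketch that works directly on COUNT values. A real PDCP entity derives COUNT from the SN and HFN and maintains a reordering buffer, both omitted here. All function names and return strings are illustrative.

```python
from typing import List, Set, Tuple


def pdcp_rx_decision(rcvd_count: int, rx_deliv: int,
                     received: Set[int], integrity_ok: bool = True) -> str:
    """Classify a received PDCP Data PDU by its COUNT value.

    rx_deliv is the COUNT of the first SDU not yet delivered upward.
    `received` holds COUNTs already stored and is updated in place.
    """
    if not integrity_ok:
        return "discard_integrity_failure"
    if rcvd_count < rx_deliv:
        return "discard_stale"           # below the receive window
    if rcvd_count in received:
        return "discard_duplicate"
    received.add(rcvd_count)
    return "store"


def apply_discard_timer(queue: List[Tuple[int, float]], now_ms: float,
                        discard_timer_ms: float):
    """Split queued (count, arrival_ms) SDUs into kept and expired lists."""
    kept = [(c, t) for c, t in queue if now_ms - t < discard_timer_ms]
    expired = [(c, t) for c, t in queue if now_ms - t >= discard_timer_ms]
    return kept, expired
```

The split matters when reading counters: receive-side discards mean the packet crossed the air interface and was rejected, while discardTimer expiries mean it never left the transmitter.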

Mapping symptoms to layers

| User symptom | First check | Layer hint |
|---|---|---|
| Throughput floor at high SINR | MCS index, max layers, CA config | MAC scheduling |
| Throughput collapses periodically | RLC retx ratio, t-Reassembly events | RLC |
| Latency spikes but no loss | RLC reordering window, PDCP reorder timer | RLC/PDCP |
| Voice clipping (VoNR) | PDCP discardTimer, RLC mode (UM vs AM) | PDCP/RLC |
| RLF after Security Mode | PDCP integrity failures | PDCP |
| Random short stalls under mobility | HARQ NACK during HO, PDCP reorder | MAC + PDCP |

Reading a captured failure

Assume you have a vendor-internal trace with MAC, RLC, and PDCP layers visible (Ericsson CT-trace, Nokia BTS-CT, or open-source srsRAN logs).

Symptom: 200 ms gap in user data

Look at the trace in this order:

  1. Was there a HARQ DTX/NACK burst? (check NACK count per HARQ process)
  2. If yes, did RLC AM trigger retransmission?
  3. Did t-Reassembly fire on the receiver?
  4. Did PDCP deliver in order, or was there a reordering hold?

A 200 ms gap with no MAC errors but a PDCP reordering timer expiry means the cause sits below PDCP: most likely an RLC PDU that went missing and was not recovered in time because the RLC sender did not poll. Check the pollPDU and pollByte thresholds.
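
The four questions above can be written down as a small decision function, which is handy when triaging many traces. This is one possible encoding, not a vendor procedure: the observation fields are hypothetical flags you would fill in by hand or from a trace parser, and the hint strings paraphrase the reasoning in this section.

```python
from dataclasses import dataclass


@dataclass
class GapObservations:
    harq_nack_burst: bool        # step 1: NACK/DTX burst on any HARQ process
    rlc_retx_triggered: bool     # step 2: RLC AM retransmitted the lost PDUs
    t_reassembly_expired: bool   # step 3: receiver timer fired
    pdcp_reorder_hold: bool      # step 4: PDCP held data for reordering


def triage_gap(obs: GapObservations) -> str:
    """Return a first hint for a gap in user data, walking MAC -> RLC -> PDCP."""
    if obs.harq_nack_burst:
        if not obs.rlc_retx_triggered:
            return "MAC loss not recovered by RLC: check RLC mode and STATUS feedback"
        if obs.t_reassembly_expired:
            return "MAC loss, RLC recovery was slow: check t-Reassembly and t-PollRetransmit"
        return "MAC loss recovered by RLC ARQ: the gap is retransmission delay"
    if obs.pdcp_reorder_hold:
        return "No MAC errors but PDCP hold: check pollPDU and pollByte"
    return "Inconclusive at MAC/RLC/PDCP: look outside the radio stack"


print(triage_gap(GapObservations(False, False, False, True)))
```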

Symptom: persistent low throughput

  1. Check MAC scheduler grants: is the UE getting the BWP allocation it expects?
  2. CQI report — is it stuck at a low value?
  3. RLC buffer status — is data backing up at RLC?
  4. PDCP discard counter — are packets being discarded before transmission?

If PDCP discard is non-zero while MAC throughput is healthy, the likely cause is bursty downlink scheduling combined with an overly aggressive PDCP discardTimer. Consider increasing discardTimer for the affected QoS flow; the right value depends on the 5QI.

Useful counters per layer

MAC

  • pmMacHarqDlNack / pmMacHarqDlAck
  • pmMacPduDlDiscard (TB drop after maxHARQ-Tx)
  • pmMacSchedReqRcvd vs grants issued

RLC

  • pmRlcArqDlRetransPdu
  • pmRlcDlSduDiscard
  • pmRlcUlSduSegmentRcvd
  • t-Reassembly expiries per UE

PDCP

  • pmPdcpDiscardDlPdu
  • pmPdcpIntegrityFailDl/Ul
  • pmPdcpDuplicateDiscardDl (duplication mode)
  • pmPdcpReordWinExpiry
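
Raw counters are rarely useful on their own; ratios over a reporting interval are. The sketch below computes the DL HARQ NACK ratio from two snapshots, assuming the counters are cumulative. The counter names come from the MAC list above, but exact names and semantics vary by vendor and release, and the sample values are made up for illustration.

```python
from typing import Dict, Optional


def dl_harq_nack_ratio(prev: Dict[str, int], curr: Dict[str, int]) -> Optional[float]:
    """NACK / (ACK + NACK) over the interval between two counter snapshots.

    Assumption: both counters are cumulative and did not reset in between.
    Returns None when there was no HARQ feedback in the interval.
    """
    nack = curr["pmMacHarqDlNack"] - prev["pmMacHarqDlNack"]
    ack = curr["pmMacHarqDlAck"] - prev["pmMacHarqDlAck"]
    total = nack + ack
    if total <= 0:
        return None
    return nack / total


# Made-up snapshots from two consecutive reporting periods.
before = {"pmMacHarqDlNack": 100, "pmMacHarqDlAck": 900}
after = {"pmMacHarqDlNack": 250, "pmMacHarqDlAck": 1750}
ratio = dl_harq_nack_ratio(before, after)
print(f"DL HARQ NACK ratio: {ratio:.1%}")
```

With these sample values the interval ratio is 15%, above the 10% level flagged in the MAC section, even though the lifetime ratio in the second snapshot is only 12.5%. That gap is the reason to work with deltas.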

Configuration parameters to tune

| Parameter | Layer | Typical value | Tuning hint |
|---|---|---|---|
| maxHARQ-Tx | MAC | 4 (DL) / 8 (UL) | Lower for URLLC, higher reduces RLC load |
| t-PollRetransmit | RLC | 50-100 ms | Lower causes more STATUS, higher delays retx |
| t-Reassembly | RLC | 50 ms | Too low causes false discard |
| pollPDU / pollByte | RLC | depends on bearer | Affects feedback frequency |
| discardTimer | PDCP | 100 ms (eMBB), 50 ms (VoNR) | Lower drops more, higher buffers more |
| t-Reordering (reordering timer) | PDCP | 50 ms | Critical for in-order delivery vs latency |
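
These timers interact, so it helps to check them against each other. A common rule of thumb, used here as an assumption rather than a specification requirement, is that t-Reassembly should be longer than the worst-case time HARQ needs to exhaust its retransmissions. The sketch below does that arithmetic. The function names are illustrative, and the HARQ RTT in slots is an input you would take from your own deployment.

```python
def slot_duration_ms(mu: int) -> float:
    """NR slot duration for numerology mu (0 = 15 kHz SCS, 1 = 30 kHz, ...)."""
    return 1.0 / (2 ** mu)


def worst_case_harq_delay_ms(max_tx: int, harq_rtt_slots: int, mu: int) -> float:
    """Extra delay when a TB needs every allowed HARQ transmission.

    The first transmission adds no retransmission delay, hence max_tx - 1.
    """
    return (max_tx - 1) * harq_rtt_slots * slot_duration_ms(mu)


def t_reassembly_covers_harq(t_reassembly_ms: float, max_tx: int,
                             harq_rtt_slots: int, mu: int) -> bool:
    """True if the timer outlasts the worst-case HARQ recovery time."""
    return t_reassembly_ms > worst_case_harq_delay_ms(max_tx, harq_rtt_slots, mu)


# DL at 30 kHz SCS: 4 transmissions, 8-slot RTT.
print(worst_case_harq_delay_ms(4, 8, mu=1))       # 12.0
# UL at 15 kHz SCS: 8 transmissions, 8-slot RTT.
print(worst_case_harq_delay_ms(8, 8, mu=0))       # 56.0
print(t_reassembly_covers_harq(50, 8, 8, mu=0))   # False
```

In the second case a 50 ms t-Reassembly expires before HARQ has finished trying, which is the "too low causes false discard" situation from the table.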

A practical methodology

> Walk the layers in order: PHY/MAC first, RLC second, PDCP third. If you start at PDCP you will be debugging symptoms, not causes.

  1. Look at PHY KPIs: SINR, BLER, CQI distribution.
  2. Look at MAC: HARQ NACK ratio, scheduler grants.
  3. Look at RLC: retransmission ratio, t-Reassembly events.
  4. Look at PDCP: discard counters, integrity failures, reorder events.

Each layer's failures leave specific traces in the layer above. If you see RLC retransmissions, MAC had failures that HARQ could not absorb. If you see PDCP discards, RLC could not deliver in time.
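
The bottom-up walk is easy to automate once each layer has a few ratios and a limit for each. The sketch below returns the lowest layer with a metric over its limit. The metric names and the limits in EXAMPLE_LIMITS are illustrative placeholders (only the 10% HARQ figure echoes the MAC section) and should be replaced with values calibrated on your own network.

```python
from typing import Dict, Optional, Tuple

LAYER_ORDER = ("PHY", "MAC", "RLC", "PDCP")  # lowest layer first


def first_abnormal_layer(
    metrics: Dict[str, float],
    limits: Dict[str, Dict[str, float]],
) -> Optional[Tuple[str, str]]:
    """Return (layer, metric) for the lowest layer with a metric over its limit.

    Metrics missing from `metrics` are skipped rather than treated as healthy
    or unhealthy. Returns None if nothing exceeds its limit.
    """
    for layer in LAYER_ORDER:
        for name, limit in limits.get(layer, {}).items():
            value = metrics.get(name)
            if value is not None and value > limit:
                return layer, name
    return None


# Illustrative limits only; calibrate against your own baseline.
EXAMPLE_LIMITS = {
    "MAC": {"harq_dl_nack_ratio": 0.10},
    "RLC": {"rlc_retx_ratio": 0.02},
    "PDCP": {"pdcp_discard_ratio": 0.0},
}

sample = {"harq_dl_nack_ratio": 0.04, "rlc_retx_ratio": 0.05, "pdcp_discard_ratio": 0.01}
print(first_abnormal_layer(sample, EXAMPLE_LIMITS))
```

Here both RLC and PDCP are over their limits, but the function reports RLC, because the PDCP discards are most likely a consequence of it.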

Takeaway: Radio protocol debugging is a vertical scan. Start from PHY/MAC and work up; the failure is almost always at the lowest layer with abnormal counters.