Voice in 5G: From VoLTE to VoNR

Voice over LTE (VoLTE) revolutionized mobile voice by replacing circuit-switched calls with SIP-based sessions over the IMS (IP Multimedia Subsystem). Voice over NR (VoNR) extends this architecture to 5G NR with improvements in call setup time, voice quality, and QoS handling. Both services rely on the same IMS core and SIP signaling framework -- the primary difference lies in the radio access and bearer/QoS flow management.

3GPP defines IMS-based voice in TS 23.228 (IMS architecture), SIP procedures in TS 24.229 (SIP/SDP for IMS), and 5G System support for IMS voice in TS 23.501 clause 5.16.3 (IMS support, including the IMS voice over PS session indication). As of 2025, the GSMA reports that VoLTE is deployed by 310+ operators in 140+ countries, while VoNR has launched on 45+ networks globally.

IMS Architecture for Voice

The IMS core uses a SIP-based architecture with the following key nodes:

| IMS Function | Role | Protocol | Location |
|---|---|---|---|
| P-CSCF (Proxy-CSCF) | First SIP contact point, SIP proxy, IPSec/TLS with UE | SIP, Diameter (Rx) | Visited or Home PLMN |
| I-CSCF (Interrogating-CSCF) | Routes SIP to correct S-CSCF, queries HSS/UDR | SIP, Diameter (Cx) | Home PLMN |
| S-CSCF (Serving-CSCF) | SIP registrar, session control, invokes application servers | SIP, Diameter (Cx) | Home PLMN |
| TAS (Telephony Application Server) | Supplementary services (call forwarding, call waiting, conferencing) | SIP (ISC interface) | Home PLMN |
| MGCF/MGW | Interworking with PSTN/CS domain | SIP + ISUP/BICC | Home PLMN |
| PCRF/PCF | Policy control, QoS authorization | Diameter Rx / HTTP/2 (Npcf) | Home PLMN |

In VoNR, the P-CSCF communicates with the 5G Core's PCF via the N5 interface (Rx equivalent using HTTP/2 SBI) to authorize dedicated QoS flows. In VoLTE, the P-CSCF uses the Diameter Rx interface to communicate with the PCRF.

IMS Registration Flow

Before any voice call can be made, the UE must register with the IMS. This involves SIP REGISTER messages exchanged between the UE and the IMS via the P-CSCF.

SIP Registration Message Sequence

| Step | Direction | SIP Message | Key Headers / Purpose |
|---|---|---|---|
| 1 | UE -> P-CSCF | REGISTER (initial) | To: sip:user@ims.mnc260.mcc310.3gppnetwork.org, Contact: UE IP, Expires: 3600 |
| 2 | P-CSCF -> I-CSCF | REGISTER (forwarded) | Via: P-CSCF, Path: P-CSCF SIP URI |
| 3 | I-CSCF -> HSS/UDR | Cx: UAR (User Authorization Request) | Queries assigned S-CSCF or selects one |
| 4 | I-CSCF -> S-CSCF | REGISTER | S-CSCF selected based on capabilities |
| 5 | S-CSCF -> HSS/UDR | Cx: MAR (Multimedia Auth Request) | Fetches IMS AKA authentication vectors |
| 6 | S-CSCF -> UE | 401 Unauthorized | WWW-Authenticate: Digest with RAND, AUTN (IMS AKA challenge) |
| 7 | UE -> P-CSCF | REGISTER (with credentials) | Authorization: Digest with RES, Security-Client: ipsec-3gpp |
| 8 | P-CSCF + UE | Establish IPSec SA | Bidirectional IPSec tunnel for SIP protection |
| 9 | S-CSCF -> HSS/UDR | Cx: SAR (Server Assignment Request) | Register user binding at S-CSCF |
| 10 | S-CSCF -> UE | 200 OK | Service-Route: S-CSCF path, P-Associated-URI: public IDs |

The IPSec security association (SA) established in step 8 protects all subsequent SIP signaling between the UE and P-CSCF. The UE uses the IMS-specific AKA credentials stored in the ISIM application on the UICC.
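The AKA challenge in step 6 can be illustrated with a short sketch. It assumes the Digest-AKA convention from RFC 3310, where the nonce in WWW-Authenticate is the base64 encoding of the 16-byte RAND followed by the 16-byte AUTN and optional server data. The function name and the sample byte values are hypothetical and chosen only for illustration; they are not taken from a real network trace.

```python
import base64


def split_aka_nonce(nonce_b64: str) -> tuple[bytes, bytes, bytes]:
    """Split a Digest-AKA nonce into (RAND, AUTN, server_data).

    Assumes the RFC 3310 layout: base64(RAND || AUTN || optional server data),
    with RAND and AUTN each 16 bytes long.
    """
    raw = base64.b64decode(nonce_b64)
    if len(raw) < 32:
        raise ValueError("nonce too short to carry RAND and AUTN")
    return raw[:16], raw[16:32], raw[32:]


# Made-up challenge values, for illustration only.
example_rand = bytes(range(16))
example_autn = bytes(range(16, 32))
nonce = base64.b64encode(example_rand + example_autn).decode("ascii")

rand, autn, server_data = split_aka_nonce(nonce)
print("RAND:", rand.hex())
print("AUTN:", autn.hex())
```

In a real UE the RAND and AUTN would be passed to the ISIM, which verifies AUTN and returns the RES used in the step 7 Authorization header; that part is outside this sketch.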

T-Mobile US measured a median IMS registration time of 320 ms on their VoNR network, compared to 480 ms on VoLTE. The improvement is attributed to the faster 5G NR air interface and reduced bearer setup time.

VoNR Call Setup -- SIP INVITE Flow

A mobile-originated VoNR call involves SIP signaling for session negotiation and parallel QoS flow establishment for the voice media.

Complete Call Flow

| Step | Direction | Message | Protocol | Key Details |
|---|---|---|---|---|
| 1 | UE -> P-CSCF | SIP INVITE | SIP | SDP offer: AMR-WB, EVS codecs; Precondition: required |
| 2 | P-CSCF -> PCF | Npcf_PolicyAuthorization_Create | HTTP/2 (N5) | Request QoS for voice: 5QI=1, GBR=56 kbps |
| 3 | PCF -> SMF | PCC Rule update | N7 interface | Install dedicated QoS flow for voice bearer |
| 4 | SMF -> UPF | PFCP Session Modification | PFCP (N4) | Add QER for GBR flow, PDR for voice SDF |
| 5 | SMF -> AMF -> gNB | QoS Flow setup | N2/NGAP | Dedicated DRB for 5QI=1, GBR=56 kbps UL+DL |
| 6 | gNB -> UE | RRC Reconfiguration | RRC | Add DRB for voice QoS flow, ROHC profile |
| 7 | P-CSCF -> S-CSCF -> TAS | SIP INVITE (routed) | SIP | TAS applies supplementary services |
| 8 | S-CSCF -> Terminating side | SIP INVITE | SIP | Via I-CSCF if inter-network |
| 9 | Remote UE | Alerting | SIP | 180 Ringing (SDP answer may be included) |
| 10 | P-CSCF -> UE | 180 Ringing | SIP | Ringback tone generated locally |
| 11 | Remote UE | Answer | SIP | 200 OK with SDP answer: selected codec, IP/port |
| 12 | UE -> P-CSCF | ACK | SIP | 3-way handshake complete |
| 13 | Both UEs | RTP media flows | RTP/UDP | Voice packets on dedicated QoS flow |

The SDP (Session Description Protocol) offer in the INVITE contains the codec preferences, IP address, and port. A typical VoNR SDP offer includes:

  • EVS (Enhanced Voice Services): 5.9--128 kbps, superior quality at 13.2 kbps
  • AMR-WB (Adaptive Multi-Rate Wideband): 6.6--23.85 kbps, most common VoLTE codec
  • AMR-NB (Adaptive Multi-Rate Narrowband): 4.75--12.2 kbps, fallback
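
How an answerer might pick a codec from such an offer can be sketched as follows. This is a simplified illustration, not the full offer/answer procedure: it assumes the offer lists codecs in preference order and ignores format parameters such as bitrate ranges or bandwidth modes. The names OFFER and select_codec are hypothetical.

```python
# Codec names in the offerer's preference order, mirroring the list above.
# "AMR" is the name commonly used for AMR-NB in SDP.
OFFER = ["EVS", "AMR-WB", "AMR"]


def select_codec(offered, supported):
    """Return the first offered codec that the answerer supports, or None."""
    for codec in offered:
        if codec in supported:
            return codec
    return None


# An answerer without EVS support falls back to AMR-WB.
print(select_codec(OFFER, {"AMR-WB", "AMR"}))
# An answerer with no codec in common cannot accept the offer.
print(select_codec(OFFER, {"G.711"}))
```

Keeping the offerer's order as the tie-breaker is one common policy; an answerer is free to apply its own preferences instead.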

Worked Example 1 -- Voice QoS Flow Bandwidth

For a VoNR call using EVS codec at 13.2 kbps with 20 ms frame duration:

Codec payload per frame:
  • EVS at 13.2 kbps, 20 ms frame: 13,200 x 0.020 / 8 = 33 bytes per frame
RTP/UDP/IP overhead per packet:
  • RTP header: 12 bytes
  • UDP header: 8 bytes
  • IP header (IPv6): 40 bytes (IPv4: 20 bytes)
  • Total overhead (IPv6): 60 bytes
Without ROHC (header compression):
  • Total packet: 33 + 60 = 93 bytes per 20 ms
  • Bandwidth: 93 x 8 / 0.020 = 37.2 kbps per direction
With ROHC (typical 3-byte compressed header):
  • Total packet: 33 + 3 = 36 bytes per 20 ms
  • Bandwidth: 36 x 8 / 0.020 = 14.4 kbps per direction

ROHC reduces the voice bandwidth requirement by 61%. This is why 3GPP mandates ROHC for VoLTE/VoNR, with profiles 0x0001 (RTP/UDP/IP) and 0x0002 (UDP/IP) configured in the PDCP layer.
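
The arithmetic in this example can be reproduced with a small helper. It is a sketch under the same assumptions as the worked example (one codec frame per packet, 20 ms framing, a 60-byte uncompressed IPv6/UDP/RTP header, a 3-byte ROHC header); the function name voice_bandwidth_kbps is hypothetical.

```python
def voice_bandwidth_kbps(codec_bps, header_bytes, frame_ms=20):
    """Per-direction bandwidth for one codec frame per packet."""
    payload_bytes = codec_bps * frame_ms / 8000  # bits per frame / 8
    packet_bits = (payload_bytes + header_bytes) * 8
    return packet_bits / frame_ms  # bits per millisecond equals kbps


EVS_BPS = 13_200
uncompressed = voice_bandwidth_kbps(EVS_BPS, header_bytes=12 + 8 + 40)
with_rohc = voice_bandwidth_kbps(EVS_BPS, header_bytes=3)
saving_pct = (1 - with_rohc / uncompressed) * 100

print(f"Without ROHC: {uncompressed:.1f} kbps")
print(f"With ROHC:    {with_rohc:.1f} kbps")
print(f"Saving:       {saving_pct:.0f}%")
```

Changing header_bytes to 12 + 8 + 20 gives the IPv4 case, and changing codec_bps lets the same check be run for other codec rates.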

The 5QI=1 QoS flow for voice has a packet delay budget of 100 ms and a packet error rate of 10^-2, as standardized in TS 23.501 Table 5.7.4-1. The GBR is not part of that table; it is signaled per QoS flow, and the 56 kbps value used here is sized to accommodate AMR-WB at its highest rate plus RTP/UDP/IP overhead.

Worked Example 2 -- Call Setup Time Analysis

VoNR call setup time is measured from SIP INVITE to 180 Ringing (alerting). Based on SK Telecom's 2025 VoNR performance report:

| Segment | Time | Notes |
|---|---|---|
| UE SIP INVITE generation | 5 ms | SDP construction, SIP encoding |
| UE -> P-CSCF (over radio + transport) | 8 ms | Via IPSec SA |
| P-CSCF -> PCF QoS authorization | 12 ms | N5 policy request/response |
| PCF -> SMF -> UPF + gNB QoS flow setup | 25 ms | PFCP + NGAP + RRC Reconfig |
| P-CSCF -> S-CSCF -> TAS routing | 15 ms | SIP routing, supplementary service check |
| S-CSCF -> terminating IMS (same network) | 10 ms | Terminating S-CSCF lookup |
| Terminating UE paging + QoS setup | 45 ms | Page, RRC connection, QoS flow |
| Terminating UE SIP alerting | 10 ms | 180 Ringing generated |
| Total: INVITE to 180 Ringing | ~130 ms | Same-network VoNR call |
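
The segment budget above can be checked with a few lines of code. The sketch simply treats the segments as sequential and sums them, which is the same simplification the table makes; the dictionary name SEGMENTS_MS is hypothetical.

```python
# Per-segment latencies in milliseconds, copied from the table above.
SEGMENTS_MS = {
    "UE SIP INVITE generation": 5,
    "UE -> P-CSCF (over radio + transport)": 8,
    "P-CSCF -> PCF QoS authorization": 12,
    "PCF -> SMF -> UPF + gNB QoS flow setup": 25,
    "P-CSCF -> S-CSCF -> TAS routing": 15,
    "S-CSCF -> terminating IMS (same network)": 10,
    "Terminating UE paging + QoS setup": 45,
    "Terminating UE SIP alerting": 10,
}

total_ms = sum(SEGMENTS_MS.values())
slowest = max(SEGMENTS_MS, key=SEGMENTS_MS.get)
share_pct = SEGMENTS_MS[slowest] / total_ms * 100

print(f"INVITE to 180 Ringing: {total_ms} ms")
print(f"Largest segment: {slowest} ({share_pct:.0f}% of total)")
```

Terminating-side paging and QoS setup is the largest single contributor in this budget, at roughly a third of the total.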

SK Telecom reported a median VoNR call setup time of 1.2 seconds (INVITE to 200 OK, including user ring time), compared to 2.8 seconds for VoLTE. The sub-200 ms signaling latency for alerting represents a significant improvement in user-perceived responsiveness.

VoLTE vs VoNR: Technical Comparison

| Aspect | VoLTE | VoNR |
|---|---|---|
| Radio access | LTE (E-UTRA) | 5G NR |
| Bearer type | Dedicated EPS bearer (QCI=1) | Dedicated QoS flow (5QI=1) |
| Bearer setup | PGW creates TFT, eNB adds DRB | SMF creates QER/PDR, gNB adds DRB |
| Policy interface | Rx (Diameter) to PCRF | N5 (HTTP/2 SBI) to PCF |
| Codec support | AMR-NB, AMR-WB | AMR-NB, AMR-WB, EVS (primary) |
| Call setup (alerting) | 300--500 ms | 100--200 ms |
| Handover to CS | SRVCC (Single Radio VCC) | EPS Fallback or SRVCC via EPC |
| Typical MOS score | 3.8--4.1 (AMR-WB 23.85 kbps) | 4.2--4.5 (EVS 13.2 kbps) |
| ROHC profile | 0x0001, 0x0002 | 0x0001, 0x0002, 0x0006 |

EPS Fallback for Voice

In early 5G deployments where VoNR is not yet enabled, the network uses EPS Fallback to redirect voice calls to VoLTE:

| Method | Mechanism | Delay | Use Case |
|---|---|---|---|
| Redirection-based | RRC Release with redirectedCarrierInfo pointing to LTE | 500--800 ms | Simple deployment, no tight interworking |
| Handover-based | Inter-RAT handover from NR to LTE, triggered when the voice QoS flow is requested | 200--400 ms | Better UX, requires N26 interworking between 5GC and EPC |
| N26-based | Interworking via N26 (AMF-MME) | 300--500 ms | Full state transfer, seamless |

Reliance Jio reported that 72% of their 5G voice calls in 2024 used EPS Fallback (redirection-based), with a plan to enable VoNR across all 5G SA sites by mid-2026. The additional 500--800 ms fallback delay was the primary motivator for VoNR enablement.

Voice Quality Metrics and Operator Benchmarks

| KPI | Definition | VoLTE Target | VoNR Target | Measurement Method |
|---|---|---|---|---|
| MOS (Mean Opinion Score) | Perceptual voice quality (1--5) | > 3.8 | > 4.0 | POLQA (ITU-T P.863) |
| Call Setup Success Rate (CSSR) | Successful calls / total attempts | > 99% | > 99.5% | Network KPI counters |
| Call Drop Rate (CDR) | Dropped calls / total calls | < 1% | < 0.5% | Network KPI counters |
| Post-dial delay | INVITE to 180 Ringing | < 3 s | < 2 s | SIP trace timing |
| E2E one-way delay | Mouth-to-ear latency | < 150 ms | < 100 ms | RTP timestamp analysis |

Operator Voice Performance Data

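| Operator | Service | Median MOS | CSSR | CDR | Post-dial Delay | Codec |
|---|---|---|---|---|---|---|
| T-Mobile US | VoNR | 4.3 | 99.6% | 0.3% | 1.4 s | EVS 13.2 kbps |
| SK Telecom | VoNR | 4.4 | 99.7% | 0.2% | 1.2 s | EVS 13.2 kbps |
| Deutsche Telekom | VoLTE | 4.0 | 99.3% | 0.6% | 2.6 s | AMR-WB 23.85 kbps |
| Vodafone UK | VoLTE | 3.9 | 99.1% | 0.7% | 2.9 s | AMR-WB 23.85 kbps |
| NTT DOCOMO | VoNR | 4.3 | 99.5% | 0.3% | 1.5 s | EVS 13.2 kbps |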
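
The operator figures can be compared against the KPI targets listed earlier with a short check. This is a sketch that reads the targets as strict inequalities, exactly as the target table writes them; the names TARGETS, ROWS, and failed_kpis are hypothetical, and the E2E one-way delay target is left out because the operator table does not report it.

```python
# Targets from the KPI table: MOS and CSSR are lower bounds,
# CDR and post-dial delay are upper bounds.
TARGETS = {
    "VoLTE": {"mos": 3.8, "cssr": 99.0, "cdr": 1.0, "pdd_s": 3.0},
    "VoNR": {"mos": 4.0, "cssr": 99.5, "cdr": 0.5, "pdd_s": 2.0},
}

# (operator, service, median MOS, CSSR %, CDR %, post-dial delay s)
ROWS = [
    ("T-Mobile US", "VoNR", 4.3, 99.6, 0.3, 1.4),
    ("SK Telecom", "VoNR", 4.4, 99.7, 0.2, 1.2),
    ("Deutsche Telekom", "VoLTE", 4.0, 99.3, 0.6, 2.6),
    ("Vodafone UK", "VoLTE", 3.9, 99.1, 0.7, 2.9),
    ("NTT DOCOMO", "VoNR", 4.3, 99.5, 0.3, 1.5),
]


def failed_kpis(service, mos, cssr, cdr, pdd_s):
    """Return the names of KPIs that miss the target for this service."""
    t = TARGETS[service]
    checks = {
        "mos": mos > t["mos"],
        "cssr": cssr > t["cssr"],
        "cdr": cdr < t["cdr"],
        "pdd_s": pdd_s < t["pdd_s"],
    }
    return [name for name, ok in checks.items() if not ok]


report = {op: failed_kpis(svc, *vals) for op, svc, *vals in ROWS}
for operator, failures in report.items():
    print(operator, "OK" if not failures else f"misses: {failures}")
```

With strict inequalities, a CSSR of exactly 99.5% does not clear the "> 99.5%" VoNR target, so that row is flagged; reading the target as "at least 99.5%" would pass it.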

The EVS codec at 13.2 kbps consistently delivers higher MOS than AMR-WB at 23.85 kbps despite the lower bitrate. EVS codes speech more efficiently and at this rate can carry super-wideband audio (32 kHz sampling), whereas AMR-WB is limited to wideband (50 Hz -- 7 kHz). EVS full-band audio (up to 20 kHz) is available only at 16.4 kbps and above.

IMS Emergency Calls over 5G

Emergency calls (e.g., 911 in the US, 112 in Europe) require special handling in IMS:

  1. The UE sets the Emergency indication in the PDU Session Establishment Request (TS 24.501 clause 6.4.1).
  2. The SMF establishes an emergency PDU session with prioritized QoS (typically 5QI=5 for IMS signaling and 5QI=1 for voice media, with an emergency-level ARP).
  3. The P-CSCF routes the emergency SIP INVITE to the E-CSCF (Emergency CSCF), which determines the appropriate PSAP (Public Safety Answering Point) based on the UE's location.
  4. Location information is conveyed via the SIP P-Access-Network-Info header and the GMLC (Gateway Mobile Location Centre).

3GPP defines emergency IMS procedures in TS 23.167 and emergency bearer handling in TS 23.501 clause 5.16.4.

Key Takeaway: VoNR delivers measurably better voice quality (MOS 4.2--4.5 vs 3.8--4.1 for VoLTE) and faster call setup (1.2--1.5 s vs 2.6--2.9 s) by leveraging 5G NR's lower-latency radio, dedicated QoS flows with 5QI=1, and the EVS codec. The underlying SIP signaling through the IMS is architecturally identical -- the improvements come from the radio access, QoS flow management, and codec evolution. Understanding the end-to-end SIP flow from REGISTER through INVITE, including the parallel QoS authorization via PCF, is essential for IMS and VoNR certification.