Voice in 5G: From VoLTE to VoNR
Voice over LTE (VoLTE) replaced circuit-switched mobile voice with SIP-based sessions over the IMS (IP Multimedia Subsystem). Voice over NR (VoNR) extends this architecture to 5G NR, improving call setup time, voice quality, and QoS handling. Both services rely on the same IMS core and SIP signaling framework -- the primary difference lies in the radio access and in bearer/QoS flow management.
3GPP defines the IMS architecture in TS 23.228, SIP/SDP procedures for IMS in TS 24.229, and 5G System support for IMS voice over PS sessions in TS 23.501 clause 5.16.3. As of 2025, the GSMA reports that VoLTE is deployed by 310+ operators in 140+ countries, while VoNR has launched on 45+ networks globally.
IMS Architecture for Voice
The IMS core uses a SIP-based architecture with the following key nodes:
| IMS Function | Role | Protocol | Location |
|---|---|---|---|
| P-CSCF (Proxy-CSCF) | First SIP contact point, SIP proxy, IPSec/TLS with UE | SIP, Diameter (Rx) | Visited or Home PLMN |
| I-CSCF (Interrogating-CSCF) | Routes SIP to correct S-CSCF, queries HSS/UDR | SIP, Diameter (Cx) | Home PLMN |
| S-CSCF (Serving-CSCF) | SIP registrar, session control, invokes application servers | SIP, Diameter (Cx) | Home PLMN |
| TAS (Telephony Application Server) | Supplementary services (call forwarding, call waiting, conferencing) | SIP (ISC interface) | Home PLMN |
| MGCF/MGW | Interworking with PSTN/CS domain | SIP + ISUP/BICC | Home PLMN |
| PCRF/PCF | Policy control, QoS authorization | Diameter Rx / HTTP/2 (Npcf) | Home PLMN |
In VoNR, the P-CSCF communicates with the 5G Core's PCF via the N5 interface (Rx equivalent using HTTP/2 SBI) to authorize dedicated QoS flows. In VoLTE, the P-CSCF uses the Diameter Rx interface to communicate with the PCRF.
IMS Registration Flow
Before any voice call can be made, the UE must register with the IMS. This involves SIP REGISTER messages exchanged between the UE and the IMS via the P-CSCF.
SIP Registration Message Sequence
| Step | Direction | SIP Message | Key Headers / Purpose |
|---|---|---|---|
| 1 | UE -> P-CSCF | REGISTER (initial) | To: sip:user@ims.mnc260.mcc310.3gppnetwork.org, Contact: UE IP, Expires: 3600 |
| 2 | P-CSCF -> I-CSCF | REGISTER (forwarded) | Via: P-CSCF, Path: P-CSCF SIP URI |
| 3 | I-CSCF -> HSS/UDR | Cx: UAR (User Authorization Request) | Queries assigned S-CSCF or selects one |
| 4 | I-CSCF -> S-CSCF | REGISTER | S-CSCF selected based on capabilities |
| 5 | S-CSCF -> HSS/UDR | Cx: MAR (Multimedia Auth Request) | Fetches IMS AKA authentication vectors |
| 6 | S-CSCF -> UE | 401 Unauthorized | WWW-Authenticate: Digest with RAND, AUTN (IMS AKA challenge) |
| 7 | UE -> P-CSCF | REGISTER (with credentials) | Authorization: Digest with RES, Security-Client: ipsec-3gpp |
| 8 | P-CSCF + UE | Establish IPsec SAs | Pair of IPsec ESP security associations (one per direction) for SIP protection |
| 9 | S-CSCF -> HSS/UDR | Cx: SAR (Server Assignment Request) | Register user binding at S-CSCF |
| 10 | S-CSCF -> UE | 200 OK | Service-Route: S-CSCF path, P-Associated-URI: public IDs |
The IPSec security association (SA) established in step 8 protects all subsequent SIP signaling between the UE and P-CSCF. The UE uses the IMS-specific AKA credentials stored in the ISIM application on the UICC.
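Step 1 of the registration table can be sketched in code. The following is a minimal illustration, not a working SIP stack: it derives the home network domain from the MCC/MNC used in the table and assembles the text of an initial, unauthenticated REGISTER. The function names, the UE address 192.0.2.10, and the tag, branch, and Call-ID values are hypothetical placeholders. A real UE also adds Security-Client, Authorization, and other headers required by TS 24.229.

```python
def ims_home_domain(mcc: str, mnc: str) -> str:
    """Derive the IMS home network domain; the MNC is zero-padded to 3 digits."""
    return f"ims.mnc{int(mnc):03d}.mcc{int(mcc):03d}.3gppnetwork.org"


def build_initial_register(user: str, domain: str, ue_ip: str, expires: int = 3600) -> str:
    """Assemble the text of an initial SIP REGISTER (illustrative header set only)."""
    aor = f"sip:{user}@{domain}"  # address-of-record being registered
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {ue_ip}:5060;branch=z9hG4bK-example",  # placeholder branch
        "Max-Forwards: 70",
        f"From: <{aor}>;tag=example-tag",                          # placeholder tag
        f"To: <{aor}>",
        f"Call-ID: example-call-id@{ue_ip}",                       # placeholder Call-ID
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{user}@{ue_ip}:5060>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line after the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


domain = ims_home_domain("310", "260")
message = build_initial_register("user", domain, "192.0.2.10")
print(message)
```

The network answers this request with the 401 challenge in step 6. The second REGISTER reuses the same Call-ID with an incremented CSeq and carries the AKA response.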
T-Mobile US measured a median IMS registration time of 320 ms on their VoNR network, compared to 480 ms on VoLTE. The improvement is attributed to the faster 5G NR air interface and reduced bearer setup time.
VoNR Call Setup -- SIP INVITE Flow
A mobile-originated VoNR call involves SIP signaling for session negotiation and parallel QoS flow establishment for the voice media.
Complete Call Flow
| Step | Direction | Message | Protocol | Key Details |
|---|---|---|---|---|
| 1 | UE -> P-CSCF | SIP INVITE | SIP | SDP offer: AMR-WB, EVS codecs; Precondition: required |
| 2 | P-CSCF -> PCF | Npcf_PolicyAuthorization_Create | HTTP/2 (N5) | Request QoS for voice: 5QI=1, GBR=56 kbps |
| 3 | PCF -> SMF | PCC Rule update | N7 interface | Install dedicated QoS flow for voice bearer |
| 4 | SMF -> UPF | PFCP Session Modification | PFCP (N4) | Add QER for GBR flow, PDR for voice SDF |
| 5 | SMF -> AMF -> gNB | QoS Flow setup | N2/NGAP | Dedicated DRB for 5QI=1, GBR=56 kbps UL+DL |
| 6 | gNB -> UE | RRC Reconfiguration | RRC | Add DRB for voice QoS flow, ROHC profile |
| 7 | P-CSCF -> S-CSCF -> TAS | SIP INVITE (routed) | SIP | TAS applies supplementary services |
| 8 | S-CSCF -> Terminating side | SIP INVITE | SIP | Via I-CSCF if inter-network |
| 9 | Remote UE | Alerting | SIP | 180 Ringing (SDP answer may be included) |
| 10 | P-CSCF -> UE | 180 Ringing | SIP | Ringback tone generated locally |
| 11 | Remote UE | Answer | SIP | 200 OK with SDP answer: selected codec, IP/port |
| 12 | UE -> P-CSCF | ACK | SIP | 3-way handshake complete |
| 13 | Both UEs | RTP media flows | RTP/UDP | Voice packets on dedicated QoS flow |
The SDP (Session Description Protocol) offer in the INVITE contains the codec preferences, IP address, and port. A typical VoNR SDP offer includes:
- EVS (Enhanced Voice Services): 5.9--128 kbps, superior quality at 13.2 kbps
- AMR-WB (Adaptive Multi-Rate Wideband): 6.6--23.85 kbps, most common VoLTE codec
- AMR-NB (Adaptive Multi-Rate Narrowband): 4.75--12.2 kbps, fallback
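To make the codec list concrete, here is a rough sketch of how an SDP offer carrying these three codecs might be assembled, and how an answerer could pick the first offered codec it supports. The dynamic payload type numbers (96 to 98), the address and port, and both function names are assumptions for illustration. Real VoNR offers also carry fmtp parameters, bandwidth lines, and precondition attributes that are omitted here.

```python
# Offered codecs in preference order: (RTP payload type, rtpmap encoding name/clock rate).
OFFERED_CODECS = [
    (96, "EVS/16000"),
    (97, "AMR-WB/16000"),
    (98, "AMR/8000"),  # AMR-NB is registered under the name "AMR"
]


def build_sdp_offer(ue_ip: str, rtp_port: int, codecs=OFFERED_CODECS) -> str:
    """Build a bare-bones audio SDP offer (IPv4 assumed for brevity)."""
    payload_types = " ".join(str(pt) for pt, _ in codecs)
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ue_ip}",
        "s=-",
        f"c=IN IP4 {ue_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP {payload_types}",
    ]
    lines += [f"a=rtpmap:{pt} {encoding}" for pt, encoding in codecs]
    lines.append("a=ptime:20")  # 20 ms of audio per packet
    return "\r\n".join(lines) + "\r\n"


def select_codec(offered, supported):
    """Return the first offered codec the answerer supports, or None."""
    for pt, encoding in offered:
        if encoding in supported:
            return pt, encoding
    return None


offer = build_sdp_offer("192.0.2.10", 49152)
print(offer)
# An answerer without EVS support falls back to the next preference.
print(select_codec(OFFERED_CODECS, {"AMR-WB/16000", "AMR/8000"}))
```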
Worked Example 1 -- Voice QoS Flow Bandwidth
For a VoNR call using EVS codec at 13.2 kbps with 20 ms frame duration:
Codec payload per frame:
- EVS at 13.2 kbps, 20 ms frame: 13,200 x 0.020 / 8 = 33 bytes per frame
Without ROHC (full headers):
- RTP header: 12 bytes
- UDP header: 8 bytes
- IP header (IPv6): 40 bytes (IPv4: 20 bytes)
- Total overhead (IPv6): 60 bytes
- Total packet: 33 + 60 = 93 bytes per 20 ms
- Bandwidth: 93 x 8 / 0.020 = 37.2 kbps per direction
With ROHC (RTP/UDP/IP headers compressed to about 3 bytes):
- Total packet: 33 + 3 = 36 bytes per 20 ms
- Bandwidth: 36 x 8 / 0.020 = 14.4 kbps per direction
ROHC reduces the voice bandwidth requirement by about 61% (from 37.2 kbps to 14.4 kbps per direction). This is why 3GPP mandates ROHC for VoLTE/VoNR, with profiles 0x0001 (RTP/UDP/IP) and 0x0002 (UDP/IP) configured in the PDCP layer.
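The arithmetic in this worked example can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the text: one 20 ms frame per packet, 60 bytes of uncompressed RTP/UDP/IPv6 headers, and a compressed ROHC header of about 3 bytes. The function name is hypothetical.

```python
def packet_bandwidth_kbps(codec_kbps: float, header_bytes: int, frame_ms: int = 20) -> float:
    """Per-direction IP bandwidth for one codec frame per packet."""
    payload_bytes = codec_kbps * frame_ms / 8   # kbps x ms = bits, / 8 = bytes
    packet_bytes = payload_bytes + header_bytes
    return packet_bytes * 8 / frame_ms          # bits per ms is the same as kbps


uncompressed = packet_bandwidth_kbps(13.2, header_bytes=12 + 8 + 40)
compressed = packet_bandwidth_kbps(13.2, header_bytes=3)
saving = 1 - compressed / uncompressed

print(f"Without ROHC: {uncompressed:.1f} kbps")  # 37.2 kbps
print(f"With ROHC:    {compressed:.1f} kbps")    # 14.4 kbps
print(f"Reduction:    {saving:.0%}")             # 61%
```

Changing `header_bytes` to 12 + 8 + 20 gives the IPv4 case, where the uncompressed headers are smaller and the relative saving from ROHC is correspondingly lower.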
The 5QI=1 QoS flow for voice is configured in this example with a GBR of 56 kbps, enough to accommodate AMR-WB at its highest rate plus uncompressed header overhead. Its packet delay budget of 100 ms and packet error rate of 10^-2 are the standardized 5QI=1 characteristics defined in TS 23.501 Table 5.7.4-1.
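As a quick sanity check on the 56 kbps figure, the sketch below adds the fixed header cost to each codec rate. It assumes uncompressed RTP/UDP/IPv6 headers, one 20 ms frame per packet, and ignores the few bits of RTP payload-format header, so the results are approximate. All names are hypothetical.

```python
GBR_KBPS = 56
HEADER_BYTES = 12 + 8 + 40   # RTP + UDP + IPv6, no ROHC
FRAME_MS = 20

# Header cost is constant per packet: bytes x 8 bits / ms = kbps.
HEADER_KBPS = HEADER_BYTES * 8 / FRAME_MS


def ip_rate_kbps(codec_kbps: float) -> float:
    """Approximate IP-layer rate for a codec running at codec_kbps."""
    return codec_kbps + HEADER_KBPS


for name, rate in [("AMR-WB 23.85", 23.85), ("EVS 13.2", 13.2), ("AMR-NB 12.2", 12.2)]:
    needed = ip_rate_kbps(rate)
    print(f"{name}: {needed:.2f} kbps needed, {GBR_KBPS - needed:.2f} kbps headroom")
```

Under these assumptions the highest AMR-WB mode needs just under 48 kbps, which leaves some margin within the 56 kbps GBR for payload-format overhead and RTCP.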
Worked Example 2 -- Call Setup Time Analysis
VoNR call setup time is measured from SIP INVITE to 180 Ringing (alerting). Based on SK Telecom's 2025 VoNR performance report:
| Segment | Time | Notes |
|---|---|---|
| UE SIP INVITE generation | 5 ms | SDP construction, SIP encoding |
| UE -> P-CSCF (over radio + transport) | 8 ms | Via IPSec SA |
| P-CSCF -> PCF QoS authorization | 12 ms | N5 policy request/response |
| PCF -> SMF -> UPF + gNB QoS flow setup | 25 ms | PFCP + NGAP + RRC Reconfig |
| P-CSCF -> S-CSCF -> TAS routing | 15 ms | SIP routing, supplementary service check |
| S-CSCF -> terminating IMS (same network) | 10 ms | Terminating S-CSCF lookup |
| Terminating UE paging + QoS setup | 45 ms | Page, RRC connection, QoS flow |
| Terminating UE SIP alerting | 10 ms | 180 Ringing generated |
| Total: INVITE to 180 Ringing | ~130 ms | Same-network VoNR call |
SK Telecom reported a median VoNR call setup time of 1.2 seconds (INVITE to 200 OK, including user ring time), compared to 2.8 seconds for VoLTE. The sub-200 ms signaling latency for alerting represents a significant improvement in user-perceived responsiveness.
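The segment budget in the table can be summed and ranked programmatically, which is handy when comparing traces against it. The sketch below simply copies the per-segment figures from the table. The dictionary keys are shortened labels chosen for illustration.

```python
# Per-segment latency in milliseconds, copied from the table above.
SEGMENTS_MS = {
    "UE INVITE generation": 5,
    "UE -> P-CSCF": 8,
    "P-CSCF -> PCF QoS authorization": 12,
    "QoS flow setup (SMF/UPF/gNB)": 25,
    "P-CSCF -> S-CSCF -> TAS routing": 15,
    "S-CSCF -> terminating IMS": 10,
    "Terminating UE paging + QoS setup": 45,
    "Terminating UE alerting": 10,
}

total_ms = sum(SEGMENTS_MS.values())
largest = max(SEGMENTS_MS, key=SEGMENTS_MS.get)

print(f"INVITE to 180 Ringing: {total_ms} ms")
print(f"Largest contributor: {largest} "
      f"({SEGMENTS_MS[largest]} ms, {SEGMENTS_MS[largest] / total_ms:.0%} of total)")
```

Paging and QoS setup on the terminating side is the largest single contributor in this budget, at roughly a third of the total.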
VoLTE vs VoNR: Technical Comparison
| Aspect | VoLTE | VoNR |
|---|---|---|
| Radio access | LTE (E-UTRA) | 5G NR |
| Bearer type | Dedicated EPS bearer (QCI=1) | Dedicated QoS flow (5QI=1) |
| Bearer setup | PGW creates TFT, eNB adds DRB | SMF creates QER/PDR, gNB adds DRB |
| Policy interface | Rx (Diameter) to PCRF | N5 (HTTP/2 SBI) to PCF |
| Codec support | AMR-NB, AMR-WB | AMR-NB, AMR-WB, EVS (primary) |
| Call setup (alerting) | 300--500 ms | 100--200 ms |
| Handover to CS | SRVCC (Single Radio VCC) | EPS Fallback or SRVCC via EPC |
| Typical MOS score | 3.8--4.1 (AMR-WB 23.85 kbps) | 4.2--4.5 (EVS 13.2 kbps) |
| ROHC profile | 0x0001, 0x0002 | 0x0001, 0x0002, 0x0006 |
EPS Fallback for Voice
In early 5G deployments where VoNR is not yet enabled, the network uses EPS Fallback to redirect voice calls to VoLTE:
| Method | Mechanism | Delay | Use Case |
|---|---|---|---|
| Redirection-based | RRC Release with redirectionCarrierFreqInfo to LTE | 500--800 ms | Simple deployment, no tight interworking |
| Handover-based | Inter-RAT handover from NR to LTE before INVITE | 200--400 ms | Better UX, requires X2/Xn interface |
| N26-based | Interworking via N26 (AMF-MME) | 300--500 ms | Full state transfer, seamless |
Reliance Jio reported that 72% of their 5G voice calls in 2024 used EPS Fallback (redirection-based), with a plan to enable VoNR across all 5G SA sites by mid-2026. The additional 500--800 ms fallback delay was the primary motivator for VoNR enablement.
Voice Quality Metrics and Operator Benchmarks
| KPI | Definition | VoLTE Target | VoNR Target | Measurement Method |
|---|---|---|---|---|
| MOS (Mean Opinion Score) | Perceptual voice quality (1--5) | > 3.8 | > 4.0 | POLQA (ITU-T P.863) |
| Call Setup Success Rate (CSSR) | Successful calls / total attempts | > 99% | > 99.5% | Network KPI counters |
| Call Drop Rate (CDR) | Dropped calls / total calls | < 1% | < 0.5% | Network KPI counters |
| Post-dial delay | INVITE to 180 Ringing | < 3 s | < 2 s | SIP trace timing |
| E2E one-way delay | Mouth-to-ear latency | < 150 ms | < 100 ms | RTP timestamp analysis |
Operator Voice Performance Data
| Operator | Service | Median MOS | CSSR | CDR | Post-dial Delay | Codec |
|---|---|---|---|---|---|---|
| T-Mobile US | VoNR | 4.3 | 99.6% | 0.3% | 1.4 s | EVS 13.2 kbps |
| SK Telecom | VoNR | 4.4 | 99.7% | 0.2% | 1.2 s | EVS 13.2 kbps |
| Deutsche Telekom | VoLTE | 4.0 | 99.3% | 0.6% | 2.6 s | AMR-WB 23.85 kbps |
| Vodafone UK | VoLTE | 3.9 | 99.1% | 0.7% | 2.9 s | AMR-WB 23.85 kbps |
| NTT DOCOMO | VoNR | 4.3 | 99.5% | 0.3% | 1.5 s | EVS 13.2 kbps |
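The EVS codec at 13.2 kbps consistently delivers higher MOS than AMR-WB at 23.85 kbps despite the lower bitrate. EVS codes speech more efficiently and at this rate carries super-wideband audio (up to roughly 14--16 kHz), whereas AMR-WB is limited to wideband (50 Hz -- 7 kHz). Full-band EVS operation (up to 20 kHz) is only available at higher EVS bitrates.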
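The KPI targets and the operator figures from the two tables above can be compared mechanically. The following sketch encodes the targets as strict inequalities, exactly as the KPI table writes them, and lists which KPIs a given row misses. The function and key names are hypothetical, and whether a value exactly on the threshold counts as a pass is an assumption of this sketch.

```python
# Targets from the KPI table: MOS and CSSR must exceed, CDR and delay must stay below.
TARGETS = {
    "VoLTE": {"mos": 3.8, "cssr": 99.0, "cdr": 1.0, "pdd_s": 3.0},
    "VoNR": {"mos": 4.0, "cssr": 99.5, "cdr": 0.5, "pdd_s": 2.0},
}


def missed_targets(service: str, mos: float, cssr: float, cdr: float, pdd_s: float) -> list:
    """Return the names of KPIs that do not meet the target for the given service."""
    target = TARGETS[service]
    checks = {
        "MOS": mos > target["mos"],
        "CSSR": cssr > target["cssr"],
        "CDR": cdr < target["cdr"],
        "Post-dial delay": pdd_s < target["pdd_s"],
    }
    return [name for name, ok in checks.items() if not ok]


# Rows taken from the operator table.
print(missed_targets("VoNR", mos=4.3, cssr=99.6, cdr=0.3, pdd_s=1.4))   # T-Mobile US
print(missed_targets("VoLTE", mos=3.9, cssr=99.1, cdr=0.7, pdd_s=2.9))  # Vodafone UK
print(missed_targets("VoNR", mos=4.3, cssr=99.5, cdr=0.3, pdd_s=1.5))   # NTT DOCOMO
```

With strict comparisons, a CSSR of exactly 99.5% is reported as not exceeding the VoNR target, so the last row is flagged for CSSR while the first two return an empty list.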
IMS Emergency Calls over 5G
Emergency calls (e.g., 911 in the US, 112 in Europe) require special handling in IMS:
- The UE sets the Emergency indication in the PDU Session Establishment Request (TS 24.501 clause 6.4.1).
- The SMF establishes an emergency PDU session whose QoS flows (typically 5QI=5 for IMS signaling and 5QI=1 for voice media) are given an emergency-level allocation and retention priority (ARP).
- The P-CSCF routes the emergency SIP INVITE to the E-CSCF (Emergency CSCF), which determines the appropriate PSAP (Public Safety Answering Point) based on the UE's location.
- Location information is conveyed via the SIP P-Access-Network-Info header and the GMLC (Gateway Mobile Location Centre).
3GPP defines emergency IMS procedures in TS 23.167 and emergency bearer handling in TS 23.501 clause 5.16.4.
Key Takeaway: VoNR delivers measurably better voice quality (MOS 4.2--4.5 vs 3.8--4.1 for VoLTE) and shorter post-dial delay (1.2--1.5 s vs 2.6--2.9 s in the operator data above) by leveraging 5G NR's lower-latency radio, dedicated QoS flows with 5QI=1, and the EVS codec. The underlying SIP signaling through the IMS is architecturally identical -- the improvements come from the radio access, QoS flow management, and codec evolution. Understanding the end-to-end SIP flow from REGISTER through INVITE, including the parallel QoS authorization via PCF, is essential for IMS and VoNR certification.