From VNF to CNF: Why Cloud-Native Matters

The 5G core was designed from the ground up as a Service-Based Architecture (SBA), defined in TS 23.501 Section 4.2. This service-based design maps naturally to cloud-native principles: each Network Function (NF) runs as an independent, stateless, horizontally scalable service. The shift from monolithic VNFs (Virtual Network Functions) to CNFs (Cloud-Native Network Functions) is one of the most significant infrastructure transformations in telecom history.

A VNF is essentially a physical appliance ported to a virtual machine. It retains monolithic design, vertical scaling patterns, and vendor-specific lifecycle management. A CNF decomposes the same functionality into microservices packaged as containers, orchestrated by Kubernetes, and managed through CI/CD pipelines.

VNF vs CNF Comparison

| Dimension | VNF | CNF |
|---|---|---|
| Packaging | VM image (QCOW2, VMDK) | Container image (OCI) |
| Orchestration | VIM (OpenStack, VMware) | Kubernetes (K8s) |
| Scaling unit | Entire VM | Individual microservice pod |
| Scale-out time | 5--15 minutes | 5--30 seconds |
| State management | Stateful, local storage | Stateless, external state store (Redis, etcd) |
| Resource efficiency | 30--40% overhead (hypervisor + guest OS) | 5--10% overhead (shared kernel) |
| Update mechanism | VM snapshot, rolling upgrade | Rolling update, canary, blue-green |
| Failure recovery | VM restart (2--5 min) | Pod restart (1--5 sec) |
| Networking | SR-IOV, DPDK, OVS | CNI plugins (Multus, Calico, Cilium) |
| Service discovery | Static config, DNS | K8s Service, service mesh (Istio/Linkerd) |
| Lifecycle management | VNFM (vendor-specific) | Helm charts, Operators, GitOps |
| Observability | Proprietary NMS | Prometheus, Grafana, Jaeger, OpenTelemetry |
| Multi-tenancy | VM-level isolation | Namespace + NetworkPolicy isolation |

The resource efficiency gain alone is compelling: a CNF-based 5G core requires roughly 40--60% fewer compute cores than the equivalent VNF deployment for the same subscriber capacity.

Kubernetes Components for Telco

Standard Kubernetes requires several enhancements for telco-grade workloads. The following table maps K8s components to their telco-specific roles.

| K8s Component | Telco Role | Configuration | Notes |
|---|---|---|---|
| kubelet | Node agent managing NF pods | CPU pinning, NUMA-aware topology manager | Critical for UPF performance |
| kube-scheduler | NF placement and anti-affinity | Custom scheduling policies for NF redundancy | Spread AMF replicas across failure domains |
| Multus CNI | Multiple network interfaces per pod | N2/N3/N4/N6 interface separation | Required for 3GPP interface isolation |
| SR-IOV Device Plugin | Hardware-accelerated dataplane | NIC VF allocation to UPF pods | Enables line-rate UPF forwarding |
| Topology Manager | NUMA-aware resource allocation | single-numa-node policy for UPF | Prevents cross-NUMA memory access latency |
| Node Feature Discovery | Hardware capability labeling | GPU, FPGA, NIC feature labels | UPF pods scheduled to SR-IOV-capable nodes |
| PersistentVolume | Session state backup | Ceph RBD or local NVMe | Used for UDR and UDSF |
| Horizontal Pod Autoscaler | NF auto-scaling | CPU and custom metrics (sessions, TPS) | AMF scales on registration TPS |
| cert-manager | mTLS certificate lifecycle | Automated rotation for SBI TLS | Per TS 33.501 Section 13.1 |
| CoreDNS | NF service discovery | SBA NF registration and discovery | Supplements NRF-based discovery |

Multus CNI is particularly critical because 3GPP interfaces (N2, N3, N4, N6, N9) must be separated for security, QoS, and routing purposes. A single AMF pod needs at least three interfaces: management, N2 (toward gNB), and SBI (toward other NFs).
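
To make the interface separation concrete, here is a minimal sketch of a Multus NetworkAttachmentDefinition for the N2 interface and the pod annotation that requests it. The attachment name, master NIC, VLAN, namespace, and IP addressing are illustrative assumptions, not values from any specific deployment:

```yaml
# Hypothetical Multus NetworkAttachmentDefinition for the AMF N2 interface.
# Interface name (n2-net), master NIC/VLAN, and subnet are assumptions.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: n2-net
  namespace: 5gc
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens3f0.100",
      "mode": "bridge",
      "ipam": {
        "type": "static",
        "addresses": [{ "address": "10.20.2.10/24" }]
      }
    }
```

The AMF pod then requests the extra interface via the `k8s.v1.cni.cncf.io/networks: n2-net` annotation; it appears inside the pod as `net1`, alongside the default `eth0` used for management and SBI traffic.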

Helm Chart Structure: AMF Example

Helm is the standard package manager for deploying CNFs on Kubernetes. A production AMF Helm chart follows this structure:

```
amf-chart/
├── Chart.yaml                  # name: amf, version: 3.2.1, appVersion: R16.8
├── values.yaml                 # Default configuration
└── templates/
    ├── deployment.yaml         # AMF pod spec with 3 replicas
    ├── service.yaml            # ClusterIP for SBI, NodePort for N2
    ├── configmap.yaml          # AMF config (PLMN, TAC, NSSAI, NRF URI)
    ├── hpa.yaml                # Scale on CPU > 70% or registrations > 5000/s
    ├── networkattachment.yaml  # Multus annotation for N2 interface
    ├── pdb.yaml                # PodDisruptionBudget: minAvailable=2
    ├── serviceaccount.yaml     # RBAC for NRF registration
    └── servicemonitor.yaml     # Prometheus scrape config
```

Key values.yaml parameters:

```yaml
replicaCount: 3

image:
  repository: registry.vendor.com/5gc/amf
  tag: "R16.8.2"

resources:
  requests:
    cpu: "4"
    memory: "8Gi"
  limits:
    cpu: "8"
    memory: "16Gi"

amf:
  plmnId:
    mcc: "310"
    mnc: "260"
  supportedNssai:
    - sst: 1
      sd: "000001"
    - sst: 2
      sd: "000002"
  n2:
    interface: net1   # Multus network attachment
    port: 38412       # SCTP port per TS 38.412
  sbi:
    scheme: https
    port: 8443
    nrfUri: "https://nrf-svc.5gc:8443"
```

The PodDisruptionBudget (PDB) ensures that during rolling updates or node maintenance, at least 2 AMF replicas remain available, preventing service disruption during upgrades.
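
A minimal `pdb.yaml` template implementing that policy could look like the following sketch; the label selector is an assumption about how the chart labels AMF pods:

```yaml
# Hypothetical PodDisruptionBudget for the AMF: with 3 replicas and
# minAvailable: 2, at most one pod may be voluntarily evicted at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: amf-pdb
  namespace: 5gc
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: amf   # assumed pod label; must match the Deployment's labels
```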

Real Deployment: Rakuten Symphony

Rakuten Mobile launched the world's first fully cloud-native mobile network in 2020 and has since commercialized the platform as Rakuten Symphony. Their architecture runs on bare-metal Kubernetes across 15 regional data centers in Japan.

Key metrics from Rakuten's production deployment:

  • Core NFs: All 5GC functions (AMF, SMF, UPF, NRF, UDM, AUSF, PCF) run as CNFs on K8s
  • Subscriber scale: 5+ million subscribers on cloud-native core as of 2025
  • Infrastructure: Custom platform based on upstream Kubernetes with Wind River StarlingX for edge
  • OpEx reduction: Rakuten claims 40% lower OpEx versus traditional VNF-based architectures
  • Scaling: AMF scales from 3 to 12 pods in under 60 seconds during registration storms
  • Update cadence: Bi-weekly rolling updates to core NFs with zero-downtime deployments

Real Deployment: Dish Network

Dish Network built a greenfield 5G network in the US using cloud-native, O-RAN-compliant architecture from day one. Their 5G core runs on AWS Outposts (on-premise AWS infrastructure) in 5 regional data centers.

  • Core vendor: Multiple CNF vendors including Mavenir and Oracle
  • Orchestration: Amazon EKS (Elastic Kubernetes Service) on Outposts
  • Coverage target: 70% US population by 2025 (FCC build-out commitment)
  • UPF placement: Distributed UPFs at 100+ edge locations for sub-20 ms latency
  • Automation: Full GitOps pipeline with ArgoCD for NF deployment and configuration
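
For illustration, a GitOps pipeline of this kind typically drives each NF from an Argo CD Application resource that syncs a Helm chart from Git. The repository URL, paths, and names below are invented placeholders, not Dish's actual configuration:

```yaml
# Hypothetical Argo CD Application syncing an AMF Helm chart from Git.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: amf
  namespace: argocd
spec:
  project: 5gc-core
  source:
    repoURL: https://git.example.com/5gc/deployments.git  # placeholder repo
    targetRevision: main
    path: charts/amf
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: 5gc
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift to the Git-declared state
```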

Real Deployment: AT&T

AT&T's 5G core runs on their Network Cloud platform, built on OpenStack (for VMs) and Kubernetes (for containers) across 100+ data centers. AT&T has been progressively migrating VNFs to CNFs:

  • Platform: Airship (bare-metal K8s provisioning) + StarlingX for edge sites
  • Migration path: Ericsson dual-mode core running VNF and CNF modes simultaneously
  • Scale: Core serving 100+ million subscribers across 4G/5G
  • UPF: Ericsson UPF running as CNF with DPDK-accelerated user plane at 200+ Gbps per node

Observability Stack Comparison

Telco-grade observability requires metrics, logs, traces, and alerting across thousands of NF instances.

| Capability | Open Source Stack | Commercial Alternative | Telco Consideration |
|---|---|---|---|
| Metrics | Prometheus + Thanos | Datadog, Dynatrace | Prometheus at telco scale needs Thanos/Cortex for long-term storage |
| Visualization | Grafana | Splunk, Kibana | Grafana dashboards for per-NF KPIs (registrations/s, sessions, latency) |
| Logging | Fluentd + Elasticsearch | Splunk Enterprise | Log volume at 10+ TB/day requires index lifecycle management |
| Tracing | Jaeger / OpenTelemetry | Dynatrace, New Relic | SBI call tracing across AMF-SMF-UPF for end-to-end latency analysis |
| Alerting | Alertmanager | PagerDuty, OpsGenie | 3GPP fault management (TS 28.532) integration needed |
| Service mesh observability | Istio + Kiali | Tetrate, Solo.io | mTLS enforcement and traffic visualization for SBI |

Most Tier-1 operators run a hybrid approach: open-source Prometheus/Grafana for real-time metrics and a commercial platform (Splunk or Dynatrace) for log analytics, root cause analysis, and compliance reporting.
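
As one concrete example of the open-source side, an Alertmanager-routed alert on AMF registration success could be declared via the Prometheus Operator's PrometheusRule resource. The metric names and thresholds below are assumptions for illustration, not standardized 5GC metrics:

```yaml
# Hypothetical PrometheusRule: fire when the AMF registration success
# ratio drops below 95% for 5 minutes. Metric names are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: amf-registration-alerts
  namespace: 5gc
spec:
  groups:
    - name: amf.rules
      rules:
        - alert: AmfRegistrationSuccessLow
          expr: |
            sum(rate(amf_registration_success_total[5m]))
              / sum(rate(amf_registration_attempts_total[5m])) < 0.95
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "AMF registration success ratio below 95%"
```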

Worked Example: Prometheus Scaling

For a 5GC serving 10 million subscribers with 20 NF types averaging 500 metrics each, scraped at 15-second intervals:

```
Total time series:  20 NF types × 10 replicas (avg) × 500 metrics = 100,000
Ingestion rate:     100,000 / 15 s ≈ 6,667 samples/second
Storage (30 days):  6,667 × 86,400 s × 30 days × ~2 bytes/sample ≈ 34 GB compressed
```

A single Prometheus instance handles this comfortably. At 50+ million subscribers, Thanos or Cortex becomes necessary for horizontal scaling and long-term storage.

Worked Example: CNF Scaling Calculation

Calculate the number of AMF pods needed during a Monday morning registration storm:

```
Peak registrations:          50,000/second (morning attach storm)
AMF capacity per pod:        8,000 registrations/second (benchmarked)
Target utilization:          70%
Effective capacity per pod:  8,000 × 0.70 = 5,600/s
Required pods:               50,000 / 5,600 = 8.93 → 9 pods
Add 1 for redundancy (PDB minAvailable): 10 pods
HPA config:                  minReplicas=3, maxReplicas=12
Scale-up trigger:            CPU > 70% OR registrations_per_second > 5,000
```
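
The HPA settings from the calculation above could be expressed as the following sketch. The `registrations_per_second` custom metric assumes a Prometheus adapter exposing a per-pod registration rate; it is not a built-in Kubernetes metric:

```yaml
# Hypothetical HPA for the AMF matching the worked example:
# 3..12 replicas, scaling on CPU > 70% or > 5,000 registrations/s per pod.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: amf-hpa
  namespace: 5gc
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: amf
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: registrations_per_second  # assumed custom metric via adapter
        target:
          type: AverageValue
          averageValue: "5000"
```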

Migration Strategy

Operators migrating from VNF to CNF follow a phased approach:

  1. Phase 1 --- Dual-mode: Run VNF and CNF side by side, routing new subscribers to CNF
  2. Phase 2 --- Drain: Gradually migrate existing subscribers via controlled re-registration
  3. Phase 3 --- Decommission: Shut down VNF instances after full migration
  4. Phase 4 --- Optimize: Implement advanced K8s features (service mesh, GitOps, FinOps)

The migration typically spans 18--24 months for a Tier-1 operator. The critical dependency is ensuring CNF feature parity with the incumbent VNF, particularly for regulatory features (lawful intercept, emergency calling) specified in TS 33.127 and TS 23.167.

Key Takeaway: Cloud-native 5G core on Kubernetes delivers 40--60% better resource efficiency, sub-minute scaling, and zero-downtime upgrades compared to VNF architectures. Multus CNI, SR-IOV, and NUMA-aware scheduling are essential K8s enhancements for telco workloads. Rakuten, Dish, and AT&T prove the model works at scale.