From VNF to CNF: Why Cloud-Native Matters
The 5G core was designed from the ground up as a Service-Based Architecture (SBA), as defined in TS 23.501 Section 4.2. This microservices design maps naturally to cloud-native principles: each Network Function (NF) runs as an independent, stateless, horizontally scalable service. The shift from monolithic VNFs (Virtual Network Functions) to CNFs (Cloud-Native Network Functions) is the most significant infrastructure transformation in telecom history.
A VNF is essentially a physical appliance ported to a virtual machine. It retains monolithic design, vertical scaling patterns, and vendor-specific lifecycle management. A CNF decomposes the same functionality into microservices packaged as containers, orchestrated by Kubernetes, and managed through CI/CD pipelines.
VNF vs CNF Comparison
| Dimension | VNF | CNF |
|---|---|---|
| Packaging | VM image (QCOW2, VMDK) | Container image (OCI) |
| Orchestration | VIM (OpenStack, VMware) | Kubernetes (K8s) |
| Scaling unit | Entire VM | Individual microservice pod |
| Scale-out time | 5--15 minutes | 5--30 seconds |
| State management | Stateful, local storage | Stateless, external state store (Redis, etcd) |
| Resource efficiency | 30--40% overhead (hypervisor + guest OS) | 5--10% overhead (shared kernel) |
| Update mechanism | VM snapshot, rolling upgrade | Rolling update, canary, blue-green |
| Failure recovery | VM restart (2--5 min) | Pod restart (1--5 sec) |
| Networking | SR-IOV, DPDK, OVS | CNI plugins (Multus, Calico, Cilium) |
| Service discovery | Static config, DNS | K8s Service, service mesh (Istio/Linkerd) |
| Lifecycle management | VNFM (vendor-specific) | Helm charts, Operators, GitOps |
| Observability | Proprietary NMS | Prometheus, Grafana, Jaeger, OpenTelemetry |
| Multi-tenancy | VM-level isolation | Namespace + NetworkPolicy isolation |
The resource efficiency gain alone is compelling: a CNF-based 5G core requires roughly 40--60% fewer compute cores than the equivalent VNF deployment for the same subscriber capacity.
Kubernetes Components for Telco
Standard Kubernetes requires several enhancements for telco-grade workloads. The following table maps K8s components to their telco-specific roles.
| K8s Component | Telco Role | Configuration | Notes |
|---|---|---|---|
| kubelet | Node agent managing NF pods | CPU pinning, NUMA-aware topology manager | Critical for UPF performance |
| kube-scheduler | NF placement and anti-affinity | Custom scheduling policies for NF redundancy | Spread AMF replicas across failure domains |
| Multus CNI | Multiple network interfaces per pod | N2/N3/N4/N6 interface separation | Required for 3GPP interface isolation |
| SR-IOV Device Plugin | Hardware-accelerated dataplane | NIC VF allocation to UPF pods | Enables line-rate UPF forwarding |
| Topology Manager | NUMA-aware resource allocation | single-numa-node policy for UPF | Prevents cross-NUMA memory access latency |
| Node Feature Discovery | Hardware capability labeling | GPU, FPGA, NIC feature labels | UPF pods scheduled to SR-IOV capable nodes |
| PersistentVolume | Session state backup | Ceph RBD or local NVMe | Used for UDR and UDSF |
| Horizontal Pod Autoscaler | NF auto-scaling | CPU and custom metrics (sessions, TPS) | AMF scales on registration TPS |
| cert-manager | mTLS certificate lifecycle | Automated rotation for SBI TLS | Per TS 33.501 Section 13.1 |
| CoreDNS | NF service discovery | SBA NF registration and discovery | Supplements NRF-based discovery |
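To make the Multus row concrete, a NetworkAttachmentDefinition can expose a dedicated N2 interface to AMF pods. This is an illustrative sketch, not a vendor configuration: the names (`n2-net`, `eth1`), the `5gc` namespace, and the address range are assumptions, and high-throughput interfaces such as N3 would typically use SR-IOV rather than macvlan.

```yaml
# Illustrative Multus NetworkAttachmentDefinition for a dedicated N2 network.
# Host interface (eth1), namespace, and IP range are assumptions.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: n2-net
  namespace: 5gc
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "10.20.2.0/24"
      }
    }
```

A pod requests this network via the annotation `k8s.v1.cni.cncf.io/networks: n2-net`; Multus then attaches it as a secondary interface (`net1`) alongside the default cluster network.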
Helm Chart Structure: AMF Example
Helm is the standard package manager for deploying CNFs on Kubernetes. A production AMF Helm chart follows this structure:
```
amf-chart/
├── Chart.yaml                 # name: amf, version: 3.2.1, appVersion: R16.8
├── values.yaml                # Default configuration
└── templates/
    ├── deployment.yaml        # AMF pod spec with 3 replicas
    ├── service.yaml           # ClusterIP for SBI, NodePort for N2
    ├── configmap.yaml         # AMF config (PLMN, TAC, NSSAI, NRF URI)
    ├── hpa.yaml               # Scale on CPU > 70% or registrations > 5000/s
    ├── networkattachment.yaml # Multus annotation for N2 interface
    ├── pdb.yaml               # PodDisruptionBudget: minAvailable=2
    ├── serviceaccount.yaml    # RBAC for NRF registration
    └── servicemonitor.yaml    # Prometheus scrape config
```
Key values.yaml parameters:
```yaml
replicaCount: 3
image:
  repository: registry.vendor.com/5gc/amf
  tag: "R16.8.2"
resources:
  requests:
    cpu: "4"
    memory: "8Gi"
  limits:
    cpu: "8"
    memory: "16Gi"
amf:
  plmnId:
    mcc: "310"
    mnc: "260"
  supportedNssai:
    - sst: 1
      sd: "000001"
    - sst: 2
      sd: "000002"
  n2:
    interface: net1   # Multus network attachment
    port: 38412       # SCTP port per TS 38.412
  sbi:
    scheme: https
    port: 8443
    nrfUri: "https://nrf-svc.5gc:8443"
```
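These values are consumed by the chart templates. A minimal sketch of how `templates/deployment.yaml` might wire together the replica count, image, and Multus N2 attachment (label and annotation choices here are illustrative, not from a specific vendor chart):

```yaml
# Sketch of templates/deployment.yaml; labels and structure are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-amf
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: amf
  template:
    metadata:
      labels:
        app: amf
      annotations:
        # Multus resolves this to a NetworkAttachmentDefinition and adds
        # the N2 network as a secondary pod interface.
        k8s.v1.cni.cncf.io/networks: {{ .Values.amf.n2.interface }}
    spec:
      containers:
      - name: amf
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        ports:
        - name: sbi
          containerPort: {{ .Values.amf.sbi.port }}
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
```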
The PodDisruptionBudget (PDB) ensures that during rolling updates or node maintenance, at least 2 AMF replicas remain available, preventing service disruption during upgrades.
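The corresponding manifest is small; a minimal sketch, assuming the AMF pods carry an `app: amf` label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: amf-pdb
  namespace: 5gc
spec:
  minAvailable: 2        # never voluntarily evict below 2 AMF replicas
  selector:
    matchLabels:
      app: amf           # assumed label on the AMF Deployment's pods
```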
Real Deployment: Rakuten Symphony
Rakuten Mobile launched the world's first fully cloud-native mobile network in 2020 and has since commercialized the platform as Rakuten Symphony. Their architecture runs on bare-metal Kubernetes across 15 regional data centers in Japan.
Key metrics from Rakuten's production deployment:
- Core NFs: All 5GC functions (AMF, SMF, UPF, NRF, UDM, AUSF, PCF) run as CNFs on K8s
- Subscriber scale: 5+ million subscribers on cloud-native core as of 2025
- Infrastructure: Custom platform based on upstream Kubernetes with Wind River StarlingX for edge
- OpEx reduction: Rakuten claims 40% lower OpEx versus traditional VNF-based architectures
- Scaling: AMF scales from 3 to 12 pods in under 60 seconds during registration storms
- Update cadence: Bi-weekly rolling updates to core NFs with zero-downtime deployments
Real Deployment: Dish Network
Dish Network built a greenfield 5G network in the US using cloud-native, O-RAN-compliant architecture from day one. Their 5G core runs on AWS Outposts (on-premise AWS infrastructure) in 5 regional data centers.
- Core vendor: Multiple CNF vendors including Mavenir and Oracle
- Orchestration: Amazon EKS (Elastic Kubernetes Service) on Outposts
- Coverage target: 70% US population by 2025 (FCC build-out commitment)
- UPF placement: Distributed UPFs at 100+ edge locations for sub-20 ms latency
- Automation: Full GitOps pipeline with ArgoCD for NF deployment and configuration
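A GitOps pipeline of the kind Dish describes can be sketched as an Argo CD Application that syncs an NF's Helm chart from Git. The repository URL, path, and namespace below are placeholders, not Dish's actual configuration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: amf
  namespace: argocd
spec:
  project: 5gc
  source:
    repoURL: https://git.example.com/5gc/nf-charts.git   # placeholder repo
    targetRevision: main
    path: charts/amf
    helm:
      valueFiles:
      - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: 5gc
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift to the Git-declared state
```

With `automated.selfHeal` enabled, any out-of-band change to the deployed NF is reverted to the state declared in Git, which is the core GitOps guarantee.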
Real Deployment: AT&T
AT&T's 5G core runs on their Network Cloud platform, built on OpenStack (for VMs) and Kubernetes (for containers) across 100+ data centers. AT&T has been progressively migrating VNFs to CNFs:
- Platform: Airship (bare-metal K8s provisioning) + StarlingX for edge sites
- Migration path: Ericsson dual-mode core running VNF and CNF modes simultaneously
- Scale: Core serving 100+ million subscribers across 4G/5G
- UPF: Ericsson UPF running as CNF with DPDK-accelerated user plane at 200+ Gbps per node
Observability Stack Comparison
Telco-grade observability requires metrics, logs, traces, and alerting across thousands of NF instances.
| Capability | Open Source Stack | Commercial Alternative | Telco Consideration |
|---|---|---|---|
| Metrics | Prometheus + Thanos | Datadog, Dynatrace | Prometheus at telco scale needs Thanos/Cortex for long-term storage |
| Visualization | Grafana | Splunk, Kibana | Grafana dashboards for per-NF KPIs (registrations/s, sessions, latency) |
| Logging | Fluentd + Elasticsearch | Splunk Enterprise | Log volume at 10+ TB/day requires index lifecycle management |
| Tracing | Jaeger / OpenTelemetry | Dynatrace, New Relic | SBI call tracing across AMF-SMF-UPF for end-to-end latency analysis |
| Alerting | Alertmanager | PagerDuty, OpsGenie | 3GPP fault management (TS 28.532) integration needed |
| Service mesh observability | Istio + Kiali | Tetrate, Solo.io | mTLS enforcement and traffic visualization for SBI |
Most Tier-1 operators run a hybrid approach: open-source Prometheus/Grafana for real-time metrics and a commercial platform (Splunk or Dynatrace) for log analytics, root cause analysis, and compliance reporting.
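As an example of wiring Alertmanager into NF KPIs, a Prometheus alerting rule might flag a rising AMF registration failure ratio. The metric names here are assumptions, since exporters differ by vendor:

```yaml
groups:
- name: amf-kpis
  rules:
  - alert: AmfRegistrationFailureRateHigh
    # Metric names are illustrative; substitute the vendor exporter's series.
    expr: |
      rate(amf_registration_failures_total[5m])
        / rate(amf_registration_attempts_total[5m]) > 0.01
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "AMF registration failure ratio above 1% for 5 minutes"
```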
Worked Example: Prometheus Scaling
For a 5GC serving 10 million subscribers with 20 NF types, averaging 10 replicas per type and 500 metrics per pod, scraped at 15-second intervals:
```
Total time series: 20 NF types × 10 replicas × 500 metrics = 100,000
Ingestion rate: 100,000 / 15 s ≈ 6,667 samples/second
Storage (30 days): 6,667 × 86,400 × 30 × 2 bytes ≈ 34 GB compressed
```
A single Prometheus instance handles this comfortably. At 50+ million subscribers, Thanos or Cortex becomes necessary for horizontal scaling and long-term storage.
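The 15-second interval in the sizing above corresponds to the global scrape interval. A minimal sketch of the Prometheus configuration using Kubernetes pod discovery, assuming the NFs live in a `5gc` namespace and opt in via the common `prometheus.io/scrape` annotation:

```yaml
global:
  scrape_interval: 15s       # matches the sizing assumption above
  scrape_timeout: 10s
scrape_configs:
- job_name: 5gc-nfs
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names: [5gc]           # assumed NF namespace
  relabel_configs:
  # Keep only pods that opt in via the prometheus.io/scrape annotation
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: "true"
    action: keep
```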
Worked Example: CNF Scaling Calculation
Calculate the number of AMF pods needed during a Monday morning registration storm:
```
Peak registrations: 50,000/second (morning attach storm)
AMF capacity per pod: 8,000 registrations/second (benchmarked)
Target utilization: 70%
Effective capacity per pod: 8,000 × 0.70 = 5,600/s
Required pods: 50,000 / 5,600 = 8.93 -> 9 pods
Add 1 for redundancy (PDB minAvailable): 10 pods
HPA config: minReplicas=3, maxReplicas=12
Scale-up trigger: CPU > 70% OR custom metric registrations_per_second > 5,000
```
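The HPA settings above can be expressed as an `autoscaling/v2` manifest combining the CPU target with a custom pods metric. Exposing `registrations_per_second` to the HPA requires a metrics adapter (e.g. prometheus-adapter), and the metric name is an assumption:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: amf-hpa
  namespace: 5gc
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: amf
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: registrations_per_second   # assumed custom metric via adapter
      target:
        type: AverageValue
        averageValue: "5000"
```

The HPA scales out when either metric exceeds its target, so a registration storm triggers scale-up even before CPU saturates.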
Migration Strategy
Operators migrating from VNF to CNF follow a phased approach:
- Phase 1 --- Dual-mode: Run VNF and CNF side by side, routing new subscribers to CNF
- Phase 2 --- Drain: Gradually migrate existing subscribers via controlled re-registration
- Phase 3 --- Decommission: Shut down VNF instances after full migration
- Phase 4 --- Optimize: Implement advanced K8s features (service mesh, GitOps, FinOps)
The migration typically spans 18--24 months for a Tier-1 operator. The critical dependency is ensuring CNF feature parity with the incumbent VNF, particularly for regulatory features (lawful intercept, emergency calling) specified in TS 33.127 and TS 23.167.
Key Takeaway: Cloud-native 5G core on Kubernetes delivers 40--60% better resource efficiency, sub-minute scaling, and zero-downtime upgrades compared to VNF architectures. Multus CNI, SR-IOV, and NUMA-aware scheduling are essential K8s enhancements for telco workloads. Rakuten, Dish, and AT&T prove the model works at scale.