1. Why VXLAN?

IEEE 802.1Q VLANs are capped at 4,094 usable IDs per Layer 2 network (a 12-bit VLAN ID with two reserved values), a hard constraint in multi-tenant data centers where thousands of customer segments must coexist on shared infrastructure. VXLAN (Virtual eXtensible Local Area Network, RFC 7348) addresses this by encapsulating Ethernet frames inside UDP/IP and identifying each segment with a 24-bit VNI (VXLAN Network Identifier), which allows roughly 16.7 million (2^24) logical segments.

VXLAN decouples the virtual Layer 2 topology from the physical Layer 3 underlay, so traffic between VXLAN Tunnel Endpoints (VTEPs) is carried by standard IP routing (OSPF or BGP, with ECMP) and no VLAN has to be stretched across the physical network. The outer UDP header uses destination port 4789 (IANA-assigned; some early implementations, including the Linux kernel default, used 8472). Encapsulation adds 50 bytes over IPv4 or 70 bytes over IPv6, assuming no outer 802.1Q tag.
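
The overhead figures follow directly from the header sizes, and the same arithmetic gives the underlay MTU needed to carry a full-size inner packet without fragmentation. The sketch below is illustrative only: the function names are hypothetical, and it assumes untagged inner and outer frames and no IP options or extension headers.

```python
# Header sizes in bytes (assumes no outer or inner 802.1Q tag).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_IPV6 = 40
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # header of the original frame, carried as payload


def encap_overhead(ipv6: bool = False) -> int:
    """Bytes placed in front of the original Ethernet frame."""
    outer_ip = OUTER_IPV6 if ipv6 else OUTER_IPV4
    return OUTER_ETHERNET + outer_ip + OUTER_UDP + VXLAN_HEADER


def required_underlay_ip_mtu(inner_ip_mtu: int = 1500, ipv6: bool = False) -> int:
    """Smallest underlay IP MTU that carries the inner packet unfragmented."""
    outer_ip = OUTER_IPV6 if ipv6 else OUTER_IPV4
    # The outer IP packet holds: outer IP + UDP + VXLAN + inner Ethernet + inner IP packet.
    return outer_ip + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET + inner_ip_mtu


print(encap_overhead())                     # IPv4 underlay
print(encap_overhead(ipv6=True))            # IPv6 underlay
print(required_underlay_ip_mtu(1500))       # underlay IP MTU for 1500-byte tenants
```

Under these assumptions, hosts using a 1500-byte IP MTU need an underlay IP MTU of at least 1550 bytes over IPv4, which is why underlay links are commonly configured for jumbo frames.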

Encapsulation stack (RFC 7348): Outer Ethernet (14 B) + Outer IPv4 header (20 B) + Outer UDP header (8 B, dst port 4789) + VXLAN header (8 B, includes 24-bit VNI in bits 32–55) + original inner Ethernet frame.
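
To make the 8-byte VXLAN header layout concrete, here is a minimal sketch that packs and unpacks it with Python's struct module. The function names are hypothetical; the layout (flags octet with the I bit 0x08, 24 reserved bits, 24-bit VNI, 8 reserved bits) follows RFC 7348.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags octet followed by 24 reserved bits.
    # Word 1: 24-bit VNI (bits 32-55) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return word1 >> 8


header = pack_vxlan_header(10100)
print(header.hex())        # 0800000000277400
print(unpack_vni(header))  # 10100
```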

2. VTEP Discovery Methods

VTEPs must discover peer VTEPs to set up tunnels and distribute BUM (Broadcast, Unknown unicast, Multicast) traffic. Three mechanisms are deployed in practice:

| Method | How it works | Pros | Cons |
|---|---|---|---|
| Multicast | Each VNI maps to a PIM multicast group in the underlay; BUM traffic is flooded to that group | Simple; automatic peer discovery | Requires PIM multicast in underlay; many operators disable multicast |
| Ingress Replication | Each VTEP maintains an explicit unicast list of remote VTEPs per VNI; BUM traffic is replicated to each peer | No multicast required | Head-end does O(N) replication per BUM packet; static peer lists require manual maintenance |
| BGP EVPN | RT-3 IMET routes advertise VTEP membership; RT-2 routes distribute MAC+IP bindings; no flood-and-learn | Control-plane MAC learning; ARP suppression; scales to thousands of VTEPs; standard | BGP stack required on all VTEPs or route-reflectors |

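Most greenfield data centers now deploy BGP EVPN as the control plane. BUM traffic is then still delivered by ingress replication (with peer lists built automatically from RT-3 routes) or by underlay multicast. Flood-and-learn designs with multicast or statically configured ingress replication are legacy approaches found mainly in brownfield environments.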
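
The O(N) head-end replication cost can be sketched in a few lines. This is an illustrative model rather than any vendor's implementation: the function names and the (vni, vtep_ip) tuple format are assumptions, standing in for either a static peer list or the information carried by RT-3 routes.

```python
from collections import defaultdict


def build_flood_lists(memberships, local_vtep):
    """Group remote VTEPs by VNI from (vni, vtep_ip) membership records."""
    flood = defaultdict(set)
    for vni, vtep in memberships:
        if vtep != local_vtep:  # never replicate to ourselves
            flood[vni].add(vtep)
    return {vni: sorted(peers) for vni, peers in flood.items()}


def replicate_bum(vni, frame, flood_lists):
    """Return one unicast copy per remote VTEP: the cost grows with the peer count."""
    return [(peer, frame) for peer in flood_lists.get(vni, [])]


memberships = [
    (10100, "10.0.0.1"),
    (10100, "10.0.0.2"),
    (10100, "10.0.0.3"),
    (10200, "10.0.0.2"),
]
flood_lists = build_flood_lists(memberships, local_vtep="10.0.0.1")
copies = replicate_bum(10100, b"arp-request", flood_lists)
print(len(copies))  # one copy per remote VTEP in VNI 10100
```

With underlay multicast the ingress VTEP would instead send a single copy to the group mapped to the VNI, and the underlay would replicate it.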

3. BGP EVPN Route Types

BGP EVPN uses AFI 25 (L2VPN) / SAFI 70 (EVPN). RFC 7432 defines route types 1 through 4; RT-5 was defined separately in RFC 9136 (October 2021).

| RT | Name | Purpose | Key NLRI fields |
|---|---|---|---|
| 1 | Ethernet Auto-Discovery | Per-ES routes enable mass withdrawal on link failure; per-EVI routes enable aliasing for all-active multi-homing load balancing | RD, ESI, Ethernet Tag ID, MPLS label |
| 2 | MAC/IP Advertisement | Distribute MAC addresses (and optionally the bound IP) to enable ARP suppression and eliminate flood-and-learn | RD, ESI, Ethernet Tag ID, MAC address, IP address (optional), Label1 (L2VNI), Label2 (L3VNI, optional) |
| 3 | Inclusive Multicast Ethernet Tag (IMET) | Advertise VTEP reachability per VNI; used to build ingress-replication lists for BUM forwarding | RD, Ethernet Tag ID, Originating Router's IP (VTEP address); PMSI Tunnel attribute carries VNI and tunnel type |
| 4 | Ethernet Segment Route | Designated Forwarder (DF) election among PEs sharing an Ethernet Segment; ensures only one PE forwards BUM into the CE segment | RD, ESI, Originating Router's IP |
| 5 | IP Prefix Route (RFC 9136) | Advertise IP prefixes into the EVPN overlay for inter-subnet routing; typically carried over a dedicated L3VNI (transit VNI) | RD, ESI, Ethernet Tag ID, IP prefix length, IP prefix, GW IP address, MPLS label (L3VNI) |

Common documentation error: Many vendor guides cite RFC 7432 for all five route types. RT-5 (IP Prefix) is not part of RFC 7432; it was specified in RFC 9136 (October 2021). Software that predates that publication implemented RT-5 from the preceding Internet-Drafts, so its behavior may differ from the final RFC.
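
As a small illustration of how the route types appear on the wire, every EVPN NLRI starts with a one-octet route type and a one-octet length, followed by the type-specific fields (RFC 7432). The sketch below only splits off that common header; the class and function names are hypothetical, and it deliberately knows only the five types covered here.

```python
from enum import IntEnum

EVPN_AFI = 25   # L2VPN
EVPN_SAFI = 70  # EVPN


class EvpnRouteType(IntEnum):
    ETHERNET_AUTO_DISCOVERY = 1
    MAC_IP_ADVERTISEMENT = 2
    INCLUSIVE_MULTICAST_ETHERNET_TAG = 3
    ETHERNET_SEGMENT = 4
    IP_PREFIX = 5


def defining_rfc(route_type: EvpnRouteType) -> int:
    """RT-1 to RT-4 come from RFC 7432; RT-5 comes from RFC 9136."""
    return 9136 if route_type == EvpnRouteType.IP_PREFIX else 7432


def split_nlri(data: bytes):
    """Split one EVPN NLRI into (route type, type-specific bytes)."""
    if len(data) < 2:
        raise ValueError("EVPN NLRI needs a type octet and a length octet")
    route_type, length = data[0], data[1]
    body = data[2:2 + length]
    if len(body) != length:
        raise ValueError("truncated EVPN NLRI")
    # Raises ValueError for route types outside the five listed above.
    return EvpnRouteType(route_type), body


rtype, body = split_nlri(bytes([5, 3, 1, 2, 3]))  # toy payload, not a real route
print(rtype.name, defining_rfc(rtype), len(body))
```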

4. Symmetric vs Asymmetric IRB

Integrated Routing and Bridging (IRB) describes how VTEPs route traffic between overlay subnets. Two models are defined in RFC 9135:

Asymmetric IRB: The ingress VTEP performs L3 routing (TTL decrement, next-hop rewrite) into the destination L2VNI before encapsulating and sending. The egress VTEP only bridges — it sees the inner frame already addressed to the final MAC. Every VTEP must have every VNI (subnet) programmed locally, even those with no local hosts, which limits scale.

Symmetric IRB: The ingress VTEP routes from the source L2VNI into a shared L3VNI (transit VNI, one per VRF). The egress VTEP routes out of the L3VNI into the local destination L2VNI. Both endpoints perform routing. Each VTEP only needs its own local L2VNIs; the single L3VNI is universal. This is the recommended model for large fabrics.

| Aspect | Asymmetric IRB | Symmetric IRB |
|---|---|---|
| L2VNIs needed per VTEP | All VNIs in the fabric | Only locally attached subnets |
| L3VNI (transit VNI) | Not required | Required (one per VRF) |
| Routing hops | Ingress VTEP only | Ingress and egress VTEPs |
| Scale | Poor (all VNIs everywhere) | Good (local subnets only) |
| RT-5 prefixes | Not supported | Supported (uses L3VNI) |
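
The scaling difference between the two models comes down to which VNIs each VTEP has to program. The sketch below models that under simplifying assumptions: the function name and data layout are hypothetical, and "all VNIs" in the asymmetric case is taken to mean every L2VNI of each VRF the VTEP serves.

```python
def programmed_vnis(model, local_l2vnis, vrf_l2vnis, vrf_l3vni):
    """Return the set of VNIs a VTEP must program.

    model        -- "asymmetric" or "symmetric"
    local_l2vnis -- L2VNIs with hosts attached to this VTEP
    vrf_l2vnis   -- VRF name -> all L2VNIs of that VRF, fabric-wide
    vrf_l3vni    -- VRF name -> transit L3VNI of that VRF
    """
    local = set(local_l2vnis)
    # VRFs in which this VTEP has at least one attached subnet.
    served_vrfs = [vrf for vrf, vnis in vrf_l2vnis.items() if set(vnis) & local]
    if model == "asymmetric":
        # Ingress routes straight into the destination L2VNI, so every
        # L2VNI of each served VRF must exist locally.
        needed = set()
        for vrf in served_vrfs:
            needed |= set(vrf_l2vnis[vrf])
        return needed
    if model == "symmetric":
        # Only local L2VNIs plus one transit L3VNI per served VRF.
        return local | {vrf_l3vni[vrf] for vrf in served_vrfs}
    raise ValueError("model must be 'asymmetric' or 'symmetric'")


vrf_l2vnis = {"tenant-a": {10100, 10200, 10300}}
vrf_l3vni = {"tenant-a": 50001}
print(sorted(programmed_vnis("asymmetric", {10100}, vrf_l2vnis, vrf_l3vni)))
print(sorted(programmed_vnis("symmetric", {10100}, vrf_l2vnis, vrf_l3vni)))
```

In this toy fabric the asymmetric VTEP programs all three L2VNIs, while the symmetric VTEP programs its one local L2VNI plus the L3VNI. The gap widens as the number of subnets per VRF grows.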

5. ARP Suppression

Without EVPN, an ARP request from a host is broadcast into its VNI and flooded to every VTEP in the fabric. With BGP EVPN, RT-2 routes distribute MAC+IP bindings to all VTEPs as soon as hosts are learned. When a host ARPs for a remote IP, the local VTEP answers directly from its BGP-populated table — no ARP packet crosses the VXLAN fabric. This eliminates BUM flooding for known hosts and is especially impactful in fabrics with thousands of VMs per VTEP.

ND (Neighbor Discovery) suppression works identically for IPv6 — RT-2 routes carry IPv6 addresses in the IP field of the NLRI, and the VTEP answers NS messages locally.
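
The suppression logic amounts to a lookup keyed on VNI and IP address, populated from RT-2 routes. The following sketch is a simplified model, not a description of any vendor's data structure; the class and method names are hypothetical. Addresses are normalized with the ipaddress module so the same table serves ARP (IPv4) and ND (IPv6).

```python
import ipaddress


class SuppressionCache:
    """Per-VTEP table of (VNI, IP) -> MAC bindings learned from RT-2 routes."""

    def __init__(self):
        self._bindings = {}

    def learn_rt2(self, vni, ip, mac):
        self._bindings[(vni, ipaddress.ip_address(ip))] = mac

    def withdraw_rt2(self, vni, ip):
        self._bindings.pop((vni, ipaddress.ip_address(ip)), None)

    def handle_request(self, vni, target_ip):
        """Answer an ARP request or ND solicitation locally if the binding is known."""
        mac = self._bindings.get((vni, ipaddress.ip_address(target_ip)))
        if mac is not None:
            return ("reply", mac)   # answered by the local VTEP
        return ("flood", None)      # unknown host: falls back to BUM handling


cache = SuppressionCache()
cache.learn_rt2(10100, "10.1.1.20", "00:11:22:33:44:55")
print(cache.handle_request(10100, "10.1.1.20"))  # answered locally
print(cache.handle_request(10100, "10.1.1.99"))  # not yet learned
```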

6. Multi-Homing and ESI

An Ethernet Segment Identifier (ESI) is a 10-byte identifier assigned to the logical bundle connecting a CE device to multiple PE VTEPs. Two forwarding modes exist:

  • Single-Active: Only one PE forwards traffic to and from the segment for a given VLAN at any time. The DF election (using RT-4 routes) picks the Designated Forwarder for each Ethernet Tag; non-DF PEs block both unicast and BUM traffic for that tag on the segment and act as standby.
  • All-Active: All PEs forward simultaneously, enabling ECMP across the bundle (like a port-channel with remote legs). RT-1 "aliasing" routes allow remote VTEPs to load-balance traffic toward the ESI across all attached PEs. MAC mobility is handled via the MAC Mobility extended community in RT-2.
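
For reference, the default DF election procedure in RFC 7432 ("service carving") can be sketched as follows: the PEs discovered through RT-4 routes are ordered by IP address, and the PE at ordinal (V mod N) becomes DF for VLAN V, where N is the number of PEs on the segment. The function name is hypothetical, and real implementations may use other election algorithms, so treat this as a sketch of the default only.

```python
import ipaddress


def elect_df(pe_addresses, vlan_id):
    """Default RFC 7432 DF election: ordinal (vlan_id mod N) in the sorted PE list."""
    if not pe_addresses:
        raise ValueError("at least one PE must be attached to the segment")
    # Order PEs by numeric IP address; ordinals start at 0.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[vlan_id % len(ordered)]


pes = ["10.0.0.12", "10.0.0.11"]  # originating router IPs from RT-4 routes
print(elect_df(pes, 100))  # 10.0.0.11
print(elect_df(pes, 101))  # 10.0.0.12
```

Because consecutive VLAN IDs map to different ordinals, the DF role is spread across the PEs attached to the segment.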

7. Vendor CLI Quick Reference

| Task | Cisco NX-OS | Arista EOS | Juniper Junos |
|---|---|---|---|
| Show EVPN routes | show bgp l2vpn evpn | show bgp evpn | show route table bgp.evpn.0 |
| Show VTEP peers | show nve peers | show vxlan vtep | show evpn instance |
| Show overlay MACs | show mac address-table | show vxlan address-table | show evpn mac-ip-table |
| Show ARP suppression cache | show ip arp suppression-cache detail | show vxlan address-table detail | show evpn mac-ip-table extensive |
| Show VNI-to-VRF mapping | show nve vni | show vxlan vni | show evpn instance extensive |
| Show ESI multi-homing | show nve ethernet-segment | show bgp evpn instance | show evpn instance extensive |

References

  • RFC 7348 — VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks (2014)
  • RFC 7432 — BGP MPLS-Based Ethernet VPN (BGP EVPN) (2015)
  • RFC 8365 — A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN) (2018)
  • RFC 9135 — Integrated Routing and Bridging in Ethernet VPN (EVPN) (2021)
  • RFC 9136 — IP Prefix Advertisement in Ethernet VPN (EVPN) (2021)
  • IETF BESS Working Group — BGP Enabled ServiceS (active EVPN drafts)