I’m at the start of my career in the service provider space. I know the general concept that the lower the layer, the more MTU headroom you need to support the layers above it, but what has always eluded me is how to accurately calculate or configure MTU to account for those upper-layer requirements.
There seems to be such a wealth of combinations to consider in terms of the types of frames, packets, and segments that could possibly be transmitted, especially with MPLS - how do others seemingly achieve this so easily? Particularly when looking at throughput and the effect of overhead.
Is my way of thinking wrong or is there something I’m missing?
What helped you all understand this better and are there any great resources available that cover this well?
Thanks.
Edit:
Thanks for the great replies all, your insight is appreciated!
I think most of us don’t account for it exactly to the byte. I do Juniper and their max is typically 9216, so that’s what our uplinks/core are set to. Almost the entirety of our edge/encapsulated traffic is 1500; we offer clients up to 8k (typically for storage circuits).
It’s also something you’d rather not retrofit in production. If you want to calculate exact values, there’s https://baturin.org/tools/encapcalc/ (personal site of a VyOS maintainer).
Typically you just configure your backbone to the platform’s max MTU. Payload should all be 1500 bytes or less unless you have a specific requirement for larger MTUs. For instance, you may want your storage network to use 9000 bytes, but it also wouldn’t be routable outside that VLAN. In the ISP space, customers sometimes want larger MTUs to support their own tunneling mechanisms, so you just configure their circuits for a larger MTU - either the max platform MTU, or something below the smallest MTU on your backbone.
MTU math takes a while to get the knack of when you work in the SP space. It used to be worse when ATM and Frame Relay were in the mix alongside Ethernet and L2 interworking was used more frequently.
As others have said, manufacturers often calculate MTU differently.
When I’m teaching new SP engineers, there are three main areas I focus on:
- L2 MTU - This needs to match or exceed the MPLS MTU required (does not need to match on each side of a link - vendors don’t agree on an upper value)
- L3 MTU - This needs to be set to the desired value for the IP subnet (needs to match on each side of a link)
- MPLS MTU - This should be set (if possible in your vendor - some inherit it) to the required minimum for the MPLS services you want to deliver and can’t be higher than the L2 MTU value (needs to match on each side of an LDP session or SR segment)
An example of MTU math to deliver a 1500-byte tagged frame over VPLS (these values can be higher, but showing the minimums makes it a little easier to understand where the values come from):
- L2 MTU - 9216 Bytes
- L3 MTU - 1500 Bytes
- MPLS MTU - 1530 Bytes
Why is the MPLS MTU 1530 Bytes to deliver VPLS?
- MPLS Overhead = 12 Bytes (MPLS transport + VPN labels)
- L2 headers inside the VPLS PW = 18 Bytes (MAC Header + VLAN tag)
- L3 headers and payload inside the VPLS PW = 1500 Bytes (IP Header + DATA)
Add those together and you get 1530.
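That sum is easy to sanity-check in a few lines of Python. This is just the arithmetic from the list above; the 12-byte MPLS budget is the post’s own figure (room for a transport label, a service label, and e.g. a control word or third label, at 4 bytes each):

```python
# Minimum MPLS MTU to deliver a 1500-byte tagged customer frame over VPLS.
ip_mtu = 1500           # customer L3 header + payload
inner_l2 = 14 + 4       # MAC header (14) + single 802.1Q tag (4) = 18
mpls_overhead = 12      # budget for transport + service labels
                        # (and e.g. a control word), 4 bytes per slot

min_mpls_mtu = ip_mtu + inner_l2 + mpls_overhead
print(min_mpls_mtu)     # 1530
```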
While there is no standard definition for Ethernet MTU, the de facto “standard” has generally settled at 1500 bytes. From a service provider perspective, you essentially want to provide this MTU at your network edge to transit peers, IX peers, or subscribers. Within your MPLS cloud it’s very easy to overcompensate by simply supporting jumbo frames (9000+ bytes). That way, no matter your label stack size (say, 4 MPLS headers in some rare case) you are still well below the link MTU within the MPLS core.
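To see how much headroom jumbo frames actually buy you, here’s a rough sketch. The 9216 platform max is an illustrative assumption (a common Juniper figure mentioned elsewhere in the thread):

```python
# How many MPLS labels fit before a jumbo backbone link overflows?
LINK_MTU = 9216     # assumed platform max on the core link
PAYLOAD = 1500      # customer IP packet
INNER_L2 = 18       # MAC header + VLAN tag carried inside the service
LABEL = 4           # one MPLS shim header

max_labels = (LINK_MTU - PAYLOAD - INNER_L2) // LABEL
print(max_labels)   # 1924 -- label stack depth is never the limit here
```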
Default of 1500 should be fine for most things. Jumbo frames and MPLS are the only things I can think of where you need to calculate the extra bytes.
Don’t ask me how to calculate the extra bytes. I used to build MPLS connections and never could figure out how MTU was calculated. I was short a few bytes once. That didn’t go down well.
MTU considerations for transport and reduced throughput due to overhead are different issues.
For MTU, max out your backbone to a consistent value and leave customer handoffs at default. Change it only if the customer requests it. You don’t need some custom value for every customer. Just set it to some predetermined value that’s equivalent to or higher than what they need. I had one place where it was either default 1500 or 9000. Some other place did 1500, 2000, or 9100.
For the overhead issue, it’s up to the customer to account for overhead when purchasing circuits. That said, I’ve worked at a place that policed at 10% above the CIR to account for overhead.
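On the overhead point, the per-frame cost on Ethernet is fixed, so efficiency depends on frame size. A quick sketch using the standard per-frame overhead (7-byte preamble + 1-byte SFD + 14-byte MAC header + 4-byte FCS + 12-byte interframe gap = 38 bytes on the wire per frame):

```python
def wire_efficiency(ip_mtu: int) -> float:
    """Fraction of line rate available to the IP packet at a given MTU."""
    ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12  # preamble, SFD, MAC hdr, FCS, IFG
    return ip_mtu / (ip_mtu + ETH_OVERHEAD)

print(round(wire_efficiency(1500), 4))  # 0.9753 at the default MTU
print(round(wire_efficiency(9000), 4))  # 0.9958 with jumbo frames
```

This is why a circuit policed exactly at the CIR delivers a bit less than the customer expects in IP throughput, and why some providers police slightly above it.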
Pump 9K or maximum possible on all L2. L3 to 1500 for any internet bound interface.
L3 1600 for any LAN-bound interface to account for RFC 4638, QinQ, etc.
L3 VLAN to 1600 as well.
Something like this
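A plan like that is easy to sanity-check programmatically: every L3 or MPLS MTU must fit inside the L2 MTU of the link that carries it. The interface names and values below are made up for illustration:

```python
# Hypothetical MTU plan; the invariant is l3/mpls <= l2 on every interface.
plan = {
    "core-uplink":   {"l2": 9216, "mpls": 9100},
    "internet-edge": {"l2": 9216, "l3": 1500},
    "lan-facing":    {"l2": 9216, "l3": 1600},  # headroom for PPPoE/QinQ
}

for name, mtus in plan.items():
    for layer in ("l3", "mpls"):
        if layer in mtus and mtus[layer] > mtus["l2"]:
            print(f"{name}: {layer} MTU {mtus[layer]} exceeds L2 {mtus['l2']}")
```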
Yeah, it depends on how the platform actually counts MTU. Some will say 9000 where others say 9216, for example. Cisco mostly tries to make the calculation easier on the end user by excluding L2 headers.
This was particularly valuable and sheds a lot of light for me.
Thank you!
I’m too lazy to look it up now, but I’m fairly sure it’s in the early 802.3 standards. From memory, the purpose was to limit the time taken up by a single frame when Ethernet was a shared medium (CSMA/CD).
4 bytes per MPLS shim, isn’t it?
~50 bytes for VXLAN.
9100 gives you room for 14 MPLS labels, but what if you decide to do SRv6? I know of companies where they ran into the exact problem I described.
Glad you found it useful! You’re welcome 
Don’t forget a lot of customers may want to build tunnels for various reasons as well. Even with private carrier MPLS, it’s not a bad idea to encrypt your traffic since you don’t control the carrier’s equipment. Not to mention SD-WAN, where customers still want/need the higher SLA of MPLS/Metro-E for certain traffic.
We’re using Flex Algo for TE to minimize the label stack. Not a fan of SRv6 and would never deploy it. We’d prefer SR-MPLS IPv6 when vendors start supporting VPNv4 over IPv6. Maybe SRm6 but on a Cisco core right now and their focus is SRv6.
yeah, i agree SRv6 isn’t that appealing to a lot of SPs - it just adds complexity, and let’s be honest, a lot of NEs working at even tier-1s are idiots.
I’m pissed at the lack of support of VPNv4 over IPv6. I could have had a native IPv6 core for several years now with existing hardware. Aside from complexity, SRv6 would require all new hardware to do TE.
Most of the job can be done following templates and SOPs, so you don’t necessarily need high competency in most roles. It becomes problematic when the Architect and Sr Engineers lack competence.