MX TRIO LOAD BALANCING
Dmitry Shokarev
Product Line Management
Routing Business Unit
Version 1.4, April 2014
Confidential
2 Copyright © 2009 Juniper Networks, Inc. www.juniper.net
AGENDA
1. High level load balancing overview
2. Packet parsing and hash computation
3. Advanced Topics
4. Theoretical load balancing efficiency analysis
5. Adaptive and Stateful load balancing
HIGH LEVEL OVERVIEW
HIGH LEVEL LOAD BALANCING OVERVIEW (SIMPLIFIED)
[Diagram: Ingress PFE: Parse packet -> Compute hash -> Lookup route -> Select next-hop; Egress PFE: Encapsulate]
Parse packet
• Depending on the interface encapsulation, select the packet fields for route lookup
Compute hash
• Compute a fixed-size hash value from a variable set of packet fields
Lookup route
• Find a route based on the packet fields
Select next-hop
• Select the ultimate next-hop from a list of possible next-hops (multiple levels)
PACKET PARSING AND HASH
COMPUTATION
HOW TO READ THE DIAGRAM [1 OF 2]
[Diagrams: two sample hash field selections, with fields grouped by layer (L2, L3, L4)]
• IPv4, UDP or TCP: L4: Source Port, Dest. Port; L3: Source Address, Dest. Address, Protocol (6 or 17), IIF, DSCP
• IPv4, GRE (PPTP): L3: Source Address, Dest. Address, Protocol (47), IIF, DSCP; GRE Key (16 bits), GRE Protocol (0x880B)
Legend
• Field is included by default and can't be turned off
• ON/OFF: configurable (default on)
• ON/OFF: configurable (default off)
• Field shown with a match value, such as Protocol 6 or 17: applicable only if there is a field match (TCP or UDP packets in this case); the field is included in the hash
• Field shown with a match value, such as GRE Protocol 0x880B: applicable only if there is a field match (PPTP packets in this case); the field is NOT included in the hash computation
• Source/destination pairs: the hash is symmetric (swapping the fields does not change the hash result)
• IIF stands for Incoming Interface Index (internal logical interface identifier)
HOW TO READ THE DIAGRAM [2 OF 2]
[Diagram: "Ethernet, IPv4". L2: IIF (ON/OFF), Source MAC and Dest. MAC (ON/OFF), Outer 802.1p (ON/OFF), VLAN Tag 1, VLAN Tag 2..N, Ether type 0x0800; L3/L4: IPv4 payload (ON/OFF, shaded)]
• A shaded area refers to a hash field selection procedure defined elsewhere. In this case, the IPv4 hash selection procedure is used, which in turn depends on the packet type: IPv4, GRE, non-fragmented; IPv4, PPTP, non-fragmented; IPv4, GTP (UDP port 2152), non-fragmented; IPv4, UDP or TCP, non-fragmented; other IPv4.
WHICH FIELDS ARE SELECTED WHEN?
The answer depends on the encapsulation on ingress and egress.
Ingress -> Egress: fields used for hashing
IP -> IP: IP fields
MPLS -> MPLS: MPLS fields
CCC, VPLS, Bridge -> CCC, VPLS, Bridge: CCC/Bridge/VPLS fields
IP -> MPLS: IP fields
CCC -> MPLS: CCC/Bridge/VPLS fields
VPLS, Bridge -> MPLS: CCC/Bridge/VPLS fields
MPLS -> VPLS, Bridge: MPLS fields
IP -> IP+GRE/IPIP: IP fields
IP+GRE/IPIP -> IP: inner IP fields
VPLS, Bridge -> IP (via IRB): IP fields
HASH FIELD SELECTION, IPV4 TRAFFIC [1 OF 2]
IPv4
• L3: Source Address, Dest. Address, Protocol (always included); IIF, DSCP (configurable, default off)
IPv4, UDP or TCP, non-fragmented (match: Protocol 6 or 17, Fragment Flag 0, Fragment Offset 0)
• L4: Source Port, Dest. Port (configurable, default on)
• L3: Source Address, Dest. Address, Protocol (always included); IIF, DSCP (configurable, default off)
• L4 fields are included only for non-fragmented packets
HASH FIELD SELECTION, IPV4 TRAFFIC [2 OF 2]
IPv4, GRE, non-fragmented (match: Protocol 47, Fragment Flag 0, Fragment Offset 0)
• Source Address, Dest. Address, Protocol, GRE Key (32 bits); IIF, DSCP (configurable, default off)
IPv4, PPTP, non-fragmented (match: Protocol 47, GRE Protocol 0x880B, Fragment Flag 0, Fragment Offset 0)
• Source Address, Dest. Address, Protocol, GRE Key (16 LS bits); IIF, DSCP (configurable, default off)
• The GRE Key 16 MS bits and the GRE Protocol field are not included
IPv4, GTP, non-fragmented (match: Protocol 17, UDP port 2152, Fragment Flag 0, Fragment Offset 0)
• Source Address, Dest. Address, Protocol; Source Port, Dest. Port (configurable, default on); GTP TEID, IIF, DSCP (configurable, default off)
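The IPv4 selection rules above can be sketched as a small function. This is an illustrative model, not Trio microcode: the packet record and its field names are hypothetical, the knob flags are named after the `enhanced-hash-key family inet` statements from the configuration slide, defaults follow the diagrams (ports on; IIF, DSCP and GTP TEID off), and the PPTP branch assumes only the 16 LS bits of the GRE key are hashed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IPv4Packet:
    # Hypothetical parsed-packet record; field names are illustrative only.
    src: str
    dst: str
    protocol: int
    dscp: int = 0
    iif: int = 0
    more_fragments: bool = False
    frag_offset: int = 0
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    gre_key: Optional[int] = None       # 32-bit GRE key, if present
    gre_protocol: Optional[int] = None  # GRE protocol type, if GRE
    gtp_teid: Optional[int] = None

@dataclass
class InetHashKey:
    # Mirrors the enhanced-hash-key family inet knobs (defaults as in the diagrams).
    incoming_interface_index: bool = False
    type_of_service: bool = False
    no_source_port: bool = False
    no_destination_port: bool = False
    gtp_tunnel_endpoint_identifier: bool = False

def ipv4_hash_fields(p: IPv4Packet, k: InetHashKey) -> dict:
    """Return the fields that would feed the hash for this packet."""
    fields = {"src": p.src, "dst": p.dst, "protocol": p.protocol}
    if k.incoming_interface_index:
        fields["iif"] = p.iif
    if k.type_of_service:
        fields["dscp"] = p.dscp
    if p.more_fragments or p.frag_offset != 0:
        return fields                      # fragments: L3 fields only
    if p.protocol == 47 and p.gre_key is not None:
        if p.gre_protocol == 0x880B:       # PPTP: assumed 16 LS bits of the key
            fields["gre_key"] = p.gre_key & 0xFFFF
        else:                              # plain GRE: full 32-bit key
            fields["gre_key"] = p.gre_key
    elif p.protocol in (6, 17):
        if not k.no_source_port:
            fields["src_port"] = p.src_port
        if not k.no_destination_port:
            fields["dst_port"] = p.dst_port
        if (p.protocol == 17 and p.dst_port == 2152
                and k.gtp_tunnel_endpoint_identifier and p.gtp_teid is not None):
            fields["gtp_teid"] = p.gtp_teid
    return fields
```

With default knobs, a TCP packet contributes its addresses, protocol and both ports; setting `more_fragments=True` on the same packet drops the ports, matching the "Include L4 only for non fragments" rule.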
HASH FIELD SELECTION, IPV6 TRAFFIC [1 OF 2]
IPv6
• L3: Source Address, Dest. Address, Next Header (always included); IIF, Traffic Class (configurable, default off)
IPv6, UDP or TCP (match: Next Header 6 or 17)
• L4: Source Port, Dest. Port (configurable, default on)
• L3: Source Address, Dest. Address, Next Header (always included); IIF, Traffic Class (configurable, default off)
HASH FIELD SELECTION, IPV6 TRAFFIC [2 OF 2]
IPv6, GRE (match: Next Header 47)
• Source Address, Dest. Address, Next Header, GRE Key (32 bits); IIF, Traffic Class (configurable, default off)
IPv6, PPTP (match: Next Header 47, GRE Protocol 0x880B)
• Source Address, Dest. Address, Next Header, GRE Key (16 LS bits); IIF, Traffic Class (configurable, default off)
• The GRE Key 16 MS bits and the GRE Protocol field are not included
IPv6, GTP (match: Next Header 17, UDP port 2152)
• Source Address, Dest. Address, Next Header; Source Port, Dest. Port (configurable, default on); GTP TEID, IIF, Traffic Class (configurable, default off)
HASH FIELD SELECTION
CCC/BRIDGE/VPLS TRAFFIC [1 OF 2]
Ethernet, non-IP or MPLS payload
• L2: Source MAC, Dest. MAC (configurable, default on); IIF, Outer 802.1p (configurable, default off)
• Note: VLAN tags (VLAN Tag 1, VLAN Tag 2..N) are not included
HASH FIELD SELECTION
CCC/BRIDGE/VPLS TRAFFIC [2 OF 2]
Ethernet, IPv4 (Ether type 0x0800 after VLAN Tag 1 or none, VLAN Tag 2 or none)
Ethernet, IPv6 (Ether type 0x86DD after VLAN Tag 1 or none, VLAN Tag 2 or none)
Ethernet, MPLS (Ether type 0x8847 after VLAN Tag 1 or none, VLAN Tag 2 or none)
• L2: Source MAC, Dest. MAC (configurable, default on); IIF, Outer 802.1p (configurable, default off)
• L3/L4: the IPv4, IPv6 or MPLS payload is hashed with the corresponding procedure (configurable, default on)
• A single knob controls payload analysis for all packet types
HASH FIELD SELECTION, MPLS TRAFFIC
[JUNOS < 14.1]
MPLS, encapsulated IPv4 or IPv6
• Labels: Label 1 (20 bits) and Labels 2..5 (20 bits each), i.e. up to 5 top labels
• Outer Label EXP, IIF (configurable, default off)
• Payload: IPv4 or IPv6 (configurable, default on)
MPLS, IPv4/IPv6 in Ethernet pseudo-wire
• Labels: Label 1 (20 bits) and Labels 2..5 (20 bits each)
• Outer Label EXP, IIF (configurable, default off)
• Payload: IPv4 or IPv6 carried in the Ethernet pseudo-wire (L2 and L3/L4 sections, each ON/OFF)
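MPLS ENCAPSULATED TRAFFIC DETERMINATION
[JUNOS < 14.1]
[Flowchart, summarized]
1. Start: use up to 5 top labels in hash computation; include the topmost EXP (if enabled).
2. Bottom of the stack reached? If no, end.
3. If yes, check the first nibble after the bottom label:
  • 0x4 (IPv4): if the length matches, compute the IPv4 hash
  • 0x6 (IPv6): if the length matches, compute the IPv6 hash
  • Else: check the Ethertype
    0x8100: skip the VLAN tag and check again; end if more than 2 VLANs have been skipped
    0x0800: compute the IPv4 hash
    0x86DD: compute the IPv6 hash
    Else: end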
HASH FIELD SELECTION, MPLS TRAFFIC
[JUNOS >= 14.1]
MPLS, encapsulated IPv4 or IPv6
• Labels: Label 1 (20 bits) and Labels 2..8 (20 bits each)
• Outer Label EXP, IIF (configurable, default off)
• Payload: IPv4 or IPv6 (configurable, default on)
MPLS, IPv4/IPv6 in Ethernet pseudo-wire
• Labels: Label 1 (20 bits) and Labels 2..8 (20 bits each)
• Outer Label EXP, IIF (configurable, default off)
• Payload: IPv4, IPv6 or MPLS carried in the Ethernet pseudo-wire (L2 and L3/L4 sections, each ON/OFF)
MPLS, Entropy Label
• Labels: Label 1 (20 bits) and Labels 2..8 (20 bits each)
• Outer Label EXP, IIF (configurable, default off)
• Entropy Label Indicator detected: the payload is not processed, and the indicator itself is not included in the hash
MPLS ENCAPSULATED TRAFFIC DETERMINATION
[JUNOS >= 14.1]
[Flowchart, summarized]
1. Start: use up to 8 top labels in hash computation, except the ELI*; include the topmost EXP (if enabled).
2. Bottom of the stack reached AND no ELI* detected? If no, end.
3. If yes, check the first nibble after the bottom label:
  • 0x4 (IPv4): if the length matches, compute the IPv4 hash
  • 0x6 (IPv6): if the length matches, compute the IPv6 hash
  • Else: check the Ethertype
    0x8100: skip the VLAN tag and check again; end if more than 2 VLANs have been skipped
    0x0800: compute the IPv4 hash
    0x86DD: compute the IPv6 hash
    Else: end
* ELI: Entropy Label Indicator, label value 7
NOTES ON MPLS PAYLOAD PROCESSING
[Figure: byte layout (byte offsets 0-88, bit positions 0-31) of a bridged MPLS frame: outer Ethernet (Destination MAC, Source MAC, TPID 0x8100 with .1P/DEI/Outer VLAN, TPID 0x8100 with .1P/DEI/Inner VLAN, Ethertype 0x8847), MPLS Label[1] (S=0), Label[2] (S=0), Label[3] (S=1) with EXP and TTL, encapsulated Ethernet (Destination MAC, SRC MAC, TPID 0x8100 with PCP/Outer VLAN, TPID 0x8100 with PCP/Inner VLAN, Ethertype 0x0800), IPv4 header (Protocol = 17), UDP header, payload data]
Sample hash field selection for bridged MPLS traffic with pseudo-wire encapsulated UDP. All optional fields are enabled except IIF; fields included in the computation are shown in black.
Algorithm features
• Heuristic in nature; produces good detection results
• Imposes certain (fixed) requirements on the traffic:
  • No control word for EoMPLS/VPLS frames
  • 0x8100 Ethertype for VLANs
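The two determination flowcharts and the traffic requirements above can be modelled as a byte-level classifier. This is a minimal sketch under stated assumptions, not Trio's parser: `max_labels=5, eli_aware=False` stands for Junos < 14.1 and `max_labels=8, eli_aware=True` for Junos >= 14.1; `pkt` is assumed to start at the top label stack entry; the exact form of the length checks is assumed; after a failed IPv4/IPv6 length check the sketch falls back to the Ethertype check (the flowchart is ambiguous there); EXP handling is omitted; all function names are hypothetical.

```python
import struct

ELI = 7  # Entropy Label Indicator: reserved MPLS label value 7

def _ipv4_len_ok(p: bytes) -> bool:
    # Assumed check: IPv4 Total Length equals the number of bytes left.
    return len(p) >= 20 and struct.unpack_from("!H", p, 2)[0] == len(p)

def _ipv6_len_ok(p: bytes) -> bool:
    # Assumed check: 40-byte fixed header + Payload Length equals bytes left.
    return len(p) >= 40 and struct.unpack_from("!H", p, 4)[0] + 40 == len(p)

def _guess_payload(p: bytes) -> str:
    """Classify what follows the bottom-of-stack label."""
    nibble = p[0] >> 4 if p else None
    if nibble == 4 and _ipv4_len_ok(p):
        return "ipv4"
    if nibble == 6 and _ipv6_len_ok(p):
        return "ipv6"
    # Otherwise assume an Ethernet pseudo-wire with no control word:
    # the Ethertype sits right after the two 6-byte MAC addresses.
    off, vlans = 12, 0
    while off + 2 <= len(p):
        (etype,) = struct.unpack_from("!H", p, off)
        if etype == 0x8100:
            vlans += 1
            if vlans > 2:
                break          # more than 2 VLANs skipped: give up
            off += 4           # skip one 4-byte VLAN tag
        elif etype == 0x0800:
            return "eth-ipv4"
        elif etype == 0x86DD:
            return "eth-ipv6"
        else:
            break
    return "labels-only"

def classify_mpls(pkt: bytes, max_labels: int = 8, eli_aware: bool = True):
    """Return (labels used in the hash, payload kind)."""
    labels, off, bottom, eli_seen = [], 0, False, False
    for _ in range(max_labels):
        if off + 4 > len(pkt):
            break
        (entry,) = struct.unpack_from("!I", pkt, off)
        off += 4
        label, s_bit = entry >> 12, (entry >> 8) & 1
        if eli_aware and label == ELI:
            eli_seen = True    # the ELI itself is not hashed
        else:
            labels.append(label)
        if s_bit:
            bottom = True
            break
    if not bottom or eli_seen:
        return labels, "labels-only"   # payload is not processed
    return labels, _guess_payload(pkt[off:])
```

Because the Ethernet fallback reads the Ethertype at a fixed offset of 12 bytes, a pseudo-wire that carries a control word would be misparsed, which is why the slide lists "no control word" as a requirement.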
HOW IS THE HASH COMPUTED?
Trio hash computation algorithm
• Uses a combination of Cyclic Redundancy Check (CRC) CRC-13 and CRC-31 polynomial functions (similar functions are used to compute the Ethernet frame checksum)
• Implemented in hardware
• Very efficient
Hash function result
• One 31-bit number (used to select the next-hop)
• For hierarchical load balancing, different sections of that result are used at each level
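A sketch of how a CRC-based 31-bit flow hash can be built from the selected fields. Assumptions: `zlib.crc32` (standard CRC-32) stands in for the unpublished CRC-13/CRC-31 combination, and the field layout, seed placement and function name are hypothetical. Ordering each source/destination pair canonically before hashing is one simple way to get the symmetric behavior described in the diagram legend.

```python
import ipaddress
import struct
import zlib

def trio_like_hash(src: str, dst: str, protocol: int,
                   sport: int = 0, dport: int = 0, seed: int = 0) -> int:
    """Illustrative 31-bit flow hash (not the Trio hardware function)."""
    a = ipaddress.ip_address(src).packed
    b = ipaddress.ip_address(dst).packed
    # Symmetric key: sort the (address, port) endpoints so that A->B and
    # B->A serialize to the same bytes and therefore hash identically.
    (a, sport), (b, dport) = sorted([(a, sport), (b, dport)])
    key = struct.pack("!I", seed) + a + b + struct.pack("!BHH", protocol, sport, dport)
    return zlib.crc32(key) & 0x7FFFFFFF   # keep 31 bits
```

Swapping source and destination (addresses and ports together) leaves the result unchanged, and the value always fits in 31 bits, so slices of it can be handed to each level of next-hop selection.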
NEXT-HOP SELECTION EXAMPLE (MULTIPLE LEVELS)
[Diagram: PE0 connected through LSPs (LSP1 ... LSP3) to PE1, PE2 and PE3, with the egress links bundled in AE0.
1st level balancing (BGP): IP route -> next-hop list 1.1 -> indirect next-hops 1, 2, 3
2nd level balancing: each indirect next-hop resolves to a list of LSPs (lists 2.1, 2.2, 2.3; LSP 1, LSP 2, LSP 3)
3rd level balancing: the LSP egresses over AE0 (AE0 list: member links AE0-1, AE0-2, AE0-3)]
A different set of bits from the hash is used to select a next-hop at each level (to prevent polarization).
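A sketch of per-level selection using different bit slices of one hash. Assumptions: the 31-bit hash is a stand-in (truncated SHA-256, not Trio's CRC), and the slice layout in the hypothetical `LEVEL_BITS` table is illustrative; the slide does not say which bits Trio assigns to each level.

```python
import hashlib

def flow_hash31(flow: str) -> int:
    # Stand-in 31-bit hash (SHA-256 truncated); not the Trio function.
    return int.from_bytes(hashlib.sha256(flow.encode()).digest()[:4], "big") & 0x7FFFFFFF

def pick(h: int, shift: int, width: int, n: int) -> int:
    """Select one of n next-hops from a `width`-bit slice of the hash."""
    return ((h >> shift) & ((1 << width) - 1)) % n

# Hypothetical (shift, width) per level: BGP list, LSP list, AE members.
LEVEL_BITS = [(0, 8), (8, 8), (16, 8)]

def select_path(flow: str, fanouts):
    """Walk the next-hop hierarchy; fanouts[i] = number of choices at level i."""
    h = flow_hash31(flow)
    return tuple(pick(h, s, w, n) for (s, w), n in zip(LEVEL_BITS, fanouts))
```

If every level read the same slice, a flow's choice at one level would fully determine its choice at the next (two 2-way levels would only ever produce the combinations (0, 0) and (1, 1)), which is the polarization the slide warns about.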
POLARIZATION PREVENTION (NETWORK WIDE)
Problem statement
• Hashing at different nodes may produce the same results
• This results in traffic polarization
Solution
• Include a hash seed in the computation
• The hash seed is based on the system MAC address
• Enabled by default, not configurable
[Diagram: traffic passes two routers in sequence. The first router computes the hash and makes the 1st load balancing decision; the second router computes the same hash result (unless IIF inclusion is enabled) for the 2nd load balancing decision. Different hash seeds fix that.]
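The effect of the seed can be reproduced with a toy two-router model. Assumptions: a keyed BLAKE2b stands in for the Trio hash plus its per-router seed (the real function is CRC-based), the seed values are hypothetical stand-ins for system-MAC-derived seeds, and both routers use modulo selection over the same number of links.

```python
import hashlib

def link_for(flow: str, seed: bytes, n_links: int) -> int:
    # Keyed BLAKE2b stands in for the Trio hash plus its per-router seed.
    d = hashlib.blake2b(flow.encode(), key=seed, digest_size=4).digest()
    return int.from_bytes(d, "big") % n_links

def links_used_downstream(seed1: bytes, seed2: bytes,
                          n_links: int = 4, n_flows: int = 1000) -> set:
    """Router 1 spreads flows over n_links; return the set of router 2
    links used by the flows that router 1 sent over its link 0."""
    flows = [f"flow-{i}" for i in range(n_flows)]
    via_link0 = [f for f in flows if link_for(f, seed1, n_links) == 0]
    return {link_for(f, seed2, n_links) for f in via_link0}

print(links_used_downstream(b"", b""))              # identical hashing
print(links_used_downstream(b"mac-R1", b"mac-R2"))  # per-router seeds
```

With identical hashing, every flow that router 1 sends over its link 0 also lands on link 0 at router 2, leaving router 2's other links unused by that traffic; distinct seeds spread those flows again.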
MULTICAST TRAFFIC LOAD BALANCING
Notes
• Only relevant in the context of aggregated Ethernet (ECMP load balancing of multicast joins is managed by the downstream routers)
• In enhanced-ip mode, the algorithm behaves exactly the same as for unicast traffic:
  • Same fields selected for hashing
  • Same hash computation procedure
  • Same member links selected
HASH CONFIGURATION
IPv4 hash configuration
forwarding-options {
    enhanced-hash-key {
        family inet {
            incoming-interface-index;
            gtp-tunnel-endpoint-identifier;
            no-destination-port;
            no-source-port;
            type-of-service;
        }
    }
}
IPv6 hash configuration
forwarding-options {
    enhanced-hash-key {
        family inet6 {
            incoming-interface-index;
            gtp-tunnel-endpoint-identifier;
            no-destination-port;
            no-source-port;
            traffic-class;
        }
    }
}
MPLS hash configuration
forwarding-options {
    enhanced-hash-key {
        family mpls {
            incoming-interface-index;
            label-1-exp;
            no-payload;
            no-ether-pseudowire; /* 13.3R3 */
        }
    }
}
CCC/VPLS/Bridge hash configuration
forwarding-options {
    enhanced-hash-key {
        family multiservice {
            incoming-interface-index;
            no-mac-address;
            no-payload;
            outer-priority;
        }
    }
}
SYMMETRIC LOAD BALANCING
SYMMETRIC LOAD BALANCING
Problem statement
• The same flow should reach the same stateful appliance irrespective of the path (through MX1 or MX2)
• The reverse flow should reach the same stateful appliance
Solution
• Disable the router hash seed
• Synchronize the link order through link-index configuration
• The second problem is solved on Trio automatically, because the hash is symmetric (swapping source and destination fields does not change the hash result)
[Diagram: MX1 and MX2 both connected to the service appliances; Flow A->B and Flow B->A]
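The fixes on this slide can be checked with a toy model: a symmetric hash (endpoints ordered before hashing, with `zlib.crc32` standing in for the Trio function), the same seed on both routers, and the same link-index order. The appliance names and helper functions are hypothetical.

```python
import zlib

def flow_hash(a: str, b: str, seed: int = 0) -> int:
    # Symmetric stand-in hash: order the endpoints before hashing.
    lo, hi = sorted((a, b))
    return zlib.crc32(f"{seed}|{lo}|{hi}".encode()) & 0x7FFFFFFF

def appliance(router_links, a, b, seed=0):
    """router_links: appliance names in the router's link-index order."""
    return router_links[flow_hash(a, b, seed) % len(router_links)]

# Link order synchronized on MX1 and MX2 (hypothetical appliance names).
mx1 = ["fw1", "fw2", "fw3"]
mx2 = ["fw1", "fw2", "fw3"]

# Flow A->B enters through MX1, the reply B->A through MX2.
print(appliance(mx1, "10.0.0.1", "192.0.2.1"), appliance(mx2, "192.0.2.1", "10.0.0.1"))
```

Rotating the list on one router (for example `["fw3", "fw1", "fw2"]`) sends the reverse direction of every flow to a different appliance, which is why the link order has to be synchronized.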
CONSISTENT HASHING
CONSISTENT HASHING
Problem statement
• L3/L4 load balancing between servers should remain consistent in failure scenarios (when a server goes down or when it recovers)
• Need to detect and react to server failures
Solution
• Use eBGP sessions with the servers as health checks
• Use a modified Equal Cost Multipath (ECMP) to distribute traffic
[Diagram: MX connected to Server 1, Server 2, ... Server N, N = 1..64]
Enabling highly efficient L3/L4 load balancing
CONSISTENT HASHING
IMPLEMENTATION DETAILS
Flow (hash bucket) to ECMP next-hop mapping table over time:
All servers active: Server 1: Flow 1, Flow 2; Server 2: Flow 3, Flow 4; Server 3: Flow 5, Flow 6
Server 2 / Link 2 fails: Server 1: Flow 1, Flow 2, Flow 3; Server 2: none; Server 3: Flow 5, Flow 6, Flow 4
Server 2 recovers: Server 1: Flow 1, Flow 2; Server 2: Flow 3, Flow 4; Server 3: Flow 5, Flow 6
Only the flows of the failed server are moved; all other flows stay on their servers.
[Diagram: MX with eBGP sessions to Server 1, Server 2, Server 3]
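The mapping table above can be reproduced with a small bucket-table model. This is a sketch, not the Junos implementation: the class name, the block-wise initial layout and the round-robin re-homing of orphaned buckets are assumptions chosen so that the result matches the slide's six-flow example.

```python
class ConsistentBuckets:
    """Hash-bucket to server table in the spirit of the slide (hypothetical API)."""

    def __init__(self, servers, n_buckets=6):
        self.servers = list(servers)
        # Home mapping: buckets assigned in contiguous blocks to the servers.
        per = max(1, n_buckets // len(self.servers))
        self.home = [self.servers[min(b // per, len(self.servers) - 1)]
                     for b in range(n_buckets)]
        self.table = list(self.home)
        self.up = set(self.servers)

    def fail(self, server):
        self.up.discard(server)
        alive = [s for s in self.servers if s in self.up]
        if not alive:
            return  # nothing left to move the buckets to
        orphans = [b for b, s in enumerate(self.table) if s == server]
        # Re-home only the failed server's buckets, round-robin over survivors.
        for i, b in enumerate(orphans):
            self.table[b] = alive[i % len(alive)]

    def recover(self, server):
        self.up.add(server)
        # Buckets whose home is the recovered server return to it.
        for b, s in enumerate(self.home):
            if s == server:
                self.table[b] = server
```

Running the slide's scenario, `fail("S2")` moves bucket 3 to S1 and bucket 4 to S3 while buckets 1, 2, 5 and 6 stay put, and `recover("S2")` restores the initial table.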
CONSISTENT HASHING
CONFIGURATION, SOFTWARE SUPPORT
Configuration
policy-options {
    policy-statement c-hash {
        from {
            route-filter ${virtual_ip} exact; /* ${virtual_ip}: placeholder for the service prefix */
        }
        then {
            load-balance consistent-hash;
        }
    }
}
protocols {
    bgp {
        group server-group {
            import c-hash;
        }
    }
}
Software and hardware
LINE CARD: All Trio
JUNOS: 13.3R3
LEVEL: ECMP only
OTHER: Unicast only
SCALING: <1000 ECMP NHs
THEORETICAL LOAD BALANCING
EFFICIENCY ANALYSIS
HOW MANY FLOWS DO WE NEED?
More flows will
• Improve load balancing efficiency (reduce the imbalance)
• Reduce the probability of imbalance
Some definitions
• Positive imbalance: difference between the max link rate and the expected average
• Tolerance limit: % of capacity that is allowed to be wasted
[Chart: rates of links 1-8, marking the expected average, the max link rate, and the positive imbalance between them]
ESTIMATING THE FLOW COLLISION PROBABILITY
Traffic model
• N equal traffic flows are sent over M equal paths (or distributed between M member links)
• Traffic flows are balanced between paths using a hash. The hash function produces uniform results: the probability of a flow taking a specific path is 1/M
• Balancing is done for each flow independently, i.e. if one flow took path 1 with probability 1/M, another flow takes this path with the same probability
Bernoulli's trial scheme applies in this case
• A given path is selected with probability 1/M
• Any of the other paths is selected with probability 1 - 1/M
The probability that exactly K of the N flows hit a given link is therefore:
$$P(K) = \binom{N}{K} \left(\frac{1}{M}\right)^{K} \left(1 - \frac{1}{M}\right)^{N-K}$$
[Chart: 64 flows distributed over 8 links; probability (0-16%) of K flows hitting the same link, for K = 0..64]
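The binomial formula can be evaluated directly to reproduce the chart for 64 flows over 8 links. This is a straightforward sketch of P(K); the function name `p_k` is hypothetical.

```python
from math import comb

def p_k(n: int, m: int, k: int) -> float:
    """P(K): probability that exactly k of n flows hash to a given one of m links."""
    p = 1.0 / m
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The chart's case: 64 flows over 8 links.
dist = [p_k(64, 8, k) for k in range(65)]
mode = max(range(65), key=dist.__getitem__)
print(mode, round(dist[mode], 4))
```

The distribution peaks at K = 8 (the expected 64/8 flows per link) at just under 15%, consistent with the chart's vertical scale.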
TAKING TOLERANCE INTO ACCOUNT
How to find the imbalance probability
• Define the tolerance limit (25% in this case, i.e. up to 10 flows may map to a single link)
• Sum up the probabilities of the undesired outcomes (more than 10 flows mapped to a link)
Some results
• With a 25% imbalance target, the probability of staying within this target is 82.96%
• To reach 99.99% probability, the number of flows needs to increase to 1605
[Chart: 64 flows distributed over 8 links, probability of K flows hitting the same link; outcomes in green are within the 25% tolerance]
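The tolerance calculation follows from the same P(K) formula: sum the outcomes that stay at or below the limit. This sketch computes the probability for one given link, which reproduces the 82.96% figure for 64 flows over 8 links; `p_within_tolerance` is a hypothetical helper name.

```python
from math import comb, floor

def p_within_tolerance(n: int, m: int, tolerance: float) -> float:
    """Probability that a given link receives at most (1 + tolerance) * n / m flows."""
    limit = floor((1 + tolerance) * n / m)   # 64 flows, 8 links, 25% -> 10 flows
    p = 1.0 / m
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(limit + 1))

print(round(p_within_tolerance(64, 8, 0.25), 4))
print(round(p_within_tolerance(128, 8, 0.25), 4))  # more flows, same tolerance
```

This is the probability for a single link; requiring all M links to stay within the limit at the same time is a stricter condition.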
ADAPTIVE LOAD BALANCING
ADAPTIVE LOAD BALANCING OVERVIEW
[Diagram: Ingress PFE pipeline, From WAN -> Parse packet -> Compute hash -> Lookup route -> Select next-hop -> To fabric. Hash buckets 1..N each map to a LAG link [1 .. M]; a rate table keeps Rate 1, Rate 2, ... per bucket; a control loop monitors utilization and adjusts the mapping.]
Implementation details
• Track the traffic rate per hash bucket
• Re-map hash buckets periodically if the imbalance crosses a threshold
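A minimal sketch of the adaptive idea: per-bucket rates, the positive-imbalance measure defined earlier, and a threshold-triggered re-map. The greedy largest-first re-mapping is an assumption for illustration; the slides do not describe how Trio chooses the new mapping, and the function names are hypothetical.

```python
def link_loads(mapping, rates, n_links):
    """Sum per-bucket rates onto the links they are mapped to."""
    loads = [0.0] * n_links
    for bucket, link in enumerate(mapping):
        loads[link] += rates[bucket]
    return loads

def imbalance(loads):
    """Positive imbalance: max link rate minus the average link rate."""
    return max(loads) - sum(loads) / len(loads)

def maybe_remap(mapping, rates, n_links, threshold):
    """If the imbalance exceeds `threshold`, rebuild the bucket->link map
    greedily: heaviest bucket first, always onto the least-loaded link."""
    if imbalance(link_loads(mapping, rates, n_links)) <= threshold:
        return mapping
    new, loads = [0] * len(mapping), [0.0] * n_links
    for bucket in sorted(range(len(rates)), key=lambda b: -rates[b]):
        link = min(range(n_links), key=loads.__getitem__)
        new[bucket] = link
        loads[link] += rates[bucket]
    return new

# Two links, eight buckets, two heavy buckets that the uniform map stacks together.
rates = [9, 1, 1, 1, 8, 1, 1, 1]
uniform = [b % 2 for b in range(8)]
adaptive = maybe_remap(uniform, rates, 2, threshold=1.0)
```

Here the uniform map loads the links with 19 and 4 units (imbalance 7.5); after the re-map the loads are 12 and 11 (imbalance 0.5), and a second call leaves the map unchanged because it is already within the threshold.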
ADAPTIVE AT WORK
Example
• Balancing towards the network core (8 links in a group)
• Many small flows
• Very few high volume flows
Results
• Without adaptive balancing, flows are distributed uniformly, but link rates differ because of the high volume flows
• With adaptive balancing, the imbalance is compensated
[Chart: rates per hash bucket (buckets 1..N), with the few high volume flows standing out; link rates for Link #1, Link #2, ... Link #8]
ADAPTIVE MAPPING OF HASH BUCKETS
[Chart: link rates (links 1-8), sample uniform (default) mapping of hash buckets to links]
[Chart: link rates (links 1-8), sample adaptive mapping of hash buckets to links, with the savings highlighted]
LAB VERIFICATION
[Chart: interface rate (Gbps, 1.25-1.7) of Link 1 through Link 6 over time (MM:SS, 42:00-53:00); annotations mark "Adaptive balancing enabled" and "Adjustment made"]
MX ADAPTIVE LOAD BALANCING SUPPORT
Software and hardware
INGRESS LINE CARD: Trio
EGRESS LINE CARD: Trio, DPC
MX MIXED MODE: Yes
JUNOS: 12.3R4
Features
LEVEL: Only across LAG members (no ECMP)
OTHER: Tracks usage and compensates imbalance for unicast traffic only; multicast is load balanced in the regular way
OTHER NOTES
• Optimization is local to the ingress PFE; in case of multiple ingress PFEs, each ingress PFE compensates imbalance on its own
• Hash bucket counters are maintained per egress IFL
• Multi-LU line cards are supported (MPC3, MPC4)
STATEFUL LOAD BALANCING
STATEFUL LOAD BALANCING OVERVIEW
[Diagram: Ingress PFE pipeline, From WAN -> Parse packet -> Compute hash -> Lookup route -> Map to hash bucket -> Select a link for a new bucket -> Select next-hop -> To fabric. Hash buckets 1..N each point to a LAG link [1 .. M].]
Implementation details
• Initially, all hash buckets point to void (no link)
• Each packet is mapped to a hash bucket
• If the hash bucket does not point to a link yet, a link is chosen incrementally
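The three implementation steps above can be sketched as a lazily filled bucket table. Reading "incrementally choose a link" as round-robin is an assumption, as are the class and method names; the slide does not describe Trio's actual selection policy.

```python
class StatefulLag:
    """Bucket table that starts empty and fills on first use (hypothetical)."""

    def __init__(self, n_buckets: int, n_links: int):
        self.buckets = [None] * n_buckets   # None = "points to void"
        self.n_links = n_links
        self.next_link = 0                  # assumed: links handed out in turn

    def link_for(self, flow_hash: int) -> int:
        b = flow_hash % len(self.buckets)
        if self.buckets[b] is None:         # first packet of a new bucket
            self.buckets[b] = self.next_link
            self.next_link = (self.next_link + 1) % self.n_links
        return self.buckets[b]

lag = StatefulLag(n_buckets=1024, n_links=4)
print([lag.link_for(h) for h in (5, 6, 7, 8, 5, 1029)])
```

Because each new bucket takes the next link in turn, a handful of equal-size flows that land in distinct buckets is spread evenly across the LAG, while later packets of a flow keep using the link its bucket was first given.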
MX STATEFUL LOAD BALANCING SUPPORT
Software and hardware
INGRESS LINE CARD: Trio
EGRESS LINE CARD: Trio, DPC
MX MIXED MODE: Yes
JUNOS: 12.3R3
Features
LEVEL: Only across LAG members (no ECMP)
OTHER: Unicast traffic only; multicast follows regular hashing
OTHER NOTES
• Mapping is local to the ingress PFE; in case of multiple ingress PFEs, each ingress PFE maintains its own mapping
• Multi-LU line cards are not supported (MPC3, MPC4)
USAGE GUIDELINES
Regular
• Best for many flows of the same size
• Note: use the formulas to estimate the number of flows needed (threshold)
Stateful
• Best for a few flows of the same size
• Requires more NPU memory
Adaptive
• Best for many flows with a few high volume flows among them
• Requires more NPU memory
[Charts: flow rate vs. number of flows (Nflows) relative to the threshold, one per mode]
RELATED FEATURE LIST AND SCALING
Feature list
SW       FEATURE
10.2     Baseline Trio implementation
11.4R6   Turn off hash calculation based on Layer-4 information for fragments
12.3R2   GTP TEID hash inclusion
12.3R2   Introduce length checks in the heuristic algorithm
12.3R3   Increase number of links in a LAG to 64
13.3R3   Selectively disable hash computation for pseudo-wires only
13.3R3   Consistent Hashing
Current scaling
ECMP PATHS    64
LAG MEMBERS   64
LAG GROUPS    496
Вопросы балансировки трафика

More Related Content

Вопросы балансировки трафика

  • 1. MX TRIO LOAD BALANCING Dmitry Shokarev Product Line Management Routing Business Unit Version 1.4, April 2014 Confidential
  • 2. 2 Copyright © 2009 Juniper Networks, Inc. www.juniper.net AGENDA 1. High level load balancing overview 2. Packet parsing and hash computation 3. Advanced Topics 4. Theoretical load balancing efficiency analysis 5. Adaptive and Stateful load balancing
  • 3. 3 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HIGH LEVEL OVERVIEW
  • 4. 4 Copyright © 2009 Juniper Networks, Inc. www.juniper.net Ingress PFE Parse packet Compute hash Lookup Route Select next-hop HIGH LEVEL LOAD BALANCING OVERVIEW (SIMPLIFIED) Parse packet  Depending on the interface encapsulation, select packet fields for route lookup Compute hash  Compute fixed size hash value from variable set of packet fields Egress PFE Encapsulate Lookup route  Find a route based on the packet fields Select next-hop  Select ultimate next-hop from a list of possible next-hops (multiple levels)
  • 5. 5 Copyright © 2009 Juniper Networks, Inc. www.juniper.net PACKET PARSING AND HASH COMPUTATION
  • 6. 6 Copyright © 2009 Juniper Networks, Inc. www.juniper.net Hash is symmetric (swapping the fields does not change the hash result) Applicable only if there is a field match (TCP or UDP packets in this case). The field is included into the hash L4 L3 L2 HOW TO READ THE DIAGRAM [1 OF 2] Source Port ON OFF Dest. Port IIF Protocol DSCP ON OFF ON OFF 6 or 17 Source Address Dest. Address IIF Protocol DSCP ON OFF IPv4, GRE (PPTP) ON OFF 47 GRE Key (16 bits) GRE Protocol 0x880B Source Address Dest. Address Configurable (default on) Applicable only if there is a field match (PPTP packets in this case). The field is NOT included into the hash computation Field is included by default and can’t be turned off IIF stands for Incoming Interface Index (internal logical interface identifier) ON OFF Configurable (default off) IPv4, UDP or TCP
  • 7. 7 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HOW TO READ THE DIAGRAM [2 OF 2] L3/L4 L2 IIF ON OFF Source MAC Dest. MAC ON OFF Outer 802.1p ON OFF VLAN Tag 1 VLAN Tag 2..N Ether type 0x0800 IPv4 payload ON OFF Ethernet, IPv4 Shaded area refers to the hash field selection procedure defined somewhere else In this case IPv4 hash selection procedure will be used Protocol DSCP ON OFF 47 GRE Key (32 bits) Source Address Dest. Address Fragment Flag 0 Fragment Offset 0 IPv4, GRE, non fragmented Protocol DSCP ON OFF 47 GRE Key (16 LS Bits) GRE Protocol 0x880B Source Address Dest. Address Fragment Flag 0 Fragment Offset 0 GRE Key (16 MS Bits) IPv4, PPTP, non-fragmented Source Port Dest. Port Protocol DSCP ON OFF 17 Source Address Destination Address 215 2 GTP TEID ON OFF Fragment Flag 0 Fragment Offset 0 IPv4, GTP, non-fragmented Protocol DSCP ON OFF IPv4 Source Address Dest. Address Source Port Dest. Port Fragment Flag DSCP ON OFF 0 Source Address Dest. Address ON OFF ON OFF Fragment Offset 0 Protocol 6 or 17 IPv4, UDP or TCP, non-fragmented
  • 8. 8 Copyright © 2009 Juniper Networks, Inc. www.juniper.net WHICH FIELDS SELECTED WHEN? IP IP Use IP fields MPLS MPLS Use MPLS fields CCC, VPLS, Bridge Use CCC/Bridge/ VPLS fields Answer depends on the encapsulation on ingress / egress CCC,VPLS, Bridge IP MPLS Use IP fields CCC Use CCC/Bridge/ VPLS fields MPLS Use MPLS fields VPLS, Bridge Use CCC/Bridge/ VPLS fields MPLS IP IP+GRE/IPIP Use IP fields Use Inner IP fields VPLS, Bridge IP (VIA IRB) Use IP fields
  • 9. 9 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HASH FIELD SELECTION, IPV4 TRAFFIC [1 OF 2] IIF Protocol DSCP ON OFF IPv4 Source Port Dest. Port ON OFF IIF Fragment Flag DSCP ON OFF ON OFF 0 Source Address Dest. Address Source Address Dest. Address ON OFF ON OFF L4 L3 L2 Fragment Offset 0 Include L4 only for non fragments Protocol 6 or 17 IPv4, UDP or TCP, non-fragmented
  • 10. 10 Copyright © 2009 Juniper Networks, Inc. www.juniper.net L4 HASH FIELD SELECTION, IPV4 TRAFFIC [2 OF 2] L3 L2 Source Port Dest. Port IIF Protocol DSCP ON OFF ON OFF 17 Source Address Destination Address 2152 GTP TEID ON OFF IIF Protocol DSCP ON OFF ON OFF 47 GRE Key (32 bits) IIF Protocol DSCP ON OFF ON OFF 47 GRE Key (16 LS Bits) GRE Protocol 0x880B Source Address Dest. Address Source Address Dest. Address Fragment Flag 0 Fragment Offset 0 Fragment Flag 0 Fragment Offset 0 Fragment Flag 0 Fragment Offset 0 GRE Key (16 MS Bits) IPv4, GRE, non fragmented IPv4, PPTP, non-fragmented IPv4, GTP, non-fragmented
  • 11. 11 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HASH FIELD SELECTION, IPV6 TRAFFIC [1 OF 2] IIF Next Header Traffic Class ON OFF ON OFF Source Address Dest. Address L4 L3 L2 IPv6 Source Port Dest. Port IIF Traffic Class ON OFF ON OFF Source Address Dest. Address ON OFF ON OFF Next Header 6 or 17 IPv6, UDP or TCP
  • 12. 12 Copyright © 2009 Juniper Networks, Inc. www.juniper.net L4 HASH FIELD SELECTION, IPV6 TRAFFIC [2 OF 2] L3 L2 Source Port Dest. Port IIF Next Header Traffic Class ON OFF ON OFF 17 Source Address Destination Address 2152 GTP TEID ON OFF IIF Next Header Traffic Class ON OFF ON OFF 47 GRE Key (32 bits) IIF Next Header Traffic Class ON OFF ON OFF 47 GRE Key (16 LS Bits) GRE Protocol 0x880B Source Address Dest. Address Source Address Dest. Address GRE Key (16 MS Bits) IPv6, GRE IPv6, PPTP IPv6, GTP
  • 13. 13 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HASH FIELD SELECTION CCC/BRIDGE/VPLS TRAFFIC [1 OF 2] IIF ON OFF Source MAC Dest. MAC ON OFF Outer 802.1p ON OFF VLAN Tag 1 VLAN Tag 2..N L4 L3 L2 Ethernet, non IP or MPLS Note, VLANs are note included
  • 14. 14 Copyright © 2009 Juniper Networks, Inc. www.juniper.net L3/L4 L2 HASH FIELD SELECTION CCC/BRIDGE/VPLS TRAFFIC [2 OF 2] IIF ON OFF Source MAC Dest. MAC ON OFF Outer 802.1p ON OFF VLAN Tag 1 or none VLAN Tag 2 or none Ether type 0x0800 IPv4 payload IIF ON OFF Source MAC Dest. MAC ON OFF Outer 802.1p ON OFF VLAN Tag 1 or none VLAN Tag 2 or none Ether type 0x8847 MPLS payloadON OFF ON OFF IIF ON OFF Source MAC Dest. MAC ON OFF Outer 802.1p ON OFF VLAN Tag 1 or none VLAN Tag 2 or none Ether type 0x86DD IPv6 payload ON OFF Ethernet, IPv4 Ethernet, IPv6 Ethernet, MPLS Single knob to control payload analysis for all packet types
  • 15. 15 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HASH FIELD SELECTION, MPLS TRAFFIC [JUNOS < 14.1] IIF Label 2..5 (20 bits each) Outer Label EXP ON OFF ON OFF Label 1 (20 bits) IPv4, IPv6 payload IIF Label 2..5 (20 bits each) Outer Label EXP ON OFF ON OFF Label 1 (20 bits) IPv4, IPv6 in Ethernet pseudo-wire L3/L4 L2 ON OFF ON OFF Up to 5 top labels MPLS, Encapsulated IPv4 or IPv6 MPLS, IPv4/IPv6 in Ethernet Pseudo-wire
  • 16. 16 Copyright © 2009 Juniper Networks, Inc. www.juniper.net MPLS ENCAPSULATED TRAFFIC DETERMINATION [JUNOS < 14.1] Bottom of the stack reached? Start Use up to 5 top labels in hash computation Include topmost EXP (if enabled) End No Check first nibble Compute IPv4 hash Length matches? Compute IPv6 hash Length matches? Check Ethertype Yes Yes Yes Skip VLAN VLANs skipped > 2 0x4 (IPv4) 0x6 (IPv6) Else 0x8100 0x86DD 0x0800 No Yes NoNo Else
  • 17. 17 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HASH FIELD SELECTION, MPLS TRAFFIC [JUNOS >= 14.1] IIF Label 2..8 (20 bits each) Outer Label EXP ON OFF ON OFF Label 1 (20 bits) IPv4, IPv6 payload IIF Label 2..8 (20 bits each) Outer Label EXP ON OFF ON OFF Label 1 (20 bits) IPv4, IPv6 or MPLS in Ethernet pseudo-wire L3/L4 L2 ON OFF ON OFF MPLS, Encapsulated IPv4 or IPv6 MPLS, IPv4/IPv6 in Ethernet Pseudo-wire IIF Label 2..8 (20 bits each) Outer Label EXP ON OFF ON OFF Label 1 (20 bits) Entropy Label Indicator detected, Payload is not processed Indicator is not included into hash MPLS, Entropy Label
  • 18. 18 Copyright © 2009 Juniper Networks, Inc. www.juniper.net MPLS ENCAPSULATED TRAFFIC DETERMINATION [JUNOS >= 14.1] Bottom of the stack reached AND no ELI* detected? Start Use up to 8 top labels in hash computation except ELI* Include topmost EXP (if enabled) End No Check first nibble Compute IPv4 hash Length matches? Compute IPv6 hash Length matches? Check Ethertype Yes Yes Yes Skip VLAN VLANs skipped > 2 0x4 (IPv4) 0x6 (IPv6) Else 0x8100 0x86DD 0x0800 No Yes NoNo Else * ELI: Entropy Label Indicator, value of 7
  • 19. 19 Copyright © 2009 Juniper Networks, Inc. www.juniper.net Byte offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 0 4 8 12 DEI 16 DEI 20 24 S=0[1] 28 S=0[2] 32 S=1[3] 36 40 44 48 DEI 52 DEI 56 60 64 68 72 76 80 84 88 Identification Flags Fragment offset Version Header Length DSCP ECN Total Length Checksum Ethertype (0x0800, IPv4)PCP Encapsulated Inner VLAN Payload Data 0 Protocol = 17 (UDP) UDP Length Source Port Destination Port Length TTL Protocol Header checksum Source Address Destination Address Destination MAC Destination MAC Source MAC Source MAC TPID (0x8100) .1P EXP[2] Encapsulated Destination MAC Encapsulated Ethernet Outer VLAN TPID (0x8100) .1P Inner VLAN Ethertype (0x8847, MPLS) Encapsulated SRC MAC Encapsulated Destination MAC TPID (0x8100)PCP Encapsulated Outer VLAN TPID (0x8100)Encapsulated SRC MAC Bit position UDP IPv4 TTL[2] Label[3] Label[3] EXP[3] TTL[3] MPLS Ethernet Label[1] Label[1] EXP[1] TTL[1] Label[2] Label[2] NOTES ON MPLS PAYLOAD PROCESSING Algorithm features  Heuristic nature, produces good detection results  Certain (fixed) requirements to the traffic  No control word for EoMPLS/VPLS frames  0x8100 Ethertype for VLANs Sample hash field selection for bridged MPLS traffic with pseudo-wire encapsulated UDP. All optional fields are enabled except IIF, fields included into computation are in black
  • 20. 20 Copyright © 2009 Juniper Networks, Inc. www.juniper.net HOW HASH IS COMPUTED? Trio hash computation algorithm  Uses a combination of Cyclic Redundancy Check (CRC) 13 and CRC-31 polynomial functions (similar functions are used to compute ethernet frame checksum)  Implemented in hardware  Very efficient Hash function result  One 31 bit number (used to select the next-hop)  For hierarchical load balancing sections of that result are used
  • 21. 21 Copyright © 2009 Juniper Networks, Inc. www.juniper.net BGP BGP NEXT-HOP SELECTION EXAMPLE (MULTIPLE LEVELS) IP Route Next-hop list 1.1 Indirect next-hop 1 Indirect next-hop 2 Indirect next-hop 3 List 2.2 List 2.3 LSP 1 LSP 2 LSP 3 AE0-1 AE0-2 AE0-3 LSP1 PE0 LSP3 PE1 PE3 PE2 1st level balancing 2nd level balancing 3rd level balancing Different set of bits from the hash are used to select a next-hop at each level (to prevent polarization) List 2.1 AE0 List BGP
  • 22. 22 Copyright © 2009 Juniper Networks, Inc. www.juniper.net POLARIZATION PREVENTION (NETWORK WIDE) Problem statement  Hashing at different nodes may produce same results  Will result in traffic polarization Solution  Include a hash seed into computation  Hash seed is based on the system MAC  Enabled by default, non configurable Traffic Hash computation, 1st load balancing decision Hash computation (same result, unless we enable IIF inclusion), 2nd load balancing decision Different hash seeds fixes that
  • 23. 23 Copyright © 2009 Juniper Networks, Inc. www.juniper.net MULTICAST TRAFFIC LOAD BALANCING Notes  Only relevant in the context of aggregated ethernet (ECMP join load balancing is managed by the downstream)  In enhanced-ip mode the algorithm behaves exactly the same as for unicast traffic  Same fields selected for hashing  Same hash computation procedure  Same member links are selected
  • 24. HASH CONFIGURATION
    IPv4 hash configuration:
    forwarding-options {
        enhanced-hash-key {
            family inet {
                incoming-interface-index;
                gtp-tunnel-endpoint-identifier;
                no-destination-port;
                no-source-port;
                type-of-service;
            }
        }
    }
    IPv6 hash configuration:
    forwarding-options {
        enhanced-hash-key {
            family inet6 {
                incoming-interface-index;
                gtp-tunnel-endpoint-identifier;
                no-destination-port;
                no-source-port;
                traffic-class;
            }
        }
    }
    MPLS hash configuration:
    forwarding-options {
        enhanced-hash-key {
            family mpls {
                incoming-interface-index;
                label-1-exp;
                no-payload;
                no-ether-pseudowire; /* 13.3R3 */
            }
        }
    }
    CCC/VPLS/Bridge hash configuration:
    forwarding-options {
        enhanced-hash-key {
            family multiservice {
                incoming-interface-index;
                no-mac-address;
                no-payload;
                outer-priority;
            }
        }
    }
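To show how the family inet statements change the set of hashed fields, here is a small hypothetical Python model. The defaults it encodes (source and destination ports included, IIF and type-of-service excluded, addresses and protocol always included) are read from the field diagram earlier in the deck and should be checked against the Junos documentation; gtp-tunnel-endpoint-identifier is left out for brevity, and the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class InetHashKey:
    """Hypothetical model of `enhanced-hash-key family inet` statements."""
    incoming_interface_index: bool = False  # statement present -> IIF hashed
    type_of_service: bool = False           # statement present -> DSCP hashed
    no_source_port: bool = False            # statement present -> source port dropped
    no_destination_port: bool = False       # statement present -> destination port dropped

def inet_hash_fields(cfg: InetHashKey, protocol: int) -> list[str]:
    fields = ["source-address", "destination-address", "protocol"]  # assumed always on
    if cfg.incoming_interface_index:
        fields.append("incoming-interface-index")
    if cfg.type_of_service:
        fields.append("dscp")
    if protocol in (6, 17):  # L4 ports only apply to TCP and UDP
        if not cfg.no_source_port:
            fields.append("source-port")
        if not cfg.no_destination_port:
            fields.append("destination-port")
    return fields

print(inet_hash_fields(InetHashKey(), 6))
print(inet_hash_fields(InetHashKey(no_source_port=True, type_of_service=True), 17))
```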
  • 25. SYMMETRIC LOAD BALANCING
  • 26. SYMMETRIC LOAD BALANCING. Problem statement: the same flow should reach the same stateful appliance irrespective of the path (through MX1 or MX2), and the reverse flow should reach the same stateful appliance. Solution: disable the router hash seed; synchronize the link order through link-index configuration; the second problem (the reverse flow) is solved on Trio automatically. [Figure: MX1 and MX2, both connected to the service appliances, carrying flow A->B and flow B->A.]
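A minimal sketch of the two ingredients, under assumptions: symmetry is obtained here by sorting the two endpoints before hashing (the slide only says Trio handles the reverse flow automatically), CRC-32 stands in for the Trio hash, no router seed is applied, and member links are ordered by a hypothetical link-index map so that MX1 and MX2 agree on positions.

```python
import zlib
from ipaddress import ip_address

def symmetric_hash(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Unseeded 31-bit hash that is identical for A->B and B->A.

    Assumption: symmetry comes from sorting the two endpoints.
    """
    a = (ip_address(src).packed, sport)
    b = (ip_address(dst).packed, dport)
    (ip1, p1), (ip2, p2) = sorted([a, b])  # canonical endpoint order
    key = ip1 + p1.to_bytes(2, "big") + ip2 + p2.to_bytes(2, "big") + bytes([proto])
    return zlib.crc32(key) & 0x7FFFFFFF  # no per-router seed

def pick_appliance(h: int, links_by_index: dict[int, str]) -> str:
    # Order members by configured link-index, not by discovery order.
    ordered = [links_by_index[i] for i in sorted(links_by_index)]
    return ordered[h % len(ordered)]

mx1_links = {0: "svc-a", 1: "svc-b", 2: "svc-c"}  # hypothetical link-index maps
mx2_links = {2: "svc-c", 0: "svc-a", 1: "svc-b"}  # same indexes, learned in another order

fwd = symmetric_hash("192.0.2.10", "198.51.100.20", 6, 40000, 443)
rev = symmetric_hash("198.51.100.20", "192.0.2.10", 6, 443, 40000)
print(fwd == rev, pick_appliance(fwd, mx1_links) == pick_appliance(rev, mx2_links))  # True True
```

With a per-router seed, MX1 and MX2 would compute different hashes for the same flow, which is why the solution disables it.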
  • 27. CONSISTENT HASHING
  • 28. CONSISTENT HASHING. Problem statement: L3/L4 load balancing between servers should remain consistent in failure scenarios (when a server goes down or when it recovers), and server failures need to be detected and reacted to. Solution: use eBGP for server health checks, and use a modified Equal Cost Multipath (ECMP) to distribute traffic. [Figure: an MX balancing across Server 1, Server 2 ... Server N (N = 1..64), enabling highly efficient L3/L4 load balancing.]
  • 29. CONSISTENT HASHING IMPLEMENTATION DETAILS. Flow (hash bucket) to ECMP next-hop mapping table over time, for an MX with eBGP sessions to Servers 1-3:
    State                     Server 1                 Server 2         Server 3
    All servers active        Flow 1, Flow 2           Flow 3, Flow 4   Flow 5, Flow 6
    Server 2 / link 2 fails   Flow 1, Flow 2, Flow 3   (none)           Flow 5, Flow 6, Flow 4
    Server 2 recovers         Flow 1, Flow 2           Flow 3, Flow 4   Flow 5, Flow 6
    Only the flows of the failed server are moved; flows on the surviving servers keep their mapping, and the original mapping is restored when the server recovers.
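The table above can be reproduced with a small model. The spill rule (buckets of a failed server go round-robin to the survivors, in server order) is an assumption chosen because it yields the slide's table; the slide does not state the rule Junos uses, and `bucket_table` and the server names are hypothetical.

```python
def bucket_table(base: list[str], alive: set[str]) -> list[str]:
    """Current bucket->server table derived from the all-servers-up table.

    Buckets of live servers never move; buckets of failed servers are spread
    round-robin over the survivors. Recomputing from `base` restores the
    original mapping when a server recovers.
    """
    survivors = [s for s in dict.fromkeys(base) if s in alive]
    if not survivors:
        raise ValueError("no live servers")
    table, spill = [], 0
    for server in base:
        if server in alive:
            table.append(server)
        else:
            table.append(survivors[spill % len(survivors)])
            spill += 1
    return table

# Flows 1-6 (one hash bucket each) as in the slide's mapping table.
base = ["server1", "server1", "server2", "server2", "server3", "server3"]
print(bucket_table(base, {"server1", "server3"}))
# ['server1', 'server1', 'server1', 'server3', 'server3', 'server3']
print(bucket_table(base, {"server1", "server2", "server3"}) == base)  # True
```

Plain modulo ECMP would instead recompute `hash % 2` for every flow when a server fails, moving flows between the surviving servers and breaking their sessions.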
  • 30. CONSISTENT HASHING CONFIGURATION, SOFTWARE SUPPORT
    Configuration:
    policy-options {
        policy-statement c-hash {
            from {
                route-filter ${virtual_ip} exact; /* match type assumed */
            }
            then {
                load-balance consistent-hash;
            }
        }
    }
    protocols {
        bgp {
            group server-group {
                import c-hash;
            }
        }
    }
    Software and hardware support:
    Line card   All Trio
    Junos       13.3R3
    Level       ECMP only
    Other       Unicast only
    Scaling     <1000 ECMP next-hops
  • 31. THEORETICAL LOAD BALANCING EFFICIENCY ANALYSIS
  • 32. HOW MANY FLOWS DO WE NEED? More flows will improve load balancing efficiency (reduce imbalance) and reduce the probability of imbalance. Some definitions: positive imbalance is the difference between the maximum link rate and the expected average; the tolerance limit is the percentage of capacity that is allowed to be wasted. [Figure: rates of links 1-8, marking the expected average, the maximum link rate, and the positive imbalance between them.]
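A worked example of the definitions. Expressing the positive imbalance relative to the expected average is an assumption, but it matches how slide 34 turns a 25% tolerance over an 8-flow average into a 10-flow limit; the rates are hypothetical.

```python
def positive_imbalance(link_rates: list[float]) -> tuple[float, float]:
    """Return (max rate - expected average, same value as a fraction of the average)."""
    expected = sum(link_rates) / len(link_rates)
    excess = max(link_rates) - expected
    return excess, excess / expected

rates = [10, 8, 8, 8, 6, 8, 8, 8]  # hypothetical per-link rates (e.g. flows per link)
excess, ratio = positive_imbalance(rates)
print(excess, ratio, ratio <= 0.25)  # 2.0 0.25 True
```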
  • 33. ESTIMATING THE FLOW COLLISION PROBABILITY. Traffic model: N equal traffic flows are sent over M equal paths (or distributed between M member links); flows are balanced between paths using a hash; the hash function produces uniform results, so the probability of a flow taking a specific path is 1/M; balancing is performed for each flow independently, i.e. if one flow took path 1 with probability 1/M, another flow will take this path with the same probability. The Bernoulli trial scheme applies in this case: a given path is selected with probability 1/M, and any of the other paths with probability 1 - 1/M. The probability of exactly K of the N flows hitting the same link is
    $$P_N(K) = \binom{N}{K}\left(\frac{1}{M}\right)^{K}\left(1-\frac{1}{M}\right)^{N-K}$$
    [Figure: probability (0-16%) of K flows (K = 0..64) hitting the same link, for 64 flows distributed over 8 links.]
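The formula can be evaluated directly; this short sketch computes the distribution plotted on the slide (N = 64 flows, M = 8 links). The function name is illustrative.

```python
from math import comb

def p_k_flows(k: int, n: int, m: int) -> float:
    """P_N(K): probability that exactly k of n flows land on one given link of m."""
    p = 1 / m
    return comb(n, k) * p**k * (1 - p) ** (n - k)

dist = [p_k_flows(k, 64, 8) for k in range(65)]
mode = max(range(65), key=dist.__getitem__)
print(mode, round(dist[mode], 3))  # 8 0.149
```

The peak sits at the expected average N/M = 8 flows, at just under 15%, which matches the height of the plotted distribution.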
  • 34. TAKING TOLERANCE INTO ACCOUNT. How to find the imbalance probability: define the tolerance limit (25% in this case, i.e. up to 10 flows may map to a single link, against an expected average of 8); then sum up the probabilities of the undesired outcomes (11 or more flows mapped to a link). Some results: with a 25% imbalance target, the probability of staying within this target is 82.96%; to reach 99.99% probability, the number of flows needs to increase to 1605. [Figure: the same distribution of 64 flows over 8 links, with outcomes within the 25% tolerance shown in green.]
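The tolerance calculation can be sketched as a cumulative binomial sum for one given link. Treating the slide's percentages as per-link probabilities is an assumption, though it is consistent with the 64-flow result. The pmf is computed in log space (via `math.lgamma`) so that large flow counts such as 1605 do not overflow.

```python
from math import exp, floor, lgamma, log

def p_within_tolerance(n: int, m: int, tol: float) -> float:
    """P(one given link receives at most (1 + tol) * n / m of the n flows)."""
    p = 1 / m
    limit = floor((1 + tol) * n / m)  # e.g. 10 flows for n=64, m=8, tol=0.25

    def pmf(k: int) -> float:
        # log of C(n, k) * p^k * (1 - p)^(n - k), exponentiated at the end
        return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1 - p))

    return sum(pmf(k) for k in range(limit + 1))

print(f"{p_within_tolerance(64, 8, 0.25):.2%}")  # about 83%
print(f"{p_within_tolerance(1605, 8, 0.25):.4%}")
```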
  • 35. ADAPTIVE LOAD BALANCING
  • 36. ADAPTIVE LOAD BALANCING OVERVIEW. [Figure: ingress PFE pipeline from WAN to fabric: parse packet -> compute hash -> lookup route -> select next-hop. A monitoring stage keeps a rate table (one rate per hash bucket 1..N) and adjusts the mapping of hash buckets 1..N to LAG links 1..M.] Implementation details: track the traffic rate per hash bucket; periodically re-map hash buckets if the imbalance crosses a threshold.
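A sketch of threshold-triggered re-mapping, under stated assumptions: the re-mapping rule is a swapped-in greedy heuristic (heaviest bucket first onto the least-loaded link), not the Trio algorithm, which the slide does not describe; the 10% threshold and all names are hypothetical.

```python
def link_loads(bucket_rates: list[float], mapping: list[int], n_links: int) -> list[float]:
    loads = [0.0] * n_links
    for rate, link in zip(bucket_rates, mapping):
        loads[link] += rate
    return loads

def imbalance(loads: list[float]) -> float:
    avg = sum(loads) / len(loads)
    return (max(loads) - avg) / avg

def remap_if_needed(bucket_rates, mapping, n_links, threshold=0.1):
    """Keep the mapping unless imbalance exceeds threshold; then remap greedily."""
    if imbalance(link_loads(bucket_rates, mapping, n_links)) <= threshold:
        return mapping
    loads = [0.0] * n_links
    new_mapping = [0] * len(bucket_rates)
    # Heaviest buckets first, each onto the currently least-loaded link.
    for b in sorted(range(len(bucket_rates)), key=lambda i: -bucket_rates[i]):
        link = min(range(n_links), key=loads.__getitem__)
        new_mapping[b] = link
        loads[link] += bucket_rates[b]
    return new_mapping

rates = [1.0] * 16
rates[0] = rates[4] = 6.0             # two high-volume flows
uniform = [b % 4 for b in range(16)]  # default mapping: both land on link 0
adaptive = remap_if_needed(rates, uniform, 4)
print(link_loads(rates, uniform, 4), link_loads(rates, adaptive, 4))
# [14.0, 4.0, 4.0, 4.0] [7.0, 7.0, 6.0, 6.0]
```

Re-mapping only when the threshold is crossed keeps buckets, and thus flows, from moving between links on every measurement interval.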
  • 37. ADAPTIVE AT WORK. Example: balancing towards the network core (8 links in a group), with many small flows and very few high-volume flows. Results: without adaptive balancing, flows are distributed uniformly, but link rates differ because of the high-volume flows; with adaptive balancing, the imbalance is compensated. [Figure: rates per hash bucket (1..N), with a few high-volume flows standing out, mapped onto links #1 to #8.]
  • 38. ADAPTIVE MAPPING OF HASH BUCKETS. [Figure: link rates for links 1-8 under a sample uniform (default) mapping of hash buckets to links, compared with a sample adaptive mapping of hash buckets to links; the difference is marked as savings.]
  • 39. LAB VERIFICATION. [Figure: interface rate (Gbps, 1.25-1.7) of links 1-6 over time (MM:SS, 42:00-53:00); annotations mark when adaptive balancing was enabled and when an adjustment was made.]
  • 40. MX ADAPTIVE LOAD BALANCING SUPPORT
    Software and hardware:
    Ingress line card   Trio
    Egress line card    Trio, DPC
    MX mixed mode       Yes
    Junos               12.3R4
    Features:
    Level               Only across LAG members (no ECMP)
    Other               Tracks usage and compensates imbalance for unicast traffic only; multicast is load balanced in the regular way
    Other notes: optimization is local to the ingress PFE; in case of multiple ingress PFEs, each ingress PFE compensates imbalance on its own. Hash bucket counters are maintained per egress IFL. Multi-LU line cards are supported (MPC3, MPC4).
  • 41. STATEFUL LOAD BALANCING
  • 42. STATEFUL LOAD BALANCING OVERVIEW. [Figure: ingress PFE pipeline from WAN to fabric: parse packet -> compute hash -> lookup route -> map to hash bucket -> select a link for a new bucket -> select next-hop, with hash buckets 1..N mapped to LAG links 1..M.] Implementation details: initially all hash buckets point to void (no link); each packet is mapped to a hash bucket; if the hash bucket does not point to a link yet, a link is chosen incrementally.
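A minimal model of the bucket table. Reading "incrementally choose a link" as a round-robin pointer is an assumption, and the bucket count and class name are hypothetical.

```python
from typing import Optional

class StatefulLag:
    """Hash buckets start unmapped ("point to void"); a bucket is bound to a
    link on its first packet, with links chosen incrementally (round-robin)."""

    def __init__(self, n_buckets: int, n_links: int):
        self.buckets: list[Optional[int]] = [None] * n_buckets
        self.n_links = n_links
        self.next_link = 0

    def link_for(self, flow_hash: int) -> int:
        b = flow_hash % len(self.buckets)  # map packet to hash bucket
        if self.buckets[b] is None:        # new bucket: bind it to the next link
            self.buckets[b] = self.next_link
            self.next_link = (self.next_link + 1) % self.n_links
        return self.buckets[b]

lag = StatefulLag(n_buckets=1024, n_links=4)
print([lag.link_for(h) for h in (10, 20, 30, 40, 10, 50)])  # [0, 1, 2, 3, 0, 0]
```

Four new flows land on four different links, and a repeated hash stays on its link. This is why the approach suits a few flows of the same size, where regular hashing could easily put two of them on one link.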
  • 43. MX STATEFUL LOAD BALANCING SUPPORT
    Software and hardware:
    Ingress line card   Trio
    Egress line card    Trio, DPC
    MX mixed mode       Yes
    Junos               12.3R3
    Features:
    Level               Only across LAG members (no ECMP)
    Other               Unicast traffic only; multicast follows regular hashing
    Other notes: mapping is local to the ingress PFE; in case of multiple ingress PFEs, each ingress PFE maintains its own mapping. Multi-LU line cards are not supported (MPC3, MPC4).
  • 44. USAGE GUIDELINES. Regular: best for many flows of the same size; use the formulas from slides 33-34 to estimate the required number of flows (threshold). Adaptive: best for many flows with a few high-volume flows; requires more NPU memory. Stateful: best for a few flows of the same size; requires more NPU memory. [Figure: for each mode, flow rate versus number of flows (Nflows), with the threshold marked.]
  • 45. RELATED FEATURE LIST AND SCALING
    Feature list:
    SW       Feature
    10.2     Baseline Trio implementation
    11.4R6   Turn off hash calculation based on Layer 4 information for fragments
    12.3R2   GTP TEID hash inclusion
    12.3R2   Introduce length checks in the heuristic algorithm
    12.3R3   Increase number of links in a LAG to 64
    13.3R3   Selectively disable hash computation for pseudo-wires only
    13.3R3   Consistent hashing
    Current scaling:
    ECMP paths    64
    LAG members   64
    LAG groups    496

Editor's Notes

  1. Darker colors for offending flows
  2. The length check for the heuristic algorithm was introduced in PR 858588, with additional checks via PR 946694. Disable hashing for fragmented traffic: PR 828338.