Best one:
http://networkingbodges.blogspot.in/2012/12/all-sorts-of-things-about-lacp-and-lags.html?showComment=1408558282443#c3629621710549057766
The article further down is copied from the link above.
LACP packet format:
Ref: http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=mmr_sf-EN_US000005384
What is MC-LAG:
http://www.thomas-krenn.com/en/wiki/Link_Aggregation_and_LACP_basics
http://docs.oracle.com/cd/E19253-01/816-4554/fpjvl/index.html
From IEEE:
http://www.ieee802.org/3/hssg/public/apr07/frazier_01_0407.pdf
http://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf
All sorts of things about LACP and LAGs
A lot of people consider link aggregation groups (LAG / etherchannel / portchannel / MLT) to be pretty basic functionality that "just works" and don't really think any more about it. As with many networking technologies, there is a lot of intelligence responsible for creating the smooth veneer of simplicity.
2950#show etherchannel
Channel-group listing:
----------------------
Group: 1
----------
Group state = L2
Ports: 2 Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol: LACP
2950#
To find the local LACP system ID, use "show lacp sys-id":
2950#show lacp sys-id
32768,0012.da12.abcd
Note that the part before the comma is actually the system priority.
Useful information about the remote device (our partner) can be found using "show lacp neighbor":
2950#show lacp neighbor
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode
Channel group 1 neighbors
Partner's information:
LACP port Oper Port Port
Port Flags Priority Dev ID Age Key Number State
Fa0/19 FA 32768 0003.abcd.aaa1 3s 0x8009 0x8894 0x3F
Fa0/20 FA 32768 0003.abcd.aaa1 3s 0x8009 0x8893 0x3F
This shows some useful information such as the timeout and activity flags, plus it allows you to verify the LACP keys being received on each port for consistency. If you need more information, add the "detail" keyword:
2950#show lacp neighbor detail
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode
Channel group 1 neighbors
Partner's information:
Partner Partner Partner
Port System ID Port Number Age Flags
Fa0/19 40960,0003.abcd.aaa1 0x8894 11s FA
LACP Partner Partner Partner
Port Priority Oper Key Port State
32768 0x8009 0x3F
Port State Flags Decode:
Activity: Timeout: Aggregation: Synchronization:
Active Long Yes Yes
Collecting: Distributing: Defaulted: Expired:
Yes Yes No No
Partner Partner Partner
Port System ID Port Number Age Flags
Fa0/20 40960,0003.abcd.aaa1 0x8893 11s FA
LACP Partner Partner Partner
Port Priority Oper Key Port State
32768 0x8009 0x3F
Port State Flags Decode:
Activity: Timeout: Aggregation: Synchronization:
Active Long Yes Yes
Collecting: Distributing: Defaulted: Expired:
Yes Yes No No
2950#
Note that, contrary to what you might expect, the "Port State Flags Decode" sections actually refer to the local flags rather than those being sent by the remote device. As you can see, in this example the remote end is requesting fast timeouts but the local end is requesting slow.
A fairly detailed overview of the local and remote state can be seen using the "show etherchannel detail" command:
2950#show etherchannel detail
Channel-group listing:
----------------------
Group: 1
----------
Group state = L2
Ports: 2 Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol: LACP
Ports in the group:
-------------------
Port: Fa0/19
------------
Port state = Up Mstr In-Bndl
Channel group = 1 Mode = Active Gcchange = -
Port-channel = Po1 GC = - Pseudo port-channel = Po1
Port index = 0 Load = 0x00 Protocol = LACP
Flags: S - Device is sending Slow LACPDUs F - Device is sending fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.
Local information:
LACP port Admin Oper Port Port
Port Flags State Priority Key Key Number State
Fa0/19 SA bndl 32768 0x1 0x1 0x13 0x3D
Partner's information:
LACP port Oper Port Port
Port Flags Priority Dev ID Age Key Number State
Fa0/19 FA 32768 0003.abcd.aaa1 26s 0x8009 0x8894 0x3F
Age of the port in the current state: 0d:00h:00m:24s
Port: Fa0/20
------------
Port state = Up Mstr In-Bndl
Channel group = 1 Mode = Active Gcchange = -
Port-channel = Po1 GC = - Pseudo port-channel = Po1
Port index = 0 Load = 0x00 Protocol = LACP
Flags: S - Device is sending Slow LACPDUs F - Device is sending fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.
Local information:
LACP port Admin Oper Port Port
Port Flags State Priority Key Key Number State
Fa0/20 SA bndl 32768 0x1 0x1 0x14 0x3D
Partner's information:
LACP port Oper Port Port
Port Flags Priority Dev ID Age Key Number State
Fa0/20 FA 32768 0003.abcd.aaa1 0s 0x8009 0x8893 0x3F
Age of the port in the current state: 0d:00h:00m:27s
Port-channels in the group:
---------------------------
Port-channel: Po1 (Primary Aggregator)
------------
Age of the Port-channel = 0d:00h:00m:50s
Logical slot/port = 1/0 Number of ports = 2
HotStandBy port = null
Port state = Port-channel Ag-Inuse
Protocol = LACP
Ports in the Port-channel:
Index Load Port EC state No of bits
------+------+------+------------------+-----------
0 00 Fa0/19 Active 0
0 00 Fa0/20 Active 0
Time since last port bundled: 0d:00h:00m:28s Fa0/19
2950#
For more interactive troubleshooting, there are debug commands present but be careful - on my (admittedly ancient) switch, LACP debugs were only available chassis-wide and were pretty verbose. The packet level debug ("debug lacp packet") for a single LACPDU is shown below:
2950#debug lacp packet
Link Aggregation Control Protocol packet debugging is on
19w0d: LACP :lacp_bugpak: Send LACP-PDU packet via Fa0/20
19w0d: LACP : packet size: 124
19w0d: LACP: pdu: subtype: 1, version: 1
19w0d: LACP: Act: tlv:1, tlv-len:20, key:0x1, p-pri:0x8000, p:0x14, p-state:0x3D,
s-pri:0x8000, s-mac:0012.da12.abcd
19w0d: LACP: Part: tlv:2, tlv-len:20, key:0x8009, p-pri:0x8000, p:0x8893, p-state:0x3F,
s-pri:0xA000, s-mac:0003.abcd.aaa1
19w0d: LACP: col-tlv:3, col-tlv-len:16, col-max-d:0x8000
19w0d: LACP: term-tlv:0 termr-tlv-len:0
Pretty detailed, so watch your CPU!
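The fields in the "Act:" line of the debug output map onto the 20 byte Actor Information TLV of the LACPDU. As a rough illustration, the sketch below packs and then parses such a TLV with Python's struct module. The field order follows my reading of the IEEE 802.3ad layout; the function name and the sample bytes (rebuilt from the debug values above) are assumptions made for the example, not output from a real capture.

```python
import struct

# Actor Information TLV: type, length, system priority, system MAC,
# key, port priority, port number, state, then 3 reserved bytes (20 bytes total).
ACTOR_TLV = struct.Struct("!BBH6sHHHB3x")


def parse_actor_tlv(data):
    """Parse the first 20 bytes of data as an Actor Information TLV (hypothetical helper)."""
    tlv, length, s_pri, s_mac, key, p_pri, port, state = ACTOR_TLV.unpack(
        data[:ACTOR_TLV.size]
    )
    if tlv != 1 or length != 20:
        raise ValueError("not an Actor Information TLV")
    return {
        "system_priority": s_pri,
        "system_mac": ":".join(f"{b:02x}" for b in s_mac),
        "key": key,
        "port_priority": p_pri,
        "port": port,
        "state": state,
    }


# Sample TLV rebuilt from the values in the debug output (illustrative only).
sample = ACTOR_TLV.pack(
    1, 20, 0x8000, bytes.fromhex("0012da12abcd"), 0x1, 0x8000, 0x14, 0x3D
)
parsed = parse_actor_tlv(sample)
print(parsed)
```

The Partner TLV (type 2) uses the same layout, so the same struct format should apply with only the type check changed.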
A rather useful alternative is "debug lacp fsm" - again this provides a very high volume of output but is the only practical way to see detailed info on state transitions via CLI:
2950#debug lacp fsm
Link Aggregation Control Protocol fsm debugging is on
19w0d: lacp_mux Fa0/19 - mux: during state WAITING, got event 4(ready)
19w0d: @@@ lacp_mux Fa0/19 - mux: WAITING -> ATTACHED
19w0d: LACP: Fa0/19 lacp_action_mx_attached entered
19w0d: LACP: Fa0/19 Attaching mux to aggregator
19w0d: lacp_mux Fa0/19 - mux: during state ATTACHED, got event 5(in_sync)
19w0d: @@@ lacp_mux Fa0/19 - mux: ATTACHED -> COLLECTING_DISTRIBUTING
19w0d: LACP: Fa0/19 lacp_action_mx_collecting_distributing entered
19w0d: LACP: Fa0/19 Enabling collecting and distributing
19w0d: lacp_rx Fa0/19 - rx: during state CURRENT, got event 5(recv_lacpdu)
19w0d: @@@ lacp_rx Fa0/19 - rx: CURRENT -> CURRENT
19w0d: LACP: Fa0/19 lacp_action_rx_current entered
19w0d: lacp_mux Fa0/19 - mux: during state COLLECTING_DISTRIBUTING, got event 5(in_sync) (ignored)
19w0d: lacp_ptx Fa0/19 - ptx: during state FAST_PERIODIC, got event 3(pt_expired)
19w0d: @@@ lacp_ptx Fa0/19 - ptx: FAST_PERIODIC -> PERIODIC_TX
19w0d: LACP: Fa0/19 lacp_action_ptx_fast_periodic_exit entered
Very verbose indeed. Be careful with CPU load.
Frankly, if you can, it is better to troubleshoot with a port mirror and packet capture. The protocol is very good at telling you what it is doing: in addition to the periodic LACPDUs, triggered updates are generated whenever anything material, such as sync state, changes. Use a capture filter (see previous blog post "tshark one-liners" for more info) when capturing on links with a lot of user data.
The timeout value does not have to agree between peers. While it is not a recommended configuration, it is possible to bring up a LAG with one end sending every second and the other sending every 30 seconds. In this case, the end requesting fast timers will detect a silent failure in under 3 seconds while the end requesting slow timers will take up to 90 seconds to detect the same fault.
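The detection times quoted above follow from the "three missed LACPDUs" rule. A minimal sketch of the arithmetic, assuming the 1 second and 30 second intervals described for the timeout flag (the names are made up for the example):

```python
# Transmit interval, in seconds, that each timeout setting requests of the partner.
INTERVAL_S = {"fast": 1, "slow": 30}


def detection_time(requested):
    """Worst-case seconds to notice a silent failure: three missed intervals."""
    return 3 * INTERVAL_S[requested]


for setting in ("fast", "slow"):
    print(setting, detection_time(setting), "seconds")
```

This treats three full intervals as the upper bound; the actual delay depends on when the last LACPDU arrived relative to the failure.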
The configuration of sub-groups (and even whether to use sub-groups) does not have to agree between peers. The failure characteristics are often better if one end is configured with active / standby subgroups while the other is configured without any subgroups. In that case, as soon as the end with sub-groups decides to switch a new sub-group to active, the partner is already sending sync on all available links and will immediately put traffic onto the newly active sub-group.
The Alcatel-Lucent 7750 (and probably others, I've just not looked) sends an out of sync LACPDU upon detecting a LAG member go physically down. Normally that won't get through to the other end but in the event of a single fibre failure, for example, it serves to inform the partner that the link is no longer usable and should be removed from the LAG bundle. This improves failover times considerably in the case where link loss is not forwarded (tens or hundreds of milliseconds as compared to 2 - 3 seconds).
The basic concept of the LAG is that multiple physical ports are combined into one logical bundle. This provides benefits including:
- Increased capacity - traffic may be balanced across the member ports to provide increased aggregate throughput
- Link redundancy - the LAG bundle can survive the loss of one or more member links
Load Balancing Operation
One important point to bear in mind with LAGs is that traffic is not dynamically assigned across member links but rather is "sprayed" using a deterministic hash algorithm. Depending on the platform and configuration, a number of parameters may feed into the algorithm including:
- Source and/or destination MAC address
- Source and/or destination IP address
- Source and/or destination TCP / UDP port numbers
- Ingress interface
- Service ID or MPLS label
- System specific information (chassis MAC or system IP)
Because the hash is computed per flow, this approach has a number of consequences:
- Order is maintained for frames within a flow - the different member links, particularly on a WAN, may have different delay characteristics. If frames for a single flow were sprayed onto multiple member links, frames could be re-ordered in transit.
- Traffic for a single flow cannot exceed the bandwidth of a single member link.
- Traffic balance across member links is largely dependent on the diversity of the offered traffic. If the number of flows is low, some links may be saturated while others are under-utilised. The same effect can be seen if there are many flows but load is proportionally concentrated in just a few of them.
- When traffic passes through multiple hops using LAGs at each stage, polarisation can occur. This is where repeated application of the same hash function at each hop causes traffic to become unevenly distributed across the links. One link may be running at 100% and dropping excess traffic while another is almost idle. Passing system specific information into the algorithm is designed to mitigate this by ensuring that each hop hashes in a slightly different way.
- Upstream and downstream traffic for a single flow will not necessarily traverse the same link. Since the devices at each end of a LAG hash traffic independently, there is no guarantee that both legs of a conversation will pass along the same member link.
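To make the per-flow behaviour concrete, here is a minimal sketch of hash-based member selection. Real platforms use vendor-specific hash functions and inputs; CRC32 over a 5-tuple is purely an assumption for illustration, as are the function name and the addresses.

```python
import zlib


def pick_member(flow, links):
    """Choose a member link for a flow using a deterministic hash of its fields."""
    key = "|".join(str(field) for field in flow).encode()
    return links[zlib.crc32(key) % len(links)]


links = ["2/2/19", "2/2/20"]
# Illustrative flow: source IP, destination IP, protocol, source port, destination port.
flow = ("10.0.0.1", "10.0.0.2", 6, 49152, 443)

first = pick_member(flow, links)
# Every frame of the same flow hashes to the same link, so ordering is preserved.
print(first, pick_member(flow, links) == first)
```

Since the link is a pure function of the flow fields, a single flow can never use more than one member, which is why it cannot exceed the bandwidth of a single link.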
Active / Standby Operation
In addition to the "normal" load balancing mode of operation, it is also possible to configure a LAG to operate in an active/standby fashion. In fact, it is possible to combine the two modes and have an arbitrary number of links active and passing traffic while an arbitrary number remain on standby pending a fault on the active link(s).
Active / standby groups are generally used when resilience is required, but it is not desirable for the LAG to pass more than a certain amount of traffic or for the available bandwidth to vary. Typical use cases are service provider environments where the customer only pays for a certain bandwidth and corporate networks with a highly over-subscribed core.
Rules for LAGs
In order to be able to aggregate ports together, certain rules must be obeyed. Fundamentally, the member ports must be homogeneous; more specifically, every member port must agree on the following:
- Speed & Duplex - Since traffic is distributed by a simple hash, it is not possible to combine links of different speeds in the same bundle.
- Encapsulation - i.e. all ports must use the same number of 802.1Q VLAN tags. For switches this means they must all be access or all be trunk. For routers such as the 7750 this means that the Ethernet encap type (null, dot1q or qinq) must agree between members. For switches in access mode, all member ports must be in the same VLAN.
- Port type - for the 7750, the port type (access, network or hybrid) must agree across all members and the LAG itself.
- MTU - all member port MTUs must match and for Cisco switches, the same MTU must be configured on the port channel.
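The rules above amount to a consistency check across the member ports. The following sketch shows one way such a check could look; the PortConfig model, its field names and the MTU values are hypothetical and do not correspond to any vendor's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical, simplified view of the settings that must agree."""
    name: str
    speed_mbps: int
    duplex: str
    encap: str
    mtu: int


def mismatches(ports):
    """Return the names of the settings that differ between member ports."""
    fields = ("speed_mbps", "duplex", "encap", "mtu")
    return [f for f in fields if len({getattr(p, f) for p in ports}) > 1]


ports = [
    PortConfig("2/2/19", 1000, "full", "null", 1514),
    PortConfig("2/2/20", 1000, "full", "null", 9212),  # MTU deliberately different
]
print(mismatches(ports))
```

An empty result means the ports are homogeneous with respect to the checked settings; anything else names what needs fixing before the ports can be bundled.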
Static Configuration
The simplest method of building a LAG does not involve any signalling or protocols at all and simply specifies the member ports to be aggregated. Here's an example of doing that on two different platforms:
Alcatel-Lucent 7750:
A:7750# configure port 2/2/[19..20] ethernet mode access
*A:7750# configure port 2/2/[19..20] ethernet autonegotiate limited
*A:7750# configure port 2/2/[19..20] no shutdown
*A:7750# configure lag 1
*A:7750>config>lag$ mode access
*A:7750>config>lag$ port 2/2/19
*A:7750>config>lag$ port 2/2/20
*A:7750>config>lag$ no shutdown
Cisco 2950:
2950#conf t
Enter configuration commands, one per line. End with CNTL/Z.
2950(config)#int range fa0/19 - 20
2950(config-if-range)#switchport mode access
2950(config-if-range)#channel-group 1 mode on
Creating a port-channel interface Port-channel 1
2950(config-if-range)#no shut
In this setup, as soon as a port becomes physically up it becomes a member of the LAG bundle. The only, fairly minor, advantage of this is that the configuration is very simple. The disadvantage is that there is no method to detect any kind of cabling or configuration errors.
Note: The lack of any kind of misconfiguration detection makes static LAGs very dangerous to deploy in production networks.
LACP
LACP is the standards based protocol used to signal LAGs. It detects and protects the network from a variety of misconfiguration and fault conditions, ensuring that links are only aggregated into a bundle if they are consistently configured and cabled.
LACP must be configured in one of two modes:
- Active mode - the device immediately sends LACP messages (LACPDUs) when the port comes up and must reach an agreement with the attached port before traffic will pass.
- Passive mode - the device does not generate LACPDUs until it receives them. If no LACPDUs are received then the port aggregates as though statically configured. If LACPDUs are received then an agreement must be reached with the peer before traffic will pass.
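One consequence of the two modes is that LACPDUs only flow if at least one end is active. A small sketch of that rule, with a made-up function name:

```python
def lacpdus_exchanged(mode_a, mode_b):
    """Return True if at least one end will start sending LACPDUs."""
    valid = {"active", "passive"}
    if mode_a not in valid or mode_b not in valid:
        raise ValueError("mode must be 'active' or 'passive'")
    # A passive end only answers, so somebody has to be active to start the exchange.
    return "active" in (mode_a, mode_b)


for pair in (("active", "active"), ("active", "passive"), ("passive", "passive")):
    print(pair, lacpdus_exchanged(*pair))
```

With both ends passive neither side ever speaks first, so no negotiation takes place.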
Minimal LACP configuration
The minimal configuration is still very straightforward, requiring little additional CLI:
Alcatel-Lucent 7750:
A:7750# configure port 2/2/[19..20] ethernet mode access
*A:7750# configure port 2/2/[19..20] ethernet autonegotiate limited
*A:7750# configure port 2/2/[19..20] no shutdown
*A:7750# configure lag 1
*A:7750>config>lag$ mode access
*A:7750>config>lag$ lacp active
*A:7750>config>lag$ port 2/2/19
*A:7750>config>lag$ port 2/2/20
*A:7750>config>lag$ no shutdown
Cisco 2950:
2950#conf t
Enter configuration commands, one per line. End with CNTL/Z.
2950(config)#int range fa0/19 - 20
2950(config-if-range)#switchport mode access
2950(config-if-range)#channel-group 1 mode active
Creating a port-channel interface Port-channel 1
2950(config-if-range)#no shut
There is, of course, a lot more going on behind the scenes but most parameters assume default values which are perfectly acceptable for most situations.
LACP Terms and Parameters
There are a number of LACP-specific terms and parameter names that must be understood in order to make sense of LACP debug output and packet traces.
The first and arguably most fundamental concept is that of actors and partners. One of the really nice debugging features of LACP is that it echoes the parameters it receives back to the sender. To avoid confusion, the term actor is used to designate the parameters and flags pertaining to the sending node, while the term partner is used to designate the sending node's view of its peer's parameters and flags.
Per System:
Each network device has a LACP System ID. This is a 48 bit value which generally defaults to the chassis MAC address. The system ID is sent within every LACPDU and makes it easy to check that a LAG goes to the device you expect.
Each device also has a 16 bit LACP System Priority. The system priority is used to decide which system's port priorities are used to decide active / standby in the event that the two peers disagree. The numerically lowest priority value wins.
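As a rough sketch of that comparison: the system with the numerically lower priority is the one whose port priorities count. Breaking ties on the system ID goes beyond what is stated here and reflects my understanding of the standard, so treat it as an assumption. The values are the ones from the 7750 "show lag 1 detail" output in the troubleshooting section.

```python
def controlling_system(sys_a, sys_b):
    """Each argument is (system_priority, system_id). The lowest value wins."""
    def sort_key(system):
        priority, mac = system
        # Assumption: equal priorities are resolved by the lower system ID.
        return (priority, bytes.fromhex(mac.replace(":", "")))
    return min(sys_a, sys_b, key=sort_key)


local = (40960, "00:0a:aa:2e:af:ea")
partner = (32768, "00:12:da:ab:fe:21")
print(controlling_system(local, partner))
```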
Per LAG:
Each LAG on a system will have a unique 16 bit LACP key, the purpose of which is to differentiate one LAG from another within the protocol. This number is locally significant and may or may not match between peers.
The main purpose of the LACP key is to allow a system to detect cabling faults - if different LACP keys are received on members of the same LAG then we are connected to two different LAGs at the far end and, obviously, aggregating those together would be a bad idea.
LACP Flags:
The following flags are used to communicate state between systems:
- Activity - Set to indicate LACP active mode, cleared to indicate passive mode
- Timeout - Set to indicate the device is requesting a fast (1s) transmit interval of its partner, cleared to indicate that a slow (30s) transmit interval is being requested.
- Aggregation - Set to indicate that the port is configured for aggregation (typically always set)
- Synchronisation - Set to indicate that the system is ready and willing to use this link in the bundle to carry traffic. Cleared to indicate the link is not usable or is in standby mode.
- Collecting - Set to indicate that traffic received on this interface will be processed by the device. Cleared otherwise.
- Distributing - Set to indicate that the device is using this link to transmit traffic. Cleared otherwise.
- Expired - Set to indicate that no LACPDUs have been received by the device during the past 3 intervals. Cleared when at least one LACPDU has been received within the past three intervals.
- Defaulted - When set, indicates that no LACPDUs have been received during the past 6 intervals. Cleared when at least one LACPDU has been received within the past 6 intervals. Once the defaulted flag transitions to set, any stored partner information is flushed.
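These eight flags travel as a single state byte, which is what CLI output prints as values such as 0x3D or 0x3F. The sketch below decodes that byte, assuming the bit order defined by IEEE 802.3ad (Activity in bit 0 through Expired in bit 7); the helper name is made up for the example.

```python
# Flag names in bit order, least significant bit first (per IEEE 802.3ad).
FLAG_BITS = [
    "Activity", "Timeout", "Aggregation", "Synchronisation",
    "Collecting", "Distributing", "Defaulted", "Expired",
]


def decode_state(state):
    """Return a dict mapping each flag name to True or False for a state byte."""
    return {name: bool((state >> bit) & 1) for bit, name in enumerate(FLAG_BITS)}


for value in (0x3D, 0x3F):
    set_flags = [name for name, on in decode_state(value).items() if on]
    print(hex(value), set_flags)
```

The only difference between 0x3D and 0x3F is the Timeout bit, which is how the same port can appear as "SA" (slow, active) at one end and "FA" (fast, active) at the other.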
Bringing Links into Service
Assuming that the local configuration is consistent and LACPDUs are being exchanged across the link, the following flow chart roughly describes how to decide the value of the synchronisation, distributing and collecting flags.
If by the end your collecting / distributing flags are set then the link will be used for sending and receiving traffic. If not, it won't.
LACP Fault Detection
LACP can detect almost every conceivable patching error and will refuse to aggregate when that would be inappropriate. Following are a number of improper LAG topologies along with a description of how LACP detects and protects the network against them.
Split LAG
In the above scenario, LACP inspects the system ID field of incoming LACPDUs and refuses to aggregate any links whose system ID does not match that of the existing member(s).
Crossed LAGs
In the above scenario, LACP detects the cabling fault by inspecting the key ID on the incoming LACPDUs and refuses to aggregate any links whose key does not match that of the existing member(s).
Looped LAG
In the above scenario, LACP detects the cabling fault by inspecting the system ID and key of the incoming LACPDU. Some systems (e.g. Alcatel-Lucent 7750) allow different LAGs to be interconnected on the same chassis, however it is never allowed for two member ports of the same LAG to be connected.
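The three checks can be summarised in a few lines. This is a simplified sketch rather than the actual selection logic of any implementation: the function name and tuple layout are assumptions, and the sample values are taken from the 7750 output in the troubleshooting section.

```python
def classify(local, reference, received):
    """Classify a candidate link.

    Each argument is a (system_id, key) tuple: local is this LAG's own
    identity, reference is what existing members have learned from the
    partner, and received is what arrived on the candidate link.
    """
    if received == local:
        return "looped"   # our own LAG is plugged back into itself
    if received[0] != reference[0]:
        return "split"    # a different system at the far end
    if received[1] != reference[1]:
        return "crossed"  # same system, but a different LAG
    return "ok"


local = ("00:0a:aa:2e:af:ea", 32777)
reference = ("00:12:da:ab:fe:21", 1)
print(classify(local, reference, ("00:12:da:ab:fe:21", 1)))
print(classify(local, reference, ("00:12:da:ab:fe:21", 2)))
```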
Unidirectional Link Failure
In the scenario above, a unidirectional link failure has occurred so that LACPDUs are being lost in the direction A to B, but the ports remain physically up. LACPDUs that are lost are indicated in grey. In this situation, system B responds to the loss of three consecutive LACPDUs by clearing its synchronisation, collecting and distributing flags and setting its expired flag. System A responds immediately to the loss of sync by clearing its synchronisation, collecting and distributing flags.
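The reaction of the two systems can be modelled very roughly as follows. This sketch only mirrors the description above (three missed LACPDUs at B, an immediate reaction at A) and ignores the rest of the state machines; the function names are made up for the example.

```python
def system_b_flags(missed_lacpdus):
    """Flags sent by B, which stops hearing from A."""
    healthy = missed_lacpdus < 3
    return {
        "sync": healthy,
        "collecting": healthy,
        "distributing": healthy,
        "expired": not healthy,
    }


def system_a_flags(partner_sync):
    """Flags sent by A, which still receives B's LACPDUs."""
    return {
        "sync": partner_sync,
        "collecting": partner_sync,
        "distributing": partner_sync,
    }


for missed in range(5):
    b = system_b_flags(missed)
    a = system_a_flags(b["sync"])
    print(missed, "B:", b, "A:", a)
```

Once B has missed three LACPDUs both ends stop using the link, even though the ports themselves are still physically up.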
LACP Troubleshooting
The most important part of troubleshooting LAGs is to properly understand the meaning and purpose of all the parameters, particularly the flags, before you begin. After that point, it is just a matter of knowing what CLI commands will show you the required information.
I recommend starting with the basics and working up:
- Are the member ports physically up?
- Are all member ports configured consistently (see LAG Rules above)?
- Can you be sure the topology is as you expect?
  - Use LLDP or CDP if available
  - Use system ID, key and port ID values from the LACPDUs otherwise
- Determine which end is unhappy (hint, it won't be sending sync).
- Verify that messages are passing bi-directionally and are not being blocked by any kind of filter (hint, check that the partner details are populated on LACPDUs)
Alcatel-Lucent 7750
To get almost all the information you could ever want, use "show lag [number] detail":
A:7750# show lag 1 detail
===============================================================================
LAG Details
===============================================================================
Description : N/A
-------------------------------------------------------------------------------
Details
-------------------------------------------------------------------------------
Lag-id : 1 Mode : access
Adm : up Opr : up
Thres. Exceeded Cnt : 2 Port Threshold : 0
Thres. Last Cleared : 12/21/2012 10:59:59 Threshold Action : down
Dynamic Cost : false Encap Type : null
Configured Address : 00:0a:aa:2e:af:ea Lag-IfIndex : 1342177281
Hardware Address : 00:0a:aa:2e:af:ea Adapt Qos (access) : distribute
Hold-time Down : 0.0 sec Port Type : standard
Per FP Ing Queuing : disabled
LACP : enabled Mode : active
LACP Transmit Intvl : fast LACP xmit stdby : enabled
Selection Criteria : highest-count Slave-to-partner : disabled
Number of sub-groups: 1 Forced : -
System Id : 00:0a:aa:2e:af:ea System Priority : 40960
Admin Key : 32777 Oper Key : 32777
Prtr System Id : 00:12:da:ab:fe:21 Prtr System Priority : 32768
Prtr Oper Key : 1
Standby Signaling : lacp
-------------------------------------------------------------------------------
Port-id Adm Act/Stdby Opr Primary Sub-group Forced Prio
-------------------------------------------------------------------------------
2/2/19 up active up yes 1 - 32768
2/2/20 up active up 1 - 32768
-------------------------------------------------------------------------------
Port-id Role Exp Def Dist Col Syn Aggr Timeout Activity
-------------------------------------------------------------------------------
2/2/19 actor No No Yes Yes Yes Yes Yes Yes
2/2/19 partner No No Yes Yes Yes Yes No Yes
2/2/20 actor No No Yes Yes Yes Yes Yes Yes
2/2/20 partner No No Yes Yes Yes Yes No Yes
===============================================================================
A:7750#
In this output you can see the local and remote flags, system IDs, system priorities and keys in use, whether the underlying ports are functioning and, if sub-groups are in use, whether local ports are active or standby. Note also that it shows you which port in the LAG is primary - if you want to edit anything such as MTU, QoS, etc, then you need to do it on the primary port. Your changes will then be pushed to the other ports automatically.
If you need to verify that LACPDUs are being received, you can use "debug lag [lag-id number] [port port-id] pkt". This will produce a debug message for every LACPDU sent or received, optionally filtered by LAG or by individual port:
A:7750# debug lag lag-id 1 pkt
980 2012/12/21 21:23:56.73 GMT MINOR: DEBUG #2001 Base LAG
"LAG: PKT
Xmit LACPDU on PortId 2/2/19"
981 2012/12/21 21:23:56.80 GMT MINOR: DEBUG #2001 Base LAG
"LAG: PKT
LACPDU rcvd on PortId 2/2/19"
A little light on detail, admittedly, but enough to prove whether they are arriving or not.
For more interactive debugging, a better choice might be "debug lag [lag-id number] [port port-id] sm" to indicate what is happening to the state machine for a given lag or port:
A:7750# debug lag lag-id 1 sm
852 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1: partner oper state bits changed on member 2/2/20 : [sync FALSE -> TRUE]
"
853 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :triggerMap 0 -> e after Rx SM"
854 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :running selection logic"
855 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :MUX SM ATTACHED->COLLECTING_DISTRIBUTING"
The above is quite verbose as it generates state machine transitions every time a LACPDU is sent or received, but it is really the best way to troubleshoot state transitions.
Cisco 2950
There are a few LACP related show commands on IOS and the useful information is spread between them. Starting at the simple end, a high level overview of the LAGs on the system can be obtained using the command "show etherchannel":
2950#show etherchannel
Channel-group listing:
----------------------
Group: 1
----------
Group state = L2
Ports: 2 Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol: LACP
2950#
To find the local LACP system ID, use "show lacp sys-id":
2950#show lacp sys-id
32768,0012.da12.abcd
Note that the part before the comma is actually the system priority.
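As a small illustration of that format, here is a Python sketch that splits the "show lacp sys-id" string into its two parts. The function name parse_sys_id is made up for this example, and the dotted MAC notation is assumed to be exactly as shown in the output above.

```python
def parse_sys_id(text):
    """Split 'priority,xxxx.xxxx.xxxx' into (priority, colon-separated MAC)."""
    priority_str, mac_dotted = text.strip().split(",")
    hex_digits = mac_dotted.replace(".", "").lower()
    if len(hex_digits) != 12:
        raise ValueError(f"unexpected MAC format: {mac_dotted!r}")
    int(hex_digits, 16)  # raises ValueError if the MAC is not valid hex
    mac = ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))
    return int(priority_str), mac


# The value printed by the 2950 above: system priority first, then the MAC.
print(parse_sys_id("32768,0012.da12.abcd"))
```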
Useful information about the remote device (our partner) can be found using "show lacp neighbor":
2950#show lacp neighbor
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode
Channel group 1 neighbors
Partner's information:
                  LACP port                        Oper    Port     Port
Port      Flags   Priority  Dev ID          Age    Key     Number   State
Fa0/19    FA      32768     0003.abcd.aaa1    3s   0x8009  0x8894   0x3F
Fa0/20    FA      32768     0003.abcd.aaa1    3s   0x8009  0x8893   0x3F
This shows some useful information such as the timeout and activity flags, plus it allows you to verify the LACP keys being received on each port for consistency. If you need more information, add the "detail" keyword:
2950#show lacp neighbor detail
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode
Channel group 1 neighbors
Partner's information:
Partner Partner Partner
Port System ID Port Number Age Flags
Fa0/19 40960,0003.abcd.aaa1 0x8894 11s FA
LACP Partner Partner Partner
Port Priority Oper Key Port State
32768 0x8009 0x3F
Port State Flags Decode:
Activity: Timeout: Aggregation: Synchronization:
Active Long Yes Yes
Collecting: Distributing: Defaulted: Expired:
Yes Yes No No
Partner Partner Partner
Port System ID Port Number Age Flags
Fa0/20 40960,0003.abcd.aaa1 0x8893 11s FA
LACP Partner Partner Partner
Port Priority Oper Key Port State
32768 0x8009 0x3F
Port State Flags Decode:
Activity: Timeout: Aggregation: Synchronization:
Active Long Yes Yes
Collecting: Distributing: Defaulted: Expired:
Yes Yes No No
2950#
Note that, contrary to what you might expect, the "Port State Flags Decode" sections actually refer to the local flags rather than those being sent by the remote device. As you can see, in this example the remote end is requesting fast timeouts (flags "FA", port state 0x3F) but the local end is requesting slow (decoded as "Long").
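The "Port State" hex values in these outputs are the LACP state octet, one bit per flag. Here is a minimal Python sketch of the decoding, using the bit order defined by the standard for the actor and partner state fields. The names STATE_BITS and decode_state are made up for this example. The two sample values are the partner's 0x3F shown above and the local 0x3D that appears in the "show etherchannel detail" output.

```python
# Bit masks of the LACP state octet, least significant bit first
# (Actor_State / Partner_State in IEEE 802.3ad / 802.1AX).
STATE_BITS = (
    (0x01, "activity"),         # 1 = active, 0 = passive
    (0x02, "timeout"),          # 1 = short (fast), 0 = long (slow)
    (0x04, "aggregation"),
    (0x08, "synchronization"),
    (0x10, "collecting"),
    (0x20, "distributing"),
    (0x40, "defaulted"),
    (0x80, "expired"),
)


def decode_state(value):
    """Return a dict mapping each flag name to True/False."""
    return {name: bool(value & mask) for mask, name in STATE_BITS}


for label, value in (("local", 0x3D), ("partner", 0x3F)):
    flags = decode_state(value)
    timeout = "short" if flags["timeout"] else "long"
    set_flags = [name for name, on in flags.items() if on]
    print(f"{label} 0x{value:02X}: timeout={timeout}, set={set_flags}")
```

The only difference between the two values is the timeout bit, which matches the "S" and "F" flags in the show output.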
A fairly detailed overview of the local and remote state can be seen using the "show etherchannel detail" command:
2950#show etherchannel detail
Channel-group listing:
----------------------
Group: 1
----------
Group state = L2
Ports: 2 Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol: LACP
Ports in the group:
-------------------
Port: Fa0/19
------------
Port state = Up Mstr In-Bndl
Channel group = 1 Mode = Active Gcchange = -
Port-channel = Po1 GC = - Pseudo port-channel = Po1
Port index = 0 Load = 0x00 Protocol = LACP
Flags: S - Device is sending Slow LACPDUs F - Device is sending fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.
Local information:
LACP port Admin Oper Port Port
Port Flags State Priority Key Key Number State
Fa0/19 SA bndl 32768 0x1 0x1 0x13 0x3D
Partner's information:
LACP port Oper Port Port
Port Flags Priority Dev ID Age Key Number State
Fa0/19 FA 32768 0003.abcd.aaa1 26s 0x8009 0x8894 0x3F
Age of the port in the current state: 0d:00h:00m:24s
Port: Fa0/20
------------
Port state = Up Mstr In-Bndl
Channel group = 1 Mode = Active Gcchange = -
Port-channel = Po1 GC = - Pseudo port-channel = Po1
Port index = 0 Load = 0x00 Protocol = LACP
Flags: S - Device is sending Slow LACPDUs F - Device is sending fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.
Local information:
LACP port Admin Oper Port Port
Port Flags State Priority Key Key Number State
Fa0/20 SA bndl 32768 0x1 0x1 0x14 0x3D
Partner's information:
LACP port Oper Port Port
Port Flags Priority Dev ID Age Key Number State
Fa0/20 FA 32768 0003.abcd.aaa1 0s 0x8009 0x8893 0x3F
Age of the port in the current state: 0d:00h:00m:27s
Port-channels in the group:
---------------------------
Port-channel: Po1 (Primary Aggregator)
------------
Age of the Port-channel = 0d:00h:00m:50s
Logical slot/port = 1/0 Number of ports = 2
HotStandBy port = null
Port state = Port-channel Ag-Inuse
Protocol = LACP
Ports in the Port-channel:
Index Load Port EC state No of bits
------+------+------+------------------+-----------
0 00 Fa0/19 Active 0
0 00 Fa0/20 Active 0
Time since last port bundled: 0d:00h:00m:28s Fa0/19
2950#
For more interactive troubleshooting, there are debug commands present but be careful - on my (admittedly ancient) switch, LACP debugs were only available chassis-wide and were pretty verbose. The packet level debug ("debug lacp packet") for a single LACPDU is shown below:
2950#debug lacp packet
Link Aggregation Control Protocol packet debugging is on
19w0d: LACP :lacp_bugpak: Send LACP-PDU packet via Fa0/20
19w0d: LACP : packet size: 124
19w0d: LACP: pdu: subtype: 1, version: 1
19w0d: LACP: Act: tlv:1, tlv-len:20, key:0x1, p-pri:0x8000, p:0x14, p-state:0x3D,
s-pri:0x8000, s-mac:0012.da12.abcd
19w0d: LACP: Part: tlv:2, tlv-len:20, key:0x8009, p-pri:0x8000, p:0x8893, p-state:0x3F,
s-pri:0xA000, s-mac:0003.abcd.aaa1
19w0d: LACP: col-tlv:3, col-tlv-len:16, col-max-d:0x8000
19w0d: LACP: term-tlv:0 termr-tlv-len:0
Pretty detailed, so watch your CPU!
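The "Act:" line of that debug maps directly onto the 20-byte Actor Information TLV of the LACPDU. Here is a rough Python sketch of that layout, with field order as defined in the standard. The names ACTOR_TLV and parse_actor_tlv are made up for this example, and the sample bytes are rebuilt from the values in the debug output rather than captured from the wire.

```python
import struct

# Actor Information TLV: type, length, system priority, system MAC, key,
# port priority, port number, state, 3 reserved octets (20 bytes in total).
ACTOR_TLV = struct.Struct("!BBH6sHHHB3s")


def parse_actor_tlv(data):
    """Decode the first 20 bytes of data as an Actor Information TLV."""
    (tlv_type, length, sys_pri, sys_mac, key,
     port_pri, port, state, _reserved) = ACTOR_TLV.unpack(data[:ACTOR_TLV.size])
    if tlv_type != 1 or length != 20:
        raise ValueError("not an actor information TLV")
    return {
        "system_priority": sys_pri,
        "system_mac": sys_mac.hex(":"),
        "key": key,
        "port_priority": port_pri,
        "port": port,
        "state": state,
    }


# Rebuild the actor TLV from the values in the debug output above.
sample = ACTOR_TLV.pack(1, 20, 0x8000, bytes.fromhex("0012da12abcd"),
                        0x1, 0x8000, 0x14, 0x3D, b"\x00\x00\x00")
print(parse_actor_tlv(sample))
```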
A rather useful alternative is "debug lacp fsm" - again this provides a very high volume of output but is the only practical way to see detailed info on state transitions via CLI:
2950#debug lacp fsm
Link Aggregation Control Protocol fsm debugging is on
19w0d: lacp_mux Fa0/19 - mux: during state WAITING, got event 4(ready)
19w0d: @@@ lacp_mux Fa0/19 - mux: WAITING -> ATTACHED
19w0d: LACP: Fa0/19 lacp_action_mx_attached entered
19w0d: LACP: Fa0/19 Attaching mux to aggregator
19w0d: lacp_mux Fa0/19 - mux: during state ATTACHED, got event 5(in_sync)
19w0d: @@@ lacp_mux Fa0/19 - mux: ATTACHED -> COLLECTING_DISTRIBUTING
19w0d: LACP: Fa0/19 lacp_action_mx_collecting_distributing entered
19w0d: LACP: Fa0/19 Enabling collecting and distributing
19w0d: lacp_rx Fa0/19 - rx: during state CURRENT, got event 5(recv_lacpdu)
19w0d: @@@ lacp_rx Fa0/19 - rx: CURRENT -> CURRENT
19w0d: LACP: Fa0/19 lacp_action_rx_current entered
19w0d: lacp_mux Fa0/19 - mux: during state COLLECTING_DISTRIBUTING, got event 5(in_sync) (ignored)
19w0d: lacp_ptx Fa0/19 - ptx: during state FAST_PERIODIC, got event 3(pt_expired)
19w0d: @@@ lacp_ptx Fa0/19 - ptx: FAST_PERIODIC -> PERIODIC_TX
19w0d: LACP: Fa0/19 lacp_action_ptx_fast_periodic_exit entered
Very verbose indeed. Be careful with CPU load.
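If you log the session to a file, the state transitions can be picked out of the noise afterwards. Here is a minimal Python sketch. The regular expression is an assumption based only on the "@@@" lines in the sample above, so other IOS versions may format them differently, and the names TRANSITION and transitions are made up for this example.

```python
import re

# Matches lines such as:
#   19w0d: @@@ lacp_mux Fa0/19 - mux: WAITING -> ATTACHED
TRANSITION = re.compile(r"@@@ (lacp_\w+) (\S+) - \w+: (\w+) -> (\w+)")


def transitions(lines):
    """Yield (machine, port, old_state, new_state) for each transition line."""
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            yield match.groups()


sample_log = [
    "19w0d: lacp_mux Fa0/19 - mux: during state WAITING, got event 4(ready)",
    "19w0d: @@@ lacp_mux Fa0/19 - mux: WAITING -> ATTACHED",
    "19w0d: LACP: Fa0/19 Attaching mux to aggregator",
    "19w0d: @@@ lacp_mux Fa0/19 - mux: ATTACHED -> COLLECTING_DISTRIBUTING",
    "19w0d: @@@ lacp_ptx Fa0/19 - ptx: FAST_PERIODIC -> PERIODIC_TX",
]

for machine, port, old, new in transitions(sample_log):
    print(f"{port} {machine}: {old} -> {new}")
```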
Frankly, if you can, it is better to troubleshoot with a port mirror and packet capture. The protocol is very good at telling you what it is doing: in addition to the periodic LACPDUs, triggered updates are generated whenever anything material, such as sync state, changes. Use a capture filter (see previous blog post "tshark one-liners" for more info) when capturing on links with a lot of user data.
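For picking LACPDUs out of a capture, the distinguishing fields are the Slow Protocols destination MAC (01:80:c2:00:00:02), EtherType 0x8809 and subtype 1 (the same "subtype: 1" seen in the packet debug). Here is a minimal Python sketch of that check on a raw frame. It assumes an untagged Ethernet frame, and the function name is_lacpdu is made up for this example. With tcpdump or tshark, the equivalent pcap capture filter is "ether proto 0x8809".

```python
import struct

LACP_DST_MAC = bytes.fromhex("0180c2000002")  # Slow Protocols multicast address
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 1


def is_lacpdu(frame):
    """Return True if an untagged Ethernet frame carries a LACPDU."""
    if len(frame) < 15:
        return False
    dst_mac = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    subtype = frame[14]
    return (dst_mac == LACP_DST_MAC
            and ethertype == SLOW_PROTOCOLS_ETHERTYPE
            and subtype == LACP_SUBTYPE)


# Hand-built header for illustration: dst MAC, src MAC, EtherType, subtype.
header = LACP_DST_MAC + bytes.fromhex("0012da12abcd") + b"\x88\x09" + b"\x01"
print(is_lacpdu(header + bytes(109)))
```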
Oddities
The value of the timeout flag sent by a device indicates the interval at which it expects the partner to send LACPDUs. The partner then should honour the request and send at the indicated interval. The timeout value does not have to agree between peers. While it is not a recommended configuration, it is possible to bring up a LAG with one end sending every second and the other sending every 30 seconds. In this case, the end requesting fast timers will detect a silent failure in under 3 seconds while the end requesting slow timers will take up to 90 seconds to detect the same fault.
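The arithmetic behind those figures is simply three missed LACPDUs at whatever interval the local end has requested. Here is a minimal Python sketch. The constant and function names are made up for this example, and the 1 second, 30 second and 3x values are the ones used in the paragraph above.

```python
FAST_INTERVAL_S = 1      # partner sends every second when we request fast
SLOW_INTERVAL_S = 30     # partner sends every 30 seconds when we request slow
TIMEOUT_MULTIPLIER = 3   # a link is timed out after three missed LACPDUs


def detection_time(requests_fast):
    """Worst-case seconds for an end to notice its partner has gone silent."""
    interval = FAST_INTERVAL_S if requests_fast else SLOW_INTERVAL_S
    return interval * TIMEOUT_MULTIPLIER


# Mismatched peers: each end's detection time depends only on its own request.
print("end requesting fast:", detection_time(True), "s")
print("end requesting slow:", detection_time(False), "s")
```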
The configuration of sub-groups (and even whether to use sub-groups) does not have to agree between peers. The failure characteristics are often better if one end is configured with active / standby subgroups while the other is configured without any subgroups. In that case, as soon as the end with sub-groups decides to switch a new sub-group to active, the partner is already sending sync on all available links and will immediately put traffic onto the newly active sub-group.
The Alcatel-Lucent 7750 (and probably others, I've just not looked) sends an out-of-sync LACPDU upon detecting that a LAG member has gone physically down. Normally that won't get through to the other end but in the event of a single fibre failure, for example, it serves to inform the partner that the link is no longer usable and should be removed from the LAG bundle. This improves failover times considerably in the case where link loss is not forwarded (tens or hundreds of milliseconds as compared to 2 - 3 seconds).
Finally
If you got this far, you should probably download the IEEE 802.1AX-2008 standard.
lacp pkt format:
Ref: http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=mmr_sf-EN_US000005384
what is MC lag :
MC-LAG, or Multi-Chassis Link Aggregation Group, is a type of LAG whose constituent ports terminate on separate chassis, thereby providing node-level redundancy. Unlike link aggregation in general, MC-LAG is not covered by the IEEE standard, and its implementation varies by vendor. Cisco’s vPC is a good example of an MC-LAG implementation. The real challenge with MC-LAG is to maintain a consistent control plane state across the LAG setup, which is why the various multi-chassis mechanisms insist on countermeasures such as peer links or out-of-band connectivity between the redundant chassis.
Ref: https://thenetworkway.wordpress.com/2015/05/01/an-overview-of-link-aggregation-and-lacp/
http://www.thomas-krenn.com/en/wiki/Link_Aggregation_and_LACP_basics
http://docs.oracle.com/cd/E19253-01/816-4554/fpjvl/index.html
from IEEE:
http://www.ieee802.org/3/hssg/public/apr07/frazier_01_0407.pdf
http://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf