Proxmox: Packet loss on bonded NICs in LACP mode
Got a bit of a strange problem. I have a machine running Proxmox 5.3 that has, among its hardware, a 4-port Intel Gigabit NIC (PCIe) in addition to a fifth Gigabit Ethernet port on the motherboard.
The machine is configured so that the onboard NIC is the management interface, and the 4 Gigabit NICs are bonded together with LACP and connected to an HP ProCurve 1810G managed switch - all the VMs and containers on the box get network connectivity through the bonded NIC. The switch supports LACP, and I have configured a trunk on it for those 4 ports.
Everything seems to work fine, or so I thought.
Over the weekend I installed netdata on the Proxmox host, and now I'm getting continual alarms about packet loss on bond0 (the 4 bonded NICs). I'm a little perplexed as to why.
Looking at the statistics for bond0, it seems that RX packets are getting dropped with reasonable frequency (currently showing ~160 RX packets dropped in the last 10 minutes - no TX packets seem to get dropped).
Interface output is below; you'll note that the bridge interface to the VMs has no dropped packets - it's happening only on bond0 and its slaves. The MTU is set to 9000 (jumbo frames are enabled on the switch), but I was still seeing this issue with the MTU at 1500. enp12s0 is the management NIC; the other 4 NICs are the bond slaves.
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 347300 bytes 146689725 (139.8 MiB)
RX errors 0 dropped 11218 overruns 0 frame 0
TX packets 338459 bytes 132985798 (126.8 MiB)
TX errors 0 dropped 2 overruns 0 carrier 0 collisions 0
enp12s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.3 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::7285:c2ff:fe67:19b9 prefixlen 64 scopeid 0x20<link>
ether 70:85:c2:67:19:b9 txqueuelen 1000 (Ethernet)
RX packets 25416597 bytes 36117733348 (33.6 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16850795 bytes 21472508786 (19.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 225363 bytes 113059352 (107.8 MiB)
RX errors 0 dropped 2805 overruns 0 frame 0
TX packets 15162 bytes 2367657 (2.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 25499 bytes 6988254 (6.6 MiB)
RX errors 0 dropped 2805 overruns 0 frame 0
TX packets 263442 bytes 123302293 (117.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp4s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 33208 bytes 11681537 (11.1 MiB)
RX errors 0 dropped 2804 overruns 0 frame 0
TX packets 42729 bytes 2258949 (2.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp4s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 63230 bytes 14960582 (14.2 MiB)
RX errors 0 dropped 2804 overruns 0 frame 0
TX packets 17126 bytes 5056899 (4.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vmbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.1.4 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::21b:21ff:fec7:40d8 prefixlen 64 scopeid 0x20<link>
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 54616 bytes 5852177 (5.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 757 bytes 61270 (59.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
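For reference, a simple loop like the one below can sample the RX-drop counters over time to see how fast they climb. This is just a sketch using the kernel's sysfs statistics; the interface names are the ones from the output above.

while sleep 10; do
    for i in bond0 enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1; do
        printf '%s %s rx_dropped=%s\n' "$(date +%T)" "$i" \
            "$(cat /sys/class/net/$i/statistics/rx_dropped)"
    done
done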
Initially suspecting some kind of buffer issue, I did some tweaking in sysctl to make sure the buffer sizes were adequate. The sysctl tweaks can be found here (they did not appear to make any difference):
https://paste.linux.community/view/3b5f2b63
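(The paste isn't reproduced here, but the tweaks were of the usual buffer-size/backlog variety - roughly along these lines, with illustrative values only:)

# Illustrative only - not the exact contents of the paste above
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.netdev_max_backlog=5000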
Network config is:
auto lo
iface lo inet loopback

auto enp12s0
iface enp12s0 inet static
    address 192.168.1.3
    netmask 255.255.255.0

iface enp3s0f0 inet manual
iface enp3s0f1 inet manual
iface enp4s0f0 inet manual
iface enp4s0f1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1
    bond-miimon 100
    bond-mode 802.3ad
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.4
    netmask 255.255.255.0
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
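(For completeness, the LACP negotiation itself can be confirmed through the bonding driver's proc interface - a quick sketch of the check, output not included here:)

# Overall bond state: mode, LACP info, aggregator details
cat /proc/net/bonding/bond0
# Per-slave details (MII status, aggregator ID, link failure count)
grep -A6 'Slave Interface' /proc/net/bonding/bond0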
Troubleshooting steps I took:
a) sysctl tweaks (as linked above) - no change
b) Increased the MTU and enabled jumbo frames on the switch (no change)
c) Reset the switch and recreated the LACP trunk (no change)
Any ideas on what I should try next? I'm starting to think there is something I don't understand about the NIC teaming. As I said, everything seems to work fine, but the high packet loss concerns me a little.
Other machines on the network that are connected to the switch do not have this issue (the 5th NIC on the machine is fine, too).
linux networking network-adapter bonding proxmox
asked Feb 11 at 10:18 – NOP
1 Answer

answered Feb 11 at 10:32 – Eugen Rieck
I have seen this before: HP switches seem to sometimes send broadcast packets to all members of an LACP trunk. The kernel then sees these packets as duplicates and drops them (apart from the first one to arrive, of course).
While this is not elegant, it does not seem to cause problems in real life. You can verify whether this is the effect by deliberately sending a large number of broadcast packets and checking whether that lines up with the drop statistics.
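A minimal sketch of that verification (the broadcast address assumes the 192.168.1.0/24 network from the question; adjust to suit):

# Terminal 1: generate a burst of broadcast traffic
# (intervals below 0.2 s require root)
ping -b -c 500 -i 0.05 192.168.1.255 >/dev/null
# Terminal 2: watch whether the per-slave RX drop counters climb in step
watch -n 1 'for i in enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1; do
    echo -n "$i: "; cat /sys/class/net/$i/statistics/rx_dropped; done'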
Hmm, I did suspect something like this. That would make sense (from the math, given that there's basically the same number of dropped packets across each NIC). Is there a way to tell exactly what's getting dropped? I ran dropwatch on the machine, but I couldn't interpret the output well enough to conclude one way or the other.
– NOP, Feb 11 at 10:41
My diagnostics were rather basic: increase the rate of broadcast packets and see if the drop rate increases accordingly. I didn't investigate much further, as everything was working fine with no loss of payload packets.
– Eugen Rieck, Feb 11 at 10:59
I did some more digging around. I wrote a script to fire out a whole bunch of broadcast packets; at the same time, I ran tcpdump on all 4 individual NICs plus the bond interface. While the broadcast traffic was mostly appearing on only 1 NIC, the odd packet did end up hitting another NIC. It didn't seem to increase the rate of dropped packets, though. The box has been running for a couple of hours, and with my several VMs and containers running plus me pelting it with broadcast packets, bond0 has dropped ~700 of 1437352 RX packets. This is representative of what I've been seeing.
– NOP, Feb 13 at 2:13
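(Roughly what that looked like - a reconstructed sketch rather than the exact script, run as root:)

# Capture broadcast frames on each slave and on the bond itself
pids=()
for i in enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1 bond0; do
    tcpdump -i "$i" -n -e broadcast -w "/tmp/$i.pcap" &
    pids+=($!)
done
# Fire out a burst of broadcast pings (sub-0.2 s interval needs root)
ping -b -c 5000 -i 0.01 192.168.1.255 >/dev/null
kill "${pids[@]}"    # stop the captures once the burst is done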
OK, to try to do some further digging I shut down a particularly chatty VM on the box, which cleared up tcpdump quite a bit. Looking at the tcpdump of the bond interface while watching the interface statistics, there's a very strong correlation between ARP requests being on the wire and packets getting dropped. Still sitting at around ~1.5% dropped RX packets on the bond. I guess this makes sense, right?
– NOP, Feb 13 at 2:44
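(The watch setup here was essentially two terminals - a sketch:)

# Terminal 1: only ARP traffic on the bond
tcpdump -i bond0 -n arp
# Terminal 2: the bond's RX drop counter, refreshed every second
watch -n 1 cat /sys/class/net/bond0/statistics/rx_dropped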
ARP packets are the canonical example of broadcast traffic, so this does tend to confirm our suspicions. TBH I don't know how ARP over an LACP trunk is actually supposed to be broadcast - maybe an "all ports" policy is the standard?
– Eugen Rieck, Feb 13 at 8:05