Proxmox: Packet loss on bonded NICs in LACP mode

Got a bit of a strange problem. I have a machine running Proxmox 5.3 whose hardware includes a 4-port Intel Gigabit NIC (PCIe) in addition to a fifth Gigabit Ethernet port on the motherboard.

I have the machine configured so that the onboard NIC is the management interface, and the four Gigabit ports are bonded together with LACP and connected to an HP ProCurve 1810G managed switch; all the VMs and containers on the box get their network connectivity through the bonded interface. The switch is managed and supports LACP, and I have configured a trunk on the switch for the four ports.

Everything seemed to work fine, or so I thought.

Over the weekend I installed netdata on the Proxmox host, and now I'm getting continual alarms about packet loss on bond0 (the four bonded NICs). I'm a little perplexed as to why.

Looking at the statistics for bond0, RX packets are being dropped with reasonable frequency (currently ~160 RX packets dropped in the last 10 minutes); no TX packets seem to be dropped.

Interface output is below. You'll note that the bridge interface to the VMs has no dropped packets; it's happening only on bond0 and its slaves. The MTU is set to 9000 (jumbo frames are enabled on the switch), but I was still seeing this issue with the MTU at 1500. enp12s0 is the management NIC; the other four NICs are the bond slaves.



bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 347300 bytes 146689725 (139.8 MiB)
RX errors 0 dropped 11218 overruns 0 frame 0
TX packets 338459 bytes 132985798 (126.8 MiB)
TX errors 0 dropped 2 overruns 0 carrier 0 collisions 0

enp12s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.3 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::7285:c2ff:fe67:19b9 prefixlen 64 scopeid 0x20<link>
ether 70:85:c2:67:19:b9 txqueuelen 1000 (Ethernet)
RX packets 25416597 bytes 36117733348 (33.6 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16850795 bytes 21472508786 (19.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 225363 bytes 113059352 (107.8 MiB)
RX errors 0 dropped 2805 overruns 0 frame 0
TX packets 15162 bytes 2367657 (2.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 25499 bytes 6988254 (6.6 MiB)
RX errors 0 dropped 2805 overruns 0 frame 0
TX packets 263442 bytes 123302293 (117.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp4s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 33208 bytes 11681537 (11.1 MiB)
RX errors 0 dropped 2804 overruns 0 frame 0
TX packets 42729 bytes 2258949 (2.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp4s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 63230 bytes 14960582 (14.2 MiB)
RX errors 0 dropped 2804 overruns 0 frame 0
TX packets 17126 bytes 5056899 (4.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vmbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.1.4 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::21b:21ff:fec7:40d8 prefixlen 64 scopeid 0x20<link>
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 54616 bytes 5852177 (5.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 757 bytes 61270 (59.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


Initially suspecting some kind of buffer issue, I did some tweaking in sysctl to make sure the buffer sizes were adequate. The sysctl tweaks can be found here (they did not appear to make any difference):



https://paste.linux.community/view/3b5f2b63
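
For context, sysctl buffer tuning of this kind usually involves settings along these lines (illustrative values only; the exact tweaks used are in the paste above):

# Illustrative buffer tuning (added to /etc/sysctl.conf and applied with "sysctl -p");
# example values only, not necessarily those from the paste above
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.netdev_max_backlog = 5000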



Network config is:



auto lo
iface lo inet loopback

auto enp12s0
iface enp12s0 inet static
address 192.168.1.3
netmask 255.255.255.0

iface enp3s0f0 inet manual

iface enp3s0f1 inet manual

iface enp4s0f0 inet manual

iface enp4s0f1 inet manual

auto bond0
iface bond0 inet manual
bond-slaves enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1
bond-miimon 100
bond-mode 802.3ad
mtu 9000

auto vmbr0
iface vmbr0 inet static
address 192.168.1.4
netmask 255.255.255.0
gateway 192.168.1.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
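
For reference, the kernel's view of the LACP negotiation and the per-slave counters that netdata reads can be checked with the standard bonding and iproute2 tooling; a general sketch:

# Confirm 802.3ad negotiated properly: each "Slave Interface" section should
# report the same Aggregator ID as the bond itself
cat /proc/net/bonding/bond0

# Per-interface RX/TX and drop counters (the same numbers ifconfig shows above)
ip -s link show bond0
ip -s link show enp3s0f0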


Troubleshooting steps I took:



a) sysctl tweaks (as linked above)
b) MTU increase and enabling jumbo frames on the switch (no change)
c) Reset the switch and recreated the LACP trunk (no change)



Any ideas on what I should try next? I'm starting to think there is something about the NIC teaming that I don't understand. As I said, everything seems to work fine, but the high packet loss concerns me a little.

Other machines on the network connected to the same switch do not have this issue (and the fifth NIC on this machine is fine, too).

Tags: linux networking network-adapter bonding proxmox

asked Feb 11 at 10:18 by NOP

1 Answer

I have seen this before: HP switches sometimes seem to send broadcast packets to all members of an LACP trunk. The kernel then sees these packets as duplicates and drops them (apart from the first one to arrive, of course).

While this is certainly not elegant, it does not seem to cause problems in real life. You can verify whether this is the effect you're seeing by deliberately sending a large number of broadcast packets and checking whether that lines up with the drop statistics.
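
A rough sketch of that test, using the broadcast address and interface names from the question (this is just one way to generate broadcast traffic and watch the counters):

# From another machine on the 192.168.1.0/24 segment, flood the broadcast address (run as root)
ping -b -f -c 10000 192.168.1.255

# On the Proxmox host, watch whether the RX "dropped" counters on bond0 and its
# slaves climb in step with the broadcast traffic
watch -n 1 'ip -s link show bond0; ip -s link show enp3s0f0'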






answered Feb 11 at 10:32 by Eugen Rieck

• Hmm, I did suspect something like this. That would make sense (the math works out, given that there's basically the same number of dropped packets on each NIC). Is there a way to tell exactly what's getting dropped? I ran dropwatch on the machine, but I couldn't interpret the output well enough to conclude one way or the other.

            – NOP
            Feb 11 at 10:41
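
For what it's worth, a minimal dropwatch session that resolves drop locations to kernel symbols looks something like the following (general usage only, not output from this particular host):

# Load kernel symbols so drop locations are shown by function name
dropwatch -l kas
dropwatch> start    # begin monitoring; per-location drop counts print as they occur
dropwatch> stop     # stop monitoring
dropwatch> exit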











• My diagnostics were rather basic: increase the rate of broadcast packets and see if the drop rate increases accordingly. I didn't investigate much further, as everything was working fine with no loss of payload packets.

            – Eugen Rieck
            Feb 11 at 10:59











• I did some more digging around. I wrote a script to fire out a whole bunch of broadcast packets. At the same time, I ran tcpdump on all 4 individual NICs plus the bond interface. While the broadcast traffic was mostly appearing on only 1 NIC, the odd packet did end up hitting another NIC. It didn't seem to increase the rate of dropped packets, though. The box has been running for a couple of hours, and with my several VMs and containers running plus me pelting it with broadcast packets, bond0 has dropped ~700 of 1,437,352 RX packets. This is representative of what I've been seeing.

            – NOP
            Feb 13 at 2:13
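
A sketch of that capture setup, assuming the same four slave interface names (filter and duration are just examples):

# Capture broadcast and ARP frames on every slave at once for 60 seconds;
# the same frame appearing in more than one capture is exactly what the
# bonding driver would discard as a duplicate on bond0
for i in enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1; do
    timeout 60 tcpdump -ni "$i" -w "/tmp/$i.pcap" 'broadcast or arp' &
done
wait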













• OK, to try to do some further digging I shut down a particularly chatty VM on the box, which cleared up tcpdump quite a bit. Looking at the tcpdump of the bond interface while watching the interface statistics, there's a very strong correlation between ARP requests being on the wire and packets getting dropped. Still sitting at around ~1.5% dropped RX packets on the bond. I guess this makes sense, right?

            – NOP
            Feb 13 at 2:44











• ARP packets are the canonical example of broadcast traffic, so this does tend to confirm our suspicions. TBH I don't know how ARP over an LACP trunk is actually supposed to be broadcast - maybe an "all ports" policy is the standard?

            – Eugen Rieck
            Feb 13 at 8:05










