Xen v4.8 on Fedora v26 IRQ balance for xen_netback (vif)
A Xen dom0 host isn't distributing network (xen_netback) interrupts across cores, resulting in poor network performance (video is broken and exhibits pixelation). All interrupts are serviced on the first dom0 core.
The dom0 host is given 4G of RAM and 4 cores. All netback (vif) interfaces are serviced by the first core of dom0. The irqbalance daemon (v1.2) is running in both dom0 and domU.
Theory
The irqbalance tool doesn't match the '/proc/interrupts' output of the Fedora v4.13.4 kernel. That kernel uses a slightly different format, where the name of Xen event based interrupts is split from "xen-dyn-event" into "xen-dyn -event". This means that IRQBalance doesn't recognise those interrupts. For example, the '/proc/interrupts' from Fedora with the v4.13.4 kernel:
 18:      71817          0   xen-dyn    -event   eth0-q0-tx
 19:     108047          0   xen-dyn    -event   eth0-q0-rx
 20:      52184          0   xen-dyn    -event   eth0-q1-tx
 21:      37035          0   xen-dyn    -event   eth0-q1-rx
241:        258     158517          0          0   xen-dyn    -event   vif26.0-q0-tx
242:          1          0          0          0   xen-dyn    -event   vif26.0-q0-rx
243:        303          0          0     157885   xen-dyn    -event   vif26.0-q1-tx
244:          1          0          0          0   xen-dyn    -event   vif26.0-q1-rx
245:        133     378112          0          0   xen-dyn    -event   vif26.1-q0-tx
246:          1          0          0          0   xen-dyn    -event   vif26.1-q0-rx
247:       1400          0          0    3457479   xen-dyn    -event   vif26.1-q1-tx
248:          1          0          0          0   xen-dyn    -event   vif26.1-q1-rx
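A simplified way to see why the extra space matters: naive whitespace tokenisation of a line in each format yields a different chip-name field. The sample lines are taken from the two-CPU domU listings later on this page; the actual parsing inside irqbalance is more involved, so this is only a sketch of the mismatch:

```shell
# Split each /proc/interrupts line on whitespace, as a naive parser
# would, and look at the 4th field (the chip name on a two-CPU guest).
# The stock format yields "xen-dyn-event"; the Fedora v4.13 format
# yields only "xen-dyn", with "-event" pushed into the next field.
stock='32:   9582   156749   xen-dyn-event   eth0-q0-tx'
fedora='18:    29        0   xen-dyn -event  eth0-q0-tx'
echo "$stock"  | awk '{print $4}'    # xen-dyn-event
echo "$fedora" | awk '{print $4}'    # xen-dyn
```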
In dom0 the vif interface interrupts are not distributed intelligently across the available cores (e.g. the vif domain, vif interface index, vif queue index and vif tx/rx interrupts are not evenly distributed).
Workaround
Add a small IRQBalance policy script that very crudely distributes 'xen-dyn' interrupts across the available cores of dom0.
Steps:
- Put the policy script in place at '/usr/local/bin/irqbalance-policyscript'
- Change the irqbalance settings in '/etc/sysconfig/irqbalance'
- Restart irqbalance
Install the script (see below).
Change the '/etc/sysconfig/irqbalance' settings:
IRQBALANCE_ARGS="--policyscript=/usr/local/bin/irqbalance-policyscript"
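Then restart the daemon so the policy script takes effect. A sketch of the remaining steps, assuming a systemd host (run as root on dom0; this is an ops fragment, not something to run elsewhere):

```shell
# The policy script must be executable or irqbalance cannot invoke it.
chmod +x /usr/local/bin/irqbalance-policyscript
systemctl restart irqbalance

# Verify: the xen-dyn IRQs should now show differing CPU numbers
# rather than all sitting on CPU 0.
grep . /proc/irq/*/smp_affinity_list
```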
The script will:
- only change the balance for 'xen-dyn' interrupts
- make no changes for devices on a PCI bus
- make no changes for 'xen-pirq' or 'xen-percpu' interrupts
- distribute the interrupts statically (i.e. only once)
- not distribute based on the number of interrupts serviced by each core/CPU
- make a very crude guess at the number of active CPUs
- assume that the CPUs are numbered 0...(n-1) (i.e. zero-based CPU numbering)
The script has many limitations and assumptions, BUT it is much better than not having the script.
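The distribution rule itself is just the IRQ number modulo the CPU count. A standalone sketch of where the dom0 vif IRQs from the listing above would land with 4 cores:

```shell
# IRQ -> CPU mapping used by the policy script: CPU = IRQ mod CPU_COUNT.
# With 4 dom0 vCPUs, IRQs 241..248 land on cores 1,2,3,0,1,2,3,0, so
# adjacent tx/rx queue interrupts end up on different cores.
CPU_COUNT=4
for irq in 241 242 243 244 245 246 247 248; do
    echo "IRQ ${irq} -> CPU $(( irq % CPU_COUNT ))"
done
```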
Links
- https://xenserver.org/blog/entry/dundee-networking-multi-queue.html
- https://github.com/Irqbalance/irqbalance
- https://wiki.xen.org/wiki/Network_Throughput_and_Performance_Guide
- https://wiki.xen.org/wiki/Xen_network_TODO
Appendices
IRQBalance Policy Script
#!/bin/bash
#
# Arguments are:
#   $1  PCI device name (or /sys if unknown)
#   $2  IRQ number
#
# Xen on Fedora seems to be unable to spread the interrupts for
# the xen backend devices across the available cores. This script
# takes a simple approach of using the interrupt number modulo the
# number of CPU cores and assigning the smp_affinity to that core.
#
# This makes a number of assumptions (some of which are known to be
# bad). For example:
#
#  - the available CPU count may not be sequential or start from zero
#  - that each IRQ will have an even load on the system
#
# Devices that are backed by PCI devices are not modified.
DEVICE=$1
IRQ=$2

CPU_COUNT=$( find /sys/devices/system/cpu/ -maxdepth 1 -type d -name 'cpu[0-9]*' | wc -l )

if [ "${DEVICE}" == "/sys" ] ; then
    CPU_NUMBER=$(( ${IRQ} % ${CPU_COUNT} ))
    CHIP_NAME=$( cat /sys/kernel/irq/${IRQ}/chip_name )
    #
    # This should handle device names like:
    #   blkif-backend
    #   vif<domain>.<if>[-q<#>[-rx|-tx]]
    #   evtchan:
    #   evtchan:xenstored
    #   evtchan:xenconsoled
    #   evtchan:qemu-system-i<id>
    #
    if [ "${CHIP_NAME}" == "xen-dyn" ] ; then
        echo ${CPU_NUMBER} > /proc/irq/${IRQ}/smp_affinity_list
        echo ban=true
    fi
fi
Host info
# xl info
host                   : blue.lucidsolutions.co.nz
release                : 4.13.4-200.fc26.x86_64
version                : #1 SMP Thu Sep 28 20:46:39 UTC 2017
machine                : x86_64
nr_cpus                : 12
max_cpu_id             : 11
nr_nodes               : 2
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 2600
hw_caps                : 178bf3ff:80802001:efd3fbff:000037ff:00000000:00000000:00000000:00000100
virt_caps              : hvm
total_memory           : 57343
free_memory            : 13888
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 8
xen_extra              : .2
xen_version            : 4.8.2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=4G,max:8G dom0_max_vcpus=4 dom0_vcpus_pin
cc_compiler            : gcc (GCC) 7.1.1 20170622 (Red Hat 7.1.1-3)
cc_compile_by          : mockbuild
cc_compile_domain      : [unknown]
cc_compile_date        : Tue Sep 12 21:57:03 UTC 2017
build_id               : baca4c8c5a903568230d6f6d45411c4a15ae92f2
xend_config_format     : 4
Fedora 26 dom0
The Fedora 26 kernel (and netback driver) supports multiple queues and separate tx & rx interrupts:
# modinfo xen-netfront
filename:       /lib/modules/4.13.4-200.fc26.x86_64/kernel/drivers/net/xen-netfront.ko.xz
alias:          xennet
alias:          xen:vif
license:        GPL
description:    Xen virtual network device frontend
depends:
intree:         Y
name:           xen_netfront
vermagic:       4.13.4-200.fc26.x86_64 SMP mod_unload
signat:         PKCS#7
signer:
sig_key:
sig_hashalgo:   md4
parm:           max_queues:Maximum number of queues per virtual interface (uint)

# grep . /sys/module/xen_netback/parameters/*
/sys/module/xen_netback/parameters/fatal_skb_slots:20
/sys/module/xen_netback/parameters/hash_cache_size:64
/sys/module/xen_netback/parameters/max_queues:4
/sys/module/xen_netback/parameters/rx_drain_timeout_msecs:10000
/sys/module/xen_netback/parameters/rx_stall_timeout_msecs:60000
/sys/module/xen_netback/parameters/separate_tx_rx_irq:Y
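The max_queues value shown above can be raised at module load time if more queues per vif are wanted. A sketch (the value 8 is illustrative; the module must be reloaded, or the host rebooted, for it to take effect):

```shell
# Illustrative: raise the netback queue cap from the default of 4.
echo 'options xen_netback max_queues=8' > /etc/modprobe.d/xen-netback.conf
```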
CentOS v6.x domU
The CentOS v6 kernel (and netfront driver) does not support multiqueue. Thus each NIC is serviced by a single core in each VM (irqbalance is changing the smp_affinity to balance the number of interrupts).
$ cat /proc/interrupts
           CPU0       CPU1
272:     642971     803006   xen-dyn-event     eth1
273:     334965   38472827   xen-dyn-event     eth0
274:        492          3   xen-dyn-event     blkif
275:      17177      31729   xen-dyn-event     blkif
276:         27          0   xen-dyn-event     hvc_console
277:        504          0   xen-dyn-event     xenbus
278:          0      11235   xen-percpu-ipi    callfuncsingle1
279:          0          0   xen-percpu-virq   debug1
280:          0          0   xen-percpu-ipi    callfunc1
281:          0     547309   xen-percpu-ipi    resched1
282:          0   10189651   xen-percpu-virq   timer1
283:       2454          0   xen-percpu-ipi    callfuncsingle0
284:          0          0   xen-percpu-virq   debug0
285:          0          0   xen-percpu-ipi    callfunc0
286:    1031998          0   xen-percpu-ipi    resched0
287:    7013506          0   xen-percpu-virq   timer0
NMI:          0          0   Non-maskable interrupts
LOC:          0          0   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:    1031998     547309   Rescheduling interrupts
CAL:       2454      11235   Function call interrupts
TLB:          0          0   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          0          0   Machine check polls
ERR:          0
MIS:          0
CentOS v7.x domU
The CentOS v7 kernel (and netfront driver) supports both multiqueue and separate tx and rx interrupts:
# grep . /proc/irq/*/smp_affinity_list
/proc/irq/16/smp_affinity_list:0
/proc/irq/17/smp_affinity_list:0
/proc/irq/18/smp_affinity_list:0
/proc/irq/19/smp_affinity_list:0
/proc/irq/20/smp_affinity_list:0
/proc/irq/21/smp_affinity_list:0
/proc/irq/22/smp_affinity_list:0
/proc/irq/23/smp_affinity_list:1
/proc/irq/24/smp_affinity_list:1
/proc/irq/25/smp_affinity_list:1
/proc/irq/26/smp_affinity_list:1
/proc/irq/27/smp_affinity_list:1
/proc/irq/28/smp_affinity_list:1
/proc/irq/29/smp_affinity_list:1
/proc/irq/30/smp_affinity_list:1
/proc/irq/31/smp_affinity_list:0
/proc/irq/32/smp_affinity_list:1
/proc/irq/33/smp_affinity_list:0
/proc/irq/34/smp_affinity_list:1
/proc/irq/35/smp_affinity_list:1
/proc/irq/36/smp_affinity_list:1
/proc/irq/37/smp_affinity_list:1
# cat /proc/interrupts
           CPU0       CPU1
 16:    1900857          0   xen-percpu-virq   timer0
 17:          0          0   xen-percpu-ipi    spinlock0
 18:    1940532          0   xen-percpu-ipi    resched0
 19:          0          0   xen-percpu-ipi    callfunc0
 20:          0          0   xen-percpu-virq   debug0
 21:       1175          0   xen-percpu-ipi    callfuncsingle0
 22:          0          0   xen-percpu-ipi    irqwork0
 23:          0    1917993   xen-percpu-virq   timer1
 24:          0          0   xen-percpu-ipi    spinlock1
 25:          0    1906799   xen-percpu-ipi    resched1
 26:          0          0   xen-percpu-ipi    callfunc1
 27:          0          0   xen-percpu-virq   debug1
 28:          0       1340   xen-percpu-ipi    callfuncsingle1
 29:          0          0   xen-percpu-ipi    irqwork1
 30:        724          0   xen-dyn-event     xenbus
 31:        500         64   xen-dyn-event     hvc_console
 32:       9582     156749   xen-dyn-event     eth0-q0-tx
 33:       4518       7091   xen-dyn-event     eth0-q0-rx
 34:       4448     166296   xen-dyn-event     eth0-q1-tx
 35:      24514     387232   xen-dyn-event     eth0-q1-rx
 36:     366572       9422   xen-dyn-event     eth1-q0-tx
 37:       1572        407   xen-dyn-event     eth1-q0-rx
 38:     913320    2606663   xen-dyn-event     eth1-q1-tx
 39:    2491893    2283177   xen-dyn-event     eth1-q1-rx
 40:       8173      27177   xen-dyn-event     blkif
 41:        569        860   xen-dyn-event     blkif
NMI:          0          0   Non-maskable interrupts
LOC:          0          0   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:    1940532    1906799   Rescheduling interrupts
CAL:        793        941   Function call interrupts
TLB:        382        399   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
DFR:          0          0   Deferred Error APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        281        281   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
PIW:          0          0   Posted-interrupt wakeup event
Fedora 26 domU
The domU host supports both multi-queue and separate tx and rx interrupts, BUT they are all being serviced by the first of the two cores (the smp affinity includes both cores). The irqbalance daemon (irqbalance-1.2.0-2.fc26) is running.
$ cat /proc/interrupts
           CPU0       CPU1
  0:       9597          0   xen-percpu  -virq    timer0
  1:       2489          0   xen-percpu  -ipi     resched0
  2:          0          0   xen-percpu  -ipi     callfunc0
  3:          0          0   xen-percpu  -virq    debug0
  4:        396          0   xen-percpu  -ipi     callfuncsingle0
  5:          1          0   xen-percpu  -ipi     irqwork0
  6:          0      12066   xen-percpu  -virq    timer1
  7:          0       4403   xen-percpu  -ipi     resched1
  8:          0          0   xen-percpu  -ipi     callfunc1
  9:          0          0   xen-percpu  -virq    debug1
 10:          0       1297   xen-percpu  -ipi     callfuncsingle1
 11:          0          0   xen-percpu  -ipi     irqwork1
 12:        572          0   xen-dyn     -event   xenbus
 13:         27          0   xen-dyn     -event   hvc_console
 14:       2720          0   xen-dyn     -event   blkif
 15:       2119          0   xen-dyn     -event   blkif
 16:         80          0   xen-dyn     -event   blkif
 17:         51          0   xen-dyn     -event   blkif
 18:         29          0   xen-dyn     -event   eth0-q0-tx
 19:        251          0   xen-dyn     -event   eth0-q0-rx
 20:        304          0   xen-dyn     -event   eth0-q1-tx
 21:         24          0   xen-dyn     -event   eth0-q1-rx
NMI:          0          0   Non-maskable interrupts
LOC:          0          0   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          1          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:       2489       4403   Rescheduling interrupts
CAL:        396       1297   Function call interrupts
TLB:          0          0   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
DFR:          0          0   Deferred Error APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event
$ grep . /proc/irq/*/smp_affinity_list
/proc/irq/0/smp_affinity_list:0
/proc/irq/10/smp_affinity_list:1
/proc/irq/11/smp_affinity_list:1
/proc/irq/12/smp_affinity_list:0-1
/proc/irq/13/smp_affinity_list:0-1
/proc/irq/14/smp_affinity_list:0-1
/proc/irq/15/smp_affinity_list:0-1
/proc/irq/16/smp_affinity_list:0-1
/proc/irq/17/smp_affinity_list:0-1
/proc/irq/18/smp_affinity_list:0-1
/proc/irq/19/smp_affinity_list:0-1
/proc/irq/1/smp_affinity_list:0
/proc/irq/20/smp_affinity_list:0-1
/proc/irq/21/smp_affinity_list:0-1
/proc/irq/2/smp_affinity_list:0
/proc/irq/3/smp_affinity_list:0
/proc/irq/4/smp_affinity_list:0
/proc/irq/5/smp_affinity_list:0
/proc/irq/6/smp_affinity_list:1
/proc/irq/7/smp_affinity_list:1
/proc/irq/8/smp_affinity_list:1
/proc/irq/9/smp_affinity_list:1