Tuned indirect update
You can use `tuned` for CPU partitioning: isolating CPUs from use by the Linux OS, except for a CPU usage measurement made by the kernel every second. You can find more information in the tuned-profiles-cpu-partitioning(7) man page. Essentially, you must tailor this to the hardware, depending on the number of CPUs overall and per NUMA node.
The following provides an example of the considerations, using the Dell PowerEdge R740 as an example:
```
[root@rhel-tiger-14-6 ~]# lscpu | egrep "^(NUMA |Socket|Core|Thread)"
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71
[root@rhel-tiger-14-6 ~]# tuned-adm active
Current active profile: throughput-performance
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | wc -l
72
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort -u
performance
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq | sort -u
3700000
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq | sort -u
1200000
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq | sort -u
cat: '/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq': No such file or directory
```
In this case, there are two NUMA nodes with one socket each, equating to 36 vCPUs per NUMA node. Considering the FortiGate-VM sizing options, using 32 vCPUs per NUMA node for VMs makes sense. This leaves four vCPUs per NUMA node (eight in total) for the host machine's housekeeping.
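The arithmetic behind this split can be written out in a few lines. The following is a minimal sketch with the R740 values from the `lscpu` output above hard-coded; adjust them for other hardware:

```python
# Per-NUMA-node vCPU budget for the R740 example (values from lscpu above).
cores_per_socket = 18
threads_per_core = 2
sockets = 2  # one socket per NUMA node on this machine

vcpus_per_node = cores_per_socket * threads_per_core
total_vcpus = vcpus_per_node * sockets

# Reserve 4 vCPUs per node for host housekeeping, leaving the rest for VMs.
housekeeping_per_node = 4
vm_vcpus_per_node = vcpus_per_node - housekeeping_per_node

print(total_vcpus, vcpus_per_node, vm_vcpus_per_node)  # 72 36 32
```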
The current `tuned` profile is throughput-performance, which has some relevance to the CPU scaling governor settings. You can rerun these commands once you have configured `tuned` as desired; the outputs here are for comparison. The current frequency is not exposed in this case:
```
[root@rhel-tiger-14-6 ~]# yum -y install tuned-profiles-cpu-partitioning
<output omitted for brevity>
[root@rhel-tiger-14-6 ~]# touch /etc/tuned/cpu-partitioning-variables.conf
[root@rhel-tiger-14-6 ~]# chown root:root /etc/tuned/cpu-partitioning-variables.conf
[root@rhel-tiger-14-6 ~]# chmod 644 /etc/tuned/cpu-partitioning-variables.conf
[root@rhel-tiger-14-6 ~]# cat /etc/tuned/cpu-partitioning-variables.conf
# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_balance_cores=5-10
isolated_cores=4-35,40-71
no_balance_cores=4-35,40-71
[root@rhel-tiger-14-6 ~]# tuned-adm profile cpu-partitioning
[root@rhel-tiger-14-6 ~]# tuned-adm active
Current active profile: cpu-partitioning
[root@rhel-tiger-14-6 ~]# reboot
```
In this example, vCPUs 0-3 and 36-39 are not declared as isolated and serve as the housekeeping resources, while the others are used for VMs.
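The `isolated_cores` value itself can be derived from the housekeeping choice. The following Python sketch (the `to_cpulist` helper is hypothetical, hard-coded for the 72-vCPU example above) collapses the isolated CPU set into the kernel's range notation:

```python
def to_cpulist(cpus):
    """Collapse a set of CPU numbers into kernel range notation, e.g. 4-35,40-71."""
    cpus = sorted(cpus)
    ranges, start = [], cpus[0]
    for prev, cur in zip(cpus, cpus[1:] + [None]):
        if cur != prev + 1:  # run of consecutive CPUs ends here
            ranges.append(f"{start}-{prev}" if start != prev else str(start))
            start = cur
    return ",".join(ranges)

# Housekeeping: vCPUs 0-3 and 36-39; everything else is isolated for VMs.
housekeeping = set(range(0, 4)) | set(range(36, 40))
isolated = set(range(72)) - housekeeping
print("isolated_cores=" + to_cpulist(isolated))  # isolated_cores=4-35,40-71
```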
When the `tuned` profile is activated, the changes are embedded into the kernel command line via GRUB:
```
[root@rhel-tiger-14-6 ~]# cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-305.25.1.el8_4.x86_64 root=/dev/mapper/vg1-root ro crashkernel=auto resume=/dev/mapper/vg1-swap rd.lvm.lv=vg1/root rd.lvm.lv=vg1/swap rhgb quiet intel_iommu=on iommu=pt hugepagesz=1G default_hugepagesz=1G hugepages=160 transparent_hugepage=never selinux=0 skew_tick=1 nohz=on nohz_full=4-35,40-71 rcu_nocbs=4-35,40-71 tuned.non_isolcpus=000000f0,0000000f intel_pstate=disable nosoftlockup
```
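Because `/proc/cmdline` is a single long line, a small helper can make it easier to pick out the tuned-added parameters. This is a sketch: only the tuned-related tail of the command line is reproduced in the hard-coded string, and the parsing treats each token as `key=value` (flag-only tokens get an empty value):

```python
# Tail of the /proc/cmdline output above, hard-coded for illustration.
cmdline = ("skew_tick=1 nohz=on nohz_full=4-35,40-71 rcu_nocbs=4-35,40-71 "
           "tuned.non_isolcpus=000000f0,0000000f intel_pstate=disable nosoftlockup")

params = {}
for token in cmdline.split():
    key, _, value = token.partition("=")
    params[key] = value  # flag-only tokens (e.g. nosoftlockup) map to ""

print(params["nohz_full"])       # 4-35,40-71
print("nosoftlockup" in params)  # True
```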
`tuned` has added the following parameters:
| Parameter | Description |
|---|---|
| `skew_tick=1` | Ensures that the per-CPU timer ticks do not occur simultaneously by skewing their start times. This decreases the potential for lock conflicts, reducing system jitter in interrupt response times. |
| `nohz=on` | Turns off the timer tick on an idle CPU. |
| `nohz_full=4-35,40-71` | Turns off the timer tick on a CPU when there is only one runnable task on that CPU. Needs `rcu_nocbs`. |
| `rcu_nocbs=4-35,40-71` | Allows moving all RCU offload threads to a housekeeping CPU. |
| `tuned.non_isolcpus=000000f0,0000000f` | The CPU mask of the CPUs left for the host to use; in this example, 000000f0,0000000f => 0x000000F00000000F => CPUs 0-3 and 36-39 (c.f. a CPU affinity calculator). |
| `intel_pstate=disable` | Prevents the Intel P-state driver from managing power states and CPU frequency. |
| `nosoftlockup` | Disables the kernel's soft-lockup detection. |
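The mask arithmetic behind `tuned.non_isolcpus` can be verified in a few lines of Python. The following is a sketch, assuming the sysfs mask convention of comma-separated 32-bit hexadecimal words with the most significant word first; `mask_to_cpus` is a hypothetical helper:

```python
def mask_to_cpus(mask):
    """Decode a sysfs-style CPU mask (comma-separated 32-bit hex words,
    most significant word first) into a sorted list of CPU numbers."""
    value = 0
    for word in mask.split(","):  # most significant word first
        value = (value << 32) | int(word, 16)
    return [cpu for cpu in range(value.bit_length()) if value >> cpu & 1]

# The mask tuned wrote for the host (non-isolated) CPUs in this example.
print(mask_to_cpus("000000f0,0000000f"))  # [0, 1, 2, 3, 36, 37, 38, 39]
```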
The `isolcpus` parameter is considered deprecated. Instead, `tuned` uses cpusets and CPU affinity to partition CPUs. It does what it can to keep kernel noise and housekeeping away from the CPUs that are intended for VM use.
Using the lowest-numbered physical cores for housekeeping is good practice. You can use `lstopo-no-graphics` to ensure that the appropriate ones are selected. The following shows a snippet of the `lstopo-no-graphics` output:
```
[root@rhel-tiger-14-6 ~]# lstopo-no-graphics
Machine (187GB total)
  Package L#0
    NUMANode L#0 (P#0 93GB)
    L3 L#0 (25MB)
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#36)
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
        PU L#2 (P#2)
        PU L#3 (P#38)
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
        PU L#4 (P#4)
        PU L#5 (P#40)
      <output omitted for brevity>
  Package L#1
    NUMANode L#1 (P#1 94GB)
    L3 L#1 (25MB)
      L2 L#18 (1024KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
        PU L#36 (P#1)
        PU L#37 (P#37)
      L2 L#19 (1024KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
        PU L#38 (P#3)
        PU L#39 (P#39)
      L2 L#20 (1024KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
        PU L#40 (P#5)
        PU L#41 (P#41)
      <output omitted for brevity>
```
- Core L#0 = Physical Core with hwloc index 0
- PU L#0 (P#0) = Processing Unit with hwloc index 0: processor 0
- PU L#1 (P#36) = Processing Unit with hwloc index 1: processor 36
You can see how the vCPU number relates to the physical core. In this case, physical core 0 has two threads identified as vCPU 0 and vCPU 36.
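On this particular topology, the two hardware threads of any physical core are always 36 apart. The following Python sketch captures that pairing; the `siblings` helper is hypothetical and hard-coded for the R740 numbering above (on a live system, you would instead read `/sys/devices/system/cpu/cpu<N>/topology/thread_siblings_list`):

```python
TOTAL_VCPUS = 72       # from lscpu: 2 sockets x 18 cores x 2 threads
THREADS_PER_CORE = 2

def siblings(vcpu):
    """Return both hardware threads of the physical core that owns vcpu.
    Assumes the R740 numbering above, where sibling threads are 36 apart."""
    half = TOTAL_VCPUS // THREADS_PER_CORE
    first = vcpu % half
    return (first, first + half)

print(siblings(0))   # (0, 36)  -> matches PU P#0 / PU P#36 in lstopo
print(siblings(38))  # (2, 38)  -> matches PU P#2 / PU P#38 in lstopo
```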
`tuned` has also taken care of the CPU scaling governor. Clock scaling allows the CPU clock speed to change on the fly between a minimum and maximum value. This is good for some compute workloads, but for a performant system the CPU must run at the maximum frequency it can. It must be in performance mode:
```
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | wc -l
72
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort -u
performance
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq | sort -u
3001000
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq | sort -u
1200000
[root@rhel-tiger-14-6 ~]# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq | sort -u
3001000
```
The change is due to the addition of `intel_pstate=disable` to the kernel command line: the CPU is no longer overclocked. Overclocking is not advised, as it makes performance less predictable. When using vSPU, overclocking leads to the CPUs being permanently overclocked, which could lead to problems such as overheating.
In the outputs above, you can see that all 72 CPUs have been set to performance mode and are operating at the maximum frequency they support without overclocking. This setting is configured per CPU; as a minimum requirement, configure all CPUs used for the FortiGate-VM this way.