The 53rd Linux Kernel Container Features: CPU bandwidth limiting with cgroup v2 (1) | gihyo.jp

Last time, I covered the io controller in cgroup v2. I thought about writing about the io controller again this time, but its cgroup v2 features turned out to be more difficult than I had imagined, and I did not understand them well enough to write an article, so this time I will instead talk about the CPU controller used in cgroup v2.[1]

I wrote about the CPU controller in this series back in 2014, in Chapter 4. At that time cgroup v2 did not yet exist, so that explanation used cgroup v1.[2]

Later, cgroup v2 was implemented, and many features that can only be used with cgroup v2 were added. The cooperation between the io controller and the memory controller introduced last time is one such example.

That said, the cgroup v1 and v2 CPU controllers actually have almost no functional differences; features added after cgroup v2 became stable are mostly implemented in v1 as well.[3] In cgroup v2, however, the interface files follow the conventions presented in Chapter 49, and the specification has been changed to improve various details.

Therefore, this time I would like to show how to limit CPU bandwidth in cgroup v2, introduce the features that have been added since, and dig a little deeper into how the CPU controller's bandwidth-limiting feature works.

Bandwidth limiting with the CPU controller

Let’s start with a quick explanation of how a bandwidth limit is set and how it works.

In cgroup v2, the files in Table 1 are the main ones used for bandwidth limiting with the CPU controller. For how to configure this in v1, see Chapter 4.

Table 1 Files used for bandwidth limiting with the CPU controller

  • cpu.max: The period that serves as the unit of the bandwidth limit, and the limit value within that period. The format is "<limit> <period>" (space separated), and the unit is microseconds. The default value is "max 100000". (read/write)
  • cpu.stat: Shows CPU usage statistics for the tasks in the cgroup. (read only)

In cgroup v1, each setting value had its own file, and each file contained only one value. In contrast, v2 writes two values to a single file, output in the "multiple values separated by spaces" format explained in Chapter 49.

cpu.max sets the period and the limit. During each set period, tasks can use the CPU only for the set limit time. This is the same as in v1.

The default value is a period of 100 ms with no limit: the string max is set as the limit value, and max means unlimited.

The limit set here is per CPU. In an environment with multiple CPU cores, if you want to allow the equivalent of 2 CPUs with a period of 100 ms, you would write "200000 100000".
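As a quick sanity check, the two numbers in cpu.max can be turned into an effective CPU count (a sketch with the value hard-coded so it runs anywhere; on a real system you would read it from the cgroup's cpu.max file):

```shell
# "<limit> <period>" in microseconds; limit / period = CPUs' worth of time
cpu_max="200000 100000"
echo "$cpu_max" | awk '{ if ($1 == "max") print "unlimited"; else printf "%.1f\n", $1 / $2 }'
```

Here this prints 2.0, i.e. the cgroup may use two CPUs' worth of time per period.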

Figure 1 gives a simple illustration. Here the period is 100000 (100 ms); in 1) no limit is set, and in 2) a limit value of 50000 (50 ms) is set.

Figure 1 bandwidth limit

Let's say you have a task that needs 200 ms of CPU time to finish. If only this task can use the CPU and no limit value is set, the task completes in 200 ms, as in Figure 1-1).

If, however, we set a limit value of 50 ms, then as in Figure 1-2) only 50 ms of each 100 ms period can use the CPU, so the task finishes 350 ms after it starts.
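That 350 ms can be reproduced with a little shell arithmetic (a back-of-the-envelope sketch of the Figure 1-2) scenario, not anything the kernel computes; all times in ms):

```shell
# The task needs 200 ms of CPU; each 100 ms period allows only 50 ms.
work=200; limit=50; period=100
# Number of periods in which the task is throttled (every period but the last):
full=$(( (work + limit - 1) / limit - 1 ))
# Completion time: full throttled periods, plus the leftover work in the last one.
echo $(( full * period + (work - full * limit) ))   # 350 ms
```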

Having seen in Figure 1 how the bandwidth limit works, let's now look at the contents of the cpu.stat file.

Table 2 Contents of cpu.stat

  • usage_usec: CPU time used by tasks in the cgroup (microseconds). Appears in cgroup v2 only.
  • user_usec: User CPU time used by tasks in the cgroup (microseconds). Appears in cgroup v2 only.
  • system_usec: System CPU time used by tasks in the cgroup (microseconds). Appears in cgroup v2 only.
  • nr_periods: The number of periods during which tasks in the cgroup were runnable. Always appears.
  • nr_throttled: The number of times tasks in the cgroup hit the limit and were throttled. Always appears.
  • throttled_time (v1) / throttled_usec (v2): The total time tasks in the cgroup could not run because they hit the limit (v1: nanoseconds, v2: microseconds). Always appears; the key name differs between v1 and v2.
  • nr_bursts: The number of periods in which a burst occurred in the cgroup. Kernel 5.16 or later.
  • burst_time (v1) / burst_usec (v2): The total time tasks in the cgroup ran beyond the limit using burst (v1: nanoseconds, v2: microseconds). Kernel 5.16 or later; the key name differs between v1 and v2.

The first three items in Table 2 do not appear in cgroup v1's cpu.stat. This is probably because v1 has the cpuacct controller, which provides the same statistics. Note, however, that the values obtained from v1's cpuacct controller and those in v2's cpu.stat file use different units.

The next three items appear regardless of the cgroup version and the kernel version. Among them, the items whose key names start with throttled_ use different units in v1 and v2, which is why the key names differ: v1 is in nanoseconds and v2 in microseconds, and the key name ends in _time in v1 and _usec in v2.

nr_bursts and burst_usec (burst_time in v1) appear in kernel 5.16 and later. I will explain these two items in the next installment. Incidentally, the environment I used when writing this article was Ubuntu 22.04, which has a 5.15 kernel, so these two items were not present.

Now, using Figure 1-2), let me explain what values nr_periods, nr_throttled, and throttled_usec take from the start of the graph until 400 ms have passed.

  • nr_periods: the task was using the CPU during 4 periods of 100 ms, so the value is 4
  • nr_throttled: the task hit the limit 3 times, so the value is 3
  • throttled_usec: the task was throttled for 50 ms in each of those 3 periods, so the value corresponds to 150 ms

Those are the values you would see.
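The same numbers can be derived with shell arithmetic (a sketch of the Figure 1-2) scenario, not kernel code; times in ms):

```shell
period=100; limit=50; elapsed=400; work=200
echo "nr_periods $(( elapsed / period ))"              # 4 periods of 100 ms
echo "nr_throttled $(( work / limit - 1 ))"            # throttled in all but the last
echo "throttled for $(( (work / limit - 1) * (period - limit) )) ms"  # 3 x 50 ms
```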

Setting a bandwidth limit in cgroup v2

So far I have explained the files used for bandwidth limiting with the cgroup v2 CPU controller. Now let's set a limit value as we did in Chapter 4 and observe the behavior.

The examples here are run on Ubuntu 22.04. Ubuntu 22.04 uses cgroup v2, and the root cgroup is configured so that the CPU controller can be used in its child cgroups. For how to enable controllers in cgroups, see Chapter 38.

Here we create a cgroup called "test01", set a period of 100 ms and a limit of 50 ms, and register the shell's PID.

$ sudo mkdir /sys/fs/cgroup/test01 (create the test01 cgroup)
$ echo "50000 100000" | sudo tee /sys/fs/cgroup/test01/cpu.max
$ echo 5467 | sudo tee /sys/fs/cgroup/test01/cgroup.procs
(register the shell's PID in the test01 cgroup)

Now run this command in the shell with PID 5467.

$ while :; do true ; done

Start another shell and run the top command; you can see that the CPU usage is 50%, as shown below, so the limit is working as configured.

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND 
   5467 tenforw+  20   0    5044   4172   3504 R  50.0   0.1   0:05.24 bash    

Next, let's run a task for a while so we can see what appears in the cpu.stat file. Create the "test01" cgroup as in the previous example, write "50000 100000" to cpu.max to set the limit and period, and then register the shell's PID in "test01".

$ sudo mkdir /sys/fs/cgroup/test01/
$ echo "50000 100000" | sudo tee /sys/fs/cgroup/test01/cpu.max
50000 100000
$ echo 5467 | sudo tee /sys/fs/cgroup/test01/cgroup.procs 

In this state, execute the following command in the shell with PID: 5467.

$ timeout 1 yes > /dev/null

This uses the CPU at full load for 1 second. Immediately after it finishes, move the shell from the "test01" cgroup back to the root cgroup so that nothing more is counted against "test01", and then look at test01's cpu.stat.
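The move back to the root cgroup is done by writing the shell's PID to the root cgroup.procs (a sketch using PID 5467 from the example; paths assume the same Ubuntu 22.04 setup as above):

```shell
$ echo 5467 | sudo tee /sys/fs/cgroup/cgroup.procs
```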

$ cat /sys/fs/cgroup/test01/cpu.stat
nr_periods 17
nr_throttled 10
throttled_usec 495760

The period is 100 ms, so running for 1 second spans 10 periods. The limit should have been in effect throughout, so the number of throttled periods should be 10, and indeed that is the value that appears in nr_throttled. Since only half of each second of CPU time can be used, throttled_usec is, as expected, almost 500 ms. As for nr_periods, it also counts periods from before the command ran, from just after the cgroup was created, so its value is not exactly 10.

Assigning quota to CPUs

Now let’s talk about how bandwidth throttling actually works.

The limit is not implemented by counting usage and capping it when the limit is about to be exceeded. Rather than restriction, it is more accurate to think of it as granting the right to use the CPU up to the limit value. This allocation of CPU time up to the limit is called a quota.

On a large system with many CPUs, counting and limiting usage would require summing the usage of every CPU. Doing that frequently can be a heavy burden. It is better to set aside time up to the limit value in advance, in a place independent of any CPU, and hand it out to the CPUs from there, avoiding the aggregation cost.

The quota assigned to a cgroup is a CPU-independent, per-cgroup global quota, accumulated in a "pool" and handed out in units called "slices". When a runnable task is assigned to a CPU, the CPU receives quota from this pool one slice at a time.

Figure 2 Allocation of slices from the CPU pool

The slice size is defined by the sysctl parameter sched_cfs_bandwidth_slice_us:

$ sudo sysctl -a | grep sched_cfs_bandwidth_slice_us
kernel.sched_cfs_bandwidth_slice_us = 5000

As shown above, the default is 5 ms; in other words, CPUs receive quota from the pool in 5 ms increments. Increasing this value reduces hand-out overhead, while lowering it allows finer-grained control, so it may need to be tuned together with the period and limit depending on the nature of the workload.
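For example, on a large machine you might enlarge the slice to reduce hand-out overhead (a sketch; the value is in microseconds, the change requires root, and whether it helps depends on the workload):

```shell
$ sudo sysctl -w kernel.sched_cfs_bandwidth_slice_us=10000
```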

If we set the limit to 50 ms as in the earlier example, then, as in Figure 2, ten slices' worth of quota is received from the global pool in each period. When the period ends, the pool is reset and a new quota is assigned according to the configured value.

Now let’s look at how slices are transferred from the global quota pool to multiple CPUs and use them.

Figure 3 How the quota is transferred to each CPU and used

In Figure 3, a limit of 20 ms is set for the cgroup, and there are 2 CPUs.

  • In (1), a request from a task wanting to use CPU1 causes a slice to be transferred; the task runs for 5 ms and uses up the transferred slice.
  • In (2), a request from a task wanting to use CPU2 causes a slice to be transferred; the task runs for 5 ms and uses up the transferred slice.
  • In (3), a request from a task wanting to use CPU1 causes a slice to be transferred; the task runs for 2.5 ms, then another request from a task on CPU1 consumes the remaining 2.5 ms of the slice.
  • In (4), a request from a task wanting to use CPU2 causes a slice to be transferred; the task runs for 5 ms and uses up the transferred slice.
  • In (5), a task wanting to use CPU1 makes a request, but no slices are left in the global pool, so the task cannot run for the rest of this period.
  • In (6), 100 ms have passed and the next period begins, so the quota is refilled to 20 ms.

This is how slices are handed out from the global pool each time a CPU requests one, and then consumed.
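The hand-out flow above can be sketched as a toy loop (values from Figure 3; purely an illustration of the idea, not kernel code):

```shell
# A 20 ms (20000 us) quota pool handed out in 5 ms slices to two CPUs.
pool=20000; slice=5000
for req in cpu1 cpu2 cpu1 cpu2 cpu1; do
  if [ "$pool" -ge "$slice" ]; then
    pool=$(( pool - slice ))
    echo "$req: received a ${slice}us slice, ${pool}us left in the pool"
  else
    echo "$req: pool empty, throttled until the next period"
  fi
done
pool=20000   # the period ends: the pool is refilled to the quota
```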

If an allocated slice is not fully used during the period, it is reset at the end of the period, and new slices up to the limit value are allocated in the next period.


So far, we have used a very simple, simplified case to illustrate how bandwidth limiting works in the CPU controller.

I said "a simplified case" for a reason: the illustrations in Figures 1 and 3 depict fairly idealized situations.

In Figure 3, every task used up the slice it was given, finishing in exactly 5 ms each time. Many of you will have noticed that reality is not like this: sometimes a task finishes without using up its slice, and sometimes it runs longer.

In fact, bandwidth limiting is not simply a matter of handing slices to CPUs, resetting everything when the period comes, and handing out new slices in the next period, as described in this article. Still, the basic idea is as described here.

Next time, I will explain how slices are allocated and returned, and take a closer look at bandwidth limits.


The document used to write this article is the kernel documentation linked here; it describes the features explained this time, and comparing it with the diagrams in this article should quickly give you a feel for them. The Indeed Engineering blog post was also very helpful in understanding this functionality.

