How to limit memory usage of a process under Linux

In various cases a process can easily eat up the memory of a server. This can happen quickly, or slowly over weeks. This article shows you how to find such a process and how to limit its memory usage. By default, Linux itself does not limit the physical memory usage of a process, whether it runs with root privileges or not.

Physical memory handling in Linux

The physical memory (RAM) is divided into equally sized parts called memory pages. The page size depends on the architecture of the server and is set up by the OS. The most common page size under Linux is 4096 bytes (4KB).
At boot, the kernel image (vmlinuz) is loaded near the beginning of RAM. The kernel and its modules need additional memory for their data structures, which the kernel allocates from slab caches. When an allocated slab is no longer needed, the kernel may free it. Pages used by the kernel are never swapped out.
Besides the kernel, processes also need physical pages for their code, static data, heap and stack. The physical memory used by a specific process is called its Resident Set Size (RSS).
The remaining RAM is used for page caching. It has two parts: one caches filesystem metadata (superblocks, inodes, etc.), the other caches the data blocks stored on the disk. Monitoring applications like top and free show these as two separate values (buffers and cached); their sum is the page cache. The size of the page cache varies depending on how much memory is used by the kernel and the processes.

With the rest of the memory, the kernel keeps a pool of free pages so that it can satisfy requests for new pages. When this pool drops below a certain threshold, the kernel reclaims pages from the page cache (flushing them) or from processes (swapping them out), in which case the RSS of a process decreases. In some cases the kernel can also reclaim pages from the slab.

[Screenshot: mem1]

This is output from atop: “tot” shows the total physical RAM, and “cache” plus “buff” together make up the page cache.

[Screenshot: mem2]

When watching the memory details with atop, the RSS of a process is shown in the RSIZE column, and RGROW shows how much the RSS has changed during the update interval.

Virtual memory handling in Linux

When a process starts in Linux, the kernel creates a virtual address space for it, which describes all the memory the process can use. The size of this virtual space is determined by the text and data pages of the process plus some stack space. At this point, no physical memory is used yet!

After the address space has been built, the kernel sets the program counter (PC) register to the address of the first instruction. The CPU tries to fetch the instruction and finds that it is not in memory, so it raises a trap (a page fault) that is handled by the kernel, which loads the missing page into RAM. The kernel then restarts the process at the same instruction. Pages that are never referenced are never loaded into memory at all, so the RSS can never be larger than the virtual size. For example, the kernel may create an 80KB virtual space for a process of which only 20KB (5 pages) is resident.

Test case

Let's create a small C program that allocates memory using malloc():

root@wheezy:~# cat leaker.c
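#include <stdio.h>
#include <stdlib.h>

#define STEP_BYTES 128000000  /* ~128 MB per iteration */

int main(void) {
    int i, j;
    for (j = 0; j < 10; j++) {
        /* Deliberately never freed: this simulates a memory leak. */
        char *array = malloc(STEP_BYTES);
        if (array == NULL) {
            perror("malloc");
            return 1;
        }
        /* Write every byte so the pages become resident (counted in RSS). */
        for (i = 0; i < STEP_BYTES; i++) {
            array[i] = (char)(i % 256);
        }
        printf("step %d: ~%d MB leaked, press Enter to continue\n", j + 1, (j + 1) * 128);
        getchar();  /* wait for Enter before the next allocation */
    }
    return 0;
}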

The program allocates ~128MB of memory per step and writes to every byte of it. When the program starts, we can see that it allocates both virtual pages and RSS. You can also see that our test machine has 1GB of RAM, almost half of which is used for page caching.

[Screenshot: mem3]

After 3 more iterations, the leaker has allocated another 384MB of RAM. The kernel is taking pages from the page cache. We still have around 200MB of RAM that can be used by other processes (page cache and free memory).

[Screenshot: mem4]

One more iteration, another 128MB of allocation, makes the system react more aggressively. The kernel now reclaims pages not only from the page cache but also from other processes: it begins to swap out to disk! You can see the kernel taking small amounts of RAM from the other processes to fulfill the leaker's requests; their RSS values are decreasing. (The sysctl parameter vm.swappiness controls how readily the kernel swaps out process pages to disk.)

[Screenshot: mem5]

One more iteration and the kernel runs out of physical memory. It starts aggressively scanning for free pages and swapping out other processes' pages. Note that even when the kernel is not swapping, block device access is much slower without the page cache!

[Screenshot: mem6]

Two more iterations, and things are getting worse. The kernel has run out of physical memory. Even the leaker can't get more RAM: its RSS decreases even though its virtual size grows. The kernel is swapping intensively, and now it is not only swapping out but also reading pages back in for other processes. The disk subsystem is getting loaded, and the response time of all processes is getting worse.

[Screenshot: mem7]

As you can see, a simple tiny program (a bug) can cause a denial of service. The ulimit -m option (RLIMIT_RSS) was meant to limit the resident memory of a process, but it has no effect on Linux systems running kernel 2.6 or later.

[Screenshot: mem8]

Solution

What to do then? Since Linux 2.6.24 the kernel provides a mechanism called control groups (cgroups), and the memory controller was added in 2.6.25.

“Cgroups allow you to allocate resources—such as CPU time, system memory, network bandwidth, or combinations of these resources—among user-defined groups of tasks (processes) running on a system. You can monitor the cgroups you configure, deny cgroups access to certain resources, and even reconfigure your cgroups dynamically on a running system”

Cgroups are exposed through a virtual filesystem (similar to /dev, /proc, etc.). First, and this is important, check that the memory controller is enabled in your kernel. The virtual cgroup filesystem then has to be mounted on a directory (depending on the distribution, this may already be done). The mount does not survive a reboot, so to make it permanent add a matching entry to your /etc/fstab file, for example: none /cgroups/mem cgroup memory 0 0. To mount it by hand:

(Some distros already create a cgroup filesystem, for example /sys/fs/cgroup on Debian.)

# mkdir -p /cgroups/mem
# mount -t cgroup -o memory none /cgroups/mem

To define a new memory cgroup for the leaking process(es), create a subdirectory below the mount point of the virtual cgroup filesystem:

# mkdir /cgroups/mem/leaker

The new subdirectory is automatically filled with pseudo-files that control the properties of this cgroup. The file “memory.limit_in_bytes” sets the memory limit for all processes that will run in this cgroup. Note that this limit applies to the total memory charged to the group, i.e. the RSS plus the page cache the group uses, not just the RSS.

# echo 384M > /cgroups/mem/leaker/memory.limit_in_bytes

There is another file named “tasks”. It lists the PIDs (threads included) that belong to this cgroup; children forked by a member process start in the same cgroup. Let's put the PID of the leaker into this control group:

# echo 4173 > /cgroups/mem/leaker/tasks

Now let's start the test over. The leaker process can use only 384MB of physical RAM. At the beginning there is plenty of RAM: the system has just been rebooted, so the page cache is almost empty. The leaker has started and allocated 128MB of RAM (virtual and RSS).

[Screenshot: mem9]

After 4 more iterations (+512MB), the leaker hits the limit we specified (640MB allocated in total vs. the 384MB limit). You can see that the virtual size is 640MB (all allocated pages), but the RSS is only 319MB because of the limit. The difference between the virtual size and the RSS has been swapped out to disk. The kernel has not stolen any pages from other processes!

[Screenshot: mem10]

Allocating another 512MB of RAM made the situation critical in the first case, but now the system still has free RAM and page cache! The memory allocated by the leaker simply ended up in swap. No other processes are affected!

[Screenshot: mem11]

As you can see, cgroups can prevent processes from stealing resources from other processes, and not just memory but CPU time and other resources too. Several processes can be put into the same group.

The memory leak itself hasn't been fixed, but at least it's under control.
