Day 2: Conference: the new Linux VM

After sitting in the crowd for Julian's talk on Memtest, where he offered a few comments and answered a few questions, Rik van Riel, who now works for Conectiva Linux, gave us an overview of the new Linux Virtual Memory architecture.


Rik first explained that there are different kinds of memory, from registers to level 1-3 caches to main memory, and that, as we know, the kinds of memory you can afford in large quantities are also the slower ones (e.g. you can't quite get 512MB of level 2 cache memory).
RAM is 100 times slower than the CPU core, yet it is also 100,000 times faster than disk. When you run out of RAM, you swap, and page faults are really, really slow, so you want to avoid them as much as possible.
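To get a feel for why even rare page faults hurt, here is a rough back-of-the-envelope sketch. It takes the 100,000x figure from the talk as the cost of a disk access relative to a RAM access; the function name and the sample fault rates are mine, picked only for illustration.

```python
def effective_cost(fault_rate, disk_penalty=100_000):
    """Average cost of one memory access, in units of one RAM access.

    fault_rate   -- fraction of accesses that page-fault (illustrative value)
    disk_penalty -- cost of a disk access relative to RAM (figure from the talk)
    """
    return (1 - fault_rate) * 1 + fault_rate * disk_penalty


if __name__ == "__main__":
    for rate in (0.0, 0.0001, 0.001, 0.01):
        print(f"fault rate {rate:.4%}: {effective_cost(rate):10.1f}x a RAM access")
```

Under these assumptions, one fault per thousand accesses already makes the average access roughly a hundred times slower than plain RAM.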

The goal is to swap out the page of memory that we won't need for the longest time, but of course this requires knowing the future, and since no one has yet written the crystal ball algorithm, we have to do other things.
One common algorithm is LRU, where the least recently used page gets swapped to disk. The problem with this algorithm is that when a multi-gigabyte dataset is being processed, all the other applications that are actually in use get swapped out.
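The LRU problem is easy to reproduce with a toy model. The sketch below is not kernel code: it is a tiny strict-LRU cache (the class name, the capacity and the page names are made up) that shows a one-pass stream of data pushing out the pages of an application that is still in use.

```python
from collections import OrderedDict


class LRUCache:
    """Toy page cache with strict LRU eviction (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # least recently used page comes first
        self.faults = 0

    def touch(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            return
        self.faults += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = True


def run_demo():
    cache = LRUCache(capacity=8)
    app_pages = [f"app{i}" for i in range(4)]
    for page in app_pages:
        cache.touch(page)
    # Stream through a large dataset, touching each page exactly once.
    for i in range(100):
        cache.touch(f"data{i}")
    survivors = [page for page in app_pages if page in cache.pages]
    return survivors, cache.faults


if __name__ == "__main__":
    survivors, faults = run_demo()
    print("application pages still cached:", survivors)
    print("page faults:", faults)
```

None of the application pages survive the scan, even though the streamed pages will never be touched again.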

Least frequently used (LFU) works better in that respect, but it has problems too. For instance, multi-pass compilers touch the same data set several times, which can cause the compiler itself to be swapped out.
There is also NRU, not recently used (an LRU approximation where you don't evict pages which were just used), and NFU, not frequently used (an LFU approximation where pages which are used over and over again are kept in memory).

It's hard to pick the best algorithm because it depends on the workload. Linux has page aging, which is a mix of NRU and NFU that people can tweak to match what they're doing.
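As a rough idea of what such a mix can look like, here is a small sketch of page aging. Everything in it is an assumption for illustration: the constants, the function names, and the exact rule (add a fixed amount to the age of a referenced page, halve the age of an idle one) are not taken from Rik's slides or from the kernel source.

```python
AGE_ADVANCE = 3  # hypothetical: age added when a page was referenced
AGE_MAX = 64     # hypothetical: upper bound on a page's age


def age_pages(ages, referenced):
    """Run one scan over all pages.

    ages       -- dict mapping page name to its current age (updated in place)
    referenced -- set of pages touched since the previous scan
    Returns the pages whose age is zero, i.e. the eviction candidates.
    """
    candidates = []
    for page in ages:
        if page in referenced:
            # NFU-like part: repeated use accumulates, up to a cap.
            ages[page] = min(ages[page] + AGE_ADVANCE, AGE_MAX)
        else:
            # NRU-like part: idle pages lose their age quickly.
            ages[page] //= 2
        if ages[page] == 0:
            candidates.append(page)
    return candidates


def demo_aging():
    ages = {"hot": AGE_ADVANCE, "once": AGE_ADVANCE}
    age_pages(ages, referenced={"hot"})               # first scan
    candidates = age_pages(ages, referenced={"hot"})  # second scan
    return candidates, ages


if __name__ == "__main__":
    print(demo_aging())
```

A page that was used once drops to age zero after two idle scans, while a page that keeps being used builds up enough age to survive a few quiet periods.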

With streaming IO, data is typically used only once, so the goal is to detect that kind of memory access and put those pages on an inactive list rather than try to cache them.

As for memory management itself:
Linux 2.2 introduced do_try_to_free_pages(), which was a simple algorithm with low overhead, but it didn't work very well in some cases. It didn't take into account the relative activity of the different caches, and it didn't cope well with wildly varying or heavy VM load.

Linux 2.4 attempts to fix those problems.
It has balanced page aging, a multiqueue VM and smarter page flushing.
As a result, it is more robust under heavy loads, shared memory is integrated into the page cache, and streaming IO is detected and dealt with accordingly.
The multiqueue VM works as follows: you have an active list, an inactive_dirty list (pages that might become reclaimable), and an inactive_clean list (pages that are certainly immediately reclaimable).
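The three lists can be pictured with a small toy model. Only the list names come from the talk; the class, the method names and the transitions between the lists (deactivated pages go to inactive_dirty, get written back if needed, then move to inactive_clean) are my own simplification and may not match what the kernel really does.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    dirty: bool = False


class MultiQueueVM:
    """Toy model of the three page lists (hypothetical transitions)."""

    def __init__(self):
        self.active = deque()
        self.inactive_dirty = deque()  # might become reclaimable
        self.inactive_clean = deque()  # immediately reclaimable
        self.written_back = []         # stand-in for real disk writes

    def add(self, page):
        self.active.append(page)

    def deactivate(self, page):
        """Move a page that has gone cold off the active list."""
        self.active.remove(page)
        self.inactive_dirty.append(page)

    def launder(self):
        """Write back dirty pages, then move everything to inactive_clean."""
        while self.inactive_dirty:
            page = self.inactive_dirty.popleft()
            if page.dirty:
                self.written_back.append(page.name)
                page.dirty = False
            self.inactive_clean.append(page)

    def reclaim(self):
        """Free one page without any I/O, or return None if none is ready."""
        if self.inactive_clean:
            return self.inactive_clean.popleft()
        return None


def demo_queues():
    vm = MultiQueueVM()
    a, b, c = Page("a"), Page("b", dirty=True), Page("c")
    for page in (a, b, c):
        vm.add(page)
    vm.deactivate(b)
    vm.deactivate(c)
    before = vm.reclaim()  # nothing is on inactive_clean yet
    vm.launder()
    after = vm.reclaim()
    return vm, before, after


if __name__ == "__main__":
    vm, before, after = demo_queues()
    print(before, after, vm.written_back, [p.name for p in vm.active])
```

The point of keeping a separate clean list is that a request for memory can be satisfied from it right away, while the writes needed to clean dirty pages happen separately.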

Other improvements allow a configurable RSS limit per process: you can cap the amount of resident memory a process uses, and the rest will be swapped out.

Rik talked about a lot more but, being short on time, he flew through some of his slides a bit faster than we could follow. Hopefully, he'll get 2 hours or more next time :-).


2001/01/28 (14:01): Version 1.0