Seamless kernel upgrade


Replacing the kernel on a machine does not have to stop critical activity: checkpoint it, replace the kernel (e.g. using kexec), then restore the services. Ideally, the applications' memory should not be written to a disk image but kept in RAM instead.

Description of the concept

To upgrade the kernel on a running system, one may use live kernel patching, but it has some limitations. Alternatively, the kernel can be upgraded by actually rebooting into the new one. The sequence of steps is:

Suspend the processes or containers you need to keep
Reboot into the new kernel using kexec
Restore the suspended processes and containers from images
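The three steps above can be sketched as a small Python driver that only builds the command lines. This is a minimal sketch, not the project's own tooling: it assumes CRIU and kexec-tools are installed, that the processes to keep form a single tree rooted at one PID, and that the images directory lives on storage that survives kexec (e.g. a disk-backed filesystem, which is exactly the slow path the optimizations below try to avoid). The function names `upgrade_plan`, `restore_cmd` and `run`, as well as the PID and paths, are hypothetical. The explicit `sync` is a precaution, since `kexec -e` skips the normal shutdown sequence.

```python
import subprocess
from typing import List, Sequence


def upgrade_plan(pid: int, images_dir: str, kernel: str, initrd: str) -> List[List[str]]:
    """Commands run on the OLD kernel: checkpoint, then kexec into the new one."""
    return [
        # Step 1: checkpoint the process tree rooted at `pid`. By default
        # CRIU kills the dumped tasks once the dump succeeds.
        ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir,
         "--tcp-established"],
        # Flush dirty data explicitly instead of relying on the reboot path.
        ["sync"],
        # Step 2a: stage the new kernel, reusing the running kernel's cmdline.
        ["kexec", "-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"],
        # Step 2b: jump into the staged kernel, skipping the normal shutdown.
        ["kexec", "-e"],
    ]


def restore_cmd(images_dir: str) -> List[str]:
    """Command run on the NEW kernel, e.g. from a boot-time service."""
    # Step 3: restore the tree and detach, so the caller can exit.
    return ["criu", "restore", "--images-dir", images_dir,
            "--restore-detached", "--tcp-established"]


def run(cmds: Sequence[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical PID and paths, for illustration only.
    run(upgrade_plan(1234, "/var/lib/criu/img",
                     "/boot/vmlinuz-new", "/boot/initrd-new.img"))
    run([restore_cmd("/var/lib/criu/img")])
```

Only the first list runs before the reboot; `restore_cmd` has to be invoked by something that starts on the new kernel, such as an init script. `run()` defaults to a dry run that merely prints the commands.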

Doing this quickly requires several optimizations.

Keep memory images in memory

Writing memory contents to disk in step 1, and reading them back from disk into memory in step 3, is time consuming. Since the memory needed to hold the images is available anyway, it is better to keep the memory images in RAM and have them survive the kexec.

Keeping the images in RAM across kexec requires a kernel patch. One approach was to implement a PMEM filesystem. (FIXME: link on lkml patch here).
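A back-of-the-envelope estimate shows why the disk round trip matters: the images are written once in step 1 and read back once in step 3, so the pure I/O time is the image size divided by the write bandwidth plus the image size divided by the read bandwidth. The sketch below is a lower bound that ignores CRIU's own work; the function name and all numbers are hypothetical.

```python
def image_io_seconds(image_gib: float, write_gib_s: float, read_gib_s: float) -> float:
    """Disk time to write the images in step 1 and read them back in step 3."""
    # Each byte of the images crosses the disk twice: once out, once back in.
    return image_gib / write_gib_s + image_gib / read_gib_s


if __name__ == "__main__":
    # Hypothetical workload: 64 GiB of images, 1 GiB/s writes, 2 GiB/s reads.
    print(image_io_seconds(64, 1, 2))
```

With these hypothetical figures the result is 96 seconds of disk I/O alone, and it grows linearly with the amount of memory being checkpointed; keeping the images in RAM largely avoids this term.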

Don't flush the page cache

The same applies to page cache pages: dropping them and re-reading them from disk after the reboot slows things down. As with the previous optimization, it would be better to keep page cache pages in PMEM.

Don't flush dirty page cache

Page cache holding dirty data would normally be flushed to disk before rebooting, but, again, this makes things slower. It is better to keep the dirty pages in memory and flush them later. However, the filesystem's dirty metadata must still be written to disk; otherwise, after the reboot the filesystem will appear dirty and may try to replay its journal.
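To judge how much these two optimizations matter on a given machine, one can look at the page cache counters in `/proc/meminfo`: `Cached` approximates the page cache that would be dropped and re-read (note that it also counts shared memory and tmpfs pages), while `Dirty` and `Writeback` approximate what would have to be flushed before the reboot. The helper below is a minimal Python sketch with hypothetical function names; values are in kB, as reported by the kernel.

```python
from typing import Dict

# Counters relevant to the page cache optimizations above.
FIELDS = ("Cached", "Dirty", "Writeback")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return the selected /proc/meminfo fields, in kB."""
    out: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        # Exact match, so "SwapCached" or "WritebackTmp" are not picked up.
        if key in FIELDS:
            out[key] = int(rest.split()[0])
    return out


def read_meminfo(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Running `read_meminfo()` right before a planned upgrade gives a rough idea of how much data the "Don't flush" optimizations would spare from being re-read or written out.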

Issues

The problems with this approach are:

Writing out dirty metadata slows things down
Kexec doesn't work on some hardware
A kernel patch is needed
The kernel boots too slowly on many-core nodes
Accessing on-disk files during restore doesn't hit the dentry cache and is thus slow
