<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://criu.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=OpenGridSchedulerGridEngine</id>
	<title>CRIU - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://criu.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=OpenGridSchedulerGridEngine"/>
	<link rel="alternate" type="text/html" href="https://criu.org/Special:Contributions/OpenGridSchedulerGridEngine"/>
	<updated>2026-05-13T11:39:27Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.6</generator>
	<entry>
		<id>https://criu.org/index.php?title=Checkpoint/Restore&amp;diff=4207</id>
		<title>Checkpoint/Restore</title>
		<link rel="alternate" type="text/html" href="https://criu.org/index.php?title=Checkpoint/Restore&amp;diff=4207"/>
		<updated>2017-05-12T11:05:50Z</updated>

		<summary type="html">&lt;p&gt;OpenGridSchedulerGridEngine: /* Switch to restorer context, restore the rest and continue */ spelling&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page describes the overall design of how Checkpoint and Restore work in CRIU.&lt;br /&gt;
&lt;br /&gt;
== Checkpoint ==&lt;br /&gt;
&lt;br /&gt;
The checkpoint procedure relies heavily on the '''/proc''' file system (the general place from which CRIU takes all the information it needs).&lt;br /&gt;
The information gathered from /proc includes:&lt;br /&gt;
&lt;br /&gt;
* File descriptor information (via '''/proc/$pid/fd''' and '''/proc/$pid/fdinfo''').&lt;br /&gt;
* Pipe parameters.&lt;br /&gt;
* Memory maps (via '''/proc/$pid/maps''' and '''/proc/$pid/map_files/''').&lt;br /&gt;
* etc.&lt;br /&gt;
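As an illustration of the kind of data involved, here is a minimal, hypothetical sketch of splitting one /proc/$pid/maps line into its fields (for illustration only; CRIU's real parser reads '''/proc/$pid/smaps''' and is written in C):

```python
def parse_maps_line(line):
    """Split one /proc/$pid/maps line into a dict of its fields.

    A hypothetical helper for illustration; not CRIU's actual parser.
    """
    parts = line.split(None, 5)
    addrs, perms, offset, dev, inode = parts[:5]
    path = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addrs.split("-"))
    return {
        "start": start,   # VMA start address
        "end": end,       # VMA end address
        "perms": perms,   # e.g. "r-xp"
        "offset": int(offset, 16),
        "dev": dev,
        "inode": int(inode),
        "path": path,     # backing file, if any
    }

# A sample line; the addresses and path are made up for the example.
sample = "7f3c4a000000-7f3c4a021000 r-xp 00000000 08:01 131142 /usr/lib/libc.so.6"
vma = parse_maps_line(sample)
```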
&lt;br /&gt;
The process dumper (called the dumper below) performs the following steps during the checkpoint stage.&lt;br /&gt;
&lt;br /&gt;
=== Collect process tree and freeze it ===&lt;br /&gt;
The '''$pid''' of a process group leader is obtained from the command line (&amp;lt;code&amp;gt;--tree&amp;lt;/code&amp;gt; option). Using this '''$pid''', the dumper walks through the '''/proc/$pid/task/''' directory to collect threads and through '''/proc/$pid/task/$tid/children''' to gather children recursively. While walking, tasks are stopped using &amp;lt;code&amp;gt;ptrace&amp;lt;/code&amp;gt;'s &amp;lt;code&amp;gt;PTRACE_SEIZE&amp;lt;/code&amp;gt; command.&lt;br /&gt;
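The walk can be modeled roughly like this (a toy sketch with a fake /proc represented as a dict; the pid numbers are made up for illustration):

```python
def collect_tree(proc, root_pid):
    """Recursively collect (pid, tids) for root_pid and all its descendants.

    'proc' is a toy stand-in for the /proc filesystem: it maps a pid to a
    dict with 'tids' (contents of /proc/$pid/task/) and 'children'
    (contents of /proc/$pid/task/$tid/children). In real CRIU each task is
    seized with PTRACE_SEIZE as it is found, so the tree cannot change
    under our feet while we walk it.
    """
    tasks = [(root_pid, proc[root_pid]["tids"])]
    for child in proc[root_pid]["children"]:
        tasks.extend(collect_tree(proc, child))
    return tasks

# A made-up process tree: 100 has threads 100 and 101, and children 200 and 300.
proc = {
    100: {"tids": [100, 101], "children": [200, 300]},
    200: {"tids": [200], "children": [400]},
    300: {"tids": [300, 301], "children": []},
    400: {"tids": [400], "children": []},
}
tree = collect_tree(proc, 100)
```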
&lt;br /&gt;
''See also: [[Freezing the tree]]''&lt;br /&gt;
&lt;br /&gt;
=== Collect tasks' resources and dump them ===&lt;br /&gt;
At this step CRIU reads all the information it knows about the collected tasks and writes it to dump files. The resources are obtained as follows:&lt;br /&gt;
# VMA areas are parsed from '''/proc/$pid/smaps''' and mapped files are read from '''/proc/$pid/map_files''' links.&lt;br /&gt;
# File descriptor numbers are read from '''/proc/$pid/fd'''.&lt;br /&gt;
# Core parameters of a task (such as registers and friends) are dumped via the ptrace interface and by parsing the '''/proc/$pid/stat''' entry.&lt;br /&gt;
&lt;br /&gt;
Then CRIU injects [[parasite code]] into a task via the ptrace interface. This is done in two steps -- first we inject only a few bytes for an ''mmap'' syscall at the CS:IP the task had at the moment of seizing. ptrace then allows us to run the injected syscall, and we allocate enough memory for the parasite code chunk needed for dumping. After that the parasite code is copied into the new place inside the dumpee's address space and CS:IP is set to point to it.&lt;br /&gt;
&lt;br /&gt;
From the parasite context CRIU collects more information, such as&lt;br /&gt;
# Credentials&lt;br /&gt;
# Contents of memory&lt;br /&gt;
&lt;br /&gt;
=== Cleanup ===&lt;br /&gt;
&lt;br /&gt;
After everything is dumped (such as memory pages, which can be written out only from inside the dumpee's address space), we use the ptrace facility again and cure the dumpee by removing all our parasite code and restoring the original code. Then CRIU detaches from the tasks and they continue to operate.&lt;br /&gt;
&lt;br /&gt;
== Restore ==&lt;br /&gt;
&lt;br /&gt;
The restore procedure (aka the restorer) works by CRIU morphing itself into the tasks it restores. At the top level it consists of four steps.&lt;br /&gt;
&lt;br /&gt;
=== Resolve shared resources ===&lt;br /&gt;
&lt;br /&gt;
At this step CRIU reads in the image files and finds out which processes share which resources. Each shared resource is later restored by one particular process, and all the others either inherit it on the second stage (like a session) or obtain it in some other way. Examples of the latter are shared files, which are sent with SCM_CREDS messages via unix sockets, and shared memory areas, which are restored via an &amp;lt;code&amp;gt;memfd&amp;lt;/code&amp;gt; file descriptor.&lt;br /&gt;
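Picking one restorer per shared resource can be sketched as follows (a toy model; the resource ids and the "lowest pid restores it" rule are illustrative assumptions, not CRIU's exact policy):

```python
def assign_owners(usage):
    """Map each shared resource id to the single pid that will restore it.

    'usage' maps a pid to the set of resource ids that task uses, as read
    from the image files. In this toy model the lowest pid using a
    resource becomes its restorer; every other user must inherit the
    resource or receive it later (e.g. a file descriptor sent over a
    unix socket).
    """
    owners = {}
    for pid in sorted(usage):
        for res in usage[pid]:
            owners.setdefault(res, pid)
    return owners

# Made-up pids and resource ids for the example.
usage = {
    10: {"session-1", "file-A"},
    11: {"session-1", "file-A", "shmem-7"},
    12: {"shmem-7"},
}
owners = assign_owners(usage)
```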
&lt;br /&gt;
=== Fork the process tree ===&lt;br /&gt;
&lt;br /&gt;
At this step CRIU calls fork() many times to re-create the processes that need to be restored. Note that threads are not restored here, but on the fourth step.&lt;br /&gt;
&lt;br /&gt;
=== Restore basic tasks resources ===&lt;br /&gt;
&lt;br /&gt;
Here CRIU restores all resources except&lt;br /&gt;
&lt;br /&gt;
# the exact location of memory mappings&lt;br /&gt;
# timers&lt;br /&gt;
# credentials&lt;br /&gt;
# threads&lt;br /&gt;
&lt;br /&gt;
The restoration of the above four types of resources is delayed until the last stage for the reasons described below. At this stage CRIU opens files, prepares [[namespaces]], maps (and fills with data) private memory areas, creates sockets, calls chdir() and chroot(), and does some more.&lt;br /&gt;
&lt;br /&gt;
=== Switch to restorer context, restore the rest and continue ===&lt;br /&gt;
&lt;br /&gt;
The reason for the restorer blob is simple. Since CRIU morphs into the target process, it has to unmap all of its own memory and put the target's back in place. While doing so, some code must remain in memory (the code doing the munmap and mmap), and that is why the restorer blob is introduced. It is a small piece of code that intersects with neither the CRIU mappings nor the target mappings. At the end of the previous stage CRIU jumps into this blob and restores the memory maps.&lt;br /&gt;
&lt;br /&gt;
In the same place we restore timers (so that they do not fire too early), credentials (to let CRIU do privileged operations, like fork-with-pid), and threads (so that they do not suffer from the sudden memory layout change).&lt;br /&gt;
&lt;br /&gt;
''See also: [[restorer context]], [[tree after restore]].''&lt;br /&gt;
&lt;br /&gt;
[[Category:Under the hood]]&lt;/div&gt;</summary>
		<author><name>OpenGridSchedulerGridEngine</name></author>
	</entry>
	<entry>
		<id>https://criu.org/index.php?title=Restorer_context&amp;diff=4152</id>
		<title>Restorer context</title>
		<link rel="alternate" type="text/html" href="https://criu.org/index.php?title=Restorer_context&amp;diff=4152"/>
		<updated>2017-04-17T03:15:26Z</updated>

		<summary type="html">&lt;p&gt;OpenGridSchedulerGridEngine: /* What is restored there and why */ typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page describes what this context is and why we need one.&lt;br /&gt;
&lt;br /&gt;
== What is it? ==&lt;br /&gt;
&lt;br /&gt;
The restorer context is the last stage of the [[Checkpoint/Restore|restore]] process. It differs from CRIU's regular (process) context the way the [[parasite code]] does -- it doesn't use any libraries, it is PIE-compiled, and it can only work with a fixed amount of memory. In this context CRIU restores&lt;br /&gt;
&lt;br /&gt;
# memory&lt;br /&gt;
# timers&lt;br /&gt;
# credentials&lt;br /&gt;
# threads&lt;br /&gt;
&lt;br /&gt;
== Why separate context? ==&lt;br /&gt;
&lt;br /&gt;
The reasoning for this is simple -- when CRIU comes to the point where it needs to restore the process' memory, it must unmap all the old mappings and map the new ones. But since the CRIU process performs this operation on itself, once the old code is unmapped CRIU would seg-fault right on exit from the &amp;lt;code&amp;gt;munmap()&amp;lt;/code&amp;gt; system call. This code could also get over-mmaped by the mappings it restores. So we need some other code that would do this, and this other code should &amp;quot;sit&amp;quot; in two address spaces simultaneously -- in CRIU's one and in the target's one.&lt;br /&gt;
&lt;br /&gt;
The switch to this context is done in several steps.&lt;br /&gt;
&lt;br /&gt;
First, we collect the data needed by the restorer code and put it all into one contiguous memory area. Then, knowing the data size and the restorer code size, we find an appropriate hole that is free in both CRIU's and the target's mappings. Then we mmap() this region, mremap() the data into it, put the restorer blob nearby, fix the pointers (see below), and use the assembler &amp;quot;jump&amp;quot; instruction to get there.&lt;br /&gt;
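Finding the hole can be sketched with simple interval arithmetic (a toy model: the addresses are made up, and real CRIU does this in C over parsed /proc maps):

```python
def find_hole(criu_maps, target_maps, size, lo, hi):
    """Find the start of a 'size'-byte region that is unmapped in both
    address spaces, searching the range [lo, hi). Mappings are given as
    (start, end) tuples."""
    busy = sorted(criu_maps + target_maps)
    addr = lo
    for start, end in busy:
        if start >= addr + size:
            break                 # the gap before this mapping is big enough
        if end > addr:
            addr = end            # mapping overlaps the candidate, skip past it
    if addr + size > hi:
        return None               # no hole of that size in the search range
    return addr

# Made-up address spaces for the example.
criu_maps = [(0x1000, 0x4000), (0x8000, 0x9000)]
target_maps = [(0x3000, 0x6000)]
hole = find_hole(criu_maps, target_maps, 0x2000, 0x1000, 0x10000)
```

Here the region [0x6000, 0x8000) is the first range free in both sets of mappings.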
&lt;br /&gt;
Now, what does &amp;quot;fix the pointers&amp;quot; mean? When we collected the data, we addressed the objects in this area using pointers valid in CRIU's address space. When we jump into the restorer code, the pointers there should &amp;quot;know&amp;quot; where the respective objects are. So, knowing the address the restorer counts pointers from and the structure of the restorer data, we alter them accordingly.&lt;br /&gt;
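The fixup itself is just rebasing: a sketch assuming the restorer data is one contiguous blob moved as a whole (offsets within the blob are preserved by mremap(), so each pointer shifts by the same delta; the addresses below are made up):

```python
def fix_pointers(pointers, old_base, new_base):
    """Rebase pointers recorded in CRIU's address space so they stay valid
    after the data blob has been moved to new_base. Each pointer into the
    blob keeps its offset, so it simply shifts by (new_base - old_base)."""
    delta = new_base - old_base
    return [p + delta for p in pointers]

# Pointers into the data area, valid while CRIU was preparing it.
old_base = 0x55d000
ptrs = [0x55d010, 0x55d180, 0x55d7f8]
# After mremap()-ing the blob to 0x7f0000, the same objects live here:
fixed = fix_pointers(ptrs, old_base, 0x7f0000)
```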
&lt;br /&gt;
== What is restored there and why ==&lt;br /&gt;
&lt;br /&gt;
Memory is restored here for the reasons described above. Note that here CRIU does only two things:&lt;br /&gt;
&lt;br /&gt;
# Move anonymous VMAs to proper places&lt;br /&gt;
# Map new file VMAs&lt;br /&gt;
&lt;br /&gt;
The anonymous memory is mmap()-ed and filled with data earlier; in the restorer it is only mremap()-ed to the proper addresses. The file mappings are just mmap()-ed, as their data sits in the files :)&lt;br /&gt;
&lt;br /&gt;
Timers are restored here because CRIU processes can wait for each other for some time while restoring; so as not to lose timer ticks during that wait, we delay arming the timers until the last moment.&lt;br /&gt;
&lt;br /&gt;
Credentials are restored here to allow CRIU to perform privileged operations such as fork-with-pid or chroot().&lt;br /&gt;
&lt;br /&gt;
Threads are restored here for simplicity. If we restored them earlier, we'd have to &amp;quot;park&amp;quot; them while we change the memory layout. Instead of doing this, we first toss the memory around, then create the threads.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Code blobs]]&lt;br /&gt;
* [[Compel]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Under the hood]]&lt;/div&gt;</summary>
		<author><name>OpenGridSchedulerGridEngine</name></author>
	</entry>
	<entry>
		<id>https://criu.org/index.php?title=Restorer_context&amp;diff=4151</id>
		<title>Restorer context</title>
		<link rel="alternate" type="text/html" href="https://criu.org/index.php?title=Restorer_context&amp;diff=4151"/>
		<updated>2017-04-16T03:32:43Z</updated>

		<summary type="html">&lt;p&gt;OpenGridSchedulerGridEngine: /* What is it? */ typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page describes what this context is and why we need one.&lt;br /&gt;
&lt;br /&gt;
== What is it? ==&lt;br /&gt;
&lt;br /&gt;
The restorer context is the last stage of the [[Checkpoint/Restore|restore]] process. It differs from CRIU's regular (process) context the way the [[parasite code]] does -- it doesn't use any libraries, it is PIE-compiled, and it can only work with a fixed amount of memory. In this context CRIU restores&lt;br /&gt;
&lt;br /&gt;
# memory&lt;br /&gt;
# timers&lt;br /&gt;
# credentials&lt;br /&gt;
# threads&lt;br /&gt;
&lt;br /&gt;
== Why separate context? ==&lt;br /&gt;
&lt;br /&gt;
The reasoning for this is simple -- when CRIU comes to the point where it needs to restore the process' memory, it must unmap all the old mappings and map the new ones. But since the CRIU process performs this operation on itself, once the old code is unmapped CRIU would seg-fault right on exit from the &amp;lt;code&amp;gt;munmap()&amp;lt;/code&amp;gt; system call. This code could also get over-mmaped by the mappings it restores. So we need some other code that would do this, and this other code should &amp;quot;sit&amp;quot; in two address spaces simultaneously -- in CRIU's one and in the target's one.&lt;br /&gt;
&lt;br /&gt;
The switch to this context is done in several steps.&lt;br /&gt;
&lt;br /&gt;
First, we collect the data needed by the restorer code and put it all into one contiguous memory area. Then, knowing the data size and the restorer code size, we find an appropriate hole that is free in both CRIU's and the target's mappings. Then we mmap() this region, mremap() the data into it, put the restorer blob nearby, fix the pointers (see below), and use the assembler &amp;quot;jump&amp;quot; instruction to get there.&lt;br /&gt;
&lt;br /&gt;
Now, what does &amp;quot;fix the pointers&amp;quot; mean? When we collected the data, we addressed the objects in this area using pointers valid in CRIU's address space. When we jump into the restorer code, the pointers there should &amp;quot;know&amp;quot; where the respective objects are. So, knowing the address the restorer counts pointers from and the structure of the restorer data, we alter them accordingly.&lt;br /&gt;
&lt;br /&gt;
== What is restored there and why ==&lt;br /&gt;
&lt;br /&gt;
Memory is restored here for the reasons described above. Note that here CRIU does only two things:&lt;br /&gt;
&lt;br /&gt;
# Move anonymous VMAs to proper places&lt;br /&gt;
# Map new file VMAs&lt;br /&gt;
&lt;br /&gt;
The anonymous memory is mmap()-ed and filled with data earlier; in the restorer it is only mremap()-ed to the proper addresses. The file mappings are just mmap()-ed, as their data sits in the files :)&lt;br /&gt;
&lt;br /&gt;
Timers are restored here because CRIU processes can wait for each other for some time while restoring; so as not to lose timer ticks during that wait, we delay arming the timers until the last moment.&lt;br /&gt;
&lt;br /&gt;
Credentials are restored here to allow CRIU to perform privileged operations such as fork-with-pid or chroot().&lt;br /&gt;
&lt;br /&gt;
Threads are restored here for simplicity. If we restored them earlier, we'd have to &amp;quot;park&amp;quot; them while we change the memory layout. Instead of doing this, we first toss the memory around, then create the threads.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Code blobs]]&lt;br /&gt;
* [[Compel]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Under the hood]]&lt;/div&gt;</summary>
		<author><name>OpenGridSchedulerGridEngine</name></author>
	</entry>
	<entry>
		<id>https://criu.org/index.php?title=TCP_repair_TODO&amp;diff=3022</id>
		<title>TCP repair TODO</title>
		<link rel="alternate" type="text/html" href="https://criu.org/index.php?title=TCP_repair_TODO&amp;diff=3022"/>
		<updated>2016-08-22T11:03:37Z</updated>

		<summary type="html">&lt;p&gt;OpenGridSchedulerGridEngine: a API -&amp;gt; an API&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The TCP repair feature in the Linux kernel is supposed to help with migrating a TCP socket. It is not yet complete, and this page lists what remains to be done.&lt;br /&gt;
&lt;br /&gt;
; Transitional states&lt;br /&gt;
: Currently we support sockets in the ''closed'' and ''established'' states. However, if a socket is in e.g. the ''syn-sent'' state, the process of turning it into ''established'' can take a long time. We should teach the kernel and criu to checkpoint and restore this and other states.&lt;br /&gt;
&lt;br /&gt;
; Optimized restore&lt;br /&gt;
: Currently the whole outgoing queue is restored in an &amp;quot;all was sent, waiting for ACKs&amp;quot; state. After this, the data that was not actually sent yet will be re-transmitted after a while. This makes the connection work, but delays it for some time. This needs improving.&lt;br /&gt;
&lt;br /&gt;
; OOB data&lt;br /&gt;
: Nothing to say here actually. This data is just not supported currently.&lt;br /&gt;
&lt;br /&gt;
; Window restore fix&lt;br /&gt;
: Currently what we do is send the window probe skb when repair is off. The other side should send us a response, but this process is not guaranteed to work. This needs to be fixed either by saving and restoring the window value, or by re-transmitting the probe again and again.&lt;br /&gt;
&lt;br /&gt;
; Shutdown sockets repair&lt;br /&gt;
: Need to place checks in the inet shutdown code similar to those on connect/sendmsg paths.&lt;br /&gt;
&lt;br /&gt;
; Connection tracking&lt;br /&gt;
: The nf_conntrack thing in the kernel needs to be live-migrated too. There's currently an API for getting the conntrack info (a /proc file), but no API for restoring it by hand.&lt;br /&gt;
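The &amp;quot;Optimized restore&amp;quot; item above amounts to splitting the saved queue by sequence number, roughly like this (a toy byte-level model; snd_una and snd_nxt are the standard TCP sequence variables, and treating them as plain integers ignores 32-bit wraparound):

```python
def split_send_queue(queue, queue_seq, snd_una, snd_nxt):
    """Split a saved outgoing queue into the part genuinely in flight
    (sent, awaiting ACK) and the part that was never sent at all.
    'queue' holds the bytes starting at sequence number queue_seq."""
    in_flight = queue[snd_una - queue_seq : snd_nxt - queue_seq]
    unsent = queue[snd_nxt - queue_seq :]
    return in_flight, unsent

# Made-up queue contents and sequence numbers for the example: 12 of the
# 30 queued bytes had actually been transmitted before checkpoint.
queue = b"hello world, more data pending"
in_flight, unsent = split_send_queue(queue, 1000, snd_una=1000, snd_nxt=1012)
```

With this split, only 'in_flight' would need the retransmit treatment, while 'unsent' could be queued as fresh data instead of waiting out a retransmission timeout.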
&lt;br /&gt;
[[Category:Plans]]&lt;br /&gt;
[[Category:Network]]&lt;/div&gt;</summary>
		<author><name>OpenGridSchedulerGridEngine</name></author>
	</entry>
</feed>