Copy-on-write memory

From CRIU

Latest revision as of 11:17, 29 April 2018

Problem

Private anonymous mappings are tricky. They are declared to belong to a single process only and contain its data, but the Linux kernel optimizes the case when a task calls fork() and creates a copy of itself. In this case all private anonymous mappings are "shared" between the parent and the child, but when either of them tries to modify the memory, the respective page is duplicated and the change occurs in the modifier's copy only.
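
The copy-on-write behaviour described above can be observed with a small C program. This only illustrates the kernel semantics and is not criu code; the function name parent_view_after_child_write() is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one private anonymous page, writes 'A', forks, lets the child
 * overwrite the byte with 'B', and returns what the parent sees afterwards
 * (or -1 on error). With copy-on-write the child's write goes to the
 * child's own copy of the page, so the parent still sees 'A'. */
int parent_view_after_child_write(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status, ret = -1;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;
	p[0] = 'A';            /* this page is shared COW after fork() */

	pid = fork();
	if (pid == 0) {
		p[0] = 'B';    /* the kernel duplicates the page for the child */
		_exit(p[0] == 'B' ? 0 : 1);
	}
	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0)
		ret = p[0];    /* the parent's copy is untouched */
	munmap(p, len);
	return ret;
}
```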

When taking a dump of a process tree, it is perfectly correct to copy the contents of all anonymous private mappings independently and restore them the same way: just mmap() a region and put the memory in there. But with this approach we effectively perform the duplication described above up front for every page that was shared, and thus increase the memory usage of the checkpointed and restored application.

To fix this, criu version 0.3 and later uses special tricks.

How restore works to keep COW intact

We had different ideas on how to restore COW[1] memory. At one point we even considered using KSM[2] for it. In the end we found what we think is a good way to restore COW memory: all VMAs are restored the same way they were created, so a child inherits VMAs from its parent through fork(). This raises two questions:

  1. Which VMAs should be inherited?
  2. How to avoid overlapping criu's own VMAs?

The first question is not fully resolved. Currently a VMA is considered inherited if the parent has a VMA with the same start and end addresses. This covers 99% of cases, but it does not work if a VMA was moved (for example with mremap()).
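
A minimal sketch of this address-based rule, assuming both VMA lists are sorted by start address so that a single merge-style walk suffices. The struct vma_range type and mark_inherited() are hypothetical names, not criu's real data structures, and a real check would likely compare more than the addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VMA address range, [start, end). */
struct vma_range {
	unsigned long start;
	unsigned long end;
};

/* Sets inherited[j] for every child VMA that has a parent VMA with exactly
 * the same start and end. Both arrays must be sorted by start, so one walk
 * over both lists is enough. Returns the number of inherited VMAs. */
size_t mark_inherited(const struct vma_range *parent, size_t np,
		      const struct vma_range *child, size_t nc,
		      bool *inherited)
{
	size_t i = 0, j, count = 0;

	for (j = 0; j < nc; j++) {
		assert(j == 0 || child[j - 1].start <= child[j].start);
		/* Skip parent VMAs that start below this child VMA. */
		while (i < np && parent[i].start < child[j].start)
			i++;
		inherited[j] = i < np &&
			       parent[i].start == child[j].start &&
			       parent[i].end == child[j].end;
		if (inherited[j])
			count++;
	}
	return count;
}
```

With this rule, a child VMA that was moved or resized no longer matches its parent's VMA, which is exactly the case the text calls unresolved.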

The second question is more interesting. Currently criu reserves one contiguous address range for all private VMAs, then restores the VMAs one by one inside this range. Inherited VMAs are moved there from the parent's space instead of being restored from scratch. All VMAs are sorted by start address.
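
The reservation step can be sketched with mmap() and mremap(). This is a simplified illustration, not criu's code: struct priv_vma and premap_private_vmas() are hypothetical, every non-inherited VMA is mapped read-write instead of with its original protection, and the "parent space" is modeled as an ordinary mapping whose address is passed in.

```c
#define _GNU_SOURCE /* for mremap() and MREMAP_FIXED */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical description of one private VMA, in start-address order. */
struct priv_vma {
	size_t len;         /* a multiple of the page size */
	const void *data;   /* contents from the image, if not inherited */
	void *inherited_at; /* where the parent's copy is mapped, or NULL */
	void *premapped;    /* output: address inside the reserved range */
};

/* Reserves one contiguous PROT_NONE range for all VMAs, then fills it
 * VMA by VMA. Returns the base of the range or MAP_FAILED. */
void *premap_private_vmas(struct priv_vma *v, size_t n)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t total = 0, i;
	char *base, *cursor;

	for (i = 0; i < n; i++) {
		if (v[i].len == 0 || v[i].len % page != 0)
			return MAP_FAILED;
		total += v[i].len;
	}
	base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	cursor = base;
	for (i = 0; i < n; i++) {
		void *addr;

		if (v[i].inherited_at) {
			/* Move the parent's pages instead of copying them,
			 * so they stay shared copy-on-write. */
			addr = mremap(v[i].inherited_at, v[i].len, v[i].len,
				      MREMAP_MAYMOVE | MREMAP_FIXED, (void *)cursor);
		} else {
			/* Simplification: real VMAs keep their own protection. */
			addr = mmap(cursor, v[i].len, PROT_READ | PROT_WRITE,
				    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
			if (addr != MAP_FAILED && v[i].data)
				memcpy(addr, v[i].data, v[i].len);
		}
		if (addr == MAP_FAILED) {
			munmap(base, total);
			return MAP_FAILED;
		}
		v[i].premapped = addr;
		cursor += v[i].len;
	}
	assert(cursor == base + total); /* VMAs are packed back to back */
	return base;
}
```

Moving an inherited VMA with mremap() rather than copying its contents keeps the pages shared with the parent, which is the whole point of the exercise.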

Cow.png

In the “restorer”, all of criu’s own VMAs are unmapped and the private VMAs are spread apart from the reserved range to their final addresses. The complexity of this algorithm is linear in the number of VMAs. It looks simple now, but it took me a few hours to find.
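
Why a single linear pass is enough can be shown with a toy model in which addresses are offsets into a byte array. This is our own sketch of one move ordering that works, not criu's restorer code; the names are hypothetical and the callback stands in for mremap(). Note that a real mremap() refuses to move a range onto a range overlapping the old one, so a VMA that overlaps its own old location needs extra handling that the sketch leaves to memmove().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical VMA record: where it was premapped and where it must end up. */
struct vma {
	size_t premapped; /* current offset in the reserved area */
	size_t start;     /* final offset */
	size_t len;
};

/* Moves one VMA from v->premapped to v->start; a restorer would use mremap(). */
typedef void (*move_fn)(void *ctx, const struct vma *v);

/* v[] is sorted by start, targets do not overlap, and the premapped copies
 * are packed back to back in the same order. Then start - premapped never
 * decreases, so VMAs that move down form a prefix and VMAs that move up
 * form a suffix. Moving the prefix lowest-first and the suffix
 * highest-first never drops a VMA onto one that has not moved yet, and
 * every VMA is moved exactly once. */
void spread_vmas(const struct vma *v, size_t n, move_fn move, void *ctx)
{
	size_t k = 0, i;

	/* Pass 1: VMAs that move to lower addresses, in ascending order. */
	for (; k < n && v[k].start < v[k].premapped; k++)
		move(ctx, &v[k]);

	/* Pass 2: the rest move up (or stay), in descending order. */
	for (i = n; i > k; i--)
		move(ctx, &v[i - 1]);
}

/* Toy address space: each byte holds the id of the VMA mapped there,
 * 0 means unmapped. */
struct sim {
	unsigned char mem[64];
};

void sim_map(struct sim *s, const struct vma *v, unsigned char id)
{
	assert(v->premapped + v->len <= sizeof(s->mem));
	memset(s->mem + v->premapped, id, v->len);
}

void sim_move(void *ctx, const struct vma *v)
{
	struct sim *s = ctx;
	size_t a;

	assert(v->start + v->len <= sizeof(s->mem));
	/* memmove() copes with a VMA overlapping its own old location. */
	memmove(s->mem + v->start, s->mem + v->premapped, v->len);
	/* Unmap the part of the old range that the new one does not cover. */
	for (a = v->premapped; a < v->premapped + v->len; a++)
		if (a < v->start || a >= v->start + v->len)
			s->mem[a] = 0;
}

int sim_holds(const struct sim *s, size_t off, size_t len, unsigned char id)
{
	size_t a;

	for (a = off; a < off + len; a++)
		if (s->mem[a] != id)
			return 0;
	return 1;
}
```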

“Complexity is easy; simplicity is difficult. -Georgy Shpagin”

“Everything should be made as simple as possible, but not simpler. - Albert Einstein”

All VMAs and their contents are restored before children are forked, which leads to one more issue. In the original tree a parent may have changed some pages after forking a child, so the child must not keep the parent's versions of those pages. To solve this, bitmaps are used to mark the touched pages, and madvise() is used to remove the extra pages from the child's VMA.
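
A sketch of the page-dropping step, assuming one bit per page where a set bit means the page was written while restoring the child's own memory and every other page inherited from the parent is extra. The helper names are hypothetical; only madvise() with MADV_DONTNEED is a real API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define BITMAP_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helpers: one bit per page of the VMA. */
void mark_page(unsigned long *bm, size_t page)
{
	bm[page / BITMAP_WORD_BITS] |= 1UL << (page % BITMAP_WORD_BITS);
}

int page_marked(const unsigned long *bm, size_t page)
{
	return (bm[page / BITMAP_WORD_BITS] >> (page % BITMAP_WORD_BITS)) & 1UL;
}

/* Drops every page of the VMA at addr whose bit is clear. Runs of clear
 * pages are dropped with a single madvise(MADV_DONTNEED) call; for a
 * private anonymous mapping the next access to such a page yields a fresh
 * zero-filled page. Returns 0 on success, -1 on error. */
int drop_untouched_pages(char *addr, size_t npages, const unsigned long *touched)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t i = 0;

	assert((uintptr_t)addr % page == 0); /* madvise() needs page alignment */
	while (i < npages) {
		size_t run = 0;

		if (page_marked(touched, i)) {
			i++;
			continue;
		}
		while (i + run < npages && !page_marked(touched, i + run))
			run++;
		if (madvise(addr + i * page, run * page, MADV_DONTNEED) != 0)
			return -1;
		i += run;
	}
	return 0;
}
```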

One more case is not handled yet: COW memory is not restored if a process was reparented to init.

References