2,113 bytes added ,  09:25, 3 August 2020
This page collects typical situations in which checkpoint or restore fails.
== PID mismatch on restore ==
If you see one of these lines in the log of a failed restore:
<pre>
Pid $number do not match expected $another_number
</pre>
<pre>
Thread pid mismatch $number1/$number2
</pre>
this means that, while restoring a process tree, CRIU failed to recreate a process (or a thread) with the PID (TID) value that the process (thread) had at dump time. Most likely, the desired PID/TID is already in use by another task or thread. There are several possible solutions.
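
To check a restore log for these two messages programmatically, a small scan along the following lines may help. This is only a sketch: the regular expressions are derived from the two message templates quoted above, and <code>find_pid_mismatches</code> is a hypothetical helper name, not part of CRIU. The sample text in the usage part is made up for illustration.

```python
import re

# Patterns derived from the two message templates quoted above.
PROCESS_RE = re.compile(r"Pid (\d+) do not match expected (\d+)")
THREAD_RE = re.compile(r"Thread pid mismatch (\d+)/(\d+)")


def find_pid_mismatches(log_text):
    """Return (kind, first_number, second_number) for every mismatch line.

    The numbers are reported in the order they appear in the message.
    """
    hits = []
    for line in log_text.splitlines():
        for kind, pattern in (("process", PROCESS_RE), ("thread", THREAD_RE)):
            match = pattern.search(line)
            if match:
                hits.append((kind, int(match.group(1)), int(match.group(2))))
    return hits


if __name__ == "__main__":
    # Made-up log excerpt, for illustration only.
    sample = "Pid 4242 do not match expected 1234\n"
    print(find_pid_mismatches(sample))
```

An empty result means neither message is present, so the restore failure has a different cause.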
=== Restore "by hand" ===
One way is to restore the images into a separate PID namespace, for example by running the restore under <code>unshare -p -m --fork --mount-proc</code>. The <code>-m</code> and <code>--mount-proc</code> options also unshare the mount namespace and mount a fresh /proc, so that the restored tasks see a /proc that matches their new PID namespace. This method effectively means restoring the tasks in a container.
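
The manual restore described above could be scripted roughly as follows. This is a sketch under stated assumptions: the images directory <code>/tmp/criu-images</code> is a placeholder, <code>criu restore -D</code> is used in its simplest form (your dump may require additional options), and actually running the restore needs root privileges with both <code>unshare</code> and <code>criu</code> installed. The helper names <code>build_restore_command</code> and <code>restore_in_new_pidns</code> are hypothetical.

```python
import subprocess


def build_restore_command(images_dir):
    """Build the argv that runs `criu restore` in new PID and mount namespaces."""
    return [
        # Namespace flags taken from the text above.
        "unshare", "-p", "-m", "--fork", "--mount-proc",
        # -D tells criu where the image files are stored.
        "criu", "restore", "-D", images_dir,
    ]


def restore_in_new_pidns(images_dir):
    """Run the restore and return its exit code (requires root)."""
    return subprocess.run(build_restore_command(images_dir), check=False).returncode


if __name__ == "__main__":
    # Placeholder path; this only prints the command and executes nothing.
    print(" ".join(build_restore_command("/tmp/criu-images")))
```

Keeping the command construction separate from its execution makes it easy to inspect or log the exact command line before running it as root.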

Note, however, that the most correct approach is to run the tasks you plan to checkpoint in a separate PID namespace (with /proc adjusted if required) from the very beginning.
=== Kill the obstacle ===
Another way to resolve a PID mismatch, not a correct one but still effective, is to kill the task that occupies the required PID :)
=== With CRIU help ===
There is a helper script called <code>criu-ns</code> that performs the restore in a pseudo-container and can also dump the restored tree again. For more details, see the "[[CR in namespace]]" article.
== Mount namespace with external dependences ==
The following dump-time error in the logs
<pre>
$nr:$path doesn't have a proper root mount
</pre>
indicates that the container (or just the mount namespace) being dumped has some external dependences. CRIU knows how to handle this; you just need to tell it about the [[external bind mounts]] you have.
== External unix socket connection ==
If an application has an open UNIX socket connection to the outside world, the dump will fail with
<pre>
Cannot dump half of a stream unix connection
</pre>
In this case the socket should be marked as "OK to break" and reconnected on restore. How to do this is described in the [[external UNIX socket]] article.
[[Category:Using]]