When C/R fails
Latest revision as of 09:25, 3 August 2020
This page collects typical situations in which checkpoint or restore fails.
PID mismatch on restore
If you see one of these lines in the failed restore logs
Pid $number do not match expected $another_number
Thread pid mismatch $number1/$number2
this means that while restoring a process tree, CRIU has failed to recreate a process (or a thread) with the PID (TID) value that the process (thread) had at dump time. Most likely, this is because the desired PID/TID is already used by another task or thread. There are several possible solutions to this.
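To see which task is in the way, one option is to pull the expected PID out of the failed restore log and look it up in /proc. The following is only a sketch: the function names are hypothetical, it handles only the "Pid ... do not match expected ..." line, and it assumes you run it in the same pid namespace in which the restore was attempted.

```shell
# Print the expected PID from each "Pid N do not match expected M" line
# of the given restore log.
expected_pids() {
    sed -n 's/.*Pid [0-9][0-9]* do not match expected \([0-9][0-9]*\).*/\1/p' "$1"
}

# Print the name of whatever currently holds the given PID or TID, if any.
# On Linux, /proc/<id>/comm can be read for processes and for threads.
pid_holder() {
    if [ -r "/proc/$1/comm" ]; then
        printf '%s is held by: %s\n' "$1" "$(cat "/proc/$1/comm")"
    else
        printf '%s looks free\n' "$1"
    fi
}

# Report the holder of every PID the restore failed to claim.
report_pid_conflicts() {
    for pid in $(expected_pids "$1"); do
        pid_holder "$pid"
    done
}
```

For example, report_pid_conflicts restore.log, where restore.log is a placeholder for your own log file.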
Restore "by hand"
One way is to restore the images into a separate pid namespace: start the restore under the unshare -p -m --fork --mount-proc command. The -m and --mount-proc options also unshare the mount namespace and re-mount /proc, so that the restored tasks can use it. This method effectively means restoring the tasks in a container.
Note, however, that the cleanest approach is to run the tasks you plan to checkpoint in their own pid namespace (with /proc tune-up if required) from the very beginning.
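The namespace-based restore described above could be wrapped roughly as follows. This is a minimal sketch, not a recommended tool: the function name and the DRY_RUN switch are made up for illustration, the images directory is whatever you passed to criu dump, and a real restore needs root and may need further criu options depending on what was dumped.

```shell
# Restore the images in the given directory inside fresh pid and mount
# namespaces. With DRY_RUN set to a non-empty value, the command is only
# printed, which is handy for checking it before running it as root.
restore_in_new_pidns() {
    ${DRY_RUN:+echo} unshare -p -m --fork --mount-proc \
        criu restore -D "$1"
}
```

Since the new pid namespace starts out empty, the PIDs recorded in the images are not expected to collide with tasks that exist on the host.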
Kill the obstacle
Another solution to a PID mismatch, not a proper one but still effective, is to kill the task that holds the PID :)
With CRIU help
There's a helper script called criu-ns that performs the restore in a pseudo-container and can also dump the restored tree again. For more details see the "CR in namespace" article.
Mount namespace with external dependencies
The dump-time error in the logs
$nr:$path doesn't have a proper root mount
indicates that the container (or just the mount namespace) being dumped has some external dependencies. CRIU knows how to handle them; you just need to tell CRIU about the external bind mounts you have.
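As a rough illustration of telling CRIU about an external bind mount, the sketch below uses the --external mnt[...] option form from the criu man page: at dump time the mount point inside the namespace is tagged with a key, and at restore time that key is mapped to a host path. The function names, the DRY_RUN switch and all argument values are hypothetical; check the external bind mounts article for the options that match your CRIU version.

```shell
# Dump the task tree rooted at PID $1 into directory $2, marking the mount
# point $3 (as seen inside the mount namespace) as external under key $4.
dump_with_ext_mount() {
    ${DRY_RUN:+echo} criu dump -t "$1" -D "$2" --external "mnt[$3]:$4"
}

# Restore from directory $1, bind-mounting host path $3 wherever the
# external mount saved under key $2 is expected.
restore_with_ext_mount() {
    ${DRY_RUN:+echo} criu restore -D "$1" --external "mnt[$2]:$3"
}
```

Setting DRY_RUN to a non-empty value prints the commands instead of running them.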
External unix socket connection
If an application has an open unix socket connection to the external world, the dump will fail with
"Cannot dump half of a stream unix connection
In this case the socket should be marked as "OK to break" at dump time and reconnected on restore. How to do this is described in the "External UNIX socket" article.