Difference between revisions of "When C/R fails"

(Don't recommend unsharing pidns with old proc)
Line 14:

  this means that while restoring a process tree CRIU has failed to recreate a process or a thread with the PID (TID) value it used to have on dump. This most likely is due to the PID/TID value in question being busy with some other task or thread. There are several possible solutions to this.

- One is to restore the images into a separate pid namespace. This can be done by using the <code>unshare -p --fork</code> command and then doing the restore. In this case you might also want to unshare the mount namespace and re-mount the /proc so that the restored tasks can use it. This method effectively means restoring tasks in container.
+ One is to restore the images into a separate pid namespace. This can be done by using the <code>unshare -p -m --fork --mount-proc</code> command and then doing the restore. In this case you might also want to unshare the mount namespace and re-mount the /proc so that the restored tasks can use it. This method effectively means restoring tasks in container.

  Having said the above, the most correct way to handle this is to run the tasks you plan to checkpoint in the pid namespace (with /proc tune-up if required) from the very beginning.

Revision as of 10:26, 28 June 2016

This page collects typical situations in which checkpoint or restore fails.

PID mismatch on restore

If you see one of these lines in the logs of a failed restore

Pid $number do not match expected $another_number
Thread pid mismatch $number1/$number2

this means that, while restoring a process tree, CRIU failed to recreate a process or a thread with the PID (TID) it had at dump time. Most likely the PID/TID in question is already taken by some other task or thread. There are several possible solutions to this.

One is to restore the images into a separate pid namespace. This can be done by running the restore under the unshare -p -m --fork --mount-proc command. The -m and --mount-proc options also unshare the mount namespace and mount a fresh /proc, so that the restored tasks can use it. This method effectively means restoring the tasks in a container.
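As an illustration, the restore into a separate pid namespace could be wrapped as in the sketch below. The function name restore_in_new_pidns, the DRY_RUN switch and the /tmp/criu-images path are hypothetical, and your dump may need additional criu restore options (for example --shell-job if it was dumped that way).

```shell
#!/bin/sh
# Sketch: restore CRIU images inside fresh pid and mount namespaces.
# IMAGES_DIR and /tmp/criu-images are hypothetical; use your own dump directory.

restore_in_new_pidns() {
    images_dir=$1
    # -p with --fork: new pid namespace, the command becomes its first task
    # -m with --mount-proc: new mount namespace with a fresh /proc
    set -- unshare -p -m --fork --mount-proc criu restore -D "$images_dir"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"   # only print the command line
    else
        "$@"        # needs root and an installed criu
    fi
}

# Print the command that would be run, without executing it.
DRY_RUN=1
restore_in_new_pidns "${IMAGES_DIR:-/tmp/criu-images}"
```

Unset DRY_RUN (or set it to 0) to actually perform the restore.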

That said, the most reliable way to handle this is to run the tasks you plan to checkpoint in their own pid namespace (with /proc set up as required) from the very beginning.

Another solution to a PID mismatch, crude but workable, is to kill the offending task :)
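To find out which task holds the PID or TID from the log message, one option is to look it up under /proc in the pid namespace where the restore runs. This is a minimal sketch; the helper names pid_in_use and describe_pid are made up for illustration.

```shell
#!/bin/sh
# Sketch: report which task currently holds a PID (or TID) that CRIU needs.

pid_in_use() {
    # /proc/<id> exists for processes and for threads while the ID is taken
    [ -e "/proc/$1" ]
}

describe_pid() {
    if pid_in_use "$1"; then
        # comm holds the short name of the task
        echo "PID $1 is in use by: $(cat "/proc/$1/comm")"
    else
        echo "PID $1 is free"
    fi
}

# Example: the shell running this script always holds its own PID.
describe_pid "$$"
```

Pass the number from the failed restore log instead of "$$" to check the PID that CRIU could not get.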


Mount namespace with external dependencies

The dump-time error in the logs

$nr:$path doesn't have a proper root mount

indicates that the container (or just the mount namespace) being dumped has some external dependencies. CRIU can handle this; you just need to tell it about the external bind mounts you have.
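A rough sketch of what telling CRIU about one external bind mount might look like is shown below, assuming the --ext-mount-map option as described in the criu manual page (mountpoint:key on dump, key:host-path on restore). The PID, paths and the key data_mnt are hypothetical placeholders, and the helper functions only print the command lines; check criu --help for the options your version supports.

```shell
#!/bin/sh
# Sketch: build matching dump and restore command lines for one external
# bind mount. All values passed below are hypothetical placeholders.

dump_cmd() {     # args: pid images_dir mountpoint key
    # On dump, map the mountpoint inside the namespace to a key stored in the image
    echo criu dump -t "$1" -D "$2" --ext-mount-map "$3:$4"
}

restore_cmd() {  # args: images_dir key host_path
    # On restore, map the key from the image to the host path to bind-mount
    echo criu restore -D "$1" --ext-mount-map "$2:$3"
}

dump_cmd 1234 /tmp/criu-images /mnt/data data_mnt
restore_cmd /tmp/criu-images data_mnt /srv/data
```

The same key has to be used on both sides so that the restore can match the mount recorded in the image.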