Changes

1,324 bytes added, 22:49, 11 September 2018
Another crazy idea (dima)
Line 1: Line 1: −
This is a set of ideas how criu can be used
+
This is a set of ideas on how CRIU can be used.
   −
== Container [[live migration]] ==
+
== Container live migration ==
   −
This is the use case from what the whole checkpoint/restore project appeared. Container is checkpointed, then the image is copied on another box, then restored. From the remote observer point of view the container is just frozen for a while. You can find more details on this scenario [[LXC | here]]
+
This is the use case from which the whole checkpoint/restore project grew. A container is checkpointed, its image is copied to another box, and it is then restored there. From a remote observer's point of view, the container is just frozen for a while.
 +
 
 +
''For more info, see [[:Category:live migration]].''
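
The dump, copy and restore steps described above can be sketched as three command lines. This is only an illustration, not a complete migration tool: the PID <code>1234</code>, the directory <code>/tmp/ckpt</code> and the host name <code>box2</code> are made up, copying with rsync and restoring over ssh is just one possible transport, and a real container usually needs more options (and the same file system contents on the destination). The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def migration_commands(pid, images_dir, dest_host):
    """Return the command lines for a dump / copy / restore cycle.

    All argument values are supplied by the caller; nothing is executed here.
    """
    # 1. Checkpoint the process tree rooted at `pid` into `images_dir`.
    dump = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    # 2. Copy the image files to the same path on the destination box
    #    (rsync is an assumption; any file transfer would do).
    copy = ["rsync", "-a", images_dir + "/", f"{dest_host}:{images_dir}/"]
    # 3. Restore from the copied images on the destination box.
    restore = ["ssh", dest_host, "criu", "restore", "-D", images_dir]
    return [dump, copy, restore]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for argv in migration_commands(1234, "/tmp/ckpt", "box2"):
        print(shlex.join(argv))
```

Between step 1 and step 3 the process does not run anywhere, which is the "frozen for a while" period seen by a remote observer.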
    
== Slow-boot services speed up ==
 
Line 11: Line 13:  
A rough preliminary measurement shows that the start time of a VNC server plus Eclipse drops from ~29 seconds to ~1.5 seconds.
   −
== Reboot-less kernel upgrade ==
+
''Main article: [[slow-boot services speed up]].''
 +
 
 +
== Seamless kernel upgrade ==
    
When replacing the kernel on a box, we can do it without stopping critical activity: checkpoint the services, replace the kernel (e.g. using kexec), then restore the services. In a perfect world the applications' memory would not be written to a disk image, but would rather be kept in RAM.
 +
 +
''Main article: [[Seamless kernel upgrade]]''.
    
== Networking load balancing ==
 
   −
Not the whole project, but the [[TCP_connection|TCP repair]] can be used to offload an app-level request handling on another box.
+
Not the whole project, but its [[TCP connection|TCP repair]] feature can be used to offload app-level request handling to another box.
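
To give an idea of what TCP repair exposes, here is a minimal sketch that switches a socket into repair mode and reads the sequence numbers of its send and receive queues. It is not CRIU's own code. The option numbers are copied by hand from the Linux header <code>linux/tcp.h</code>, and the function name <code>read_repair_seqs</code> is made up for this example. Repair mode is Linux-only and needs the CAP_NET_ADMIN capability, so the function returns <code>None</code> where it is not available.

```python
import socket

# Values from <linux/tcp.h>, defined by hand because the Python socket
# module may not export them.
TCP_REPAIR = 19
TCP_REPAIR_QUEUE = 20
TCP_QUEUE_SEQ = 21
TCP_RECV_QUEUE = 1
TCP_SEND_QUEUE = 2


def read_repair_seqs(sock):
    """Return (send_seq, recv_seq) for `sock`, or None if repair mode
    cannot be used (no privilege, or not a Linux kernel)."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)
    except OSError:
        return None
    try:
        seqs = []
        for queue in (TCP_SEND_QUEUE, TCP_RECV_QUEUE):
            # Select a queue, then read its current sequence number.
            sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, queue)
            seqs.append(sock.getsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ))
        return tuple(seqs)
    except OSError:
        return None
    finally:
        # Leave repair mode again so the socket behaves normally.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 0)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(read_repair_seqs(s))
```

Moving a live connection to another box would additionally require saving the queue contents and the negotiated TCP options and re-creating the socket in repair mode on the other side; that part is not shown here.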
    
== HPC issues ==
 
Line 30: Line 36:  
Suspending a screen session and restoring it on another box might be interesting.
 
 
Suspending an X application (a browser?) and restoring it later is also worth thinking about, but requires knowledge of the X protocol.
 +
 +
''Main article: [[X applications]].''
    
== Processes duplication ==
 
Line 42: Line 50:     
With CRIU one can save a series of an application's states (all but the first being incremental) and later revert to any of them. The "apply-images" item from the TODO list should help to revert the state faster, especially if the memory changes tracker state is available.
 +
 +
One example where such a snapshot is useful is debugging: one might need to bring an application into a "desired" state quickly, and having a dump taken at that state would speed things up.
 +
 +
''Main article: [[Incremental dumps]].''
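
A series of snapshots, where every snapshot after the first refers to the previous one, could be driven by command lines like those built below. This is a sketch under assumptions: the <code>snapN</code> directory naming and the PID are made up, and combining <code>--leave-running</code> with <code>--track-mem</code> is one possible choice. The path given to <code>--prev-images-dir</code> is written relative to the new images directory. The function only builds the argument list; it does not run criu.

```python
import os


def snapshot_command(pid, base_dir, index):
    """Build the criu argv for snapshot number `index` (0-based).

    Snapshots are stored in base_dir/snap0, base_dir/snap1, ...
    (a hypothetical layout chosen for this example).
    """
    images_dir = os.path.join(base_dir, f"snap{index}")
    argv = [
        "criu", "dump",
        "--tree", str(pid),
        "--images-dir", images_dir,
        "--leave-running",   # keep the application running after the dump
        "--track-mem",       # turn on memory changes tracking
    ]
    if index > 0:
        # Every snapshot but the first is incremental: point it at the
        # previous one, relative to the new images directory.
        argv += ["--prev-images-dir", f"../snap{index - 1}"]
    return argv


if __name__ == "__main__":
    # Hypothetical PID and directory, for illustration only.
    for i in range(3):
        print(" ".join(snapshot_command(1234, "/tmp/snaps", i)))
```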
    
== Move "forgotten" applications into "screen" ==
 
    
Sometimes it's useful to launch a process in "screen". If you launched a task but forgot to switch into screen first, CRIU can help to "migrate" the app into it.
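
One way this could look: dump the task from any terminal, then start screen and restore the task from inside the screen session, so that it attaches to the new terminal. The sketch below only builds the two command lines. The PID and the images directory are made up, and <code>--shell-job</code> is used on the assumption that the task was started from an interactive shell.

```python
def screen_move_commands(pid, images_dir):
    """Return (dump_argv, restore_argv) for moving a shell job.

    The dump is meant to be run first; the restore is meant to be run
    later from a shell inside the screen session.
    """
    dump = ["criu", "dump", "--tree", str(pid),
            "--images-dir", images_dir, "--shell-job"]
    restore = ["criu", "restore",
               "--images-dir", images_dir, "--shell-job"]
    return dump, restore


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    dump, restore = screen_move_commands(1234, "/tmp/forgotten")
    print("outside screen:", " ".join(dump))
    print("inside screen: ", " ".join(restore))
```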
 +
 +
== Applications behavior analysis on another machine ==
 +
 +
It's possible to take periodic snapshots of running applications and transfer them to another machine for debugging or for behavior and performance analysis.
 +
 +
== Debugging of hung application ==
 +
 +
If a service has hung but needs to be restarted quickly, it's possible to take a dump of it, restart it, and later debug why it hung using the restored copy.
 +
 +
== Fault-tolerant systems ==
 +
 +
With CRIU it's possible to periodically duplicate a process on another box. This requires the [[applying images]] facility.
 +
 +
== Update dryrun ==
 +
 +
Before updating the kernel or system libraries, one may duplicate system services into a VM with the updates applied and check that they continue to run OK. If this test passes, the real system update can be done.
 +
 +
== Zero downtime crash restore ==
 +
 +
Checkpoint a critical service from /proc/vmcore in the crash kernel and migrate it to another machine.
 +
 +
[[Category:Using]]
 +
[[Category:Editor help needed]]
