522 bytes added, 22:49, 11 September 2018
Another crazy idea (dima)
Line 1: Line 1:
−
This is a set of ideas for how CRIU can be used
+
This is a set of ideas for how CRIU can be used.
   −
== Container [[:Category:live migration|live migration]] ==
+
== Container live migration ==
   −
This is the use case from which the whole checkpoint/restore project grew. A container is checkpointed, the image is copied to another box, and the container is restored there. From a remote observer's point of view, the container is just frozen for a while. You can find more details on this scenario [[LXC | here]].
+
This is the use case from which the whole checkpoint/restore project grew. A container is checkpointed, the image is copied to another box, and the container is restored there. From a remote observer's point of view, the container is just frozen for a while.
 +
 
 +
''For more info, see [[:Category:live migration]].''
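
The checkpoint, copy, restore sequence described above can be sketched as three commands. This is a minimal illustration, not a complete migration tool: it only builds the command lines and does not run them. The PID, image directory, and host name <code>box2</code> are hypothetical, and the use of <code>rsync</code> and <code>ssh</code> for the copy and the remote restore is an assumption. A real container migration would likely also need the container's filesystem on the destination and extra CRIU options depending on the workload.

```python
import shlex
from pathlib import Path


def dump_cmd(pid: int, images_dir: str) -> list[str]:
    """Build a `criu dump` command for the process tree rooted at pid."""
    return ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]


def copy_cmd(images_dir: str, dest_host: str) -> list[str]:
    """Build an rsync command copying the images to the same path on dest_host.

    rsync is an assumption; any way of moving the directory would do.
    """
    src = str(Path(images_dir)) + "/"
    return ["rsync", "-a", src, f"{dest_host}:{src}"]


def restore_cmd(images_dir: str, dest_host: str) -> list[str]:
    """Build an ssh command that runs `criu restore` on dest_host."""
    remote = shlex.join(
        ["criu", "restore", "--images-dir", images_dir, "--restore-detached"]
    )
    return ["ssh", dest_host, remote]


def migration_plan(pid: int, images_dir: str, dest_host: str) -> list[list[str]]:
    """Return the three steps in order: checkpoint, copy, restore."""
    return [
        dump_cmd(pid, images_dir),
        copy_cmd(images_dir, dest_host),
        restore_cmd(images_dir, dest_host),
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for step in migration_plan(1234, "/tmp/ckpt", "box2"):
        print(shlex.join(step))
```

Each step could be passed to <code>subprocess.run(step, check=True)</code> so that a failed checkpoint or copy stops the sequence before the restore is attempted.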
    
== Slow-boot services speed up ==
 
Line 11: Line 13:  
A rough preliminary measurement shows that the start time of a VNC server plus Eclipse drops from ~29 seconds to ~1.5 seconds.
   −
== [[Seamless kernel upgrade]] ==
+
''Main article: [[slow-boot services speed up]].''
 +
 
 +
== Seamless kernel upgrade ==
    
When replacing the kernel on a box, we can do it without stopping critical activity: checkpoint the critical services, replace the kernel (e.g. using kexec), then restore the services. In a perfect world the applications' memory would not be written to a disk image but kept in RAM.
 +
 +
''Main article: [[Seamless kernel upgrade]]''.
    
== Networking load balancing ==
 
   −
Not the whole project, but [[TCP_connection|TCP repair]] alone can be used to offload app-level request handling to another box.
+
Not the whole project, but [[TCP connection|TCP repair]] alone can be used to offload app-level request handling to another box.
    
== HPC issues ==
 
Line 26: Line 32:  
# Periodic state save to avoid recomputation in case of a cluster crash. Take a server snapshot every few minutes and put it on another machine. When failing over, resurrect the service on the other side quickly.
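
The periodic state save described above can be sketched as a series of <code>criu dump</code> invocations that leave the process running. This is a minimal illustration under assumptions: the PID, base directory, and the <code>snap-NNNN</code> naming scheme are hypothetical, the code only builds the command lines, and copying each snapshot to the other machine is left out.

```python
from pathlib import Path


def snapshot_cmd(pid: int, base_dir: str, index: int) -> list[str]:
    """Build a `criu dump` command for snapshot number `index`.

    --leave-running keeps the service alive after the dump, so the
    computation continues between snapshots. The target directory
    (hypothetical naming scheme) is expected to exist before the dump.
    """
    images_dir = Path(base_dir) / f"snap-{index:04d}"
    return [
        "criu", "dump",
        "--tree", str(pid),
        "--images-dir", str(images_dir),
        "--leave-running",
    ]


def snapshot_schedule(pid: int, base_dir: str, count: int) -> list[list[str]]:
    """Return the commands for `count` consecutive snapshots."""
    return [snapshot_cmd(pid, base_dir, i) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in snapshot_schedule(4321, "/var/lib/snapshots", 3):
        print(" ".join(cmd))
```

A scheduler would create each directory, run the command every few minutes, and ship the resulting directory to the standby machine, where the newest snapshot can be restored on failover.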
   −
== [[X applications|Desktop environment]] suspend/resume ==
+
== Desktop environment suspend/resume ==
    
Suspending a screen session and restoring it on another box might be interesting.
 
Suspending some X app (a browser?) and restoring it later is also worth thinking about, but requires knowledge of the X protocol.
 +
 +
''Main article: [[X applications]].''
    
== Processes duplication ==
 
Line 44: Line 52:     
One example where such a snapshot might be useful is debugging. One might need to bring an application into a "desired" state quickly, and having a dump taken at that state would speed things up.
 +
 +
''Main article: [[Incremental dumps]].''
    
== Move "forgotten" applications into "screen" ==
 
Line 59: Line 69:  
== Fault-tolerant systems ==
 
   −
With CRIU it's possible to periodically duplicate a process on another box. This requires the [[Applying images]] facility.
+
With CRIU it's possible to periodically duplicate a process on another box. This requires the [[applying images]] facility.
 +
 
 +
== Update dryrun ==
 +
 
 +
Before updating the kernel or system libraries, one may duplicate a system service (or several) into a VM that already has the updates and check that it continues to run OK. If this test passes, the real system update can be done.
 +
 
 +
== Zero downtime crash restore ==
 +
 
 +
Checkpoint a critical service from /proc/vmcore in the crash kernel and migrate it to another machine.
 +
 
 +
[[Category:Using]]
 +
[[Category:Editor help needed]]