Difference between revisions of "Usage scenarios"

From CRIU
(+ save game state)
(Merge two HPC items)

Revision as of 21:34, 19 May 2013

This is a set of ideas for how CRIU can be used.

Container live migration

This is the use case from which the whole checkpoint/restore project originated. A container is checkpointed, the image is copied to another box, and the container is restored there. From a remote observer's point of view, the container is just frozen for a while. You can find more details on this scenario here
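The checkpoint, copy, restore sequence can be sketched as three commands. This is only an illustration under assumptions: the helper name migration_commands, the PID, the image path and the host name are hypothetical, rsync and ssh are just one possible way to move the images, and a real container usually needs further criu options (for namespaces, mounts, open TCP connections and so on) that are left out here.

```python
def migration_commands(pid, image_dir, remote_host):
    """Return argv lists for a checkpoint / copy / restore cycle.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    return [
        # 1. Checkpoint the process tree rooted at `pid` into image_dir.
        ["criu", "dump", "-t", str(pid), "-D", image_dir],
        # 2. Copy the image files to the same path on the other box.
        ["rsync", "-a", image_dir + "/", f"{remote_host}:{image_dir}/"],
        # 3. Restore the process tree on the other box from the copied images.
        ["ssh", remote_host, "criu", "restore", "-D", image_dir],
    ]


if __name__ == "__main__":
    for argv in migration_commands(1234, "/tmp/ckpt", "box2"):
        print(" ".join(argv))
```

Each list can be passed to subprocess.run() as is. The time the container appears frozen is roughly the time these three steps take.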

Slow-boot services speed up

If a service takes too long to start (for example, because it performs complex state initialization), we can checkpoint it once it has finished starting up and restore it from the image on the second and subsequent starts.
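A start-up wrapper for this could look roughly like the sketch below. The function names, the service command line and the image directory are hypothetical. The sketch assumes that a non-empty image directory means a usable checkpoint exists, and that the caller knows when the service has finished initializing.

```python
import os


def start_command(service_argv, image_dir):
    """Choose between a cold start and a restore from a saved image."""
    # Assumption: a non-empty image directory means a checkpoint was taken.
    if os.path.isdir(image_dir) and os.listdir(image_dir):
        return ["criu", "restore", "-D", image_dir, "--restore-detached"]
    # No image yet: fall back to the normal (slow) start.
    return list(service_argv)


def checkpoint_command(pid, image_dir):
    """Command to run once, after the service has finished starting up."""
    # --leave-running lets the freshly started service keep going after the dump.
    return ["criu", "dump", "-t", str(pid), "-D", image_dir, "--leave-running"]
```

The first start runs the service normally and then issues checkpoint_command(); every later start gets the restore command instead.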

Reboot-less upgrade

When replacing the kernel on a box, we can do it without stopping critical activity: checkpoint the services, replace the kernel (e.g. using kexec), then restore the services. In a perfect world the applications' memory wouldn't be written to a disk image at all, but would be kept in RAM instead.

Networking load balancing

This does not need the whole project: TCP repair alone can be used to offload app-level request handling to another box.

HPC issues

High Performance Computing people may require it for two things:

  1. Load balancing a computational task over a cluster. It can be done in two directions: push parts of a task to another box to utilize idle parts of the cluster, or pull parts of a task to the local box to make better use of local caches.
  2. Periodic state saving to avoid recomputation in case of a cluster crash. Take a snapshot of the server every few minutes and put it on another machine. On failover, resurrect the task quickly on the other side.
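The periodic state save in item 2 can be sketched as follows. The helper names and the snap-NNNN directory layout are hypothetical, and the sketch assumes the caller creates each snapshot directory beforehand and copies it to the other machine afterwards.

```python
def snapshot_dir(base_dir, index):
    """Directory for the index-th snapshot (hypothetical naming scheme)."""
    return f"{base_dir}/snap-{index:04d}"


def snapshot_command(pid, base_dir, index):
    """Command for one periodic snapshot of a task that must keep running."""
    # --leave-running: the task continues computing after the dump.
    return ["criu", "dump", "-t", str(pid), "-D",
            snapshot_dir(base_dir, index), "--leave-running"]


def failover_command(base_dir, latest_index):
    """Command to resurrect the task from the most recent snapshot."""
    return ["criu", "restore", "-D", snapshot_dir(base_dir, latest_index)]
```

A scheduler would call snapshot_command() with an increasing index every few minutes; after a crash, only the work done since the latest snapshot has to be recomputed.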

Desktop environment suspend/resume

Suspending a screen session and restoring it on another box might be interesting. Suspending an X app (a browser?) and restoring it later is also worth thinking about, but requires knowledge of the X protocol.

Processes duplication

Somewhat like a remote fork() ;)

"Save" ability in apps (games), that don't have such

Some arcade games only record ("fix up") your progress when you complete a level. With CRIU, the state can be saved at any point.