Usage scenarios

Latest revision as of 22:49, 11 September 2018

This is a set of ideas for how CRIU can be used.

Container live migration

This is the use case from which the whole checkpoint/restore project originated. The container is checkpointed, its image is copied to another box, and it is then restored there. From a remote observer's point of view, the container is simply frozen for a while.

For more info, see Category:live migration.
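The checkpoint, copy and restore steps can be sketched as a command sequence. The Python below only builds argument lists and runs nothing; it is a minimal illustration under stated assumptions, not a migration tool. The host name and image path are hypothetical, `rsync` and `ssh` are an assumed transport, and the process tree's files and network resources are assumed to be available on the destination already.

```python
import shlex
from typing import List


def migration_commands(pid: int, images_dir: str, dest_host: str) -> List[List[str]]:
    """Build argv lists for a simple dump/copy/restore migration.

    Assumptions (hypothetical): the same images path is used on both hosts,
    rsync and ssh are installed, and the tree's files exist on dest_host.
    """
    # 1. Checkpoint the process tree rooted at `pid`. By default criu dump
    #    stops the tasks once the images have been written.
    dump = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    # 2. Copy the image directory to the destination box.
    copy = ["rsync", "-a", images_dir + "/", f"{dest_host}:{images_dir}/"]
    # 3. Restore on the destination; -d makes criu detach once restore is done.
    restore = ["ssh", dest_host,
               shlex.join(["criu", "restore", "-D", images_dir, "-d"])]
    return [dump, copy, restore]


if __name__ == "__main__":
    # Print the equivalent shell commands for a hypothetical PID and host.
    for argv in migration_commands(1234, "/tmp/ckpt", "dest.example.com"):
        print(shlex.join(argv))
```

Each list could be passed in order to `subprocess.run(argv, check=True)`; keeping command construction separate from execution makes the sequence easy to inspect before anything is frozen.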

Slow-boot services speed up

If a service takes too long to start (for example, because it performs complex state initialization), we can checkpoint it once it has finished starting up and, on the second and subsequent starts, restore it from the image.

A rough preliminary measurement shows the start time of a VNC server plus Eclipse dropping from ~29 seconds to ~1.5 seconds.

Main article: slow-boot services speed up.
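The decision made at each start can be sketched as follows. This is a minimal Python illustration, not an existing tool: the image directory and function names are hypothetical, and treating the presence of `inventory.img` (a file `criu dump` writes into the images directory) as the marker of a usable checkpoint is an assumption of this sketch.

```python
import os
from typing import List

# Hypothetical location for the post-startup checkpoint of the service.
IMAGES_DIR = "/var/lib/slow-service/ckpt"


def has_checkpoint(images_dir: str) -> bool:
    # Assumption: an inventory.img in the directory means a dump was written.
    return os.path.isfile(os.path.join(images_dir, "inventory.img"))


def checkpoint_cmd(pid: int, images_dir: str) -> List[str]:
    # First start: after initialization has finished, dump the service but
    # keep it running (--leave-running) so this start is not interrupted.
    return ["criu", "dump", "-t", str(pid), "-D", images_dir, "--leave-running"]


def restore_cmd(images_dir: str) -> List[str]:
    # Second and later starts: skip initialization and restore from the image;
    # -d lets criu exit once the service is running again.
    return ["criu", "restore", "-D", images_dir, "-d"]


def next_action(images_dir: str) -> str:
    # Restore if a checkpoint exists; otherwise start normally and dump later.
    return "restore" if has_checkpoint(images_dir) else "start-and-checkpoint"
```

On the first start the service is launched normally and, once it has finished initializing, `checkpoint_cmd(pid, IMAGES_DIR)` is run; how readiness is detected is service-specific and is left out here.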

Seamless kernel upgrade

A kernel on a box can be replaced without stopping critical activity: checkpoint the services, replace the kernel (e.g. using kexec), then restore the services. Ideally, the applications' memory should not be written to a disk image but kept in RAM instead.

Main article: Seamless kernel upgrade.

Networking load balancing

Rather than the whole project, just the TCP repair functionality can be used to offload application-level request handling to another box.

HPC issues

High Performance Computing users may need it for two things:

1. Load balancing a computational task over a cluster. This can work in two directions: push parts of a task to another box to use idle parts of the cluster, or pull parts of a task to the local box to make better use of local caches.
2. Periodic state saving to avoid recomputation in case of a cluster crash: take a server snapshot every few minutes and store it on another machine, so that on failover the copy can be resurrected quickly.

Desktop environment suspend/resume

Suspending a screen session and restoring it on another box might be interesting. Suspending an X application (a web browser, for example) and restoring it later is also worth considering, but requires knowledge of the X protocol.

Main article: X applications.

Process duplication

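Somewhat like a remote fork() ;) A running process is checkpointed and a copy of it is restored on another box, while the original keeps running.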

"Save" ability in apps (games) that lack one

Some arcade games only save ("fix up") your progress once you complete the next level. With CRIU, progress can be saved at any point.

Snapshots of apps

With CRIU one can save a series of an application's states (all but the first being incremental) and later revert to any of them. The "apply-images" item from the TODO list should make reverting faster, especially if the memory change tracker state is still available.

One example where such snapshots are useful is debugging: one may need to bring an application into a "desired" state quickly, and having a dump taken in that state speeds things up.

Main article: Incremental dumps.
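A series of incremental snapshots can be sketched as below. This minimal Python sketch builds `criu dump` commands using the `--leave-running`, `--track-mem` and `--prev-images-dir` options; the numbered-subdirectory layout and the function names are hypothetical, and the Incremental dumps article remains the authority on the exact procedure.

```python
import os
from typing import List


def snapshot_cmd(pid: int, base_dir: str, n: int) -> List[str]:
    """Command for the n-th (1-based) snapshot of the process tree `pid`.

    Hypothetical layout: snapshot n is stored in base_dir/<n>. Every snapshot
    after the first points at the previous one via --prev-images-dir, so only
    memory pages changed since then (tracked with --track-mem) are written.
    """
    if n < 1:
        raise ValueError("snapshot numbers start at 1")
    cmd = ["criu", "dump", "-t", str(pid),
           "-D", os.path.join(base_dir, str(n)),
           "--leave-running",  # the application keeps running between snapshots
           "--track-mem"]      # track memory changes for the next snapshot
    if n > 1:
        # The previous images dir is given relative to the images dir (-D).
        cmd += ["--prev-images-dir", os.path.join("..", str(n - 1))]
    return cmd


def revert_cmd(base_dir: str, n: int) -> List[str]:
    # Revert to snapshot n; unchanged pages come from the earlier images it links to.
    return ["criu", "restore", "-D", os.path.join(base_dir, str(n)), "-d"]
```

Reverting with `revert_cmd` assumes the running instance has been stopped first, since CRIU needs the original PIDs to be free on restore.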

Move "forgotten" applications into "screen"

Sometimes it's useful to run a process inside "screen". If you launched a task but forgot to switch into screen first, CRIU can help "migrate" the app into it.

Application behavior analysis on another machine

It's possible to take periodic snapshots of running applications and transfer them to another machine for debugging, or for behavior and performance analysis.

Debugging a hung application

If a service hangs but needs to be restarted quickly, it's possible to dump it, restart it, and later debug why it hung using a restored copy.

Fault-tolerant systems

With CRIU it's possible to periodically duplicate a process on another box. This requires the applying images facility.

Update dryrun

Before updating the kernel or system libraries, one may duplicate the system service(s) into a VM that has the updates applied and check that they continue to run correctly. If this test passes, the real system update can be done.

Zero downtime crash restore

Checkpoint a critical service from /proc/vmcore in the crash kernel and migrate it to another machine.
