Changes

522 bytes added, 22:49, 11 September 2018

Another crazy idea (dima)

Line 1: Line 1:

−

This is a set of ideas on how CRIU can be used

+

This is a set of ideas on how CRIU can be used.

−

== Container ~~[[:Category:~~live migration~~|live migration]]~~ ==

+

== Container live migration ==

−

This is the use case from which the whole checkpoint/restore project originated. A container is checkpointed, the image is copied to another box, and the container is restored there. From a remote observer's point of view, the container is just frozen for a while. ~~You can find~~ more ~~details on this scenario~~ [[~~LXC | here~~]].

+

This is the use case from which the whole checkpoint/restore project originated. A container is checkpointed, the image is copied to another box, and the container is restored there. From a remote observer's point of view, the container is just frozen for a while.

+

''For more info, see [[:Category:live migration]].''
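The checkpoint, copy, restore sequence described above can be sketched as three command lines. This is a minimal illustration, not the project's migration tooling: the function name <code>migration_plan</code>, the PID, the image directory and the host name <code>box2</code> are all hypothetical, and the use of <code>rsync</code> and <code>ssh</code> for the copy and remote restore is an assumption. A real container usually needs further <code>criu</code> options that are omitted here. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def migration_plan(pid, images_dir, dest_host):
    """Build the command lines for one checkpoint / copy / restore cycle.

    All arguments are illustrative placeholders supplied by the caller.
    """
    # Checkpoint the process tree rooted at `pid` into `images_dir`.
    dump = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    # Copy the image files to the same path on the destination box
    # (rsync is an assumption; any file transfer would do).
    copy = ["rsync", "-a", images_dir + "/", f"{dest_host}:{images_dir}/"]
    # Restore from the copied images on the destination box.
    restore = ["ssh", dest_host, "criu", "restore", "-D", images_dir]
    return [dump, copy, restore]


if __name__ == "__main__":
    for cmd in migration_plan(1234, "/tmp/ckpt", "box2"):
        print(shlex.join(cmd))
```

The time between the first and the last command is the window during which a remote observer sees the container as frozen.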

== Slow-boot services speed up ==

Line 11: Line 13:

We have a rough preliminary measurement showing that the VNC server + Eclipse start time drops from ~29 seconds to ~1.5 seconds.

−

== [[Seamless kernel upgrade]] ==

+

''Main article: [[slow-boot services speed up]].''

+

== Seamless kernel upgrade ==

When replacing a kernel on a box, we can do it without stopping critical activity: checkpoint the services, replace the kernel (e.g. using kexec), then restore the services. In a perfect world the applications' memory shouldn't be written to a disk image, but should rather be kept in RAM.

+

''Main article: [[Seamless kernel upgrade]]''.
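As a rough sketch of the checkpoint, kexec, restore order described above, the steps can be split into what runs on the old kernel and what runs after the new kernel is up. The function name <code>kernel_upgrade_plan</code> and all paths are hypothetical. The sketch assumes the images are written to a directory that survives the kexec, which is the simple on-disk variant rather than the keep-it-in-RAM ideal mentioned above. It only builds the commands and does not execute them.

```python
def kernel_upgrade_plan(pid, images_dir, kernel, initrd):
    """Return the commands for each phase of a checkpoint/kexec/restore cycle.

    `images_dir` is assumed to live on storage that survives the kexec.
    """
    before_reboot = [
        # 1. Checkpoint the service's process tree.
        ["criu", "dump", "-t", str(pid), "-D", images_dir],
        # 2. Load the new kernel, reusing the current kernel command line.
        ["kexec", "-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"],
        # 3. Jump into the loaded kernel.
        ["kexec", "-e"],
    ]
    after_reboot = [
        # 4. Once the new kernel is running, bring the service back.
        ["criu", "restore", "-D", images_dir, "--restore-detached"],
    ]
    return {"before_reboot": before_reboot, "after_reboot": after_reboot}


if __name__ == "__main__":
    plan = kernel_upgrade_plan(
        1234, "/var/lib/ckpt", "/boot/vmlinuz-new", "/boot/initrd-new.img"
    )
    for phase, commands in plan.items():
        print(phase)
        for cmd in commands:
            print("  " + " ".join(cmd))
```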

== Networking load balancing ==

−

Not the whole project, but the [[~~TCP_connection~~|TCP repair]] can be used to offload app-level request handling to another box.

+

Not the whole project, but the [[TCP connection|TCP repair]] can be used to offload app-level request handling to another box.

== HPC issues ==

Line 26: Line 32:

# Periodic state save to avoid recomputation in case of a cluster crash. Take a server snapshot every few minutes and put it on another machine. On failover, resurrect the service on the other side quickly.
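A periodic snapshot of this kind could be driven by a small helper that produces one image directory and one dump command per interval. This is only a sketch under stated assumptions: the helper name <code>snapshot_command</code>, the <code>snap-NNNN</code> directory naming and the base path are hypothetical. The <code>--leave-running</code> option is used so that the task keeps computing after each dump. Copying the images to the other machine is left out.

```python
import posixpath


def snapshot_command(pid, base_dir, index):
    """Return (images_dir, command) for the `index`-th periodic snapshot."""
    # One directory per snapshot, e.g. <base_dir>/snap-0003 (naming is illustrative).
    images_dir = posixpath.join(base_dir, f"snap-{index:04d}")
    # --leave-running keeps the task alive after the dump completes.
    command = [
        "criu", "dump", "-t", str(pid), "-D", images_dir, "--leave-running",
    ]
    return images_dir, command


if __name__ == "__main__":
    for i in range(3):
        directory, cmd = snapshot_command(4321, "/var/ckpt", i)
        print(directory, "->", " ".join(cmd))
```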

−

== ~~[[X applications|~~Desktop environment]] suspend/resume ==

+

== Desktop environment suspend/resume ==

Suspending a screen session and restoring it on another box might be interesting.

Suspending some X app (browser?) and restoring it later is also worth thinking about, but requires knowledge of the X protocol.

+

''Main article: [[X applications]].''

== Processes duplication ==

Line 44: Line 52:

One example where such a snapshot might be useful is debugging. One might need to bring an application into a "desired" state quickly, and having a dump taken at that state would speed things up.

+

''Main article: [[Incremental dumps]].''

== Move "forgotten" applications into "screen" ==

Line 59: Line 69:

== Fault-tolerant systems ==

−

With CRIU it's possible to periodically duplicate a process on another box. This requires the [[~~Applying~~ images]] facility.

+

With CRIU it's possible to periodically duplicate a process on another box. This requires the [[applying images]] facility.

+

== Update dryrun ==

+

Before updating the kernel or system libraries, one may duplicate system services into a VM that already has the updates and check that they continue to run OK. If this test passes, the real system update can be done.

+

== Zero downtime crash restore ==

+

Checkpoint a critical service from /proc/vmcore in the crash kernel and migrate it to another machine.

+

[[Category:Using]]

+

[[Category:Editor help needed]]

Dsafonov

Usage scenarios