Line 1: |
Line 1: |
− | This is a set of ideas on how CRIU can be used | + | This is a set of ideas on how CRIU can be used. |
| | | |
− | == Container [[:Category:live migration|live migration]] == | + | == Container live migration == |
| | | |
− | This is the use case from which the whole checkpoint/restore project originated. A container is checkpointed, the image is copied to another box, and the container is restored there. From a remote observer's point of view, the container is just frozen for a while. You can find more details on this scenario [[LXC | here]]. | + | This is the use case from which the whole checkpoint/restore project originated. A container is checkpointed, the image is copied to another box, and the container is restored there. From a remote observer's point of view, the container is just frozen for a while. |
| + | |
| + | ''For more info, see [[:Category:live migration]].'' |
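The checkpoint, copy, restore sequence described above can be sketched as a list of commands. This is a minimal illustration, not the project's migration tooling: the helper names, the use of `rsync` and `ssh` for the copy and remote restore, and the example PID, path, and host are all assumptions, and a real migration also needs the same filesystem contents available on the destination.

```python
import shlex


def dump_cmd(pid, images_dir, tcp_established=False):
    """Build a `criu dump` command line for the process tree rooted at pid."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if tcp_established:
        # Needed only when the tree holds established TCP connections.
        cmd.append("--tcp-established")
    return cmd


def restore_cmd(images_dir, tcp_established=False):
    """Build a `criu restore` command line that detaches after restoring."""
    cmd = ["criu", "restore", "--images-dir", images_dir, "--restore-detached"]
    if tcp_established:
        cmd.append("--tcp-established")
    return cmd


def migration_plan(pid, images_dir, dest_host):
    """Return the three steps: checkpoint here, copy images, restore there.

    rsync and ssh are assumptions made for this sketch; any transport works.
    """
    return [
        dump_cmd(pid, images_dir),
        ["rsync", "-a", images_dir + "/", f"{dest_host}:{images_dir}/"],
        ["ssh", dest_host, shlex.join(restore_cmd(images_dir))],
    ]


if __name__ == "__main__":
    # Hypothetical PID, directory, and host, printed rather than executed.
    for step in migration_plan(1234, "/tmp/ckpt", "box2"):
        print(shlex.join(step))
```

The plan is only printed here; running it requires root privileges and CRIU installed on both boxes.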
| | | |
| == Slow-boot services speed up == | | == Slow-boot services speed up == |
Line 11: |
Line 13: |
| We have a rough preliminary measurement showing that the start time of a VNC server plus Eclipse drops from ~29 seconds to ~1.5 seconds. | | We have a rough preliminary measurement showing that the start time of a VNC server plus Eclipse drops from ~29 seconds to ~1.5 seconds. |
| | | |
− | == [[Seamless kernel upgrade]] == | + | ''Main article: [[slow-boot services speed up]].'' |
| + | |
| + | == Seamless kernel upgrade == |
| | | |
| When replacing a kernel on a box, we can do it without stopping critical activity: checkpoint the services, replace the kernel (e.g. using kexec), then restore the services. In a perfect world, the applications' memory wouldn't be written to a disk image but would rather be kept in RAM. | | When replacing a kernel on a box, we can do it without stopping critical activity: checkpoint the services, replace the kernel (e.g. using kexec), then restore the services. In a perfect world, the applications' memory wouldn't be written to a disk image but would rather be kept in RAM. |
| + | |
| + | ''Main article: [[Seamless kernel upgrade]].'' |
| | | |
| == Networking load balancing == | | == Networking load balancing == |
| | | |
− | Not the whole project, but [[TCP_connection|TCP repair]] can be used to offload app-level request handling to another box. | + | Not the whole project, but [[TCP connection|TCP repair]] can be used to offload app-level request handling to another box. |
| | | |
| == HPC issues == | | == HPC issues == |
Line 26: |
Line 32: |
| # Periodic state save to avoid recomputation in case of a cluster crash. Take a server snapshot every few minutes and put it on another machine. When failing over, quickly resurrect the server on the other side. | | # Periodic state save to avoid recomputation in case of a cluster crash. Take a server snapshot every few minutes and put it on another machine. When failing over, quickly resurrect the server on the other side. |
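A periodic state save of this kind could be driven by a small loop around `criu dump --leave-running`, which checkpoints the process tree without stopping it. The sketch below is only an illustration under assumptions: the function names, the `snap-NNNN` directory layout, and the injectable `run`, `sleep`, and `make_dir` parameters are invented for the example, and copying each snapshot to the other machine is left out.

```python
import os
import subprocess
import time


def snapshot_cmd(pid, base_dir, seq):
    """Build a `criu dump` command that leaves the process running."""
    images_dir = os.path.join(base_dir, f"snap-{seq:04d}")
    return ["criu", "dump", "--tree", str(pid),
            "--images-dir", images_dir, "--leave-running"]


def periodic_snapshots(pid, base_dir, interval_s, count,
                       run=subprocess.run, sleep=time.sleep,
                       make_dir=lambda p: os.makedirs(p, exist_ok=True)):
    """Take `count` snapshots, `interval_s` seconds apart.

    run, sleep, and make_dir are injectable so the loop can be exercised
    without CRIU installed; returns the list of commands that were issued.
    """
    issued = []
    for seq in range(count):
        cmd = snapshot_cmd(pid, base_dir, seq)
        make_dir(cmd[cmd.index("--images-dir") + 1])  # directory must exist
        run(cmd, check=True)
        issued.append(cmd)
        if seq + 1 < count:
            sleep(interval_s)
    return issued


if __name__ == "__main__":
    # Dry run with hypothetical values: print commands instead of executing.
    periodic_snapshots(1234, "/var/tmp/snaps", 300, 3,
                       run=lambda cmd, check: print(" ".join(cmd)),
                       sleep=lambda s: None,
                       make_dir=lambda p: None)
```

Each snapshot here is a full dump; whether that is cheap enough to run every few minutes depends on the size of the workload.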
| | | |
− | == [[X applications|Desktop environment]] suspend/resume == | + | == Desktop environment suspend/resume == |
| | | |
| Suspending a screen session and restoring it on another box might be interesting. | | Suspending a screen session and restoring it on another box might be interesting. |
| Suspending some X app (browser?) and restoring it later is also worth thinking about, but requires knowledge of the X protocol. | | Suspending some X app (browser?) and restoring it later is also worth thinking about, but requires knowledge of the X protocol. |
| + | |
| + | ''Main article: [[X applications]].'' |
| | | |
| == Processes duplication == | | == Processes duplication == |
Line 44: |
Line 52: |
| | | |
| One example where this snapshot might be useful is debugging. One might need to bring an application into a "desired" state quickly, and having a dump taken at that state would speed things up. | | One example where this snapshot might be useful is debugging. One might need to bring an application into a "desired" state quickly, and having a dump taken at that state would speed things up. |
| + | |
| + | ''Main article: [[Incremental dumps]].'' |
| | | |
| == Move "forgotten" applications into "screen" == | | == Move "forgotten" applications into "screen" == |
Line 59: |
Line 69: |
| == Fault-tolerant systems == | | == Fault-tolerant systems == |
| | | |
− | With CRIU it's possible to periodically duplicate a process on another box. This requires the [[Applying images]] facility. | + | With CRIU it's possible to periodically duplicate a process on another box. This requires the [[applying images]] facility. |
| + | |
| + | == Update dryrun == |
| + | |
| + | Before updating the kernel or system libraries, one may duplicate system service(s) into a VM that has the updates applied and check that they continue to run OK. If this test passes, the real system update can be done. |
| + | |
| + | == Zero downtime crash restore == |
| + | |
| + | Checkpoint a critical service from /proc/vmcore in the crash kernel and migrate it to another machine. |
| + | |
| + | [[Category:Using]] |
| + | [[Category:Editor help needed]] |