Academic Research
- GPU CRIU
[SoCC '24] On-demand and Parallel Checkpoint/Restore for GPU Applications
[EuroSys '24] Just-In-Time Checkpointing: Low Cost Error Recovery from Deep Learning Training Failures
[arXiv '23] PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation
[SC-W '23] Checkpoint/Restart for CUDA Kernels
[arXiv:2202.07848 '22] Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads
[Wiley '21] Cricket: A Virtualization Layer for Distributed Execution of CUDA Applications with Checkpoint/Restart Support
[EuroSys '20] Balancing Efficiency and Fairness in Heterogeneous GPU Clusters for Deep Learning
[HPEC '20] Using Container Migration for HPC Workloads Resilience
- CRIU for Migration
[APNet '24] Software-based Live Migration for Containerized RDMA
[SEATED '24] Live Migration of Multi-Container Kubernetes Pods in Multi-Cluster Serverless Edge Systems
[ICT '24] Packet Buffering to Minimize Service Downtime and Packet Loss During Redundancy Switchover
[SIGMOD/PODS '24] Demonstration of ElasticNotebook: Migrating Live Computational Notebook States
[ICDCS '24] Dapper: A Lightweight and Extensible Framework for Live Program State Rewriting
[Cloud '24] FastMig: Leveraging FastFreeze to Establish Robust Service Liquidity in Cloud 2.0
[CCGRID '24] Workload-Aware Live Migratable Cloud Instance Detector
[VLDB '23] ElasticNotebook: Enabling Live Migration for Computational Notebooks
[SRDS '23] Transparent Fault Tolerance for Stateful Applications in Kubernetes with Checkpoint/Restore
[ICFEC '23] Migration of Isolated Application Across Heterogeneous Edge Systems
[TNSM '23] Design, Modeling, and Implementation of Robust Migration of Stateful Edge Microservices
[WORDS '23] Evicting for the Greater Good: The Case for Reactive Checkpointing in Serverless Computing
[Cloud Summit '23] Microservice Debugging with Checkpoint-Restart
[ICC '23] Processing-Aware Migration Model for Stateful Edge Microservices
[DRONES '23] A Dynamic Checkpoint Interval Decision Algorithm for Live Migration-Based Drone-Recovery System
[arXiv:2301.05861 '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level
[TOCS '22] H-Container: Enabling Heterogeneous-ISA Container Migration in Edge Computing
[VEE '22] Portkey: Hypervisor-Assisted Container Migration in Nested Cloud Environments
[ICPADS '22] A Container Pre-copy Migration Method Based on Dirty Page Prediction and Compression
[NetSoft '22] Demonstration of Containerized Central Unit Live Migration in 5G Radio Access Network
[ATC '22] RRC: Responsive Replicated Containers
[HAL '22] Good Shepherds Care For Their Cattle: Seamless Pod Migration in Geo-Distributed Kubernetes
[ATC '21] MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications
[WoWMoM '21] Extending the QUIC Protocol to Support Live Container Migration at the Edge
[MobileCloud '20] Docker Container Deployment in Distributed Fog Infrastructures with Checkpoint/Restart
- CRIU Acceleration
[EuroSys '24] Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts
[FGCS '24] Prebaking Runtime Environments to Improve the FaaS Cold Start Latency
[Middleware '23] DynaCut: A Framework for Dynamic and Adaptive Program Customization
[Virginia Tech '23] CRIU-RTX: Remote Thread eXecution using Checkpoint/Restore in Userspace
[Virginia Tech '23] HetMigrate: Secure and Efficient Cross-architecture Process Live Migration
[OSDI '23] No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
[SC '22] Out of Hypervisor (OoH): Efficient Dirty Page Tracking in Userspace Using Hardware Virtualization Features
[JNCA '22] iContainer: Consecutive Checkpointing with Rapid Resilience for Immortal Container-Based Services
[VLSI '21] Standard-Compliant Parallel SystemC Simulation of Loosely-Timed Transaction Level Models: From Baremetal to Linux-Based Applications Support
[Middleware '20] Prebaking Functions to Warm the Serverless Cold Start
[MEMSYS '19] Fast In-Memory CRIU for Docker Containers
[MCHPC '19] Optimizing Post-Copy Live Migration with System-Level Checkpoint Using Fabric-Attached Memory
- CRIU Security
[APSys '24] Towards Efficient End-to-End Encryption for Container Checkpointing Systems
[eBPF '24] Custom Page Fault Handling With eBPF
[ARES '24] Don't, Stop, Drop, Pause: Forensics of CONtainer CheckPOINTs (ConPoint)
[ATC '22] RRC: Responsive Replicated Containers
[NDSS '22] FitM: Binary-Only Coverage-Guided Fuzzing for Stateful Network Protocols
[SYSTEX '22] Transparent, Cross-ISA Enclave Offloading
[IPDPS '20] Fault-Tolerant Containers Using NiLiCon
- CRIU for Database
[Journal of Cloud Computing '24] MDB-KCP: Persistence Framework of In-Memory Database with CRIU-Based Container Checkpoint in Kubernetes
[VLDB '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level
[VLDB '23] ElasticNotebook: Enabling Live Migration for Computational Notebooks
[EuroSys '21] On-demand-fork: A Microsecond Fork for Memory-Intensive and Latency-Sensitive Applications