| Line 2: | | Line 2: |
| | | |
| [SoCC '24] On-demand and Parallel Checkpoint/Restore for GPU Applications | | [SoCC '24] On-demand and Parallel Checkpoint/Restore for GPU Applications |
| + | |
| + | [EuroSys '24] Just-In-Time Checkpointing: Low Cost Error Recovery from Deep Learning Training Failures |
| | | |
| [arXiv '23] PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation | | [arXiv '23] PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation |
| | | |
| [SC-W '23] Checkpoint/Restart for CUDA Kernels | | [SC-W '23] Checkpoint/Restart for CUDA Kernels |
| + | |
| + | [arXiv:2202.07848 '22] Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads |
| + | |
| + | [Wiley '21] Cricket: A virtualization layer for distributed execution of CUDA applications with checkpoint/restart support |
| + | |
| + | [EuroSys '20] Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning |
| + | |
| + | [HPEC '20] Using Container Migration for HPC Workloads Resilience |
| + | |
| + | |
| + | |
| | | |
| | | |
| Line 22: | | Line 35: |
| | | |
| [Cloud '24] FastMig: Leveraging FastFreeze to Establish Robust Service Liquidity in Cloud 2.0 | | [Cloud '24] FastMig: Leveraging FastFreeze to Establish Robust Service Liquidity in Cloud 2.0 |
| + | |
| + | [CCGRID '24] Workload-Aware Live Migratable Cloud Instance Detector |
| + | |
| + | [VLDB '23] ElasticNotebook: Enabling Live Migration for Computational Notebooks |
| + | |
| + | [SRDS '23] Transparent Fault Tolerance for Stateful Applications in Kubernetes with Checkpoint/Restore |
| + | |
| + | [ICFEC '23] Migration of Isolated Application Across Heterogeneous Edge Systems |
| + | |
| + | [TNSM '23] Design, Modeling, and Implementation of Robust Migration of Stateful Edge Microservices |
| + | |
| + | [WORDS '23] Evicting for the greater good: The case for Reactive Checkpointing in serverless computing |
| + | |
| + | [Cloud Summit '23] Microservice Debugging with Checkpoint-Restart |
| + | |
| + | [ICC '23] Processing-Aware Migration Model for Stateful Edge Microservices |
| + | |
| + | [DRONES '23] A Dynamic Checkpoint Interval Decision Algorithm for Live Migration-Based Drone-Recovery System |
| + | |
| + | [arXiv:2301.05861 '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level |
| | | |
| [TOCS '22] H-Container: Enabling Heterogeneous-ISA Container Migration in Edge Computing | | [TOCS '22] H-Container: Enabling Heterogeneous-ISA Container Migration in Edge Computing |
| | | |
| [VEE '22] Portkey: hypervisor-assisted container migration in nested cloud environments | | [VEE '22] Portkey: hypervisor-assisted container migration in nested cloud environments |
| + | |
| + | [ICPADS '22] A Container Pre-copy Migration Method Based on Dirty Page Prediction and Compression |
| + | |
| + | [NetSoft '22] Demonstration of Containerized Central Unit Live Migration in 5G Radio Access Network |
| + | |
| + | [ATC '22] RRC: Responsive Replicated Containers |
| + | |
| + | [HAL '22] Good Shepherds Care For Their Cattle: Seamless Pod Migration in Geo-Distributed Kubernetes |
| + | |
| + | [ATC '21] MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications |
| + | |
| + | [WoWMoM '21] Extending the QUIC Protocol to Support Live Container Migration at the Edge |
| + | |
| + | [MobileCloud '20] Docker Container Deployment in Distributed Fog Infrastructures with Checkpoint/Restart |
| + | |
| + | |
| + | |
| + | * CRIU Acceleration |
| + | |
| + | [EuroSys '24] Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts |
| + | |
| + | [FGCS '24] Prebaking runtime environments to improve the FaaS cold start latency |
| + | |
| + | [Middleware '23] DynaCut: A Framework for Dynamic and Adaptive Program Customization |
| + | |
| + | [Virginia Tech '23] CRIU-RTX: Remote Thread eXecution using Checkpoint/Restore in Userspace |
| + | |
| + | [Virginia Tech '23] HetMigrate: Secure and Efficient Cross-architecture Process Live Migration |
| + | |
| + | [OSDI '23] No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing |
| + | |
| + | [SC '22] Out of hypervisor (OoH): efficient dirty page tracking in userspace using hardware virtualization features |
| + | |
| + | [JNCA '22] iContainer: Consecutive checkpointing with rapid resilience for immortal container-based services |
| + | |
| + | [VLSI '21] Standard-compliant parallel SystemC simulation of loosely-timed transaction level models: From baremetal to Linux-based applications support |
| + | |
| + | [Middleware '20] Prebaking Functions to Warm the Serverless Cold Start |
| + | |
| + | [MEMSYS '19] Fast in-memory CRIU for docker containers |
| + | |
| + | [MCHPC '19] Optimizing Post-Copy Live Migration with System-Level Checkpoint Using Fabric-Attached Memory |
| + | |
| + | |
| | | |
| | | |
| Line 37: | | Line 114: |
| | | |
| [ATC '22] RRC: Responsive Replicated Containers | | [ATC '22] RRC: Responsive Replicated Containers |
| + | |
| + | [NDSS '22] FitM: Binary-Only Coverage-Guided Fuzzing for Stateful Network Protocols |
| + | |
| + | [SYSTEX '22] Transparent, Cross-ISA Enclave Offloading |
| + | |
| + | [IPDPS '20] Fault-Tolerant Containers Using NiLiCon |
| | | |
| | | |
| Line 44: | | Line 127: |
| | | |
| [VLDB '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level | | [VLDB '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level |
| + | |
| + | [VLDB '23] ElasticNotebook: Enabling Live Migration for Computational Notebooks |
| + | |
| + | [arXiv:2301.05861 '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level |
| | | |
| [EuroSys '21] On-demand-fork: a microsecond fork for memory-intensive and latency-sensitive applications | | [EuroSys '21] On-demand-fork: a microsecond fork for memory-intensive and latency-sensitive applications |