Academic Research

Revision as of 02:55, 18 December 2024 by Wenhuizhang
  • GPU CRIU

[SoCC '24] On-demand and Parallel Checkpoint/Restore for GPU Applications

[EuroSys '24] Just-In-Time Checkpointing: Low Cost Error Recovery from Deep Learning Training Failures

[arXiv '23] PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation

[SC-W '23] Checkpoint/Restart for CUDA Kernels

[arXiv:2202.07848 '22] Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

[Wiley '21] Cricket: A virtualization layer for distributed execution of CUDA applications with checkpoint/restart support

[EuroSys '20] Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning

[HPEC '20] Using Container Migration for HPC Workloads Resilience


  • CRIU for Migration

[APNet '24] Software-based Live Migration for Containerized RDMA

[SEATED '24] Live Migration of Multi-Container Kubernetes Pods in Multi-Cluster Serverless Edge Systems

[ICT '24] Packet Buffering to Minimize Service Downtime and Packet Loss During Redundancy Switchover

[SIGMOD/PODS '24] Demonstration of ElasticNotebook: Migrating Live Computational Notebook States

[ICDCS '24] Dapper: A Lightweight and Extensible Framework for Live Program State Rewriting

[Cloud '24] FastMig: Leveraging FastFreeze to Establish Robust Service Liquidity in Cloud 2.0

[CCGRID '24] Workload-Aware Live Migratable Cloud Instance Detector

[VLDB '23] ElasticNotebook: Enabling Live Migration for Computational Notebooks

[SRDS '23] Transparent Fault Tolerance for Stateful Applications in Kubernetes with Checkpoint/Restore

[ICFEC '23] Migration of Isolated Application Across Heterogeneous Edge Systems

[TNSM '23] Design, Modeling, and Implementation of Robust Migration of Stateful Edge Microservices

[WORDS '23] Evicting for the greater good: The case for Reactive Checkpointing in serverless computing

[Cloud Summit '23] Microservice Debugging with Checkpoint-Restart

[ICC '23] Processing-Aware Migration Model for Stateful Edge Microservices

[DRONES '23] A Dynamic Checkpoint Interval Decision Algorithm for Live Migration-Based Drone-Recovery System

[arXiv:2301.05861 '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level

[TOCS '22] H-Container: Enabling Heterogeneous-ISA Container Migration in Edge Computing

[VEE '22] Portkey: hypervisor-assisted container migration in nested cloud environments

[ICPADS '22] A Container Pre-copy Migration Method Based on Dirty Page Prediction and Compression

[NetSoft '22] Demonstration of Containerized Central Unit Live Migration in 5G Radio Access Network

[ATC '22] RRC: Responsive Replicated Containers

[HAL '22] Good Shepherds Care For Their Cattle: Seamless Pod Migration in Geo-Distributed Kubernetes

[ATC '21] MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications

[WoWMoM '21] Extending the QUIC Protocol to Support Live Container Migration at the Edge

[MobileCloud '20] Docker Container Deployment in Distributed Fog Infrastructures with Checkpoint/Restart


  • CRIU Acceleration

[EuroSys '24] Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts

[FGCS '24] Prebaking runtime environments to improve the FaaS cold start latency

[Middleware '23] DynaCut: A Framework for Dynamic and Adaptive Program Customization

[Virginia Tech '23] CRIU-RTX: Remote Thread eXecution using Checkpoint/Restore in Userspace

[Virginia Tech '23] HetMigrate: Secure and Efficient Cross-architecture Process Live Migration

[OSDI '23] No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing

[SC '22] Out of hypervisor (OoH): efficient dirty page tracking in userspace using hardware virtualization features

[JNCA '22] iContainer: Consecutive checkpointing with rapid resilience for immortal container-based services

[VLSI '21] Standard-compliant parallel SystemC simulation of loosely-timed transaction level models: From baremetal to Linux-based applications support

[Middleware '20] Prebaking Functions to Warm the Serverless Cold Start

[MEMSYS '19] Fast in-memory CRIU for docker containers

[MCHPC '19] Optimizing Post-Copy Live Migration with System-Level Checkpoint Using Fabric-Attached Memory


  • CRIU Security

[APSys '24] Towards Efficient End-to-End Encryption for Container Checkpointing Systems

[eBPF '24] Custom Page Fault Handling With eBPF

[ARES '24] Don't, Stop, Drop, Pause: Forensics of CONtainer CheckPOINTs (ConPoint)

[ATC '22] RRC: Responsive Replicated Containers

[NDSS '22] FitM: Binary-Only Coverage-Guided Fuzzing for Stateful Network Protocols

[SYSTEX '22] Transparent, Cross-ISA Enclave Offloading

[IPDPS '20] Fault-Tolerant Containers Using NiLiCon


  • CRIU for Database

[Journal of Cloud Computing '24] MDB-KCP: persistence framework of in-memory database with CRIU-based container checkpoint in Kubernetes

[VLDB '23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level

[VLDB '23] ElasticNotebook: Enabling Live Migration for Computational Notebooks

[EuroSys '21] On-demand-fork: a microsecond fork for memory-intensive and latency-sensitive applications