Academic Research


  • Optimization of CRIU
  • CRIU for Migration
  • CRIU for Security
  • CRIU for Database

[VLDB'23] Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level

Abstract

In-memory key-value stores (IMKVSes) serve many online applications. They generally adopt the fork-based snapshot mechanism to support data backup. However, this method can result in query latency spikes because the engine is out of service for queries during the snapshot. In contrast to existing research optimizing snapshot algorithms, we address the problem from the operating system (OS) level, while keeping the data persistence mechanism in IMKVSes unchanged. Specifically, we first study the impact of the fork operation on query latency. Based on findings in the study, we propose Async-fork, which performs the fork operation asynchronously to reduce the out-of-service time of the engine. Async-fork is implemented in the Linux kernel and deployed into the online Redis database in public clouds. Our experiment results show that Async-fork can significantly reduce the tail latency of queries during the snapshot.
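The snapshot mechanism the abstract refers to is easy to picture with a short userspace sketch: the parent is blocked only while fork() runs (dominated by page-table copying for large stores), after which it resumes serving queries while the child persists a copy-on-write frozen image. The kv_store structure and dump_snapshot() below are hypothetical placeholders for illustration, not code from Async-fork, Redis, or CRIU.

<pre>
/*
 * Minimal sketch of the fork-based snapshot pattern described above
 * (the same idea behind Redis' BGSAVE). All names are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct kv_store {
    char  *data;   /* stand-in for the real in-memory data set */
    size_t len;
};

/* Child side: persist the copy-on-write frozen image and exit. */
static void dump_snapshot(const struct kv_store *s, const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f) {
        perror("fopen");
        _exit(1);
    }
    fwrite(s->data, 1, s->len, f);
    fclose(f);
    _exit(0);
}

int main(void)
{
    struct kv_store store = { .len = 64 << 20 };
    store.data = malloc(store.len);
    memset(store.data, 'x', store.len);   /* touch pages so they are resident */

    /*
     * The engine is "out of service" only for the duration of fork() itself,
     * which is exactly the window Async-fork moves off the critical path.
     */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0)
        dump_snapshot(&store, "dump.img");   /* child: sees a frozen image */

    /* Parent: resumes serving queries immediately; reap the child later. */
    printf("snapshot child %d started, parent keeps serving\n", (int)pid);
    waitpid(pid, NULL, 0);
    free(store.data);
    return 0;
}
</pre>

In this pattern the out-of-service time is the time spent inside fork() in the parent, which grows with the size of the parent's page tables; that is the latency spike the paper attacks.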


[EuroSys'21] On-demand-fork: a microsecond fork for memory-intensive and latency-sensitive applications

Abstract

Fork has long been the process creation system call for Unix. At its inception, fork was hailed as an efficient system call due to its use of copy-on-write on memory shared between parent and child processes. However, application memory demand has increased drastically since the early days and the cost incurred by fork to simply set up virtual memory (e.g., copy page tables) is now a concern, even for applications that only require hundreds of MBs of memory. In practice, fork performance already holds back system efficiency and latency across a range of use cases that fork large processes, such as fault-tolerant systems, serverless frameworks, and testing frameworks.

This paper proposes On-demand-fork, a fast implementation of the fork system call specifically designed for applications with large memory footprints. On-demand-fork relies on the observation that copy-on-write can be generalized to page tables, even on commodity hardware. On-demand-fork executes faster than the traditional fork implementation by additionally sharing page tables between parent and child at fork time and selectively copying page tables in small chunks, on-demand, when handling page faults. On-demand-fork is a drop-in replacement for fork that requires no changes to applications or hardware.
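The key idea in this paragraph, copy-on-write applied to the page tables themselves, can be mimicked with a toy userspace model: share the leaf tables at "fork" time and copy one small chunk only when it is first written. Everything below (the addr_space layout, chunk sizes, and function names) is invented for illustration and is not kernel or On-demand-fork code.

<pre>
/*
 * Toy userspace analogy of copy-on-write generalized to page tables.
 * A two-level table stands in for real page tables: the "fork" copies
 * only the top level; a leaf chunk is duplicated lazily on first write.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNKS      8        /* entries in the toy top-level table   */
#define CHUNK_PAGES 512      /* "pages" covered by one leaf chunk    */

struct addr_space {
    long *chunk[CHUNKS];     /* leaf chunks, possibly shared          */
    int   owned[CHUNKS];     /* 1 if this space owns a private copy   */
};

/* "fork": share every leaf chunk instead of copying it eagerly. */
static struct addr_space ondemand_fork(const struct addr_space *parent)
{
    struct addr_space child;
    memcpy(child.chunk, parent->chunk, sizeof(child.chunk));
    memset(child.owned, 0, sizeof(child.owned));
    return child;
}

/* Write path: the analogue of the fault handler copying one chunk. */
static void write_page(struct addr_space *as, int page, long value)
{
    int c = page / CHUNK_PAGES;

    if (!as->owned[c]) {                 /* first write into a shared chunk */
        long *priv = malloc(CHUNK_PAGES * sizeof(long));
        memcpy(priv, as->chunk[c], CHUNK_PAGES * sizeof(long));
        as->chunk[c] = priv;             /* small, on-demand copy           */
        as->owned[c] = 1;
    }
    as->chunk[c][page % CHUNK_PAGES] = value;
}

int main(void)
{
    struct addr_space parent;
    for (int c = 0; c < CHUNKS; c++) {
        parent.chunk[c] = calloc(CHUNK_PAGES, sizeof(long));
        parent.owned[c] = 1;
    }

    struct addr_space child = ondemand_fork(&parent); /* no leaf copies yet */
    write_page(&child, 1000, 42);                      /* copies one chunk   */

    printf("parent[1000]=%ld child[1000]=%ld\n",
           parent.chunk[1000 / CHUNK_PAGES][1000 % CHUNK_PAGES],
           child.chunk[1000 / CHUNK_PAGES][1000 % CHUNK_PAGES]);
    return 0;
}
</pre>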

We evaluated On-demand-fork on a range of micro-benchmarks and real-world workloads. On-demand-fork significantly reduces the fork invocation time and has improved scalability. For processes with 1 GB of allocated memory, On-demand-fork has a 65× performance advantage over Fork. We also evaluated On-demand-fork on testing, fuzzing, and snapshotting workloads of well-known applications, obtaining execution throughput improvements between 59% and 226% and up to 99% invocation latency reduction.
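To see the cost that motivates these numbers on a stock kernel, a rough microbenchmark along the following lines times how long a plain fork() takes to return to the parent as its resident memory grows; the sizes and timing method are illustrative assumptions, not the paper's evaluation setup.

<pre>
/*
 * Rough sketch: measure fork() invocation latency in the parent as the
 * resident set grows. Illustrative only; not the paper's benchmark.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    size_t sizes_mb[] = { 64, 256, 1024 };

    for (size_t i = 0; i < sizeof(sizes_mb) / sizeof(sizes_mb[0]); i++) {
        size_t len = sizes_mb[i] << 20;
        char *buf = malloc(len);
        memset(buf, 1, len);          /* fault pages in so fork must map them */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pid_t pid = fork();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        if (pid == 0)
            _exit(0);                 /* child exits immediately */
        waitpid(pid, NULL, 0);

        printf("%4zu MB resident: fork() took %.2f ms\n",
               sizes_mb[i], elapsed_ms(t0, t1));
        free(buf);
    }
    return 0;
}
</pre>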