
3,472 bytes added, 08:21, 18 April 2022
draft of the article about rseq support in CRIU
"Restartable sequences" (<code>rseq</code>) are small segments of user-space code designed to access per-CPU data structures without the need for heavyweight locking.
Linux has supported rseq since kernel 4.18 [1].

== Linux kernel interface ==

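The Linux kernel interface for rseq is fairly simple: it is a single <code>rseq</code> system call:
<code>sys_rseq(struct rseq *rseq, uint32_t rseq_len, int flags, uint32_t sig)</code>
Here <code>flags</code> is 0 to register the <code>struct rseq</code> area of the calling thread, or <code>RSEQ_FLAG_UNREGISTER</code> to unregister it, and <code>sig</code> is a 32-bit signature that the kernel expects to find right before every abort handler.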

<pre>
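enum rseq_cs_flags_bit {
	RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT = 0,
	RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT  = 1,
	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT = 2,
};

enum rseq_cs_flags {
	RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT = (1U << RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT),
	RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL  = (1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT),
	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE = (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
};

/* Simplified from include/uapi/linux/rseq.h: the ABI stores pointers as __u64. */
struct rseq_cs {
	__u32 version;             /* always 0 at this moment */
	__u32 flags;               /* enum rseq_cs_flags */
	__u64 start_ip;            /* first instruction of the CS */
	__u64 post_commit_offset;  /* offset from start_ip */
	__u64 abort_ip;            /* abort handler */
} __attribute__((aligned(4 * sizeof(__u64))));

struct rseq {
	__u32 cpu_id_start;
	__u32 cpu_id;
	__u64 rseq_cs;             /* pointer to the current struct rseq_cs, or 0 */
	__u32 flags;               /* enum rseq_cs_flags */
} __attribute__((aligned(4 * sizeof(__u64))));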
</pre>

On the user-space side, each thread keeps its own <code>struct rseq</code> (typically in thread-local storage) and registers it with the kernel using the <code>rseq</code> syscall.

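
A minimal registration sketch in C, assuming a Linux system whose headers define <code>__NR_rseq</code> (otherwise the wrapper just reports <code>ENOSYS</code>). The names <code>example_rseq_area</code>, <code>example_rseq_tls</code>, <code>example_rseq_register()</code>, <code>example_rseq_unregister()</code> and the signature value are hypothetical; the struct only mirrors the original 32-byte layout of <code>struct rseq</code>. Keep in mind that glibc 2.35 and newer registers its own rseq area for every thread, in which case the kernel rejects a second registration with <code>EINVAL</code>.

<pre>
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical mirror of the original 32-byte struct rseq ABI layout. */
struct example_rseq_area {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;  /* address of the active struct rseq_cs, or 0 */
	uint32_t flags;    /* enum rseq_cs_flags */
} __attribute__((aligned(32)));

_Static_assert(sizeof(struct example_rseq_area) == 32, "original ABI size");
_Static_assert(offsetof(struct example_rseq_area, rseq_cs) == 8, "rseq_cs offset");

#define EXAMPLE_RSEQ_SIG        0x53053053U /* arbitrary signature for this sketch */
#define EXAMPLE_FLAG_UNREGISTER 1           /* value of RSEQ_FLAG_UNREGISTER */

/* One area per thread; the kernel writes the current CPU number into it. */
static __thread struct example_rseq_area example_rseq_tls = {
	.cpu_id = (uint32_t)-1,  /* RSEQ_CPU_ID_UNINITIALIZED */
};

/* Thin wrapper around the syscall: returns 0 or a negative errno value. */
static int example_sys_rseq(int flags)
{
#ifdef __NR_rseq
	if (syscall(__NR_rseq, &example_rseq_tls, sizeof(example_rseq_tls),
		    flags, EXAMPLE_RSEQ_SIG) == 0)
		return 0;
	return -errno;
#else
	(void)flags;
	return -ENOSYS;
#endif
}

/* Register the calling thread's area (flags == 0). */
static int example_rseq_register(void)
{
	return example_sys_rseq(0);
}

/* Unregister: the kernel requires the same pointer, length and signature. */
static int example_rseq_unregister(void)
{
	return example_sys_rseq(EXAMPLE_FLAG_UNREGISTER);
}
</pre>

The kernel keeps only a pointer to this area, so it must stay valid for as long as the thread is registered; a static thread-local variable satisfies that. After a successful registration, <code>example_rseq_tls.cpu_id</code> already holds the CPU the thread is running on.
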
Then, to execute some code as an rseq critical section (<code>rseq cs</code>, or just CS), we allocate a <code>struct rseq_cs</code> and fill it in: the start address of the CS, its length as <code>post_commit_offset</code>, and the address of the abort handler (where the kernel redirects execution when the CS is interrupted by preemption, migration, or a signal).
Finally, we store a pointer to this <code>struct rseq_cs</code> in the <code>(struct rseq).rseq_cs</code> field.
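
The descriptor fields can be illustrated with a small, self-contained C sketch. It does not run a real critical section (that needs architecture-specific assembly); instead it fills a hypothetical <code>example_rseq_cs</code> descriptor from made-up addresses and mirrors checks that <code>kernel/rseq.c</code> applies to it, to our understanding: the version must be 0, the abort handler must lie outside <code>[start_ip, start_ip + post_commit_offset)</code>, and the 4 bytes just before <code>abort_ip</code> must equal the signature passed at registration. The kernel's <code>TASK_SIZE</code> range checks are omitted, and all <code>example_*</code> names are illustrative only.

<pre>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct rseq_cs (32 bytes, 32-byte aligned). */
struct example_rseq_cs {
	uint32_t version;            /* must be 0 */
	uint32_t flags;              /* enum rseq_cs_flags */
	uint64_t start_ip;           /* first instruction of the CS */
	uint64_t post_commit_offset; /* CS length: post-commit address - start_ip */
	uint64_t abort_ip;           /* abort handler, preceded by the signature */
} __attribute__((aligned(32)));

/* Fill a descriptor from three code addresses. */
static void example_rseq_cs_init(struct example_rseq_cs *cs, uint64_t start_ip,
				 uint64_t post_commit_ip, uint64_t abort_ip,
				 uint32_t flags)
{
	cs->version = 0;
	cs->flags = flags;
	cs->start_ip = start_ip;
	cs->post_commit_offset = post_commit_ip - start_ip;
	cs->abort_ip = abort_ip;
}

/* Descriptor sanity checks (the kernel additionally checks against TASK_SIZE). */
static int example_rseq_cs_valid(const struct example_rseq_cs *cs)
{
	if (cs->version > 0)
		return 0;
	if (cs->start_ip + cs->post_commit_offset < cs->start_ip)
		return 0; /* range wraps around the address space */
	if (cs->abort_ip - cs->start_ip < cs->post_commit_offset)
		return 0; /* abort handler lies inside the CS */
	return 1;
}

/*
 * Is ip inside [start_ip, start_ip + post_commit_offset)? Unsigned
 * subtraction makes ip < start_ip wrap to a huge value, so a single
 * comparison covers both bounds.
 */
static int example_ip_in_cs(const struct example_rseq_cs *cs, uint64_t ip)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* The 4 bytes just before the abort handler must hold the registered signature. */
static int example_sig_matches(const unsigned char *abort_handler, uint32_t sig)
{
	uint32_t found;

	memcpy(&found, abort_handler - sizeof(found), sizeof(found));
	return found == sig;
}
</pre>

For instance, with <code>start_ip</code> = 0x1000 and a post-commit address of 0x1010, addresses 0x1000 through 0x100f are inside the CS, while 0x1010 (the first instruction after the commit) and anything below 0x1000 are not.
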

== What about <code>flags</code>? ==

You may have noticed that both <code>struct rseq</code> and <code>struct rseq_cs</code> have a <code>flags</code> field. Both take values from <code>enum rseq_cs_flags</code>.

First of all, the flags can be set in either structure: the kernel combines them with a bitwise OR before deciding whether the CS has to be aborted (<code>kernel/rseq.c</code>):
<pre>
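static int rseq_need_restart(struct task_struct *t, u32 cs_flags)
{
	u32 flags, event_mask;
	int ret;

	/* Get thread flags. */
	ret = get_user(flags, &t->rseq->flags);
	if (ret)
		return ret;

	/* Take critical section flags into account. */
	flags |= cs_flags; /* <<< flags from struct rseq and struct rseq_cs combined here */

	/* ... validity check of the combined flags, see below ... */

	/*
	 * Load and clear event mask atomically with respect to
	 * scheduler preemption.
	 */
	preempt_disable();
	event_mask = t->rseq_event_mask;
	t->rseq_event_mask = 0;
	preempt_enable();

	return !!(event_mask & ~flags);
}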
</pre>

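The most common <code>flags</code> value is zero. In this case, the rseq CS is aborted (the instruction pointer is moved to the abort handler) if preemption, migration, or signal delivery occurs while the thread is inside it.
But there are situations when a user may not want to abort the CS when one of these events happens; setting the corresponding <code>RSEQ_CS_FLAG_NO_RESTART_ON_*</code> bit suppresses the abort for that event.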
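
The combination and the resulting abort decision can be modelled in a few lines of C. This is a simplified user-space sketch of the logic of <code>rseq_need_restart()</code>, assuming, as the kernel does, that the pending-event bits reuse the bit positions of <code>enum rseq_cs_flags</code>. The <code>EXAMPLE_*</code> macros and <code>example_need_restart()</code> are hypothetical names, and the flag validity check is left out.

<pre>
#include <stdint.h>

/* Bit values of enum rseq_cs_flags (include/uapi/linux/rseq.h). */
#define EXAMPLE_NO_RESTART_ON_PREEMPT (1U << 0)
#define EXAMPLE_NO_RESTART_ON_SIGNAL  (1U << 1)
#define EXAMPLE_NO_RESTART_ON_MIGRATE (1U << 2)

/* Pending events use the same bit positions as the flags above. */
#define EXAMPLE_EVENT_PREEMPT EXAMPLE_NO_RESTART_ON_PREEMPT
#define EXAMPLE_EVENT_SIGNAL  EXAMPLE_NO_RESTART_ON_SIGNAL
#define EXAMPLE_EVENT_MIGRATE EXAMPLE_NO_RESTART_ON_MIGRATE

/*
 * Returns 1 if a CS interrupted by the events in event_mask must be aborted.
 * thread_flags comes from struct rseq, cs_flags from struct rseq_cs.
 */
static int example_need_restart(uint32_t thread_flags, uint32_t cs_flags,
				uint32_t event_mask)
{
	uint32_t flags = thread_flags | cs_flags; /* combined, as in the kernel */

	/* Abort if any pending event is not covered by a NO_RESTART flag. */
	return (event_mask & ~flags) != 0;
}
</pre>

With <code>flags</code> equal to zero any pending event aborts the CS, while a thread that sets <code>RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT</code> in its <code>struct rseq</code> is not aborted on preemption in any of its critical sections.
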

Note that <code>RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL</code> can only be used together with both <code>RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT</code> and <code>RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE</code>; otherwise <code>rseq_need_restart()</code> returns <code>-EINVAL</code>:
<pre>
/*
* Restart on signal can only be inhibited when restart on
* preempt and restart on migrate are inhibited too. Otherwise,
* a preempted signal handler could fail to restart the prior
* execution context on sigreturn.
*/
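/* RSEQ_CS_PREEMPT_MIGRATE_FLAGS == (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE | RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT) */
if (unlikely((flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL) &&
	     (flags & RSEQ_CS_PREEMPT_MIGRATE_FLAGS) !=
	     RSEQ_CS_PREEMPT_MIGRATE_FLAGS))
	return -EINVAL;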
</pre>
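
The same rule can be expressed as a standalone C check, for example to validate a flag combination before using it. It is a sketch modelled on the kernel condition above; the <code>EXAMPLE_*</code> macros and <code>example_check_flags()</code> are hypothetical names. As in the kernel, the check applies to the combination of the <code>struct rseq</code> and <code>struct rseq_cs</code> flags, so the required bits may be split between the two structures.

<pre>
#include <errno.h>
#include <stdint.h>

/* Bit values of enum rseq_cs_flags (include/uapi/linux/rseq.h). */
#define EXAMPLE_NO_RESTART_ON_PREEMPT (1U << 0)
#define EXAMPLE_NO_RESTART_ON_SIGNAL  (1U << 1)
#define EXAMPLE_NO_RESTART_ON_MIGRATE (1U << 2)
#define EXAMPLE_PREEMPT_MIGRATE_FLAGS \
	(EXAMPLE_NO_RESTART_ON_PREEMPT | EXAMPLE_NO_RESTART_ON_MIGRATE)

/* Returns -EINVAL for a combination the kernel would reject, 0 otherwise. */
static int example_check_flags(uint32_t thread_flags, uint32_t cs_flags)
{
	uint32_t flags = thread_flags | cs_flags;

	/* NO_RESTART_ON_SIGNAL requires both NO_RESTART_ON_PREEMPT and NO_RESTART_ON_MIGRATE. */
	if ((flags & EXAMPLE_NO_RESTART_ON_SIGNAL) &&
	    (flags & EXAMPLE_PREEMPT_MIGRATE_FLAGS) != EXAMPLE_PREEMPT_MIGRATE_FLAGS)
		return -EINVAL;
	return 0;
}
</pre>
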



== Useful links ==

* [1] https://github.com/torvalds/linux/blob/b2d229d4ddb17db541098b83524d901257e93845/kernel/rseq.c#L1
* [2] https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/
* [3] https://lwn.net/Articles/883104/

[[Category: Under the hood]]
[[Category: Editor help needed]]