Difference between revisions of "TCP connection"

From CRIU

Revision as of 16:57, 30 April 2013

This page describes how we handle established TCP connections.

TCP repair mode in kernel

The TCP_REPAIR socket option was added in kernel 3.5 to help with checkpoint/restore (C/R) of TCP sockets.

When this option is set, the socket is switched into a special mode in which operations on it do not trigger the usual protocol actions. Instead, each operation directly puts the socket into the state it would be in after that operation had completed successfully.

For example, calling connect() on a repaired socket just changes its state to ESTABLISHED, with the peer address set as requested. The bind() call forcibly binds the socket to the given address (ignoring any potential conflicts). The close() call closes the socket without any transient FIN_WAIT/TIME_WAIT/etc. states; the socket is silently killed.
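The sequence described above can be sketched in C roughly as follows. This is a minimal illustration, not criu's actual code: the helper names tcp_repair_set() and restore_established() are made up for this example, and a real restore would also set sequence numbers, queues and options while the socket is still in repair mode.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Switch repair mode on (1) or off (0). Requires CAP_NET_ADMIN. */
int tcp_repair_set(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on));
}

/* Hypothetical helper: rebuild an ESTABLISHED socket without sending packets. */
int restore_established(const struct sockaddr_in *local,
                        const struct sockaddr_in *peer)
{
    int fd, saved;

    assert(local != NULL && peer != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (tcp_repair_set(fd, 1) < 0)
        goto err;
    /* In repair mode bind() ignores address conflicts. */
    if (bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0)
        goto err;
    /* No handshake happens: the socket goes straight to ESTABLISHED. */
    if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0)
        goto err;
    /* Leave repair mode so the socket behaves normally again. */
    if (tcp_repair_set(fd, 0) < 0)
        goto err;
    return fd;

err:
    saved = errno;      /* keep the original error across close() */
    close(fd);
    errno = saved;
    return -1;
}
```

Without the required capability the first setsockopt() call fails, so the helper returns -1 before touching the addresses.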

Sequences

To restore the connection properly, bind() and connect() are not enough. One also needs to restore the TCP sequence numbers. To do so, the TCP_REPAIR_QUEUE and TCP_QUEUE_SEQ options were introduced.

The former selects which queue (input or output) will be repaired, and the latter gets or sets the sequence number. Note that setting the sequence is only possible on a socket in the CLOSE state.

Packets in queue

Once the queue to repair is selected as described above, one can call the recv() or send() syscalls on the repaired socket. These calls peek data from, or poke data into, the selected queue. This sounds funny, but yes, on a repaired socket one can receive from the outgoing queue and send to the incoming queue. Using the MSG_PEEK flag for recv() is required.

Options

There are four options that the socket negotiates at the connection stage. These are:

  • mss_clamp -- the maximum size of the segment peer is ready to accept
  • snd_wscale -- the scale factor for a window
  • sack -- whether selective acks are permitted or not
  • tstamp -- whether timestamps on packets are supported

All four can be read with getsockopt() calls on a socket. To restore them, the TCP_REPAIR_OPTIONS socket option was introduced.

Checkpoint and restore TCP connection

With the above socket options, dumping and restoring a TCP connection becomes possible. criu just reads the socket state and restores it, letting the protocol resurrect the data sequence.

One thing to note here: while the socket is closed between dump and restore, the connection should be "locked", i.e. no packets from the peer should enter the stack, otherwise the kernel will send an RST. To achieve this, a simple netfilter rule is configured that drops all packets from the peer to the socket we're dealing with. This rule stays in the host netfilter tables after the criu dump command finishes, and it should still be there when you issue the criu restore command.
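To give an idea of what such a rule could look like, the sketch below formats a DROP rule in iptables syntax for one connection. The helper format_drop_rule() is hypothetical, and the exact rule criu installs may differ. Only the general shape is illustrated: match on the peer's and the local address and port, then drop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write an iptables-style rule that drops packets
 * coming from the peer to the local end of the connection.
 * Returns 0 on success, -1 if an address is malformed or buf is too small. */
int format_drop_rule(char *buf, size_t len,
                     const char *peer_ip, unsigned peer_port,
                     const char *local_ip, unsigned local_port)
{
    int n;

    assert(buf != NULL && peer_ip != NULL && local_ip != NULL);

    /* Refuse addresses with spaces so they cannot add extra rule arguments. */
    if (strchr(peer_ip, ' ') != NULL || strchr(local_ip, ' ') != NULL)
        return -1;

    n = snprintf(buf, len,
                 "-A INPUT -p tcp -s %s --sport %u -d %s --dport %u -j DROP",
                 peer_ip, peer_port, local_ip, local_port);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

For a peer at 192.0.2.10:443 and a local socket at 198.51.100.5:51000, this yields the rule "-A INPUT -p tcp -s 192.0.2.10 --sport 443 -d 198.51.100.5 --dport 51000 -j DROP".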

That said, the --tcp-established command line option should be used when calling criu to explicitly state that the caller is aware of this "transitional" state of netfilter.

More info