Difference between revisions of "TCP connection"
Revision as of 17:33, 25 July 2012
This page describes how we handle established TCP connections.
TCP repair mode in kernel
The TCP_REPAIR socket option was recently added to the kernel and helps with checkpoint/restore (C/R) of TCP sockets.
When set, this option turns the socket into a special state in which an action performed on it does not trigger any of the actions defined by the protocol. Instead, the socket is put directly into the state it would be in after the operation had completed successfully.
For example, calling connect() on a socket under repair just switches it to the ESTABLISHED state with the peer set as requested. The bind() call forcibly binds the socket to the given address, ignoring any potential conflicts. Closing a socket under repair skips the transient FIN_WAIT, TIME_WAIT and similar states; the socket is silently killed.
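As a rough illustration of the behaviour described above, here is a minimal sketch in C. It assumes a Linux system whose headers expose TCP_REPAIR and a caller with the CAP_NET_ADMIN capability; the helper names tcp_repair_set and restore_endpoints are made up for this example and are not taken from crtools. Sequence numbers, queues and options are left out here.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Switch repair mode on (1) or off (0) for a TCP socket.
 * Returns 0 on success, -1 with errno set on failure. */
int tcp_repair_set(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on));
}

/* Hypothetical helper: recreate the endpoints of an established
 * connection. Because the socket is under repair, bind() and
 * connect() only fill in kernel state and send nothing to the peer. */
int restore_endpoints(int fd,
                      const struct sockaddr_in *local,
                      const struct sockaddr_in *peer)
{
    if (tcp_repair_set(fd, 1) < 0)
        return -1;
    if (bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0)
        return -1;
    /* Leave repair mode so the socket behaves normally again. */
    return tcp_repair_set(fd, 0);
}
```

A caller would create the socket with socket(AF_INET, SOCK_STREAM, 0), fill in the two addresses saved at dump time and pass them to restore_endpoints().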
Sequences
Binding and connecting the socket is not enough to restore the connection properly; the TCP sequence numbers have to be restored as well. The TCP_REPAIR_QUEUE and TCP_QUEUE_SEQ options were introduced for this.
The former selects which queue (input or output) will be repaired, and the latter gets or sets the sequence number of that queue. Note that setting the sequence number is only possible on a socket in the CLOSE state.
Packets in queue
Once the queue to repair has been selected as described above, one can call the recv or send syscalls on the socket under repair. The calls peek data from, or poke data into, the selected queue. This sounds funny, but yes: on a socket under repair one can receive from the outgoing queue and send into the incoming queue. Using the MSG_PEEK flag for recv is required.
Options
Four options are negotiated by the socket at the connecting stage:
- mss_clamp -- the maximum segment size the peer is ready to accept
- snd_wscale -- the scale factor for a window
- sack -- whether selective acks are permitted or not
- tstamp -- whether timestamps on packets are supported
All four can be read with getsockopt calls on the socket. To restore them, the TCP_REPAIR_OPTIONS socket option was introduced.
Checkpoint and restore TCP connection
With the socket options above, dumping and restoring a TCP connection becomes possible. crtools just reads the socket state and restores it, letting the protocol resurrect the data sequence.
One thing to note here: while the socket is closed between dump and restore, the connection should be "locked", i.e. no packets from the peer should enter the stack; otherwise the kernel will send an RST. To achieve this, a simple netfilter rule is configured that drops all packets from the peer to the socket in question. This rule stays in the host netfilter tables after the crtools dump command finishes, and it should still be there when you issue the crtools restore command.
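To make the idea of such a rule concrete, the sketch below formats an iptables command that drops TCP packets travelling from the peer to the local end of the connection. The exact rule installed by crtools is not given on this page, so the table, chain and the format_lock_rule helper are assumptions for illustration only.

```c
#include <stdio.h>

/* Format an iptables command that appends (add != 0) or deletes
 * (add == 0) a DROP rule for TCP packets from the peer to the local
 * end of the connection. Returns 0 on success, -1 if buf is too small. */
int format_lock_rule(char *buf, size_t size, int add,
                     const char *peer_ip, unsigned peer_port,
                     const char *local_ip, unsigned local_port)
{
    int n = snprintf(buf, size,
        "iptables -t filter %s INPUT --protocol tcp "
        "--source %s --sport %u --destination %s --dport %u -j DROP",
        add ? "-A" : "-D", peer_ip, peer_port, local_ip, local_port);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The same helper with add set to 0 produces the matching delete command, which would be run once the connection has been restored.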
For this reason, the command line option --tcp-established should be used when calling crtools, to state explicitly that the caller is aware of this "transitional" state of netfilter.