C SC 481.20 Lecture 8: Transmission Control Protocol (TCP)
major resource: Computer Networking (4th Edition),
Kurose and Ross, Addison Wesley, 2008
TCP – Transmission Control Protocol
Overview
TCP provides connection-oriented service.
- host-to-host (end-to-end) service
- reliable, error free, sequenced, stream service
- pipelined (not stop-and-wait) transmission with buffers
- full duplex (bi-directional at same time)
- provides flow and congestion control - details below
- Unit of transmission is segment
- Connection-oriented but not virtual circuit. Why?
- A virtual circuit (VC) requires routers to maintain per-connection state. TCP segments are routed by IP, which is a datagram service, so connection state lives only in the two end hosts.
TCP provides connection management
- 3 way handshake to establish connection
- full duplex communication (both ways at same time)
- send and receive buffers established during handshake
- maximum segment size (MSS), actually max payload size
- sliding window for pipelined stream service
- double exchange sequence to terminate connection
TCP Segment Structure
Field | Bits | Description
Source Port | 16 | service access point at source host
Destination Port | 16 | service access point at destination host
Sequence Number | 32 | byte offset, for sliding window transmission
Acknowledgement Number | 32 | also used in sliding window
TCP Header Length | 4 | in 32-bit words
unused | 6 |
Code Bits | 6 | six 1-bit control fields, explained below
Window Size | 16 | used in variable-length sliding window
Checksum | 16 | 1's complement sum of segment in 16-bit words
Urgent Pointer | 16 | byte offset of special signals (e.g. Ctrl-C)
Options | n * 32 | n = number of 32-bit words occupied by options
Data | 0-max | max = what will fit in datagram
- Variable length header (includes everything above except Data field)
- Header has 20 byte fixed plus variable options field
- Variable length payload (the Data field)
- Selected field notes:
- Header length is a 4 bit field, giving the length of the header in 4-byte words
- Sequence and ack numbers are 32 bits, and represent byte counts
- Sequence number is byte stream position of first data byte in segment
- Ack number is sequence number of next byte expected
- Receiver window size field is 16 bits, and represents the number of bytes that receiver is willing to receive (e.g. has available in receive buffer)
- For each of the above, correlate field lengths with allowable value range
- The "6 flags" are:
- URG : payload contains high priority info; the 16 bit Urgent Pointer field gives its byte offset in the payload. (not generally used)
- ACK : this segment contains acknowledgement
- PSH : receiver should pass the data to the application immediately rather than holding it in the receive buffer (not generally used)
- RST, SYN, FIN : used for connection maintenance
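The field layout above can be sketched in code. The following is a minimal illustration, assuming the 4/6/6 bit split shown in these notes (header length, unused, code bits) and network byte order; the function name `parse_tcp_header` and the sample segment are made up for this example.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the 20-byte fixed part of a TCP header (layout as in the table above)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # top 4 bits count 32-bit words
    flags = offset_flags & 0x3F             # low 6 bits are the code bits
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 up to bit 5
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window, "checksum": checksum, "urgent": urgent,
        "payload": segment[header_len:],    # everything after header and options
    }

# Hypothetical SYN segment: port 12345 -> 80, seq# 26, 5-word header, no options
example = struct.pack("!HHIIHHHH", 12345, 80, 26, 0, (5 << 12) | 0x02, 65535, 0, 0)
parsed = parse_tcp_header(example)
print(parsed["flags"], parsed["header_len"], parsed["window"])
```

Note how the field widths bound the values: a 16 bit window can advertise at most 65535 bytes, and a 4 bit header length allows at most 15 words (60 bytes) of header.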
TCP Reliable Transmission
- Variable length sliding window with per-segment timers and retransmission
- cumulative ACKs
- ACK may be piggybacked on data segment heading in other direction on full duplex connection
- Receiver can either accept or reject out-of-order segment
Sequence and ACK numbers are handled thusly:
- seq# and ack# are byte counts in transmission stream
- initial seq# selected at random (what is probability of 0?)
- sender and receiver exchange initial seq# during 3-way handshake
- for a given segment, seq# = byte offset for first byte of payload, e.g. (init seq# + stream byte offset) mod 2**32, since the field is 32 bits
- for a given segment, ack# = seq# of next byte expected from sender
Note this handles full duplex. For further info, see discussion below of flow and congestion control.
Example:
- A sends 3600 bytes to B
- MSS = 1500
- Error-free
- A’s initial seq# is 26
- Assume connection already established
- DC means "don’t care": the field is not relevant to this example
- One possible sequence (essentially a stop-and-wait)
- A -> B : seq# 26, ack# DC, data 1500 bytes
- B -> A : seq# DC, ack# 1526
- A -> B : seq# 1526, ack# DC, data 1500 bytes
- B -> A : seq# DC, ack# 3026
- A -> B : seq# 3026, ack# DC, data 600 bytes
- B -> A : seq# DC, ack# 3626
- Could rearrange to pipeline all three, e.g.
- A -> B : seq# 26, ack# DC, data 1500 bytes
- A -> B : seq# 1526, ack# DC, data 1500 bytes
- A -> B : seq# 3026, ack# DC, data 600 bytes
- B -> A : seq# DC, ack# 1526
- B -> A : seq# DC, ack# 3026
- B -> A : seq# DC, ack# 3626
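The numbers in this example can be reproduced with a few lines of code. This is a sketch only: the helper name `segment_stream` is invented here, and it assumes (as the example does) that the initial seq# is the number of the first data byte.

```python
SEQ_SPACE = 2**32  # the seq# and ack# fields are 32 bits, so values wrap modulo 2**32

def segment_stream(total_bytes, mss, initial_seq):
    """Return (seq#, payload length, ack# expected back) for each segment."""
    segments = []
    offset = 0
    while offset < total_bytes:
        length = min(mss, total_bytes - offset)      # last segment may be short
        seq = (initial_seq + offset) % SEQ_SPACE     # first payload byte's number
        ack = (seq + length) % SEQ_SPACE             # next byte the receiver expects
        segments.append((seq, length, ack))
        offset += length
    return segments

for seq, length, ack in segment_stream(3600, 1500, 26):
    print(f"A -> B : seq# {seq}, data {length} bytes   (B replies ack# {ack})")
```

The same list of segments applies to both orderings above; stop-and-wait and pipelining differ only in when the ACKs are sent, not in their values.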
Example:
- Full Duplex transmit
- A transmits 1111 bytes to B
- B transmits 666 bytes to A
- MSS = 512
- Error-free
- A’s initial seq# is 451
- B’s initial seq# is 103
- Assume connection already established
- DC means “don’t care”: the field is not relevant to this example
- One possible sequence
- A -> B : seq# 451, ack# 103, data 512 bytes
- B -> A : seq# 103, ack# 963, data 512 bytes
- A -> B : seq# 963, ack# 615, data 512 bytes
- B -> A : seq# 615, ack# 1475, data 154 bytes
- A -> B : seq# 1475, ack# 769, data 87 bytes
- B -> A : seq# DC, ack# 1562
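The piggybacked ACKs in this sequence can be checked with a small simulation. It is a simplified sketch under stated assumptions: the two hosts strictly alternate, each turn sends at most one MSS, sequence number wraparound is ignored, and the function name `full_duplex_exchange` is hypothetical.

```python
def full_duplex_exchange(a_bytes, b_bytes, mss, a_seq, b_seq):
    """Alternate A and B, one segment per turn, each carrying a piggybacked ACK."""
    # per host: [next seq# to send, bytes still to send]
    state = {"A": [a_seq, a_bytes], "B": [b_seq, b_bytes]}
    trace = []
    sender, receiver = "A", "B"
    while state["A"][1] > 0 or state["B"][1] > 0:
        seq, remaining = state[sender]
        length = min(mss, remaining)
        ack = state[receiver][0]          # next byte expected from the other side
        trace.append((sender, seq, ack, length))
        state[sender] = [seq + length, remaining - length]
        sender, receiver = receiver, sender
    # final pure ACK (no data, so its seq# is a "don't care")
    trace.append((sender, None, state[receiver][0], 0))
    return trace

for who, seq, ack, length in full_duplex_exchange(1111, 666, 512, 451, 103):
    other = "B" if who == "A" else "A"
    seq_text = "DC" if seq is None else seq
    print(f"{who} -> {other} : seq# {seq_text}, ack# {ack}, data {length} bytes")
```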
TCP Connection setup : 3 way handshake
Three-way handshake to establish connection : 3 segments exchanged
- client-to-server connection request :
- SYN bit set,
- seq# is client's initial number (random selection)
- empty payload
- server-to-client response:
- (allocates TCP buffer space and variables before responding)
- SYN and ACK bits set
- seq# is server's initial number (random selection)
- ack# is client's initial number + 1
- empty payload
- client-to-server confirmation:
- (allocates TCP buffer space and variables)
- ACK bit set
- seq# is client's initial number + 1
- ack# is server's initial number + 1
- empty payload
If the server socket is not prepared for a connection (or the client sends the wrong port number), the server responds
with the RST flag set instead of the SYN flag
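The seq# and ack# arithmetic of the handshake can be sketched as follows. This models only the three segments, not real sockets; the function name and the dictionary layout are assumptions made for illustration. The second and third segments also carry the ACK flag, since they contain an acknowledgement number.

```python
import random

SEQ_SPACE = 2**32  # 32-bit sequence number space

def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as dicts (all with empty payload)."""
    if client_isn is None:
        client_isn = random.randrange(SEQ_SPACE)   # random initial seq#
    if server_isn is None:
        server_isn = random.randrange(SEQ_SPACE)
    syn = {"from": "client", "flags": {"SYN"},
           "seq": client_isn, "ack": None}
    syn_ack = {"from": "server", "flags": {"SYN", "ACK"},
               "seq": server_isn, "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"from": "client", "flags": {"ACK"},
           "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

Each SYN consumes one sequence number even though it carries no data, which is why both sides acknowledge "initial number + 1".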
SYN Flood: Because the server allocates buffers upon receiving a SYN request from a client (see above), it is vulnerable
to a SYN flood denial of service attack: a malicious client sends a flood of fake SYN requests, and the server
allocates buffer space for each one before the handshake has verified the request. Solution? SYN cookie: the server does
not allocate upon SYN request but instead crafts a special initial sequence number for its SYN response (a hash based on
the client IP addr, the client port, and the server's own secret "magic number"). If the client's ACK arrives, the server
recomputes the hash from the ACK's source address and port and checks that it equals the ack number minus 1. Only after
this check succeeds does the server allocate buffers.
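The SYN cookie idea can be sketched in a few lines. This is a deliberately simplified illustration, not how any particular operating system implements it: real SYN cookies also fold in a time counter and an encoded MSS. The secret value, the function names, and the choice of HMAC-SHA256 are all assumptions for this example.

```python
import hashlib
import hmac

SECRET = b"server-magic-number"   # hypothetical server secret, never sent on the wire
SEQ_SPACE = 2**32

def syn_cookie(client_ip: str, client_port: int) -> int:
    """Initial seq# for the SYN response, derived from the client's address."""
    msg = f"{client_ip}:{client_port}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")      # truncate to 32 bits

def ack_is_valid(client_ip: str, client_port: int, ack_number: int) -> bool:
    """On the final ACK: recompute the cookie and compare it with ack# - 1."""
    return (ack_number - 1) % SEQ_SPACE == syn_cookie(client_ip, client_port)

cookie = syn_cookie("192.0.2.7", 40000)
print(ack_is_valid("192.0.2.7", 40000, (cookie + 1) % SEQ_SPACE))  # genuine client
print(ack_is_valid("192.0.2.7", 40000, (cookie + 2) % SEQ_SPACE))  # wrong ack#
```

The server stores nothing between the SYN and the ACK; everything it needs to verify the client is recomputed from the ACK segment itself.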
Double exchange to terminate connection
Think of it as one exchange to cut off one direction of duplex and second exchange to cut off other direction.
(we’ll show client initiating the sequence; could be either party)
- A is finished, sends B a segment with FIN bit set.
- B sends acknowledgement to A, and deallocates its buffers
A-to-B direction is now closed.
- B sends A a segment with FIN bit set
- A sends acknowledgement to B, waits awhile, then deallocates its buffers
B-to-A direction is now closed.
TCP Flow Control
- Note that receiver gives sender “receiver window size” in each return segment.
- Thus sender knows how much receive buffer space receiver has
- Sender maintains sending window no larger than that.
- Sending window is the unacknowledged data in flight: Last-Byte-Sent minus Last-Byte-Acked must not exceed the receiver window size
- Special case:
- receive buffer fills, thus “receiver window size” is 0.
- sender stops sending
- receiver window empties as data moved to application layer
- how does receiver tell sender it now has space? Since it is receiving nothing, it is not sending ACKs. Unless it is also sending data the other direction, it has no way to give its updated “receiver window size” to sender.
- solution is: sender continues to send segments with one data byte, which trigger ACKs from receiver.
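The sender's side of flow control, including the zero-window special case, can be sketched as below. The function names are invented for this illustration, byte positions are plain integers (no wraparound), and congestion control is ignored.

```python
def usable_window(rwnd, last_byte_sent, last_byte_acked):
    """Bytes the sender may still put in flight without overflowing the receiver."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet acknowledged
    return max(0, rwnd - in_flight)

def next_send_size(rwnd, last_byte_sent, last_byte_acked, mss, pending):
    """How many payload bytes to put in the next segment."""
    if pending == 0:
        return 0                  # nothing to send
    if rwnd == 0:
        return 1                  # zero window: one-byte probe to trigger an ACK
    return min(mss, pending, usable_window(rwnd, last_byte_sent, last_byte_acked))

print(usable_window(4096, 3000, 1000))             # 2000 in flight, 2096 still usable
print(next_send_size(4096, 3000, 1000, 512, 5000)) # limited by MSS
print(next_send_size(600, 1500, 1000, 512, 5000))  # limited by receiver window
print(next_send_size(0, 1000, 1000, 512, 5000))    # probe
```

The one-byte probe exists only so the receiver has something to acknowledge; its ACK carries the updated receiver window size back to the sender.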
TCP Congestion Control
Distinguish flow control from congestion control
- flow control is strictly between a sender-receiver pair
- congestion control is network-wide – routers and links can become saturated
We did not cover TCP congestion algorithm below (it is interesting though)
TCP, as a transport layer protocol, has only indirect knowledge of congestion (from ACK behaviors, sender notices that segments are being delayed and dropped)
A given TCP sender contributes to congestion by transmitting a wider send window (called its “congestion window”), and does its part to relieve congestion by narrowing its window (throttle).
A typical scenario for a large transmission follows these phases (see RFC 2581 for details):
- initialize variables:
- congestion window variable set to MSS (or 2 * MSS),
- threshold variable set to arbitrary value (max. at receiver’s advertised window size?)
- slow start phase :
- sender transmits one congestion window of data (e.g. 1 or 2 segments).
- For each ACK received on time, congestion window increased by MSS.
- Results in exponential growth.
- congestion avoidance phase :
- entered when congestion window exceeds threshold.
- Congestion window is increased by MSS only when an entire window’s worth of ACKs is received on time.
- Results in linear growth.
- evidence of congestion occurs:
- timeout ("Tahoe")
- reset the threshold to half the congestion window size,
- reset congestion window size to MSS, and
- go back into slow start phase
- 3 duplicate ACKs for the same ack# ("Reno")
- fast retransmit, i.e. retransmit the missing segment before its timeout
- reset the threshold to half the congestion window size, then set congestion window to the new threshold
- go back into congestion avoidance phase (“fast recovery”)
Algorithm described as Additive-Increase, Multiplicative-Decrease (AIMD), which applies only if you ignore the slow start phase.
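The phases above can be traced with a toy simulation, one step per round trip. It is a rough sketch under several assumptions: window sizes are counted in whole MSS units, slow start doubling is capped at the threshold, and on three duplicate ACKs the threshold is halved before the window is set to it (the Reno behavior). The function name and event labels are made up for this example.

```python
def simulate_cwnd(events, mss=1, ssthresh=8):
    """Return the congestion window after each event.

    events: "rtt"     = a whole window of ACKs arrived on time
            "timeout" = retransmission timer expired (Tahoe-style reaction)
            "dupack3" = three duplicate ACKs (Reno-style fast recovery)
    """
    cwnd = mss
    history = [cwnd]
    for event in events:
        if event == "rtt":
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)   # slow start: exponential growth
            else:
                cwnd += mss                      # congestion avoidance: linear growth
        elif event == "timeout":
            ssthresh = max(cwnd // 2, mss)       # halve the threshold
            cwnd = mss                           # restart from slow start
        elif event == "dupack3":
            ssthresh = max(cwnd // 2, mss)       # halve the threshold
            cwnd = ssthresh                      # resume in congestion avoidance
        history.append(cwnd)
    return history

events = ["rtt"] * 5 + ["dupack3"] + ["rtt"] * 2 + ["timeout"] + ["rtt"] * 3
print(simulate_cwnd(events))
```

In the printed trace, the +1 steps are the additive increase and the drop to half at "dupack3" is the multiplicative decrease; the doubling steps are slow start, which is the part AIMD leaves out.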
Peter Sanderson (PSanderson@otterbein.edu)