Transport layer emphasizing TCP and UDP

Basic Services

Transport layer provides, minimally, process-to-process (e.g. client application-to-server application) communication.

For Internet communication, this involves:

(a) getting the PDU from the sender’s machine to the receiver’s machine – IP, the network layer, does this.

(b) once PDU arrives at receiver, delivering it to the correct process – TCP and UDP both do this.

· Recall that PDU is Protocol Data Unit.

· IP provides unreliable service: its PDU is the datagram.

· Transport layer PDU is the segment.

· The only “value added” by UDP is delivery to the process if the IP datagram arrives at receiver.

· TCP, in addition to delivery, provides a substantial added value: reliable transport service.

Service: Reliable Delivery

For Internet, this is provided by TCP, because IP does not provide it.

Complexity of protocol for reliable delivery depends on underlying assumptions. The fewer the assumptions, the more complex. Several are explained in detail below.

Service: Deliver to correct process once datagram arrives

Both TCP and UDP provide this service

Header contains two 16-bit fields:

(a) source port #

(b) destination port #

Scenario 1: simple exchange between client and server.

1. server is up and running and listening on port X.

2. client initiates contact, sends segment with destination port X, source port Y (selected automatically from pool of unused ports)

3. segment arrives at server machine, destination port is read, and delivered to the process running on port X (the server process).

4. server formulates response, sends segment with destination port Y, source port X.

5. segment arrives at client machine, destination port is read, and delivered to the process running on port Y (the client process).

6. client formulates response, sends segment with destination port X, source port Y.

7. etc.

Scenario 2: two clients on same machine to same server (e.g. two browser windows).

1. server is up and running and listening on port X.

2. client1 initiates contact, sends segment with destination port X, source port Y1 (selected automatically from pool of unused ports).

3. client2 initiates contact, sends segment with destination port X, source port Y2 (selected automatically from pool of unused ports).

4. segment arrives at server machine, destination port is read, and delivered to the process running on port X (the server process). Segment could be from either client1 or client2.

5. server formulates response, sends segment with destination port Y1 or Y2 (depending on source port of incoming segment), source port X.

6. segment arrives at client machine, destination port is read, and delivered to the process running on whichever port is specified in destination port field (Y1 or Y2).

7. etc.

Scenario 3: two clients on different machines to same server.

1. server is up and running and listening on port X.

2. client1 initiates contact, sends segment with destination port X, source port Y (selected automatically from pool of unused ports).

3. client2 initiates contact, sends segment with destination port X, source port Y (selects same port number as client1!).

4. segment arrives at server machine, destination port is read, and delivered to the process running on port X (the server process). Segment could be from either client1 or client2. Q: How does server know which one, since both have source port Y? A: looks at IP address.

5. server formulates response, sends segment with destination port Y, source port X.

6. etc.

The lesson for delivering to correct process:

Combination of destination and source port #s will be unique, except when two clients connect to the same server from different IP addresses using the same source port #. In this case, the server differentiates based on source IP address.

Note that position of source and destination port #’s in segment header is same for both UDP and TCP (e.g. 0 and 2 byte displacement, respectively).
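As a rough illustration of the lesson above, this Python sketch reads the two port fields from the first four bytes of a segment header and builds a demultiplexing key that includes the source IP address. The function names, port numbers, and IP addresses are hypothetical, chosen only for the example.

```python
import struct

def read_ports(segment):
    """Return (source port, destination port) from the first 4 header bytes."""
    # Both fields are 16-bit unsigned integers in network (big-endian) order,
    # at byte offsets 0 and 2, for UDP and TCP alike.
    return struct.unpack_from("!HH", segment, 0)

def demux_key(src_ip, segment):
    """Key a server could use to tell clients apart (Scenario 3)."""
    src_port, dst_port = read_ports(segment)
    return (src_ip, src_port, dst_port)

# Two clients on different machines that happen to pick the same source port.
header = struct.pack("!HH", 50000, 80)
key1 = demux_key("10.0.0.1", header)
key2 = demux_key("10.0.0.2", header)
print(key1, key2, key1 != key2)
```

The ports alone are identical for both clients; only the IP address in the key separates them.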

UDP – User Datagram Protocol

Datagram service. Used by DNS, SNMP, some A/V protocols, and a few others.

Motivation: if best-effort delivery is good enough, then UDP is fast (no connection setup) and cheap. Also helps if sender and receiver operate at similar rates (to avoid flooding).

Segment Structure:

1. source port (16 bits)

2. destination port(16 bits)

3. segment length(16 bits)

4. checksum(16 bits)

5. payload (variable length)

Fields 1 and 2 already covered.

Field 3 is segment length, including header, in bytes.

Field 4 checksum is used so receiver can determine if segment was received error-free.

The 16-bit checksum is calculated by taking the sum of all 16-bit portions of the segment, then computing the one’s complement of that result (e.g. flip the bits). It is actually a little more complicated than that, see RFC 768 for details. If checksum field is all zero’s, receiver assumes checksum is not being used.

If checksum is being used, receiver calculates sum of all 16-bit portions of segment, including the checksum itself. The result will be all 1’s (think about it*), unless an error occurred.

* Since original checksum is complement of sum of the rest, including it into the sum will produce a 1 in each bit position. To use a 4-bit example, if sum of rest is 0101, checksum will be 1010. Receiver calculates sum of the rest plus checksum, or 0101 + 1010 = 1111.
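The calculation described above can be sketched in Python as follows. This is a simplified illustration, not a full UDP implementation: it sums only the bytes it is given, and it leaves out the pseudo-header and the rule in RFC 768 that a computed checksum of zero is transmitted as all 1’s. The function names are hypothetical.

```python
def ones_complement_sum16(data):
    """16-bit one's complement sum of data (padded with a zero byte if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        # Fold any carry out of bit 15 back into the low bits (end-around carry).
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum16(data):
    """Sender side: complement (flip the bits) of the one's complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF

def verify(data, checksum):
    """Receiver side: sum of the data plus the checksum must be all 1's."""
    total = ones_complement_sum16(data) + checksum
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF

payload = b"hello world!"
c = checksum16(payload)
print(verify(payload, c), verify(b"hellp world!", c))
```

The second call uses a payload with one byte changed, so the receiver's sum is no longer all 1's.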

Simplest protocol for reliable delivery

Simple protocol: assumes reliable channel and fast receiver (textbook “rdt1.0”)

Sender: when application supplies data, put into packets and send

Receiver: when packet arrives, pass data up to application layer.

Stop-and-wait protocols for reliable delivery

This family of protocols drops the assumption that the receiver is infinitely fast. Sender must avoid flooding slow receiver. Must implement flow control.

Stop-and-wait: assumes reliable channel (no textbook equivalent)

Sender: when application supplies data, send data packet and wait for ACK (acknowledgement) packet to arrive. After ACK arrives, send next packet.

Receiver: when data packet arrives, send ACK packet to sender. Pass data up to application layer as appropriate.

Stop-and-wait with corrupt channel: unreliable channel which can corrupt bits (but not lose them) and slow receiver (textbook “rdt2.x”)

The following protocol (rdt2.0) will work in all cases except one:

Sender: when application supplies data, send data packet and wait for ACK or NAK (negative acknowledgement) packet to arrive. If ACK, send next packet. If NAK, resend same packet.

Receiver: when data packet arrives, check for error. If none, send ACK packet to sender else send NAK packet. Pass data up to application layer as appropriate.

When will this not work? (hint: errors are not exclusive to data packets)

Problem: what if ACK/NAK is corrupted?

Solution: Add sequence number to data packet. 0 or 1 will do.

Sender: toggle sequence number before sending new data packet.

Receiver: If arriving packet has same sequence number as last one, it is a duplicate, so ACK it (since sender didn’t know it was received) then discard it.

Total solution requires all the functionality listed above for sender and receiver.

Sender: when application supplies data, toggle sequence number, send data packet and wait for ACK or NAK packet to arrive. If ACK, send next packet. If NAK or garbled, resend same packet.

Receiver: when data packet arrives, check for garbled. If garbled, send NAK packet to sender. If duplicate, send ACK then discard data. If correct packet, send ACK packet. Pass data up to application layer as appropriate.

NOTE: textbook shows version 2.2, that uses ACK with sequence number and no NAK. TCP uses a version of this.

1. Receiver always responds with ACK containing sequence number of last correct packet.

2. Sender must recognize when ACK really means NAK!

Stop-and-wait with lossy channel: unreliable channel which can corrupt bits and lose packets and maybe slow receiver (textbook “rdt3.0”)

Given version 2.1 or 2.2, you need only add timer to sender.
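To make the combination of sequence bit and timer concrete, here is a small Python simulation of an alternating-bit sender and receiver. It is only a sketch under simplifying assumptions: the channel is hypothetical, a lost data packet or lost ACK is modeled as a sender timeout followed by a resend, and corruption is not modeled at all.

```python
def stop_and_wait(messages, lost_data=(), lost_acks=()):
    """Simulate an alternating-bit sender and receiver.

    lost_data / lost_acks hold the (0-based) transmission numbers that the
    hypothetical channel drops. Returns (delivered messages, number of data
    transmissions).
    """
    delivered = []
    expected = 0      # receiver: sequence number it is waiting for
    seq = 0           # sender: sequence number of the current packet
    tx = 0            # counts data transmissions
    ack_tx = 0        # counts ACK transmissions
    for msg in messages:
        while True:
            this_tx = tx
            tx += 1
            if this_tx in lost_data:
                continue          # data lost: timer expires, resend
            # Receiver side: deliver new data, only re-ACK duplicates.
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            this_ack = ack_tx
            ack_tx += 1
            if this_ack in lost_acks:
                continue          # ACK lost: timer expires, resend
            break                 # ACK arrived: move on to next packet
        seq ^= 1                  # toggle sequence number for new data
    return delivered, tx

print(stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={0}))
```

In this run the first ACK is lost and the first retransmission is lost too, yet each message is delivered exactly once because the receiver recognizes the duplicate by its sequence number.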

Consider performance of all stop-and-wait protocols

Propagation delay is major factor:

Transmission rate (bps) determines how long it takes for sender to spit out the packet.
Round trip propagation determines how long until it receives ACK! It is idle all this time.
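A quick calculation shows how much of the time the sender is idle. The sketch below assumes the ACK itself takes negligible time to transmit; the packet size, link rate, and round trip time are made-up values for illustration.

```python
def stop_and_wait_utilization(packet_bits, rate_bps, rtt_s):
    """Fraction of time the sender spends actually transmitting."""
    transmit_time = packet_bits / rate_bps   # time to spit out one packet
    # Sender transmits, then sits idle for a round trip until the ACK arrives.
    return transmit_time / (rtt_s + transmit_time)

# Hypothetical numbers: 1000-byte packet, 1 Gbps link, 30 ms round trip.
u = stop_and_wait_utilization(packet_bits=8000, rate_bps=1e9, rtt_s=0.030)
print(f"sender is busy {u:.4%} of the time")
```

With these numbers the sender is busy for well under one tenth of one percent of the time, which is the motivation for pipelining.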

Sliding window protocols for reliable delivery

Improve performance by providing pipelining.

Two flavors are:

· Go-Back-N (retransmit all packets since last confirmed ACK)

· Selective Repeat (retransmit only packets w/o confirmed ACK)

Study TCP sliding window protocol later (it is a combination of the two).

Go-Back-N.

· Sender has window (range) of packet “slots”:

- packets transmitted but no ACK received yet

- packet slots usable but no data received yet from app layer

· Sender window has length N.

· If sender window full, new data from app layer is “rejected”
(may use semaphore as in producer-consumer problem to prevent)

· Requires sequence number

· Textbook uses only ACKs (can also define using ACK/NAK)

· ACKs cumulative (one ACK can cover more than one packet)

· Textbook receiver accepts only in-order packets (window size 1).
Can also implement to accept out-of-order packets (window size > 1)

Sender action:

· Transmit data packets until:

- ACK received –or-

- Timer times out –or-

- window full –or-

- waiting for data from app layer

· When ACK x received

- slide window’s “left edge” to slot x+1

- slide window’s “right edge” by similar amount to keep size N

- (re)start timer unless window is now empty

· When timer times out

- Resend all packets in the window!

Receiver action:

· If received packet is non-corrupt and in-order

- deliver data to app layer

- ACK with sequence number of that packet

· else (corrupt or out-of-order)

- ACK with sequence number of last accepted packet

NOTE: It appears that ACK covers only one packet instead of being cumulative. But suppose:

· sender sends seq# 0,1,2,3

· receiver sends ACK 0,1,2,3

· receiver’s ACK 0, ACK 1, and ACK 2 are lost

· sender will get ACK 3 first and treat all four packets (0 through 3) as acknowledged, so the ACK is in effect cumulative.

Consider performance:

- pipelined, so much better than stop-and-wait

- due to corrupted/lost ACKs, correctly transmitted packets may be retransmitted
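The sender actions listed above can be sketched as a small Python class. It tracks only the window bookkeeping: there is no real network or timer, sequence numbers are assumed to grow without wrapping, and the class and method names are hypothetical.

```python
class GoBackNSender:
    """Sender-side window bookkeeping for Go-Back-N (illustration only)."""

    def __init__(self, window_size):
        self.n = window_size
        self.base = 0        # left edge: oldest packet without an ACK
        self.next_seq = 0    # next slot to be filled with new data
        self.sent = []       # log of every sequence number transmitted

    def send(self, count):
        """Send up to `count` new packets; return how many fit in the window."""
        accepted = 0
        while accepted < count and self.next_seq < self.base + self.n:
            self.sent.append(self.next_seq)
            self.next_seq += 1
            accepted += 1
        return accepted      # anything beyond this is "rejected"

    def on_ack(self, x):
        """Cumulative ACK: everything up to and including x is confirmed."""
        if x >= self.base:
            self.base = x + 1    # slide left edge; right edge follows

    def on_timeout(self):
        """Resend all packets currently in the window."""
        self.sent.extend(range(self.base, self.next_seq))

s = GoBackNSender(4)
print(s.send(6))     # only 4 fit in the window
s.on_ack(3)          # ACK 0, 1, 2 were lost; ACK 3 covers all four
print(s.send(2))     # window has slid, so 2 more can go out
s.on_timeout()       # packets 4 and 5 are resent
print(s.sent)
```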

Selective Repeat.

· Address performance issue: Go-Back-N can result in many correct data packets being retransmitted.

· Receiver buffers out-of-order packets, so now has window

· ACKs are individual (not cumulative)

· Each packet has individual timer

Sender action:

· Transmit until… (same as Go-Back-N).

· When ACK x received,

o If ACK duplicate or outside window, ignore. Otherwise……

o Delete timer for slot x

o Mark slot x as ACKed

o If all slots less than x are ACKed (e.g. slot x is “left edge”),

§ slide “left edge” of window to first unACKed slot

§ slide “right edge” of window by same amount

§ send packets that come into window as a result of slide.

· When slot times out, resend that packet only.

Receiver Action:

· When packet x received and x is within receiver window,

o Send ACK x.

o If duplicate, ignore. Otherwise…….

o Store data in buffer slot x.

o If all slots less than x have been delivered to app layer,

§ Slide “left edge” of window to first undelivered slot y.

§ Slide “right edge” of window by same amount.

§ Deliver buffer slots x through y-1 to app layer.

· When packet x is received and x is “behind” current window,

o Send ACK x.

· Otherwise, ignore the packet.
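The receiver rules above might look like this in Python. This is a sketch, not the textbook's code: slot numbers are assumed not to wrap around, and the class name and attributes are invented for the example.

```python
class SelectiveRepeatReceiver:
    """Receiver-side buffering for Selective Repeat (illustration only)."""

    def __init__(self, window_size):
        self.n = window_size
        self.base = 0          # left edge: first slot not yet delivered
        self.buffer = {}       # out-of-order data, keyed by slot number
        self.delivered = []    # data passed up to the app layer

    def on_packet(self, x, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.base <= x < self.base + self.n:
            if x not in self.buffer:       # duplicates are ACKed, not stored
                self.buffer[x] = data
            # Deliver the in-order run at the left edge and slide the window.
            while self.base in self.buffer:
                self.delivered.append(self.buffer.pop(self.base))
                self.base += 1
            return x
        if x < self.base:
            return x           # behind the window: re-ACK only
        return None            # ahead of the window: ignore

r = SelectiveRepeatReceiver(4)
print(r.on_packet(1, "b"), r.delivered)   # buffered, nothing delivered yet
print(r.on_packet(0, "a"), r.delivered)   # fills the gap, both delivered
print(r.on_packet(0, "a"), r.delivered)   # duplicate behind window, re-ACKed
```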

Example to illustrate two things.

Example: (window size N = 4)

1. Sender sends 0, 1, 2 and 3.

2. Receiver ACKs 0, 1, 2, and 3. And slides left edge up to slot 4.

3. ACK 0 is lost.

4. Sender gets ACK 1, 2 and 3. Cannot slide left edge; no ACK 0.

5. Sender slot 0 times out and resends.

6. Receiver gets duplicate slot 0. What should it do? Slot 0 is now behind its window.

Illustrates that sender and receiver windows need not coincide.

Illustrates that range of sequence numbers must be at least twice the window size.

Suppose:

· sequence numbers range from 0 to N-1 (01230123…).

· After step 2, what is receiver’s left edge? Slot 0. Right edge? Slot 3.

· At step 6, receiver gets packet for slot 0. It thinks “slot 0 within window”, which is wrong!

Suppose:

· sequence numbers range from 0 to 2N-2 (01234560123456…).

· After step 2, what is receiver’s left edge? Slot 4. Right edge? Slot 0.

· At step 6, receiver gets packet for slot 0. It thinks “slot 0 within window”, which is wrong!

Suppose:

· sequence numbers range from 0 to 2N-1 (0123456701234567…).

· After step 2, what is receiver’s left edge? Slot 4. Right edge? Slot 7.

· At step 6, receiver gets packet for slot 0. It thinks “slot 0 behind window”, which is right!
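The three cases above can be checked mechanically. In this Python sketch, `classify` is a hypothetical helper that decides how a receiver using modulo numbering would interpret an arriving sequence number; it tests "within window" first, which is how the ambiguity arises when the sequence space is too small.

```python
def classify(seq, left_edge, window, space):
    """How a receiver numbering slots modulo `space` sees packet `seq`."""
    offset = (seq - left_edge) % space
    if offset < window:
        return "within window"
    if offset >= space - window:
        return "behind window"
    return "ignore"

N = 4
# After step 2 the receiver's left edge has advanced by N slots.
for space in (N, 2 * N - 1, 2 * N):
    left_edge = N % space
    print(space, "sequence numbers:", classify(0, left_edge, N, space))
```

Only with 2N sequence numbers does the retransmitted slot 0 get recognized as a duplicate from behind the window.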

END OF MATERIAL FOR EXAM 1.

TCP – Transmission Control Protocol

Overview

TCP provides connection-oriented service.

Connection-oriented but not virtual circuit. Why?
VC requires routers to maintain circuit. Routing for TCP is done by IP, which is datagram.

TCP provides flow and congestion control.

Details later

TCP provides connection management

a. 3 way handshake to establish connection

b. full duplex communication (both ways at same time)

c. send and receive buffers established during handshake

d. maximum segment size (MSS), actually max payload size

e. sliding window for pipelined stream service

f. double exchange sequence to terminate connection

TCP Segment Structure

Highlights (for specific structure, see e.g. my spring 2000 notes)

Variable length header
Variable length payload
Header has 20 byte fixed plus variable options field

Selected field notes:

· Header length is 4 bit field, count of header length in 4-byte words

· Sequence and ack numbers are 32 bits, and represent byte counts

· Receiver window size field is 16 bits, and represents the number of bytes that receiver is willing to receive (e.g. has available in receive buffer)

· For each of the above, correlate field lengths with allowable value range

· The “6 flags” are:

o URG : payload contains high priority info, there is 16 bit field elsewhere containing its byte offset in payload.

o ACK : this segment contains acknowledgement

o PSH : push; receiver should deliver the data to the application right away, w/o letting it wait in the receive buffer

o RST, SYN, FIN : used for connection maintenance

TCP Reliable Transmission

Variable length sliding window with per-segment timers and retransmission, and cumulative ACKs

Receiver can either accept or reject out-of-order segment

Sequence and ACK numbers are handled thusly:

a. seq# and ack# are byte counts in transmission stream

b. initial seq# selected at random (what is probability of 0?)

c. sender and receiver exchange initial seq# during 3-way handshake

d. for a given segment, seq# = byte offset for first byte of payload, e.g. (init seq# + stream byte offset) mod 2**32, since seq# is a 32 bit field

e. for a given segment, ack# = seq# of next byte expected from sender

Note this handles full duplex. For further info, see discussion below of flow and congestion control.

Example:

· A sends 3600 bytes to B

· MSS = 1500

· Error-free

· A’s initial seq# is 26

· Assume connection already established

· DC means “don’t care”; the value is not relevant to this example

One possible sequence (essentially a stop-and-wait)

1. A → B : seq# 26, ack# DC, data 1500 bytes

2. B → A : seq# DC, ack# 1526

3. A → B : seq# 1526, ack# DC, data 1500 bytes

4. B → A : seq# DC, ack# 3026

5. A → B : seq# 3026, ack# DC, data 600 bytes

6. B → A : seq# DC, ack# 3626

Could rearrange to pipeline all three, e.g.

1. A → B : seq# 26, ack# DC, data 1500 bytes

2. A → B : seq# 1526, ack# DC, data 1500 bytes

3. A → B : seq# 3026, ack# DC, data 600 bytes

4. B → A : seq# DC, ack# 1526

5. B → A : seq# DC, ack# 3026

6. B → A : seq# DC, ack# 3626
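The seq# and ack# values in this example follow directly from the byte counts. The Python sketch below reproduces them; `segment_stream` is a hypothetical helper, and the modulus 2**32 is used because the sequence number field is 32 bits. As in the example, the first payload byte is numbered with the given starting seq#.

```python
def segment_stream(total_bytes, mss, initial_seq, modulus=2**32):
    """Return a list of (seq#, payload length, ack# the receiver returns)."""
    segments = []
    offset = 0
    while offset < total_bytes:
        length = min(mss, total_bytes - offset)
        seq = (initial_seq + offset) % modulus   # first byte of this payload
        ack = (seq + length) % modulus           # next byte receiver expects
        segments.append((seq, length, ack))
        offset += length
    return segments

for seq, length, ack in segment_stream(3600, 1500, 26):
    print(f"seq# {seq}, data {length} bytes, acknowledged with ack# {ack}")
```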

Example:

· Full Duplex transmit

· A transmits 1111 bytes to B

· B transmits 666 bytes to A

· MSS = 512

· Error-free

· A’s initial seq# is 451

· B’s initial seq# is 103

· Assume connection already established

· DC means “don’t care”; the value is not relevant to this example

One possible sequence

1. A → B : seq# 451, ack# 103, data 512 bytes

2. B → A : seq# 103, ack# 963, data 512 bytes

3. A → B : seq# 963, ack# 615, data 512 bytes

4. B → A : seq# 615, ack# 1475, data 154 bytes

5. A → B : seq# 1475, ack# 769, data 87 bytes

6. B → A : seq# DC, ack# 1562
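The full duplex sequence above can be regenerated with a short Python sketch. It assumes the two sides strictly alternate turns and that every segment carries an ACK for what has arrived so far; `duplex_trace` is a hypothetical name, and `None` stands for “don’t care”.

```python
def duplex_trace(a_bytes, b_bytes, mss, a_seq, b_seq):
    """Alternate turns between A and B, starting with A.

    Returns a list of (sender, seq#, ack#, payload length).
    """
    remaining = {"A": a_bytes, "B": b_bytes}
    next_seq = {"A": a_seq, "B": b_seq}    # next byte each side will send
    trace = []
    sender, other = "A", "B"
    while remaining["A"] or remaining["B"]:
        length = min(mss, remaining[sender])
        seq = next_seq[sender] if length else None
        # ack# is the seq# of the next byte expected from the other side.
        trace.append((sender, seq, next_seq[other], length))
        next_seq[sender] += length
        remaining[sender] -= length
        sender, other = other, sender
    # Final pure ACK for the last data segment.
    trace.append((sender, None, next_seq[other], 0))
    return trace

for step in duplex_trace(1111, 666, 512, 451, 103):
    print(step)
```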

TCP Connection management

Three-way handshake to establish connection : 3 segments exchanged

1. client-to-server connection request :

SYN bit set,
seq# is client’s initial number (random selection)
empty payload

2. server-to-client response:

(allocates TCP buffer space and variables before responding)

SYN bit set
seq# is server’s initial number (random selection)
ack# is client’s initial number + 1
empty payload

3. client-to-server confirmation:

(allocates TCP buffer space and variables)

seq# is client’s initial number + 1
ack# is server’s initial number + 1
empty payload
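The numbering in the three segments can be summarized in a few lines of Python. This is only a sketch of the seq#/ack# arithmetic described above; the dictionary layout and function name are made up for illustration, and no buffers or real sockets are involved.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dicts (ack None = not used)."""
    m = 2**32   # sequence numbers are 32 bits, so arithmetic wraps around
    request = {"from": "client", "SYN": 1, "seq": client_isn, "ack": None}
    response = {"from": "server", "SYN": 1, "seq": server_isn,
                "ack": (client_isn + 1) % m}
    confirm = {"from": "client", "SYN": 0, "seq": (client_isn + 1) % m,
               "ack": (server_isn + 1) % m}
    return [request, response, confirm]

for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```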

Double exchange to terminate connection

Think of it as one exchange to cut off one direction of duplex and second exchange to cut off other direction.

(we’ll show client initiating the sequence; could be either party)

1. A is finished, sends B a segment with FIN bit set.

2. B sends acknowledgement to A, and deallocates its buffers

A-to-B direction is now closed.

3. B sends A a segment with FIN bit set

4. A sends acknowledgement to B, waits awhile, then deallocates its buffers

B-to-A direction is now closed.

HOW ABOUT RESET PROTOCOL?

TCP Flow Control

· Note that receiver gives sender “receiver window size” in each return segment.

· Thus sender knows how much receive buffer space receiver has

· Sender maintains sending window no larger than that.

· Sending window is Last-Byte-Sent - Last-Byte-Acked

· Special case:

o receiver window fills, thus “receiver window size” is 0.

o sender stops sending

o receiver window empties as data moved to application layer

o how does receiver tell sender it now has space? Since it is receiving nothing, it is not sending ACKs. Unless it is also sending data the other direction, it has no way to give its updated “receiver window size” to sender.

o solution is: sender continues to send segments with one data byte, which trigger ACKs from receiver.
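The sender's side of flow control reduces to a small calculation, sketched here in Python. The function name and the sample byte counts are hypothetical, and the zero-window case is simplified to "send one probe byte" without modeling how often the probe is repeated.

```python
def bytes_allowed(last_byte_sent, last_byte_acked, rwnd):
    """How many more bytes the sender may transmit right now."""
    in_flight = last_byte_sent - last_byte_acked   # current sending window
    if rwnd == 0:
        return 1    # zero window: one-byte probe to trigger an ACK
    # Keep unacknowledged data within the receiver's advertised window.
    return max(0, rwnd - in_flight)

print(bytes_allowed(3000, 1000, 4096))   # room for more data
print(bytes_allowed(5000, 1000, 4000))   # window already full
print(bytes_allowed(3000, 3000, 0))      # receiver advertised 0
```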

TCP Congestion Control

Distinguish flow control from congestion control

flow control is strictly between a sender-receiver pair
congestion control is network-wide – routers and links can become saturated

TCP, as a transport layer protocol, has only indirect knowledge of congestion (from ACK behaviors, sender notices that segments are being delayed and dropped)

A given TCP sender contributes to congestion by transmitting a wider send window (called its “congestion window”), and does its part to relieve congestion by narrowing its window (throttle).

A typical scenario for a large transmission follows these phases (see RFC 2581 for details):

initialize variables:

o congestion window variable set to MSS (or 2 * MSS),

o threshold variable set to arbitrary value (max. at receiver’s advertised window size?)

slow start phase :

o sender transmits congestion window (e.g. 1 or 2 segments).

o For each ACK received on time, congestion window increased by MSS.

o Results in exponential growth.

congestion avoidance phase :

o entered when congestion window exceeds threshold.

o Congestion window is increased by MSS only when an entire window’s worth of ACKs is received on time.

o Results in linear growth.

evidence of congestion occurs:

o timeout

· reset the threshold to half the congestion window size,

· reset congestion window size to MSS, and

· go back into slow start phase

o 3 duplicate ACKs for same #

· fast retransmit, e.g. retransmit before timeout

· reset the threshold to half the congestion window size, then set congestion window to the new threshold

· go back into congestion avoidance phase (“fast recovery”)

Algorithm described as Additive-Increase, Multiplicative-Decrease (AIMD), which applies only if you ignore the slow start phase.
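The phases can be traced with a simplified Python model that advances one round trip at a time. Several assumptions are baked in: the window is counted in whole segments (MSS = 1), slow start is modeled as doubling per round trip, and on three duplicate ACKs the threshold is first halved and the window then set to it, which is the usual Reno-style description rather than something spelled out in full above. RFC 2581 has additional details that this sketch leaves out.

```python
def next_cwnd(cwnd, threshold, mss, event):
    """Return (cwnd, threshold) after one round trip ending in `event`.

    event is "ack" (a full window ACKed on time), "timeout", or "dup3"
    (three duplicate ACKs for the same number).
    """
    if event == "timeout":
        return mss, max(cwnd // 2, mss)        # back to slow start
    if event == "dup3":
        threshold = max(cwnd // 2, mss)        # assumption: halve first
        return threshold, threshold            # stay in congestion avoidance
    if cwnd < threshold:
        return 2 * cwnd, threshold             # slow start: exponential
    return cwnd + mss, threshold               # congestion avoidance: linear

cwnd, threshold = 1, 8
history = []
events = ["ack"] * 5 + ["timeout"] + ["ack"] * 3 + ["dup3", "ack"]
for event in events:
    cwnd, threshold = next_cwnd(cwnd, threshold, 1, event)
    history.append(cwnd)
print(history)
```

The trace shows the exponential ramp, the switch to linear growth at the threshold, the collapse to one segment on timeout, and the milder cut on duplicate ACKs.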

[notes | CSC 465 | Peter Sanderson | Computer Science | SMSU ]

Updated 1 March 2001

PeteSanderson@smsu.edu