Transport Layer (TCP/UDP)
Where we are in the Course
•Moving on up to the Transport Layer!
CSE 461 University of Washington 2
Physical
Link
Network
Transport
Application
Recall
• Transport layer provides end-to-end connectivity    
across the network
[Diagram: Host (app, TCP, IP, 802.11) connects over 802.11 to a Router (IP over 802.11 and Ethernet), then over Ethernet to a Router (IP) and on to a Host (app, TCP, IP, Ethernet)]
Recall (2)
• Segments carry application data across the network
• Segments are carried within packets within frames
[Diagram: a segment (TCP header + app data, e.g., HTTP) is carried inside a packet (IP), which is carried inside a frame (802.11)]
Transport Layer Services
•Provide different kinds of data delivery across the 
network to applications
             Unreliable          Reliable
Messages     Datagrams (UDP)
Bytestream                       Streams (TCP)
Comparison of Internet Transports
• TCP is full-featured, UDP is a glorified packet
TCP (Streams)                                   UDP (Datagrams)
Connections                                     Datagrams
Bytes are delivered once, reliably, in order    Messages may be lost, reordered, duplicated
Arbitrary length content                        Limited message size
Flow control matches sender to receiver         Can send regardless of receiver state
Congestion control matches sender to network    Can send regardless of network state
Socket API
• Simple abstraction to use the network
• The “network” API (really Transport service) used to write 
all Internet apps
• Part of all major OSes and languages; originally Berkeley 
(Unix) ~1983
• Supports both Internet transport services (Streams 
and Datagrams)
Socket API (2)
• Sockets let apps attach to the local network at 
different ports
[Diagram: two sockets attached to the local network stack, one at Port #1 and one at Port #2]
Socket API (3)
• Same API used for Streams and Datagrams
Primitive Meaning
SOCKET Create a new communication endpoint
BIND Associate a local address (port) with a socket
LISTEN Announce willingness to accept connections
ACCEPT Passively establish an incoming connection
CONNECT Actively attempt to establish a connection
SEND(TO) Send some data over the socket
RECEIVE(FROM) Receive some data over the socket
CLOSE Release the socket
(LISTEN and ACCEPT are needed only for Streams; the TO/FROM variants are used for Datagrams.)
Ports
•An application process is identified by the tuple (IP address, transport protocol, port)
• Ports are 16-bit integers representing local “mailboxes” 
that a process leases
• Servers often bind to “well-known ports”
• <1024, require administrative privileges
•Clients often assigned “ephemeral” ports
• Chosen by OS, used temporarily 
Some Well-Known Ports
Port Protocol Use
20, 21 FTP File transfer
22 SSH Remote login, replacement for Telnet
25 SMTP Email
80 HTTP World Wide Web
110 POP-3 Remote email access
143 IMAP Remote email access
443 HTTPS Secure Web (HTTP over SSL/TLS)
554 RTSP Media player control
631 IPP Printer sharing
Topics
• Service models
• Socket API and ports
• Datagrams, Streams
• User Datagram Protocol (UDP)
• Connections (TCP)
• Sliding Window (TCP)
• Flow control (TCP)
• Retransmission timers (TCP)
• Congestion control (TCP)
UDP
User Datagram Protocol (UDP)
•Used by apps that don’t want reliability or 
bytestreams
• Voice-over-IP 
• DNS, RPC 
• DHCP
(If application wants reliability and messages then it 
has work to do!)
Datagram Sockets
[Diagram: over time, the client (host 1) sends a request to the server (host 2), which returns a reply]
Datagram Sockets (2)
[Diagram: server (host 2): socket, bind, recvfrom* (blocks for the request), sendto (reply), close; client (host 1): socket, sendto (request), recvfrom* (blocks for the reply), close. * = call blocks]
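The request/reply exchange above can be run end-to-end in a few lines of Python. The loopback address, port choice (0 = OS-assigned), and message contents below are illustrative, not part of the lecture:

```python
import socket

# Server endpoint: socket + bind (port 0 lets the OS pick a free port).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server_addr = server.getsockname()

# Client endpoint: no bind needed; an ephemeral port is assigned on first send.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"request", server_addr)

# Server blocks in recvfrom until the request datagram arrives,
# then replies to the client's (address, port).
data, client_addr = server.recvfrom(2048)
server.sendto(b"reply:" + data, client_addr)

# Client blocks in recvfrom until the reply arrives.
reply, _ = client.recvfrom(2048)

client.close()
server.close()
```

Note there is no connection state anywhere: each sendto names the destination explicitly, matching the datagram service model.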
UDP Buffering
[Diagram: applications attach at ports; the transport layer (UDP) demultiplexes packets arriving from the network layer (IP) into per-port message queues]
UDP Header
•Uses ports to identify sending and receiving 
application processes
•Datagram length up to 64K
•Checksum (16 bits) for reliability
UDP Header (2)
•Optional checksum covers UDP segment and IP 
pseudoheader
• Checks key IP fields (addresses)
• Value of zero means “no checksum”
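The 16-bit checksum is the standard Internet checksum: a one's-complement sum of 16-bit words, complemented. A minimal sketch of the arithmetic, omitting the IP pseudoheader fields for brevity:

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then complemented.
    A sketch of the UDP/TCP checksum arithmetic; a real implementation
    also sums the IP pseudoheader (addresses, protocol, UDP length)."""
    if len(data) % 2:                 # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# A receiver verifies by summing the segment *including* the checksum
# field; the complemented result is then 0.
```

The fold step is what makes the sum one's-complement: any carry out of the low 16 bits is added back in at the bottom.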
TCP
TCP
• TCP Consists of 3 primary phases:
• Connection Establishment (Setup)
• Sliding Windows/Flow Control
• Connection Release (Teardown)
Connection Establishment
•Both sender and receiver must be ready before we 
start the transfer of data
• Need to agree on a set of parameters
• e.g., the Maximum Segment Size (MSS)
• This is signaling
• It sets up state at the endpoints
• Like “dialing” for a telephone call
Three-Way Handshake
• Used in TCP; opens connection for 
data in both directions
• Each side probes the other with a 
fresh Initial Sequence Number (ISN)
• Sends on a SYNchronize segment
• Echo on an ACKnowledge segment
• Chosen to be robust even against 
delayed duplicates
[Diagram: the active party (client) exchanges SYN/ACK segments with the passive party (server)]
Three-Way Handshake (2)
• Three steps:
• Client sends SYN(x)
• Server replies with SYN(y)ACK(x+1)
• Client replies with ACK(y+1)
• SYNs are retransmitted if lost
• Sequence and ack numbers carried 
on further segments
Three-Way Handshake (3)
• Suppose delayed, duplicate 
copies of the SYN and ACK arrive 
at the server!
• Improbable, but anyhow …
Three-Way Handshake (4)
• Suppose delayed, duplicate 
copies of the SYN and ACK arrive 
at the server!
• Improbable, but anyhow …
•Connection will be cleanly 
rejected on both sides 
TCP Connection State Machine
•Captures the states ([]) and transitions (->)
• A/B means event A triggers the transition, with action B
Both parties 
run instances 
of this state 
machine
TCP Connections (2)
• Follow the path of the client: 
TCP Connections (3)
• And the path of the server: 
TCP Connections (4)
• Again, with states …
[Diagram: client states: CLOSED → SYN_SENT → ESTABLISHED; server states: CLOSED → LISTEN → SYN_RCVD → ESTABLISHED, over the three handshake segments]
TCP Connections (5)
• Finite state machines are a useful tool to specify 
and check the handling of all cases that may occur
• TCP allows for simultaneous open
• i.e., both sides open instead of the client-server pattern
• Try at home to confirm it works
Connection Release
•Orderly release by both parties when done
• Delivers all pending data and “hangs up”
• Cleans up state in sender and receiver
•Key problem is to provide reliability while releasing
• TCP uses a “symmetric” close in which both sides 
shutdown independently
TCP Connection Release
• Two steps:
• Active sends FIN(x), passive ACKs
• Passive sends FIN(y), active ACKs
• FINs are retransmitted if lost
• Each FIN/ACK closes one direction 
of data transfer
TCP Connection State Machine
Both parties 
run instances 
of this state 
machine
•Captures the 
states ([]) and 
transitions (->)
• A/B means 
event A triggers 
the transition, 
with action B
TCP Release
• Follow the active party
TCP Release (2)
• Follow the passive party
TCP Release (3)
•Again, with states …
[Diagram: active party: ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → (timeout) → CLOSED; passive party: ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED]
TIME_WAIT State
•Wait a long time after sending all segments and 
before completing the close
• Two times the maximum segment lifetime (MSL = 60 seconds)
•Why?
• ACK might have been lost, in which case FIN will be resent 
for an orderly close
• Could otherwise interfere with a subsequent connection
Flow Control
Recall
•ARQ with one message at a time is Stop-and-Wait 
(normal case below)
[Diagram: Stop-and-Wait; the sender transmits Frame 0, starts a timeout, receives ACK 0, then sends Frame 1 and receives ACK 1]
Limitation of Stop-and-Wait
• It allows only a single message to be outstanding 
from the sender:
• Fine for LAN (only one frame fits in network anyhow)
• Not efficient for network paths with BD >> 1 packet
Limitation of Stop-and-Wait (2)
• Example: R=1 Mbps, D = 50 ms, 10kb packets 
• RTT (Round Trip Time) = 2D = 100 ms
• How many packets/sec? 
• What if R=10 Mbps?
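Plugging the numbers in (and, as the slide does, ignoring the packet's own transmission time):

```python
def stop_and_wait_rate(D_sec, packet_bits):
    # One packet per RTT; RTT = 2D. Transmission time is ignored,
    # matching the slide's approximation.
    rtt = 2 * D_sec
    return packet_bits / rtt   # achievable throughput in bits/sec

# R = 1 Mbps, D = 50 ms, 10 kb packets -> 10 packets/sec = 100 kbps,
# i.e., 10% link utilization. With R = 10 Mbps the rate is unchanged
# (still one packet per RTT), so utilization drops to 1%.
rate = stop_and_wait_rate(0.050, 10_000)
```

The punchline: the achievable rate depends only on the RTT and packet size, not on the link speed, which is exactly why Stop-and-Wait wastes fast long paths.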
Sliding Window
•Generalization of stop-and-wait
• Allows W packets to be outstanding
• Can send W packets per RTT (=2D)
• Pipelining improves performance 
• Need W=2BD to fill network path
Sliding Window (2)
• What W will use the network capacity?
• Assume 10kb packets
• Ex: R=1 Mbps, D = 50 ms 
• Ex: What if R=10 Mbps?
Sliding Window (3)
• Ex: R=1 Mbps, D = 50 ms
  • 2BD = 10^6 b/s × 100×10^-3 s = 100 kbit
  • W = 2BD = 10 packets of 1250 bytes
• Ex: What if R=10 Mbps?
  • 2BD = 1000 kbit
  • W = 2BD = 100 packets of 1250 bytes
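The window arithmetic from the examples, as a small helper (1250-byte packets are 10 kbit, as in the slide):

```python
def window_to_fill_path(R_bps, D_sec, packet_bits):
    # Need W = 2BD: enough packets in flight to cover one RTT (= 2D)
    # at the bottleneck rate R.
    bits_per_rtt = 2 * R_bps * D_sec
    return bits_per_rtt / packet_bits

w1 = window_to_fill_path(1e6, 0.050, 10_000)    # 1 Mbps path
w2 = window_to_fill_path(10e6, 0.050, 10_000)   # 10 Mbps path
```

Ten times the bandwidth needs ten times the window, which previews why a fixed window cannot serve all paths.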
Sliding Window Protocol
•Many variations, depending on how buffers, 
acknowledgements, and retransmissions are handled
•Go-Back-N
• Simplest version, can be inefficient
• Selective Repeat
• More complex, better performance
Sliding Window – Sender 
• Sender buffers up to W segments until they are 
acknowledged
• LFS=LAST FRAME SENT, LAR=LAST ACK REC’D
• Sends while LFS – LAR < W 
[Diagram: sender's sliding window (W=5) over the sequence numbers; acked segments up to LAR, unacked segments out to LFS, then available and unavailable numbers]
Sliding Window – Sender (2) 
• Transport accepts another segment of data from 
the Application ...
• Transport sends it (as LFS–LAR < 5)
[Diagram: LFS advances by one as the new segment is sent; the window is now full]
Sliding Window – Sender (3) 
•Next higher ACK arrives from peer…
• Window advances, buffer is freed 
• LFS–LAR < 5 (can send one more) 
[Diagram: the ACK advances LAR; the window slides right and one more sequence number becomes available]
Sliding Window – Go-Back-N
•Receiver keeps only a single packet buffer for the 
next segment
• State variable, LAS = LAST ACK SENT
•On receive:
• If seq. number is LAS+1, accept and pass it to app, update 
LAS, send ACK
• Otherwise discard (as out of order)
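The receive rule above can be sketched directly. The callback shape and segment contents are illustrative:

```python
def make_gbn_receiver():
    """Go-Back-N receiver: one state variable (LAS), no buffering.
    Accepts only the next in-order segment; everything else is
    discarded, and the cumulative ACK for LAS is (re)sent."""
    state = {"las": 0, "delivered": []}

    def on_segment(seq, data):
        if seq == state["las"] + 1:           # exactly the next expected segment
            state["delivered"].append(data)   # pass to the application
            state["las"] = seq
        # otherwise: discard as out of order
        return state["las"]                   # ACK always carries LAS

    return on_segment, state
```

Feeding segments 1, 2, 4, 3 delivers 1, 2, 3 and silently drops 4, which the Go-Back-N sender must later retransmit.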
Sliding Window – Selective Repeat
• Receiver passes data to app in order, and buffers out-of-
order segments to reduce retransmissions
• ACK conveys highest in-order segment, plus hints about out-
of-order segments
• TCP uses a selective repeat design; we’ll see the details later
Sliding Window – Selective Repeat (2)
•Buffers W segments, keeps state variable LAS = LAST
ACK SENT
•On receive:
• Buffer segments [LAS+1, LAS+W] 
• Send app in-order segments from LAS+1, and update LAS
• Send ACK for LAS regardless
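The same rule with a W-segment buffer gives the Selective Repeat receiver. Again a sketch; the dict-based buffer is an illustrative choice:

```python
def make_sr_receiver(window):
    """Selective Repeat receiver: buffers segments in [LAS+1, LAS+W],
    delivers the in-order prefix to the app, and ACKs LAS regardless."""
    state = {"las": 0, "buf": {}, "delivered": []}

    def on_segment(seq, data):
        if state["las"] + 1 <= seq <= state["las"] + window:
            state["buf"][seq] = data
            while state["las"] + 1 in state["buf"]:   # drain the in-order prefix
                state["delivered"].append(state["buf"].pop(state["las"] + 1))
                state["las"] += 1
        return state["las"]                           # cumulative ACK
    return on_segment, state
```

Unlike Go-Back-N, an out-of-order arrival is kept: when the hole is later filled, the whole buffered run is delivered at once and the ACK jumps.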
Sliding Window – Selective Retransmission (3)
•Keep normal sliding window
• If receive something out of order
• Send last unacked packet again!
[Diagram: an ACK arrives out of order; the sender resends the last unacked packet (LAR+1) while keeping the normal sliding window]
Sliding Window – Selective Retransmission (4)
•Keep normal sliding window
• If correct packet arrives, move window and LAR, 
send more messages
[Diagram: the correct ACK arrives; LAR and the window advance, and more sequence numbers become available to send]
Sliding Window – Retransmissions
•Go-Back-N uses a single timer to detect losses
• On timeout, resends buffered packets starting at LAR+1
• Selective Repeat uses a timer per unacked segment 
to detect losses
• On timeout for segment, resend it
• Hope to resend fewer segments
Sequence Numbers
•Need more than 0/1 for Stop-and-Wait …
•But how many?
• For Selective Repeat, need W numbers for packets, plus 
W for acks of earlier packets
• 2W seq. numbers
• Fewer for Go-Back-N (W+1)
•Typically implement seq. number with an N-bit counter that wraps around at 2^N − 1
  • E.g., N=8:   …, 253, 254, 255, 0, 1, 2, 3, …
Sequence Time Plot
[Plot: sequence number vs. time; transmissions at the sender and ACKs at the receiver, offset by the one-way delay (= RTT/2)]
Sequence Time Plot (2)
[Plot: sequence number vs. time for a Go-Back-N scenario]
Sequence Time Plot (3)
[Plot: sequence number vs. time; a loss leads to a timeout followed by retransmissions]
ACK Clocking
Sliding Window ACK Clock
• Each in-order ACK advances the sliding window and 
lets a new segment enter the network
• ACKs “clock” data segments
Ack 1  2  3  4  5  6  7  8  9 10
20 19 18 17 16 15 14 13 12 11 Data
Benefit of ACK Clocking
•Consider what happens when sender injects a burst 
of segments into the network
[Diagram: fast link, then a slow (bottleneck) link with a queue, then a fast link]
Benefit of ACK Clocking (2)
• Segments are buffered and spread out on slow link
[Diagram: segments "spread out" in time as they cross the slow link]
Benefit of ACK Clocking (3)
• ACKs maintain the spread back to the original sender
[Diagram: ACKs preserve the spread on the return path over the slow link]
Benefit of ACK Clocking (4)
• Sender clocks new segments with the spread
• Now sending at the bottleneck link without queuing!
[Diagram: segments stay spread out; the queue no longer builds]
Benefit of ACK Clocking (5)
•Helps run with low levels of loss and delay!
• The network smooths out the burst of data segments
• ACK clock transfers this smooth timing back to sender
• Subsequent data segments are not sent in bursts so 
do not queue up in the network
TCP Uses ACK Clocking
• TCP uses a sliding window because of the value of 
ACK clocking
• Sliding window controls how many segments are 
inside the network
• TCP only sends small bursts of segments to let the 
network keep the traffic smooth
Problem
• Sliding window has pipelining to keep network busy
• What if the receiver is overloaded?
[Cartoon: a powerful server ("Big Iron") streams video at an overwhelmed mobile receiver ("Wee Mobile")]
Sliding Window – Receiver 
•Consider receiver with W buffers
• LAS=LAST ACK SENT, app pulls in-order data from buffer with 
recv() call
[Diagram: receiver's sliding window (W=5); sequence numbers up to LAS are finished, the next W are acceptable, higher numbers are too high]
Sliding Window – Receiver (2) 
• Suppose the next two segments arrive but app does 
not call recv()
[Diagram: two segments arrive and are buffered; the app has not called recv()]
Sliding Window – Receiver (3) 
• Suppose the next two segments arrive but app does 
not call recv()
• LAS rises, but we can’t slide window!
[Diagram: LAS rises past the acked segments, but the window cannot slide until the app consumes data]
Sliding Window – Receiver (4) 
• Further segments arrive (in order) and fill the buffer
• Must drop segments until app recvs!
[Diagram: the buffer is full; nothing is acceptable, so arriving segments must be dropped]
Sliding Window – Receiver (5) 
•App recv() takes two segments
• Window slides (phew)
[Diagram: recv() frees two buffers; the window slides and two sequence numbers become acceptable again]
Flow Control
•Avoid loss at receiver by telling sender the available 
buffer space
• WIN=#Acceptable, not W (from LAS)
[Diagram: the receiver advertises WIN = number of acceptable buffers beyond LAS]
Flow Control (2)
• Sender uses lower of the sliding window and flow 
control window (WIN) as the effective window size
[Diagram: the sender's effective window shrinks to W=3, the advertised WIN]
Flow Control (3)
• TCP-style example
• SEQ/ACK sliding window
• Flow control with WIN
• SEQ + length < ACK+WIN
• 4KB buffer at receiver
• Circular buffer of bytes
Topic
•How to set the timeout for sending a retransmission
• Adapting to the network path
Retransmissions
•With sliding window, detecting loss with timeout
• Set timer when a segment is sent
• Cancel timer when ack is received
• If timer fires, retransmit data as lost
Timeout Problem
• Timeout should be “just right”
• Too long wastes network capacity
• Too short leads to spurious resends
• But what is “just right”?
• Easy to set on a LAN (Link)
• Short, fixed, predictable RTT
•Hard on the Internet (Transport)
• Wide range, variable RTT
Example of RTTs
[Plot: round-trip time (ms) over 200 seconds for a BCN→SEA→BCN path]
Example of RTTs (2)
[Plot: the same BCN→SEA→BCN RTT trace; variation is due to queuing at routers, changes in network paths, etc., and the floor is the propagation (+transmission) delay ≈ 2D]
Example of RTTs (3)
[Plot: the same RTT trace; a fixed timer is either too high or too low, so we need to adapt to the network conditions]
Adaptive Timeout
• Smoothed estimates of the RTT (1) and variance in RTT (2)
• Update estimates with a moving average
  1. SRTT_{N+1} = 0.9 × SRTT_N + 0.1 × RTT_{N+1}
  2. Svar_{N+1} = 0.9 × Svar_N + 0.1 × |RTT_{N+1} − SRTT_{N+1}|
• Set timeout to a multiple of estimates, to estimate the upper RTT in practice
  • TCP Timeout_N = SRTT_N + 4 × Svar_N
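The two moving averages and the timeout rule translate directly into code. One caveat: initializing Svar to RTT/2 on the first sample is an assumption borrowed from common practice, not something the slide specifies:

```python
def make_rtt_estimator(alpha=0.1):
    """Adaptive timeout: EWMAs of RTT (SRTT) and deviation (Svar),
    with Timeout = SRTT + 4*Svar, as on the slide."""
    state = {"srtt": None, "svar": 0.0}

    def update(rtt):
        if state["srtt"] is None:
            state["srtt"], state["svar"] = rtt, rtt / 2   # assumed initialization
        else:
            state["srtt"] = (1 - alpha) * state["srtt"] + alpha * rtt
            state["svar"] = (1 - alpha) * state["svar"] + \
                alpha * abs(rtt - state["srtt"])
        return state["srtt"] + 4 * state["svar"]          # the timeout value
    return update
```

With a steady RTT, Svar decays and the timeout tightens toward the RTT; a sudden RTT change pushes the timeout back up, which is exactly the adaptation the plots illustrate.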
Example of Adaptive Timeout
[Plot: SRTT and Svar estimates tracking the measured RTT over time]
Example of Adaptive Timeout (2)
[Plot: the adaptive timeout (SRTT + 4×Svar) tracking above the measured RTT, with one early timeout]
Adaptive Timeout (2)
• Simple to compute, does a good job of tracking 
actual RTT
• Little “headroom” to lower
• Yet very few early timeouts
• Turns out to be important for good performance 
and robustness
Congestion
TCP to date:
•We can set up a connection (connection 
establishment)
• Tear down a connection (connection release)
•Keep the sending and receiving buffers from 
overflowing (flow control)
What’s missing?
Network Congestion
•A “traffic jam” in the network
• Later we will learn how to control it
[Cartoon: packets queued up in the network; what's the hold up?]
Congestion Collapse in the 1980s
• Early TCP used fixed size window (e.g., 8 packets)
• Initially fine for reliability
•But something happened as the ARPANET grew
• Links stayed busy but transfer rates fell by orders of 
magnitude! 
Nature of Congestion
•Routers/switches have internal buffering 
[Diagram: a router's internals; input ports with input buffers, a switching fabric, and output ports with output buffers]
Nature of Congestion (2)
• Simplified view of per port output queues
• Typically FIFO (First In First Out), discard when full
[Diagram: a router modeled as per-port FIFO output queues holding queued packets]
Nature of Congestion (3)
•Queues help by absorbing bursts when input > 
output rate
•But if input > output rate persistently, queue will 
overflow
• This is congestion
•Congestion is a function of the traffic patterns – can 
occur even if every link has the same capacity
Effects of Congestion
•What happens to performance as we increase load?
Effects of Congestion (3)
•As offered load rises, congestion occurs as queues 
begin to fill:
• Delay and loss rise sharply with more load
• Throughput falls below load (due to loss)
• Goodput may fall below throughput (due to spurious 
retransmissions)
•None of the above is good!
• Want network performance just before congestion
Van Jacobson (b. 1950)
•Widely credited with saving the 
Internet from congestion collapse in 
the late 80s
• Introduced congestion control 
principles
• Practical solutions (TCP Tahoe/Reno) 
•Much other pioneering work:
• Tools like traceroute, tcpdump, 
pathchar
• IP header compression, multicast tools
TCP Tahoe/Reno
• TCP extensions and features we will study:
• AIMD
• Fair Queuing
• Slow-start
• Fast Retransmission
• Fast Recovery
TCP Timeline
[Timeline, 1970–1990, pre-history through congestion control:
• Origins of “TCP” (Cerf & Kahn, ’74)
• 3-way handshake (Tomlinson, ’75)
• TCP and IP (RFC 791/793, ’81)
• TCP/IP “flag day” (BSD Unix 4.2, ’83)
• Congestion collapse observed, ’86
• TCP Tahoe (Jacobson, ’88)
• TCP Reno (Jacobson, ’90)]
TCP Timeline (2)
[Timeline, 1990–2010, classic congestion control through diversification:
• TCP Reno (Jacobson, ’90)
• TCP Vegas (Brakmo, ’93), delay-based
• ECN (Floyd, ’94), router support
• TCP New Reno (Hoe, ’95)
• TCP with SACK (Floyd, ’96)
• FAST TCP (Low et al., ’04), delay-based
• TCP BIC (Linux, ’04)
• TCP CUBIC (Linux, ’06)
• Compound TCP (Windows, ’07)
• TCP LEDBAT (IETF ’08), background transfers]
Bandwidth Allocation
• Important task for network is to allocate its capacity 
to senders
• Good allocation is both efficient and fair
• Efficient means most capacity is used but there is no congestion
• Fair means every sender gets a reasonable share of the network
Bandwidth Allocation (2)
•Key observation:
• In an effective solution, Transport and Network layers 
must work together
•Network layer witnesses congestion
• Only it can provide direct feedback
• Transport layer causes congestion
• Only it can reduce offered load
Bandwidth Allocation (3)
•Why is it hard? (Just split equally!)
• Number of senders and their offered load changes
• Senders may lack capacity in different parts of network
• Network is distributed; no single party has an overall 
picture of its state
Bandwidth Allocation (4)
• Solution context:
• Senders adapt concurrently based on their own view of 
the network
• Design this adaptation so the network usage as a whole is efficient and fair
• Adaptation is continuous since offered loads continue to change over time
Fair Allocations
Fair Allocation
• What’s a “fair” bandwidth allocation?
• The max-min fair allocation
Recall
•We want a good bandwidth allocation to be both 
fair and efficient
• Now we learn what fair means
•Caveat: in practice, efficiency is more important 
than fairness
Efficiency vs. Fairness
•Cannot always have both!
• Example network with traffic:
• A→B, B→C and A→ C 
• How much traffic can we carry?
[Topology: A - B - C; each link has capacity 1]
Efficiency vs. Fairness (2)
• If we care about fairness:
• Give equal bandwidth to each flow
• A→B: ½ unit, B→C: ½, and A→C, ½ 
• Total traffic carried is 1 ½ units
Efficiency vs. Fairness (3)
• If we care about efficiency:
• Maximize total traffic in network
• A→B: 1 unit, B→C: 1, and A→C, 0 
• Total traffic rises to 2 units!
The Slippery Notion of Fairness
•Why is “equal per flow” fair anyway?
• A→C uses more network resources than A→B or B→C
• Host A sends two flows, B sends one
•Not productive to seek exact fairness
• More important to avoid starvation
• i.e., a flow that cannot use any bandwidth
• “Equal per flow” is good enough
Generalizing “Equal per Flow”
•Bottleneck for a flow of traffic is  the link that limits 
its bandwidth
• Where congestion occurs for the flow
• For A→C, link A–B is the bottleneck 
[Topology: A - B (capacity 1), B - C (capacity 10); for A→C, link A–B is the bottleneck]
Generalizing “Equal per Flow” (2)
• Flows may have different bottlenecks
• For A→C, link A–B is the bottleneck
• For B→C, link B–C is the bottleneck
• Can no longer divide links equally …
Max-Min Fairness
• Intuitively, flows bottlenecked on a link get an equal 
share of that link
•A max-min fair allocation is one where:
  • The rate of one flow cannot be increased without decreasing the rate of a flow with an equal or smaller rate
  • This “maximizes the minimum” flow
Max-Min Fairness (2)
• To find it given a network, imagine “pouring water 
into the network”
1. Start with all flows at rate 0
2. Increase the flows until there is a new bottleneck in 
the network
3. Hold fixed the rate of the flows that are bottlenecked
4. Go to step 2 for any remaining flows
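The four steps can be run as a small "water-filling" routine. The link names and data structures below are illustrative, and the topology used for the demo is one consistent reconstruction of the upcoming example (an assumption): flow A crosses only R2-R5's bottleneck link R2-R3, flow B crosses both R2-R3 and R4-R5, and flows C and D cross only R4-R5, with every link of capacity 1:

```python
def max_min_fair(capacity, flow_links):
    """Water-filling: raise all unfrozen flows at the same rate until some
    link saturates, freeze the flows crossing it, and repeat.
    capacity: {link: capacity}; flow_links: list of link-sets, one per flow."""
    capacity = dict(capacity)
    rate = [0.0] * len(flow_links)
    active = set(range(len(flow_links)))
    while active:
        # Users of each link among the still-active flows (snapshot).
        users = {l: [f for f in active if l in flow_links[f]] for l in capacity}
        # Equal increment at which the first link saturates.
        inc = min(capacity[l] / len(u) for l, u in users.items() if u)
        for f in active:
            rate[f] += inc
        for l, u in users.items():
            if u:
                capacity[l] -= inc * len(u)
                if capacity[l] <= 1e-12:       # link saturated: freeze its flows
                    active -= set(u)
                    del capacity[l]
    return rate

# Assumed topology for the example: A={R2-R3}, B={R2-R3, R4-R5},
# C=D={R4-R5}, all capacities 1.
rates = max_min_fair({"R2-R3": 1.0, "R4-R5": 1.0},
                     [{"R2-R3"}, {"R2-R3", "R4-R5"}, {"R4-R5"}, {"R4-R5"}])
```

Under these assumptions the routine reproduces the example's answer: B, C, D freeze at 1/3 when R4-R5 fills, and A continues to 2/3 before R2-R3 fills.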
Max-Min Example
• Example: network with 4 flows, link bandwidth = 1
• What is the max-min fair allocation? 
Max-Min Example (2)
•When rate=1/3, flows B, C, and D bottleneck R4—
R5 
• Fix B, C, and D, continue to increase A 
Max-Min Example (3)
•When rate=2/3, flow A bottlenecks R2—R3. Done. 
Max-Min Example (4)
• End with A=2/3, B, C, D=1/3, and R2/R3, R4/R5 full 
• Other links have extra capacity that can’t be used
Adapting over Time
•Allocation changes as flows start and stop
Adapting over Time (2)
Flow 1 slows when 
Flow 2 starts
Flow 1 speeds up 
when Flow 2 stops
Time 
Flow 3 limit 
is elsewhere
Bandwidth Allocation
Recall
•Want to allocate capacity to senders
• Network layer provides feedback
• Transport layer adjusts offered load
• A good allocation is efficient and fair
•How should we perform the allocation?
• Several different possibilities …
Bandwidth Allocation Models
•Open loop versus closed loop
• Open: reserve bandwidth before use
• Closed: use feedback to adjust rates
•Host versus Network support
• Who sets/enforces allocations?
•Window versus Rate based
• How is allocation expressed?
TCP is a closed loop, host-driven, and window-based
Bandwidth Allocation Models (2)
•We’ll look at closed-loop, host-driven, and window-
based too
•Network layer returns feedback on current 
allocation to senders 
• For TCP, the signal is a dropped packet
• Transport layer adjusts sender’s behavior via 
window in response
• How senders adapt is a control law
Additive Increase Multiplicative Decrease 
•AIMD is a control law hosts can use to reach a good 
allocation
• Hosts additively increase rate while network not congested
• Hosts multiplicatively decrease rate when congested
• Used by TCP
• Let’s explore the AIMD game …
AIMD Game
•Hosts 1 and 2 share a bottleneck
• But do not talk to each other directly
•Router provides binary feedback
• Tells hosts if network is congested
[Diagram: Host 1 and Host 2 each have a capacity-1 link to a bottleneck router, which has a capacity-1 link to the rest of the network]
AIMD Game (2)
• Each point is a possible allocation
[Plot: Host 1 rate vs. Host 2 rate; allocations beyond the efficient line are congested, and the optimal allocation is both fair and efficient]
AIMD Game (3)
•AI and MD move the allocation 
[Plot: relative to the fair line y=x and the efficient line x+y=1, additive increase moves the allocation diagonally up at 45°, while multiplicative decrease moves it proportionally toward the origin]
AIMD Game (4)
•Play the game!
[Plot: starting from an arbitrary allocation, repeated AI and MD steps are played out]
AIMD Game (5)
•Always converge to good allocation!
[Plot: from any starting point, the AI/MD steps converge toward the optimal (fair and efficient) allocation]
AIMD Sawtooth
•Produces a “sawtooth” pattern  over time for rate 
of each host
• This is the TCP sawtooth (later)
[Plot: each host's rate over time; additive-increase ramps followed by multiplicative decreases form the sawtooth]
AIMD Properties
•Converges to an allocation that is efficient and fair 
when hosts run it
• Holds for more general topologies
•Other increase/decrease control laws do not! (Try AIAD, MIMD, MIAD)
•Requires only binary feedback from the network
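The convergence claim can be checked with a toy simulation of two hosts under binary feedback. The step sizes, capacity, and starting rates are arbitrary choices for illustration:

```python
def aimd(x, y, capacity=1.0, add=0.01, mult=0.5, steps=2000):
    """Two hosts sharing one bottleneck: additive increase while the
    network is uncongested, multiplicative decrease on congestion."""
    for _ in range(steps):
        if x + y > capacity:        # binary congestion feedback
            x *= mult
            y *= mult
        else:
            x += add
            y += add
    return x, y

# Start from a very unfair allocation. AI preserves the rate gap while
# MD shrinks it, so repeated cycles drive the gap toward zero.
x, y = aimd(0.8, 0.1)
```

The intuition matches the phase plot: each MD step halves the difference between the two rates, and AI never widens it, so the allocation slides onto the fair line while oscillating around full capacity.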
Feedback Signals
• Several possible signals, with different pros/cons
• We’ll look at classic TCP that uses packet loss as a signal
Signal              Example Protocol                              Pros / Cons
Packet loss         TCP NewReno, Cubic TCP (Linux)                Hard to get wrong; hear about congestion late
Packet delay        TCP BBR (YouTube)                             Hear about congestion early; need to infer congestion
Router indication   TCPs with Explicit Congestion Notification    Hear about congestion early; require router support
Slow Start (TCP Additive 
Increase)
Practical AIMD
•We want TCP to follow an AIMD control law for a 
good allocation
• Sender uses a congestion window or cwnd to set its 
rate (≈cwnd/RTT)
• Sender uses loss as network congestion signal
•Need TCP to work across a very large range of rates 
and RTTs
TCP Startup Problem
•We want to quickly near the right rate, cwndIDEAL, but 
it varies greatly
• Fixed sliding window doesn’t adapt and is rough on the 
network (loss!) 
• Additive Increase with small bursts adapts cwnd gently to 
the network, but might take a long time to become 
efficient
Slow-Start Solution
• Start by doubling cwnd every RTT
• Exponential growth (1, 2, 4, 8, 16, …)
• Start slow, quickly reach large values
[Plot: window (cwnd) vs. time; slow-start grows exponentially, AI grows linearly, and a fixed window does not adapt]
Slow-Start Solution (2)
• Eventually packet loss will occur when the network 
is congested
• Loss timeout tells us cwnd is too large
• Next time, switch to AI beforehand
• Slowly adapt cwnd near right value
• In terms of cwnd:
• Expect loss for cwndC ≈ 2BD+queue
• Use ssthresh = cwndC/2 to switch to AI
Slow-Start Solution (3)
•Combined behavior, after first time
• Most time spent near right value
[Plot: window vs. time; slow-start up to ssthresh, then an AI phase holding cwnd near cwndC and cwndIDEAL]
Slow-Start (Doubling) Timeline
Increment cwnd 
by 1 packet for 
each ACK
Additive Increase Timeline
Increment cwnd by 1 
packet every cwnd 
ACKs (or 1 RTT)
TCP Tahoe (Implementation)
• Initial slow-start (doubling) phase
• Start with cwnd = 1 (or small value)
• cwnd += 1 packet per ACK
•Later Additive Increase phase
• cwnd += 1/cwnd packets per ACK
• Roughly adds 1 packet per RTT
•Switching threshold (initially infinity)
• Switch to AI when cwnd > ssthresh
• Set ssthresh = cwnd/2 after loss
• Begin with slow-start after timeout
CSE 461 University of Washington 146
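The Tahoe rules above can be sketched in Python. This is a minimal sketch, not a real stack: the class and constant names are illustrative, and cwnd is counted in packets.

```python
# Sketch of TCP Tahoe's cwnd update rules (units: packets).
INITIAL_CWND = 1.0  # illustrative "small value" to start with

class TahoeSender:
    def __init__(self):
        self.cwnd = INITIAL_CWND
        self.ssthresh = float("inf")      # switching threshold, initially infinity

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 per ACK, doubles per RTT
        else:
            self.cwnd += 1.0 / self.cwnd  # AI: +1/cwnd per ACK, ~+1 per RTT

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2     # remember half the loss point
        self.cwnd = INITIAL_CWND          # begin with slow start again
```

Note how the same `on_ack` handler produces both phases: only the comparison against `ssthresh` changes which increment applies.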
Timeout Misfortunes
•Why do a slow-start after timeout?
• Instead of MD cwnd (for AIMD)
• Timeouts are sufficiently long that the ACK clock will 
have run down
• Slow-start ramps up the ACK clock
•We need to detect loss before a timeout to get to 
full AIMD
CSE 461 University of Washington 147
Fast Recovery 
(TCP Multiplicative Decrease)
Practical AIMD (2)
•We want TCP to follow an AIMD control law for a 
good allocation
• Sender uses a congestion window or cwnd to set its 
rate (≈cwnd/RTT)
• Sender uses slow-start to ramp up the ACK clock, 
followed by Additive Increase
•But after a timeout, sender slow-starts again with 
cwnd=1 (as it has no ACK clock)
CSE 461 University of Washington 149
Inferring Loss from ACKs
• TCP uses a cumulative ACK
• Carries highest in-order seq. number
• Normally a steady advance
•Duplicate ACKs give us hints about what data hasn’t 
arrived
• Tell us some new data did arrive, but it was not the next 
segment
• Thus the next segment may be lost
CSE 461 University of Washington 150
Fast Retransmit
• Treat three duplicate ACKs as a loss 
• Retransmit next expected segment
• Some repetition allows for reordering, but still detects 
loss quickly
CSE 461 University of Washington 151
Ack 1  2  3  4  5  5  5  5  5  5
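The duplicate-ACK rule can be sketched as a small tracker. This is a hypothetical minimal version, assuming ACK n means segment n arrived in order, so the next expected segment is n+1:

```python
# Sketch of fast retransmit: treat three duplicate ACKs as a loss
# and retransmit the next expected segment.
DUP_ACK_THRESHOLD = 3

class FastRetransmit:
    def __init__(self):
        self.last_ack = None
        self.dup_count = 0
        self.retransmitted = []           # segments we chose to resend

    def on_ack(self, ack):
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                # third duplicate: resend the next expected segment
                self.retransmitted.append(ack + 1)
        else:                             # ACK advanced normally
            self.last_ack = ack
            self.dup_count = 0
```

Waiting for three duplicates (rather than one) is the "some repetition" on the slide: it tolerates mild reordering while still detecting loss well before a timeout.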
Fast Retransmit (2)
CSE 461 University of Washington 152
[Figure: sender/receiver timeline. ACKs 10, 11, 12, 13 arrive normally; Data 14 was lost earlier, but 15 to 20 arrived, so duplicate ACK 13s follow. On the third duplicate ACK the sender retransmits Data 14; the retransmission fills in the hole at 14, and the ACK jumps to 20 after the loss is repaired.]
Fast Retransmit (3)
• It can repair a single segment loss quickly, typically 
before a timeout
•However, we have quiet time at the sender/receiver 
while waiting for the ACK to jump
•And we still need to MD cwnd …
CSE 461 University of Washington 153
Inferring Non-Loss from ACKs
•Duplicate ACKs also give us hints about what data 
has arrived
• Each new duplicate ACK means that some new segment 
has arrived
• It will be the segments after the loss
• Thus advancing the sliding window will not increase the 
number of segments stored in the network
CSE 461 University of Washington 154
Fast Recovery
• First fast retransmit, and MD cwnd
• Then pretend further duplicate ACKs are the 
expected ACKs
• Lets new segments be sent for ACKs 
• Reconcile views when the ACK jumps
CSE 461 University of Washington 155
Ack 1  2  3  4  5  5  5  5  5  5
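The fast-recovery behavior above can be sketched by extending the duplicate-ACK tracker. This is a simplified sketch of the Reno-style logic, not a faithful implementation; names are illustrative and cwnd is in packets:

```python
# Sketch of Reno fast recovery: on the third duplicate ACK, do
# multiplicative decrease; treat further duplicates as expected ACKs
# so the ACK clock keeps running until the cumulative ACK jumps.
class FastRecoverySender:
    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = None
        self.in_recovery = False
        self.dup_count = 0
        self.last_ack = None
        self.sent_during_recovery = 0     # new segments clocked out by dup ACKs

    def on_ack(self, ack):
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:        # enter fast recovery
                self.ssthresh = self.cwnd / 2
                self.cwnd = self.ssthresh  # multiplicative decrease
                self.in_recovery = True    # (retransmit of lost segment goes here)
            elif self.in_recovery:
                self.sent_during_recovery += 1  # pretend dup is an expected ACK
        else:                              # ACK jumps: loss repaired
            self.last_ack = ack
            self.dup_count = 0
            self.in_recovery = False       # exit fast recovery, continue AI
```

The key point matches the slide: there is no timeout and no slow-start after the loss, just a halved cwnd and a still-running ACK clock.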
Fast Recovery (2)
CSE 461 University of Washington 156
[Figure: sender/receiver timeline. Data 14 was lost earlier, but 15 to 20 arrived, so duplicate ACK 13s follow ACK 12. On the third duplicate ACK the sender retransmits Data 14 and sets ssthresh and cwnd to cwnd/2. Further duplicate ACKs advance the window, so Data 21 and 22 may be sent before the jump. The retransmission fills in the hole at 14; when ACK 20 arrives, the sender exits fast recovery.]
Fast Recovery (3)
•With fast retransmit, it repairs a single segment loss 
quickly and keeps the ACK clock running
• This allows us to realize AIMD
• No timeouts or slow-start after loss, just continue with a 
smaller cwnd
• TCP Reno combines slow-start, fast retransmit and 
fast recovery
• Multiplicative Decrease is ½ 
CSE 461 University of Washington 157
TCP Reno
CSE 461 University of Washington 158
[Figure: TCP Reno sawtooth of cwnd over time; MD of ½ on each loss with no slow-start, and the ACK clock keeps running.]
TCP Reno, NewReno, and SACK
•Reno can repair one loss per RTT
• Multiple losses cause a timeout
•NewReno further refines ACK heuristics
• Repairs multiple losses without timeout
• Selective ACK (SACK) is a better idea
• Receiver sends ACK ranges so sender can retransmit 
without guesswork
CSE 461 University of Washington 159
TCP CUBIC
• Standard TCP Stack in Linux (> 2.6.19) and Windows (> 
10.1709)
• Internet grows to have more long-distance, high 
bandwidth connections
• Seeks to resolve two key problems with “standard” TCP:
● Flows with lower RTTs “grow” faster than those with 
higher RTTs
● Flows grow too “slowly” (linearly) after congestion
CSE 461 University of Washington 160
TCP CUBIC
1) At the time of a congestion event, the window size at that instant is 
recorded as Wmax, the maximum window size.
2) Wmax is set as the inflection point of the cubic function that will 
govern the growth of the congestion window.
3) Transmission then restarts with a smaller window (reduced by 20%) and, if 
no congestion is experienced, the window grows according to the concave 
portion of the cubic function (its cadence does not depend on received ACKs).
4) As the window approaches Wmax, the increments slow down.
5) Once the inflection point Wmax has been reached, the window 
continues to increase, but slowly.
6) Finally, if the network is still not experiencing any congestion, the window 
continues to grow according to the convex portion of the function.
CSE 461 University of Washington 161
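The growth described in steps 3–6 comes from a single cubic function of elapsed time. A sketch of the standard form, using the defaults from RFC 8312 (C = 0.4, multiplicative-decrease factor beta = 0.7; the slide's 20% reduction corresponds to a different beta):

```python
# CUBIC window-growth function (RFC 8312 form):
#   W(t) = C * (t - K)^3 + Wmax,   K = cbrt(Wmax * (1 - beta) / C)
# Concave for t < K (approaching Wmax), flat at t = K, convex after.
C = 0.4       # scaling constant (RFC 8312 default)
BETA = 0.7    # window is reduced to BETA * Wmax after a loss

def cubic_window(t, w_max):
    """Congestion window t seconds after the last congestion event."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)   # time to plateau at Wmax
    return C * (t - k) ** 3 + w_max
```

Because growth depends on wall-clock time since the last loss rather than on ACK arrivals, flows with different RTTs grow at the same pace, which addresses the RTT-fairness problem above.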
TCP CUBIC
CSE 461 University of Washington 162
TCP CUBIC vs Everyone
CSE 461 University of Washington 163
TCP BBR
• Bottleneck Bandwidth and Round-trip propagation time 
• Developed at Google in 2016 primarily for YouTube traffic
• Attempting to solve the “bufferbloat” problem
• “Model-based” (Vegas) rather than “Loss-based” (CUBIC)
● Measure RTT, latency, bottleneck bandwidth
● Use this to predict window size
CSE 461 University of Washington 164
Bufferbloat
• Larger queues are better than smaller queues right?
CSE 461 University of Washington 165
Bufferbloat
• Given TCP loss semantics…
• Performance can decrease as 
buffer size is increased
• Consider a full buffer:
● New packets arrive and 
are dropped (‘tail drop’)
● SACK doesn’t arrive until 
entire buffer sent
CSE 461 University of Washington 166
TCP BBR
• BBR Has 4 Distinct Phases
1) Startup: Basically identical to Cubic. Grow exponentially until 
RTTs start to increase (instead of waiting for a dropped packet). Set cwnd. 
2) Drain: Startup filled a queue.  Temporarily reduce sending 
rate (known as “pacing gain”)
3) Probe Bandwidth: Increase sending rate to see if there’s more 
capacity. If not, drain again.
4) Probe RTT: Reduce rate dramatically (4 packets) to measure 
RTT. Use this as our baseline for above. 
CSE 461 University of Washington 167
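The four phases above form a simple cycle. A heavily simplified, illustrative state machine (real BBR drives these transitions from measured bottleneck bandwidth and RTT, not from the two boolean flags assumed here):

```python
# Illustrative BBR phase transitions (simplified; flags are assumptions).
from enum import Enum

class Phase(Enum):
    STARTUP = 1    # grow exponentially until RTTs start to increase
    DRAIN = 2      # drain the queue that startup filled
    PROBE_BW = 3   # periodically probe for more capacity
    PROBE_RTT = 4  # drop to ~4 packets to re-measure baseline RTT

def next_phase(phase, rtt_rising=False, rtt_probe_due=False):
    if phase is Phase.STARTUP:
        # leaving startup is triggered by rising RTTs, not packet loss
        return Phase.DRAIN if rtt_rising else Phase.STARTUP
    if phase is Phase.DRAIN:
        return Phase.PROBE_BW
    if rtt_probe_due:
        return Phase.PROBE_RTT
    return Phase.PROBE_BW      # steady state: mostly probing bandwidth
```

Steady-state operation alternates between Probe Bandwidth and the occasional Probe RTT, so the sender keeps a fresh estimate of both quantities in its name.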
TCP BBR vs Everyone
CSE 461 University of Washington 168
Network-Side Congestion 
Control
Congestion Avoidance vs. Control
•Classic TCP drives the network into congestion and 
then recovers
• Needs to see loss to slow down
•Would be better to use the network but avoid 
congestion altogether!
• Reduces loss and delay
•But how can we do this?
CSE 461 University of Washington 170
Feedback Signals
•Delay and router signals can let us avoid congestion
CSE 461 University of Washington 171
Signal            | Example Protocol                            | Pros / Cons
------------------|---------------------------------------------|-----------------------------------------------------
Packet loss       | Classic TCP; Cubic TCP (Linux)              | Hard to get wrong; hear about congestion late
Packet delay      | TCP BBR (YouTube)                           | Hear about congestion early; need to infer congestion
Router indication | TCPs with Explicit Congestion Notification  | Hear about congestion early; require router support
ECN (Explicit Congestion Notification)
•Router detects the onset of congestion via its queue
• When congested, it marks affected packets (IP header)
CSE 461 University of Washington 172
ECN (2)
•Marked packets arrive at receiver; treated as loss
• TCP receiver reliably informs TCP sender of the congestion
CSE 461 University of Washington 173
ECN (3)
•Advantages:
• Routers deliver clear signal to hosts
• Congestion is detected early, no loss
• No extra packets need to be sent
•Disadvantages:
• Routers and hosts must be upgraded (currently 1%)
• More work at router
CSE 461 University of Washington 174
Random Early Detection (RED)
• Jacobson (again!) and Floyd
•Alternative idea: instead of marking packets, drop
• We know they’re using TCP, make use of that fact
• Signals congestion to sender
• But without adding headers or doing packet inspection
•Drop at random, depending on queue size
• If queue empty, accept packet always
• If queue full, always drop
• As queue approaches full, increase likelihood of packet drop
• Example: 1 queue slot left, 10 packets expected, 90% chance of drop
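The drop rule above can be sketched directly. This minimal version uses the instantaneous queue length with a drop probability that rises linearly with occupancy (matching the slide's example); real RED uses an exponentially averaged queue length and min/max thresholds:

```python
# Sketch of RED's drop decision (simplified: linear in occupancy).
import random

def red_drop(queue_len, capacity, rng=random.random):
    """Return True if an arriving packet should be dropped."""
    if queue_len <= 0:
        return False               # empty queue: always accept
    if queue_len >= capacity:
        return True                # full queue: always drop (tail drop)
    drop_prob = queue_len / capacity
    return rng() < drop_prob       # e.g., 9/10 slots used -> 90% drop
```

Dropping probabilistically before the queue is full means different flows see the early-congestion signal at different times, avoiding synchronized multiplicative decreases across all senders.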
RED (Random Early Detection)
•Router detects the onset of congestion via its queue
• Prior to congestion, drop a packet to signal
CSE 461 University of Washington 176
Drop packet
RED (Random Early Detection)
• Sender enters MD, slows packet flow
• We shed load, everyone is happy 
CSE 461 University of Washington 177
Drop packet