The task of TCP is to provide a reliable, full-duplex byte stream between programs running on two end hosts connected by a best-effort network. TCP therefore creates a logical connection between the two applications. (As a reminder: IP creates a logical connection between two hosts, while Ethernet creates a physical connection between two neighbouring network elements.) A TCP connection always runs between exactly two endpoints; a point-to-multipoint connection cannot be realized with TCP. A typical example is the connection between a web browser and a web server. The two endpoints are identified by their IP addresses, while the applications running on them that wish to communicate are identified by port numbers (which in fact label interfaces). Together these identifiers are globally unique. It is worth noting that (i) this identifier may change at different points of the transmission path (see also: NAT), and (ii) its individual elements are used in different layers.
Through its management functions, flow control and congestion control, TCP influences the size of the segments it sends. It is important to see that, as a consequence, the rate/timing/scheduling of writing and sending data is "decoupled" from the applications that TCP connects. Because of this, TCP cannot be used properly for certain applications, e.g. for real-time connections.
Technically, TCP establishes the connection between so-called sockets managed by the operating system, which from the application's point of view can be regarded as interfaces. A socket is referred to by a port number (see the structure of the TCP header below). The operation of the TCP protocol can be divided into three phases: (i) connection establishment, (ii) data transfer, (iii) connection termination.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source port          |       Destination port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgement number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  TCP  |           |U|A|P|R|S|F|                               |
|header |  Reserved |R|C|S|S|Y|I|            Window             |
|length |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |        Urgent pointer         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Options (if any)               |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Data to be transmitted                     |
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
As the figure above shows, the header consists of at least 20 octets; its individual fields are introduced below:
Since TCP is a connection-oriented transport layer protocol, before any data can be transmitted between two applications running on two different computers with the help of TCP, a connection must be established. The partner initiating the TCP connection is called the client host, while the other partner is called the server host. The establishment of a TCP connection, often called the three-way handshake, is demonstrated in the following figure:
The figure shows which flags are set to 1 in the individual segments, and the sequence numbers and acknowledgement numbers are indicated wherever they are of interest. The initial sequence numbers (ISN) are chosen randomly to prevent a possible attack (for details see RFC 1948).
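The handshake itself is hidden from applications behind the socket interface: the operating system performs it when the client calls connect() and the server calls accept(). The following minimal Python sketch illustrates this (the port number 5000 and the exchanged text are arbitrary choices for illustration):

```python
import socket

# Server side: create a listening socket, accept one connection, echo back
# whatever arrives. The SYN / SYN+ACK / ACK exchange happens inside
# listen()/accept()/connect(), invisibly to the application.
def run_server(port=5000):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))          # the port number identifies the application
    srv.listen(1)
    conn, addr = srv.accept()     # blocks until the handshake completes
    data = conn.recv(4096)        # TCP delivers an ordered byte stream
    conn.sendall(data)
    conn.close()
    srv.close()

# Client side: connect() triggers the SYN segment and returns once the
# handshake has completed.
def run_client(host="127.0.0.1", port=5000):
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((host, port))
    cli.sendall(b"hello over a reliable byte stream")
    print(cli.recv(4096))
    cli.close()
```

Running run_server() in one process and run_client() in another is enough to observe the three-way handshake, the data segments and the connection teardown in a packet capture.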
As shown in the handshake figure above, the maximum segment size (MSS) is determined during connection establishment; this is the maximum amount of application-layer data carried in a TCP segment. The value of MSS depends on the TCP implementation (i.e. on the operating system), but it can usually be configured (as demonstrated in the figure) with the appropriate Option header element. The actual value is usually set so as to avoid IP fragmentation. IP fragmentation occurs when the size of the IP packet exceeds the maximum permitted PDU size of the link (Maximum Transmission Unit, MTU). We want to avoid fragmentation whenever possible, because it generates extra traffic in case of retransmission. That is why the value of MSS is aligned with the MTU. In today's Internet, IP usually runs over Ethernet (more precisely, IP packets are packed into Ethernet II frames), which has an MTU of 1500 bytes. Certain manufacturers allow the use of so-called jumbo frames, with which the MTU can be as large as 9000 bytes, but this occurs only in LANs, and border protocols (e.g. PPPoE) reduce it to the common size. For WLAN the MTU is 2272 bytes. Since TCP connects two applications running on two computers that can be anywhere on the Internet, we cannot be sure that the same MTU is valid everywhere along the transmission path. To avoid repacking en route, we have to learn the MTU valid for the whole path (which is obviously the smallest MTU along it). This process, described by RFC 1191 for IPv4 and RFC 1981 for IPv6, is called "Path MTU Discovery", and it uses ICMP.
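The relation between MTU and MSS can be checked with a back-of-the-envelope calculation; the sketch below assumes the minimal 20-byte IPv4 and 20-byte TCP headers, i.e. no options:

```python
# MSS = MTU - IP header - TCP header (20 + 20 bytes without options)
def mss_from_mtu(mtu, ip_header=20, tcp_header=20):
    return mtu - ip_header - tcp_header

for name, mtu in [("Ethernet II", 1500), ("WLAN", 2272), ("jumbo frame", 9000)]:
    print(f"{name:12s} MTU = {mtu:5d} B  ->  MSS = {mss_from_mtu(mtu)} B")
# Ethernet II gives the familiar MSS of 1460 bytes.
```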
According to the recommendations currently in force, implementations must support selective acknowledgement (SACK). In this case the Option field is used, and one or more contiguous ranges of sequence numbers can be acknowledged. This significantly improves the efficiency of retransmission.
The amount of data that may be unacknowledged in the network at any one time is specified by the Window (Advertised Window) field. The destination can thus tell the source how much data it is able to receive, which prevents a fast sender from overrunning the receiver (flow control). When packets arrive correctly and in order, the size of the window is essentially constant (or fluctuates only slightly because of delayed acknowledgements), while its starting point advances steadily ("slides" - the window takes its name from this). This self-regulating behaviour is referred to as "self clocking". If packets do not arrive in order, the sequence number of the acknowledgement (i.e. the starting point of the window) stays constant, but the window size shrinks, because the out-of-order packets are stored (if they fall within the window, otherwise they are discarded). Duplicated packets are also discarded. In this way the receive buffer with the sliding window serves to restore the order of the segments. The place of lost parts of the data stream remains empty in the window. This gap is signalled via the acknowledgements, and a cumulative acknowledgement is sent for the segments arriving afterwards, so the data transfer becomes reliable. The use of the sliding window is easy to understand from the following figure and formulas.
Conditions of reliable and ordered data transmission:
Rules for realizing flow control:
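As a sketch of these rules, the following uses the variable names common in textbooks (LastByteRead, NextByteExpected, LastByteRcvd on the receiver side; LastByteSent, LastByteAcked on the sender side); the buffer size is an arbitrary illustrative value:

```python
# Sliding-window bookkeeping (conceptual sketch, textbook notation).

MAX_RCV_BUFFER = 64 * 1024   # bytes the receiver is willing to buffer

# Receiver side: ordered delivery and flow control.
#   LastByteRcvd - LastByteRead <= MaxRcvBuffer            (no overflow)
#   AdvertisedWindow = MaxRcvBuffer
#                      - ((NextByteExpected - 1) - LastByteRead)
def advertised_window(last_byte_read, next_byte_expected):
    return MAX_RCV_BUFFER - ((next_byte_expected - 1) - last_byte_read)

# Sender side: never have more unacknowledged data outstanding than the
# receiver advertised.
#   LastByteSent - LastByteAcked <= AdvertisedWindow
#   EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
def effective_window(adv_window, last_byte_sent, last_byte_acked):
    return adv_window - (last_byte_sent - last_byte_acked)
```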
Since TCP provides reliable transmission, every segment for which no acknowledgement arrives within a certain time is resent. A retransmission timer is associated with every segment, which triggers the resending of the segment when the RTO (Retransmission Timeout) expires. RTO can be determined from the Round-Trip Time (RTT) between the two endpoints of the communication; however, between two arbitrarily chosen hosts of the Internet, RTT can span several orders of magnitude, and even between two given hosts the delay can fluctuate significantly over time. Determining RTT is therefore quite a difficult task, and better and better solutions have been developed over time:
According to the original algorithm, for every segment/acknowledgement pair the actual round-trip time (SampleRTT) is measured, and a weighted average is computed from it:
EstimatedRTT = a*EstimatedRTT + b*SampleRTT,
where a + b = 1, the value of a is between 0.8 and 0.9, and b is between 0.1 and 0.2, respectively.
RTO is then chosen on the basis of EstimatedRTT as follows:
TimeOut = 2 * EstimatedRTT
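A minimal sketch of this estimator (a = 0.875 is picked from the range given above):

```python
# Original RTO estimation: exponentially weighted moving average of RTT.
A = 0.875           # a in the formula; b = 1 - a

def update_estimate(estimated_rtt, sample_rtt):
    """Return (new EstimatedRTT, new TimeOut) after one RTT measurement."""
    estimated_rtt = A * estimated_rtt + (1 - A) * sample_rtt
    timeout = 2 * estimated_rtt
    return estimated_rtt, timeout
```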
The Karn/Partridge algorithm modified the original one, because it was observed that for retransmitted segments SampleRTT carries no real information (see the figure below), so retransmitted segments should not be taken into account when estimating RTT. This, however, creates a further problem: if the transmission delay suddenly starts to grow significantly (between two segments), the above rule would mean that RTT is never refreshed again. That is why "exponential backoff" is applied, which means that on every retransmission RTO is doubled:
new_TimeOut = old_TimeOut*2.
In practice this procedure contributes to the stability of the network.
The Jacobson/Karels algorithm was first published in Van Jacobson's 1988 article. That article actually discussed several algorithms, which later became official standards published in RFCs. These algorithms, although concerning different fields, all recommended solutions for the series of Internet collapses that began in October 1986. These collapses were caused by congestion ("congestion collapse", a term coined by John Nagle), since the standards of the time did not handle it properly. (As an example, the original algorithm for estimating RTT worked only up to roughly 30% load.) Congestion not only reduces the transmission capability - Jacobson's example: bandwidth dropping from 32 kbps to 40 bps between two computers 400 yards apart, with 3 IMP hops (Interface Message Processor, the network element of the ARPAnet) between them - but it also significantly increases the variance of the delay of arriving packets. It is easy to see that if the deviation of RTT is large, then RTO must be significantly higher than the average RTT. Therefore the following rules are applied:
Difference = SampleRTT - EstimatedRTT
EstimatedRTT = EstimatedRTT + ( d * Difference)
Deviation = Deviation + d * (|Difference| - Deviation), where d is a fraction between 0 and 1.
This spread must be taken into account when setting RTO:
Timeout = u * EstimatedRTT + q * Deviation, where u = 1 and q = 4.
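A compact sketch of this estimator, combined with Karn's exponential backoff (d = 1/8 is a typical gain; the minimum RTO recommended by RFC 6298 is omitted here for brevity):

```python
# Jacobson/Karels RTO estimation with Karn's exponential backoff.
D, U, Q = 0.125, 1, 4      # d, u, q as in the formulas above

class RtoEstimator:
    def __init__(self, first_sample):
        self.estimated_rtt = first_sample
        self.deviation = first_sample / 2
        self.timeout = U * self.estimated_rtt + Q * self.deviation

    def on_sample(self, sample_rtt):
        """Called only for segments that were NOT retransmitted (Karn's rule)."""
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += D * difference
        self.deviation += D * (abs(difference) - self.deviation)
        self.timeout = U * self.estimated_rtt + Q * self.deviation

    def on_retransmit(self):
        """Exponential backoff: double the timeout on every retransmission."""
        self.timeout *= 2
```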
Congestion, as described earlier, is not one of the original phenomena of the Internet; it was experienced more and more only from the second half of the 1980s. Recommendations for its solution were published in Jacobson's article mentioned above. TCP congestion control consists of four interconnected algorithms, which were standardized in RFC 2581; currently RFC 5681 is in force. Two new variables appear in the algorithms. One of them is the congestion window, abbreviated "cwnd" (or "CongWin"); from now on the sender's window is the smaller of the receiver's advertised window and the congestion window. The other variable is the slow start threshold (ssthresh), which is the boundary point of the change between the two algorithms. A further important novelty is that packet loss is considered a sign of congestion, and the reaction to it differs depending on whether it was detected by the expiration of the timer or inferred from three duplicate acknowledgements.
The four processes are the following: slow start, congestion avoidance, fast retransmit, and fast recovery.
TCP congestion control can be regarded as a state machine. The following table shows the possible transitions between the individual states.
State | Event | TCP Sender Action | Commentary |
---|---|---|---|
Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS, If (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT |
Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin+MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2, CongWin = Threshold, Set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
SS or CA | Timeout | Threshold = CongWin/2, CongWin = 1 MSS, Set state to "Slow Start" | Enter slow start |
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |
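The transitions in the table can be summarized in a short sketch (cwnd and the threshold are kept in bytes; the initial values are illustrative only):

```python
# Sketch of the congestion-control state machine from the table above.
MSS = 1460

class CongestionControl:
    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024

    def on_new_ack(self):
        """ACK received for previously unacknowledged data."""
        if self.state == "slow_start":
            self.cwnd += MSS                     # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:                                    # congestion avoidance
            self.cwnd += MSS * MSS / self.cwnd   # +1 MSS per RTT (additive increase)

    def on_triple_dup_ack(self):
        """Fast recovery: multiplicative decrease, continue in congestion avoidance."""
        self.ssthresh = max(self.cwnd / 2, 1 * MSS)
        self.cwnd = self.ssthresh                # never drops below 1 MSS
        self.state = "congestion_avoidance"

    def on_timeout(self):
        """Timeout: back to slow start with a window of one segment."""
        self.ssthresh = max(self.cwnd / 2, 1 * MSS)
        self.cwnd = 1 * MSS
        self.state = "slow_start"
```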
It is worth knowing that for congestion control in IP networks there are other possibilities, not realized within TCP but nevertheless related to it, for example RED (Random Early Detection) or ECN (Explicit Congestion Notification); in connection with the latter we have already mentioned two new TCP signal bits.
If a bottleneck link is shared by several TCP streams, we expect them to share the resources fairly, independently of their initial parameters. This is provided by the Additive Increase/Multiplicative Decrease rate control used in congestion avoidance, as demonstrated by the following figure.
This results in the well-known sawtooth-shaped rate diagram of TCP, from which the average TCP rate can be simply estimated as 0.75*cwnd/RTT. Naturally, this estimate is only valid if the congestion window limits the transmission and the byte stream to be transmitted is long enough that slow start is negligible compared to the congestion avoidance phase. The sawtooth diagram characterizes not only the rate; its shape also corresponds to the size of cwnd and to the amount of data in the receive buffer.
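A quick numerical check of this estimate (the window and RTT values are chosen only for illustration):

```python
# Average TCP throughput estimate: 0.75 * W / RTT,
# where W is the maximum congestion window reached before a loss.
def avg_tcp_rate_bps(window_bytes, rtt_seconds):
    return 0.75 * window_bytes * 8 / rtt_seconds

# A 64 kB window and 50 ms RTT give roughly 7.9 Mbps on average.
print(avg_tcp_rate_bps(64 * 1024, 0.05) / 1e6)
```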
Sequence numbers and acknowledgements in TCP
We must see that the aim of TCP's management functions is to achieve the highest possible throughput by exploiting the available transmission capacity, i.e. to saturate the link as far as possible. Beyond the parts discussed so far, we have to talk about the Window and the Sequence Number fields, i.e. about how their usage affects the achievability of this aim.
We have already said about the Window field that the receiver uses it to tell the sender the number of bytes it can still receive (Advertised Window), which can be at most 64 kB, since the Window field is 16 bits wide. The smaller of this value and the congestion window tells the sender how much unacknowledged data can be outstanding in total. Let us assume that the receiver has adequate performance, the network is unloaded, and the RTT is 50 ms. It can easily be shown that the source can only saturate the link if the maximum of the advertised window is larger than the so-called Bandwidth-Delay Product (BDP). The following table gives the BDP for some possible bandwidth values.
Link (bandwidth) | BDP (RTT = 50 ms) [kB] | Time to wrap around |
---|---|---|
T1 (1.5 Mbps) | 9 | 6.4 hours |
ADSL (5 Mbps) | 31 | 1.9 hours |
Ethernet (10 Mbps) | 61 | 57 minutes |
T3 (45 Mbps) | 275 | 13 minutes |
Fast Ethernet (100 Mbps) | 610 | 5.7 minutes |
STS-3 (155 Mbps) | 946 | 3.7 minutes |
STS-12 (622 Mbps) | 3796 | 55.2 seconds |
Gigabit Ethernet (1000 Mbps) | 6104 | 34.4 seconds |
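The table values can be reproduced with a short calculation (kB here means 1024 bytes, and the wrap-around time uses the full 2^32-byte sequence number space):

```python
# Bandwidth-Delay Product and sequence-number wrap-around time.
RTT = 0.050                 # 50 ms
SEQ_SPACE = 2 ** 32         # bytes addressable by the 32-bit sequence number

def bdp_kilobytes(bandwidth_bps):
    return bandwidth_bps * RTT / 8 / 1024

def wrap_around_seconds(bandwidth_bps):
    return SEQ_SPACE / (bandwidth_bps / 8)

for name, mbps in [("T1", 1.5), ("Ethernet", 10), ("Gigabit Ethernet", 1000)]:
    bps = mbps * 1e6
    print(f"{name:18s} BDP = {bdp_kilobytes(bps):7.0f} kB, "
          f"wrap-around = {wrap_around_seconds(bps):8.0f} s")
# T1: about 9 kB and ~22900 s (6.4 hours); Gigabit Ethernet: ~6104 kB and ~34 s.
```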
It can be seen that a 10 Mbps line can still be filled with a 64 kB window, but a higher-rate link cannot. The solution to this problem is the Window Scale option of the TCP header (RFC 1323): this is an 8-bit field whose value gives by how many bits the received window value should be shifted to the left (and the value to be transmitted to the right). However, the gained possibilities cannot be fully exploited: the value of the window scale factor can be at most 14, which corresponds to a window of approximately 1 GB.
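The required shift for a given BDP can be computed directly (a minimal sketch; the factor is capped at 14 as the standard requires):

```python
# Smallest window-scale factor whose scaled 16-bit window covers the BDP.
def window_scale_factor(bdp_bytes, max_shift=14):
    shift = 0
    while (65535 << shift) < bdp_bytes and shift < max_shift:
        shift += 1
    return shift

print(window_scale_factor(6_250_000))   # Gigabit Ethernet at 50 ms RTT -> 7
print((65535 << 14) / 2 ** 30)          # maximum scaled window, roughly 1 GB
```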
The Sequence Number is a 32-bit wrapping counter whose offset from the initial, randomly chosen sequence number equals the position of the first data byte of the given segment within the transmitted byte stream. Based on the sequence number of a received segment, the receiver decides whether it fits into the receive window or must be dropped. With 32 bits roughly 4 GB of data can be addressed, which is not much nowadays. The third column of the table above shows the wrap-around time measured on a link of the given bandwidth (the time during which the whole sequence number space is used up). In addition, one has to take into account that the windows of the sender and the receiver do not cover each other exactly, so after some calculation it turns out that the condition SenderWindowSize < (MaxSeqNum + 1)/2 must be kept. As we can see, on faster networks wrap-around happens relatively soon. A solution to this problem can again be found among the TCP options. Using the Timestamp option described in RFC 1323, segments arriving with the same sequence number can be distinguished even with a large window size. It is important to know that the timestamp counts time units, and the counter is initialized and synchronized during the SYN-SYN/ACK exchange. According to the standard, the time base must be chosen between 1 ms and 1 s; in practice a time base between 10 and 100 ms is often selected. The Timestamp option is also suitable for another purpose: by echoing the timestamp back, RTT can be measured fairly accurately. The result of this measurement can be used both for the calculation of RTO and for the prediction of congestion.
An expressive picture of the characteristics of a given TCP stream can be obtained by plotting the TCP sequence numbers as a function of time. On a correctly scaled diagram not only can the individual steps of congestion control be tracked, but the characteristic quantities of the TCP connection (e.g. MSS, RTTs, RTO, window sizes, etc.) can also be read off.
After the last byte of the TCP data stream has been transmitted, the connection must be closed. Closing consists of segments with the FIN flag set and their acknowledgements, sent mutually by the two parties, so basically four messages are used, which is not surprising if we consider that TCP provides a full-duplex connection. The following figure demonstrates a possible case. It is worth noting that in the state graph we do not get into the CLOSED state immediately after the second acknowledgement, but must wait for the expiration of a timer.
The figure shows that it may occur that one of the parties has already closed the connection while it is still open from the other side (half open). It may also happen that the two parties intend to close the connection simultaneously, so both of them send a FIN segment without having received one from the other party. Finally, it has to be mentioned that today's implementations often use an appropriate three-way variant (TWH), in which the acknowledgement of the active FIN and the passive FIN are carried in the same segment.
The continuous growth of network bandwidth has raised the problem of link saturation, already mentioned earlier, and another problem as well: for smaller files the feedback time (RTT) can be comparable to the time needed to receive the whole file, which, because of slow start, becomes significantly longer than necessary. As a result, congestion control and flow control are not efficient and link utilization remains low. We have already demonstrated some possibilities above with which the capacity provided by high-speed networks can be better exploited, but many people think that instead of trying to improve TCP, new algorithms are needed. The following table presents some of the better-known so-called high-speed TCP variants.
Protocol name | Type | Proposed by and when | Main features |
---|---|---|---|
HighSpeed TCP | based on packet loss | S. Floyd, International Computer Science Institute (ICSI), Berkeley, California, 2003 | AIMD |
Scalable TCP | based on packet loss | T. Kelly, CERN & University of Cambridge, 2003. | MIMD |
BIC TCP / CUBIC | based on packet loss | I. Rhee et al., Networking Research Lab, North Carolina State University, 2004 and 2005 | good throughput, fairness, and stability |
FAST TCP | based on delay | S. Low et al., Netlab, California Institute of Technology, 2004 (today: FastSoft Inc.) | promising fairness features |
TCP Westwood | measurement based | M. Y. Sanadidi, M. Gerla et al., High Performance Internet Lab, Network Research Lab, University of California, Los Angeles (UCLA), between 2001 and 2005 | several variants, different estimation methods |
Compound TCP | hybrid | K. Tan et al., Microsoft Research, 2005. | AIMD + delay based component |
XCP | explicit congestion notification | D. Katabi et al., Massachusetts Institute of Technology (MIT), 2002. | it is necessary to modify routers |
UDP (User Datagram Protocol) allows users to send messages without setting up and tearing down a connection. It guarantees neither the delivery of messages nor the preservation of their order.
```
bytes        (preceded by the 20-byte IPv4 header)
       +--------------------------------+
   2   |          Source Port           |
       +--------------------------------+
   2   |        Destination Port        |  e.g. 5004 (avt-profile-1), RTP media data
       +--------------------------------+
   2   | Length (8-byte header + max.   |
       |        65527 bytes of data)    |
       +--------------------------------+
   2   |    Checksum (header + data)    |
       +--------------------------------+
```
```
0000  70 cd 70 64 08 00 45 b8                          ..)../.`p.pd..E.
0010  00 c8 9e 11 00 00 3e 11 c1 48 98 42 f5 ab 98 42  ......>..H.B...B
0020  f5 e2 46 b0 13 8c 00 b4 00 00 80 00 3d 7c 00 28  ..F.........=|.(
0030  0c 40 d4 7b 28 21 f9 f7 fc ff 7b 7d fd 7c 7c 78  .@.{(!....{}.||x
0040  7a fc 7c 7a 7f fd fd 7e ff fd fb f8 f3 f0 f3 f7  z.|z...~........
0050  fc fb f8 fa 7d 76 78 79 77 77 78 7a 7b 7b 7c 7d  ....}vxywwxz{{|}
0060  7e fe fe fd f9 f7 fb fc f7 f9 fb fb fc fa fa fd  ~...............
0070  fd fb fb fa fd fe fd fc f8 fa fd fb fc fd fb fa  ................
0080  f8 f8 fe fe fa fa fb fd 7e 7b 79 78 79 7f fa fe  ........~{yxy...
0090  7c 7f 7e 7c 7e 7f 7b 7b 7c 7a 78 78 78 74 75 79  |.~|~.{{|zxxxtuy
00a0  7c 7c 7c 7f fc f9 fa fc fa f9 f9 fc 7d 7e fb f9  |||.........}~..
00b0  f7 f9 7f 7e fe fd fb fb fc fe 7d fc f9 fe 7e 7b  ...~......}...~{
00c0  7c 7b 7b 7a 79 7b 7a 7c 7c 79 7c 7a 79 7b 7c 7b  |{{zy{z||y|zy{|{
00d0  7b 7b 7b 7c 7b 7b                                {{{|{{
```
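A minimal UDP exchange illustrates the connectionless nature of the protocol: there is no handshake, each sendto() call simply emits one datagram. (The port 5004 follows the RTP example above; the payload is arbitrary.)

```python
import socket

# Receiver: bind to a port and read single datagrams; nothing guarantees
# that they arrive, arrive only once, or arrive in order.
def udp_receiver(port=5004):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    data, addr = sock.recvfrom(65527)     # at most one datagram per call
    print(f"{len(data)} bytes from {addr}")
    sock.close()

# Sender: no connect() is needed; each sendto() is an independent datagram.
def udp_sender(host="127.0.0.1", port=5004):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"one datagram, no connection, no guarantees", (host, port))
    sock.close()
```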