Some basic TCP/IP concepts. Or should I say a 'simple' introduction into the workings of the Internet Protocol...
Due to popular demand and the large number of emails I have received over the last month or so, I have decided to 'quickly' expand the information on this topic. The extended version is quite a bit more comprehensive and detailed (because you've asked for it!), so point your mouse to more TCP/IP; I hope it satisfies your hunger for knowledge.
When you fire up the Web Browser and type in a location you almost always start with "HTTP". That stands for HYPERTEXT TRANSFER PROTOCOL. Occasionally you might start with "FTP". That stands for FILE TRANSFER PROTOCOL. HTTP and FTP are just two of the many protocols that make up the TCP/IP protocol suite. TCP/IP stands for TRANSMISSION CONTROL PROTOCOL / INTERNET PROTOCOL. It is the underlying communications mechanism used by computers to communicate across the Internet. Protocols such as HTTP and FTP use TCP/IP to achieve their communication.
Each time you click on highlighted text when using a web browser you are sending or receiving information across the Internet. This information starts life as part of the application program (your web browser in this case). It travels through HTTP and from there is passed to what is known as the Transport Layer of the TCP/IP protocol stack. At this point it is broken up into smaller, more manageable, parts and sent out across the network to a particular destination. At the destination the process is reversed. The small parts are put back together to form a message again and are passed to HTTP and eventually to a web server. TCP/IP is what breaks these network messages up and reassembles them. TCP/IP also directs the messages across the network from the source computer to the destination computer.
In this web page I will provide you with a bit of background information about TCP/IP. I hope that after reading this you can comfortably pick up a book or article on this topic and not be completely swamped with jargon and presumed knowledge.
Inevitably, Some Terminology
Before leaping into the deep end of protocols and networks here are a few words that you would have seen if you have read any network literature:
A packet is a general term referring to a message that is transported around a network. It usually refers to part of some larger message. We normally talk about data being broken into packets. Normally, a message is broken up into packets and sent at one end of a network connection. At the other end packets are reassembled into a message.
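To make the idea of breaking a message into packets concrete, here is a minimal sketch in Python. The function names and the 4-byte packet size are my own choices for illustration; real networks use much larger packets and carry far more information with each one.

```python
def split_into_packets(message, size):
    """Break a message into (packet_number, payload) pairs of at most `size` bytes."""
    return [(i // size, message[i:i + size]) for i in range(0, len(message), size)]


def reassemble(packets):
    """Put packets back in order by their number and join the payloads."""
    return b"".join(payload for _, payload in sorted(packets))


packets = split_into_packets(b"hello world", 4)
# Even if the packets arrive in a different order, the numbers let us rebuild the message.
message = reassemble(reversed(packets))
```

The packet number is what allows the receiving end to rebuild the message even when the packets turn up in the wrong order.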
A router is a computer, or a special purpose device, which forwards packets between distinct networks. I now take this opportunity to give a plug for my company, NCSS Australia Pty Ltd, based here in Melbourne, Australia (web site under construction), which specialises in wireless routers with a throughput of up to 8 Mbps over distances of up to about 80 km. We design, install and commission networks (LAN/WAN) for people who want remote interconnectivity to branch offices of their business or connectivity into the Internet.
A gateway performs all the tasks a router does, with the addition of being able to translate between protocols. For example, a gateway would sit between a TCP/IP network and a DECnet network and be able to pass messages between them. In other words, the gateway speaks more than one language.
By the way, in acronyms containing the letter 'P', 'P' usually stands for the word protocol (similarly, in acronyms starting with 'I', 'I' usually stands for the word Internet).
The Protocol Stack
As I mentioned above, a network message starts life as part of the application program. From there it passes through the application protocol (HTTP in the case of a web browser) and through several layers before being sent across the network. These layers are called the TCP/IP protocol stack. TCP/IP is divided into four layers, each responsible for a particular task. These four layers are:
The Application Layer (see diagram below)
Programs such as web browsers and telnet fit into this layer. This layer also includes the associated protocols such as HTTP.
The Transport layer
At this layer, the data is divided into smaller sized segments. The application programmer can choose which Transport Layer protocol they will use. This is the layer that can provide end to end error detection and correction for messages.
The Internet layer
The movement of packets across the network is handled in this layer. It is at this layer that messages are broken up into datagrams to be sent; if a datagram has to be fragmented along the way, the fragments are reassembled at the other end. This layer does not re-request or re-transmit missing packets; that job belongs to TCP in the Transport Layer. This is the layer at which the Internet Protocol, IP, runs.
The Network Interface (or Physical) Layer
This is where the ethernet, token ring, or other network hardware drivers are.
As a message passes to another layer, that layer may break the message into smaller sized segments and encapsulate the message inside its own envelope with special information pertaining to that layer. For example, the Internet Layer will break a message into units called datagrams. Each datagram will contain a header with information such as the length of the datagram, a header checksum and the source and destination addresses. This header encapsulates the message. The following diagram shows how data is encapsulated as it goes to the next layer in the protocol stack.
At the receiving end the process is reversed. The header is stripped from the data it encapsulates. The remaining data is then passed to the layer above until it reaches the application program. This is called demultiplexing.
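The wrapping and unwrapping described above can be sketched in a few lines of Python. Please note that the 3-byte header used here (a layer number plus a payload length) is a toy format I made up for this illustration; the real TCP, IP and Ethernet headers are larger and quite different.

```python
import struct

# Hypothetical toy header: 1-byte layer number + 2-byte payload length, network byte order.
HEADER = struct.Struct("!BH")
TRANSPORT, INTERNET, LINK = 1, 2, 3


def encapsulate(layer, payload):
    """Prepend this layer's header to whatever the layer above handed down."""
    return HEADER.pack(layer, len(payload)) + payload


def decapsulate(frame):
    """Strip one header and return (layer, payload for the layer above)."""
    layer, length = HEADER.unpack_from(frame)
    return layer, frame[HEADER.size:HEADER.size + length]


data = b"GET / HTTP/1.0\r\n\r\n"
# Sending side: each layer wraps what it receives from the layer above.
frame = encapsulate(LINK, encapsulate(INTERNET, encapsulate(TRANSPORT, data)))

# Receiving side: the headers come off in the opposite order.
layers = []
payload = frame
for _ in range(3):
    layer, payload = decapsulate(payload)
    layers.append(layer)
```

After the loop, `payload` is the original application data again and `layers` records the order in which the headers were removed.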
In order to have a complete understanding of TCP/IP the datagram must be mentioned. The datagram is the base message type for IP. IP breaks all messages up into datagrams and only sends these around the network. The IP Layer is connectionless. What this means is that datagrams contain no information about the state of the connection. Each datagram has knowledge of only itself and not of any other datagrams that it may be associated with (in order to make a higher layer message). The datagram knows where it came from, where it is going, and any other information needed to deliver just itself.
The Transport Layer, TCP & UDP
As mentioned above the Transport Layer is where the programmer can make decisions about how they want their messages to be broken up to be sent across the net. There are two main Transport Layer protocols in use. These are Transmission Control Protocol, TCP, and User Datagram Protocol, UDP.
TCP is well suited to application programs, for example web browsers and FTP clients, that are interactive. TCP provides a reliable end to end connection. It ensures that messages reach their destination and are delivered in the correct sequence across the network. Every packet sent is marked with a sequence number and is acknowledged by the receiver using this sequence number.
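The acknowledgment idea can be sketched as a toy simulation. This is a simple "stop and wait" scheme of my own, assuming a pretend network that loses the first attempt of whichever segments you nominate; real TCP numbers bytes rather than whole packets and uses timers and a sliding window, none of which is modelled here.

```python
def send_reliably(segments, lose_first_attempt_of=frozenset()):
    """Resend each segment until it is acknowledged; return what was delivered
    (in order) and how many transmissions that took."""
    delivered = []
    transmissions = 0
    for seq, data in enumerate(segments):
        attempt = 0
        while True:
            attempt += 1
            transmissions += 1
            if seq in lose_first_attempt_of and attempt == 1:
                # The segment is lost, no acknowledgment comes back,
                # so the sender tries again with the same sequence number.
                continue
            # The receiver accepts segment `seq` and acknowledges it.
            delivered.append(data)
            break
    return delivered, transmissions


delivered, transmissions = send_reliably([b"a", b"b", b"c"], lose_first_attempt_of={1})
```

Segment 1 is sent twice here, yet the receiver still ends up with all three segments exactly once and in the right order.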
UDP, on the other hand, provides an unreliable, connectionless service. This means that when the datagrams that make up a message are sent there is no verification that they have actually reached their destination. They may even arrive out of sequence. There are good reasons why programs would use such a service. UDP is less expensive than TCP. Sometimes the overhead of using TCP (the TCP header) is greater than the actual data being transmitted; in that case the entire message could simply be re-transmitted. Domain Name System (DNS) resolvers mainly use UDP for name server queries. In general if no response is needed, or if a response to a query can be used as an acknowledgment (that a message arrived at the destination) then UDP can be used.
In order to achieve the demultiplexing mentioned above, the Internet Layer must know which protocol to send the datagram to. There is a field in the datagram header that refers to the Transport Layer. It is the protocol field. This field is set by the sending protocol at the source host. At the destination host, IP reads this field and passes the packet to the same protocol in the Transport Layer as the one that sent it. For example, if the source of the message is TCP, then the protocol field in the datagram is set to TCP. At the destination, the Internet Layer knows to pass the datagram to the TCP protocol in the Transport Layer.
A port is the logical start or end point of a network connection. Programs such as FTP, telnet, mail and news have specially assigned port numbers. For example, a telnet server will normally be on port number 23 and mail will normally be on port number 25. This way the client program knows which port it should connect to. The server program is usually started at the time the computer is turned on and will listen on its special port waiting for an incoming connection request from a client program.
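To show how little ceremony UDP involves, here is a small sketch using Python's standard socket module. It sends one datagram to a receiver on the same machine (127.0.0.1); the function name and the two second timeout are my own choices. On the local machine the datagram will normally arrive, but across a real network nothing guarantees that it does.

```python
import socket


def udp_send_once(message):
    """Send one datagram over the loopback interface and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))      # port 0: let the system pick a free port
        receiver.settimeout(2.0)             # give up rather than wait forever
        # No connection is set up first; the datagram is simply addressed and sent.
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(4096)
        return data
```

Notice there is no connect step and no acknowledgment. If the sender wants to know that the message arrived, it has to arrange that for itself, for example by waiting for a reply.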
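As a rough sketch of how the protocol field is used, the Python below builds a bare 20-byte IPv4 header and then reads the protocol field back to decide where the datagram should go. The helper names are mine, the checksum is left at zero and there are no options, so treat this as an illustration only. The numbers 6 (TCP) and 17 (UDP) are the standard values, available as constants in Python's socket module.

```python
import socket
import struct

# IPv4 header without options: 20 bytes in network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(protocol, source, destination, payload_length=0):
    return IPV4_HEADER.pack(
        (4 << 4) | 5,                        # version 4, header length 5 x 32-bit words
        0,                                   # type of service
        IPV4_HEADER.size + payload_length,   # total length of the datagram
        0, 0,                                # identification, flags and fragment offset
        64,                                  # time to live
        protocol,                            # the protocol field set by the sender
        0,                                   # header checksum (not computed in this sketch)
        socket.inet_aton(source),
        socket.inet_aton(destination),
    )


TRANSPORT_PROTOCOLS = {socket.IPPROTO_TCP: "TCP", socket.IPPROTO_UDP: "UDP"}


def demultiplex(datagram):
    """Read the protocol field and name the Transport Layer protocol to hand it to."""
    protocol = IPV4_HEADER.unpack_from(datagram)[6]
    return TRANSPORT_PROTOCOLS.get(protocol, "unknown")
```

For example, `demultiplex(build_header(socket.IPPROTO_TCP, "10.0.0.1", "10.0.0.2"))` returns "TCP".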
When a client is started up, say someone starting a telnet session, the client is assigned an ephemeral port number. This port number is taken from a range of numbers reserved for client programs and must be one which is not currently in use by any other client on that host.
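You can watch the system hand out an ephemeral port with a few lines of Python. Asking to bind to port 0 is the usual way of saying "give me any free port"; the function name below is my own, and the actual number you get depends on your operating system.

```python
import socket

# Well-known server ports mentioned in the text above.
WELL_KNOWN = {"telnet": 23, "mail": 25}


def ephemeral_port():
    """Ask the operating system for a free port and report which one it chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))     # port 0 means "pick one for me"
        return s.getsockname()[1]


port = ephemeral_port()
```

The port chosen will differ from run to run, and it will not be one of the low-numbered ports set aside for servers such as telnet or mail.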
Connecting Across the Net
A client program connecting across the network knows the address (or name) of the remote host it wants a connection to. It also knows the port number of the service it wants to use (as this is one of the predefined port numbers). With this information the client can then connect to the server on the port at the remote host. Upon connection, the server program learns the client's own address and port number.
A TCP connection is defined by the combination of the sending IP address and port number and the receiving IP address and port number. This connection is also known as a socket pair.
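The socket pair is easy to see for yourself. The sketch below, using Python's standard socket module, opens a TCP connection from a client to a server on the same machine and returns the addresses each end reports. The function name is mine and the port numbers are whatever the system hands out.

```python
import socket


def socket_pair_demo():
    """Open one TCP connection over loopback and return each end's view of it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))        # any free port will do for this demo
        server.listen(1)
        with socket.create_connection(server.getsockname(), timeout=2.0) as client:
            connection, _client_address = server.accept()
            with connection:
                client_view = (client.getsockname(), client.getpeername())
                server_view = (connection.getsockname(), connection.getpeername())
                return client_view, server_view


client_view, server_view = socket_pair_demo()
```

Each view is a pair of (address, port) entries: the local end first, then the remote end. The two views hold the same four values, just in the opposite order, and it is that combination which identifies the connection.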
TCP/IP is also responsible for routing datagrams. Routing is directing a datagram from its source, via intermediate computers and routers, to its destination. The Internet Layer is where the routing of datagrams is done. The routing tables in this layer contain knowledge of host and network destinations as well as default routes to send datagrams on. Routing is very simple. If a datagram is currently at a host then there are three possible places it can be sent:
1. directly to the destination host
2. to the next hop on the way to a known destination host or network
3. the default next hop
A datagram is only ever sent directly to the destination or to the next hop router. That is, a router only knows to send a datagram one stop (the next hop) on its way to the destination.
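The three choices can be sketched as a tiny routing table lookup in Python. The networks and router addresses below are invented for the example, and picking the most specific matching entry is the usual rule rather than something spelled out above.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
# A next hop of None means the network is directly attached.
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), None),             # 1. deliver directly
    (ipaddress.ip_network("10.0.0.0/8"), "192.168.1.254"),      # 2. known network via a next hop
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),         # 3. default next hop
]


def next_hop(destination):
    """Return the address the datagram should be handed to next."""
    address = ipaddress.ip_address(destination)
    matches = [(network, hop) for network, hop in ROUTES if address in network]
    # The default route matches everything, so there is always at least one match.
    _network, hop = max(matches, key=lambda match: match[0].prefixlen)
    return destination if hop is None else hop
```

With this table, `next_hop("192.168.1.7")` gives "192.168.1.7" (direct delivery), `next_hop("10.2.3.4")` gives "192.168.1.254" and `next_hop("8.8.8.8")` falls through to the default, "192.168.1.1". In each case the host decides only the next stop, never the whole path.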
The Tip of the Iceberg
There are a myriad of protocols that use (or sit on top of) TCP/IP. Two of these are ICMP, the Internet Control Message Protocol, which is used to send control messages around the net (like "destination unreachable" messages) and RIP, the Routing Information Protocol.
As well, there is an enormous amount of literature available on this topic. In particular, the following books are worthwhile, and both can be bought from McGills here in Melbourne:
TCP/IP Illustrated, Volumes 1, 2 & 3, by W. Richard Stevens (Addison-Wesley)
TCP/IP Network Administration, by Craig Hunt (O'Reilly & Associates)
Please email me and tell me if you liked my TCP information, or if you know of any sites with similar info that I can include here.
Last revised: Saturday, 06 March 1999