Baffled by the acronyms that beset the Internet? Want to know what TCP/IP, SMTP, POP3 and HTTP actually mean? Matt Nicholson and Roy Tynan explain the more important mechanisms and protocols to give you an understanding of how the Internet works.
Author: Matt Nicholson
Last updated: Nov 1997
We already take the Internet pretty much for granted. We fire
up Microsoft Internet Explorer and Outlook and away we go. But behind the rather pretty
façade lies an extremely complex network. Or is it just billions of packets of data, or
millions of pages of information? Or is it based on something we've got used to
hearing in slightly different circumstances lately: protocol?
At the hardware level the Internet is a network of networks: a series
of local area networks (LANs) connected to regional networks which are in turn connected
to country-wide networks which link up to form a global network. At another level the
Internet is a sea of data packets that flow continuously between their source and
destination addresses. At yet another it is the structure of hyperlinked pages that we
call the World Wide Web.
Key to the whole process are a number of protocols to which software
must conform if it is to take part, and in a sense it is these protocols that really
define the Internet. Anyone developing software for the Internet needs an understanding of
these protocols if they are to take full advantage of what the Internet has to offer.
The TCP/IP Stack
Before being let loose on the Net, a data file or email message gets split into small
chunks or packets. Additional header information is added to each
packet to ensure it gets delivered to the right place and that the receiving application
can re-assemble the packets into the original file. The information stored in the header
usually conforms to two protocols called Transmission Control Protocol (TCP) and Internet
Protocol (IP).
Although all of the packets that make up a file travel between the same
two machines, they may each take very different routes along the way. Send a file from
Manchester to Milan, for example, and some packets may go via routing centres in Paris and
Turin while others go via Amsterdam, Munich and Zurich. As a result they are likely to
reach the destination out of sequence at the very least, and indeed some packets may never
arrive at all. This is the sort of problem that TCP/IP is designed to solve.

As the above diagram shows, these two protocols sit between the application
software and the hardware in a stack. Data coming from an application is
processed first by the TCP layer and then by the IP layer before it even reaches the
network. At the other end it travels back up through the TCP/IP stack before being
presented to the receiving application in (hopefully) its original form.
TCP is a relatively high-level, connection-based protocol. It is the
TCP layer that divides the data into packets and takes responsibility for them being
delivered intact and in the correct sequence. It achieves this by adding a header to each
packet which contains, amongst other things, a checksum and a sequence number.
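The checksum in the TCP header is the 16-bit "Internet checksum": the data is added up as 16-bit words using one's complement arithmetic (carries wrap back around), and the result is inverted. What follows is a minimal Python sketch over a plain byte string. Real TCP also folds a pseudo-header built from the IP addresses into the sum, which is left out here, and the sample payload is just an illustration.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement checksum of data, as TCP uses it."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # wrap any carry back into the low 16 bits
    return ~total & 0xFFFF                        # one's complement of the sum


segment = b"Hello, Milan"                         # even length keeps the check below simple
checksum = internet_checksum(segment)

# The receiver sums the data together with the transmitted checksum: 0 means no error found.
print(internet_checksum(segment + checksum.to_bytes(2, "big")) == 0)         # True
print(internet_checksum(b"Hellp, Milan" + checksum.to_bytes(2, "big")) == 0)  # False: corrupted
```

The check is cheap because the receiver never needs to know the original checksum separately: adding it into the sum should produce all ones, which inverts to zero.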
IP, on the other hand, is a low-level, connectionless protocol. The IP
layer is not concerned about whether the packets arrive at their destination in the
correct sequence, or indeed at all. It is, however, the responsibility of the IP layer to
ensure that packets are properly labelled and addressed, which it does through a further
header that is attached to each packet. This contains a considerable amount of
information, the most important items being:
Source Address: Where the packet came from.
Destination Address: Where the packet is going.
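Time to Live: Sets a limit on how long the packet can
survive in cyberspace before it reaches its destination. Each router that forwards the
packet reduces this value, and the packet is discarded when it reaches zero. This prevents
packets clogging up the Internet if for some reason they can't be delivered.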
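To make the IP header more concrete, the sketch below packs a minimal 20-byte IPv4 header (no options) with Python's struct module and then mimics what each router does with Time to Live. The two addresses are taken from this article; the TTL of 3, the payload length and the fields left at zero (type of service, identification, fragmentation) are illustrative assumptions, and in practice the operating system's TCP/IP stack builds these headers for you.

```python
import socket
import struct
from typing import Optional


def checksum16(data: bytes) -> int:
    """One's complement sum of 16-bit words, inverted: the IPv4 header checksum."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # wrap carries back in
    return ~total & 0xFFFF


def with_checksum(header: bytes) -> bytes:
    """Zero the checksum field (bytes 10-11), then store the freshly computed value."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return zeroed[:10] + struct.pack("!H", checksum16(zeroed)) + zeroed[12:]


def build_ipv4_header(src: str, dst: str, ttl: int, payload_len: int) -> bytes:
    """Pack a minimal 20-byte IPv4 header (no options) for a TCP payload."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,            # version 4; header length = 5 x 32-bit words
        0,                       # type of service (unused here)
        20 + payload_len,        # total length: header plus data, in bytes
        0,                       # identification (unused here)
        0,                       # flags and fragment offset (unused here)
        ttl,                     # Time to Live
        6,                       # protocol number 6 = TCP
        0,                       # checksum, filled in below
        socket.inet_aton(src),   # Source Address
        socket.inet_aton(dst),   # Destination Address
    )
    return with_checksum(header)


def forward(header: bytes) -> Optional[bytes]:
    """What each router does with TTL: decrement it, and drop the packet at zero."""
    ttl = header[8] - 1
    if ttl <= 0:
        return None                                # packet discarded
    return with_checksum(header[:8] + bytes([ttl]) + header[9:])


packet = build_ipv4_header("100.0.0.3", "195.102.33.119", ttl=3, payload_len=512)
hops = 0
while packet is not None:
    packet = forward(packet)
    hops += 1
print(hops)  # 3: the third router discards the packet
```

Because the TTL is part of the header, every router that changes it must also recompute the header checksum, which is why forward() calls with_checksum() again.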
At the receiving end, the IP layer strips off the IP header and
presents what remains to the TCP layer. The TCP layer uses the checksum to check the
validity of the data in the packet, and uses the sequence number to re-assemble the
packets in the right order. It then sends an acknowledgement back to the transmitter once
each packet has been successfully received. If the acknowledgement is not received within
a certain time, then the TCP layer at the transmitting end automatically re-sends the
packet.
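The numbering and re-assembly described above can be sketched in a few lines of Python. This is a deliberate simplification, and the function names are hypothetical: real TCP numbers individual bytes rather than whole packets, and the acknowledgements and retransmission timers are not modelled, so a missing packet is simply reported.

```python
import random
from typing import Dict, List, Optional, Tuple

Packet = Tuple[int, bytes]   # (sequence number, chunk of data)


def split_into_packets(data: bytes, size: int) -> List[Packet]:
    """Sending side: cut the data into chunks and number each one in order."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(received: List[Packet], expected: int) -> Tuple[Optional[bytes], List[int]]:
    """Receiving side: put packets back in sequence order and list any that never arrived."""
    by_seq: Dict[int, bytes] = dict(received)
    missing = [seq for seq in range(expected) if seq not in by_seq]
    if missing:
        return None, missing          # the sender would re-send these after its timeout
    return b"".join(by_seq[seq] for seq in range(expected)), []


message = b"Manchester to Milan, via Paris and Turin or Amsterdam, Munich and Zurich"
packets = split_into_packets(message, size=8)
random.shuffle(packets)               # simulate packets taking different routes

restored, gaps = reassemble(packets, expected=len(packets))
print(restored == message, gaps)      # True []

partial, lost = reassemble(packets[1:], expected=len(packets))  # one packet went astray
print(partial, lost)                  # None, plus the sequence number of the lost packet
```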
IP Addressing
For this to work it is of course vital that every computer on the Internet has its own
unique address. This is achieved by assigning each a 32-bit number, called its IP address.
As 32-bit numbers are particularly cumbersome to handle, each of the four bytes in an IP
address is usually represented by its decimal equivalent, separated by a dot. The IP
address of the Pixel Factory's Web server, for example, is 195.102.33.119.
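The dotted-decimal form is simply the 32-bit number cut into four bytes. A small sketch of the conversion in both directions, using the Pixel Factory address quoted above (Python's standard ipaddress module offers the same conversions ready-made):

```python
def to_dotted(address: int) -> str:
    """Render a 32-bit IP address as four decimal bytes separated by dots."""
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def to_int(dotted: str) -> int:
    """Parse dotted-decimal notation back into the underlying 32-bit number."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


print(to_int("195.102.33.119"))   # 3278250359
print(to_dotted(3278250359))      # 195.102.33.119
```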
So where do these numbers come from? An IP address is actually made up
of two components: the Network ID and the Host ID. Remember the Net is a network of
networks, so the Network ID identifies the network, and is assigned by a central body such
as InterNIC (the Internet Network Information Centre). The Host ID identifies an
individual machine on a network (called an IP host) and is assigned by the network's
administrator.

Shown above is a small network consisting of four machines and a single
router linking the network to the rest of the Internet. The network has a Network ID of
100.0.0.0 while the four machines have Host IDs of 1, 2, 3 and 4, so the full IP address
of the third computer is 100.0.0.3. The router has in this case been assigned a Host ID of
254.
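The split between Network ID and Host ID can be shown with Python's standard ipaddress module. The article does not say how many bits of this example's address belong to the Network ID, so the sketch assumes an 8-bit Network ID (a mask of 255.0.0.0), which fits a Network ID of 100.0.0.0; the function names are hypothetical.

```python
import ipaddress

# Assumption: the example network uses an 8-bit Network ID (mask 255.0.0.0);
# the article does not state the mask.
NETWORK = ipaddress.IPv4Network("100.0.0.0/8")


def full_address(host_id: int) -> ipaddress.IPv4Address:
    """Combine the Network ID with a Host ID to give a machine's full IP address."""
    return NETWORK.network_address + host_id


def split_address(address: str):
    """Separate a full address back into its Network ID and Host ID parts."""
    value = int(ipaddress.IPv4Address(address))
    mask = int(NETWORK.netmask)
    return ipaddress.IPv4Address(value & mask), value & ~mask & 0xFFFFFFFF


print(full_address(3))              # 100.0.0.3
print(full_address(254))            # 100.0.0.254, the router
print(split_address("100.0.0.3"))   # (IPv4Address('100.0.0.0'), 3)
```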
The TCP/IP stack in each of the machines on the network has the router
assigned as the network's Default Gateway. When the TCP/IP stack encounters an IP
packet with a destination address that has a Network ID other than 100.0.0.0, it sends
the packet to the router, which forwards it either to another network in the organisation or to
an Internet Service Provider (an ISP is a company that provides Internet access to
third parties).
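The decision each machine's TCP/IP stack makes can be sketched as a single comparison of Network IDs. As before the article leaves the mask unstated, so this sketch again assumes an 8-bit Network ID (255.0.0.0); the router address 100.0.0.254 comes from the example network, and the function name is hypothetical.

```python
import ipaddress

# Assumption: an 8-bit Network ID (mask 255.0.0.0) for the example network.
LOCAL_NETWORK = ipaddress.IPv4Network("100.0.0.0/8")
DEFAULT_GATEWAY = ipaddress.IPv4Address("100.0.0.254")   # the router in the example


def next_hop(destination: str) -> ipaddress.IPv4Address:
    """Decide where the TCP/IP stack should send a packet first."""
    dest = ipaddress.IPv4Address(destination)
    if dest in LOCAL_NETWORK:
        return dest               # same Network ID: deliver straight to the host
    return DEFAULT_GATEWAY        # different Network ID: hand it to the router


print(next_hop("100.0.0.4"))        # 100.0.0.4
print(next_hop("195.102.33.119"))   # 100.0.0.254
```

Everything beyond that first hop is the routers' business, which is why the machines themselves need to know only their own Network ID and the gateway's address.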
Exactly how packets are handled as they travel the Net is well beyond
the scope of this article. Suffice it to say that it is the responsibility of the routers to
direct each packet along the next stage of its journey. This is achieved through Routing
Tables held within each router.
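Domain Name System
In dotted decimal form, IP addresses are not very memorable. Neither do they
reveal much about the nature of the host computer. For this reason we use the Domain Name
System (DNS) to map IP addresses to something more friendly.
At the heart of DNS is a shallow tree-like data structure that maps
domain names to IP addresses, as shown above. At the top level of this tree
are the three-character generic group codes such as com or edu and the two-character
country codes such as uk or fr. Top-level names are decided by central bodies such as
InterNIC. Country codes are usually prefixed by a two-character generic group code such as
co.uk.
Below this come organisation names such as microsoft or pixel-factory.
Organisation names can be anything of two characters or more, but must be registered with
the appropriate central body (InterNIC or the Local Naming Committee) to ensure they are
unique.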
Domain names are read bottom-up, so the domain name microsoft.com is
managed by Microsoft and signifies the company's internal network as available to the
outside world. Domain names can be qualified further with individual host names. The name
www.microsoft.com, for example, signifies Microsoft's Web server.
Domain names are translated into IP addresses by DNS servers, such as
the one provided with Windows NT Server. Any program that needs to resolve a domain name
makes a call to its nearest DNS server. If this can't resolve the name, then the
request is passed on to whichever server handles that part of the tree.
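The tree walk can be sketched in a few lines of Python. This is a toy, not how a real resolver is written: a single nested dictionary stands in for the whole hierarchy of DNS servers, and apart from the Pixel Factory address quoted earlier in the article, the names and the 192.0.2.x addresses (a range reserved for documentation) are made up for illustration.

```python
# Toy namespace tree: each nested dict stands in for the DNS server that handles
# that part of the tree. Only the pixel-factory address comes from the article;
# the 192.0.2.x addresses are documentation placeholders.
ROOT = {
    "com": {
        "pixel-factory": {"www": "195.102.33.119"},
        "example": {"www": "192.0.2.10"},
    },
    "uk": {
        "co": {"example": {"www": "192.0.2.20"}},
    },
}


def resolve(name: str) -> str:
    """Walk down the tree from the top-level name, reading the labels right to left."""
    node = ROOT
    for label in reversed(name.lower().split(".")):
        if not isinstance(node, dict) or label not in node:
            raise LookupError(f"cannot resolve {name!r}")
        node = node[label]        # move to the part of the tree handling the next label
    if not isinstance(node, str):
        raise LookupError(f"{name!r} is a domain, not a host")
    return node


print(resolve("www.pixel-factory.com"))   # 195.102.33.119
print(resolve("www.example.co.uk"))       # 192.0.2.20
```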
Winsock
A whole range of application software makes use of the TCP/IP stack, including Web browsers
such as Internet Explorer, email packages, newsgroup readers, FTP (File Transfer Protocol)
products, Telnet and so on. The software industry has therefore standardised on a
programming interface called the Winsock API, which provides TCP/IP services to Windows
applications. It is made available by a dynamic-link library called WINSOCK.DLL, which
usually resides in the Windows directory. There are both 16- and 32-bit versions available,
and the interface has even been implemented as an ActiveX control.
The "sock" part of the name derives from the concept of a
"socket", a term that has its roots in the UNIX world. As you can imagine, a
host computer providing FTP, Web or email services has to be able to cope with many
connections at the same time. This is achieved by allocating each connection a numbered
socket. As each new connection is made it is allocated a new socket number.
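Winsock follows the Berkeley sockets model from UNIX, which Python's standard socket module also wraps (on Windows it is built on top of Winsock). The sketch below runs a tiny echo server on the local machine: the server listens on one socket, and each call to accept() hands back a new socket dedicated to that connection. The echo behaviour, the loopback address 127.0.0.1 and letting the operating system pick the port are simply choices made for the example.

```python
import socket
import threading


def serve(listener: socket.socket, connections: int) -> None:
    """Handle a fixed number of clients; accept() returns a fresh socket for each one."""
    for _ in range(connections):
        conn, _addr = listener.accept()     # new socket, dedicated to this connection
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"echo: " + data)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))             # port 0: let the operating system choose
listener.listen()
listener.settimeout(5)                      # don't wait forever if a client never arrives
port = listener.getsockname()[1]

server = threading.Thread(target=serve, args=(listener, 2))
server.start()

replies = []
for text in (b"first", b"second"):
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(text)
        replies.append(client.recv(1024))

server.join()
listener.close()
print(replies)   # [b'echo: first', b'echo: second']
```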
PPP and SLIP
Point-to-Point Protocol (PPP) and Serial Line Internet Protocol (SLIP) are two additional
protocols governing asynchronous communication over a serial link. A PC needs to be
running either PPP or SLIP before it can support a TCP/IP connection over a modem. PPP and
SLIP effectively sit between the serial port driver and the TCP/IP stack in your PC.
Of the two, PPP is the more popular. Both Windows 95 and Windows NT
support PPP, which means computers running Windows can dial in to any server complying with
the PPP standard. PPP compliance also means Windows NT Server can provide network access
to any remote access software that conforms to PPP.
SLIP (Serial Line Internet Protocol) is an older standard found in UNIX
environments. Windows NT RAS (Remote Access Service) can be configured as a SLIP client,
which means that Windows NT users can dial in to SLIP servers. However, PPP is more
dominant in NT environments.
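SLIP's job on the serial line is very simple: it marks where each IP packet ends. The sketch below follows the framing described in RFC 1055, where an END byte (0xC0) closes each packet and an ESC byte (0xDB) protects any data bytes that happen to look like END or ESC. PPP's framing is more elaborate and is not shown; the sample packet bytes are arbitrary.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD   # special bytes defined by RFC 1055


def slip_encode(packet: bytes) -> bytes:
    """Frame one IP packet for the serial line, escaping bytes that look like END or ESC."""
    out = bytearray([END])               # a leading END flushes any line noise
    for byte in packet:
        if byte == END:
            out += bytes([ESC, ESC_END])
        elif byte == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(byte)
    out.append(END)                      # END marks the end of the packet
    return bytes(out)


def slip_decode(stream: bytes) -> list:
    """Split a received byte stream back into packets, undoing the escapes."""
    packets, current, escaped = [], bytearray(), False
    for byte in stream:
        if escaped:
            current.append(END if byte == ESC_END else ESC if byte == ESC_ESC else byte)
            escaped = False
        elif byte == ESC:
            escaped = True
        elif byte == END:
            if current:                  # ignore empty frames between END bytes
                packets.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)
    return packets


packet = bytes([0x45, 0xC0, 0x01, 0xDB, 0x02])
line = slip_encode(packet)
print(line.hex())                       # c045dbdc01dbdd02c0
print(slip_decode(line) == [packet])    # True
```

SLIP carries no addressing, error checking or negotiation of its own, one reason PPP came to replace it.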
SMTP, POP3 and MIME
Now we can move up above the TCP/IP stack to look at some of the applications which make
use of the Internet. Perhaps the most prevalent is sending messages by email.
Almost all email travels around the Internet using the Simple Mail
Transfer Protocol (SMTP). To send a message using SMTP, the sending machine establishes a
connection with the receiving machine. Once the receiving machine acknowledges it is
ready, the sender sends the message, starting with the sender's and receiver's
email addresses followed by the body of the message itself. The receiver acknowledges each
portion as it arrives. Once the final portion has been sent and its receipt acknowledged,
the sender sends a Quit command and the receiver closes the connection.
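The exchange just described can be sketched as the list of lines an SMTP client sends. It uses the basic command set from the SMTP specification (HELO, MAIL FROM, RCPT TO, DATA, QUIT); the server's numeric replies, such as 354 after DATA, are only noted in comments, and the host name and addresses are hypothetical. In practice Python's smtplib module conducts this conversation for you.

```python
from typing import List


def smtp_commands(sender: str, recipient: str, body: str) -> List[str]:
    """Client side of a minimal SMTP session; each line waits for the server's reply."""
    lines = [
        "HELO client.example.com",       # hypothetical name of the sending machine
        f"MAIL FROM:<{sender}>",         # the sender's address...
        f"RCPT TO:<{recipient}>",        # ...then the receiver's
        "DATA",                          # server replies 354: go ahead with the message
    ]
    for line in body.splitlines():
        # A body line starting with "." gets an extra dot so it can't end the message early.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]               # a lone "." ends the message; QUIT ends the session
    return [line + "\r\n" for line in lines]   # SMTP lines end with CR LF


for command in smtp_commands("alice@example.com", "bob@example.net", "Hi Bob,\n.see you at eight"):
    print(repr(command))
```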
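An important point to note about SMTP is that although the sender
initiates the transfer and takes control of the operation, the message does not get
transferred unless the receiver acknowledges its readiness. As a result, SMTP is not much use in
dial-up situations where the receiving machine is only connected to the Internet on an
intermittent basis. This is where the Post Office Protocol (POP) comes in.
POP is designed for store and forward systems where
incoming mail is stored in a POP server mailbox, perhaps at the receiver's ISP, ready
for the receiver to download later. Mail is transferred to a POP server through SMTP, in
the normal way. To collect mail, the receiver uses a POP client such as Microsoft Outlook
97 or Exchange Client to contact the POP server. The receiver can then request details of
how much mail is in his or her mailbox, ask that it be transferred, or delete it from the
server mailbox.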
SMTP and POP work together, but POP is controlled by the receiver
rather than the sender. Microsoft Exchange Server 5 can act as both an SMTP and a POP3
server. POP3 simply indicates version 3 of the protocol.
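The receiver's side of POP can be modelled with a toy mailbox. The command names and the +OK / -ERR reply style follow POP3 (RFC 1939), including the rule that messages flagged with DELE are only removed when the session ends with QUIT; the class itself, its in-memory storage and the sample messages are hypothetical simplifications. For talking to a real server, Python's poplib module provides stat(), retr(), dele() and quit() methods on the client side.

```python
from typing import List


class ToyPopMailbox:
    """Hypothetical in-memory model of a POP3 server mailbox (not a network server)."""

    def __init__(self, messages: List[str]):
        self.messages = list(messages)
        self.marked = set()              # message numbers flagged by DELE

    def _exists(self, n: int) -> bool:
        return 1 <= n <= len(self.messages) and n not in self.marked

    def stat(self) -> str:
        """STAT: how many messages are waiting, and their total size in bytes."""
        live = [m for i, m in enumerate(self.messages, 1) if i not in self.marked]
        return f"+OK {len(live)} {sum(len(m.encode()) for m in live)}"

    def retr(self, n: int) -> str:
        """RETR: transfer message n (numbered from 1)."""
        return "+OK\r\n" + self.messages[n - 1] if self._exists(n) else "-ERR no such message"

    def dele(self, n: int) -> str:
        """DELE: mark message n for deletion; it is only removed at QUIT."""
        if not self._exists(n):
            return "-ERR no such message"
        self.marked.add(n)
        return "+OK"

    def quit(self) -> str:
        """QUIT: end the session, removing any messages marked for deletion."""
        self.messages = [m for i, m in enumerate(self.messages, 1) if i not in self.marked]
        self.marked.clear()
        return "+OK"


box = ToyPopMailbox(["Hello Bob", "Here is that file you wanted!"])
print(box.stat())              # +OK 2 38
print(box.retr(2))             # +OK followed by the message text
print(box.dele(1))             # +OK
print(box.quit(), box.stat())  # +OK +OK 1 29
```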
One big limitation of SMTP is that it is a 7-bit system. SMTP is a
hangover from earlier computer days when the first seven bits were used for the character
and the eighth bit as a parity bit to check that the rest had not got corrupted. Such a
system can only be used to send straight ASCII text messages, which is not a lot of use in
today's multimedia age.
So Multipurpose Internet Mail Extensions (MIME) was introduced.
This provides a protocol for encoding 8-bit binary data in a 7-bit format, suitable for
transfer by SMTP. It does this using a variety of algorithms, one of the most common being
Base64 encoding.
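Base64 works by taking the data three bytes (24 bits) at a time and writing each 6-bit group as one of 64 safe ASCII characters, so every character fits comfortably in 7 bits at the cost of making the data about a third larger. A short sketch with Python's standard base64 module; the eight sample bytes are the signature that begins a Word 97 (OLE compound) document, chosen because every one has its top bit set.

```python
import base64

# Eight bytes with the top bit set, so they are not 7-bit safe as they stand.
binary = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])

encoded = base64.b64encode(binary)          # every 3 bytes become 4 ASCII characters
print(encoded)                              # b'0M8R4KGxGuE='
print(all(b < 0x80 for b in encoded))       # True: fits SMTP's 7-bit limit
print(base64.b64decode(encoded) == binary)  # True: the receiver gets the original bytes
```

The trailing "=" is padding, needed because eight bytes do not divide evenly into groups of three.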
MIME allows you to embed binary files within text messages. If your
mail reader can understand MIME then it will extract these files and save them on your
hard disk. If not, then you will see something like this:
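Here is that file you wanted!
=_NextPart_000_01BC6078.68C3B690
MIME-Version: 1.0
Content-Type: application/msword;
    name=security.doc
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename=security.doc
0M8R4KGxGuEAAAA......
=_NextPart_000_01BC6078.68C3B690
The stream of characters just before the end is the encoded data. As far as SMTP is
concerned, the encoded attachment is just another part of the mail message.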
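Mail software normally builds such messages for you. As a sketch, Python's standard email package can produce a comparable multipart message with a Base64-encoded attachment. The addresses and the attachment bytes are placeholders (a Word-style signature padded with zeros), and the boundary string Python generates is random rather than Outlook-style.

```python
from email.message import EmailMessage

# Placeholder contents for security.doc: a Word-style signature followed by zero bytes.
attachment = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]) + bytes(24)

msg = EmailMessage()
msg["From"] = "alice@example.com"            # hypothetical addresses
msg["To"] = "bob@example.net"
msg["Subject"] = "security.doc"
msg.set_content("Here is that file you wanted!")
# Adding a binary attachment turns the message into multipart/mixed and
# Base64-encodes the attachment so the whole message stays 7-bit.
msg.add_attachment(attachment, maintype="application", subtype="msword",
                   filename="security.doc")

text = msg.as_string()
print(text)   # headers, the text part, then the Base64 attachment
```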
FTP
As the name suggests, File Transfer Protocol (FTP) is concerned with the transfer of files
from one computer to another. In many ways, FTP is similar to SMTP. The FTP client
software initiates a call to the FTP server, which responds by requesting the user's
name and password. Most FTP servers support the user name anonymous, allowing
you limited access as a guest, and will tell anonymous users what to give as a password
(conventionally their email address). Hence the term anonymous FTP.
Once your logon has been accepted you can browse through the
server's directory and locate the file you require, although where you can go and
what you can do depends on your access rights. At any point the client can initiate a file
transfer between client and server, at which point a second TCP connection is established
through which the data is transferred. This second connection is maintained until the file
is transferred, at which point it is closed. FTP also supports commands that allow the
client to perform standard directory operations on the server, such as renaming or
deleting files and creating or removing directories.
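Setting up that second connection involves a detail worth seeing. In FTP's active mode, the client uses the first (control) connection to tell the server where to connect for the data, sending a PORT command whose argument is six comma-separated numbers: the four bytes of the IP address followed by the port number split into two bytes. A small sketch of that encoding follows; the address and port values are illustrative, and in passive mode (PASV) the roles are reversed. Python's ftplib module handles all of this for you.

```python
from typing import Tuple


def port_command(host: str, port: int) -> str:
    """Build the PORT command telling the server where to open the data connection."""
    high, low = divmod(port, 256)                     # port number split into two bytes
    return "PORT " + ",".join(host.split(".") + [str(high), str(low)])


def parse_port(argument: str) -> Tuple[str, int]:
    """Recover (host, port) from the six numbers h1,h2,h3,h4,p1,p2."""
    parts = argument.split(",")
    return ".".join(parts[:4]), int(parts[4]) * 256 + int(parts[5])


print(port_command("100.0.0.3", 1025))   # PORT 100,0,0,3,4,1
print(parse_port("100,0,0,3,4,1"))       # ('100.0.0.3', 1025)
```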
HTTP and URLs
The final protocol to cover here is the HyperText Transfer Protocol (HTTP), which is used
to transmit the HTML documents that make up the World Wide Web. Whenever you make a
request to view a Web page, your Web browser sends out a GET command with parameters
stating what file is required from which Web server. The server, once it is found,
responds with a message containing a success or failure code, some further header
information and any data that was successfully retrieved.
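At the protocol level the request is just a few lines of text. The sketch below composes a minimal HTTP/1.0 GET request and reads the success or failure code from the first line of a reply; the sample reply is made up, and a real client would go on to read the remaining headers and the data. The Host header is required by HTTP/1.1 and harmless in HTTP/1.0.

```python
from typing import Tuple


def build_get(host: str, path: str) -> bytes:
    """Compose a minimal HTTP/1.0 GET request for one file on one Web server."""
    request = (f"GET {path} HTTP/1.0\r\n"
               f"Host: {host}\r\n"          # which server the request is meant for
               "\r\n")                      # a blank line ends the request
    return request.encode("ascii")


def parse_status(response: bytes) -> Tuple[int, str]:
    """Pull the success or failure code out of the first line of the server's reply."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


print(build_get("www.pixel-factory.com", "/software/software.html"))
print(parse_status(b"HTTP/1.0 404 Not Found\r\nContent-Type: text/html\r\n\r\n"))  # (404, 'Not Found')
```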
The file required is specified using a Uniform Resource Locator
(URL). A URL provides a unique reference to any resource available on the Internet. A
typical (and real) URL address is http://www.pixel-factory.com/software/software.html.
It is made up as follows:
http:// - The name of the protocol;
www.pixel-factory.com - The full domain name of the Web server;
/software/ - A directory on the server;
software.html - A file in the directory.
All Web pages are prefixed with http://, although not all Web servers are labelled www.
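The same breakdown can be done with Python's standard urllib.parse module, splitting the path with posixpath because URL paths always use forward slashes. A brief sketch using the article's example URL:

```python
from posixpath import basename, dirname
from urllib.parse import urlsplit

url = "http://www.pixel-factory.com/software/software.html"
parts = urlsplit(url)

print(parts.scheme)            # prints http (the protocol)
print(parts.netloc)            # prints www.pixel-factory.com (the Web server's domain name)
print(dirname(parts.path))     # prints /software (the directory on the server)
print(basename(parts.path))    # prints software.html (the file in that directory)
```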
This article attempts to describe the most important mechanisms and
protocols that make the Internet work. We have necessarily glossed over the more
complicated aspects of some of the protocols in the interests of clarity, so please regard
it as an introduction only!
Copyright © Matt Publishing. All rights reserved. No part of this site may be reproduced without the prior consent of the copyright holder.