
Inside the Internet

Baffled by the acronyms that beset the Internet? Want to know what TCP/IP, SMTP, POP3 and HTTP actually mean? Matt Nicholson and Roy Tynan explain the more important mechanisms and protocols to give you an understanding of how the Internet works.

Author: Matt Nicholson

Last updated: Nov 1997

We already take the Internet pretty much for granted. We fire up Microsoft Internet Explorer and Outlook and away we go. But behind the rather pretty façade lies an extremely complex network, or is it just billions of packets of data, or is it millions of pages of information? Or is it based on something we’ve got used to hearing in slightly different circumstances lately - protocol?
    At the hardware level the Internet is a network of networks: a series of local area networks (LANs) connected to regional networks which are in turn connected to country-wide networks which link up to form a global network. At another level the Internet is a sea of data packets that flow continuously between their source and destination addresses. At yet another it is the structure of hyperlinked pages that we call the World Wide Web.
    Key to the whole process are a number of protocols to which software must conform if it is to take part, and in a sense it is these protocols that really define the Internet. Anyone developing software for the Internet needs an understanding of these protocols if they are to take full advantage of what the Internet has to offer.

The TCP/IP Stack
Before being let loose on the Net, a data file or email message gets split into small chunks or ‘packets’. Additional ‘header’ information is added to each packet to ensure it gets delivered to the right place and that the receiving application can re-assemble the packets into the original file. The information stored in the header usually conforms to two protocols called Transmission Control Protocol (TCP) and Internet Protocol (IP).
    Although all of the packets that make up a file travel between the same two machines, they may each take very different routes along the way. Send a file from Manchester to Milan, for example, and some packets may go via routing centres in Paris and Turin while others go via Amsterdam, Munich and Zurich. As a result they are likely to reach the destination out of sequence at the very least, and indeed some packets may never arrive at all. This is the sort of problem that TCP/IP is designed to solve.



As the above diagram shows, these two protocols sit between the application software and the hardware in a ‘stack’. Data coming from an application is processed first by the TCP layer and then by the IP layer before it even reaches the network. At the other end it travels back up through the TCP/IP stack before being presented to the receiving application in (hopefully) its original form.
    TCP is a relatively high-level, connection-based protocol. It is the TCP layer that divides the data into packets and takes responsibility for them being delivered intact and in the correct sequence. It achieves this by adding a header to each packet which contains, amongst other things, a checksum and a sequence number.
    IP, on the other hand, is a low-level, connectionless protocol. The IP layer is not concerned about whether the packets arrive at their destination in the correct sequence, or indeed at all. It is, however, the responsibility of the IP layer to ensure that packets are properly labelled and addressed, which it does through a further header that is attached to each packet. This contains a considerable amount of information, the most important items being:
   Source Address: Where the packet came from.
   Destination Address: Where the packet is going.
   Time to Live: Sets a limit to how long the packet can survive in cyberspace before it reaches its destination. This prevents packets clogging up the Internet if for some reason they can’t be delivered.
    At the receiving end, the IP layer strips off the IP header and presents what remains to the TCP layer. The TCP layer uses the checksum to check the validity of the data in the packet, and uses the sequence number to re-assemble the packets in the right order. It then sends an acknowledgement back to the transmitter once each packet has been successfully received. If the acknowledgement is not received within a certain time, then the TCP layer at the transmitting end automatically re-sends the packet.
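
The checking and re-ordering described above can be sketched in a few lines of Python. This is only an illustration, not how a real TCP layer is written: the packet layout, the function names and the eight-byte packet size are all assumptions, and CRC32 stands in for the checksum that TCP actually uses.

```python
import random
import zlib

def make_packets(data: bytes, size: int = 8):
    """Split data into packets, each tagged with a sequence number and checksum."""
    packets = []
    for seq, start in enumerate(range(0, len(data), size)):
        payload = data[start:start + size]
        packets.append({"seq": seq, "checksum": zlib.crc32(payload), "payload": payload})
    return packets

def reassemble(packets):
    """Drop packets whose checksum does not match, then rebuild in sequence order."""
    valid = [p for p in packets if zlib.crc32(p["payload"]) == p["checksum"]]
    valid.sort(key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in valid)

message = b"Send a file from Manchester to Milan"
packets = make_packets(message)
random.shuffle(packets)        # packets may arrive out of sequence
received = reassemble(packets)
print(received == message)
```

A real implementation would also acknowledge each packet and re-send any that went missing; here a damaged packet is simply dropped.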

IP Addressing
For this to work it is of course vital that every computer on the Internet has its own unique address. This is achieved by assigning each a 32-bit number, called its IP address. As 32-bit numbers are particularly cumbersome to handle, each of the four bytes in an IP address is usually written as its decimal equivalent, with the four values separated by dots. The IP address of the Pixel Factory’s Web server, for example, is 195.102.33.119.
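
To see how the dotted form relates to the underlying 32-bit number, here is a minimal Python sketch. The two helper function names are our own; the standard ipaddress module is used only to cross-check the result.

```python
import ipaddress

def dotted_to_int(address: str) -> int:
    """Pack the four decimal bytes of a dotted address into one 32-bit number."""
    value = 0
    for part in address.split("."):
        value = (value << 8) | int(part)
    return value

def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit number back into dotted decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

address = "195.102.33.119"
as_int = dotted_to_int(address)
print(as_int, f"{as_int:032b}")

# Cross-check against the standard library's own conversion.
print(as_int == int(ipaddress.IPv4Address(address)))
```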
    So where do these numbers come from? An IP address is actually made up of two components: the Network ID and the Host ID. Remember the Net is a network of networks, so the Network ID identifies the network, and is assigned by a central body such as InterNIC (the Internet Network Information Centre). The Host ID identifies an individual machine on a network (called an IP host) and is assigned by the network’s administrator.


Shown above is a small network consisting of four machines and a single router linking the network to the rest of the Internet. The network has a Network ID of 100.0.0.0 while the four machines have Host IDs of 1, 2, 3 and 4, so the full IP address of the third computer is 100.0.0.3. The router has in this case been assigned a Host ID of 254.
    The TCP/IP stack in each of the machines on the network has the router assigned as the network’s Default Gateway. When the stack encounters an IP packet whose destination address has a Network ID other than 100.0.0.0, it sends the packet to the router, which forwards it either to another network in the organisation or to an Internet Service Provider (an ISP is a company that provides Internet access to third parties).
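
The decision each machine makes can be sketched as follows. We are assuming here that the Network ID occupies the first byte of the address (a mask of /8), which the article does not state, and the function name next_hop is our own.

```python
import ipaddress

# Assumption: the Network ID 100.0.0.0 covers the first byte only (/8).
LOCAL_NETWORK = ipaddress.ip_network("100.0.0.0/8")
DEFAULT_GATEWAY = ipaddress.ip_address("100.0.0.254")   # the router

def next_hop(destination: str):
    """Return the address the packet should be handed to next."""
    dest = ipaddress.ip_address(destination)
    if dest in LOCAL_NETWORK:
        return dest             # same network: deliver directly
    return DEFAULT_GATEWAY      # different Network ID: send to the router

print(next_hop("100.0.0.3"))
print(next_hop("195.102.33.119"))
```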
    Exactly how packets are handled as they travel the Net is well beyond the scope of this article. Suffice to say that it is the responsibility of the routers to direct each packet along the next stage of its journey. This is achieved through Routing Tables held within each router.

Domain Name System
In ‘dotted decimal’ form, IP addresses are not very memorable. Neither do they reveal much about the nature of the host computer. For this reason we use the Domain Name System (DNS) to map IP addresses to something more friendly.

At the heart of DNS is a shallow tree-like data structure that maps domain names to IP addresses, as shown above. At the top level of this tree are the three-character generic group codes such as com or edu and the two-character country codes such as uk or fr. Top-level names are decided by central bodies such as InterNIC. Country codes are usually preceded by a two-character generic group code such as co, giving co.uk.
    Below this come organisation names such as microsoft or pixel-factory. Organisation names can be anything of two characters or more, but must be registered with the appropriate central body (InterNIC or the Local Naming Committee) to ensure they are unique.
    Domain names are read bottom-up, so the domain name microsoft.com is managed by Microsoft and signifies the company’s internal network as available to the outside world. Domain names can be qualified further with individual host names. The name www.microsoft.com, for example, signifies Microsoft’s Web server.
    Domain names are translated into IP addresses by DNS servers, such as that provided with Windows NT Server. Any program that needs to resolve a Domain name makes a call to its nearest DNS server. If this can’t resolve the name then the request is passed on to whichever server handles that part of the tree.
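
As a rough illustration of the tree, the sketch below stores a tiny part of it as nested Python dictionaries and walks it from the top-level name downwards. It is a toy model, not a DNS client: the resolve function is our own, and the address given for www.microsoft.com is a placeholder rather than the real one.

```python
# A toy fragment of the DNS tree: top-level name, organisation, then host.
DNS_TREE = {
    "com": {
        "microsoft": {"www": "192.0.2.10"},           # placeholder address
        "pixel-factory": {"www": "195.102.33.119"},   # address quoted earlier
    },
}

def resolve(name: str):
    """Walk the tree from the top-level name down to the host name."""
    node = DNS_TREE
    for label in reversed(name.lower().split(".")):
        if not isinstance(node, dict) or label not in node:
            return None         # this server cannot resolve the name
        node = node[label]
    return node if isinstance(node, str) else None

print(resolve("www.pixel-factory.com"))
print(resolve("www.example.org"))
```

A real DNS server that failed to find the name would pass the request on, as described above, rather than simply giving up.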

Winsock
A whole range of application software makes use of the TCP/IP stack, including Web browsers such as Internet Explorer, email packages, newsgroup readers, FTP (File Transfer Protocol) products, Telnet and so on. The software industry has therefore standardised on a programming interface called the Winsock API which provides TCP/IP services to Windows applications. It is made available by a dynamic-link library called WINSOCK.DLL which usually resides in the Windows directory. There are both 16-bit and 32-bit versions available and the interface has even been implemented as an ActiveX control.
    The ‘sock’ part of the name derives from the concept of a ‘socket’, a term that has its roots in the UNIX world. As you can imagine, a host computer providing FTP, Web or email services has to be able to cope with many connections at the same time. This is achieved by allocating each connection a numbered socket. As each new connection is made it is allocated a new socket number.
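
The idea of one socket per connection can be tried out on a single machine. The sketch below uses Python’s standard socket module rather than calling the Winsock API directly, and runs a tiny server and two clients over the loopback address; the function name and the upper-casing ‘service’ are purely illustrative.

```python
import socket
import threading

def serve(listener: socket.socket, count: int) -> None:
    """Accept a fixed number of connections; each gets a socket of its own."""
    for _ in range(count):
        conn, _peer = listener.accept()     # a new socket for this connection
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))             # port 0: let the system choose
listener.listen()
port = listener.getsockname()[1]

worker = threading.Thread(target=serve, args=(listener, 2))
worker.start()

replies = []
for text in (b"first", b"second"):
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(text)
        replies.append(client.recv(1024))

worker.join()
listener.close()
print(replies)
```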

PPP and SLIP
Point-to-Point Protocol (PPP) and Serial Line Internet Protocol (SLIP) are two additional protocols governing asynchronous communication over a serial link. A PC needs to be running either PPP or SLIP before it can support a TCP/IP connection over a modem. PPP and SLIP effectively sit between the serial port driver and the TCP/IP stack in your PC.
    Of the two, PPP is the more popular. Both Windows 95 and Windows NT support PPP, which means computers running Windows can dial in to any server complying with the PPP standard. PPP compliance also means Windows NT Server can provide network access to any remote access software that conforms to PPP.
    SLIP is an older standard found in UNIX environments. Windows NT RAS (Remote Access Server) can be configured as a SLIP client, which means that Windows NT users can dial in to SLIP servers; however, PPP is more dominant in NT environments.

SMTP, POP3 and MIME
Now we can move up above the TCP/IP stack to look at some of the applications which make use of the Internet. Perhaps the most prevalent is sending messages by email.
    Almost all email travels around the Internet using the Simple Mail Transfer Protocol (SMTP). To send a message using SMTP, the sending machine establishes a connection with the receiving machine. Once the receiving machine acknowledges it is ready, the sender sends the message, starting with the sender’s and receiver’s email address followed by the body of the message itself. The receiver acknowledges each portion as it arrives. Once the final portion has been sent and its receipt acknowledged, the sender sends a Quit command and the receiver closes the connection.
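
The sender’s side of that conversation can be sketched as a list of SMTP commands. This only builds the commands, in the order they would be sent, and does not talk to a real server; the function name, the host name client.example and the email addresses are all made up for the example.

```python
def smtp_commands(sender: str, recipient: str, body: str) -> list:
    """Return the commands a sender issues, in order, to deliver one message."""
    return [
        "HELO client.example",          # introduce ourselves (assumed host name)
        f"MAIL FROM:<{sender}>",        # the sender's email address
        f"RCPT TO:<{recipient}>",       # the receiver's email address
        "DATA",                         # the body of the message follows
        body + "\r\n.",                 # a line holding a single dot ends the body
        "QUIT",                         # finished: the receiver closes the connection
    ]

commands = smtp_commands("alice@example.com", "bob@example.org", "Hello Bob!")
for command in commands:
    print(command)
```

In a real session the sender waits for the receiver’s acknowledgement after each step before moving on to the next.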
    An important point to note about SMTP is that although the sender initiates the transfer and takes control of the operation, the message does not get transferred unless the receiver acknowledges its readiness. SMTP is not much use in dial-up situations where the receiving machine is only connected to the Internet on an intermittent basis. This is where the Post Office Protocol (POP) comes in.

POP is designed for ‘store and forward’ systems where incoming mail is stored in a POP server mailbox, perhaps at the receiver’s ISP, ready for the receiver to download later. Mail is transferred to a POP server through SMTP, in the normal way. To collect mail, the receiver uses a POP client such as Microsoft Outlook 97 or Exchange Client to contact the POP server. The receiver can then request details of how much mail is in his or her mailbox, ask that it be transferred, or delete it from the server mailbox.
    SMTP and POP work together, but POP is controlled by the receiver rather than the sender. Microsoft Exchange Server 5 can act as both an SMTP and a POP3 server. POP3 simply indicates version 3 of the protocol.
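
A toy model of a ‘store and forward’ mailbox might look like the sketch below. The class is our own invention; its methods are named after the POP3 commands STAT, RETR and DELE, but it is much simpler than a real server (which, for one thing, only removes deleted messages when the session ends).

```python
class Mailbox:
    """A toy server mailbox: mail arrives via SMTP and waits to be collected."""

    def __init__(self):
        self.messages = []

    def deliver(self, message: str) -> None:
        """Called when a message arrives through SMTP."""
        self.messages.append(message)

    def stat(self):
        """How much mail is waiting: number of messages and total size in bytes."""
        return len(self.messages), sum(len(m.encode("ascii")) for m in self.messages)

    def retr(self, number: int) -> str:
        """Transfer message number `number` (counting from 1) to the receiver."""
        return self.messages[number - 1]

    def dele(self, number: int) -> None:
        """Delete message number `number` from the server mailbox."""
        del self.messages[number - 1]

mailbox = Mailbox()
mailbox.deliver("Here is that file you wanted!")
mailbox.deliver("See you on Friday.")
print(mailbox.stat())
print(mailbox.retr(1))
mailbox.dele(1)
print(mailbox.stat())
```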
    One big limitation of SMTP is that it is a 7-bit system. SMTP is a hangover from earlier computer days when the first seven bits were used for the character and the eighth bit as a parity bit to check that the rest had not got corrupted. Such a system can only be used to send straight ASCII text messages, which is not a lot of use in today’s multimedia age.
    So the Multipurpose Internet Mail Extensions (MIME) standard was introduced. This provides a protocol for encoding 8-bit binary data in a 7-bit format, suitable for transfer by SMTP. It does this using a variety of algorithms, one of the most common being Base64 encoding.
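
Base64 encoding is easy to try for yourself using Python’s standard base64 module. The eight bytes used below are an assumption chosen for illustration (the bytes with which a Word document of this era typically begins); any binary data would do.

```python
import base64

# Assumption: eight bytes of 8-bit binary data, chosen for illustration.
binary = bytes.fromhex("D0CF11E0A1B11AE1")

# Encode into characters that are safe to send over 7-bit SMTP.
encoded = base64.b64encode(binary).decode("ascii")
print(encoded)

# The receiving mail reader reverses the process to recover the original bytes.
decoded = base64.b64decode(encoded)
print(decoded == binary)
```

Every three bytes of binary data become four text characters, so an encoded attachment is roughly a third larger than the original file.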
    MIME allows you to embed binary files within text messages. If your mail reader can understand MIME then it will extract these files and save them on your hard disk. If not, then you will see something like this:

Here is that file you wanted!

    =_NextPart_000_01BC6078.68C3B690 MIME Version: 1.0
Content-Type: application/msword;
name="security.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="security.doc"

0M8R4KGxGuEAAAA......

    =_NextPart_000_01BC6078.68C3B690 ...

The stream of characters just before the end is the encoded data. As far as SMTP is concerned, the encoded attachment is just another part of the mail message.

FTP
As the name suggests, File Transfer Protocol (FTP) is concerned with the transfer of files from one computer to another. In many ways, FTP is similar to SMTP. The FTP client software initiates a call to the FTP server which responds by requesting the user’s name and password. Most FTP servers support the user name ‘anonymous’, allowing you limited access as a guest, and by convention accept the user’s email address as the password. Hence the term ‘anonymous FTP’.
    Once your logon has been accepted you can browse through the server’s directory and locate the file you require, although where you can go and what you can do depends on your access rights. At any point the client can initiate a file transfer between client and server, at which point a second TCP connection is established through which the data is transferred. This second connection is maintained until the file is transferred, at which point it is closed. FTP also supports commands that allow the client to perform standard directory operations on the server, such as renaming or deleting files and creating or removing directories.

HTTP and URLs
The final protocol to cover here is the HyperText Transfer Protocol (HTTP), which is used to transmit the HTML documents that make up the World Wide Web. Whenever you make a request to view a Web page, your Web browser sends out a GET command with parameters stating what file is required from which Web server. The server, once it is found, responds with a message containing a success or failure code, some further header information and any data that was successfully retrieved.
    The file required is specified using a Uniform Resource Locator (URL). A URL provides a unique reference to any resource available on the Internet. A typical (and real) URL is http://www.pixel-factory.com/software/software.html

It is made up as follows:

http:// - The name of the protocol;
www.pixel-factory.com - The full domain name of the Web server;
/software/ - A directory on the server;
software.html - A file in the directory.
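
The same breakdown can be done in Python with the standard urllib.parse module, and the pieces then used to build the GET command the browser sends. The request shown is a minimal sketch: we have assumed HTTP version 1.0 and added a Host header, neither of which is specified above.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.pixel-factory.com/software/software.html"
parts = urlsplit(url)

# Separate the directory on the server from the file within it.
directory, filename = posixpath.split(parts.path)
print(parts.scheme)     # the name of the protocol
print(parts.netloc)     # the full domain name of the Web server
print(directory)        # a directory on the server
print(filename)         # a file in the directory

# Assumption: a minimal HTTP/1.0 request with a Host header.
request = f"GET {parts.path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"
print(request)
```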

All Web pages are prefixed with http:// although not all Web servers are labelled www.
    This article attempts to describe the most important mechanisms and protocols that make the Internet work. We have necessarily glossed over the more complicated aspects of some of the protocols in the interests of clarity, so please regard it as an introduction only!

Copyright © Matt Publishing. All rights reserved. No part of this site may be reproduced without the prior consent of the copyright holder.