by Chris Woodford. Last updated: December 10, 2013.
You've heard of a transportation nightmare called gridlock, where a city's streets become so packed with cars and trucks that no one can go anywhere and the entire place grinds to a halt. But have you heard how the Internet can lock up the same way? The Net is based on a distributed architecture: instead of a single, central point of control, it has lots of similar systems working in parallel, which is why it works incredibly efficiently most of the time. Even so, failures of important cables (like the ones that link countries and continents under the sea), systematic attacks by criminal hackers, or sudden surges in demand all have the power to bring the Net to its knees. As more and more people go online, the chances of Internet gridlock grow steadily greater. What's the solution? One answer is for people to make better use of the Net's distributed architecture using a superbly clever way of sharing files known as BitTorrent (or, to give it its full name, the BitTorrent protocol). Let's take a closer look at how it works!
Photo: "Traffic" congestion could drive the Net toward gridlock. Photo by Warren Gretz courtesy of US Department of Energy/National Renewable Energy Laboratory.
How client-server downloading works
Photo: KTorrent: A BitTorrent "client" for the Linux operating system.
If you've read our article about how the Internet works, you'll know that it uses two kinds of computers linked together:
- Servers are the big powerful machines that hold web pages, downloadable MP3 music files, videos, and all the rest.
- Clients are the small machines we have in our homes and offices that download data from servers.
Browsing a website involves a lengthy conversation between a client and a server: a program called a web browser, running on your computer, sends repeated requests for bits of the website (individual web pages and the text, photo, and multimedia content they contain) to a server, which does its best to oblige.
This system works well most of the time, but if you think about it a little, you can immediately see there's a problem. Suppose you have a server that's hosting some really popular file: the latest MP3 music track from world-beating band The Uber Popular Sharks. Let's say the Sharks release their track one Monday morning after a lengthy marketing campaign telling the world that's exactly what they'll be doing.
The server hosting the Sharks' track is going to be besieged with traffic from all over the world at exactly the same time. Even if it doesn't grind to a halt, it's going to run incredibly slowly, so it could take each person ages to download the track. Worse, all that music is going to congest parts of the Internet linked to the Sharks' server. Suppose the server is based on a small island like Fiji. Chances are, the whole of Fiji's Internet service will be severely degraded just because lots of people are downloading the Sharks' track from a server on the island! The whole exercise is also going to cost the Sharks an absolute fortune in website hosting fees: the more data people download from their server, the more bandwidth the Sharks will use and the more they'll have to pay their ISP (Internet Service Provider), which is pretty crazy if they're a small band without much money.
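To put rough numbers on that bottleneck, here is a minimal Python sketch. It assumes the server's outgoing bandwidth is split equally among everyone downloading at once, which is a simplification. The 5 MB track size and 1000 Mbps uplink are made-up figures for illustration, and the function names are hypothetical.

```python
def seconds_per_download(file_size_mb, uplink_mbps, simultaneous_clients):
    """Time for each client to finish if the uplink is split equally."""
    file_size_megabits = file_size_mb * 8  # 8 bits per byte
    # Each client gets uplink_mbps / simultaneous_clients, so the wait
    # grows in direct proportion to the number of clients.
    return file_size_megabits * simultaneous_clients / uplink_mbps


def total_upload_gb(file_size_mb, downloads):
    """Data the server must send in total (taking 1 GB = 1000 MB)."""
    return file_size_mb * downloads / 1000


if __name__ == "__main__":
    # Assumed figures: a 5 MB track on a server with a 1000 Mbps uplink.
    for clients in (1, 1_000, 100_000):
        wait = seconds_per_download(5, 1000, clients)
        print(f"{clients:>7} clients at once: {wait:,.2f} s each")
    sent = total_upload_gb(5, 100_000)
    print(f"Total sent for 100,000 downloads: {sent:,.0f} GB")
```

In this simple model, the wait for each fan and the total data the band pays for both grow in step with the number of downloads.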
You can see how ridiculous the whole thing gets if you consider what happens if a large number of Sharks fans all live near one another on the opposite side of the world in, say, Seattle. Vast amounts of data are going to be streaming over the Internet between Fiji and Seattle, but because everyone is downloading the same track, it's going to be pretty much the same data making that same stupidly long journey over and over again. Sounds crazy? Wouldn't it be much more sensible if one person in Seattle downloaded the Sharks' track and then shared it with all the other Sharks fans who live nearby? Roughly speaking, that's the idea behind BitTorrent.
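The saving on the long-distance link can be sketched in a few lines of Python. This is an idealized comparison, assuming exactly one copy crosses the ocean in the shared case. The fan count and file size are invented for illustration, and the function names are hypothetical.

```python
def long_haul_mb_client_server(file_size_mb, fans):
    """Every fan fetches their own copy across the long-distance link."""
    return file_size_mb * fans


def long_haul_mb_shared_locally(file_size_mb, fans):
    """One fan fetches the file; the rest copy it from neighbors."""
    return file_size_mb if fans > 0 else 0


if __name__ == "__main__":
    # Assumed figures: a 5 MB track and 10,000 fans in the same city.
    print(long_haul_mb_client_server(5, 10_000), "MB cross the ocean")
    print(long_haul_mb_shared_locally(5, 10_000), "MB cross the ocean")
```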
How BitTorrent works using peer-to-peer
Since there's no single, central computer controlling the Internet and (in theory) every computer that's online is connected indirectly to every other one, it should be possible for any two computers to share information by communicating directly. And it is! This is called peer-to-peer (P2P) communication and it's used by some of the more popular instant messaging (IM) chat programs (as well as controversial file-sharing programs, which earned themselves a bad name when people started using them to share copyrighted music tracks illegally).
Photo: Transmission: Another BitTorrent "client" for the Linux operating system.
BitTorrent is a protocol (a set of rules that different computer systems agree to use) based on P2P that can be used to share large files very efficiently. Suppose the Sharks decide they want to use BitTorrent. They take their music track and publish a small file called a torrent, which describes the track and tells BitTorrent programs where to start looking for it. A computer that holds the original file, in its entirety, and offers it to others is called a seed. For sharing purposes, the file is split up into lots of pieces.
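Splitting a file into pieces can be sketched in Python as follows. In the original BitTorrent protocol, pieces have a fixed length and a SHA-1 hash of each one is recorded in the torrent's metadata, so a downloader can check every piece it receives. The function names and the tiny 1024-byte piece length are assumptions chosen for illustration, and real piece sizes are much larger.

```python
import hashlib
from typing import List


def split_into_pieces(data: bytes, piece_length: int) -> List[bytes]:
    """Cut the data into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces: List[bytes]) -> List[bytes]:
    """Compute one 20-byte SHA-1 digest per piece."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def verify_piece(index: int, piece: bytes, hashes: List[bytes]) -> bool:
    """Check a received piece against the hash published for that index."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    track = b"x" * 2500  # stand-in for the music file (hypothetical data)
    pieces = split_into_pieces(track, 1024)
    hashes = piece_hashes(pieces)
    print(len(pieces), "pieces; last piece is", len(pieces[-1]), "bytes")
    print("piece 0 intact?", verify_piece(0, pieces[0], hashes))
    print("tampered piece accepted?", verify_piece(0, b"junk", hashes))
```

Because each piece can be checked on its own, a downloader can safely accept pieces from strangers and discard any that fail the check.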
Anyone who wants the file uses a program called a BitTorrent client to request it from a seed. The client is sent one of the pieces and gets all the remaining pieces, over a period of time, from other people's computers through P2P communication. At any given moment, each computer is downloading some parts of the file from some of these peers and uploading other parts of the file to other peers. All the computers cooperating in this way at any time are called a swarm. The more popular a file is, the more computers there are in the swarm and the quicker the process is all round.
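The way pieces spread through a swarm can be illustrated with a toy Python simulation. It is a deliberately simplified model, not the real protocol's piece-selection or peer-selection logic. It assumes that time moves in rounds, that each peer downloads at most one piece per round, that peers always prefer to fetch from one another, and that the seed uploads at most one piece per round. The function name and parameters are hypothetical.

```python
import random


def simulate_swarm(num_pieces, num_peers, rng_seed=0):
    """Return (pieces_sent_by_seed, pieces_sent_by_peers, rounds)."""
    rng = random.Random(rng_seed)
    all_pieces = set(range(num_pieces))
    peers = [set() for _ in range(num_peers)]  # pieces each peer holds
    from_seed = from_peers = rounds = 0

    while any(have != all_pieces for have in peers):
        rounds += 1
        # Peers can only upload what they held when the round began.
        snapshot = [set(have) for have in peers]
        seed_free = True  # assumption: the seed uploads one piece per round

        for i, have in enumerate(peers):
            if have == all_pieces:
                continue
            # Pieces this peer lacks that some other peer can supply.
            on_offer = set()
            for j, other in enumerate(snapshot):
                if j != i:
                    on_offer |= other - have
            if on_offer:
                have.add(rng.choice(sorted(on_offer)))
                from_peers += 1
            elif seed_free:
                have.add(rng.choice(sorted(all_pieces - have)))
                from_seed += 1
                seed_free = False

    return from_seed, from_peers, rounds


if __name__ == "__main__":
    sent_by_seed, sent_by_peers, rounds = simulate_swarm(8, 5)
    print("pieces uploaded by the seed:", sent_by_seed)
    print("pieces uploaded by peers:  ", sent_by_peers)
    print("rounds needed:             ", rounds)
```

Under these assumptions the seed ends up sending each piece only once, and every other copy is passed along by the peers themselves.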
Share and share alike is the ethos behind BitTorrent so, when people have finished downloading a file, they are encouraged to stay online for a while so they can continue uploading the file to others in the swarm—an activity known as seeding. Quitting from a swarm the minute your download is complete, without seeding, is a selfish activity that's earned itself the nickname leeching! If everyone leeched, BitTorrent wouldn't work at all.
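Many BitTorrent programs summarize how generous a user has been with a share ratio: the amount uploaded divided by the amount downloaded. A minimal sketch of that calculation, with a hypothetical function name and made-up byte counts, might look like this:

```python
def share_ratio(uploaded_bytes, downloaded_bytes):
    """Uploaded divided by downloaded; 1.0 means giving back as much as taken."""
    if downloaded_bytes == 0:
        # An original seed downloads nothing, so its ratio is unbounded.
        return float("inf") if uploaded_bytes > 0 else 0.0
    return uploaded_bytes / downloaded_bytes


if __name__ == "__main__":
    # Hypothetical users who each downloaded a 5,000,000-byte track.
    print("leecher:", share_ratio(0, 5_000_000))
    print("seeder: ", share_ratio(10_000_000, 5_000_000))
```

A user who quits the moment the download ends typically finishes with a ratio well below 1, while one who keeps seeding can push it above 1.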
Although BitTorrent is a decentralized P2P process very different from old, client-server-type downloading, there has to be some sort of order and control. Something has to keep track of which computers are sharing the file so that peers can find one another and work out who has which pieces. This works in different ways with different BitTorrent clients. Some rely on centralized computers called trackers which, as their name suggests, keep track of the computers in the swarm at any moment. There is also a more decentralized version of BitTorrent where the clients manage the tracking process among themselves (sometimes called trackerless torrents or distributed torrents).
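A tracker's bookkeeping can be pictured as a lookup table from each shared file to the computers currently in its swarm. The Python sketch below is an in-memory toy, not a real tracker: the class, its method names, and the example addresses are all hypothetical, and it records only which peers are in a swarm, leaving the peers to ask one another which pieces they hold.

```python
from collections import defaultdict


class ToyTracker:
    """Remembers which peers are currently sharing which torrent."""

    def __init__(self):
        self._swarms = defaultdict(set)  # torrent id -> set of peer addresses

    def announce(self, torrent_id, peer_address):
        """Register a peer and return the other peers already in the swarm."""
        others = sorted(self._swarms[torrent_id] - {peer_address})
        self._swarms[torrent_id].add(peer_address)
        return others

    def leave(self, torrent_id, peer_address):
        """Remove a peer that has gone offline."""
        self._swarms[torrent_id].discard(peer_address)


if __name__ == "__main__":
    tracker = ToyTracker()
    # Made-up addresses from an IP range reserved for documentation.
    print(tracker.announce("sharks-track", "198.51.100.1:6881"))
    print(tracker.announce("sharks-track", "198.51.100.2:6881"))
    tracker.leave("sharks-track", "198.51.100.1:6881")
    print(tracker.announce("sharks-track", "198.51.100.3:6881"))
```

In a trackerless setup, the clients would share out this lookup table among themselves rather than relying on one central computer.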