C SC 100 Lecture Notes
Spring 2008
Pete Sanderson
major resource: Computer Confluence (Complete), Seventh Edition, Beekman and Quinn,
Pearson Prentice Hall, 2006
Chapter 9, The Evolving Internet (part 1)
The Internet
A global network of networks
Incredibly brief history:
- 1960s: begun under DARPA (Defense Advanced Research Projects Agency) in
DoD, to provide military facilities with resilient communications
- 1970s: most of the communication protocols were developed and standardized
as growth spread naturally to academia.
- 1980s: widespread academic use for email, newsgroups, document transfer,
and first major "worm"
- 1990s: the WWW sprang forth and all hell broke loose.
- diagram of 1981 ARPAnet to illustrate explosive growth since then
Supports many client/server services: WWW, email, ftp, telnet, newsgroups, audio/video
streaming, etc. Supports many peer-to-peer services also
Protocols
The word protocol means the set of rules that govern communication
(think of the protocol for making a phone call).
All network communication must follow established protocols.
Different services use different protocols: WWW, email, ftp, telnet, newsgroups,
IM, etc
The Internet is a point-to-point packet-switched network
- this means that data are
passed from router to router to form a path through the Internet forest
that lies between sender and receiver.
- like all the switches that are set to
connect a phone call.
- Each message is cut up into packets and each packet makes its way through
the Internet independent of the others.
- packets may arrive at their destination in the wrong order (or damaged or not arrive at all!)
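A minimal sketch of the packet idea above, for illustration only. The sequence-number scheme and the function names (to_packets, reassemble) are assumptions made for this example, not the real packet format, and lost or damaged packets are not handled.

```python
import random

def to_packets(message, size):
    """Cut a message into (sequence_number, chunk) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Put packets back in sequence-number order and join the chunks."""
    return "".join(chunk for _seq, chunk in sorted(packets))

message = "Each packet makes its way through the Internet independently."
packets = to_packets(message, 8)
random.shuffle(packets)            # simulate packets arriving in the wrong order
restored = reassemble(packets)
print(restored == message)
```

Because every packet carries its own sequence number, the receiver can rebuild the message no matter what order the packets arrive in.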
The opposite of point-to-point is a broadcast network (like radio
or TV) in which the data are broadcast to many computers at once.
Most Internet protocols are designed to follow a client-server relationship
between sender and receiver.
Client/server means:
- service is always available at "well known" address known as server.
- client can be anywhere and its address is not known in advance to server
- client issues requests to server, using the accepted protocol for that
service.
- server responds to request, again according to protocol for that service
- one server computer can provide multiple services, each through a
different "port"
Anyone can provide a new service by
- designing custom communication protocol
- selecting an unused "port" (there are thousands, since they are not physical)
- designing and implementing client and server software
- distributing client software to potential users
- running the server software at well-known address
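The steps above can be tried on a single machine. This is a sketch under stated assumptions: the one-line "protocol" (server replies HELLO plus the request in upper case) is invented for the example, and both client and server run on the loopback address 127.0.0.1 rather than across the Internet.

```python
import socket
import threading

HOST = "127.0.0.1"   # loopback address, so nothing leaves this machine

def serve_once(listener):
    """Server side: accept one client, read one request, send one reply."""
    conn, _client_addr = listener.accept()   # client's address is learned only now
    with conn:
        request = conn.recv(1024).decode("ascii")
        conn.sendall(("HELLO " + request.upper()).encode("ascii"))

def ask(port, text):
    """Client side: connect to the server's address and port, send a request."""
    with socket.create_connection((HOST, port)) as client:
        client.sendall(text.encode("ascii"))
        return client.recv(1024).decode("ascii")

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((HOST, 0))     # port 0 asks the operating system for any unused port
listener.listen(1)
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()
reply = ask(port, "otterbein")
worker.join()
listener.close()
print(reply)   # HELLO OTTERBEIN
```

A real service would pick one fixed port number and publish it, so that clients know where to connect.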
Distinguish peer-to-peer from client/server
- WWW is client/server
- peer-to-peer means that user's computers communicate directly with each
other without any centralized server
- peer-to-peer (P2P) transmission protocols underlie most free file sharing
services (e.g. KaZaA)
- Many are derived from Gnutella, a ground-breaking P2P file sharing protocol
- LimeWire and BearShare services are based on Gnutella
- Another well-known P2P protocol is BitTorrent (see BitTorrent.com)
- Don't be fooled by the term BitTorrent client, the software you install for downloading. Although
called "client" as in client-server, this is still P2P.
- some free file sharing services have been shut down as illegal. Notorious examples include
- Napster in 2001 (not pure P2P)
- Grokster in 2005 (not to be confused with Grokker the search engine!)
- details below
- their legal problems concern copyright violation of content transmitted
- P2P itself is completely legal and has many legitimate uses
- Original Napster was a combination of client-server and peer-to-peer. This was its
legal downfall (in 2001)
- client connects to server and requests song
- server finds match from among active peers (other clients)
- server responds to client with peer's address
- client downloads file directly from peer
- Napster has been reborn as a paid subscription service
- Grokster was shut down following a unanimous Supreme Court ruling in 2005
- pure P2P, so no server problems
- justices walked a fine line, preserving Sony Betamax precedent "significant
non-infringing uses"
- Grokster through its advertising encouraged infringing use
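The original Napster lookup described above (central index, direct download from a peer) can be sketched as follows. The class names, song titles and addresses are all hypothetical, and everything runs in memory rather than over a network.

```python
class IndexServer:
    """Central index: knows which peer has which song, but stores no songs."""
    def __init__(self):
        self.index = {}                      # song title -> peer address

    def register(self, peer_address, titles):
        for title in titles:
            self.index[title] = peer_address

    def find(self, title):
        return self.index.get(title)         # None if no active peer has it


class Peer:
    """An ordinary user's computer that shares its own files."""
    def __init__(self, address, songs):
        self.address = address
        self.songs = dict(songs)             # song title -> file contents

    def send_file(self, title):
        return self.songs[title]


def download(server, peers, title):
    """Ask the server who has the song, then fetch it directly from that peer."""
    address = server.find(title)
    if address is None:
        return None
    return peers[address].send_file(title)


server = IndexServer()
peers = {
    "10.0.0.7": Peer("10.0.0.7", {"song-a": b"AAAA"}),
    "10.0.0.9": Peer("10.0.0.9", {"song-b": b"BBBB"}),
}
for peer in peers.values():
    server.register(peer.address, peer.songs)

print(download(server, peers, "song-b"))
```

Note that the file never passes through the server; only the lookup does. A pure P2P design removes the IndexServer and has peers ask each other.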
A technology called TCP/IP manages all Internet communication!
- TCP = Transmission Control Protocol
- IP = Internet Protocol
- TCP handles the client-server protocols and the P2P protocols.
- IP handles the point-to-point packet switching (above) and Internet addressing (next topic)
- IP works at a lower level -- TCP depends on IP to deliver its packets
- If TCP/IP were a pizza service, TCP would control how you order the pizza and IP would deliver it!
Internet addresses and names
Every computer on the Internet has a unique IP address
- 4 numbers separated by dot (.) ("dotted decimal")
- each number is in range 0-255 (8 bits)
- example: 205.133.226.114
- IP addresses are allocated by the Internet Assigned Numbers Authority (details at
www.iana.org)
- internally, stored as 32 bit number
- You can use the numbers directly but they are hard to remember
- names are easier to remember!
- The machine at 205.133.226.114 is better known as www.otterbein.edu
- There is a service called NSLOOKUP that will give you the IP address for a given name.
See http://www.kloth.net/services/nslookup.php
- routers use only the number, though
- translation from name to number is done by the domain name system (DNS)
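To see how the four dotted-decimal numbers and the single 32-bit number relate, here is a small sketch. The function names are made up for this example; Python's standard ipaddress module does the same conversion and is used here only as a cross-check.

```python
import ipaddress

def dotted_to_int(dotted):
    """Pack four numbers (0-255 each, 8 bits each) into one 32-bit number."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected 4 numbers in range 0-255: " + dotted)
    value = 0
    for p in parts:
        value = (value << 8) | p     # shift left 8 bits, then add the next number
    return value

def int_to_dotted(value):
    """Unpack a 32-bit number back into dotted decimal."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

address = "205.133.226.114"
number = dotted_to_int(address)
print(number, int_to_dotted(number))
print(number == int(ipaddress.IPv4Address(address)))   # cross-check with the standard library
```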
Every IP address can have a name (some do not)
- a full name typically has 3 components, separated by dots (.) in this order:
- machine or server name
- organization name
- top-level domain name
- note the hierarchy of those components. Similar to hierarchy in person's address:
- house number
- street
- city
- state
- country
- read in reverse order, each component narrows down the location
- machine name
- is chosen by organization
- no outside permission required.
- must be unique within the organization
- organization name
- is chosen by organization
- authorized, ultimately, by the Internet Corporation for Assigned Names and Numbers (ICANN)
www.icann.org
- must be unique within the top-level domain
- top-level domain name
- these are fixed
- original list: .edu, .com, .gov, .mil, .net, .org
- some have been added, like .name, .biz, .info, and more
- there are also 2-letter country codes that may be appended, like .us for United States,
.jp for Japan, and so on.
- when you purchase a domain name, you get the combined organization and top-level domain name
- DNS can have multiple entries "aliases" for a given address (e.g. www.amazon.com and amazon.com)
- all the different Internet services use IP addresses and DNS
WWW - an Internet service
Most well-known Internet service, but is only one of many Internet services.
Documents are transmitted based on HTTP HyperText Transfer Protocol
HTTP and web browser first developed in the early 1990s
Tim Berners-Lee, then at CERN (European Organization for Nuclear Research) in Switzerland
He is still instrumental in the WWW Consortium or W3C (www.w3.org)
Web documents are identified by a Uniform Resource Locator, URL.
URL has this general structure: protocol://hostname/pathname
- protocol identifies the specific protocol needed to transfer the
document.
- usually HTTP
- sometimes it is FTP (File Transfer Protocol)
- sometimes it is FILE (which means it is a file on local network or on
your computer)
- hostname identifies the web server which stores the document.
- hostname consists of (optional) server name followed by period (.) followed
by domain name.
- server name is usually "www"
- Domain name is 2 components, separated by period(.).
- first component identifies the organization ("osu", "otterbein", "yahoo")
- second component is top-level domain (TLD): identifies type of organization.
".com", ".edu" etc
- There is a central registry of domain names, controlled by the Internet
Corporation for Assigned Names and Numbers or ICANN (www.icann.org).
- anyone can register an unused domain name for a small fee.
- pathname identifies the document's file name along with path through
folder hierarchy to locate it
- components in pathname are separated by "/" and the hierarchy is read
from left to right
- if pathname is omitted or ends with a folder name, browser will try
several standard file names ("index.html", "default.htm", etc)
Example: http://faculty.otterbein.edu/PSanderson/index.html
Protocol is http
Hostname is faculty.otterbein.edu (domain name is otterbein.edu
and faculty is the server)
Pathname is PSanderson/index.html
(PSanderson is a folder that contains file index.html)
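The same breakdown can be done in code. This sketch uses urlsplit from Python's standard library; the describe function is made up for this example, and it assumes the domain name is exactly the last 2 components of the hostname, which is not true for every name (some country-code names use 3).

```python
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into the parts named in these notes."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "protocol": parts.scheme,
        "hostname": host,
        "server": ".".join(labels[:-2]),     # whatever comes before the domain name
        "domain": ".".join(labels[-2:]),     # assumed: organization + top-level domain
        "pathname": parts.path.lstrip("/"),
    }

info = describe("http://faculty.otterbein.edu/PSanderson/index.html")
for key, value in info.items():
    print(key, "=", value)
```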
After fetching a document, the browser will display it.
- When sending file, server includes a description of the file type (text,
image, etc)
- Browser uses this description to decide how to display the file
- Basic web page is encoded using HyperText Markup Language (HTML)
- You can view the HTML of a web page from your browser -- View menu, select
Source (Internet Explorer) or Page Source (Firefox)
- HTML Text formatting is indicated by "tags" which are normally specified
in pairs around the applicable text.
- For example, to display the word "finally" in bold, the HTML is <B>finally</B>
- For example, to center the sentence "Are you finally home?", the
HTML is <CENTER>Are you <B>finally</B> home?</CENTER>
- We will examine HTML more closely next class
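To illustrate how paired tags apply to the text between them, here is a rough sketch using HTMLParser from Python's standard library. The TagTracker class is made up for this example and assumes well-formed, properly nested tag pairs; a real browser is far more forgiving.

```python
from html.parser import HTMLParser

class TagTracker(HTMLParser):
    """Record each piece of text together with the tags in effect around it."""
    def __init__(self):
        super().__init__()
        self.open_tags = []      # tags opened but not yet closed
        self.pieces = []         # (text, tags applied to that text)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)           # tag names arrive in lower case

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        self.pieces.append((data, tuple(self.open_tags)))

tracker = TagTracker()
tracker.feed("<CENTER>Are you <B>finally</B> home?</CENTER>")
tracker.close()
for text, tags in tracker.pieces:
    print(repr(text), tags)
```

Only the word "finally" is inside both the CENTER pair and the B pair, so it is the only text that is both centered and bold.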
Email - an Internet service
Components are:
User agent (mail reader)
Mail server
- holds received messages (mailbox)
- sends messages (outgoing message queue)
Protocol between mail servers: SMTP
Protocol between user agent and mail server:
POP, IMAP, HTTP
Electronic trail of email from Alice to Bob:
- Alice invokes user agent to compose and send message
- Alice's user agent sends message to mail server
message queue
- Alice's mail server becomes client in SMTP connection
with Bob's mail server (if latter is not available, hold in queue and
try later).
- Alice's mail server transmits message to Bob's mail
server via SMTP
- Bob's mail server puts message in Bob's mailbox
- Bob invokes user agent (eventually) to read
message
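The trail above, including the "hold in queue and try later" case, can be imitated in memory. This is only a stand-in: the MailServer class, its method names and the example.org addresses are invented for the sketch, and no real SMTP traffic is involved.

```python
from collections import deque

class MailServer:
    def __init__(self, domain):
        self.domain = domain
        self.mailboxes = {}          # user name -> list of (sender, text)
        self.outgoing = deque()      # outgoing message queue
        self.online = True

    def submit(self, sender, recipient, text):
        """A user agent hands a message to its own mail server."""
        self.outgoing.append((sender, recipient, text))

    def accept(self, sender, recipient, text):
        """Another mail server delivers a message; file it in the mailbox."""
        user = recipient.split("@")[0]
        self.mailboxes.setdefault(user, []).append((sender, text))

    def flush(self, directory):
        """Try to deliver everything in the queue; keep what cannot go yet."""
        for _ in range(len(self.outgoing)):
            sender, recipient, text = self.outgoing.popleft()
            target = directory[recipient.split("@")[1]]
            if target.online:
                target.accept(sender, recipient, text)
            else:
                self.outgoing.append((sender, recipient, text))   # try later

alice_server = MailServer("a.example.org")
bob_server = MailServer("b.example.org")
directory = {s.domain: s for s in (alice_server, bob_server)}

bob_server.online = False
alice_server.submit("alice@a.example.org", "bob@b.example.org", "Lunch?")
alice_server.flush(directory)
held = len(alice_server.outgoing)        # message is still waiting in the queue

bob_server.online = True
alice_server.flush(directory)
print(held, bob_server.mailboxes)
```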
SMTP : Simple Mail Transfer Protocol
- protocol between mail servers
- If done properly, you can fake a mail server into believing that you are
another mail server.
- This is possible because SMTP is exactly that: Simple.
- Security was not a concern in the early Internet days because only a small
number of trusted individuals and organizations used it.
- MIME : Multipurpose Internet Mail Extensions
- needed because SMTP assumes email message is one textual piece.
- SMTP can't handle binary data (e.g. images, audio, etc)
- SMTP can't handle attachments
- MIME at sender "packages" the message and all attachments into one piece
and encodes binary files in ASCII.
- receiver extracts the components and decodes ASCII back into binary
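Python's standard email package can show the MIME packaging step described above. In this sketch the addresses, the file name pixel.bin and the package/unpack function names are made up; the attachment is just 256 arbitrary bytes standing in for an image.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def package(text, attachment, filename):
    """Sender side: wrap text and a binary attachment into one message."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Photo"
    msg.set_content(text)
    # Binary data is encoded as ASCII text (base64) inside the message.
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg.as_bytes()

def unpack(wire):
    """Receiver side: pull the text and the decoded attachments back out."""
    msg = message_from_bytes(wire, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",)).get_content()
    files = {part.get_filename(): part.get_content()
             for part in msg.iter_attachments()}
    return body, files

payload = bytes(range(256))                  # binary data, not valid text
wire = package("See attached.", payload, "pixel.bin")
ascii_text = wire.decode("ascii")            # works: the packaged message is pure ASCII
body, files = unpack(wire)
print(body.strip(), files["pixel.bin"] == payload)
```

Printing ascii_text shows the single textual piece that the mail servers actually pass along.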
Pete Sanderson (PSanderson@otterbein.edu)