C SC 481.20 Lecture 3: Intro to Application Layer and HTTP
major resource: Computer Networking (4th Edition), Kurose and Ross, Addison Wesley, 2008

Introducing the Application Layer

Our focus will be on

Networked applications structured for the Internet
Application layer of Internet protocol stack
Applications that rely on Transport layer services (TCP, UDP)
Network edge issues (not network core)
Client-server paradigm
Peer-to-peer (P2P) paradigm
Hybrid client-server and P2P

Some basic questions

What are some example networked applications?
What kinds of devices do networked applications run on?
How much do you need to know about the Internet's structure and protocols to write a networked application?
What is the difference between a networked application and Application Layer protocols?

Client-Server paradigm

application consists of two major software components: client, and server
Name a couple such applications
What are the characteristics of a server?
What are the characteristics of a client?
Put another way, what concerns does a server have that a client does not? (consider scalability and availability)

P2P paradigm

application consists of one major software component: peer
Name a couple such applications
Peers take on characteristics of both client and server when communicating with each other
Comment on scalability of P2P system versus client-server

Client-Server vs. P2P

Advantages of client-server over P2P?
Advantages of P2P over client-server?

Hybrid of Client-Server and P2P

application consists of two major software components: servers and peers
Name a couple such applications -- structure not obvious from user view
Peers act as both client and server when communicating with each other
Peers act as client when communicating with server

Inter-process communication through sockets

A process is a running program
How does a client process communicate with server process to make request?
How does a server process communicate with client process to give response?
Both happen through sockets, the API to the transport layer
Sockets are message-passing mechanism between application and transport layers
Also applies to P2P because it is fundamentally client-server-based, client initiating request and server responding
Sockets API (library) is fairly easy to use in C, C++, Java and other languages
Java provides high-level support (more later)
How to address the process at the other end?

IP address (32 bits in IPv4, 128 bits in IPv6), network layer, identifies machine
port number: 16-bit integer that identifies the process on that machine
There could be several Internet processes running at that IP address
Each service has standard port number, so sender knows what to use (e.g. 80 for HTTP)
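
The addressing idea above (IP address plus port number) can be sketched in Python. This is only a minimal illustration, not course code: the names serve_once, request_response and demo are made up for this example, everything runs on the loopback address, and the port is chosen by the operating system rather than being a standard service port.

```python
import socket
import threading

HOST = "127.0.0.1"  # loopback address: client and server share one machine here


def serve_once(listener):
    """Server side: accept one client, read its request, send a response."""
    conn, _client_addr = listener.accept()
    with conn:
        request = conn.recv(1024)  # enough for the short demo message
        conn.sendall(b"response to: " + request)
    # leaving the "with" block closes the connection, which tells the client
    # that the response is complete


def request_response(port, message):
    """Client side: address the server process by (IP address, port)."""
    with socket.create_connection((HOST, port)) as sock:
        sock.sendall(message)
        chunks = []
        while True:  # read until the server closes the connection
            chunk = sock.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


def demo():
    # Port 0 asks the OS for any free port. A real service would bind its
    # standard port (e.g. 80 for HTTP) so that clients know what to use.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((HOST, 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        reply = request_response(port, b"hello")
        server.join()
        return reply


if __name__ == "__main__":
    print(demo())
```

Note that the client initiates and the server responds, which is the same pattern two peers follow in a P2P exchange.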

Application layer protocols

recall that protocol includes message formats, sequences, and processing
Some protocols (e.g. HTTP) are public standards, see e.g. http://www.ietf.org/rfc.html
Other protocols (e.g. Skype) are proprietary
Consider advantages and disadvantages of each approach
Analogy to situation in 1980s with open IBM-PC architecture versus closed Macintosh architecture
Analogy to situation today with open MP3 format and closed iTunes DRM (digital rights management)

Transport Layer services (Application Layer protocols are built on them)

You will learn much more about this in the next chapter
TCP provides reliable, in-order, connection-oriented byte-stream service on top of a packet-switched network
UDP provides unreliable connectionless service based on same network
Neither provides throughput guarantees; TCP's flow control (local, protects the receiver) and congestion control (global, protects the network) adjust the sending rate but guarantee no minimum
Neither provides timing (delay) guarantees
Neither provides security through encryption or other means
SSL (Secure Sockets Layer), since succeeded by TLS, provides security as an intermediate layer between the socket and TCP
Consider transport needs of applications

What applications require reliability but not throughput, timing or security?
What applications require reliability and security but not throughput or timing?
What applications require throughput but not the other three?
What applications require throughput and timing but not reliability or security?

Why use UDP?
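
For contrast with TCP, here is a minimal Python sketch of UDP's connectionless service. It is an illustration under assumptions: udp_round_trip is a made-up name, and both sockets live on the loopback address, where loss is unlikely. On a real network the datagram could be lost and nothing would resend it.

```python
import socket

HOST = "127.0.0.1"


def udp_round_trip(message):
    """Send one datagram to a local receiver and return what it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((HOST, 0))      # port 0: let the OS pick a free port
        receiver.settimeout(2.0)      # UDP never reports loss, so we time out
        port = receiver.getsockname()[1]
        # No connection setup and no handshake: the datagram is simply
        # addressed to (IP address, port) and sent.
        sender.sendto(message, (HOST, port))
        data, _sender_addr = receiver.recvfrom(2048)
        return data


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

The absence of connection setup and retransmission is one reason an application might accept UDP's unreliability.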

Application layer protocol: HTTP

HTTP: HyperText Transfer Protocol
Hypertext origins with Vannevar Bush and Ted Nelson, term coined by Nelson
See RFC 1945 http://www.ietf.org/rfc/rfc1945.txt
See RFC 2616 http://www.ietf.org/rfc/rfc2616.txt
Uses TCP transport service
HTTP requests and responses use TCP connection

"3-way handshake" to establish
sender --> REQUEST (SYN) --> receiver
receiver --> ACKNOWLEDGE (SYN+ACK) --> sender
sender --> ACKNOWLEDGE (ACK) --> receiver

The REQUEST step sends a SYN segment; attackers misuse it to create a "SYN flood" denial-of-service (DoS) attack
In non-persistent HTTP, each file (object) of a web page is sent over a different TCP connection
In persistent HTTP, the files of a web page are sent over the same TCP connection
HTTP is stateless: each request is seen by server as independent of all others
Statelessness is a separate issue from persistence: persistence is about reusing the TCP connection, and the server still treats each request on that connection independently.
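
The difference between non-persistent and persistent HTTP can be observed with Python's standard library. The following is a sketch under assumptions, not a measurement of any real site: CountingHandler, fetch_all and count_connections are names invented here, the server is a throwaway local one, and the three paths stand in for the files of one web page.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep a connection open
    connections = 0                # TCP connections accepted so far

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = ("contents of " + self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_all(port, paths, persistent):
    """Fetch every path; reuse one TCP connection only if persistent."""
    bodies = []
    conn = None
    for path in paths:
        if conn is None:
            conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", path)
        bodies.append(conn.getresponse().read())
        if not persistent:
            conn.close()   # non-persistent: one connection per file
            conn = None
    if conn is not None:
        conn.close()
    return bodies


def count_connections(paths, persistent):
    """Run a local server, fetch the paths, report TCP connections used."""
    CountingHandler.connections = 0
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        fetch_all(server.server_port, paths, persistent)
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
    return CountingHandler.connections


if __name__ == "__main__":
    page = ["/index.html", "/logo.png", "/style.css"]
    print("non-persistent:", count_connections(page, persistent=False))
    print("persistent:", count_connections(page, persistent=True))
```

In both modes the handler answers each GET without reference to earlier ones, which is the statelessness point: persistence changes only how many TCP connections carry the requests.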

HTTP protocol specifics

HTTP commands are plain text
GET command used to request a page

First line contains GET command, subsequent lines contain header field name-value pairs
header field values give the server information about the client (which browser, preferred language, TCP persistence, etc)

Response header also plain text

First line (the status line) contains protocol version, status code and status phrase (e.g. 200 OK, 404 Not Found, etc)
Followed by header lines with info from server (server type, file type, file modification time, file length)
Header followed by file contents

Other commonly-used HTTP commands include POST (request containing data from form as request body) and HEAD (request the response header but not file contents)
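
Because requests and response headers are plain text, they can be built and taken apart with ordinary string handling. The sketch below is illustrative only: build_get_request and parse_response are made-up helper names, the User-Agent value is hypothetical, and real code would normally rely on an HTTP library instead.

```python
def build_get_request(host, path):
    """Return the bytes of a simple GET request for the given host and path."""
    lines = [
        f"GET {path} HTTP/1.1",           # first line: the GET command
        f"Host: {host}",                  # header field name-value pairs
        "User-Agent: lecture-demo/0.1",   # hypothetical browser identification
        "Accept-Language: en",            # preferred language
        "Connection: close",              # ask for a non-persistent connection
    ]
    # Each line ends with CR LF; an empty line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, phrase, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), phrase, headers, body


if __name__ == "__main__":
    print(build_get_request("www.example.com", "/index.html").decode())
    sample = (b"HTTP/1.1 404 Not Found\r\n"
              b"Content-Type: text/html\r\n"
              b"Content-Length: 9\r\n"
              b"\r\n"
              b"not here!")
    print(parse_response(sample))
```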

Maintaining State

As indicated above, HTTP does not maintain state
Clearly, "stateful" activities occur on the web, e.g. e-commerce, so how?
Here are a couple techniques

Client request contains parameter values appended to URL. Also used to pass form data to server. Example:
http://www.google.com/search?hl=en&q=http+state+maintenance&btnG=Google+Search
is generated when http state maintenance typed into Google search field.
Cookies: small files on client disk, created by the browser at the server's request, that hold state info (typically an I.D. the server uses to look up its own records)
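
The first technique, parameter values appended to the URL, can be reproduced with Python's urllib.parse module. This is a small sketch: build_search_url and read_params are names invented here, and the parameter names are simply those visible in the example URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(query):
    """Append name=value pairs to the URL, as a browser does for form data."""
    params = {"hl": "en", "q": query, "btnG": "Google Search"}
    # urlencode joins the pairs with "&" and encodes each space as "+"
    return "http://www.google.com/search?" + urlencode(params)


def read_params(url):
    """Server side: recover the parameter values from the request URL."""
    query_string = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query_string).items()}


if __name__ == "__main__":
    url = build_search_url("http state maintenance")
    print(url)
    print(read_params(url))
```

The state travels inside each request, so the server itself still has nothing to remember between requests.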

More on cookies; they are established and used through this sequence
1. Initially, client has no cookie for networknut.com
2. Client (browser) connects to networknut.com with a regular GET command
3. networknut.com notices the GET header did not contain a Cookie: header line, so it generates a new cookie number, creates a database entry for that cookie number, and its response message contains a Set-Cookie: header line containing the number.
4. Client parses the Set-Cookie: line and, if permitted, creates the cookie file
5. User clicks a link and client sends another GET command; this time the request header contains a Cookie: line with the I.D. number provided by the server and stored in the cookie
6. Server now knows who the request came from and can proceed accordingly
7. Next time client goes to networknut.com, the initial GET request contains a Cookie: line with the same I.D. and the server immediately knows who is calling and can customize its response.
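
The cookie sequence above can be simulated without any network. The following Python sketch is a simplification under stated assumptions: CookieServer and Browser are made-up classes, headers are plain dictionaries, and the cookie is a bare I.D. number, whereas real Set-Cookie lines carry name=value pairs plus attributes.

```python
import itertools


class CookieServer:
    """Plays the role of networknut.com in the sequence above."""

    def __init__(self):
        self._next_id = itertools.count(1001)  # arbitrary starting I.D.
        self.database = {}                     # cookie I.D. -> visitor record

    def handle_get(self, request_headers):
        cookie_id = request_headers.get("Cookie")
        response_headers = {}
        if cookie_id is None or cookie_id not in self.database:
            # Step 3: no usable Cookie: line, so generate a new I.D., create
            # a database entry, and send the I.D. back in Set-Cookie:
            cookie_id = str(next(self._next_id))
            self.database[cookie_id] = {"visits": 0}
            response_headers["Set-Cookie"] = cookie_id
        # Steps 6 and 7: the server knows who is calling
        record = self.database[cookie_id]
        record["visits"] += 1
        body = f"visitor {cookie_id}, request {record['visits']}"
        return response_headers, body


class Browser:
    """Plays the role of the client; cookie_jar stands in for cookie files."""

    def __init__(self):
        self.cookie_jar = {}  # host name -> cookie I.D.

    def get(self, host, server):
        headers = {}
        if host in self.cookie_jar:
            headers["Cookie"] = self.cookie_jar[host]  # steps 5 and 7
        response_headers, body = server.handle_get(headers)
        if "Set-Cookie" in response_headers:
            self.cookie_jar[host] = response_headers["Set-Cookie"]  # step 4
        return body


if __name__ == "__main__":
    site = CookieServer()
    browser = Browser()
    print(browser.get("networknut.com", site))
    print(browser.get("networknut.com", site))
```

HTTP itself stays stateless here: the state lives in the server's database and in the browser's stored cookie, and the Cookie: line ties the two together.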

Caching and the Web

Caching, as you know, is keeping a "local" copy of something that may be requested in the near future, so that request can be serviced faster. The risk is that the local copy may become outdated, or "stale"
One technique for caching Web content is the browser cache. This retains copies of previously-downloaded files on the browser's file system. See http://www.microsoft.com/windows/ie/ie6/using/howto/customizing/clearcache.mspx for more information concerning Internet Explorer 6 browser cache.
Another technique is the Web cache, or proxy server. This is a server that sits somewhere between your browser and the desired Web server. The browser request goes to the proxy server, which responds if it has the requested object or forwards the request to the real server if it does not.
Note that in both techniques, quicker responses and less network traffic result from a "cache hit"
Client or proxy server can ensure an up-to-date copy by using the conditional GET technique
- "conditional GET" is just a GET command with an If-Modified-Since: header line. Its value is a time-stamp, normally the modification time of the cached copy.
- A server receiving this GET will compare that time-stamp with the modification time for the requested file.
- If the file's modification time is more recent, its contents will be included as the response body (code 200).
- Otherwise, the response will have code 304 (Not Modified) and file contents are not included.
Successful Web caching (high hit rate) is critical to Web performance and lots of research has been directed to it. The text says typical hit rates span a wide range, 20% to 70%. I've also seen a 30%-50% range cited.
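
The server's side of the conditional GET decision can be sketched in a few lines of Python. This is an illustration under assumptions: conditional_get is a made-up function, headers are plain dictionaries, and the file is represented by just its modification time and contents rather than being read from disk.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, file_mtime, file_contents):
    """Return (status code, response headers, body) for a GET request.

    file_mtime must be a timezone-aware UTC datetime.
    """
    # HTTP time-stamps have one-second resolution, so drop microseconds
    file_mtime = file_mtime.replace(microsecond=0)
    stamp = request_headers.get("If-Modified-Since")
    if stamp is not None:
        cached_time = parsedate_to_datetime(stamp)
        if file_mtime <= cached_time:
            # The cached copy is still current: 304, no file contents
            return 304, {}, b""
    # Ordinary GET, or the file changed after the time-stamp: send it
    headers = {"Last-Modified": format_datetime(file_mtime, usegmt=True)}
    return 200, headers, file_contents


if __name__ == "__main__":
    mtime = datetime(2008, 1, 9, 12, 0, 0, tzinfo=timezone.utc)
    status, headers, body = conditional_get({}, mtime, b"<html>...</html>")
    print(status, headers)
    # The cache later revalidates using the Last-Modified value it was given
    revalidate = {"If-Modified-Since": headers["Last-Modified"]}
    print(conditional_get(revalidate, mtime, b"<html>...</html>")[0])
```

A 304 response costs a round trip but no file transfer, which is where the traffic savings come from.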

Peter Sanderson (PSanderson@otterbein.edu)