C SC 481.20 Lecture 5: P2P and Sockets
major resource: Computer Networking (4th Edition), Kurose and Ross, Addison Wesley, 2008

Peer-To-Peer (P2P) Applications and Protocols

Intro and Major Issues

A "leaderless" architecture; all participants are peers
No always-on, well-known server
Major advantages
- Resilient (no single point of failure)
- Scalable (no single bottleneck as demand rises)
- Greater freedom (not at mercy of server)
Major disadvantages
- More network overhead
- Potentially less trustworthy (have to depend on peers)
- anything else?
Major applications based on P2P architecture
- File distribution
- Instant messaging
- Internet Telephony (VoIP)
Scalability example: distributing a file of size F
- client-server with 1 server and N clients
  - Want to distribute file to all clients (a software patch for instance)
  - Total number of bits leaving server serially is N * F
  - Time required is (N * F) / (server upload bandwidth)
  - Total time to distribute from server will be at least this
  - Total time to complete distribution to all clients depends on client download rates
- P2P with N peers
  - Want to distribute file among all peers
  - Analysis is more complex, and based on specific distribution technique
  - One peer initially has file, total number of bits leaving it is at least F
  - Time required by origin is at least F / (origin upload bandwidth)
  - Total number of data bits transmitted by the network of peers is N * F
  - As soon as first bits leave origin, they can be redistributed by receiver
  - Without getting into the math, many links can be active in parallel so overall distribution is faster
- In essence, as N grows:
  - the distribution time in client-server grows linearly
  - the distribution time in P2P grows far more slowly (roughly logarithmically in simple models), because every new peer also adds upload capacity
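The comparison above can be made concrete with a small calculation. The sketch below uses the lower bounds from the fluid model in Kurose and Ross. It assumes an origin plus N downloading peers, and that every peer uploads at the same rate. The class name, method names, and the bandwidth figures in main are hypothetical, chosen only for illustration.

```java
/** Lower bounds on file distribution time (fluid model; a sketch, not a simulator). */
class DistributionTime {

    /** Client-server: the server pushes N copies; the slowest client still has to pull one. */
    static double clientServer(int n, double fileBits, double serverUp, double minClientDown) {
        return Math.max(n * fileBits / serverUp, fileBits / minClientDown);
    }

    /** P2P: the origin sends at least one copy; the whole swarm shares the N * F upload load. */
    static double peerToPeer(int n, double fileBits, double originUp,
                             double minPeerDown, double peerUp) {
        double totalUpload = originUp + n * peerUp; // assumption: every peer uploads at peerUp
        return Math.max(fileBits / originUp,
               Math.max(fileBits / minPeerDown, n * fileBits / totalUpload));
    }

    public static void main(String[] args) {
        double f = 8e9;     // hypothetical file size in bits (1 GB)
        double us = 100e6;  // hypothetical server/origin upload rate, bits per second
        double d = 20e6;    // hypothetical download rate of the slowest client/peer
        double u = 10e6;    // hypothetical upload rate of each peer
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("N=%d  client-server >= %.0f s  P2P >= %.0f s%n",
                    n, clientServer(n, f, us, d), peerToPeer(n, f, us, d, u));
        }
    }
}
```

With these made-up numbers the client-server bound grows in proportion to N, while the P2P bound levels off, because the swarm's total upload capacity grows along with the demand.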
Major P2P issues include
- Joining a network
- Finding what you need
- Downloading
- Contributing to the network by uploading
- Leaving the network

BitTorrent to Illustrate Issues

Joining a network
- Collection of peers participating in distribution of specific file is called a swarm
- A torrent is technically a .torrent file containing information about the data you want to download; the term also refers collectively to the data and information about the desired resource
- Each torrent has a designated tracker node. Join by notifying the tracker
- Tracker responds with some IP addresses of peers in the swarm.
- Important note: the tracker does not have any of the file contents! Only peers and web sites do
- How to find the right tracker? See next topic...
Finding what you need
- In BitTorrent, this means finding the right tracker!
- There are a large number of BitTorrent Search Engines
- For example, see http://www.chinesefortunecalendar.com/Share/BTList.htm for an abridged list of BitTorrent search engines
Downloading
- Tracker provides new peer with IP addresses of selected peers
- New peer attempts TCP connections with these neighboring peers to start receiving
- Files are partitioned into chunks, typically 256 KB
- Find out which chunks your neighboring peers have, then start requesting them
- Which chunks? Request the chunk that fewest neighbors have, to boost its circulation (rarest first)
- This process is repeated periodically; also check in with the tracker occasionally
- Now that you are a neighbor, soon requests will start coming in from others!
- Which requests do you respond to? Those from the peers that have been supplying chunks to you at the highest rate (your top uploaders)!
- Each peer maintains a set of its 4 top uploaders and trades with them (self-balancing mechanism)
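The rarest-first rule from the download steps can be sketched as follows. This is a minimal illustration, not code from any BitTorrent client: the class and method names are hypothetical, each peer's chunk inventory is modeled as a BitSet, and ties are broken by lowest chunk index for simplicity.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Rarest-first chunk selection (illustrative sketch). */
class RarestFirst {

    /** Builds a chunk inventory from a list of chunk indexes. */
    static BitSet chunks(int... indexes) {
        BitSet set = new BitSet();
        for (int i : indexes) set.set(i);
        return set;
    }

    /**
     * Returns the index of a chunk we still need that the fewest neighbors hold,
     * or -1 if no neighbor holds any chunk we are missing.
     */
    static int pickChunk(BitSet have, List<BitSet> neighborChunks, int chunkCount) {
        int best = -1;
        int bestHolders = Integer.MAX_VALUE;
        for (int c = 0; c < chunkCount; c++) {
            if (have.get(c)) continue;                 // already downloaded
            int holders = 0;
            for (BitSet neighbor : neighborChunks) {
                if (neighbor.get(c)) holders++;
            }
            if (holders > 0 && holders < bestHolders) { // rarest chunk that is actually available
                best = c;
                bestHolders = holders;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<BitSet> neighbors = new ArrayList<>();
        neighbors.add(chunks(0, 1, 2));
        neighbors.add(chunks(1, 3));
        neighbors.add(chunks(1, 2));
        System.out.println("request chunk " + pickChunk(chunks(0), neighbors, 4));
    }
}
```

In the example, chunk 3 is held by only one neighbor, so it is requested before chunks 1 and 2.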
Contributing to the network by uploading
- See the downloading algorithm above: if you don't upload, eventually no one will respond to your requests!
- This strongly discourages taking without giving (free-riding)
- The two-way exchanges are essential to torrent performance
- You can remain in the torrent once you've received all chunks, just to upload
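The incentive described here comes from the top-uploader rule: a peer that uploads nothing never ranks among anyone's best suppliers. A minimal sketch of that ranking step is below. The class name, peer labels, and rates are hypothetical, and real clients add further rules that this sketch leaves out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Picks the peers we will answer: those currently uploading to us fastest (sketch). */
class TopUploaders {

    /** Returns up to `slots` peer names, ordered from highest to lowest observed upload rate. */
    static List<String> select(Map<String, Double> uploadRates, int slots) {
        return uploadRates.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(slots)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> rates = new HashMap<>(); // hypothetical rates in KB/s
        rates.put("p1", 50.0);
        rates.put("p2", 10.0);
        rates.put("p3", 80.0);
        rates.put("p4", 0.0);   // a free-rider: takes chunks but uploads nothing
        rates.put("p5", 30.0);
        rates.put("p6", 60.0);
        System.out.println("serve requests from: " + select(rates, 4));
    }
}
```

With 4 slots, the free-rider p4 and the slow uploader p2 are left out, which is the self-balancing effect the notes describe.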
Leaving the network
- Once you've received all chunks, you can leave the swarm
- It is polite to hang around, to seed the file for others just starting out
See www.bittorrent.org, a BitTorrent developer's forum that includes lots of protocol information
There is a decent BitTorrent FAQ page at dessent.net/btfaq

Locating Content in a P2P Network

This is the major P2P issue
In client-server, you know who the server is or find it using a search engine
In P2P, how do you find the content? We saw the BitTorrent solution above.
There is a range of techniques
- Centralized Index
  - A well-known server contains index of who has what
    - ADV: centralized, easy to find content
    - DIS: centralized, single point of failure
    - DIS: does not scale easily (may require additional server resources)
    - For file-sharing systems that offer free downloading of copyrighted materials, server is easily-identifiable legal target
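A centralized index is essentially one lookup table held by the well-known server. The sketch below keeps it in memory; the class name, method names, and peer address strings are hypothetical, and networking is omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory "who has what" index, as a central server might keep it (sketch). */
class CentralIndex {
    private final Map<String, Set<String>> holders = new HashMap<>(); // file name -> peers

    /** Called when a peer announces that it holds a file. */
    void register(String peer, String file) {
        holders.computeIfAbsent(file, k -> new TreeSet<>()).add(peer);
    }

    /** Called when a peer leaves; peers come and go, so the index must be kept current. */
    void unregister(String peer) {
        holders.values().forEach(peers -> peers.remove(peer));
    }

    /** Returns the peers known to hold the file (empty if none). */
    Set<String> lookup(String file) {
        return Collections.unmodifiableSet(holders.getOrDefault(file, Collections.emptySet()));
    }

    public static void main(String[] args) {
        CentralIndex index = new CentralIndex();
        index.register("10.0.0.1:6346", "patch.zip"); // hypothetical peer addresses
        index.register("10.0.0.2:6346", "patch.zip");
        System.out.println(index.lookup("patch.zip"));
    }
}
```

Every lookup and every update goes through this one object, which is what makes content easy to find and also what makes the server a bottleneck and a single point of failure.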
- Query Flooding
  - Index is decentralized, so a peer floods the overlay network with queries
    - ADV: completely distributed
    - DIS: generates lots of overhead traffic (the number of query messages can grow exponentially with search depth)
    - DIS: complex protocols since, unlike servers, peers come and go freely
  - Classic example is original Gnutella, introduced in 2000
    - Known more as underlying protocol than as service
    - Underlying technology for LimeWire among others
    - Brand new peer is given bootstrap list of "neighbors" in overlay network
    - Start search by asking neighbors, who in turn forward the query to all their other neighbors in the overlay network (query flooding)
    - Forwarding repeats up to a maximum depth (typically 7 hops) or until the search ends
    - Content holder responds to requester: originally back along the query path, later directly
    - Gnutella has evolved to be more efficient
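Hop-limited query flooding can be sketched as a breadth-first walk over the overlay graph. This is a simplified single-process model, not the Gnutella wire protocol: the class name, peer names, and message counter are hypothetical, and duplicate queries are simply dropped by peers that have already seen them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hop-limited query flooding over an overlay network (illustrative sketch). */
class QueryFlood {
    private final Map<String, List<String>> neighbors = new HashMap<>();
    private final Map<String, Set<String>> files = new HashMap<>();
    int messagesSent; // query messages generated by the most recent search

    void link(String a, String b) {
        neighbors.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    void share(String peer, String file) {
        files.computeIfAbsent(peer, k -> new HashSet<>()).add(file);
    }

    /** Floods a query from origin, at most maxHops deep; returns the peers that hold the file. */
    Set<String> search(String origin, String file, int maxHops) {
        messagesSent = 0;
        Set<String> hits = new TreeSet<>();
        Map<String, String> receivedFrom = new HashMap<>(); // peer -> who sent it the query first
        receivedFrom.put(origin, null);
        List<String> frontier = Collections.singletonList(origin);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<String> next = new ArrayList<>();
            for (String peer : frontier) {
                for (String nb : neighbors.getOrDefault(peer, Collections.emptyList())) {
                    if (nb.equals(receivedFrom.get(peer))) continue; // never send back to the sender
                    messagesSent++;
                    if (receivedFrom.containsKey(nb)) continue;      // duplicate query is dropped
                    receivedFrom.put(nb, peer);
                    if (files.getOrDefault(nb, Collections.emptySet()).contains(file)) hits.add(nb);
                    next.add(nb);
                }
            }
            frontier = next;
        }
        return hits;
    }
}
```

For a chain of peers A-B-C-D where only D shares the file, a search from A with a limit of 2 hops finds nothing, while a limit of 3 hops finds D. In a densely connected overlay the message counter rises quickly with each extra hop, which is the overhead the notes warn about.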
- Hierarchical Overlay
  - Semi-centralized hybrid: network of super peers, each is "mini server" hub for collection of ordinary peers
  - Super peers are connected to each other, and each contains indexes obtained from its children (ordinary peers)
    - ADV: less overhead than query flooding
    - ADV: scales up very nicely
  - Super peers are peers with high bandwidth connections, typically on college campuses
  - Examples include modern Gnutella (ultrapeers and leaves), FastTrack (underlying protocol for Kazaa)
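The super-peer arrangement can be sketched as a set of small indexes that query one another. The class below is a hypothetical single-process model, not the FastTrack or Gnutella protocol: each super peer indexes the files of its own children and, on a query, also asks the super peers it is directly connected to.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** A super peer holding the index for its children (illustrative sketch). */
class SuperPeer {
    private final Map<String, Set<String>> index = new HashMap<>(); // file -> children holding it
    private final List<SuperPeer> others = new ArrayList<>();       // directly connected super peers

    /** Connects two super peers to each other. */
    void connect(SuperPeer other) {
        others.add(other);
        other.others.add(this);
    }

    /** An ordinary peer attaches and reports the files it shares. */
    void attach(String child, Collection<String> sharedFiles) {
        for (String file : sharedFiles) {
            index.computeIfAbsent(file, k -> new TreeSet<>()).add(child);
        }
    }

    /** Looks only in this super peer's own index. */
    Set<String> localLookup(String file) {
        return index.getOrDefault(file, Collections.emptySet());
    }

    /** Query from a child: answer from the local index, then ask connected super peers. */
    Set<String> query(String file) {
        Set<String> result = new TreeSet<>(localLookup(file));
        for (SuperPeer sp : others) {
            result.addAll(sp.localLookup(file));
        }
        return result;
    }
}
```

Ordinary peers never see query traffic in this model; only the super peers exchange it, which is where the overhead savings relative to full flooding come from.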

Sockets Programming for TCP

Overview of Sockets

API between application and transport layers
(rephrased) Your application program uses them to communicate with network and OS
Internet sockets come in two flavors: TCP-based and UDP-based
We'll focus on TCP:
- Guaranteed sequenced and error-free delivery
- Connection-oriented stream service
- Basic protocol for TCP-based sockets
  - 3-way handshake to establish client-server connection
  - 2-way communication through connection socket
  - close connection socket when finished
Each service and socket communicates over a designated port
Client and server each follow distinct protocol

Client protocol (typical):

1. Request socket connection, wait for confirmation
2. Send request over socket
3. Wait for and receive server reply over socket
4. Repeat send-receive cycle (steps 2 and 3) as required by application protocol
5. Close socket connection
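The client steps map fairly directly onto java.net.Socket. The sketch below is one possible shape, not the in-class demo: it assumes a line-oriented application protocol (one text line per request, one line per reply), and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** TCP client following the typical client protocol (sketch; assumes line-based messages). */
class LineClient {

    /** Sends each request line in turn and returns the reply line received for each. */
    static List<String> exchange(String host, int port, String... requests) throws IOException {
        // Step 1: request the connection; the constructor returns once it is established.
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true); // auto-flush on println
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            List<String> replies = new ArrayList<>();
            for (String request : requests) {   // Step 4: repeat the send-receive cycle
                out.println(request);           // Step 2: send request over socket
                replies.add(in.readLine());     // Step 3: block until the reply line arrives
            }
            return replies;
        }                                       // Step 5: try-with-resources closes the socket
    }
}
```

A call such as `LineClient.exchange("localhost", 6789, "hello")` would need a server listening on that port; the host and port here are placeholders.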

Server protocol (typical):

1. Create a server socket to handle incoming connection requests
2. Wait for incoming connection requests
3. When request arrives, create a connection socket for communication (this leaves server socket free to accept additional connection requests)
4. Receive client request over connection socket
5. Send reply to client over connection socket
6. Repeat receive-send cycle (steps 4 and 5) as required by application protocol
7. Close connection socket
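The server steps can be sketched with java.net.ServerSocket. This is an illustration under assumptions, not the in-class demo: the application protocol (reply with the request line in upper case), the class name, and port 6789 are all hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

/** TCP server following the typical server protocol (sketch; one client at a time). */
class UpperCaseServer {

    /** Accepts one client and answers each request line until the client closes the connection. */
    static void serveOne(ServerSocket serverSocket) throws IOException {
        // Steps 2-3: block until a request arrives; accept() returns the connection socket.
        try (Socket connection = serverSocket.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(connection.getInputStream()));
             PrintWriter out = new PrintWriter(connection.getOutputStream(), true)) {
            String request;
            while ((request = in.readLine()) != null) { // Step 4, repeated as step 6
                out.println(request.toUpperCase());     // Step 5: send reply
            }
        }                                               // Step 7: connection socket is closed
    }

    public static void main(String[] args) throws IOException {
        // Step 1: create the server socket (port number is an arbitrary choice for this sketch).
        try (ServerSocket serverSocket = new ServerSocket(6789)) {
            while (true) {
                serveOne(serverSocket);
            }
        }
    }
}
```

Because serveOne handles the whole conversation before returning, other clients wait in the server socket's queue. A server that must talk to several clients at once would typically hand each connection socket to its own thread.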

Simple Example Java client and server

We will demo and examine simple Java application that uses TCP sockets
The java.net package provides great library support for sockets

End of Material for Exam #1

Peter Sanderson (PSanderson@otterbein.edu)