C SC 481.20 Lecture 5: P2P and Sockets
major resource: Computer Networking (4th Edition),
Kurose and Ross, Addison Wesley, 2008
Peer-To-Peer (P2P) Applications and Protocols
Intro and Major Issues
- A "leaderless" architecture; all participants are peers
- No always-on, well-known server
- Major advantages
- Resilient (no single point of failure)
- Scalable (no single bottleneck as demand rises)
- Greater freedom (not at mercy of server)
- Major disadvantages
- More network overhead
- potentially less trustworthy (have to depend on peers)
- anything else?
- Major applications based on P2P architecture
- File distribution
- Instant messaging
- Internet Telephony (VoIP)
- Scalability example: distributing a file of size F
- client-server with 1 server and N clients
- Want to distribute file to all clients (a software patch for instance)
- Total number of bits leaving server serially is N * F
- Time required is (N * F) / (server upload bandwidth)
- Total time to distribute from server will be at least this
- Total time to complete distribution to all clients depends on client download rates
- P2P with N peers
- Want to distribute file among all peers
- Analysis is more complex, and based on specific distribution technique
- One peer initially has file, total number of bits leaving it is at least F
- Time required by origin is at least F / (origin upload bandwidth)
- Total number of data bits transmitted by the network of peers is N * F
- As soon as first bits leave origin, they can be redistributed by receiver
- Without getting into the math, many links can be active in parallel so overall distribution is faster
- In essence, as N grows:
- the distribution time in client-server grows linearly
- the minimum distribution time in P2P grows far more slowly (sublinearly), because every new peer also adds upload capacity
- Major P2P issues include
- Joining a network
- Finding what you need
- Downloading
- Contributing to the network by uploading
- Leaving the network
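The scalability comparison above can be made concrete with the lower bounds from the standard textbook analysis (Kurose and Ross). The sketch below is illustrative only: it assumes every peer has the same upload rate, and the class name, method names, and sample figures in main are hypothetical rather than taken from the lecture.

```java
final class DistributionTime {
    /**
     * Lower bound on client-server distribution time.
     * The server must push N copies of the file, and the slowest
     * client still needs F / minDown to receive its copy.
     */
    static double clientServer(int n, double fileBits, double serverUp, double minDown) {
        return Math.max(n * fileBits / serverUp, fileBits / minDown);
    }

    /**
     * Lower bound on P2P distribution time, assuming (for simplicity)
     * that all N peers upload at the same rate peerUp.
     */
    static double p2p(int n, double fileBits, double serverUp, double minDown, double peerUp) {
        // Total upload capacity of the system: origin plus all peers.
        double totalUpload = serverUp + n * peerUp;
        double originOnce = fileBits / serverUp;       // origin must send each bit at least once
        double slowestPeer = fileBits / minDown;       // slowest downloader limits completion
        double wholeSwarm = n * fileBits / totalUpload; // N * F bits over the total capacity
        return Math.max(originOnce, Math.max(slowestPeer, wholeSwarm));
    }

    public static void main(String[] args) {
        // Illustrative figures only (assumptions, not measurements).
        double fileBits = 8e9;   // a 1 GB file
        double serverUp = 1e8;   // 100 Mbps origin upload
        double minDown = 5e7;    // 50 Mbps slowest download
        double peerUp = 1e7;     // 10 Mbps per-peer upload
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("N=%d  client-server >= %.0f s  P2P >= %.0f s%n",
                    n,
                    clientServer(n, fileBits, serverUp, minDown),
                    p2p(n, fileBits, serverUp, minDown, peerUp));
        }
    }
}
```

Under these assumptions the client-server bound keeps growing linearly with N, while the P2P bound approaches F / peerUp, since each new peer adds upload capacity along with demand.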
BitTorrent to Illustrate Issues
- Joining a network
- The collection of peers participating in the distribution of a specific file is called a swarm
- A torrent is technically a .torrent file that contains information about the data you want to download; the term also refers to the collective data and information about the desired resource
- Each torrent has a designated tracker node. Join by notifying the tracker
- Tracker responds with some IP addresses of peers in the swarm.
- Important note: the tracker does not have any of the file contents! Only peers and web sites do
- How to find the right tracker? See next topic...
- Finding what you need
- Downloading
- Tracker provides new peer with IP addresses of selected peers
- New peer attempts TCP connections with these neighboring peers to start receiving
- Files are partitioned into chunks, typically 256 KB
- Find out which chunks your neighboring peers have, then start requesting them
- Which chunks? Request the chunk that fewest neighbors have, to boost its circulation (rarest first)
- This process is repeated periodically; also touch base with the tracker occasionally
- Now that you are a neighbor, soon requests will start coming in from others!
- Which requests do you respond to? Those from peers who have been supplying chunks to you at the highest rate (top uploader)!
- Each peer maintains a set of its 4 top uploaders and trades with them (self-balancing mechanism)
- Contributing to the network by uploading
- See the downloading algorithm above: if you don't upload, eventually no one will respond to your requests!
- This effectively eliminates the ability to take without giving (free-riding)
- The two-way exchanges are essential to torrent performance
- You can remain in the torrent once you've received all chunks, just to upload
- Leaving the network
- Once you've received all chunks, you can leave the swarm
- It is polite to hang around, to seed the file for others just starting out
- See www.bittorrent.org, a BitTorrent developer's forum
that includes lots of protocol information
- There is a decent BitTorrent FAQ page at dessent.net/btfaq
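The two selection rules described above (request the rarest chunk first, and respond to the 4 top uploaders) can be sketched as follows. This is a simplified illustration, not the BitTorrent protocol itself: the class and method names are hypothetical, chunk ownership is modeled with a BitSet per neighbor, and ties are broken by lowest chunk index, whereas real clients typically add randomness.

```java
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SwarmPolicy {
    /**
     * Rarest first: among the chunks we do not yet have, pick the one
     * held by the fewest neighbors (but by at least one of them).
     * Returns -1 if no neighbor holds a chunk we still need.
     */
    static int rarestFirst(BitSet have, List<BitSet> neighborChunks, int chunkCount) {
        int best = -1;
        int bestHolders = Integer.MAX_VALUE;
        for (int chunk = 0; chunk < chunkCount; chunk++) {
            if (have.get(chunk)) {
                continue; // already downloaded
            }
            int holders = 0;
            for (BitSet neighbor : neighborChunks) {
                if (neighbor.get(chunk)) {
                    holders++;
                }
            }
            if (holders > 0 && holders < bestHolders) {
                best = chunk;
                bestHolders = holders;
            }
        }
        return best;
    }

    /**
     * Top uploaders: return up to `slots` peer ids, ordered from the
     * highest to the lowest rate at which each peer has been sending to us.
     */
    static List<String> topUploaders(Map<String, Double> rateFromPeer, int slots) {
        return rateFromPeer.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(slots)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

A peer would call rarestFirst whenever it is ready to request another chunk, and re-run topUploaders(rates, 4) periodically as the measured rates change.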
Locating Content in a P2P Network
- This is the major P2P issue
- In client-server, you know who the server is or find it using search engine
- In P2P, how do you find the content? Saw the BitTorrent solution above.
- There are a range of techniques
- Centralized Index
- A well-known server contains index of who has what
- ADV: centralized, easy to find content
- DIS: centralized, single point of failure
- DIS: does not scale easily (may require additional server resources)
- DIS: for file-sharing systems that offer free downloading of copyrighted materials, the server is an easily identifiable legal target
- Query Flooding
- Index is decentralized, so flood the overlay network of peers with index requests
- ADV: completely distributed
- DIS: generates lots of overhead packets on Internet (grows exponentially)
- DIS: complex protocols since, unlike servers, peers come and go freely
- Classic example is original Gnutella, introduced in 2000
- Known more as underlying protocol than as service
- Underlying technology for LimeWire among others
- Brand new peer is given bootstrap list of "neighbors" in overlay network
- Start a search by asking your neighbors, who in turn ask all of their other neighbors in the overlay network, and so on (this is the query flooding)
- The above recurses to a maximum level (typically 7) or until search ends
- Content holder responds back to requester - originally backwards through path but later direct
- Gnutella has evolved to be more efficient
- Hierarchical Overlay
- Semi-centralized hybrid: network of super peers, each is "mini server" hub for collection of ordinary peers
- Super peers are connected to each other, and each contains indexes obtained from its children (ordinary peers)
- ADV: less overhead than query flooding
- ADV: scales up very nicely
- Super peers are peers with high bandwidth connections, typically on college campuses
- Examples include modern Gnutella (ultrapeers and leaves), FastTrack (underlying protocol for Kazaa)
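The query flooding technique described above can be sketched as a hop-limited, breadth-first search over the overlay network. This is a simplified single-process model, not the Gnutella wire protocol: the overlay and each peer's files are plain in-memory maps, and the class and method names are hypothetical. Dropping queries a peer has already seen is an assumption added here so that the flood terminates on overlays with cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QueryFlood {
    /**
     * Floods a query for `file` outward from `start`, at most `ttl` hops,
     * and returns the peers holding the file in the order they were reached.
     *
     * @param neighbors overlay adjacency: peer id -> ids of its neighbors
     * @param files     content held by each peer: peer id -> file names
     */
    static List<String> search(Map<String, List<String>> neighbors,
                               Map<String, Set<String>> files,
                               String start, String file, int ttl) {
        List<String> hits = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        seen.add(start);
        frontier.add(start);

        // Each pass of the loop forwards the query one hop further out.
        for (int hop = 0; hop < ttl && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String peer : frontier) {
                for (String neighbor : neighbors.getOrDefault(peer, Collections.emptyList())) {
                    if (!seen.add(neighbor)) {
                        continue; // this peer already handled the query; drop the duplicate
                    }
                    if (files.getOrDefault(neighbor, Collections.emptySet()).contains(file)) {
                        hits.add(neighbor); // content holder would respond to the requester
                    }
                    next.add(neighbor);
                }
            }
            frontier = next;
        }
        return hits;
    }
}
```

Calling search(neighbors, files, "A", "song.mp3", 7) models the typical maximum depth of 7 mentioned above; a smaller ttl reduces overhead but may miss content held by distant peers.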
Sockets Programming for TCP
Overview of Sockets
- API between application and transport layers
- (rephrased) Your application program uses them to communicate with network and OS
- Internet sockets come in two flavors: TCP-based and UDP-based
- We'll focus on TCP:
- Guaranteed sequenced and error-free delivery
- Connection-oriented stream service
- Basic protocol for TCP-based sockets
- 3-way handshake to establish client-server connection
- 2-way communication through connection socket
- close connection socket when finished
- Each service and socket communicates over a designated port
- Client and server each follow distinct protocol
Client protocol (typical):
1. Request socket connection, wait for confirmation
2. Send request over socket
3. Wait for and receive server reply over socket
4. Repeat send-receive cycle (steps 2 and 3) as required by application protocol
5. Close socket connection
Server protocol (typical):
1. Create a server socket to handle incoming connection requests
2. Wait for incoming connection requests
3. When request arrives, create a connection socket for communication (this leaves server socket free to accept additional connection requests)
4. Receive client request over connection socket
5. Send reply to client over connection socket
6. Repeat receive-send cycle (steps 4 and 5) as required by application protocol
7. Close connection socket
Simple Example Java client and server
- We will demo and examine simple Java application that uses TCP sockets
- The java.net package provides great library support for sockets
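As a minimal sketch of the client and server protocols outlined above, the following uses java.net.Socket and java.net.ServerSocket. It is not necessarily the application demonstrated in lecture: the uppercase-reply service, the class name, and the method names are assumptions chosen for illustration, and for brevity the server handles a single connection and runs in the same process as the client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Locale;

final class UppercaseDemo {
    /** Server side: accept one connection and answer each line with its uppercase form. */
    static void serveOne(ServerSocket serverSocket) {
        try (Socket connection = serverSocket.accept(); // blocks until a client connects
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(connection.getInputStream()));
             PrintWriter out = new PrintWriter(connection.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {     // null: client closed the connection
                out.println(line.toUpperCase(Locale.ROOT));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } // connection socket is closed here; the server socket stays open
    }

    /** Client side: connect, send one request line, and return the one-line reply. */
    static String request(int port, String message) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println(message);   // send request (auto-flushed)
            return in.readLine();   // wait for and receive the reply
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a one-shot server on a free port, performs one exchange, returns the reply. */
    static String roundTrip(String message) {
        try (ServerSocket serverSocket = new ServerSocket(0)) { // port 0: OS picks a free port
            Thread server = new Thread(() -> serveOne(serverSocket));
            server.start();
            String reply = request(serverSocket.getLocalPort(), message);
            server.join(); // server loop ends once the client closes its socket
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

A real server would call accept in a loop and hand each connection socket to its own thread, so that the server socket stays free to accept additional connection requests.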
End of Material for Exam #1
Peter Sanderson (PSanderson@otterbein.edu)