Network Computing Laboratory FeedEx: Collaborative Exchange of News Feeds Seung Jun, Mustaque Ahamad Georgia Institute of Technology WWW 2006.

Presentation transcript:

Network Computing Laboratory FeedEx: Collaborative Exchange of News Feeds Seung Jun, Mustaque Ahamad Georgia Institute of Technology WWW 2006

Korea Advanced Institute of Science and Technology Network Computing Laboratory

Outline
  One-line comment
  Motivation / Problem
  Approach
  Analysis of feed publishing
  Challenges
  Experiments
  Critique

One-Line Comment
  Disseminate web feeds in a distributed (P2P) manner to increase the scalability of web servers
  RSS reveals visitors to content providers
  RSS decouples the fetch operation from reading
  [Figure: RSS delivery to clients A and B, traditional method vs. P2P method]

Motivation & Problem
  RSS/Atom feeds have become increasingly popular
    Published by most traditional media and blogs
  Feeding mechanism
    Server (e.g. nyt.com): updates the feed page as content is added
    RSS reader: polls the server (HTTP request / HTTP response) to check for updates
  Problem: scalability

Approach
  P2P overlay + gossip-based protocol
    P2P: resources grow scalably with service demand
    Gossip: scalable and robust to joins and leaves
  Feature of this overlay
    Does not have to guarantee delivery or delay
  Challenges
    Overlay construction
    Fetching interval determination
    Data dissemination
    Free-riding prevention
    Content searching (?)

Analysis of Feed Publishing
  Methodology
    245 popular feeds monitored for 10 days
    Most popular feeds: information taken from Gmail's web clips and Bloglines
    Feeds fetched every 2 minutes
  Measured
    Publishing rate
    Entry count in a feed
    Entry lifetime

Publishing Rate by Rank
  Great difference between publishers
  Partly follows a Zipf distribution

Entry Count
  Does a high publishing rate mean a higher entry count? No
  Entry lifetimes are short
  → Entries can be lost with infrequent requests

Publishing Rate by Time
  4 types of publishing patterns

Challenges: Overlay Construction (1/2)
  Goal: minimize network management overhead
  Join
    1. Contact a well-known host OR contact previous neighbors
    2. Share subscription set info
    3. Update subscription set info to the network
  Leave
    Soft state
    Update subscription set periodically
  [Figure: gateway, neighbor list, and subscription sets; example (dest, hop) tables: {CNN 0}, {YAHO 0, HANI 1}, {CNN 1}]
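
The (dest, hop) tables and the soft-state leave rule on this slide can be sketched as follows. This is only an illustration under stated assumptions: the class name `SubscriptionTable`, the `ttl` value, and the merge rule (a neighbor's entry is recorded one hop further away, keeping the smallest hop count) are hypothetical and are not taken from the FeedEx prototype.

```python
import time


class SubscriptionTable:
    """Feed name -> (hop count, time last refreshed). Hypothetical structure."""

    def __init__(self, own_feeds, ttl=300.0):
        self.ttl = ttl  # assumed soft-state lifetime in seconds
        self.own = set(own_feeds)
        now = time.time()
        # Feeds this node subscribes to itself are at hop 0.
        self.entries = {feed: (0, now) for feed in self.own}

    def merge(self, neighbor_entries, now=None):
        """Merge a neighbor's advertised {feed: hop} map, one hop further away."""
        now = time.time() if now is None else now
        for feed, hop in neighbor_entries.items():
            if feed in self.own:
                continue  # direct subscriptions always stay at hop 0
            candidate = hop + 1
            current = self.entries.get(feed)
            if current is None or candidate <= current[0]:
                self.entries[feed] = (candidate, now)

    def expire(self, now=None):
        """Drop learned entries that were not refreshed within ttl (soft state)."""
        now = time.time() if now is None else now
        for feed in list(self.entries):
            _hop, seen = self.entries[feed]
            if feed not in self.own and now - seen > self.ttl:
                del self.entries[feed]

    def advertised(self):
        """The {feed: hop} map this node would share with its neighbors."""
        return {feed: hop for feed, (hop, _seen) in self.entries.items()}
```

With this sketch, a node that subscribes to YAHO and hears HANI at hop 0 from a neighbor ends up advertising YAHO 0 and HANI 1, matching the middle table in the figure. A neighbor that leaves silently simply stops refreshing its entries, and they disappear after `ttl`.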

Challenges: Overlay Construction (2/2)
  Neighbor selection
    Many neighbors may incur overhead
    Need to adapt to the node's own resource status
    → Select neighbors that are "useful" to me: those whose subscription set is similar to mine
  [Figure: A's subscription set (feed, hop): HANI 0, CNN 0, YAHOO 0, DAUM 0; B's: NCLAB 0, CNN 0, HANI 1, DAUM 2; overlap: 1 direct, 1 one-hop, 1 two-hop]
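
The overlap count in the figure (1 direct, 1 one-hop, 1 two-hop) can be reproduced with a short sketch. The function names and the `decay` weighting used to turn the counts into a single score are assumptions made for illustration; the slides do not state how the counts are combined.

```python
from collections import Counter


def overlap_by_hop(my_feeds, neighbor_table):
    """Count feeds I subscribe to that the neighbor can reach, grouped by hop."""
    return Counter(hop for feed, hop in neighbor_table.items() if feed in my_feeds)


def usefulness(my_feeds, neighbor_table, decay=0.5):
    """Hypothetical score: closer matches count more (decay is an assumption)."""
    counts = overlap_by_hop(my_feeds, neighbor_table)
    return sum(n * decay ** hop for hop, n in counts.items())


# Example taken from the slide: node A evaluating neighbor B.
a_feeds = {"HANI", "CNN", "YAHOO", "DAUM"}
b_table = {"NCLAB": 0, "CNN": 0, "HANI": 1, "DAUM": 2}
print(dict(overlap_by_hop(a_feeds, b_table)))
```

A node with limited resources could rank candidate neighbors by such a score and keep only the top few.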

Challenges: Fetching Interval Determination
  Adaptive fetching
    Problem: few hints about the publishing rate or entry lifetime
      Frequent polling: overloads servers and consumes clients' network bandwidth
      Lazy polling: increases delay or misses entries
  Adaptive algorithm
    Intuition: frequent fetching → few new entries per fetch
    Freshness rate: fraction of new entries in the fetched document
    If freshness rate < target freshness → halve the fetching rate
    If freshness rate > target freshness → double the fetching rate
  [Figure: fetching HANI; entries in a feed: 1. Report 1, 2. Report 2, 3. Report 3, 4. …]
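
The halve/double rule above can be sketched as follows. Halving the fetching rate is expressed here as doubling the interval between fetches. The target value and the interval bounds are assumptions chosen for illustration, not values from the slides.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in the fetched document that were not seen before."""
    if not fetched_ids:
        return 0.0
    new = [entry_id for entry_id in fetched_ids if entry_id not in seen_ids]
    return len(new) / len(fetched_ids)


def next_interval(interval, freshness, target=0.5,
                  min_interval=60.0, max_interval=3600.0):
    """Return the next fetching interval in seconds.

    target, min_interval and max_interval are assumed values.
    """
    if freshness < target:
        interval *= 2  # too few new entries: halve the fetching rate
    elif freshness > target:
        interval /= 2  # many new entries: double the fetching rate
    return max(min_interval, min(max_interval, interval))


# One of four fetched entries is new, so the node backs off.
rate = freshness_rate(["a", "b", "c", "d"], {"a", "b", "c"})
print(rate, next_interval(600.0, rate))
```

The bounds keep a node from polling a server continuously or from backing off indefinitely; whether the prototype clamps the interval is not stated in the slides.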

Challenges: Data Dissemination
  Goal: minimize bandwidth consumption
  1. Limit the boundary of delivery
    Forward only to matching neighbors (subscription set, hop_count)
    → Reduces forwarding overhead
  2. Reduce the unit of delivery
    Unit of delivery: entry bundle, a set of new entries (old entries are filtered out)
    → Reduces redundant content delivery
  3. Check before forwarding
    Exchange the ID of an entry bundle (ID: SHA-1 digest of the bundle)
    If the bundle has not been delivered yet → deliver it
  [Figure labels: HANI 2; Fetch HANI; Max subset hops = 1]
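
Step 3 (check before forwarding) can be sketched with Python's `hashlib`. The procedure names `check_did` and `put_entries` appear later in these slides, but their signatures here, the `Neighbor` class, and the way a bundle is serialized before hashing are all assumptions made for this illustration.

```python
import hashlib


def bundle_id(entries):
    """SHA-1 digest of an entry bundle (the serialization is an assumption)."""
    digest = hashlib.sha1()
    for entry in entries:
        data = entry.encode("utf-8")
        # Length prefix so that ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


class Neighbor:
    """In-memory stand-in for a remote peer (hypothetical)."""

    def __init__(self):
        self.delivered = {}  # bundle ID -> entries

    def check_did(self, did):
        """True if this peer already holds the bundle with the given ID."""
        return did in self.delivered

    def put_entries(self, did, entries):
        self.delivered[did] = list(entries)


def forward(new_entries, neighbors):
    """Send the bundle only to neighbors that do not have it; return the count."""
    if not new_entries:
        return 0
    did = bundle_id(new_entries)
    sent = 0
    for neighbor in neighbors:
        if not neighbor.check_did(did):  # cheap ID exchange first
            neighbor.put_entries(did, new_entries)
            sent += 1
    return sent
```

Exchanging the 20-byte digest before the bundle itself means a neighbor that already received the same entries from another peer costs only one small request.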

Challenges: Free-Riding Prevention
  Nodes may behave selfishly
    Only receive, without forwarding
    Lie about their subscription set to become a preferred neighbor
  Solution: provide a neighbor evaluation method
    Contribution metric
      Credits nodes that forward feeds I subscribe to, and feeds my near neighbors subscribe to
      Level of contribution: direct subscription, 1-hop subscription, 2-hop subscription, …
      cm_{i,j} += w_f − h_f
    Cut off unhelpful neighbors: I helped it, but it did not help me
      d_{i,j} = cm_{i,j} − cm_{j,i}
  Feature
    Uses local information only → the mechanism is easy to implement and enforce
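
The idea of cutting off neighbors that take more than they give can be sketched with two locally kept counters per neighbor. This is an assumption-heavy illustration: the class `ContributionLedger`, the `threshold`, and the choice to pass the credit in as a plain number (rather than computing it from the slide's increment rule) are all hypothetical, and the sign convention of the balance is chosen here for readability only.

```python
class ContributionLedger:
    """Tracks, per neighbor, credit received from it and credit given to it."""

    def __init__(self):
        self.received = {}  # neighbor -> credit for entries it forwarded to me
        self.given = {}     # neighbor -> credit for entries I forwarded to it

    def record_received(self, neighbor, credit):
        self.received[neighbor] = self.received.get(neighbor, 0.0) + credit

    def record_given(self, neighbor, credit):
        self.given[neighbor] = self.given.get(neighbor, 0.0) + credit

    def balance(self, neighbor):
        """Positive: the neighbor helped me more than I helped it."""
        return self.received.get(neighbor, 0.0) - self.given.get(neighbor, 0.0)

    def unhelpful(self, threshold):
        """Neighbors whose balance is below -threshold (assumed cut-off rule)."""
        neighbors = set(self.received) | set(self.given)
        return sorted(n for n in neighbors if self.balance(n) < -threshold)


ledger = ContributionLedger()
ledger.record_received("B", 2.0)
ledger.record_given("B", 0.5)
ledger.record_given("C", 3.0)
print(ledger.unhelpful(threshold=1.0))
```

Both counters are updated from events the node observes itself, which is in line with the slide's point that only local information is used.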

Challenges: Entry Searching
  Overlay as distributed storage
  Iterative searching
    Strong points: search latency, query traffic
  Recursive searching (flooding)
    Strong points: low overhead for the requester, caching of popular queries, can be reflected in neighbor evaluation (?)

Benefits of FeedEx
  1. Scalability
  2. Archivability
    Storage of entries
  3. Controllability
    Compared to web-based readers, e.g. the fetch interval
  4. Filtering and recommendation
    Share opinions on entries (e.g. voting)
    Feed recommendation
  5. Privacy
    Users can fetch documents for others → anonymizes the actual users

Architecture of FeedEx
  Prototype: Python
  Networking: Twisted
  Protocol: XML-RPC
    Interoperability, fast prototyping
  Entry storage: SQLite (lightweight RDB)
  RSS parser: feedparser.org
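
As an illustration of the entry-storage layer, the sketch below uses Python's standard `sqlite3` module. The table layout, column names, and function names are assumptions; the slides only state that SQLite is used for entry storage.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical entry store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " feed TEXT NOT NULL,"
        " entry_id TEXT NOT NULL,"
        " title TEXT,"
        " fetched_at REAL,"
        " PRIMARY KEY (feed, entry_id))"
    )
    return conn


def store_entries(conn, feed, entries, fetched_at):
    """Store (entry_id, title) pairs; return the IDs that were not stored before."""
    new_ids = []
    for entry_id, title in entries:
        exists = conn.execute(
            "SELECT 1 FROM entries WHERE feed = ? AND entry_id = ?",
            (feed, entry_id),
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO entries (feed, entry_id, title, fetched_at)"
                " VALUES (?, ?, ?, ?)",
                (feed, entry_id, title, fetched_at),
            )
            new_ids.append(entry_id)
    conn.commit()
    return new_ids
```

Returning only the new IDs gives the caller exactly the set of entries that would go into the next entry bundle, and keeping old rows supports the archivability benefit listed earlier.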

Experimental Setup
  Two modes
    Stand-alone mode → SLN
    FeedEx mode → XCH
  Metrics
    Time lag
    Missing entries
    Communication cost
  Experiments
    Use 189 PlanetLab nodes
    Run for 22 hours on a weekday
    Primary factor: 6 fetching intervals
    Each node subscribes to 20 out of 70 feeds

Results: Time Lag
  Average time lag: average of node averages
  Adaptive fetching algorithm not applied
  → Regardless of the fetching interval, contents are delivered soon
  [Figure annotation: 15.8 times]
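
The "average of node averages" metric differs from a pooled average over all entries, because each node counts once regardless of how many entries it received. A minimal sketch, with made-up lag values in seconds (the unit and the sample data are assumptions):

```python
from statistics import mean


def average_time_lag(lags_by_node):
    """Average each node's lags first, then average those node averages."""
    node_averages = [mean(lags) for lags in lags_by_node.values() if lags]
    return mean(node_averages)


def pooled_time_lag(lags_by_node):
    """For comparison: one average over every observed lag."""
    return mean(lag for lags in lags_by_node.values() for lag in lags)


sample = {"node1": [10.0, 20.0], "node2": [100.0]}
print(average_time_lag(sample), pooled_time_lag(sample))
```

In the sample, node2 has a single slow delivery; averaging per node first keeps it from being outweighed by the larger number of observations at node1.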

Results: Missing Entries
  Rate of missing entries: (# of entries at a node) / (# of entries at a reference node)
  → Low missing rate, even when the network has a problem (DNS error or routing error)
  → Sometimes better than the reference node

Results: Communication Cost
  Two most frequently called procedures: check_did, put_entries
    check_did call: a single IP packet
    put_entries: 2 calls / minute → delivers 2.67 entries / call
  → Low communication cost

Critique: Strong Points
  Made a new problem from an old domain, "web caching"
  Unaffected by delay or failure of nodes
  Draws out possible benefits and extensions
  Simple!
  Practically deployable
  Tries to find a mechanism that is good for both servers and clients

Critique: Weak Points
  Overload due to RSS feed delivery?
    Only a small text file is delivered
    Should have considered podcasting (multimedia RSS)
  Will clients donate their resources?
    Is "short delay" a strong incentive?
    Is "low bandwidth consumption" a strong incentive?
  Will people's subscription sets really overlap much?
    Not effective for SPs providing diverse RSS feeds, e.g. Naver blog, egloos
  Is it really robust to frequent leaves and joins?
  Lack of server-side evaluation
    Server load & network resources
  Delivering critical data (e.g. timely news) using RSS?

Supplementary slides

Entry Lifetime
  [Figure panels: Generally; CNN]
  Publishers (probably) have policies

New Idea: Topic-Based Feed Pub/Sub System
  Why should we register the address of a feed?
    Need to find the addresses that provide the content I want
    A feed may contain content that I don't want
  [Figure: web content providers → feeds → topic-based feed pub/sub (P2P based); topic of interest (maybe tags?) → contents related to the topic]

New Idea
  Topic-based feed services have already launched
    Baebo: creates new feeds by keyword from the Amazon, Yahoo, and eBay feeds
    Say4: extracts entries containing sentences from the Bible from the BBC feed
  But a centralized server runs the service
    Limited number of input feeds
    Hard to add input feeds dynamically, compared to a P2P approach