
Network Computing Laboratory FeedEx: Collaborative Exchange of News Feeds Seung Jun, Mustaque Ahamad Georgia Institute of Technology WWW 2006.




1 Network Computing Laboratory FeedEx: Collaborative Exchange of News Feeds Seung Jun, Mustaque Ahamad Georgia Institute of Technology WWW 2006

2 Outline (Korea Advanced Institute of Science and Technology, Network Computing Laboratory)
- One-line comment
- Motivation/Problem
- Approach
- Analysis of feed publishing
- Challenges
- Experiments
- Critique

3 One-line comment
- Disseminate web feeds in a distributed (P2P) manner to increase the scalability of web servers
- RSS reveals visitors to content providers
- RSS decouples the fetch operation from reading
[Figure: traditional method vs. P2P method of delivering an RSS feed to clients A and B]

4 Motivation & Problem
- RSS/Atom feeds have become increasingly popular
  - Published by most traditional media and blogs
- Feeding mechanism
  - Publisher (nyt.com, http://nyt.com/../feed.xml): updates the feed page as content is added
  - RSS reader: polls the server (HTTP request/response) to check for updates
- Problem: scalability

5 Approach
- The approach: P2P overlay + gossip-based protocol
  - P2P: resources grow scalably with service demand
  - Gossip: scalable and robust (join & leave)
- Feature of this overlay
  - Does not have to guarantee delivery or delay
- Challenges
  - Overlay construction
  - Fetching interval determination
  - Data dissemination
  - Free riding prevention
  - Content searching (?)

6 Analysis of Feed Publishing
- Methodology
  - 245 popular feeds monitored for 10 days
  - Most popular feeds: information from Gmail's web clips, Bloglines
  - Feeds fetched every 2 minutes
- Measured
  - Publishing rate
  - Entry count in a feed
  - Entry lifetime

7 Publishing Rate by Rank
- Large differences between publishers
- Partly follows a Zipf distribution

8 Entry Count
- Does a high publishing rate mean a higher entry count? No
- Entry lifetimes are short -> entries can be lost with infrequent requests

9 Publishing Rate by Time
- 4 types of publishing patterns

10 Challenges: Overlay Construction (1/2)
- Goal: minimize network management overhead
- Join
  1. Contact a well-known host or previous neighbors
  2. Share subscription set info
  3. Propagate subscription set updates to the network
- Leave
  - Soft state: subscription sets are updated periodically
[Figure: a gateway and nodes, each holding a neighbor list and a subscription set of (dest, hop) pairs: CNN 0; YAHOO 0, HANI 1; CNN 1]
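The soft-state subscription set on this slide can be sketched as a small table of (feed, hop) pairs that neighbors refresh periodically. This is only an illustration under assumptions: the class name `SubscriptionTable`, the `ttl_seconds` timeout, and the rule that a neighbor's advertised hop count is incremented by one are not taken from the slides.

```python
import time


class SubscriptionTable:
    """Soft-state table: feed name -> (hop count, time of last refresh)."""

    def __init__(self, ttl_seconds=300.0):  # timeout value is an assumption
        self.ttl = ttl_seconds
        self.entries = {}

    def subscribe(self, feed, now=None):
        """Record a feed this node subscribes to directly (hop 0)."""
        now = time.time() if now is None else now
        self.entries[feed] = (0, now)

    def merge_advertisement(self, advertised, now=None):
        """Merge a neighbor's advertised {feed: hop}; its feeds are one hop further."""
        now = time.time() if now is None else now
        for feed, hop in advertised.items():
            candidate = hop + 1
            current = self.entries.get(feed)
            # Keep the shortest known distance; an equal distance refreshes the timer.
            if current is None or candidate <= current[0]:
                self.entries[feed] = (candidate, now)

    def expire(self, now=None):
        """Drop remote entries that were not refreshed in time (soft state)."""
        now = time.time() if now is None else now
        self.entries = {
            feed: (hop, seen)
            for feed, (hop, seen) in self.entries.items()
            if hop == 0 or now - seen <= self.ttl
        }

    def advertisement(self):
        """The {feed: hop} view this node would share with its neighbors."""
        return {feed: hop for feed, (hop, _) in self.entries.items()}
```

Because entries simply time out, a node that leaves needs no explicit leave message: its feeds disappear from neighbors' tables once the periodic updates stop.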

11 Challenges: Overlay Construction (2/2)
- Neighbor selection
  - Many neighbors may incur overhead
  - Need to adapt to my resource status -> select neighbors that are "useful" to me
  - Useful: a neighbor whose subscription set is similar to mine
[Figure: node A with (HANI 0, CNN 0, YAHOO 0, DAUM 0) and node B with (NCLAB 0, CNN 0, HANI 1, DAUM 2); matches: 1 direct, 1 one-hop, 1 two-hop]
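The A/B example in the figure can be reproduced with a short sketch that counts, per hop distance, how many of my feeds a candidate neighbor can reach. The function names and the `decay` scoring rule are hypothetical; the slides only say that neighbors with similar subscription sets are preferred.

```python
from collections import Counter


def overlap_by_hop(my_feeds, neighbor_table):
    """Count how many of my feeds the neighbor reaches at each hop distance."""
    counts = Counter()
    for feed in sorted(my_feeds):
        if feed in neighbor_table:
            counts[neighbor_table[feed]] += 1
    return dict(counts)


def usefulness(my_feeds, neighbor_table, decay=0.5):
    """Hypothetical score: nearer matches count more (decay is an assumed parameter)."""
    return sum(decay ** neighbor_table[f] for f in my_feeds if f in neighbor_table)


# Node A's direct subscriptions and node B's (feed -> hop) table from the figure.
a_feeds = {"HANI", "CNN", "YAHOO", "DAUM"}
b_table = {"NCLAB": 0, "CNN": 0, "HANI": 1, "DAUM": 2}

matches = overlap_by_hop(a_feeds, b_table)
score = usefulness(a_feeds, b_table)
```

With these inputs, `matches` holds one match each at hops 0, 1, and 2, which agrees with the "1 direct, 1 one-hop, 1 two-hop" annotation on the slide.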

12 Challenges: Fetching Interval Determination
- Adaptive fetching
  - Problem: few hints about the publishing rate or entry lifetime
  - Frequent polling: overloads servers, consumes clients' network bandwidth
  - Lazy polling: increases delay or misses entries
- Adaptive algorithm
  - Intuition: frequent fetching -> few new entries
  - Freshness rate: fraction of new entries in the fetched document
  - If freshness rate < target freshness -> halve the fetching rate
  - If freshness rate > target freshness -> double the fetching rate
[Figure: fetching the HANI feed; entries in a feed: 1. Report 1, 2. Report 2, 3. Report 3, 4. …]
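A minimal sketch of the halve/double rule follows, expressed in terms of the fetching interval (halving the rate doubles the interval). The target freshness of 0.5 and the interval bounds are assumed values for illustration, not figures from the slides.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in the fetched document that were not seen before."""
    if not fetched_ids:
        return 0.0
    new_count = sum(1 for entry_id in fetched_ids if entry_id not in seen_ids)
    return new_count / len(fetched_ids)


def next_interval(interval, freshness, target=0.5,
                  min_interval=60.0, max_interval=3600.0):
    """Return the next fetching interval in seconds.

    target, min_interval and max_interval are assumptions for this sketch.
    """
    if freshness < target:
        interval *= 2.0   # too few new entries: halve the fetching rate
    elif freshness > target:
        interval /= 2.0   # many new entries: double the fetching rate
    # Clamp so the interval never collapses to zero or grows without bound.
    return max(min_interval, min(max_interval, interval))
```

For example, a fetch that returns four entries of which only one is new has a freshness rate of 0.25, so a node polling every 120 seconds would back off to 240 seconds.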

13 Challenges: Data Dissemination
- Goal: minimize bandwidth consumption
1. Limit the boundary of delivery
   - Forward only to matching neighbors (subscription set, hop_count) -> reduces forwarding overhead
2. Reduce the unit of delivery
   - Unit of delivery: entry bundle, a set of new entries (old entries filtered out) -> reduces redundant content delivery
3. Check before forwarding
   - Exchange the ID of an entry bundle (ID: SHA-1 digest of the bundle)
   - If the bundle has not been delivered yet -> deliver it
[Figure: a node fetches the HANI feed and forwards to neighbors with hop counts 0, 0, 1, but not to one with HANI 2; max subset hops = 1]
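The three steps above can be sketched as follows. The method names `check_did` and `put_entries` are borrowed from the procedures named on the communication-cost slide, but their signatures and behavior here are assumptions, as are the `Neighbor` class and the way a bundle is serialized before hashing.

```python
import hashlib


def bundle_id(entries):
    """SHA-1 hex digest of an entry bundle (the serialization is an assumption)."""
    digest = hashlib.sha1()
    for entry in entries:
        digest.update(entry.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


class Neighbor:
    """Hypothetical in-memory stand-in for a remote peer."""

    def __init__(self, name, table):
        self.name = name
        self.table = table        # feed -> hop count
        self.delivered = set()    # bundle IDs already received
        self.inbox = []

    def check_did(self, digest):
        """Return True if this bundle was already delivered to the peer."""
        return digest in self.delivered

    def put_entries(self, feed, entries):
        self.delivered.add(bundle_id(entries))
        self.inbox.append((feed, list(entries)))


def forward(feed, fetched, seen, neighbors, max_hops=1):
    """Forward only new entries, only to matching neighbors, only if undelivered."""
    new_entries = [e for e in fetched if e not in seen]  # step 2: filter old entries
    if not new_entries:
        return []
    seen.update(new_entries)
    digest = bundle_id(new_entries)
    sent_to = []
    for neighbor in neighbors:
        hop = neighbor.table.get(feed)
        if hop is None or hop > max_hops:                # step 1: limit the boundary
            continue
        if neighbor.check_did(digest):                   # step 3: check before forwarding
            continue
        neighbor.put_entries(feed, new_entries)
        sent_to.append(neighbor.name)
    return sent_to
```

Exchanging only the 40-character digest before sending keeps the duplicate check cheap compared with shipping the bundle itself.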

14 Challenges: Free Riding Prevention
- Nodes may manifest selfish behavior
  - Only receive, without forwarding
  - Lie about their subscription set to become a preferred neighbor
- Solution: provide a neighbor evaluation method
  - Contribution metric: credit nodes that forward feeds that I, or my near neighbors, subscribe to
  - Level of contribution: direct subscription, 1-hop subscription, 2-hop subscription, …
    cm_{i,j} += w_f^{-h_f}
  - Cut out unhelpful neighbors: I helped it, but it did not help me
    d_{i,j} = cm_{i,j} - cm_{j,i}
- Feature
  - Uses local information only -> easy to implement and enforce the mechanism
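A rough sketch of this bookkeeping is given below, under several assumptions: the slide's update is read as a weight raised to the negative hop count, cm_{i,j} is taken to mean what node i contributed to node j, and the default weight of 2.0, the class name `NeighborLedger`, and the cut-off threshold are all hypothetical.

```python
from collections import defaultdict


class NeighborLedger:
    """Kept locally by one node; tracks contributions in both directions."""

    def __init__(self, default_weight=2.0):  # weight value is an assumption
        self.default_weight = default_weight
        self.given = defaultdict(float)      # what I contributed to each neighbor
        self.received = defaultdict(float)   # what each neighbor contributed to me

    def _credit(self, hop, weight=None):
        """Credit for one delivery: weight ** (-hop), so nearer feeds count more."""
        w = self.default_weight if weight is None else weight
        return w ** (-hop)

    def i_forwarded(self, neighbor, hop, weight=None):
        self.given[neighbor] += self._credit(hop, weight)

    def neighbor_forwarded(self, neighbor, hop, weight=None):
        self.received[neighbor] += self._credit(hop, weight)

    def deficit(self, neighbor):
        """Positive value: I helped this neighbor more than it helped me."""
        return self.given[neighbor] - self.received[neighbor]

    def unhelpful(self, threshold):
        """Neighbors whose deficit exceeds an assumed cut-off threshold."""
        known = set(self.given) | set(self.received)
        return sorted(n for n in known if self.deficit(n) > threshold)
```

Both counters are updated from events the node observes itself, which is consistent with the slide's point that the mechanism needs local information only.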

15 Challenges: Entry Searching
- Overlay as distributed storage
- Iterative searching
  - Strong points: search latency, query traffic
- Recursive searching (flooding)
  - Strong points: low overhead for the requester, caching of popular queries, can be reflected in neighbor evaluation (?)

16 Benefits of FeedEx
1. Scalability
2. Archivability
   - Storage of entries
3. Controllability
   - Compared to web-based readers, e.g. fetch interval
4. Filtering and recommendation
   - Share opinions on entries (e.g. voting)
   - Feed recommendation
5. Privacy
   - Users can fetch documents for others -> anonymizes actual users

17 Architecture of FeedEx
- Prototype: Python
- Networking: Twisted
- Protocol: XML-RPC
  - Interoperability, fast prototyping
- Entry storage: SQLite (lightweight RDB)
- RSS parser: feedparser.org

18 Experimental Setup
- Two modes
  - Stand-alone mode -> SLN
  - FeedEx mode -> XCH
- Metrics
  - Time lag
  - Missing entries
  - Communication cost
- Experiments
  - Used 189 PlanetLab nodes
  - Ran for 22 hours on a weekday
  - Primary factor: 6 fetching intervals
  - Each node subscribes to 20 out of 70 feeds

19 Results: Time Lag
- Average time lag: average of node averages
- Without applying the adaptive fetching algorithm
- -> Regardless of the fetching interval, content is delivered quickly
[Figure annotation: 15.8 times]

20 Results: Missing Entries
- Rate of missing entries: # of entries in a node / # of entries in a reference node
- -> Low missing rate, despite problems in the network (DNS errors or routing errors)
- -> Sometimes better than the reference node

21 Results: Communication Cost
- Two most frequently called procedures: check_did, put_entries
- check_did call: single IP packet
- put_entries: 2 calls/minute, delivering 2.67 entries/call
- -> Low communication cost

22 Critique
- Strong points
  - Made a new problem from an old domain, "web caching"
  - Free from delay/failure of nodes
  - Draws out possible benefits/extensions
  - Simple! Practically deployable
  - Tried to find a mechanism that is good for both servers and clients

23 Critique
- Weak points
  - Overload due to RSS feed delivery?
    - Only a small text file is delivered
    - Should have considered podcasting (multimedia RSS)
  - Will the clients donate their resources?
    - Is "short delay" a strong incentive?
    - Is "low bandwidth consumption" a strong incentive?
  - Will people's subscription sets really overlap a lot?
    - Not effective for SPs providing diverse RSS feeds, e.g. Naver blog, egloos
  - Is it really robust to frequent leave and join?
  - Lack of server-side evaluation
    - Server load & network resources
  - Delivering critical data (e.g. timely news) using RSS?

24 Supplementary slides

25 Entry Lifetime
- Generally CNN, Publishers have policies (probably)

26 New idea
- Topic-based feed pub/sub system
- Why should we register the address of a feed?
  - Need to find addresses providing the content I want
  - A feed may contain content that I don't want
[Figure: web content providers -> feeds -> topic-based feed pub/sub (P2P based); input: topic of interest (maybe tags?); output: content related to the topic]

27 New idea
- Topic-based feeding services are already launched
  - Baebo: creates new feeds by keyword from the Amazon, Yahoo, and eBay feeds
  - Say4: extracts entries containing sentences from the Bible from the BBC feed
- But a centralized server runs the service
  - Limitation in the number of input feeds
  - Hard to add input feeds dynamically compared to a P2P approach

