Efficient Content Distribution on the Internet
Who pays for showing a Web page to a user?
Receiving side
– Users pay small ISPs, who pay big ISPs, who pay even bigger ISPs
– Concerns: reduced traffic and better response time
Sending side
– Web sites pay ISPs, or aggregators of ISPs (examples: broadcast, Akamai)
– Concerns? It depends...
A Simplistic View: Two Kinds of Web Sites
Portals: really want to be seen, and to be seen with high quality
Libraries: want to be available
Who should pay for showing a Web page to the user?
1. Library pages: the user
2. Portal pages: the Web site
So far, many ISPs have focused on 1 (the users, cutting down traffic) and ignored 2 (the content providers, QoS)
– This neglect creates a vacuum that has let akamai.com flourish
What should be done?
A mechanism that links ISPs and portals
– Addressing the logistical difficulties: one ISP charging many portals for content delivery; one portal getting some assurance of QoS from many ISPs
– Trust between the parties
The mechanism should also be efficient
– Caching, replication, routing, service differentiation
Peregrine Net Inc.
To ISPs:
– Caching proxy
– Caching proxy plus services (months)
– Content distribution box
To portals:
– Web acceleration proxy
– ISP-coordinated QoS and load distribution (months)
– URL rewriting for content distribution
The Proxy Products
Run on Linux, FreeBSD, Solaris, NT, and other standard operating systems
Scale from under 10 Mb/s to over 4 Gb/s
Tiered pricing:
– Under 10 Mb/s: free
– 10-30 Mb/s: $2K-4K per copy
– 30-60 Mb/s: $10K per copy or appliance
– 60-155 Mb/s: more expensive
– 155 Mb/s and up: clustering, even more expensive
Service 1: Premium Content Management
Portals pay the ISP for each object delivered from the cache
* Optimal cache management, balancing the needs of users and those of portals
* Efficient hit reporting to portals
Establishing trust:
– Third-party inspection of the software
– Statistical analysis of hit reports
Estimated time: 2 months
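Per-object billing only works if the proxy can count, per portal, how many times each premium object was served from its cache. The sketch below is a minimal illustration under assumptions not stated on the slide: a flat, hypothetical `price_per_hit`, and an in-memory counter keyed by (portal, URL). `HitReporter` and its methods are hypothetical names, not part of any Peregrine product.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HitReporter:
    """Hypothetical hit counter an ISP proxy keeps for premium (portal-paid) objects."""

    price_per_hit: float  # assumed flat fee per object delivered from cache
    hits: Counter = field(default_factory=Counter)

    def record_hit(self, portal: str, url: str) -> None:
        # Called on each cache hit for an object the portal registered as premium.
        self.hits[(portal, url)] += 1

    def report(self, portal: str) -> Dict[str, object]:
        # Per-object hit counts for one portal, plus the amount it owes the ISP.
        per_object = {url: n for (p, url), n in self.hits.items() if p == portal}
        total = sum(per_object.values())
        return {
            "portal": portal,
            "objects": per_object,
            "total_hits": total,
            "amount_due": total * self.price_per_hit,
        }


reporter = HitReporter(price_per_hit=0.5)
reporter.record_hit("portal-a.example", "/logo.gif")
reporter.record_hit("portal-a.example", "/logo.gif")
reporter.record_hit("portal-a.example", "/index.html")
reporter.record_hit("portal-b.example", "/index.html")
print(reporter.report("portal-a.example"))
```

A portal could check such reports statistically, for example by comparing them against the request volume its own servers see, which is the kind of analysis the slide lists under establishing trust.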
Service 2: Active Cache Proxy
Caching objects instead of datagrams
– Web servers provide a piece of code (a cache applet) associated with an object
– The object and its code are cached at the proxy; on a cache hit, the code is executed to generate the response
Example: user-customized Web pages
Benefit: scales the Web server!
Estimated time: 3-6 months
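The active-cache idea can be sketched as a cache whose entries pair an object with a function that turns it into a response. This is a toy model, assuming a Python callable stands in for the server-supplied cache applet; a real proxy would have to load and sandbox untrusted code, which is not shown. `ActiveCache` and `personalize` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

# A "cache applet": given the cached object body and the request, build the response.
Applet = Callable[[str, Dict[str, str]], str]


class ActiveCache:
    """Toy proxy cache that stores each object together with its applet."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, Applet]] = {}

    def put(self, url: str, body: str, applet: Applet) -> None:
        # The origin server supplies the applet along with the object.
        self._store[url] = (body, applet)

    def get(self, url: str, request: Dict[str, str]) -> Optional[str]:
        entry = self._store.get(url)
        if entry is None:
            return None  # cache miss: a real proxy would forward to the origin server
        body, applet = entry
        # Cache hit: run the applet locally instead of contacting the origin server.
        return applet(body, request)


def personalize(body: str, request: Dict[str, str]) -> str:
    # Example applet: fill in the user's name from the request.
    return body.format(user=request.get("user", "guest"))


cache = ActiveCache()
cache.put("/home", "Hello, {user}!", personalize)
print(cache.get("/home", {"user": "alice"}))  # Hello, alice!
print(cache.get("/home", {}))                 # Hello, guest!
```

Because each customized page is generated at the proxy, the origin server sees one request per object rather than one per user, which is the sense in which active caching scales the Web server.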
Service 3: Content Distribution
Routing methods:
– At the Web server: rewrite URLs for image objects (what Sandpiper and Akamai are doing)
– At the proxy: redirect URLs to the content distribution box, or mark objects as permanently cached
* Optimal load balancing
– Constant monitoring of server and network load
* Efficient content authentication
Estimated time: 3-6 months
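Server-side URL rewriting for image objects can be illustrated with a small filter over outgoing HTML. This sketch assumes a hypothetical distribution host `cdn.example.net` that fetches from the origin using a `/<origin-host>/<path>` layout; that naming scheme is an assumption, not how Sandpiper or Akamai actually encode URLs. It uses a regular expression rather than a full HTML parser, so it only handles double-quoted `src` attributes.

```python
import re

CDN_HOST = "cdn.example.net"  # hypothetical content distribution host

# Matches the src attribute of an <img> tag; a simplification, not a full HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def rewrite_images(html: str, origin: str) -> str:
    """Point image URLs served by `origin` at the distribution host instead."""
    prefix = f"http://{origin}/"

    def repl(m: re.Match) -> str:
        src = m.group(2)
        if src.startswith("/"):            # site-relative path on the origin
            src = f"http://{CDN_HOST}/{origin}{src}"
        elif src.startswith(prefix):       # absolute URL on the origin
            src = f"http://{CDN_HOST}/{origin}/{src[len(prefix):]}"
        return m.group(1) + src + m.group(3)  # images on other hosts are left unchanged

    return IMG_SRC.sub(repl, html)


page = '<p><img src="/img/logo.gif"> <img src="http://other.org/ad.gif"></p>'
print(rewrite_images(page, "www.example.com"))
```

The HTML page itself still comes from the origin server; only the heavier image objects are pulled from the distribution host.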
Service 4: Rent-A-Server
Rent when needed:
– A Web accelerator monitors the load
– If the load exceeds a limit, requests are redirected or routed to “rental” servers
Rental servers can handle dynamic content via process migration technology
* Optimal server selection algorithm
Estimated time: 6 months
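The redirect decision in the accelerator reduces to a threshold check on the measured load. This is a minimal sketch, assuming a single scalar load metric and using plain round-robin as a stand-in for the unspecified "optimal server selection algorithm"; `LoadRedirector` and the server names are hypothetical.

```python
from itertools import cycle
from typing import Iterable


class LoadRedirector:
    """Hypothetical Web-accelerator policy: spill requests onto rented servers."""

    def __init__(self, home: str, rentals: Iterable[str], limit: float) -> None:
        self.home = home
        self.limit = limit                 # assumed load metric, e.g. CPU utilization
        self._rentals = list(rentals)
        self._next_rental = cycle(self._rentals)

    def choose(self, current_load: float) -> str:
        # Serve locally while under the limit (or if nothing has been rented).
        if current_load <= self.limit or not self._rentals:
            return self.home
        # Placeholder selection: round-robin, not an optimal selection algorithm.
        return next(self._next_rental)


router = LoadRedirector("origin", ["rental-1", "rental-2"], limit=0.8)
for load in (0.5, 0.9, 0.95, 0.99):
    print(load, router.choose(load))
```

An optimal selector would replace the round-robin step with one that weighs each rental server's current load and network distance to the client.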
How are we different from everyone else?
We do what everyone else does
We return part of the profits to the ISPs, who carry the bits
But, in addition:
– Everyone uses our proxies
– Proxies control the routing
– Proxies can perform arbitrary transformations on URLs
Additional Service: Mining the Log
Web server performance data
User auto-rating of search results
– Which item really answered the user's question? Guess it from the user's surfing
– Technique: build the user's surfing graph with the search result as the root
User profiling and feedback to ad servers
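Building the surfing graph from proxy logs can be sketched as follows. This assumes the log has already been reduced to (from-page, to-page) click pairs for one user session, and it uses a simple, hypothetical rating heuristic: a result scores higher the more pages the user went on to visit from it. The slide does not say which signal is actually used, so the scoring rule is an assumption.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def surfing_graph(clicks: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Directed graph: page -> pages the user followed from it.
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst in clicks:
        graph[src].append(dst)
    return graph


def reachable(graph: Dict[str, List[str]], root: str) -> Set[str]:
    # Breadth-first search from the search result used as the root.
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def rate_results(results: List[str], clicks: List[Tuple[str, str]]) -> Dict[str, int]:
    # Hypothetical score: number of pages visited beneath each result.
    graph = surfing_graph(clicks)
    return {r: len(reachable(graph, r)) - 1 for r in results}


session = [
    ("search", "a.com"), ("a.com", "a.com/1"),
    ("search", "b.com"), ("b.com", "b.com/x"), ("b.com/x", "b.com/y"),
]
print(rate_results(["a.com", "b.com", "c.com"], session))
```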
Looking Forward: Efficient Video Content Distribution
Caching proxy capable of handling video streams
A hierarchy of caching proxies for video distribution
* Efficient “prefix caching” algorithm
* Object popularity probing, and optional satellite distribution for very popular objects
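Prefix caching keeps only the first part of each video at the proxy, so playback can start immediately while the remainder streams from upstream. The sketch below shows one simple way to divide a fixed cache budget, assuming popularity is already known (for example from probing), a fixed target prefix length, and a greedy most-popular-first policy. This greedy rule is a stand-in, not the efficient algorithm the slide refers to; `allocate_prefixes` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (name, requests per hour, bitrate in Mb/s, duration in seconds)
Video = Tuple[str, float, float, float]


def allocate_prefixes(videos: List[Video], budget_mbit: float,
                      prefix_sec: float) -> Dict[str, float]:
    """Greedily assign cached prefix lengths (seconds), most popular video first."""
    plan: Dict[str, float] = {}
    remaining = budget_mbit
    for name, _popularity, bitrate, duration in sorted(
            videos, key=lambda v: v[1], reverse=True):
        want = min(prefix_sec, duration)          # never cache past the end
        seconds = min(want, remaining / bitrate)  # cache a shorter prefix if space is short
        if seconds <= 0:
            break                                 # budget exhausted
        plan[name] = seconds
        remaining -= seconds * bitrate
    return plan


videos = [("news", 50, 1.5, 600), ("movie", 200, 4.0, 7200), ("clip", 10, 1.0, 30)]
print(allocate_prefixes(videos, budget_mbit=150, prefix_sec=30))
```

Here the budget covers a full 30-second prefix for the most popular video and a 20-second prefix for the next, leaving nothing for the least popular clip.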
Steps to get there
1. Proxy product and sales
– Current beta testers: Siemens AG, Union Bank of Switzerland, NetOne (Japan), a medium-size ISP in the UK, JANET in the UK
2. Premium content management, active caching proxy, content distribution service
3. Rent-A-Server
After sufficient proxy deployment: log mining and video distribution