Chapter 12.6 Consistency and Replication

Chapter 12.6 Consistency and Replication. Berkay Aydin & Zhuoli Lin, November 11th, 2015

Consistency and Replication, Part I. Outline: Web-proxy caching; replication for Web hosting systems (metric estimation, adaptation triggering, adjustment measures); replication of Web applications.

Perhaps one of the most important systems-oriented developments in Web-based distributed systems is ensuring that access to Web documents meets stringent performance and availability requirements. These requirements have led to numerous proposals for caching and replicating Web content, several of which are discussed in this section. Where the original schemes (which are still largely deployed) have been targeted toward supporting static content, much effort is also being put into supporting dynamic content, that is, documents that are generated as the result of a request, as well as those containing scripts and such. An excellent and complete picture of Web caching and replication is provided by Rabinovich and Spatscheck (2002).

Web Proxy Caching: client-side caching. Browsers provide a simple caching facility (a local cache) and store documents in the client's browser cache. A Web proxy on the client side acts as a shared cache: it caches the documents it fetches and sends a cached document to the client when another request for it comes in. Hierarchical caching (by regions, countries, etc.) reduces network traffic, but latency can be higher because multiple caches may have to be checked.

Web Proxy Caching: cooperative (distributed) caching. When a cache miss occurs, the proxy checks neighboring proxies. If a neighbor has the document, it sends it; otherwise the request is forwarded to the Web server. Trade-offs between hierarchical and cooperative caching: cooperative caching has lower transmission time and less strict storage requirements, while hierarchical caching has lower expected latency. Image taken from [1].
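The cooperative lookup order described above (local cache, then neighboring proxies, then the Web server) can be sketched as follows. This is a minimal illustration, not a real proxy: the `Proxy` class, its `peek` and `get` methods, and the `fetch_origin` callback are hypothetical names, and the network is replaced by plain function calls.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Proxy:
    """A caching proxy that asks neighboring proxies before the origin server."""

    def __init__(self, name: str, fetch_origin: Callable[[str], str]) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.neighbors: List["Proxy"] = []
        # Hypothetical stand-in for an HTTP GET to the Web server.
        self.fetch_origin = fetch_origin

    def peek(self, url: str) -> Optional[str]:
        """Answer a neighbor's query from the local cache only (no forwarding)."""
        return self.cache.get(url)

    def get(self, url: str) -> Tuple[str, str]:
        """Return (document, source), where source names who supplied it."""
        if url in self.cache:                 # 1. local hit
            return self.cache[url], self.name
        for neighbor in self.neighbors:       # 2. cache miss: ask neighbors
            doc = neighbor.peek(url)
            if doc is not None:
                self.cache[url] = doc
                return doc, neighbor.name
        doc = self.fetch_origin(url)          # 3. nobody has it: go to the server
        self.cache[url] = doc
        return doc, "origin"
```

With two proxies that list each other as neighbors, a document fetched once by one proxy is afterwards served to the other without contacting the origin again.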

Web Proxy Caching: cache consistency. With a conditional HTTP request (If-Modified-Since header), the proxy contacts the server each time: if the document has been modified since the time given in the header, the server returns the new document; otherwise the proxy returns its cached copy. The Squid Web proxy instead assigns an expiration time based on how long ago the document was last modified: Texpire = α(Tcached − Tlast_modified) + Tcached. Until Texpire the document is considered valid (in practice, α can be set to 0.2).
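Both consistency checks can be written down in a few lines. The sketch below assumes the expiration rule as given in [1], Texpire = α(Tcached − Tlast_modified) + Tcached, with all times expressed as seconds since some epoch. The function names are hypothetical, and the status codes 200 and 304 (Not Modified) are the standard HTTP responses to a conditional request.

```python
ALPHA = 0.2  # value suggested on the slide; treated here as a tunable default


def expiration_time(t_cached: float, t_last_modified: float,
                    alpha: float = ALPHA) -> float:
    """Texpire = alpha * (Tcached - Tlast_modified) + Tcached."""
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now: float, t_cached: float, t_last_modified: float,
             alpha: float = ALPHA) -> bool:
    """The cached copy is considered valid until Texpire."""
    return now < expiration_time(t_cached, t_last_modified, alpha)


def conditional_get_status(if_modified_since: float,
                           last_modified: float) -> int:
    """Server side of If-Modified-Since: 200 with the document if it changed
    after the given time, otherwise 304 so the proxy serves its cached copy."""
    return 200 if last_modified > if_modified_since else 304
```

A document that was last modified long before it was cached gets a long validity period, on the assumption that documents that have not changed for a while are unlikely to change soon.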

Web Proxy Caching: problems with Squid and alternatives. Squid's approach gives weaker consistency: the proxy may return an invalid (stale) document, and there is no way to detect this. Alternative: the server notifies proxies by sending an invalidation. The downside is scalability, but it can outperform the other approaches in terms of bandwidth and perceived latency. Web proxy caching is for static content. Cache replacement strategy: LRU.
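A minimal sketch of a proxy cache that combines LRU replacement with server-sent invalidations is given below. The class and method names are hypothetical, and how the invalidation message reaches the proxy is left out; `invalidate` only shows what the proxy does once it has received one.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Document cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        if url not in self._items:
            return None
        self._items.move_to_end(url)         # mark as most recently used
        return self._items[url]

    def put(self, url: str, doc: str) -> None:
        self._items[url] = doc
        self._items.move_to_end(url)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def invalidate(self, url: str) -> None:
        """Called when the server announces that the document has changed."""
        self._items.pop(url, None)
```

After an invalidation the next request for that document misses the cache, so the proxy fetches a fresh copy instead of returning a stale one.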

Replication for Web Hosting Systems: Content Delivery Networks (CDNs). Hosting involves maintaining the content of a Web site and ensuring that the site is accessible. CDNs act as a Web hosting service and replicate the content across different sites. A CDN is a self-managing system: distribution and replication are automatic. Three aspects of CDNs: metric estimation; adaptation triggering; taking appropriate measures (replica placement, consistency enforcement, client-request routing).

Replication - Metric Estimation. Trade-offs (access time vs. cost). Latency metrics: time spent fetching a document. Available bandwidth (between two nodes): important for large document transfers. Spatial metrics: distance between nodes (number of network-level routing hops). Network usage metrics: consumed bandwidth, number of bytes to transfer. Consistency metrics: the extent to which a replica deviates from its master copy. Financial metrics: financial performance.

Replication - Adaptation Triggering. When and how are adaptations triggered? Simple approach: periodically estimate metrics and take measures as needed. Responding to a flash crowd requires predicting it: use a window of recent request rates, fit a linear regression, and warn when the predicted number of requests passes a predetermined threshold. It is hard to choose the threshold and the window size.
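The window-plus-regression idea can be sketched as follows. This is an illustration under stated assumptions rather than the predictor of any particular CDN: the window holds equally spaced request-rate samples, an ordinary least-squares line is fitted to them, and the function names, `steps_ahead`, and the threshold value are all hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x for x = 0, 1, ..., n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples in the window")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_rate(window: Sequence[float], steps_ahead: int = 1) -> float:
    """Extrapolate the fitted line steps_ahead samples past the window."""
    slope, intercept = fit_line(window)
    return intercept + slope * (len(window) - 1 + steps_ahead)


def flash_crowd_warning(window: Sequence[float], threshold: float,
                        steps_ahead: int = 1) -> bool:
    """Warn when the predicted request rate passes the threshold."""
    return predict_rate(window, steps_ahead) > threshold
```

The two tuning knobs the slide calls hard to choose appear directly as the length of `window` and the value of `threshold`: a short window reacts quickly but is noisy, and a low threshold triggers adaptation too often.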

Replication - Adjustment Measures. Deciding how and when to redirect client requests. For embedded documents: get the base document; do a DNS lookup through the regular DNS system; do a DNS lookup through the CDN's DNS system; get the embedded documents from the CDN server, which uses its cache if the document is cached and the origin server otherwise. Perceived performance.
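The last two steps of this flow can be sketched in a few lines. The sketch assumes that the CDN's DNS system picks the edge server with the lowest measured latency to the client, which is only one possible redirection policy; `cdn_dns_lookup`, `EdgeServer`, and the server names in the usage note are hypothetical.

```python
from typing import Callable, Dict


def cdn_dns_lookup(latency_ms: Dict[str, float]) -> str:
    """Hypothetical CDN DNS decision: pick the edge server with the lowest
    measured latency to the requesting client."""
    if not latency_ms:
        raise ValueError("no edge servers available")
    return min(latency_ms, key=lambda server: latency_ms[server])


class EdgeServer:
    """CDN server that serves embedded documents from its cache when it can."""

    def __init__(self, fetch_origin: Callable[[str], str]) -> None:
        self.cache: Dict[str, str] = {}
        self.fetch_origin = fetch_origin  # stand-in for a request to the origin
        self.origin_fetches = 0

    def get(self, url: str) -> str:
        if url not in self.cache:         # not cached: go to the origin server
            self.cache[url] = self.fetch_origin(url)
            self.origin_fetches += 1
        return self.cache[url]            # cached: use the cache
```

For example, `cdn_dns_lookup({"edge-eu": 20.0, "edge-us": 95.0})` selects `"edge-eu"`, and repeated `get` calls for the same embedded document reach the origin only once.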

Replication of Web Applications. The edge server stores replicated data. Replication can be partial or full. Full replication works well when the update ratio is low and joins are frequent. Partial replication raises the question of which data to store. Content-aware caching: works well with repeated queries; consistency problem. Content-blind caching: caching the query results.

Consistency and Replication, Part II. Outline: Introduction; Related Work; Techniques to Scale Web Applications.

Introduction. Developers often use replication and caching mechanisms to enhance Web application performance. This paper presents a qualitative and quantitative analysis of state-of-the-art replication and caching techniques used to host Web applications.

Related Work. Web sites can be slow for many reasons, but the most prevalent one is the dynamic generation of Web documents. Dynamic generation of a Web page typically requires issuing one or more queries to a database, so access time to the database can easily get out of hand when the request load is high. There are several techniques to overcome this problem. The most straightforward one is Web page caching. This technique works well if the same cached HTML page can answer many requests to a particular Web site. With the growing drive toward personalized Web content, generated pages tend to be unique for each user, thereby reducing the benefits of page-caching techniques.

Techniques to Scale Web Applications. Instead of caching the dynamic pages generated by a central Web server, various techniques aim to replicate the means of generating pages over multiple edge servers. They typically provide "read-your-writes" consistency, which guarantees that when an application at an edge server performs an update, any subsequent reads from the same edge server will return that update's effect.

Techniques to Scale Web Applications: Edge Computing. The simplest way to generate user-specific pages is to replicate the application code at multiple edge servers and keep the data centralized. Drawbacks: if the edge servers are located worldwide, each data access incurs wide-area network latency. The central database quickly becomes a performance bottleneck because it needs to serve the entire system's database requests.

Techniques to Scale Web Applications: Data Replication. To solve the database bottleneck problem, data replication places the data at each edge server so that generating a page requires only local computation and data access. This technique maintains identical copies of the database at multiple locations. Drawbacks: if a Web application generates many database updates, each update must be propagated to all the other replicas to maintain consistency. This can introduce heavy network traffic and performance overhead.

Techniques to Scale Web Applications: Content-Aware Data Caching. Instead of maintaining full copies of the database at each edge server, content-aware caching systems cache database query results as the application code issues them. Each edge server maintains a partial copy of the database, and each time the application running at the edge issues a query, the edge-server database checks if it contains enough data locally to answer the query correctly. Drawbacks: this method can reduce the cache hit rate. Update queries always execute at the origin server.

Techniques to Scale Web Applications: Blind Data Caching. Edge servers do not need to run a database at all. Servers store the results of remote database queries independently, so cache replacement is simple and many popular replacement algorithms can be applied.
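A content-blind cache can be sketched as a map from the exact query (text plus parameters) to its stored result, with no knowledge of what the query means. The sketch below is an illustration only and is not the design of GlobeCBC [3]: the class name, the `run_at_origin` callback, and the choice to flush the whole cache on every update are simplifying assumptions, and LRU stands in for whichever replacement algorithm is chosen.

```python
from collections import OrderedDict
from typing import Any, Callable, Sequence, Tuple


class QueryResultCache:
    """Edge-side cache of remote query results, keyed by the query itself."""

    def __init__(self, run_at_origin: Callable[[str, Tuple[Any, ...]], Any],
                 capacity: int = 128) -> None:
        self.run_at_origin = run_at_origin  # stand-in for the remote database
        self.capacity = capacity
        self._results: "OrderedDict[Tuple[str, Tuple[Any, ...]], Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, sql: str, params: Sequence[Any] = ()) -> Any:
        key = (sql, tuple(params))
        if key in self._results:            # same query seen before: reuse result
            self.hits += 1
            self._results.move_to_end(key)
            return self._results[key]
        self.misses += 1
        result = self.run_at_origin(sql, tuple(params))
        self._results[key] = result
        if len(self._results) > self.capacity:
            self._results.popitem(last=False)  # LRU replacement
        return result

    def write(self, sql: str, params: Sequence[Any] = ()) -> Any:
        """Updates run at the origin. Flushing everything afterwards is a
        deliberately coarse assumption that keeps later reads from being stale."""
        result = self.run_at_origin(sql, tuple(params))
        self._results.clear()
        return result
```

Because each result is stored independently of the others, eviction never has to reason about overlapping data, which is why replacement stays simple. The price is that only an identical query with identical parameters produces a hit.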

Comparison: RUBBoS benchmark, TPC-W browsing, TPC-W ordering.

Consistency and Replication, Part III. Outline: Future Work.

Future Work. The plan is to build and evaluate a prototype system that enables dynamic provisioning and reconfiguration of multitier Web applications. A combination of an end-to-end analytical model and virtual caches will determine the optimal resource configuration for a given application.

References
[1] Tanenbaum, A. S., & van Steen, M. (2007). Distributed Systems: Principles and Paradigms (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
[2] Sivasubramanian, S., Pierre, G., van Steen, M., & Alonso, G. (2007). Analysis of caching and replication strategies for Web applications. IEEE Internet Computing, 11(1), 60-66.
[3] Sivasubramanian, S., Pierre, G., van Steen, M., & Alonso, G. (2006). GlobeCBC: Content-blind result caching for dynamic Web applications. Technical Report IR-CS-022, Vrije Universiteit, Amsterdam, The Netherlands.