Building a Scalable Web Server with Global Object Space Support on Heterogeneous Clusters Ph.D Annual Talk Ge CHEN CSIS HKU.

Slides:



Advertisements
Similar presentations
A KTEC Center of Excellence 1 Cooperative Caching for Chip Multiprocessors Jichuan Chang and Gurindar S. Sohi University of Wisconsin-Madison.
Advertisements

Consistency and Replication Chapter 7 Part II Replica Management & Consistency Protocols.
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Study of Hurricane and Tornado Operating Systems By Shubhanan Bakre.
LANs and WANs Network size, vary from –simple office system (few PCs) to –complex global system(thousands PCs) Distinguish by the distances that the network.
Scalable Content-aware Request Distribution in Cluster-based Network Servers Jianbin Wei 10/4/2001.
1 Content Delivery Networks iBAND2 May 24, 1999 Dave Farber CTO Sandpiper Networks, Inc.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Improving Proxy Cache Performance: Analysis of Three Replacement Policies Dilley, J.; Arlitt, M. A journal paper of IEEE Internet Computing, Volume: 3.
Cooperative Caching Middleware for Cluster-Based Servers Francisco Matias Cuenca-Acuna Thu D. Nguyen Panic Lab Department of Computer Science Rutgers University.
Locality-Aware Request Distribution in Cluster-based Network Servers 1. Introduction and Motivation --- Why have this idea? 2. Strategies --- How to implement?
3.5 Interprocess Communication Many operating systems provide mechanisms for interprocess communication (IPC) –Processes must communicate with one another.
3.5 Interprocess Communication
Caching And Prefetching For Web Content Distribution Presented By:- Harpreet Singh Sidong Zeng ECE Fall 2007.
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
Web Caching Schemes For The Internet – cont. By Jia Wang.
Application Layer  We will learn about protocols by examining popular application-level protocols  HTTP  FTP  SMTP / POP3 / IMAP  Focus on client-server.
NFS. The Sun Network File System (NFS) An implementation and a specification of a software system for accessing remote files across LANs. The implementation.
World Wide Web Caching: Trends and Technology Greg Barish and Katia Obraczka USC Information Science Institute IEEE Communications Magazine, May 2000 Presented.
CS Spring 2012 CS 414 – Multimedia Systems Design Lecture 34 – Media Server (Part 3) Klara Nahrstedt Spring 2012.
Firewall and Proxy Server Director: Dr. Mort Anvari Name: Anan Chen Date: Summer 2000.
Locality-Aware Request Distribution in Cluster-based Network Servers Presented by: Kevin Boos Authors: Vivek S. Pai, Mohit Aron, et al. Rice University.
p-Jigsaw: A Cluster-based Web Server with Cooperative Caching Supports Ge Chen, Cho-Li Wang, Francis C.M. Lau (Presented by Cho-Li Wang) The Systems Research.
File Systems and N/W attached storage (NAS) | VTU NOTES | QUESTION PAPERS | NEWS | VTU RESULTS | FORUM | BOOKSPAR ANDROID APP.
Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy,
Server Load Balancing. Introduction Why is load balancing of servers needed? If there is only one web server responding to all the incoming HTTP requests.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
1 The Google File System Reporter: You-Wei Zhang.
1 3 Web Proxies Web Protocols and Practice. 2 Topics Web Protocols and Practice WEB PROXIES  Web Proxy Definition  Three of the Most Common Intermediaries.
1 Lecture 20: Parallel and Distributed Systems n Classification of parallel/distributed architectures n SMPs n Distributed systems n Clusters.
Page 19/17/2015 CSE 30341: Operating Systems Principles Optimal Algorithm  Replace page that will not be used for longest period of time  Used for measuring.
Local Area Networks (LAN) are small networks, with a short distance for the cables to run, typically a room, a floor, or a building. - LANs are limited.
INSTALLING MICROSOFT EXCHANGE SERVER 2003 CLUSTERS AND FRONT-END AND BACK ‑ END SERVERS Chapter 4.
Firewall and Internet Access Mechanism that control (1)Internet access, (2)Handle the problem of screening a particular network or an organization from.
Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon.
Distributed File Systems
Performance Concepts Mark A. Magumba. Introduction Research done on 1058 correspondents in 2006 found that 75% OF them would not return to a website that.
® IBM Software Group © 2007 IBM Corporation J2EE Web Component Introduction
CHEN Ge CSIS, HKU March 9, Jigsaw W3C’s Java Web Server.
Scalable Web Server on Heterogeneous Cluster CHEN Ge.
Loosely Coupled Parallelism: Clusters. Context We have studied older archictures for loosely coupled parallelism, such as mesh’s, hypercubes etc, which.
A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum.
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan,
Building a Scalable Web Server with Global Object Space Support on Heterogeneous Clusters Ge Chen, Cho-Li Wang, Francis. C. M. Lau Department Of Computer.
Copyright © cs-tutorial.com. Overview Introduction Architecture Implementation Evaluation.
1 Evaluation of Cooperative Web Caching with Web Polygraph Ping Du and Jaspal Subhlok Department of Computer Science University of Houston presented at.
ROOT and Federated Data Stores What Features We Would Like Fons Rademakers CERN CC-IN2P3, Nov, 2011, Lyon, France.
Disk Farms at Jefferson Lab Bryan Hess
Measuring the Capacity of a Web Server USENIX Sympo. on Internet Tech. and Sys. ‘ Koo-Min Ahn.
Replication (1). Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Search Engine using Web Mining COMS E Web Enhanced Information Mgmt Prof. Gail Kaiser Presented By: Rupal Shah (UNI: rrs2146)
Full and Para Virtualization
CHAPTER 7 CLUSTERING SERVERS. CLUSTERING TYPES There are 2 types of clustering ; Server clusters Network Load Balancing (NLB) The difference between the.
Infrastructure for Data Warehouses. Basics Of Data Access Data Store Machine Memory Buffer Memory Cache Data Store Buffer Bus Structure.
MiddleMan: A Video Caching Proxy Server NOSSDAV 2000 Brian Smith Department of Computer Science Cornell University Ithaca, NY Soam Acharya Inktomi Corporation.
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
1 IP Routing table compaction and sampling schemes to enhance TCAM cache performance Author: Ruirui Guo, Jose G. Delgado-Frias Publisher: Journal of Systems.
Overview on Web Caching COSC 513 Class Presentation Instructor: Prof. M. Anvari Student name: Wei Wei ID:
Background Computer System Architectures Computer System Software.
1 Evaluation of Cooperative Web Caching with Web Polygraph Ping Du and Jaspal Subhlok Department of Computer Science University of Houston presented at.
A Classification for Access Control List To Speed Up Packet-Filtering Firewall CHEN FAN, LONG TAN, RAWAD FELIMBAN and ABDELSHAKOUR ABUZNEID Department.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
DISTRIBUTED FILE SYSTEM- ENHANCEMENT AND FURTHER DEVELOPMENT BY:- PALLAWI(10BIT0033)
Network - definition A network is defined as a collection of computers and peripheral devices (such as printers) connected together. A local area network.
Ad-blocker circumvention System
Web Caching? Web Caching:.
Processes The most important processes used in Web-based systems and their internal organization.
Memory Management for Scalable Web Data Servers
Advanced Operating Systems
Page Replacement.
Presentation transcript:

Building a Scalable Web Server with Global Object Space Support on Heterogeneous Clusters Ph.D Annual Talk Ge CHEN CSIS HKU

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU The Need for Speed Internet user is growing very fast According to United States Internet Council’s report, regular Internet user has increased from less then 9,000,000 in 1993 to more than 300,000,000 in the summer of 2000, and is still growing fast Broadband becomes popular According IDG’s report, 57% of the workers in U.S access Internet via broadband in office. The figure will be more than 90% by Home broadband user will also increase from less than 9,000,000 now to over 55,000,000 by 2005 HTTP requests account for larger portion of Internet traffic now One study shows that HTTP activity has grown to account for 75%~80% of all Internet traffic

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU The Need for Speed  Growing user number  Faster connection speed  Increasing portion of HTTP requests accounts for all Internet traffic Require a more powerful Web server architecture

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Cluster Based Solution Clustering is one of the promising solutions It is widely adopted by industry and academic projects Google.com AltaVista.com Swala Project at UCSB DC-Apache at the University of Arizona

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Why Another Cluster-based Web Server? Previous researches mainly focus on: Load Distribution/Balancing [H. Bryhni et al. 2000] Scalability [Trevor Schroeder et al. 2000] High Availability [Guillaume Pierre et al. 2000] Caching policies on a single web server [Martin Arlitt et al. 2000] But not much on: Cooperative Object Caching in a tightly coupled cluster- based web server Impacts of object caching on the performance and scalability of a cluster-based web server

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Web Caching Web caching is one of the most effective ways to improve internet access quality Web caching is implemented at different levels of the Web hierarchy client-side (Web browsers) Web proxies Reverse proxies …

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Cooperative Caching Cooperative caching is an enhancement of traditional caching mechanisms Cooperative caching has been used in Web proxy caching systems and distributed file systems Cooperative Web proxy caching is usually employed in wide area networks Long network latency Relative unstable connection High communication cost It is impractical to adopt complicated caching algorithms in a WAN environment

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Cooperative Caching File systems usually cache data at fixed-size blocks, while Web object are of various size File system access pattern usually exhibits strong temporal locality of references LRU or its derivatives are usually well-enough to maintain the blocks in the cache Cooperative caching in distributed file system usually has complicated and sophisticated consistency model

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Global Object Space We propose a Global Object Space (GOS) in cluster based Web server On top of the physical memory of all the cluster nodes GOS is used to cache object content All Web objects in system are visible and accessible to every node through GOS Clusters are tightly connected by high-speed LANs, communication cost is relatively small It is possible to maintain more accurate system- wide information about the cached objects

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Hot Objects Web object access pattern exhibits strong concentration Some study shows that around 10% of the distinct documents are responsible for % of all requests received by a Web server We use “hot objects” to refer to objects that are frequently requested in a short interval

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Global Object Space Support Two tables are maintained in GOS at each node: Global Object Table (GOT) and Local Object Table (LOT) Home node of an object refers to the node holds the persistent copy of that object GOT keeps system wide information for all objects in system Home node of an object Location of cached copies of local objects The approximated global access counter

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Global Object Space Support LOT keeps access records for objects cached in current node’s hot object cache The approximated global access counter Local access counter Home node of the cached object

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Global Object Space Support GOS GOT for Node 1 Object URL/Partial URLAGACHN /node3/dir31/doc3.html3 /node4/dir414 /node1/dir11/doc1.html1051 ……… LOT for Node 1 Object URLLACAGACHN /node3/dir32/pic3.jpg /node1/dir11/doc1.html ……… GOT for Node 3 Object URLAGACHN /node3/dir31/doc3.html503 /node3/dir32/pic3.jpg2003 ………  AGAC: Approximated Global Access Counter  LAC : Local Access Counter  HN : Home Node 1234 HOC =

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Global Object Space Support Approximate global access information is maintained for every object cached in GOS Every object cached in GOS has a global access counter in the LOT of its home node Every cached object copy is associated with a local access counter in its caching node Local access counter is periodically sent back to objects’ home node to maintain an approximate global access counter for every cached object The approximate global object access counter will be fed back to the cached copy node after the global access counter updated

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Cache Replacement Policy A LFU-based algorithm, LFU-Aging, is used as cache replacement algorithm The global access counter is used to decide which object to be replaced Aging effect applies to the global access counter The cached objects are those most frequently accessed system wide, these are “hottest” ones

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Global Object Space Support “Hot objects” may have duplicated copies because The only criteria for whether caching an object or not is the global access counter returned or updated by the home node Cached objects’ life time is set according to HTTP specification’s definition Once an object becomes invalid before the predefined time, invalidation message is sent by home node to nodes caching the object Duplicated copies can keep consistent by using above consistency protocol

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Global Object Space Support Duplicated hot object copies will lead to Improved local cache hit rate in the request servicing node by caching objects with largest approximate global access counters Improved global hit rate by caching the “hottest objects” in global object space Void excessive inter-node object fetching with higher local cache hit rate Share the busty requests for hot objects to alleviate the workload of the home nodes holding many hot objects

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Proposed System Architecture Hardware OS JVM RHD GOSD Hardware OS JVM RHD GOSD HOC GOT … /dir323 LOT … … /dir32/pic3.jpg

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Proposed System Architecture RHD will accept all incoming HTTP request Parsed request is transferred to GOSD GOSD will try to find a cached copy in its own local memory If failed, GOSD will enquiry the home node for cached copy’s node number After fetching an object, GOSD will try to cache the object in its local memory according the caching policy

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Experiment Setup 32-node PC cluster. Each node consists of a 733MHz Pentium III PC running Linux The nodes are connected with an 80-port Cisco Catalyst 2980G Fast Ethernet Switch. 16 nodes acts as clients, and the rest as Web servers. Each of the server nodes has 392MB physical memory installed

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results A preliminary prototype system has been implemented by modifying the W3C’s Jigsaw, version The clients are modified version of httperf. It performs stress test on the designated Web server based on a collected Web server log from a well- known academic site.

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Portability Pure-Java implementation makes our system portable to any system support Java Experiment was carried on a 4-node heterogeneous cluster consists of different hardware and software platform One Celeron PC running Microsoft Windows 2000 Professional One SMP PC with two Pentium Pro CPUs running RedHat Linux 6.1 Two PCs each with a Pentium II CPU running RedHat Linux 6.2

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Effects of Scaling the Cluster Size

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Effects of Scaling the Cluster Size The performance with GOS enabled is almost 3.7 times of that without GOS in the sixteen-node case Speedup is 4.77 when system scales from 2 nodes to 16 nodes with GOS support Speedup is only 2.02 without GOS support GOS has very positive impacts on the whole system performance and scalability.

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Effects of Scaling the Cache Size ~13% ~90%

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Effects of Scaling the Cache Size

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Effects of Scaling the Cache Size

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Effects of Scaling the Cache Size The largest total cache size is about 13% of the data set size, the cache hit rate reaches around 90% This confirms the early research observation that around 10% of the distinct objects account for 80%~95% of all the requests the server received The approximated global LFU algorithm is effective With a relatively small amount of memory in each node used for caching hot objects, we are able to obtain a high cache hit rate which increases the whole system’s performance considerably

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Requests Handle Pattern Three types of replied objects Local Cache Object Home Node Cache Object Home Node Disk Object

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Requests Handle Pattern 70%

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Requests Handle Pattern ~3.3% ~37%

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Experiment Results Requests Handle Pattern Local cache hit rate can reach as high as 70%, this greatly reduces response time, therefore, the whole system throughput It proves again that our approximated global LFU algorithm is very effective As the number of cluster nodes increases, the home node cache objects account for a larger portion of the total global cache hit rate It shows the advantage of adopting the cooperative caching technique in our system for improving the performance by reducing the disk operations

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Related Works DC-Apache (DCWS) [Quanzhong Li et al. 2001] Dynamically manipulate the hyperlinks stored within the Web documents Dynamically migrates documents to cooperative Web servers which are dedicated server nodes for sharing the load of the home server Localtion Aware Request Distribution in CS, Rice University Uses a front-end server as request dispatcher Request from a new client will be dispatched to one of the backend server nodes according to the LARD policy

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Related Works KAIST’s work [ Woo Hyun Ahn et al.: 2000] Cluster-based file system with cooperative caching designed for supporting parallel Web server systems Based on some hint-based file system cooperative cache Positive results are obtained from the simulation experiments indicating good load balancing and reduced disk access rates

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Conclusions Use of cluster wide physical memory as object cache can lead to improved performance and scalability of Web server systems With relatively small amount of memory dedicated for object content caching, we are able to achieve a high hit rate Cooperative caching can further improve the global hit rate

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Conclusions It is possible to provide extension to dynamic content caching by implementing more sophisticated object consistency protocol within the global object space Although our approach is proposed for Web server systems, the proposed caching mechanism is also suitable for building cluster based proxy servers

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Future Work Analytic Model of Cluster-based Web Server The existing model for a single web server based on the M/M/N/Q Markovian queuing system [Paul Reeser et al. 2000] A HTTP request is divided into four phases: TCP Phase, HTTP Phase, SE Phase, Network I/O Phase Extension of the above model to cluster-based Web server As so far, we concern with static contents only, SE phase can be removed from the model Instead, we should add anther phase after HTTP phase to reflect the possible inter-node communication

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Future Work Analytic Model of Cluster-based Web Server: The original model TCP HTTP SE Backend System IOC π

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Future Work Cache Consistency Currently, the prototype system use a combined approach of HTTP specification’s validation model and home-node-driven invalidation Some type of pushing caching mechanism can be adopted to avoid redundant inter- node object fetch

Ph.D Annual Talk CSIS HKU 19 th Nov., 2001 Copyright © 2001 Ge Chen, CSIS, HKU Future Work Further study of cooperative object caching Possible analytic model for cooperative object caching in a cluster environment Analysis of the overheads occurred in the proposed cooperative caching