p-Jigsaw: A Cluster-based Web Server with Cooperative Caching Supports
Ge Chen, Cho-Li Wang, Francis C.M. Lau (Presented by Cho-Li Wang)
The Systems Research Group, Department of Computer Science and Information Systems, The University of Hong Kong

2 What ’ s a cluster ?  A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand- alone/complete computers cooperatively working together as a single, integrated computing resource – IEEE TFCC.

3 Rich Man ’ s Cluster  Computational Plant (C-Plant cluster)  Rank: 30 at TOP500 (11/2001)  1536 Compaq DS10L 1U servers (466 MHz Alpha (EV6) microprocessor, 256 MB ECC SDRAM)  Each node contains a 64-bit, 33 MHz Myrinet network interface card (1.28 Gbps/s) connected to a 64-port Mesh64 switch. 48 cabinets, each of which contains 32 nodes (48x32=1536)

4 Poor Man ’ s Cluster  HKU Linux Cluster  MHz Pentium III PCs, 392MB Memory  Hierarchical Ethernet-based network : four 24-port Fast Ethernet switches + one 8-port Gigabit Ethernet backbone switch)  Additional 80-port Cisco Catalyst 2980G Fast Ethernet Switch

5 Cluster vs. Supercomputer  Supercomputer: an HPC system capable of efficiently processing large-scale technical computing problems, e.g., the ASCI machines, IBM Blue Gene.  Cluster: an HPC system that integrates mainstream commodity components to process large-scale general problems, including technical computing, business applications, and networking services.

6 Cluster Computer Architecture  [Diagram: multiple nodes, each running its own OS, connected by a high-speed LAN (Fast/Gigabit Ethernet, SCI, Myrinet); layered above are an availability infrastructure, a single-system-image infrastructure, a programming environment (Java, C, MPI, HPF), Web/Windows user interfaces, and other subsystems (database, Web server, OLTP, etc.)]

7 Talk Outline  Motivation -- The Need for Speed  Cluster-based Solutions  System Architecture of p-Jigsaw  Performance Evaluation  Conclusion and Future Work  Other SRG Projects

8 The Challenges  Netscape Web site in November 1996: 120 million hits per day.  Microsoft Corp. Web site received more than 100M hits per day (1,200 hits per second).  Olympic Winter Games 1998 (Japan): 634.7M hits (16 days), peak day 57M, peak minute 110K.  Wimbledon July 1999: 942M hits (14 days), peak day 125M, peak minute 430K.  Olympic Games 2000: peak minute 600K hits (10K hits per second).

9 The Need for Speed  Internet usage is growing very fast: according to the United States Internet Council's report, regular Internet users increased from fewer than 9M in 1993 to more than 300M in the summer of 2000, and the number is still growing fast.  Broadband is becoming popular: according to IDG's report, 57% of workers in the U.S. access the Internet via broadband at the office, a figure projected to exceed 90%. Home broadband users will also increase from fewer than 9M now to over 55M by 2005.  HTTP requests account for a growing portion of Internet traffic: one study shows that HTTP activity has grown to account for 75%~80% of all Internet traffic.

10 Internet Still Growing Dramatically  Source: Dr. Lawrence Roberts, Caspian Networks; August 15th, 2001.  Internet traffic growth increased from 2.8x to 4x per year, faster than the average of 2.8x per year sustained since Internet growth took off in 1997, and continued at 4x per year through the most recent quarter.  Service providers have capital problems: they can only avoid equipment purchases for a short period, and must buy equipment soon to hold market share.

11 The Need for Speed  The need for speed: a growing user population, faster last-mile connection speeds, and HTTP requests accounting for an increasing portion of all Internet traffic.  Together these require a more powerful Web server architecture.

12 Cluster-Based Solution  Cluster: a low-cost yet efficient parallel computing architecture.

13 Cluster-based Solutions  Cluster-based Web server systems are either DNS-based or dispatcher-based. (More references: V. Cardellini, M. Colajanni, and P.S. Yu, "Dynamic Load Balancing on Web-Server Systems", IEEE Internet Computing, May/June 1999, pp. 28-39.)

14 DNS-based Approach  [Diagram: client, intermediate name servers, cluster DNS, and servers 1..N]
Step 1: Address request (URL)
Step 1': Address request reaches the cluster DNS
Step 2: (Web-server IP address, TTL) selection
Step 3: Address mapping (URL -> IP address 1)
Step 3': Address mapping (URL -> IP address 1) returned to the client
Step 4: Document request (IP address 1)
Step 5: Document response (IP address 1)
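The heart of the DNS-based approach is step 2, where the cluster DNS chooses a server IP and a TTL. A minimal sketch of that selection step, assuming a simple round-robin policy and a fixed short TTL (both the class and the policy are illustrative assumptions, not p-Jigsaw code):

```java
// Hypothetical sketch of step 2: the cluster-side authoritative DNS maps each
// address request to one Web-server IP and attaches a short TTL, so that
// intermediate name servers re-query it and load can be re-balanced over time.
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ClusterDns {
    private final List<String> serverIps;               // IP address 1..N
    private final AtomicInteger next = new AtomicInteger();
    private static final int TTL_SECONDS = 60;          // assumed short TTL

    public ClusterDns(List<String> serverIps) { this.serverIps = serverIps; }

    /** Step 2: pick a (Web-server IP address, TTL) pair for an address request. */
    public String[] resolve(String url) {
        int idx = Math.floorMod(next.getAndIncrement(), serverIps.size());
        return new String[] { serverIps.get(idx), String.valueOf(TTL_SECONDS) };
    }

    public static void main(String[] args) {
        ClusterDns dns = new ClusterDns(List.of("10.0.0.1", "10.0.0.2"));
        System.out.println(String.join(", ", dns.resolve("www.example.com")));
    }
}
```

The TTL is the weak point of the approach: intermediate name servers cache the mapping (step 3'), so the cluster DNS only regains control over load distribution once the TTL expires.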

15 Dispatcher-based  A network component of the Web-server system acts as a dispatcher, routing each request to one of the Web servers to achieve load balancing; each Web server works individually.  Layer-4 switching with layer-2 address translation: One-IP, IBM eNetwork Dispatcher, WebMux, LVS in DR mode.  Layer-4 switching with layer-3 address translation: Cisco LocalDirector, Alteon ACEDirector, F5 Big/IP, LVS in NAT mode.  Layer-7 switching (content-based): LARD, IBM Web Accelerator, Zeus Load Balancer (ZLB).
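The layer-7 (content-based) variant is the one most relevant to p-Jigsaw, since inspecting the URL before choosing a back-end is what lets policies like LARD co-locate requests for the same content. A toy sketch of the idea, with the class name and the hash-based policy as illustrative assumptions rather than any real product's API:

```java
// Toy sketch of layer-7 (content-based) dispatching: the dispatcher reads the
// request URL first and only then picks a back-end, so requests for the same
// object can be routed to the same server (the idea behind LARD-style policies).
import java.util.List;

public class ContentBasedDispatcher {
    private final List<String> backends;

    public ContentBasedDispatcher(List<String> backends) { this.backends = backends; }

    /** Route by URL so a given object is always served by the same node. */
    public String route(String requestUrl) {
        int idx = Math.floorMod(requestUrl.hashCode(), backends.size());
        return backends.get(idx);
    }

    public static void main(String[] args) {
        ContentBasedDispatcher d =
            new ContentBasedDispatcher(List.of("server1", "server2", "server3"));
        System.out.println(d.route("/node1/dir11/doc1.html")); // same URL -> same node
    }
}
```

Layer-4 dispatchers, by contrast, must pick a server before any HTTP data is seen, so they can balance only on connection-level information.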

16 Previous Research Focus  Previous research has mainly focused on: load distribution/balancing [H. Bryhni et al. 2000]; scalability [Trevor Schroeder et al. 2000]; high availability [Guillaume Pierre et al. 2000]; caching policies on a single Web server [Martin Arlitt et al. 2000].

17 p-Jigsaw -- Goals  High efficiency: exploit the aggregate power of cluster resources (CPU, memory, disk, network bandwidth); exploit in-memory Web caching on cluster-based Web servers.  High scalability: maintain a high cache hit rate and high throughput as the cluster size grows; eliminate potential bottlenecks in the overall design.  High portability: multi-platform support; heterogeneous clusters.

18 Main Features of p-Jigsaw Web servers  Global Object Space (GOS)  Hot Objects Caching  Cooperative Object Caching  Distributed Cache Replacement Algorithms

19 Global Object Space  (All Web objects in the system are visible and accessible to every node through the GOS.)  [Diagram: server nodes, each running OS + JVM + p-Jigsaw and holding a hot object cache in memory, connected by a high-speed LAN; the per-node caches together form the Global Object Space]

20 Hot Objects  Web object access patterns exhibit strong concentration: some studies show that around 10% of the distinct documents are responsible for 80-95% of all requests received by a Web server [Arlitt et al. 1997].  "Hot objects" are objects that are frequently requested in a short time interval, the "focus of interest".  Hot objects may have duplicated copies, which improves the hit rate, avoids excessive inter-node object fetching, and spreads the bursty requests for hot objects to alleviate the workload of the home nodes holding many hot objects.  Realized by distributed cache replacement algorithms (to be discussed).

21 Construction of GOS  Two tables are maintained at each node: the Global Object Table (GOT) and the Local Object Table (LOT).  The GOT keeps system-wide information for mapping an object URL or partial URL to its home node (HN) or to the servers holding a cached copy (CCNN: Cache Copy Node Number), plus an approximated global access counter (AGAC) for cache replacement.  The home node of an object is the node that holds the persistent copy of that object.

GOT for Node 1:
  Object URL / Partial URL    AGAC   HN    CCNN
  /node3/dir31/doc3.html      ...    3     nil
  /node4/dir41                ...    4     nil
  /node1/dir11/doc1.html      105    1     1,4
  ...                         ...    ...   ...

22 Construction of GOS  The LOT keeps access records for objects cached in the current node's hot object cache: a copy of the approximated global access counter (AGAC), a local access counter (LAC), and the home node of the cached object (HN).

Local Object Table for Node 1:
  Object URL                  AGAC   LAC   HN
  /node3/dir31/fig311.jpg     ...    ...   3
  /node1/dir12/fig...         ...    ...   1
  ...                         ...    ...   ...
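To make the two tables concrete, here is a rough Java rendering of the rows described on the last two slides. The field names (HN, AGAC, CCNN, LAC) follow the slides; the class shapes, types, and the use of ConcurrentHashMap are illustrative assumptions, not the actual p-Jigsaw source:

```java
// Sketch of the two per-node tables of the Global Object Space.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class GotEntry {                    // one row of the Global Object Table
    int homeNode;                   // HN: node holding the persistent copy
    long agac;                      // AGAC: approximated global access counter
    Set<Integer> cacheCopyNodes;    // CCNN: nodes caching a copy (empty = nil)
}

class LotEntry {                    // one row of the Local Object Table
    long agacSnapshot;              // local copy of the AGAC
    long lac;                       // LAC: local access counter
    int homeNode;                   // HN of the cached object
}

class GlobalObjectSpaceTables {
    // GOT: keyed by object URL or partial URL, as in the slides
    final Map<String, GotEntry> got = new ConcurrentHashMap<>();
    // LOT: keyed by the URL of each object in this node's hot object cache
    final Map<String, LotEntry> lot = new ConcurrentHashMap<>();
}
```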

23 Handling an Incoming Request  (HN: Home Node; AGAC: Approximated Global Access Counter; CCNN: Cache Copy Node Number; LAC: Local Access Counter)
1. An incoming request arrives at node 1.
2. Node 1 searches its Local Object Table: miss in the local hot object cache.
3. Node 1 searches its Global Object Table (by hashing) and redirects the request to node 4, the home node of the requested page.
4-5. The cached copy is forwarded from node 2, 3, or 4, depending on the server workload, and the object is then cached in node 1.
[Diagram: four nodes, each with a hot object cache and a hard disk, with sample GOT/LOT rows such as /node1/dir12/fig121.bmp and /node4/dir12/pic1.jpg]
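The same flow written out as code, building on the table sketch above; every helper method here (pickByWorkload, fetchFromNode, serveFromDisk) is a hypothetical stub introduced for illustration:

```java
// Sketch of the request path: local hot object cache first, then the GOS via
// the GOT, then the disk as a last resort. Not actual p-Jigsaw code.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RequestHandler {
    final Map<String, byte[]> hotObjectCache = new ConcurrentHashMap<>();
    final GlobalObjectSpaceTables tables = new GlobalObjectSpaceTables();

    public byte[] handle(String url) {
        byte[] body = hotObjectCache.get(url);
        if (body != null) {                          // step 2: local cache hit
            LotEntry le = tables.lot.get(url);
            if (le != null) le.lac++;                // bump the local access counter
            return body;
        }
        GotEntry entry = tables.got.get(url);        // step 3: hash lookup in the GOT
        if (entry != null && !entry.cacheCopyNodes.isEmpty()) {
            int source = pickByWorkload(entry.cacheCopyNodes); // e.g. node 2, 3, or 4
            body = fetchFromNode(source, url);       // steps 4-5: fetch a cached copy
            hotObjectCache.put(url, body);           // the object is now cached here
            return body;
        }
        return serveFromDisk(url);                   // global miss: the slowest path
    }

    int pickByWorkload(Set<Integer> nodes) { return nodes.iterator().next(); } // stub
    byte[] fetchFromNode(int node, String url) { return new byte[0]; }         // stub
    byte[] serveFromDisk(String url) { return new byte[0]; }                   // stub
}
```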

24 Distributed Cache Replacement  Two LFU-based algorithms are implemented: LFU-Aging (AGAC / 2 every interval Δt) and Weighted-LFU (AGAC / file size). Global LRU (GLRU) is implemented for comparison.  The goal is to cache the "hottest objects" in the global object space.  A cached object's lifetime is set according to the HTTP timestamp; cache consistency is maintained by an invalidation scheme.
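The two scoring rules are simple enough to state in a few lines. Below, the formulas are exactly those on the slide, while the surrounding class and the idea of sweeping the whole GOT are assumed details:

```java
// The two LFU-based replacement scores from the slide, in sketch form.
import java.util.Map;

public class ReplacementPolicy {
    /** LFU-Aging: halve every approximated global access counter every interval Δt. */
    static void ageCounters(Map<String, GotEntry> got) {
        for (GotEntry e : got.values()) e.agac /= 2;
    }

    /** Weighted-LFU: score = AGAC / file size, so small hot objects are kept first. */
    static double weightedLfuScore(GotEntry e, long fileSizeBytes) {
        return (double) e.agac / fileSizeBytes;
    }
}
```

Aging keeps once-popular objects from occupying the cache forever, while the file-size weighting biases the cache toward many small hot objects rather than a few large ones.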

25 Update of Access Counters  The LAC is periodically sent back to each object's HN to maintain an approximated global access counter for every cached object.  [Diagram: the LOT of Node 1 (e.g. /node3/dir32/pic3.jpg, /node1/dir11/doc1.html) pushes LAC updates across the GOS to the GOTs at the home nodes, e.g. the GOT for Node 3 showing /node3/dir31/doc3.html with AGAC 50 and /node3/dir32/pic3.jpg with AGAC 200]
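A sketch of that periodic push, again building on the earlier table classes; the scheduling interval and the sendToHomeNode call are assumptions standing in for whatever transport p-Jigsaw actually uses:

```java
// Periodically flush each cached object's LAC back to its home node, where it
// is folded into the object's AGAC. Interval and transport are assumed.
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CounterPropagator {
    private final GlobalObjectSpaceTables tables;
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    public CounterPropagator(GlobalObjectSpaceTables tables) { this.tables = tables; }

    public void start(long periodSeconds) {
        timer.scheduleAtFixedRate(this::flushCounters,
                                  periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** Send each cached object's LAC to its home node, then reset the counter. */
    void flushCounters() {
        for (Map.Entry<String, LotEntry> e : tables.lot.entrySet()) {
            LotEntry lot = e.getValue();
            sendToHomeNode(lot.homeNode, e.getKey(), lot.lac);  // hypothetical RPC
            lot.lac = 0;
        }
    }

    void sendToHomeNode(int homeNode, String url, long lac) { /* stub: network send */ }
}
```

Because the counters are only merged periodically, each node sees an approximated global access counter, which is why the slides call it the AGAC rather than an exact count.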

26 p-Jigsaw Implementation  A preliminary prototype system has been implemented by modifying the W3C's Jigsaw server, all written in Java.  The clients are a modified version of httperf, which performs stress tests on the designated Web server.  Test data: Web server logs from Berkeley CS.  Total size: ~7.3 GB.  Number of files: 89,689.  Average file size: 80,912 bytes.  Number of requests: ~640,000.  Data transferred: ~35 GB.  Distinct files requested: 52,347.

27 Experiment Setup  32-node PC cluster; each node is a 733 MHz Pentium III PC running Linux.  The nodes are connected by an 80-port Cisco Catalyst 2980G Fast Ethernet switch.  An NFS server (2-way SMP) has a Gigabit Ethernet link to the switch.  16 nodes act as clients, and the rest act as Web servers.  Each server node has 392 MB of physical memory installed.  [Photo: the 32-node PC cluster]

28 Experiment Results Effects of Scaling the Cluster Size

29 Experiment Results  Effects of Scaling the Cache Size.  Aggregate cache sizes for 16 nodes = 1.8% (8 MB per node), 3.6%, 7.2%, and 14.4% (64 MB per node) of the size of the data set.

30 Effects of Scaling the Cache Size  At the largest total cache size, about 14.4% of the data set size, the cache hit rate reaches around 88%.  This confirms the earlier research observation that around 10% of the distinct objects account for 80%~95% of all requests a server receives.  The approximated global LFU algorithm with cooperative caching support is effective.  With a relatively small amount of memory in each node used for caching hot objects, we obtain a high cache hit rate, which increases the whole system's performance considerably.

31 Analysis of Request Handling Patterns  Local cache object (in local memory): the server that receives the request has the requested object in its local hot object cache.  Peer node cache object (in remote memory): the server that receives the request does not have the requested object in its local hot object cache; the object is fetched from either the home node or another peer node.  Disk object (local or remote disk): the requested object is not in the global object space and has to be fetched from the file server. This has the longest serving time.

32 Analysis of Request Handling Patterns  LFU-based algorithms show high local cache hit rates: with 64 MB of cache per node, the local cache hit rate is around 60% for both Weighted-LFU and LFU-Aging.  [Chart: request handling breakdown; ~60% local cache hits]

33 Analysis of Request Handling Patterns  With a small cache size (8 MB), cooperative caching improves the global cache hit rate and reduces costly file server disk accesses, a common bottleneck for a website.  [Chart: request handling breakdown; annotated values ~6.7%, ~35.2%, ~50%, ~25%]

34 Analysis of Request Handling Patterns  GLRU shows a much lower local cache hit rate than the LFU-based algorithms.  GLRU does achieve nearly the same global cache hit rate as the LFU-based algorithms when the cache space is large (64 MB): 52% vs. 60% in the 16-node case.  The local cache hit rate for the LRU-based algorithm drops much faster than that for the LFU-based ones: 20% vs. 40% in the 16-node case.

35 Experiment Results  For the 16-node case, the local cache hit rate for GLRU drops from around 52% to around 20%, while that for Weighted-LFU with cooperative caching only drops from around 60% to around 40%.

36 Conclusions  Using cluster-wide physical memory as an object cache can improve the performance and scalability of Web server systems.  With a relatively small amount of memory dedicated to object caching, we are able to achieve a high hit rate through cooperative caching.  The results favor replicating more hot objects over squeezing more distinct objects into the global object space.

37 Future Work  The HKU "Hub2World" project: build a giant proxy cache server on HKU's 300-node Gideon cluster based on p-Jigsaw.  Cache hot objects in a 150 GB in-memory cache (0.5 GB x 300) plus 12 terabytes of disk space (40 GB x 300).  Design new caching algorithms.

38 Other SRG Projects  You are welcome to download our software packages and test them on your clusters. URL:

39 Current SRG Clusters

40 JESSICA2 -- A Distributed JVM  [Diagram: a multithreaded Java program running on a distributed JVM with thread migration and a global object space]

41 JUMP Software DSM  Allows programmers to assume a globally shared virtual memory even when they execute programs on nodes that do not physically share memory.  The DSM system maintains memory consistency among the different machines; data faulting, location, and movement are handled by the DSM.  [Diagram: processors 1..N, each with its own memory, connected by a network to form a globally shared virtual memory]

42 HKU DP-II on Gigabit Ethernet  Single-trip latency test (min: 16.3 µs); bandwidth test (max: 79.5 MB/s).  For comparison, RWCP GigaE PM: 48.3 µs round-trip latency and 56.7 MB/s on an Essential Gigabit Ethernet NIC with a 400 MHz Pentium II.  RWCP GigaE PM II: 44.6 µs round-trip time and MB/s bandwidth on a Packet Engines G-NIC II connecting Compaq XP-1000s (Alpha at 500 MHz).

43 SPARKLE Project  A dynamic software architecture for pervasive computing: "computing in small".  [Diagram: a monolithic application won't fit on a small device; applications are traditionally distributed as monolithic blocks, while our component-based solution decomposes them into facets]

44 SPARKLE Project  Overview of the proposed software architecture.  [Diagram: clients (Linux + JVM) interacting peer-to-peer; intelligent proxies with cooperative caching (user mobility) handling facet query and retrieval; facet servers, execution servers, and a computational grid; service providers using delegation/mobile code]

45 ClusterProbe: Cluster Monitoring Tool

Q&A For more information, please visit