Simulating a $2M Commercial Server on a $2K PC
Alaa Alameldeen, Milo Martin, Carl Mauer, Kevin Moore, Min Xu, Daniel Sorin, Mark D. Hill, & David A. Wood
Multifacet Project, Computer Sciences Department, University of Wisconsin-Madison
January 2003
(C) 2003 Multifacet Project, University of Wisconsin-Madison

Dangers of a Sabbatical
– Mark on sabbatical at Universidad Politecnica de Catalunya (UPC)
– Mark normally at the University of Wisconsin-Madison

Context
– Commercial server design is important
– Multifacet project seeks improved designs
– Must evaluate alternatives
Commercial Servers
– Processors, memory, disks ≈ $2M
– Run large multithreaded transaction-oriented workloads
– Use commercial applications on a commercial OS
To Simulate on a $2K PC
– Scale & tune workloads
– Manage simulation complexity
– Cope with workload variability
Summary
– Keep L2 miss rates, etc.
– Separate timing & function
– Use randomness & statistics

Outline
Context
– Commercial Servers
– Multifacet Project
Workload & Simulation Methods
Separate Timing & Functional Simulation
Cope with Workload Variability
Summary

Why Commercial Servers?
Many (academic) architects focus on
– Desktop computing
– Wireless appliances
We focus on servers
– Important market
– Performance challenges
– Robustness challenges
– Methodological challenges

3-Tier Internet Service
– PCs holding "soft" state
– Servers running applications for "business" rules
– Servers running databases for "hard" state
– Tiers connected by LAN / SAN
– Multifacet focus: the server tiers

Multifacet: Commercial Server Design
Wisconsin Multifacet Project
– Directed by Mark D. Hill & David A. Wood
– Sponsors: NSF, WI, Compaq, IBM, Intel, & Sun
– Current contributors: Alaa Alameldeen, Brad Beckman, Pacia Harper, Milo Martin, Carl Mauer, Kevin Moore, Daniel Sorin, & Min Xu
– Past contributors: Anastassia Ailamaki, Ender Bilir, Ross Dickson, Ying Hu, Manoj Plakal, & Anne Condon
Analysis
– Want 4-64 processors
– Many cache-to-cache misses
– Neither snooping nor directories ideal
Multifacet Designs
– Snooping w/ multicast [ISCA99] or unordered network [ASPLOS00]
– Bandwidth-adaptive [HPCA02] & token coherence [ISCA03]

Outline
Context
Workload & Simulation Methods
– Select, scale, & tune workloads
– Transition workloads to the simulator
– Specify & test the proposed design
– Evaluate the design with simple/detailed processor models
Separate Timing & Functional Simulation
Cope with Workload Variability
Summary

Multifacet Simulation Overview
Virtutech Simics provides the full-system functional simulation; the rest is Multifacet software
– Workload development: commercial server (Sun E6000), full & scaled workloads
– Protocol development: memory protocol generator (SLICC), pseudo-random protocol checker
– Timing simulation: memory timing simulator (Ruby), processor timing simulator (Opal), full-system functional simulator (Simics)

Select Important Workloads
– Online transaction processing: DB2 w/ TPC-C-like workload
– Java server workload: SPECjbb
– Static web content serving: Apache
– Dynamic web content serving: Slashcode
– Java-based middleware: (soon)

Setup & Tune Workloads (on real hardware)
– Tune workload & OS parameters
– Measure transaction rate, speed-up, miss rates, I/O
– Compare to published results
– Platform: commercial server (Sun E6000) running full workloads

Scale & Re-tune Workloads
– Scale down for PC memory limits while retaining similar behavior (e.g., L2 cache miss rate)
– Re-tune to achieve higher transaction rates (OLTP: raw disk, multiple disks, more users, etc.)
– Result: scaled workloads on the commercial server (Sun E6000)

Transition Workloads to Simulation
– Create disk dumps of tuned workloads
– In the simulator: boot the OS, start & warm the application
– Create a Simics checkpoint (snapshot)
– Result: scaled workloads loaded into the full-system functional simulator (Simics)

Specify Proposed Computer Design
– Coherence protocol (control tables: states × events), written for the memory protocol generator (SLICC) and compiled into the memory timing simulator (Ruby)
– Cache hierarchy (parameters & queues)
– Interconnect (switches & queues)
– Processor (later)
A small table-driven sketch of the states × events idea follows.
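To make the "states × events" control-table idea concrete, here is a minimal table-driven sketch in C++. It is not SLICC syntax and not the Multifacet protocol; the MSI-like states, events, and actions are purely illustrative stand-ins, but a SLICC specification has the same shape: each (state, event) cell names the actions to perform and the resulting state.

```cpp
// Illustrative states x events control table (hypothetical MSI-like protocol).
#include <cstdio>

enum class State { I, S, M };                    // toy stable states
enum class Event { Load, Store, Invalidate };    // toy events

struct Transition {
    State next;            // resulting state
    const char* action;    // e.g., which coherence message to issue
};

// One row per current state, one column per event.
constexpr Transition kTable[3][3] = {
    /* I */ {{State::S, "issue GetS"}, {State::M, "issue GetM"}, {State::I, "ignore"}},
    /* S */ {{State::S, "hit"},        {State::M, "issue GetM"}, {State::I, "ack invalidation"}},
    /* M */ {{State::M, "hit"},        {State::M, "hit"},        {State::I, "writeback + ack"}},
};

int main() {
    State line = State::I;
    const Event trace[] = {Event::Load, Event::Store, Event::Invalidate};
    for (Event e : trace) {
        const Transition& t = kTable[static_cast<int>(line)][static_cast<int>(e)];
        std::printf("state %d + event %d -> %s, next state %d\n",
                    static_cast<int>(line), static_cast<int>(e),
                    t.action, static_cast<int>(t.next));
        line = t.next;
    }
    return 0;
}
```

Expressing the protocol as a table of (state, event) cells is also what makes exhaustive inspection and pseudo-random testing of every reachable cell practical, as the next slide describes.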

Test Proposed Computer Design
– Pseudo-random protocol checker: randomly select a write action & check it with a later read
– Massive false sharing forces component interaction
– A perverse network stresses the design
– Transient-error & deadlock detection
– Sound but not complete
A minimal sketch of the write/read check appears below.
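Below is a minimal sketch of the random write-action / later-read-check idea. The MemorySystem interface and TrivialMemory stub are hypothetical stand-ins for the simulator under test (in Multifacet's case, Ruby); a real checker additionally drives a perverse network, forces transient states, and watches for deadlock.

```cpp
// Sketch of a pseudo-random write/read coherence check against a
// hypothetical memory-system interface.
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>

struct MemorySystem {                       // stand-in for the simulator under test
    virtual void write(int cpu, uint64_t addr, uint64_t value) = 0;
    virtual uint64_t read(int cpu, uint64_t addr) = 0;
    virtual ~MemorySystem() = default;
};

struct TrivialMemory : MemorySystem {       // trivially coherent reference, for illustration only
    std::unordered_map<uint64_t, uint64_t> mem;
    void write(int, uint64_t addr, uint64_t value) override { mem[addr] = value; }
    uint64_t read(int, uint64_t addr) override { return mem[addr]; }
};

void random_check(MemorySystem& sys, int cpus, int blocks, int steps) {
    std::mt19937 rng(42);
    std::unordered_map<uint64_t, uint64_t> oracle;   // last value written per block
    uint64_t next = 1;
    for (int i = 0; i < steps; ++i) {
        int cpu = static_cast<int>(rng() % cpus);
        uint64_t addr = static_cast<uint64_t>(rng() % blocks) * 64;  // few blocks -> massive false sharing
        if (rng() % 2) {
            sys.write(cpu, addr, next);              // random write action
            oracle[addr] = next++;
        } else if (oracle.count(addr)) {
            // Later read check: a mismatch is always a bug, but passing runs
            // do not prove the protocol correct (sound but not complete).
            assert(sys.read(cpu, addr) == oracle[addr]);
        }
    }
}

int main() {
    TrivialMemory mem;
    random_check(mem, 16, 8, 100000);
    return 0;
}
```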

Simulate with Simple Blocking Processor
– Warm up caches (sometimes sufficient, e.g., SafetyNet)
– Run for a fixed number of transactions
– Some transactions are partially done at the start, others partially done at the end
– Cope with workload variability (later)
– Driven by scaled workloads on Simics plus the memory timing simulator (Ruby)

Simulate with Detailed Processor
– Accurate (future) timing & (current) function
– Simulation complexity decoupled (discussed soon)
– Same transaction methodology & workload variability issues
– Driven by scaled workloads on Simics plus Ruby and the processor timing simulator (Opal)

Simulation Infrastructure & Workload Process
– Select important workloads: run, tune, scale, & re-tune
– Specify the system & pseudo-randomly test it
– Create a warm workload checkpoint
– Simulate with a simple or detailed processor
– Run a fixed number of transactions, manage simulation complexity (next), cope with workload variability (next next)
– Infrastructure: Simics, Ruby, Opal, SLICC, pseudo-random protocol checker, Sun E6000, full & scaled workloads

Outline
Context
Simulation Infrastructure & Workload Process
Separate Timing & Functional Simulation
– Simulation Challenges
– Managing Simulation Complexity
– Timing-First Simulation
– Evaluation
Cope with Workload Variability
Summary

Challenges to Timing Simulation
Execution-driven simulation is getting harder
Micro-architecture complexity
– Multiple "in-flight" instructions
– Speculative execution
– Out-of-order execution
Thread-level parallelism
– Hardware multithreading
– Traditional multiprocessing

Challenges to Functional Simulation
Commercial workloads have high functional-fidelity demands
– Application complexity ranges from kernels and SPEC benchmarks up to web servers, databases, and the operating system
– The simulated target system must model the processor and MMU plus many devices: RAM, PCI bus, Ethernet controller, Fibre Channel controller, graphics card, SCSI controller, SCSI disk, CD-ROM, DMA controller, terminal I/O, MMU controller, IRQ controller, status registers, serial port, real-time clock

Managing Simulator Complexity
Functional-first (trace-driven): functional simulator feeds a timing simulator
– No timing feedback
Timing-directed: timing simulator drives a functional simulator
+ Timing feedback
– Tight coupling
– Performance?
Integrated timing and functional simulator (SimOS)
– Complex
Timing-first (Multifacet): timing simulator (complete timing, partial function) checked by a functional simulator (no timing, complete function)
+ Timing feedback
+ Uses existing simulators
+ Software development advantages

Timing-First Simulation
Timing simulator
– Does functional execution of user and privileged operations
– Does speculative, out-of-order multiprocessor timing simulation
– Does NOT implement the functionality of the full instruction set or any devices
Functional simulator
– Does full-system multiprocessor simulation
– Does NOT model detailed micro-architectural timing
(Diagram: the timing simulator executes and commits; commits are verified against the functional simulator's CPU, cache, RAM, and network state)

Timing-First Operation
– As an instruction retires, step the CPU in the functional simulator
– Verify the instruction's execution
– Reload state if the timing simulator deviates from the functional simulator
– Deviations arise from loads in multiprocessors and instructions with unidentified side effects, but NOT from loads/stores to I/O devices
A self-contained sketch of this retire/verify/reload loop follows.
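The following is a self-contained sketch of the retire/verify/reload loop, assuming hypothetical TimingSim and FunctionalSim classes as stand-ins for Opal-style timing and Simics-style functional simulation; the toy instruction behavior and the forced deviation every 10,000 retires exist only so the loop has something to exercise.

```cpp
// Sketch of the timing-first retire/verify/reload loop with toy simulators.
#include <cstdint>
#include <cstdio>

struct ArchState {
    uint64_t pc = 0;
    uint64_t reg = 0;                       // stand-in for full architectural state
    bool operator==(const ArchState& o) const { return pc == o.pc && reg == o.reg; }
};

// Functional reference: always architecturally correct, no timing.
struct FunctionalSim {
    ArchState s;
    ArchState step() { s.pc += 4; s.reg += 1; return s; }
};

// Timing model: detailed timing, but its functional execution may deviate
// (e.g., racy multiprocessor loads, instructions with unmodeled side effects).
struct TimingSim {
    ArchState s;
    long retired = 0;
    ArchState retire_next() {
        s.pc += 4;
        if (++retired % 10000 != 0) s.reg += 1;   // toy: deviate on every 10,000th retire
        return s;
    }
    void reload(const ArchState& verified) { s = verified; }   // squash & resume
};

int main() {
    TimingSim timing;
    FunctionalSim functional;
    long deviations = 0;
    const long n = 100000;
    for (long i = 0; i < n; ++i) {
        ArchState t = timing.retire_next();   // commit one instruction in the timing model
        ArchState f = functional.step();      // step the functional simulator in lockstep
        if (!(t == f)) {                      // verify the instruction's execution
            ++deviations;
            timing.reload(f);                 // functional state is the source of truth
        }
    }
    std::printf("%ld deviations in %ld instructions\n", deviations, n);
    return 0;
}
```

Because the functional simulator is treated as the source of truth, a deviation costs only a reload rather than a functional error, which is what gives the immediate, precise check on the timing simulator mentioned on the next slide.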

Benefits of Timing-First
– Supports speculative multiprocessor timing models
– Leverages existing simulators
– Software development advantages: increases flexibility, reduces code complexity, and gives an immediate, precise check on the timing simulator
However:
– How much performance error is introduced by this approach?
– Are there simulation performance penalties?

Evaluation
Our implementation, TFsim, uses:
– Functional simulator: Virtutech Simics
– Timing simulator: implemented in less than one person-year
Evaluated using OS-intensive commercial workloads:
– OS boot: > 1 billion instructions of Solaris 8 startup
– OLTP: TPC-C-like benchmark using a 1 GB database
– Dynamic web: Apache serving a message board, using code and data similar to slashdot.org
– Static web: Apache web server serving static web pages
– Barnes-Hut: scientific SPLASH-2 benchmark

Measured Deviations
Less than 20 deviations per 100,000 instructions (0.02%)

If the Timing Simulator Modeled Fewer Events

Sensitivity Results

Analysis of Results
– Runs full-system workloads!
– Timing performance impact of deviations: worst case less than 3% performance error
– "Overhead" of redundant execution: 18% on average for uniprocessors; 18% (2 processors) up to 36% (16 processors)
(Chart: total execution time split between the timing and functional simulators)

Performance Comparison
Absolute simulation performance, in kilo-instructions committed per second (KIPS)
– RSIM (scaled): 107 KIPS
– Uniprocessor TFsim: 119 KIPS
Setup (targets match, applications close, hosts differ):
– RSIM: out-of-order MP SPARC V9 target, SPLASH-2 kernels, on a 400 MHz SPARC host running Solaris
– TFsim: out-of-order MP full-system SPARC V9 target, SPLASH-2 kernels, on a 1.2 GHz Pentium host running Linux

Bundled Retires

Timing-First Conclusions
Execution-driven simulators are increasingly complex; how do we manage that complexity?
Our answer, timing-first simulation:
– Introduces relatively little performance error (worst case: 3%)
– Has low overhead (18% uniprocessor average)
– Rapid development time
(Timing simulator: complete timing, partial function; functional simulator: no timing, complete function)

Outline
Context
Workload Process & Infrastructure
Separate Timing & Functional Simulation
Cope with Workload Variability
– Variability in Multithreaded Workloads
– Coping in Simulation
– Examples & Statistics
Summary

What is Happening Here? (OLTP)

What is Happening Here?
How can slower memory lead to a faster workload?
Answer: the multithreaded workload takes a different path
– Different lock race outcomes
– Different scheduling decisions
(1) Does this happen on real hardware?
(2) If so, what should we do about it?

One Second Intervals (on real hardware) (OLTP)

Second Intervals (on real hardware), 16-day simulation (OLTP)

Coping with Workload Variability
– Running (simulating) long enough is not appealing
– Need to separate coincidental & real effects
Standard statistics on real hardware
– Compare variation within base-system runs vs. variation between base & enhanced system runs
– But deterministic simulation has no "within" variation
Solution for deterministic simulation (sketched below)
– Add a pseudo-random delay on L2 misses
– Simulate the base (and enhanced) system many times
– Use simple or complex statistics
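One plausible way to realize the "pseudo-random delay on L2 misses" is sketched below; the base latency, jitter range, and per-run seeding are illustrative assumptions, not the values used by Multifacet.

```cpp
// Sketch: per-run randomized jitter added to every L2 miss latency.
#include <cstdio>
#include <random>

class L2MissLatency {
public:
    explicit L2MissLatency(unsigned run_seed, int base_cycles = 80, int max_jitter = 4)
        : rng_(run_seed), jitter_(0, max_jitter), base_(base_cycles) {}

    // Called by the memory timing model on every L2 miss.
    int cycles() { return base_ + jitter_(rng_); }

private:
    std::mt19937 rng_;
    std::uniform_int_distribution<int> jitter_;
    int base_;
};

int main() {
    for (unsigned run = 0; run < 3; ++run) {
        L2MissLatency latency(run);          // a fresh seed for each simulation run
        int a = latency.cycles(), b = latency.cycles(), c = latency.cycles();
        std::printf("run %u: first three L2 miss latencies = %d %d %d cycles\n", run, a, b, c);
    }
    return 0;
}
```

Because each run of the same checkpoint gets a different seed, lock races and scheduling decisions can resolve differently, recreating the "within" variation that deterministic simulation otherwise lacks.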

Coincidental (Space) Variability

Wrong Conclusion Ratio
– WCR(16,32) = 18%
– WCR(16,64) = 7.5%
– WCR(32,64) = 26%
An illustrative sketch of estimating such a ratio from repeated runs follows.
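The slide's exact definition of WCR is not reproduced in this transcript; as an illustration only, the sketch below assumes a wrong-conclusion ratio measured as the fraction of single-run pairs (one run of design A, one of design B) whose ordering contradicts the ordering of the mean performance. The data are made up.

```cpp
// Illustrative estimate of a "wrong conclusion ratio" from repeated runs.
#include <cstdio>
#include <vector>

double wrong_conclusion_ratio(const std::vector<double>& a,
                              const std::vector<double>& b) {
    double mean_a = 0, mean_b = 0;
    for (double v : a) mean_a += v;
    for (double v : b) mean_b += v;
    mean_a /= a.size();
    mean_b /= b.size();
    bool a_truly_faster = mean_a > mean_b;      // "truth" defined by the means

    int wrong = 0, total = 0;
    for (double ra : a)
        for (double rb : b) {                   // every cross-pairing of single runs
            if ((ra > rb) != a_truly_faster) ++wrong;
            ++total;
        }
    return 100.0 * wrong / total;
}

int main() {
    // Hypothetical per-run throughputs from many simulations of each design.
    std::vector<double> a = {100, 104, 97, 108, 101, 95, 106, 99};
    std::vector<double> b = {103, 98, 110, 96, 105, 100, 94, 107};
    std::printf("WCR(A,B) = %.1f%%\n", wrong_conclusion_ratio(a, b));
    return 0;
}
```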

More Generally: Use Standard Statistics
As one would for measurements of a "live" system
Confidence intervals
– 95% confidence intervals contain the true value 95% of the time
– Non-overlapping confidence intervals give statistically significant conclusions
Use ANOVA or hypothesis testing for even stronger conclusions
A small confidence-interval computation is sketched below.
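A minimal sketch of the confidence-interval step is below, assuming per-run throughput numbers (the data are made up) and a normal-approximation 95% interval; with only a handful of runs a Student's t value should replace the 1.96 factor.

```cpp
// Sketch: 95% confidence intervals for base vs. enhanced designs from repeated runs.
#include <cmath>
#include <cstdio>
#include <vector>

struct Interval { double lo, hi; };

Interval ci95(const std::vector<double>& x) {
    double n = static_cast<double>(x.size()), mean = 0, var = 0;
    for (double v : x) mean += v;
    mean /= n;
    for (double v : x) var += (v - mean) * (v - mean);
    var /= (n - 1);                                  // sample variance
    double half = 1.96 * std::sqrt(var / n);         // normal-approximation half-width
    return {mean - half, mean + half};
}

int main() {
    // Hypothetical transactions-per-run from repeated simulations of each design.
    std::vector<double> base     = {950, 985, 960, 1002, 975, 991, 968, 980};
    std::vector<double> enhanced = {1040, 1065, 1031, 1078, 1052, 1049, 1060, 1044};
    Interval b = ci95(base), e = ci95(enhanced);
    std::printf("base:     [%.1f, %.1f]\n", b.lo, b.hi);
    std::printf("enhanced: [%.1f, %.1f]\n", e.lo, e.hi);
    if (b.hi < e.lo || e.hi < b.lo)
        std::printf("non-overlapping: difference is statistically significant\n");
    else
        std::printf("overlapping: cannot conclude a difference from these runs\n");
    return 0;
}
```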

Confidence Interval Example
Estimate the number of runs needed to get non-overlapping confidence intervals (ROB)

Also Time Variability (on real hardware)
Therefore, select checkpoint(s) carefully

Workload Variability Summary
Variability is a real phenomenon for multithreaded workloads
– Runs from the same initial conditions differ
Variability is a challenge for simulations
– Simulations are short
– Wrong conclusions may be drawn
Our solution accounts for variability
– Multiple runs, confidence intervals
– Reduces the probability of wrong conclusions

Talk Summary
Simulations of $2M commercial servers must
– Complete in reasonable time (on $2K PCs)
– Handle the OS, devices, & multithreaded hardware
– Cope with the variability of multithreaded software
Multifacet
– Scales & tunes transactional workloads
– Separates timing & functional simulation
– Copes with workload variability via randomness & statistics
References
– Simulating a $2M Commercial Server on a $2K PC [Computer03]
– Full-System Timing-First Simulation [Sigmetrics02]
– Variability in Architectural Simulations of Multi-threaded Workloads [HPCA03]

Other Multifacet Methods Work
Specifying & verifying coherence protocols
– [SPAA98], [HPCA99], [SPAA99], & [TPDS02]
Workload analysis & improvement
– Database systems [VLDB99] & [VLDB01]
– Pointer-based [PLDI99] & [Computer00]
– Middleware [HPCA03]
Modeling & simulation
– Commercial workloads [Computer02] & [HPCA03]
– Decoupling timing/functional simulation [Sigmetrics02]
– Simulation generation [PLDI01]
– Analytic modeling [Sigmetrics00] & [TPDS TBA]
– Micro-architectural slack [ISCA02]

Backup Slides

One Ongoing/Future Methods Direction
Middleware applications
– Memory system behavior of Java middleware [HPCA03]
– Machine measurements
– Full-system simulation
Future work: multi-machine simulation
– Isolate the middle tier from client emulators and the database
– Understand fundamental workload behaviors
– Drive future system design

ECPerf vs. SPECjbb
Different cache-to-cache transfer ratios!

Online Transaction Processing (OLTP)
DB2 with a TPC-C-like workload. The TPC-C benchmark is widely used to evaluate system performance for the on-line transaction processing market. The benchmark itself is a specification that describes the schema, scaling rules, transaction types, and transaction mix, but not the exact implementation of the database. TPC-C comprises five transaction types, all related to an order-processing environment. Performance is measured by the number of "New Order" transactions performed per minute (tpmC). Our OLTP workload is based on the TPC-C v3.0 benchmark. We use IBM's DB2 V7.2 EEE database management system and an IBM benchmark kit to build the database and emulate users. We build an 800 MB 4000-warehouse database on five raw disks and an additional dedicated database log disk. We scaled down the sizes of each warehouse by maintaining the reduced ratios of 3 sales districts per warehouse, 30 customers per district, and 100 items per warehouse (compared to 10, 30,000 and 100,000 required by the TPC-C specification). Each user randomly executes transactions according to the TPC-C transaction mix specification, and we set the think and keying times for users to zero. A different database thread is started for each user. We measure all completed transactions, even those that do not satisfy the timing constraints of the TPC-C benchmark specification.

Java Server Workload (SPECjbb)
Java-based middleware applications are increasingly used in modern e-business settings. SPECjbb is a Java benchmark emulating a 3-tier system with emphasis on the middle-tier server business logic. SPECjbb runs in a single Java Virtual Machine (JVM) in which threads represent terminals in a warehouse. Each thread independently generates random input (tier 1 emulation) before calling transaction-specific business logic. The business logic operates on data held in binary trees of Java objects (tier 3 emulation). The specification states that the benchmark does no disk or network I/O. We used Sun's HotSpot Server JVM and Solaris's native thread implementation. The benchmark includes driver threads to generate transactions. We set the system heap size to 1.8 GB and the new-object heap size to 256 MB to reduce the frequency of garbage collection. Our experiments used 24 warehouses, with a data size of approximately 500 MB.

Static Web Content Serving: Apache
Web servers such as Apache represent an important enterprise server application. Apache is a popular open-source web server used in many internet/intranet settings. In this benchmark, we focus on static web content serving. We use Apache for SPARC/Solaris 8 configured to use pthread locks and minimal logging at the web server. We use the Scalable URL Request Generator (SURGE) as the client. SURGE generates a sequence of static URL requests that exhibit representative distributions for document popularity, document sizes, request sizes, temporal and spatial locality, and embedded document count. We use a repository of 20,000 files (totalling ~500 MB) and clients with zero think time. We compiled both Apache and SURGE using Sun's WorkShop C 6.1 with aggressive optimization.

Dynamic Web Content Serving: Slashcode
Dynamic web content serving has become increasingly important for web sites that serve large amounts of information. Dynamic content is used by online stores, instant news, and community message board systems. Slashcode is an open-source dynamic web message posting system used by the popular slashdot.org message board site. We used Slashcode 2.0, Apache, and Apache's mod_perl module 1.25 (with Perl 5.6) on the server side. We used MySQL as the database engine. The server content is a snapshot from the slashcode.com site, containing approximately 3000 messages with a total size of 5 MB. Most of the run time is spent on dynamic web page generation. We use a multi-threaded user emulation program to emulate user browsing and posting behavior. Each user independently and randomly generates browsing and posting requests to the server according to a transaction mix specification. We compiled both server and client programs using Sun's WorkShop C 6.1 with aggressive optimization.