Simulating a $2M Commercial Server on a $2K PC Alaa R. Alameldeen, Milo M.K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Daniel J. Sorin, Mark D. Hill.

Presentation transcript:

Simulating a $2M Commercial Server on a $2K PC
Alaa R. Alameldeen, Milo M.K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Daniel J. Sorin, Mark D. Hill and David A. Wood
IEEE Computer – November 22, 2002

Commercial Workloads
Business and Communication Infrastructure
–DBMS
–Web Servers
Designed to run on High End Servers
–TPC-C leader
  128 Processors, 256 GB RAM, 29 TB Disk, $13M
  100M Transactions in 25min warm-up + 2h
Simulation on Standard PC
  1-2 Processors, 1GB RAM, 120GB Disk, $2K

Simulation of Commercial Workloads
Challenges
–Size of Workload
–Running Time
–Requires Full System Simulation
  Highly dependent on OS, I/O
Goals
–Representative Approximation
–Tractable Simulation Times
–Sufficient Level of Detail

Wisconsin Commercial Workload Suite
Online Transaction Processing (OLTP)
–DB2 with TPC-C-like workload
SPECjbb
–3-tier Java-based Middleware
Static Web Content: Apache
–SURGE-generated requests
Dynamic Web Content Serving: Slashcode
–Open-source dynamic web message posting
–Perl, Apache, MySQL

Workload Scaling and Tuning
Tuning of all workloads on real MP-Server
–TPC-C on Sun E CPUs (167MHz), 2GB RAM
Disk images of real system used for Sim
–Allows Validation of Results
–Faster Benchmark Setup
Initial Setup: 10 Warehouses, 100MB each
–Lower Throughput than expected

TPC-C Tuning
Kernel and Database Configuration
–Kernel limits on number of threads, semaphores, etc.
–DB on raw disk
Multiple Disks
–DB spread over 5 disks
Table Contention Reduction
–More and smaller warehouses
  Same total DB size
–Against TPC-C rule
  Size per warehouse is fixed

TPC-C Tuning
Additional Concurrency
–Number of simulated clients increased from 24 to 96
–No think or keying times
Overall
–Throughput increased by a factor of 12
  Close to published results
–More representative of real OLTP workload

OLTP Throughput

Workload Runtime
Simulation slow-down around
–Full TPC-C run (2h real) infeasible
Long warm-up periods
Short Simulation introduces high variability

Simulation Improvements
Starting with Warm Workloads
–Start from snapshot of warmed-up system
Fixed Transaction Count
–Simulate fixed number of transactions
–Applications must notify simulator when transactions complete
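
The fixed-transaction-count idea can be illustrated with a toy sketch. Everything here is hypothetical and not taken from the slides or from any real simulator: the ToySimulator class, its notify_transaction_complete hook, and the per-transaction cycle costs are assumptions chosen only to show a run that ends after a fixed amount of work rather than a fixed amount of time.

```python
class ToySimulator:
    """Hypothetical simulator that stops after a fixed number of transactions."""

    def __init__(self, target_transactions):
        self.target = target_transactions
        self.completed = 0
        self.cycles = 0

    def advance(self, cycles):
        # Account for simulated time spent executing the workload.
        self.cycles += cycles

    def notify_transaction_complete(self):
        # Assumed hook: the instrumented workload calls this once per
        # finished transaction.
        self.completed += 1

    def done(self):
        return self.completed >= self.target

    def cycles_per_transaction(self):
        return self.cycles / self.completed


def run_fixed_transactions(transaction_costs, target):
    """Run until `target` transactions have completed, then stop."""
    sim = ToySimulator(target)
    for cost in transaction_costs:
        if sim.done():
            break
        sim.advance(cost)
        sim.notify_transaction_complete()
    return sim


sim = run_fixed_transactions([100, 200, 300, 400], target=3)
print(sim.completed, sim.cycles, sim.cycles_per_transaction())
```

Reporting cycles per transaction, instead of work done in a fixed number of cycles, keeps the amount of work identical across the configurations being compared.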

Variability
Simulation executes 1 deterministic path
–Path could favor certain configurations
Average over multiple short simulation runs
–Introduce artificial variability in memory access times
Can run multiple short simulations in parallel
–Preferable to one long simulation run
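
A minimal sketch of averaging over perturbed runs follows. It is an illustration under stated assumptions, not the authors' method: simulate_run is a hypothetical stand-in for one short simulation, and the base latency of 100 cycles, the jitter of 0 to 2 cycles, and the count of 10 runs are invented for the example.

```python
import random
import statistics


def simulate_run(seed, base_latency=100, max_jitter=2, accesses=1000):
    """Toy stand-in for one short simulation run; returns total cycles.

    Each memory access gets a small random extra latency so that
    different seeds follow different execution paths.
    """
    rng = random.Random(seed)
    return sum(base_latency + rng.randint(0, max_jitter) for _ in range(accesses))


def summarize(runs):
    """Return the mean and sample standard deviation of the run results."""
    return statistics.mean(runs), statistics.stdev(runs)


# The runs are independent, so in practice they could execute in parallel.
results = [simulate_run(seed) for seed in range(10)]
mean, stdev = summarize(results)
print(f"mean={mean:.1f} cycles, stdev={stdev:.1f}")
```

Reporting the spread alongside the mean shows whether a difference between two configurations is larger than the run-to-run variation.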

Timing Simulation
Complex for full system simulation
–Functional Simulation with Simics
–Timing Simulation with 2 additional Sims
  CPU Timing
  Memory Timing
Timing-First Simulation
–Timing Simulator
  Controls when functional simulator can advance
  Solves races
  Validates functional simulator
–Average Timing Error < 0.001%
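
One possible reading of timing-first simulation is sketched below. This is a toy model, not Simics or the authors' simulators: the three-field instruction format, the cycle costs, and the choice to leave "mul" unimplemented in the timing model are all assumptions made for illustration. The timing model runs each instruction first, the functional model advances only when the timing model retires it, and the two states are compared so a mismatch can be counted and repaired.

```python
def functional_step(state, instr):
    """Reference semantics: complete and trusted, but carries no timing."""
    op, dst, val = instr
    new = dict(state)
    if op == "add":
        new[dst] = new.get(dst, 0) + val
    elif op == "mul":
        new[dst] = new.get(dst, 0) * val
    return new


def timing_step(state, instr):
    """Timing model: returns (new_state, cycles).

    It deliberately does not implement 'mul' (an assumed gap), so its
    state goes stale on that instruction.
    """
    op, dst, val = instr
    new = dict(state)
    if op == "add":
        new[dst] = new.get(dst, 0) + val
        return new, 1
    return new, 3  # unsupported op: charge a latency, leave state unchanged


def run_timing_first(program):
    func_state, timing_state = {}, {}
    cycles = mismatches = 0
    for instr in program:
        # The timing simulator goes first and decides when to retire.
        timing_state, cost = timing_step(timing_state, instr)
        cycles += cost
        # Only then is the functional simulator allowed to advance.
        func_state = functional_step(func_state, instr)
        # Compare the two; on a mismatch, reload from the reference state.
        if timing_state != func_state:
            mismatches += 1
            timing_state = dict(func_state)
    return func_state, cycles, mismatches


program = [("add", "r1", 2), ("mul", "r1", 5), ("add", "r1", 1)]
print(run_timing_first(program))
```

In this sketch the timing model can stay incomplete because any instruction it handles incorrectly is caught by the comparison and corrected before the next instruction runs.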

Conclusion
Commercial Workloads are essential for MP design
–Biggest Market for MP systems
Simulation on low-cost PC is hard
Wisconsin Commercial Workload Suite approximates behaviour

Questions
If TPC-C has to run 2h for official results, how reliable is an average over a couple of seconds?
Should disk timing be simulated?