An Architectural Evaluation of Java TPC-W Harold “Trey” Cain, Ravi Rajwar, Morris Marden, Mikko Lipasti University of Wisconsin-Madison

Slides:



Advertisements
Similar presentations
Welcome to Middleware Joseph Amrithraj
Advertisements

Full-System Timing-First Simulation Carl J. Mauer Mark D. Hill and David A. Wood Computer Sciences Department University of Wisconsin—Madison.
PERFORMANCE ANALYSIS OF MULTIPLE THREADS/CORES USING THE ULTRASPARC T1 (NIAGARA) Unique Chips and Systems (UCAS-4) Dimitris Kaseridis & Lizy K. John The.
Erhan Erdinç Pehlivan Computer Architecture Support for Database Applications.
Helper Threads via Virtual Multithreading on an experimental Itanium 2 processor platform. Perry H Wang et. Al.
Capacity Planning and Predicting Growth for Vista Amy Edwards, Ezra Freeloe and George Hernandez University System of Georgia 2007.
Computer Monitoring System for EE Faculty By Yaroslav Ross And Denis Zakrevsky Supervisor: Viktor Kulikov.
Web Server Hardware and Software
Fundamentals, Design, and Implementation, 9/e Chapter 14 JDBC, Java Server Pages, and MySQL.
Introduction to Web Database Processing
Web Servers How do our requests for resources on the Internet get handled? Can they be located anywhere? Global?
Lecture 2 Web application architecture. Themes Architecture : The large scale structure of a system, especially a computer system Design choice: The need.
Web Server Software Architectures Author: Daniel A. Menascé Presenter: Noshaba Bakht.
Progress Report 11/1/01 Matt Bridges. Overview Data collection and analysis tool for web site traffic Lets website administrators know who is on their.
Architectural Impact of SSL Processing Jingnan Yao.
Evaluating Non-deterministic Multi-threaded Commercial Workloads Computer Sciences Department University of Wisconsin—Madison
LYU9901 TravelNet Final Presentation Supervisor: Prof. Michael R. Lyu Members: Ho Chi Ho Malcolm Lau Chi Ho Arthur On
EECC722 - Shaaban #1 Lec # 4 Fall Operating System Impact on SMT Architecture The work published in “An Analysis of Operating System Behavior.
Yaksha: A Self-Tuning Controller for Managing the Performance of 3-Tiered Web Sites Abhinav Kamra, Vishal Misra CS Department Columbia University Erich.
CPE 731 Advanced Computer Architecture Snooping Cache Multiprocessors Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
Electronic Commerce Last Week Internet utility programs
February 11, 2003Ninth International Symposium on High Performance Computer Architecture Memory System Behavior of Java-Based Middleware Martin Karlsson,
Using Standard Industry Benchmarks Chapter 7 CSE807.
Client/Server Architectures
1 Web Servers (IIS and Apache) Outline 9.1 Introduction 9.2 HTTP Request Types 9.3 System Architecture 9.4 Client-Side Scripting versus Server-Side Scripting.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy,
Introduction to ASP.NET. Prehistory of ASP.NET Original Internet – text based WWW – static graphical content  HTML (client-side) Need for interactive.
Chapter 4: Core Web Technologies
DBMSs On A Modern Processor: Where Does Time Go? by A. Ailamaki, D.J. DeWitt, M.D. Hill, and D. Wood University of Wisconsin-Madison Computer Science Dept.
Performance of Web Applications Introduction One of the success-critical quality characteristics of Web applications is system performance. What.
1 Design and Performance of a Web Server Accelerator Eric Levy-Abegnoli, Arun Iyengar, Junehwa Song, and Daniel Dias INFOCOM ‘99.
Global NetWatch Copyright © 2003 Global NetWatch, Inc. Factors Affecting Web Performance Getting Maximum Performance Out Of Your Web Server.
5 Chapter Five Web Servers. 5 Chapter Objectives Learn about the Microsoft Personal Web Server Software Learn how to improve Web site performance Learn.
Workload-driven Analysis of File Systems in Shared Multi-Tier Data-Centers over InfiniBand K. Vaidyanathan P. Balaji H. –W. Jin D.K. Panda Network-Based.
Software Performance Testing Based on Workload Characterization Elaine Weyuker Alberto Avritzer Joe Kondek Danielle Liu AT&T Labs.
Scaling Dynamic Content Applications through Data Replication - Opportunities for Compiler Optimizations Cristiana Amza UofT.
1 Wenguang WangRichard B. Bunt Department of Computer Science University of Saskatchewan November 14, 2000 Simulating DB2 Buffer Pool Management.
1 Specification and Implementation of Dynamic Web Site Benchmarks Sameh Elnikety Department of Computer Science Rice University.
Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads Lei Yang & Shiliang Hu Computer Sciences Department, University of.
Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads Lei Yang & Shiliang Hu Computer Sciences Department, University of.
DONE-08 Sizing and Performance Tuning N-Tier Applications Mike Furgal Performance Manager Progress Software
Simulating a $2M Commercial Server on a $2K PC Alaa R. Alameldeen, Milo M.K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Daniel J. Sorin, Mark D. Hill.
Performance Prediction for Random Write Reductions: A Case Study in Modelling Shared Memory Programs Ruoming Jin Gagan Agrawal Department of Computer and.
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan,
Srihari Makineni & Ravi Iyer Communications Technology Lab
Lecture 22: Client-Server Software Engineering
Copyright © cs-tutorial.com. Overview Introduction Architecture Implementation Evaluation.
TPC BENCHMARK W (Web Commerce) SeungLak Choi Dept. of Computer Science, KAIST.
INTRODUCTION TO WEB APPLICATION Chapter 1. In this chapter, you will learn about:  The evolution of the Internet  The beginning of the World Wide Web,
1 Admission Control and Request Scheduling in E-Commerce Web Sites Sameh Elnikety, EPFL Erich Nahum, IBM Watson John Tracey, IBM Watson Willy Zwaenepoel,
Measuring the Capacity of a Web Server USENIX Sympo. on Internet Tech. and Sys. ‘ Koo-Min Ahn.
MEMORY SYSTEM CHARACTERIZATION OF COMMERCIAL WORKLOADS Authors: Luiz André Barroso (Google, DEC; worked on Piranha) Kourosh Gharachorloo (Compaq, DEC;
27.1 Chapter 27 WWW and HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
CMP/CMT Scaling of SPECjbb2005 on UltraSPARC T1 (Niagara) Dimitris Kaseridis and Lizy K. John The University of Texas at Austin Laboratory for Computer.
WEB SERVER SOFTWARE FEATURE SETS
ASP-2-1 SERVER AND CLIENT SIDE SCRITPING Colorado Technical University IT420 Tim Peterson.
UNIT-3 Performance Evaluation UNIT-3 IT2031. Web Server Hardware and Performance Evaluation Key question is whether a company should host their own Web.
Sunpyo Hong, Hyesoon Kim
WebScan: Implementing QueryServer 2.0 Karl Geiger, Amgen Inc. BRS NA UG August 1999.
Taeho Kgil, Trevor Mudge Advanced Computer Architecture Laboratory The University of Michigan Ann Arbor, USA CASES’06.
TPC Benchmark™ W 2002 년 6 월 이상호 교수 숭실대학교 데이터베이스 연구실
Computer Sciences Department University of Wisconsin-Madison
Abhinav Kamra, Vishal Misra CS Department Columbia University
WWW and HTTP King Fahd University of Petroleum & Minerals
/ Computer Architecture and Design
Capacity Analysis, cont. Realistic Server Performance
Admission Control and Request Scheduling in E-Commerce Web Sites
Simulating a $2M Commercial Server on a $2K PC
Web Servers (IIS and Apache)
Presentation transcript:

An Architectural Evaluation of Java TPC-W Harold “Trey” Cain, Ravi Rajwar, Morris Marden, Mikko Lipasti University of Wisconsin-Madison Seventh International Symposium on High Performance Computer Architecture January 2001

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Introduction Why do workload characterization? Java: gaining widespread use in server-side middleware applications Very little known about the architectural requirements server-side Java TPC-W: a mixed transaction processing/web serving benchmark Web application middleware implemented in Java

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Outline TPC-W Overview Our Java-based implementation of TPC-W Native Execution Results Collected using performance counters on an IBM RS/6000 S80 Server Results for TPC-W, SPECjbb2000, SPECweb99 Simulation Results Coarse Grained Multithreading Evaluation

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti What is TPC-W? New benchmark specified by the Transaction Processing Council (in February 2000), targeting transactional web systems Web Serving of static and dynamic content On-line transaction processing (OLTP) Some decision support (DSS) Models an on-line bookstore Consists of 14 browser/web server interactions

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti TPC-W System Under Test 3-Tier Application Web Browsing Users Web Server(s)Database Server(s)

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Web Interaction Characteristics Dynamic HTML required: 11/14 interactions DB connectivity required: 11/14 interactions Query complexity varies Read-only and Read/Write Number of images per page: Varies from 3 to 9, 6 on average Maximum response time: Varies from 3 to 20 seconds

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Web Interaction Mixes Different web sites have different usage patterns TPC-W models variance using three different transaction mixes Browsing Mix 95% browsing, 5% ordering Shopping Mix (Primary performance metric) 80% browsing, 20% ordering Ordering Mix (business to business) 50% browsing, 50% ordering

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Java Implementation of TPC-W All 14 TPC-W web interactions implemented as Java Servlets JDBC used to communicate to a database back-end (DB2) Did not implement Secure Transactions using secure sockets layer (SSL) Communication with payment gateway authority

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Outline TPC-W Specification Our implementation of TPC-W Native Execution Results Collected using performance counters on an IBM RS/6000 S80 Server TPC-W, SPECweb99, SPECjbb2000 Simulation Results Coarse Grained Multithreading Evaluation

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti System Parameters Hardware 6 processor IBM RS/6000 S80, AIX 4.3 RS-64 III (Pulsar) processors 8 GB memory 8 MB 4-way set associative L2 caches 128 KB I-Cache, 128 KB D-Cache, 2-way set associative Software: Zeus Web Server v Apache JServ Servlet Engine 1.0, Java w/ JIT DB2 Universal Database 6.1 Database Size: 205 MB Image Set Size: 250 MB

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti CPU Time by Application Component Java Servlet Engine Dominates CPU Usage

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti CPI Breakdown  Most processor stalls due to L2 cache misses

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti L2 Miss Breakdown

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Cache-to-Cache Transfers

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Coherence Protocols: To E or not to E  Removing E state would necessitate an extra bus transaction for 9%-28% of all L2 Misses.

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Outline TPC-W Specification Our implementation of TPC-W Native Execution Results Collected using performance counters on an IBM RS/6000 S80 Server TPC-W, SPECweb99, SPECjbb2000 Simulation Results Coarse Grained Multithreading Evaluation

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Full System Simulation Due to the large amount of time spent in system code, full system simulation is necessary. SimOS-PowerPC Runs modified version of AIX System configuration occurs on real system, then a disk snapshot is created Snapshot used by SimOS-PPC We simulate a three second snapshot of steady-state behavior

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Simulated Machine Parameters Single-issue, in-order 500 MHZ processor L1 I-Cache : 128-Kbyte, 2-way associative L1 D-Cache: 128-Kbyte, 2-way associative L2 Cache: 8 MB, 4-way associative Bus models the Sun Gigaplane-XB System configuration is considerably different from IBM S80

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Coarse Grained Multithreading Processor contains logic for switching among several threads of execution and maintaining multiple thread contexts. Switch thread when: Cache miss occurs in primary thread, and a suspended thread is in the ready state. The primary thread is in a spin loop or the idle loop, and a suspended thread in the ready state. A suspended thread has a pending interrupt or exception. A suspended ready thread has not retired an instruction in the last 1000 cycles. 3 cycle thread switch penalty

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti CGMT Results 2 threads: increases throughput as much as 41% 4 threads: increases throughput as much as 60%

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Conclusions Java servlet engine is performance critical L2 cache miss stalls to unshared data are primary contributor to memory system stalls The exclusive state successfully reduces memory bus traffic for these commercial workloads. Coarse grained multithreading: Decreases cache hit rates Decreases branch prediction accuracy However, total system throughput improves due to CGMT’s memory latency tolerance.

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Questions?

Web Interaction Characteristics

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Online Bookstore Functionality: Searching Browsing Shopping carts and secure purchasing Rotating advertisements Best seller and new product lists Customer registration Administrative updates

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Remote Browser Emulator Emulates web users interacting through browsers Non-deterministic walk over web pages Send HTTP request Parse HTTP response for images and other URLs Wait for think time (~7 seconds) Repeat

HPCA-7 January 2001Cain/Rajwar/Marden/Lipasti Database Scaling Database size depends on two factors: Number of items in bookstore inventory Number of bookstore customers ~5MB in DB Tables per active user (like TPC-C) ~1 KB per item in DB tables (like TPC-D) Also ~25KB of static images per item Images may be stored in database or standard file system