SPECpower_ssj2008** Characterization

Slides:



Advertisements
Similar presentations
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Advertisements

Computer Abstractions and Technology
VSphere vs. Hyper-V Metron Performance Showdown. Objectives Architecture Available metrics Challenges in virtual environments Test environment and methods.
KnightShift: Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity Daniel WongMurali Annavaram University of Southern California MICRO-2012.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
An Adaptable Benchmark for MPFS Performance Testing A Master Thesis Presentation Yubing Wang Advisor: Prof. Mark Claypool.
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
MCITP Guide to Microsoft Windows Server 2008 Server Administration (Exam #70-646) Chapter 14 Server and Network Monitoring.
HS06 on the last generation of CPU for HEP server farm Michele Michelotto 1.
Module 2: Planning to Install SQL Server. Overview Hardware Installation Considerations SQL Server 2000 Editions Software Installation Considerations.
Capacity Planning in SharePoint Capacity Planning Process of evaluating a technology … Deciding … Hardware … Variety of Ways Different Services.
Q4’07 Desktop Competitive SKU Matrix Non-NDA Version.
Bob Thome, Senior Director of Product Management, Oracle SIMPLIFYING YOUR HIGH AVAILABILITY DATABASE.
Key Perf considerations & bottlenecks Windows Azure VM characteristics Monitoring TroubleshootingBest practices.
AUTHORS: STIJN POLFLIET ET. AL. BY: ALI NIKRAVESH Studying Hardware and Software Trade-Offs for a Real-Life Web 2.0 Workload.
Uncovering the Multicore Processor Bottlenecks Server Design Summit Shay Gal-On Director of Technology, EEMBC.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
Designing and Deploying a Scalable EPM Solution Ken Toole Platform Test Manager MS Project Microsoft.
DONE-08 Sizing and Performance Tuning N-Tier Applications Mike Furgal Performance Manager Progress Software
Srihari Makineni & Ravi Iyer Communications Technology Lab
Investigating the Effects of Using Different Nursery Sizing Policies on Performance Tony Guan, Witty Srisa-an, and Neo Jia Department of Computer Science.
Performance Counters on Intel® Core™ 2 Duo Xeon® Processors Michael D’Mello
OPERATING SYSTEMS CS 3530 Summer 2014 Systems with Multi-programming Chapter 4.
Consolidation and Optimization Best Practices: SQL Server 2008 and Hyper-V Dandy Weyn | Microsoft Corp. Antwerp, March
Ó 1998 Menascé & Almeida. All Rights Reserved.1 Part V Workload Characterization for the Web.
Full and Para Virtualization
CMP/CMT Scaling of SPECjbb2005 on UltraSPARC T1 (Niagara) Dimitris Kaseridis and Lizy K. John The University of Texas at Austin Laboratory for Computer.
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
Chapter 1 — Computer Abstractions and Technology — 1 Uniprocessor Performance Constrained by power, instruction-level parallelism, memory latency.
E-MOS: Efficient Energy Management Policies in Operating Systems
1 Chapter Overview Monitoring Access to Shared Folders Creating and Sharing Local and Remote Folders Monitoring Network Users Using Offline Folders and.
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice ProLiant G5 to G6 Processor Positioning.
1© Copyright 2015 EMC Corporation. All rights reserved. NUMA(YEY) BY JACOB KUGLER.
1 Automated Power Management Through Virtualization Anne Holler, VMware Anil Kapur, VMware.
CERN IT Department CH-1211 Genève 23 Switzerland Benchmarking of CPU servers Benchmarking of (CPU) servers Dr. Ulrich Schwickerath, CERN.
WLCG Collaboration Workshop CMS online/offline replication
Computer Architecture & Operations I
Measuring Performance II and Logic Design
Getting the Most out of Scientific Computing Resources
CSCI206 - Computer Organization & Programming
Lecture 2: Performance Today’s topics:
OPERATING SYSTEMS CS 3502 Fall 2017
Getting the Most out of Scientific Computing Resources
OPERATING SYSTEMS CS 3502 Fall 2017
Processes and threads.
CS161 – Design and Architecture of Computer Systems
Current Generation Hypervisor Type 1 Type 2.
40% More Performance per Server 40% Lower HW costs and maintenance
The Multikernel: A New OS Architecture for Scalable Multicore Systems
MCTS Guide to Microsoft Windows 7
Uniprocessor Performance
Morgan Kaufmann Publishers
Installation and database instance essentials
COSC 3406: Computer Organization
Lecture 1 Runtime environments.
Intel’s Core i7 Processor
Comparing dual- and quad-core performance
Software Architecture in Practice
Frequency Governors for Cloud Database OLTP Workloads
TCB 9 RK/MSFT Benchmark Results
Boost Linux Performance with Enhancements from Oracle
Migration Strategies – Business Desktop Deployment (BDD) Overview
CSCI206 - Computer Organization & Programming
So far…. Firmware identifies hardware devices present
Hardware Counter Driven On-the-Fly Request Signatures
A Scalable Approach to Virtual Switching
Lecture 1 Runtime environments.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Summer 2002 at SLAC Ajay Tirumala.
SharePoint 2013 Best Practices
Presentation transcript:

SPECpower_ssj2008** Characterization November 10, 2018 SPECpower_ssj2008** Characterization Anil Kumar, Larry Gray and Harry Li Intel® Corporation SPEC Workshop – January 27, 2008 * Other names and brands may be claimed as the property of others ** SPEC and the benchmark names are trademarks of the Standard Performance Evaluation Corporation SPEC Confidential

Agenda SPECpower_ssj2008 quick overview November 10, 2018 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources utilization Impact of JVM Optimizations Frequency scaling Processor scaling Platform generation scaling General observation Summary November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

SPECpower_ssj2008 Quick overview November 10, 2018 SPECpower_ssj2008 Quick overview November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

SPECpower* - A “Graduated” Workload November 10, 2018 build slide SPECpower* - A “Graduated” Workload First: A Calibration Phase: Run to Peak Transaction Throughput # warehouses or threads = # cores, scheduling is “ungated” Next: Load Levels: Gradations Based on Calibrated Throughput Average of last two calibration levels = peak calibrated throughput Example Below is x10 or 10% increments – the benchmark Actual Average Per Cent of Calibrated Peak Throughput 99.8% 90.1% 79.6% 69.7% 60.2% 49.9% 40.1% 30.6% 20.0% 9.9% SPECpower_ssj2008 – Transactions Dual-Core Intel Xeon 3.0, 4x1GB, 1x HDD, Pwr Mgmt On ~ Peak Throughput ~40% ~20% ~100% Active Idle calibration levels ~80% SPECpower_ssj2008 – Power and Processor % Dual-Core Intel Xeon 3.0, 4x1GB, 1x HDD, Pwr Mgmt On Runs and Reports a “Load Line” November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Controlling Measurements November 10, 2018 Controlling Measurements Active Idle Each load level is a 240 second “measurement interval” plus, “inter” (delay between load level), ramp up (pre-measurement) ramp down (post-measurement) Settle time and proper synchronization are essential Consistent Power and Performance Measurement Calibrations Graduated Load Levels SSJ_2008 Initialization SSJ_2008 Reporter Calibration 1 Calibration 2 Calibration n SSJ@100% SSJ@90% Active idle Exit SSJ@80% SSJ@70% SSJ@20% SSJ@10% Load level GO Power STOP Measurement Delay between load level Delay between load level Pre-measurement measurement interval Post-measurement operations per second 10 secs 30 secs 30 secs 10 secs 240 seconds time November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

November 10, 2018 SPECjbb2005 vs. SSJ_OPS@100% SSJ_2008 derived from SPECjbb2005 - But different! Base code and transaction types from SPECjbb2005 Substantive changes! The two are not comparable: Notable Differences Different transaction mix Transaction scheduling and timing Modified throughput accounting Data collection via network – TCP/IP More logging increases disk I/O Plus others November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Performance to Power Ratio November 10, 2018 SPECpower_ssj2008 - Metric Definition The Primary Metric for SPECpower_ssj2008: “overall ssj_ops/watt” = ∑ 11 avg-trans-rate pts / ∑ 11 power pts Table from SPECpower_ssj2008 Full Disclosure Report (includes power at the active idle state) Performance Power Performance to Power Ratio Target Load Actual Load ssj ops Average Power (W) 100% 99.10% 220,306 276 799 90% 90.40% 200,860 269 746 80% 79.50% 176,684 261 677 70% 70.30% 156,344 254 616 60% 59.60% 132,525 245 541 50% 49.60% 110,222 237 465 40% 40.20% 89,388 229 390 30% 30.10% 66,875 221 302 20% 19.90% 44,157 213 207 10% 10.20% 22,649 206 110 Active Idle 198 ∑ssj_ops / ∑power = 468 ssj_ops@100% average power each level performance / power each level overall ssj_ops/watt SPECpower_ssj2008 Intel publication http://www.spec.org/power_ssj2008/results/res2007q4/power_ssj2008-20071129-00017.html Lots of data in rest of the report ! November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Initial characterization of SPECpower_ssj2008 November 10, 2018 Initial characterization of SPECpower_ssj2008 November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Hardware and Software SUT: Intel® “White Box” November 10, 2018 Hardware and Software SUT: Intel® “White Box” Dual and Quad Core Intel® Xeon® 2.0 & 3.0 GHz Supermicro* X7DB8/ Main Board, Super Micro 5000P (Blackford chipset) 4x 2GB FBDIMMs 1x 700W PSU 5U Tower Platform Microsoft* Windows Server 2003 64 bit Power Options: Server Balanced Processor Power and Performance JVM: BEA* JRockit* P27.4.0 64 bit JVM Command Line similar to published results Sampling Rates: Power: 1 second (average from meter) SPECpower_ssj2008 setup SSJ Director on SUT load levels 120 seconds Platform upon which the counters were collected is not same platform used in SPECpower public reports. November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Collecting OS Counters November 10, 2018 Collecting OS Counters Intel Written Daemon “OSctrD.exe” Counters defined in ccs.props Daemon runs on SUT, Data to CCS via TCP/IP Can run on CCS CCS logs counters along with watts, trans, etc. “Integrated” Log advantage Windows Only Linux port under consideration November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

SSJ_2008 Memory Usage Code footprint: Data footprint: JVMs November 10, 2018 SSJ_2008 Memory Usage Code footprint: ~1.5M (total of all methods JIT’ed and optimized) Data footprint: ~50MB per warehouse “database” size ~8KB of transient objects per transaction JVMs 32 bit JVM - Max. 4GB heap 64 bit JVM - much larger heap (max. 264 Bytes) Multiple instances can/will increase memory footprint Optimal memory size is throughput capacity dependent Platform and configuration specific Example: Quad-Core Intel Xeon based Dual Processor system ~8GB optimal for SPECpower_ssj2008 All above specific to BEA JRockit JVM November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Transactions (SSJ OPS) November 10, 2018 Transactions (SSJ OPS) CPU % tracks load CPU % expected to track on Intel Core 2 architecture Other architectures will vary (SMT etc.) Load level targets are % of SSJ_OPS@calibrated CPU utilization is no part of the benchmark November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Power and Processor Utilization November 10, 2018 Power and Processor Utilization Average SSJ OPS tracking as expected per level Throughput per sec showing expected variability within load level Negative Exponential inter-arrival time batch scheduling Power consumption varies with load November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

All Three (SSJ OPS, CPU% and Power) November 10, 2018 All Three (SSJ OPS, CPU% and Power) At all load levels including active idle: All three, SSJ OPS, CPU % utilization and Average Watts tracking as expected November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

November 10, 2018 Memory Utilization With typical tuning (Xmx==Xms), Java heap allocated remains same throughout the run committed memory in use remains constant at all load levels including active idle November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

November 10, 2018 Network I/O ~1500 Bytes/sec of network I/O at all load levels including active idle: Network I/O from per sec request/response between Control & Collect (CCS) and SSJ_2008 Director November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Disk I/O Disk I/O – Regular bursts of ~140Kbyte writes, November 10, 2018 Disk I/O Disk I/O – Regular bursts of ~140Kbyte writes, ~3.3Kbytes/sec average for all load levels Most disk writes related to SSJ_2008 logging Disk reads average zero November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

C1 state % Time in C1 State – Inverse of CPU % November 10, 2018 C1 state % Time in C1 State – Inverse of CPU % C1/C1E Time contributes to power saving Varies with architecture, OS and policies Intel EIST and C1E “enabled” in BIOS November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Basic system events Interrupts: ~700 /sec at all load levels November 10, 2018 Basic system events Interrupts: ~700 /sec at all load levels Context switches ~800 /sec Below 50% declining to ~400 at active idle Rates OS and platform dependent More Investigation Needed Here November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Impact of JVM Optimizations November 10, 2018 Impact of JVM Optimizations Experiment with JVM Options JAVAOPTIONS_SSJ=“ “ (None, default heap and optimization) JAVAOPTIONS_SSJ=“-Xms3000m -Xmx3000m -Xns2400m -XXaggressive -XXlargePages -XXthroughputCompaction -XXcallprofiling -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=12k,preferred=1024k” Performance Loss ~50% Power Less by 0 to 3% less Your Results dependent on JVM and options JVM Version: BEA JRockit(R) 1.6.0_02 (build P27.4.0-10-90053-1.6.0_02-20071009-1827-windows-x86_64, compiled mode) JVM Command-line Options: -Xms3000m -Xmx3000m -Xns2400m -XXaggressive -XXlargePages -XXthroughputCompaction -XXcallprofiling -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=12k,preferred=1024k JVM Affinity: None JVM Instances: 2 November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Dual Core to Quad Core (Intel Xeon) November 10, 2018 Processor scaling Dual Core Intel Xeon  Quad Core Intel Xeon: (2.0GHz / 4MBL2) (2.0GHz 2x4MBL2) SSJ_OPS@100% increased by ~77% Similar power@100% Overall SSJ_OPS/Watt improved by ~73% Dual Core to Quad Core (Intel Xeon) % increase SSJ_OPS@100% 77% Power@100% 1% Overall SSJ_OPS/Watt 73% November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Frequency scaling 2.0 to 3.0GHz Quad Core Intel Xeon (2x6MBL2): November 10, 2018 Frequency scaling Quad Core Intel Xeon % increase 2GHz-->3GHz 50% SSJ_OPS@100% 24% Power@100% 10% Overall SSJ_OPS/Watt 16% 2.0 to 3.0GHz Quad Core Intel Xeon (2x6MBL2): Frequency increase of 50% SSJ_OPS@100% increases by ~24% Power@100% increased by ~10% Overall SSJ_OPS/Watt improved by ~16% November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Platform generation scaling November 10, 2018 Platform generation scaling Quad Core Intel Xeon 2.0GHz vs. Single Core Intel Xeon 3.6GHz: SSJ_OPS@100% improves by ~5.4x Power@100% less by ~20% for newer generation Overall SSJ_OPS/Watt improves by ~5.4x   Performance Power(W) Overall Announced Processor in 2 Socket Platform SSJ_OPS@100% @100% ssj_ops/Watt 2005 Single Core Intel Xeon 3.6GHz / 1MB L2 with HT 40,852 336 87 Q2 2006 Dual Core Intel Xeon 3.0GHz/4MB L2 163,768 291 338 Q4 2006 Quad Core Intel Xeon 2.0GHz/2x4MB L2 220,306 276 468 November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

November 10, 2018 General observation CPU Utilization – follows the load line (architecture dependent) % Time in C1 State – Inverse of CPU % C1 Transitions per second – highest at idle Memory % Committed – constant across load line Disk I/O – Regular bursts of ~140K byte writes, ~3.3K bytes/sec for all load levels Network I/O - ~2.5K Bytes/sec, ~constant across load line Basic system events require more investigation Benchmark metric and other data do effectively show scaling with frequency, cores and across platform generation November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

Summary First look, more refinements required November 10, 2018 Summary First look, more refinements required More measurements planned for in-depth characterization Results are specific to the platform and OS measured, etc SPEC FDR contains unprecedented amount of data Some system resources track graduated loads Benchmark metric and data fairly reflect configuration and OS Setting changes We are just getting started. November 10, 2018 SPEC Workshop January 2008 SPEC Confidential

November 10, 2018 END SPEC Confidential