From A to E: Analyzing TPC's OLTP Benchmarks. Pınar Tözün, Ippokratis Pandis*, Cansu Kaynak, Djordje Jevdjic, Anastasia Ailamaki. École Polytechnique Fédérale.

Presentation transcript:

From A to E: Analyzing TPC's OLTP Benchmarks
The obsolete, the ubiquitous, the unknown
Pınar Tözün, Ippokratis Pandis*, Cansu Kaynak, Djordje Jevdjic, Anastasia Ailamaki
École Polytechnique Fédérale de Lausanne
*IBM Almaden Research Center

OLTP Benchmarks of TPC
Allow fair product comparisons
Drive innovations for better performance
TPC-A, TPC-B (banking): Obsolete
TPC-C (wholesale supplier): Ubiquitous
  – Most common
TPC-E (brokerage house): Unknown
  – Results from one DBMS vendor

How is TPC-E different?
Levels examined:
  – Hardware: micro-architectural behavior
  – Storage manager: where does time go?
  – Workload: characteristics/statistics
Observations:
  – Under-utilization due to instruction stalls
  – Fewer cache misses and higher IPC
  – Harder to partition requests
  – Logical lock contention
  – More page re-use
  – Complex schema & transactions
  – Longer held locks

Outline
  – Preview
  – Setup & Methodology
  – Micro-architectural behavior
  – Within the storage manager
  – Conclusions

Experimental Setup

| Server | Fat (Intel Xeon X5660) | Lean (Sun Niagara T2) |
|---|---|---|
| #Sockets | 2 | 1 |
| #Cores per Socket | 6 (OoO) | 8 (in-order) |
| #HW Contexts | 24 | 64 |
| Clock Speed | 2.80GHz | 1.40GHz |
| Memory | 48GB | 64GB |
| L3 | 12MB (shared) | – |
| L2 | 256KB (per core) | 4MB (shared) |
| L1-D | 32KB (per core) | 8KB (per core) |
| L1-I | 32KB (per core) | 16KB (per core) |
| OS | Ubuntu Linux kernel | SunOS 5.10 Generic_ |
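The hardware-context counts in the setup follow from sockets × cores per socket × hardware threads per core. The sketch below reproduces that arithmetic. The per-core thread counts (2 for the Xeon X5660, 8 for the Niagara T2) are not stated on the slide; they are assumptions taken from the processors' public specifications, and the `Server` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical helper describing one machine from the setup table."""
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # assumption: not listed on the slide

    @property
    def hw_contexts(self) -> int:
        # Total hardware contexts visible to the operating system.
        return self.sockets * self.cores_per_socket * self.threads_per_core


fat = Server("Intel Xeon X5660", sockets=2, cores_per_socket=6, threads_per_core=2)
lean = Server("Sun Niagara T2", sockets=1, cores_per_socket=8, threads_per_core=8)

for server in (fat, lean):
    print(server.name, server.hw_contexts)
```

With these assumed thread counts the products match the 24 and 64 hardware contexts listed in the table.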

Methodology
Shore-MT
  – Scalable open-source storage manager
Shore-Kits
  – Application layer for Shore-MT
  – Workloads: TPC-B, TPC-C, TPC-E, ++
Micro-architectural
  – Xeon X5660: VTune; Niagara T2: cputrack
  – Measured at peak throughput
Storage manager profiling
  – Niagara T2: dtrace

IPC on Fat & Lean Cores
[Figure panels: Intel Xeon X5660, Sun Niagara T2; reference line: Maximum]
OLTP utilizes lean cores better
TPC-E has higher IPC
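IPC (instructions per cycle) is the number of retired instructions divided by the number of elapsed cycles. "Utilizes lean cores better" can be read as achieving a larger fraction of the core's maximum IPC. The following is a minimal sketch of that comparison. The function names are hypothetical, and the counter values and peak rates are illustrative placeholders, not measurements from the talk.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def issue_utilization(measured_ipc: float, max_ipc: float) -> float:
    """Fraction of the core's peak IPC that the workload achieves."""
    if max_ipc <= 0:
        raise ValueError("max_ipc must be positive")
    return measured_ipc / max_ipc


# Illustrative inputs only; they are NOT results from the talk.
wide_core = issue_utilization(ipc(instructions=800, cycles=1000), max_ipc=4.0)
narrow_core = issue_utilization(ipc(instructions=600, cycles=1000), max_ipc=2.0)

print(f"wide core:   {wide_core:.2f} of peak")
print(f"narrow core: {narrow_core:.2f} of peak")
```

In this made-up example the narrow core has the lower absolute IPC but uses a larger share of its peak, which is the kind of comparison the slide draws.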

Execution Cycles and Stalls
[Figure: Intel Xeon X5660]
More than half of execution time goes to stalls
Instruction stalls are the main problem

Cache Misses
[Figure panels: Intel Xeon X5660 (32KB L1-I & 32KB L1-D); Sun Niagara T2 (16KB L1-I & 8KB L1-D)]
TPC-E has lower data miss ratio (MPKI)
L1-I misses dominate
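MPKI (misses per kilo-instruction) normalizes a cache-miss count by the number of instructions executed, so workloads with different instruction counts can be compared. Below is a minimal sketch of the calculation. The function name is hypothetical and the counter values are illustrative, not numbers from the talk.

```python
def mpki(misses: int, instructions: int) -> float:
    """Cache misses per 1000 retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions


# Illustrative counter values only; they are NOT measurements from the talk.
instructions = 2_000_000
l1i_mpki = mpki(misses=80_000, instructions=instructions)
l1d_mpki = mpki(misses=30_000, instructions=instructions)

print(f"L1-I MPKI: {l1i_mpki:.1f}")
print(f"L1-D MPKI: {l1d_mpki:.1f}")
```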

Why does TPC-E have a lower miss ratio?
[Figure: averages per transaction]
More scans in TPC-E
Increased page reuse

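From A to E: Schema
[Figure: schemas of TPC-B, TPC-C, and TPC-E, rooted at the branch, warehouse, and customer tables respectively; tables are marked as fixed, scaling, or growing]
Increasing schema complexity from TPC-B to TPC-C to TPC-E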

From A to E: Transactions

| | TPC-B | TPC-C | TPC-E |
|---|---|---|---|
| #Transactions | 1 | 5 | 12 |
| Transaction Mix | RW 100% | RW 92% | RW 23% |
| Secondary Indexes | None | 2 transactions | 10 transactions |
| Transaction Input includes | Branch ID | Warehouse ID | Customer ID or Broker ID or Trade ID or … |

Harder to partition
More complexity & variety in transaction mix
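One way to read "harder to partition": when every transaction input carries the same identifier (a branch or a warehouse), a request can be routed to a partition from that identifier alone. When the input may be a customer, broker, or trade ID, no single key does that job. The sketch below illustrates the idea. The field names, the `route` function, and the modulo placement are all hypothetical, not the authors' scheme.

```python
from typing import Mapping, Optional

# Hypothetical routing keys: the one identifier that every transaction input
# of the benchmark is assumed to carry. TPC-E has no such single key.
PARTITION_KEYS = {"TPC-B": "branch_id", "TPC-C": "warehouse_id"}


def route(benchmark: str, txn_input: Mapping[str, int],
          num_partitions: int) -> Optional[int]:
    """Return the target partition, or None if no single key decides it."""
    key = PARTITION_KEYS.get(benchmark)
    if key is None or key not in txn_input:
        return None  # needs a lookup or a multi-partition transaction
    return txn_input[key] % num_partitions


print(route("TPC-B", {"branch_id": 5}, num_partitions=4))
print(route("TPC-C", {"warehouse_id": 10}, num_partitions=4))
print(route("TPC-E", {"trade_id": 7}, num_partitions=4))
```

The first two calls resolve to a partition directly from the input. The TPC-E call returns `None`, standing in for the extra work needed to find where a trade's data lives.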

Within the Storage Manager
[Figure: Sun Niagara T2, 64 HW contexts; panels: SF 64 – 0.6GB, Spread; SF 64 – 8.2GB, Spread; SF 1 – 20GB, No-Spread]
Lock manager is the main bottleneck for TPC-E

Inside the Lock Manager
[Figure panels: SF 64 – 0.6GB, Spread; SF 64 – 8.2GB, Spread; SF 1 – 20GB, No-Spread]
Logical contention even for a large DB

Conclusions
Modern hardware is still highly under-utilized
  – TPC-E: fewer misses, less stall time, higher IPC
  – OLTP utilizes less aggressive cores better
Instruction footprint is too large to fit in L1-I
  – Spread instructions, (software guided) prefetching
  – Code/Compiler optimizations
Logical lock contention due to hotspots
  – Increased complexity in schema and transactions
  – TPC-E: harder to physically partition
  – Logical partitioning, OCC
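The last bullet names OCC (optimistic concurrency control) as one direction for reducing logical lock contention. The talk gives no design, so the following is only a generic textbook-style sketch. Transactions read without locks, buffer their writes, and validate at commit time that nothing they read has changed. All class and method names are hypothetical. The sketch is single-threaded; a real system would have to make validation and write-back atomic.

```python
class VersionedStore:
    """Toy key-value store that keeps a version number per key."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def commit(self, read_set, write_set) -> bool:
        # Validation: abort if any key read by the transaction has changed.
        for key, seen_version in read_set.items():
            if self.read(key)[1] != seen_version:
                return False
        # Write phase: install buffered writes and bump the versions.
        for key, value in write_set.items():
            _, version = self.read(key)
            self._data[key] = (value, version + 1)
        return True


class Transaction:
    """Reads take no locks; writes are buffered until commit."""

    def __init__(self, store: VersionedStore):
        self.store = store
        self.read_set = {}   # key -> version observed at first read
        self.write_set = {}  # key -> buffered new value

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.read(key)
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self) -> bool:
        return self.store.commit(self.read_set, self.write_set)


store = VersionedStore()
setup = Transaction(store)
setup.write("hot_row", 100)
setup_ok = setup.commit()

t1, t2 = Transaction(store), Transaction(store)
t1.write("hot_row", t1.read("hot_row") - 10)
t2.write("hot_row", t2.read("hot_row") - 20)
t1_ok = t1.commit()  # validates: the row is unchanged since t1 read it
t2_ok = t2.commit()  # fails validation: t1 bumped the version
print(t1_ok, t2_ok, store.read("hot_row"))
```

Under OCC the conflict on the hot row shows up as an abort and retry of the second transaction instead of a lock wait, so heavily contended hotspots still cost work.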

The obsolete: TPC-B
The ubiquitous: TPC-C
The unexplored: TPC-E
Also starring: Shore-MT, Xeon X5660, Niagara T2