C-Store: Introduction to TPC-H Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Mar 20, 2009.



Overview of TPC-H
- What’s TPC?
  - Transaction Processing Performance Council.
- TPC-H is an ad-hoc, decision support benchmark.
  - business-oriented ad-hoc queries
  - concurrent data modifications

So-Called “What If” Query: An Example
Tell me
- the amount of revenue increase that would have resulted from eliminating certain company-wide discounts in a given percentage range in a given year.

The Example Query in SQL
-- $ID$
-- TPC-H/TPC-R Forecasting Revenue Change Query (Q6)
-- Functional Query Definition
-- Approved February 1998
:x
:o
select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date ':1'
    and l_shipdate < date ':1' + interval '1' year
    and l_discount between :2 - 0.01 and :2 + 0.01
    and l_quantity < :3;
:n -1
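To make the query template concrete, here is a minimal sketch that runs the same filter and aggregate against a tiny in-memory SQLite table. Everything outside the SQL text is an assumption for illustration: the function name `q6_revenue`, the four sample rows, and the parameter values. SQLite has no `interval` arithmetic, so the end of the one-year window is computed in Python instead.

```python
import sqlite3
from datetime import date


def q6_revenue(conn, start, discount, quantity):
    """Evaluate the Q6 predicate for one set of substitution parameters.

    start    -> :1 (first day of the one-year ship-date window)
    discount -> :2 (centre of the +/- 0.01 discount band)
    quantity -> :3 (exclusive upper bound on l_quantity)
    """
    # Assumption: start is never Feb 29, so replace() cannot fail.
    end = start.replace(year=start.year + 1)
    row = conn.execute(
        """
        SELECT SUM(l_extendedprice * l_discount) AS revenue
        FROM lineitem
        WHERE l_shipdate >= ? AND l_shipdate < ?
          AND l_discount BETWEEN ? AND ?
          AND l_quantity < ?
        """,
        (start.isoformat(), end.isoformat(),
         discount - 0.01, discount + 0.01, quantity),
    ).fetchone()
    return row[0] or 0.0


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem ("
    "l_shipdate TEXT, l_extendedprice REAL, l_discount REAL, l_quantity REAL)"
)
# Hypothetical rows, not dbgen output.
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
    [
        ("1994-03-15", 1000.0, 0.06, 10),  # satisfies every predicate
        ("1994-07-01", 2000.0, 0.05, 30),  # quantity too large
        ("1995-01-01", 3000.0, 0.06, 5),   # ship date outside the window
        ("1994-12-31", 500.0, 0.10, 5),    # discount outside the band
    ],
)
revenue = q6_revenue(conn, date(1994, 1, 1), 0.06, 24)
```

Only the first row survives all three predicates, so the result is that row's `l_extendedprice * l_discount`.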

The History
- In April 1999, TPC-R and TPC-H replaced TPC-D.
- TPC-R is for a reporting workload.
  - Queries are well known in advance.
  - Obsolete as of 1/1/2005.
- TPC-H is for an ad-hoc querying workload.
  - Queries are not known in advance.
- TPC-H (Now)

Business Environment
- TPC-H and TPC-R model
  - any industry which manages, sells, or distributes products worldwide
  - such as parts, food distribution
- The business environment is divided into two areas:
  - a business operation area
  - a decision support area

Purpose of Benchmarks
- To reduce the diversity of operations found in a typical decision support application,
- while retaining the application’s essential performance characteristics:
  - the level of system utilization
  - and the complexity of operations.

The Core of TPC-H/R
- A set of business queries designed to exercise system functionalities in complex decision support applications.
- These queries portray the activity of a wholesale supplier to help the audience relate intuitively to the components of the benchmarks.

Target Domain of Business Analysis
- Pricing and Promotions
- Supply and Demand Management
- Profit and Revenue Management
- Customer Satisfaction Study
- Market Share Study
- Shipping Management

Schema
- Both TPC-H and TPC-R use 3rd Normal Form.
  - 8 base tables

dbgen: the Data Generator
- Generates data for all base tables
  - depending on a scale factor (SF).
- The scale factor determines the size of the raw data inside the database.
  - SF=100 means that the sum of all base tables equals 100 GB.
  - Fixed choices of SF: 1, 10, 30, 100, 300, 1000, 3000, ...
- The size of each table scales up with the SF.
  - Except for nation and region.
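The scaling rule can be sketched as a small lookup. The per-table base cardinalities below are the SF=1 row counts from the TPC-H specification as I recall them, not figures from the slides; the `lineitem` count in particular is approximate, because the number of line items per order is randomized. The function name `estimated_rows` is hypothetical.

```python
# Rows per table at SF = 1 (lineitem is approximate).
BASE_ROWS = {
    "part": 200_000,
    "supplier": 10_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,
}

# These two tables keep the same size at every scale factor.
FIXED_ROWS = {"nation": 25, "region": 5}


def estimated_rows(scale_factor):
    """Return an estimated row count per base table for a scale factor."""
    rows = {table: int(n * scale_factor) for table, n in BASE_ROWS.items()}
    rows.update(FIXED_ROWS)
    return rows


sf100 = estimated_rows(100)
```

For example, `sf100["orders"]` is 100 times the SF=1 count, while `sf100["nation"]` stays at 25.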

Workload
- A database load.
- The execution of 22 read-only queries in both single-user and multi-user mode.
- The execution of 2 refresh functions.

Database Load
- Is the process of building the test database.
- The database load time includes all of the elapsed time
  - to create the tables, load data,
  - create indices, define and validate constraints,
  - gather statistics, configure the system,
  - and ensure that the test database meets the ACID requirements.

22 read-only queries: Characterized by 4 components
- A business question
  - Illustrates the business context in which the query is used.
- A functional query definition
  - Defines the function to be performed by the query.
  - Each query is defined as a query template.
- Substitution parameters
  - Generated by the supplied program qgen.
- A query validation
  - Describes how to validate each query against a 1 GB database (qualification database).

2 refresh functions
- RF1:
  - Insert new rows into the tables lineitem and orders.
- RF2:
  - Delete the same number of rows from the tables lineitem and orders.
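A minimal sketch of the two refresh functions, using SQLite and heavily reduced tables. The function names `rf1_insert` and `rf2_delete`, the three-column schema, and the sample keys are all assumptions made for illustration; the real benchmark defines which rows are inserted and deleted and in what volume. Each function runs in a single transaction here, which is a simplification.

```python
import sqlite3


def rf1_insert(conn, new_orders):
    """RF1 sketch: add new orders together with their line items.

    new_orders is a list of (orderkey, [quantity, ...]) pairs.
    """
    with conn:  # commit on success, roll back on error
        for orderkey, quantities in new_orders:
            conn.execute(
                "INSERT INTO orders (o_orderkey) VALUES (?)", (orderkey,)
            )
            conn.executemany(
                "INSERT INTO lineitem (l_orderkey, l_linenumber, l_quantity) "
                "VALUES (?, ?, ?)",
                [(orderkey, n, qty)
                 for n, qty in enumerate(quantities, start=1)],
            )


def rf2_delete(conn, orderkeys):
    """RF2 sketch: remove old orders and all of their line items."""
    with conn:
        for orderkey in orderkeys:
            conn.execute(
                "DELETE FROM lineitem WHERE l_orderkey = ?", (orderkey,)
            )
            conn.execute(
                "DELETE FROM orders WHERE o_orderkey = ?", (orderkey,)
            )


def count(conn, table):
    """Row count of one of the two sketch tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE lineitem (l_orderkey INTEGER, l_linenumber INTEGER, "
    "l_quantity REAL, PRIMARY KEY (l_orderkey, l_linenumber))"
)
rf1_insert(conn, [(1, [5, 7]), (2, [3])])  # initial population
rf1_insert(conn, [(3, [1, 2]), (4, [9])])  # refresh: two new orders
rf2_delete(conn, [1, 2])                   # refresh: two old orders
```

After the insert and delete pair, the `orders` table holds the same number of rows as before the refresh, which is the property the slide describes.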

Implementation Rules (1): Partitioning Scheme
- In TPC-H, horizontal partitioning is allowed with some restrictions.
- The partitioning field must be one and only one of the following:
  - a primary key column as defined in the benchmark specification;
  - a foreign key as defined in the benchmark specification;
  - a single date column.
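One reading of the rule above can be written as a small check. This is only a sketch of the slide's wording, not of the full specification text: the `Column` class, its flags, and `partitioning_allowed` are hypothetical names, and the check assumes that exactly one field may be used and that it must fall into at least one of the three listed categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor used only for this sketch."""
    name: str
    is_primary_key: bool = False
    is_foreign_key: bool = False
    is_date: bool = False


def partitioning_allowed(fields):
    """Return True if the chosen partitioning field(s) fit the slide's rule."""
    if len(fields) != 1:
        return False  # the slide allows one and only one field
    field = fields[0]
    return field.is_primary_key or field.is_foreign_key or field.is_date


l_shipdate = Column("l_shipdate", is_date=True)
l_orderkey = Column("l_orderkey", is_primary_key=True, is_foreign_key=True)
l_quantity = Column("l_quantity")
```

Under this reading, partitioning lineitem on `l_shipdate` or on `l_orderkey` passes, while `l_quantity` or a combination of two fields does not.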

Implementation Rules (2): Auxiliary Structures
- The physical implementation of auxiliary data structures (such as B-trees) on the tables may involve replication of selected data from the tables, provided that:
  - all replicated data are managed by the DBMS, the OS, or the hardware;
  - all replications are transparent to all data manipulation operations;
  - data modifications are reflected in all logical copies when the updating transaction is committed;
  - all copies of replicated data maintain full ACID properties at all times.

Primary Performance Metric
- The Composite Performance Metric
  - QphH: the number of queries the system can perform per hour.
- In order to compute QphH for a test system at a given scale factor, one needs to run a power test followed by a throughput test.
  - The results are then combined to compute QphH.

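The Processing Power
- Based on the geometric mean of the elapsed times for all queries and both refresh functions obtained from the power test; the metric is inversely proportional to that mean, so shorter elapsed times give a higher value.
- The unit is queries per hour.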

Computation of the Processing Power
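The formula on this slide did not survive in the transcript. As best I can reconstruct it from the TPC-H specification, the power metric is defined as follows, where $QI(i,0)$ is the elapsed time in seconds of query $i$ in the power test, $RI(j,0)$ is the elapsed time of refresh function $j$, and $SF$ is the scale factor (the symbol names follow the specification, not the slides):

```latex
\mathrm{Power@Size} =
  \frac{3600 \times SF}
       {\sqrt[24]{\;\prod_{i=1}^{22} QI(i,0) \times \prod_{j=1}^{2} RI(j,0)\;}}
```

The 24th root is the geometric mean over the 22 queries plus the 2 refresh functions, and the factor 3600 converts seconds into an hourly rate.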

The Throughput Power
- The ratio of the total number of queries executed over the length of the measurement interval of the multi-stream run.
- The unit is queries per hour.

Computation of the Throughput Power
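The formula on this slide is also missing from the transcript. Reconstructed from the TPC-H specification, with $S$ the number of query streams, $T_s$ the length of the measurement interval in seconds, and $SF$ the scale factor (symbol names follow the specification, not the slides):

```latex
\mathrm{Throughput@Size} = \frac{S \times 22 \times 3600}{T_s} \times SF
```

Each stream runs all 22 queries, so $S \times 22$ is the total number of queries executed during the interval.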

The Composite Query-Per-Hour Performance Metric
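The formula for this slide is not preserved in the transcript. In the TPC-H specification, the composite metric is the geometric mean of the power and throughput results at the same scale factor:

```latex
\mathrm{QphH@Size} = \sqrt{\mathrm{Power@Size} \times \mathrm{Throughput@Size}}
```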

Price/Performance Metric
- The ratio of the total system price to the composite metric.
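The metric definitions can be tied together in a short worked calculation. This sketch assumes the formulas of the TPC-H specification (power as 3600 x SF divided by the geometric mean of the 24 power-test timings, throughput as S x 22 x 3600 / Ts x SF, composite as the square root of their product). All function names and the timing and price inputs are made up for illustration and are not benchmark results.

```python
import math


def power_at_size(query_secs, refresh_secs, sf):
    """3600 * SF divided by the geometric mean of all power-test timings."""
    times = list(query_secs) + list(refresh_secs)
    geo_mean = math.exp(sum(math.log(t) for t in times) / len(times))
    return 3600.0 * sf / geo_mean


def throughput_at_size(streams, interval_secs, sf, queries_per_stream=22):
    """Queries executed per hour over the multi-stream run, scaled by SF."""
    return streams * queries_per_stream * 3600.0 / interval_secs * sf


def composite_qphh(power, throughput):
    """Geometric mean of the power and throughput metrics."""
    return math.sqrt(power * throughput)


def price_per_qphh(total_system_price, qphh):
    """Price/performance: total system price divided by QphH."""
    return total_system_price / qphh


# Hypothetical run at SF = 1: every query and refresh function takes 10 s
# in the power test; 2 streams finish in 440 s in the throughput test.
power = power_at_size([10.0] * 22, [10.0] * 2, sf=1)
throughput = throughput_at_size(streams=2, interval_secs=440.0, sf=1)
qphh = composite_qphh(power, throughput)
price_perf = price_per_qphh(36_000.0, qphh)
```

With these inputs the power and throughput metrics coincide, so the composite equals both, and the price/performance figure is the assumed price divided by that value.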

Top Ten TPC-H by Performance: Version 2 Results As of 19-Mar :48 AM

Top Ten TPC-H by Price/Performance: Version 2 Results As of 19-Mar :51 AM

References
- M. Poess, C. Floyd. New TPC Benchmarks for Decision Support and Web Commerce. ACM SIGMOD Record, 29(4), December 2000.
- TPC-H Official Site:
- TPC-H Version :