March 30, 2001. DGRC FedStats Visit. Aggregation in Main Memory. Kenneth A. Ross, Columbia University.

Presentation transcript:

Aggregation in Main Memory
Kenneth A. Ross, Columbia University

Research Experience
- Complex query processing
- Data warehousing
- Main memory databases
Students: Kazi Zaman, Junyan Ding

Scenario A
[Diagram labels: User; Query; Mediator; Unified Results; data sources: Main-Memory DBMS, Traditional DBMS, ...]

Scenario B
[Diagram labels: User; Sequence of Interactive Queries; Main Memory DB; Data Request; Mediator; Unified Results; data sources: Web, Traditional DBMS, ...]

Scenario C
[Diagram labels: User; Graphical User Interface; Dynamic Query; Main Memory DB; Data Request; Mediator; Unified Results; data sources: Web, Traditional DBMS, ...]

Outline
- Introduction to datacubes
- Frameworks for querying cubes
- The main-memory-based framework
- Experimental results
- Conclusions and plan

The CUBE BY Operator

Input relation:

State  Year  Grade    Sales
CA     1997  Regular     90
NY     1997  Premium     70
CA     1998  Premium     65
NY     1998  Premium     95

Result of CUBE BY (sum Sales), showing some of the records:

State  Year  Grade    Sales
CA     1997  Regular     90
CA     1997  ALL         90
ALL    1997  Regular     90
CA     ALL   Regular     90
ALL    1997  ALL        160
ALL    ALL   Regular     90
CA     ALL   ALL        155
ALL    ALL   ALL        320
...    (additional records)

CUBE BY causes a large increase in total size, especially with many dimensions.
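To make the size increase concrete, here is a minimal Python sketch that computes a full sum cube over the four-row relation on this slide. The function name `cube_by_sum` and the in-memory dictionary are illustrative assumptions, not the implementation used in the talk.

```python
from itertools import product

ALL = "ALL"

# Base relation from the slide: (State, Year, Grade, Sales).
rows = [
    ("CA", 1997, "Regular", 90),
    ("NY", 1997, "Premium", 70),
    ("CA", 1998, "Premium", 65),
    ("NY", 1998, "Premium", 95),
]

def cube_by_sum(rows, num_dims=3):
    """Return {(d1, ..., dn): sum of the measure} over every group-by."""
    cube = {}
    for row in rows:
        dims, measure = row[:num_dims], row[num_dims]
        # Each dimension either keeps its value or is rolled up to ALL.
        for mask in product((False, True), repeat=num_dims):
            key = tuple(ALL if rolled else d for d, rolled in zip(dims, mask))
            cube[key] = cube.get(key, 0) + measure
    return cube

cube = cube_by_sum(rows)
print(cube[("CA", ALL, ALL)])   # 155
print(cube[(ALL, 1997, ALL)])   # 160
print(cube[(ALL, ALL, ALL)])    # 320
print(len(rows), "base records ->", len(cube), "cube records")
```

Each base record contributes to 2^n group-by keys for n dimensions, which is where the growth with dimensionality comes from.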

Lattice Representation
Level 3: {State, Year, Grade}
Level 2: {State, Year}, {State, Grade}, {Year, Grade}
Level 1: {State}, {Year}, {Grade}
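The lattice can be enumerated mechanically. The sketch below lists the group-by sets level by level and, for a given group-by, the finer group-bys it can be computed from. The function names are hypothetical. The enumeration also includes the empty group-by (the single ALL, ALL, ALL record), which the slide's diagram does not show.

```python
from itertools import combinations

dimensions = ("State", "Year", "Grade")

def lattice_levels(dims):
    """Return the group-by sets of the lattice, finest level first."""
    return [list(combinations(dims, size)) for size in range(len(dims), -1, -1)]

def parents(group_by, dims):
    """Group-bys one level finer, from which `group_by` can be aggregated."""
    return [
        tuple(d for d in dims if d in group_by or d == extra)
        for extra in dims
        if extra not in group_by
    ]

levels = lattice_levels(dimensions)
for level in levels:
    print(level)
print(parents(("State",), dimensions))
```

For n dimensions the lattice has 2^n nodes, so three dimensions give eight group-bys.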

Modeling Queries
Slice queries ask for a single aggregate record:
    SELECT State, year, sum(sales)
    FROM BLS
    GROUP BY State, year
    HAVING State = 'NY' AND year = '1998'
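As a quick check of what a slice query returns, the sketch below runs the query from this slide with Python's built-in `sqlite3` module. The table schema (a `grade` column, `year` stored as text so that it compares equal to the string literal) and the sample rows from the CUBE BY slide are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: year is TEXT because the slide compares it to a string.
conn.execute("CREATE TABLE BLS (State TEXT, year TEXT, grade TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO BLS VALUES (?, ?, ?, ?)",
    [
        ("CA", "1997", "Regular", 90),
        ("NY", "1997", "Premium", 70),
        ("CA", "1998", "Premium", 65),
        ("NY", "1998", "Premium", 95),
    ],
)

slice_query = """
    SELECT State, year, SUM(sales)
    FROM BLS
    GROUP BY State, year
    HAVING State = 'NY' AND year = '1998'
"""
result = conn.execute(slice_query).fetchall()
print(result)  # [('NY', '1998', 95)]
```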

Existing Frameworks
Lattice: {State, Year, Grade}; {State, Year}, {State, Grade}, {Year, Grade}; {State}, {Year}, {Grade}
- Choose a subset of the cube to materialize, based on the workload.
- Materialize it on disk.
- For an incoming slice query, the appropriate record is retrieved or computed.
Drawbacks:
- Ignores the clustering of the relation on disk.
- The smallest unit of materialization is too big.

Our Approach
Lattice: {State, Year, Grade}; {State, Year}, {State, Grade}, {Year, Grade}; {State}, {Year}, {Grade}
- The full cube is often larger than available memory, but the finest-granularity aggregate may fit.
- Any record can then be computed without having to go to disk.
- How should the finest granularity be organized?

Framework
[Diagram: a query q goes to the level-1 store (selected coarse records in a hash table) and to the level-2 store (the finest-granularity cuboid: a slot directory whose slots hold records in linked lists).]

The Level-1 Store
- Records are (key, value) pairs stored in a hash table.
- Records can contain ALLs.
- Given a query Q, form the composite key and check the level-1 store (constant time).
- If the key is not found, use the level-2 store.
Example entries:
Key  Value
a1   55
b2   34
c2   12
...
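The lookup path described on this slide can be sketched as follows. This is an illustrative assumption about the interface: `Level1Store`, `answer`, and the `level2_lookup` callback are hypothetical names, and a Python dict stands in for the hash table.

```python
ALL = "ALL"

class Level1Store:
    """Hash table of selected coarse aggregate records, keyed by composite key."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # The composite key is the tuple of dimension values; it may contain ALL.
        self._table[tuple(key)] = value

    def get(self, key):
        """Return the stored aggregate, or None if the record is not cached."""
        return self._table.get(tuple(key))

def answer(query, level1, level2_lookup):
    """Check the level-1 store first; fall back to level 2 on a miss."""
    hit = level1.get(query)
    if hit is not None:
        return hit
    return level2_lookup(query)

level1 = Level1Store()
level1.put(("CA", ALL, ALL), 155)

# A stand-in for the level-2 store, used here only to show the fallback.
fallback = lambda query: "computed from level 2"
print(answer(("CA", ALL, ALL), level1, fallback))
print(answer(("NY", ALL, ALL), level1, fallback))
```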

The Level-2 Store
[Diagram: slot directory over the finest-granularity cuboid; level-2 store records in linked lists.]
- The slot directory is organized as a multidimensional array: level2[sz1][sz2][sz3][sz4]
- Each slot points to a linked list of elements.
- Records are placed according to a set of mapping functions H.

Using the Level-2 Store: Query Q without ALLs
Q = (a3, b4, c2, d5). The mapping functions send a3 to slot 4, b4 to slot 3, c2 to slot 7, and d5 to slot 1.
Access the list denoted by level2[4][3][7][1]; aggregate the records matching (a3, b4, c2, d5).
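A minimal sketch of a level-2 store and a query without ALLs is given below. Several details are assumptions, since the slides do not specify them: the mapping functions H are taken to be a CRC32 hash modulo the slot count of each dimension, Python lists stand in for the linked lists, the directory is a flat row-major array, and the example uses the three-dimensional sales data rather than four dimensions.

```python
import zlib

class Level2Store:
    """Slot directory over the finest-granularity records."""

    def __init__(self, sizes):
        self.sizes = tuple(sizes)
        total = 1
        for sz in self.sizes:
            total *= sz
        # Flat row-major array standing in for level2[sz1][sz2]...[szn].
        self.slots = [[] for _ in range(total)]

    def h(self, dim, value):
        """Assumed mapping function H_dim: dimension value -> slot index."""
        return zlib.crc32(repr(value).encode()) % self.sizes[dim]

    def _offset(self, coords):
        offset = 0
        for coord, sz in zip(coords, self.sizes):
            offset = offset * sz + coord
        return offset

    def insert(self, key, measure):
        coords = [self.h(i, v) for i, v in enumerate(key)]
        self.slots[self._offset(coords)].append((tuple(key), measure))

    def point_query(self, key):
        """Sum the records in one slot whose key matches exactly."""
        coords = [self.h(i, v) for i, v in enumerate(key)]
        bucket = self.slots[self._offset(coords)]
        return sum(m for k, m in bucket if k == tuple(key))

store = Level2Store((2, 2, 2))
for state, year, grade, sales in [
    ("CA", 1997, "Regular", 90),
    ("NY", 1997, "Premium", 70),
    ("CA", 1998, "Premium", 65),
    ("NY", 1998, "Premium", 95),
]:
    store.insert((state, year, grade), sales)

print(store.point_query(("NY", 1998, "Premium")))  # 95
```

Because several keys can map to the same slot, the query still compares full keys within the list it visits.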

Using the Level-2 Store: Query Q with ALLs
Q = (a3, ALL, c2, ALL). a3 maps to slot 4 and c2 to slot 7; each ALL maps to the list of all slots in its dimension.
Access the lists matching level2[4][*][7][*]; aggregate the records matching (a3, *, c2, *).
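The wildcard case can be sketched in the same spirit. Here the directory is simplified to a dict keyed by slot coordinates, and the mapping functions are again assumed to be a CRC32 hash modulo the slot count. Both choices are illustrative, not taken from the talk. A fixed value contributes one slot index in its dimension; ALL contributes every index.

```python
import zlib
from itertools import product

ALL = "ALL"
SIZES = (2, 2, 2)  # hypothetical number of slots per dimension

def h(dim, value):
    """Assumed mapping function for dimension `dim`."""
    return zlib.crc32(repr(value).encode()) % SIZES[dim]

def build(records):
    """Place each (key, measure) record in the slot chosen by the mapping functions."""
    directory = {}
    for key, measure in records:
        slot = tuple(h(i, v) for i, v in enumerate(key))
        directory.setdefault(slot, []).append((key, measure))
    return directory

def query(directory, q):
    """Sum the measures of all records matching q, where ALL matches anything."""
    ranges = [range(SIZES[i]) if v == ALL else (h(i, v),) for i, v in enumerate(q)]
    total = 0
    for slot in product(*ranges):
        for key, measure in directory.get(slot, ()):
            if all(qv == ALL or qv == kv for qv, kv in zip(q, key)):
                total += measure
    return total

directory = build([
    (("CA", 1997, "Regular"), 90),
    (("NY", 1997, "Premium"), 70),
    (("CA", 1998, "Premium"), 65),
    (("NY", 1998, "Premium"), 95),
])
print(query(directory, ("CA", ALL, ALL)))   # 155
print(query(directory, (ALL, 1997, ALL)))   # 160
print(query(directory, (ALL, ALL, ALL)))    # 320
```

The more ALLs a query contains, the more slots it visits, which is why selected coarse records are worth caching in the level-1 store.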

Demo
- Shows a multidimensional dataset (a subset of the columns of the 5% Census sample for NY in 1990).
- The user asks queries and gets fast answers.
- Future: the user interface asks many queries, with the display changing interactively.

Experimental Results
Scanning all records takes 194 ms.

Importance of Work
- Aggregation is fundamental to analysis.
- Make analysis interactive, even for many dimensions.
- Make a variety of aggregate granularities available, where possible.

Contributions
- A main-memory-based framework for answering datacube queries efficiently.
- Query performance in the 2-4 ms range, which is faster than going to disk.

Plan
- Integrate with the user interface to generate dynamic queries.
- Self-tuning capability.
- Multiple data sets.
- Work with agencies to generate value:
  - for intra-agency analysis
  - for enhanced data dissemination