Performance techniques for OLAP (On-line analytical processing). Ewha Womans University, Department of Computer Science, Database Lab, Master's program 2nd semester, Juyoung Kang

Presentation transcript:


1999/12/8, Juyoung Kang, Database Lab
Contents
- OLAP Overview
- Key demand of OLAP
- Approaches
- Cubing Algorithms
- Multidimensional aggregation
- Selection of the views to materialize
- Method for MOLAP system
- Conclusions and further research

OLAP Overview
Sales, seen from the perspectives of different users:
- What are the sales by product?
- What are the sales by region?
- How do actual sales compare with the target?
OLAP: an interactive analysis tool for the multidimensional analysis of information.

OLAP Overview
Characteristics of OLAP: multidimensional information, direct access, interactive analysis, use in decision making.

OLTP vs. OLAP

| | Transactional solutions (OLTP) | OLAP solutions |
|---|---|---|
| Data size | Several GB to tens of GB | Hundreds of GB to several TB |
| Structured for | Data integrity | Ease in querying |
| Optimized for | Transaction performance | Query performance |
| Data features | Atomized, present, process-oriented | Summarized, historical, subject-oriented |
| Applications | Process-oriented | Subject-oriented |

Key demand of OLAP
Multidimensional queries

Car sales data: detail rows are keyed by Model (Ford), Year (1994, 1995) and Color (Black, White). Aggregating by Year and then over all years gives the ALL rows:

| Model | Year | Color | Sales |
|---|---|---|---|
| Ford | 1995 | Black | 85 |
| Ford | 1995 | White | 115 |
| Ford | 1994 | ALL | 90 |
| Ford | 1995 | ALL | 200 |
| Ford | ALL | ALL | 290 |

Unioned GROUP BYs for a multidimensional query. Question: how many Ford cars were sold in a given year?

  SELECT Model, Year, Color, SUM(Sales)
  FROM Sales WHERE Model = 'Ford'
  GROUP BY Model, Year, Color
  UNION
  SELECT Model, 'ALL', 'ALL', SUM(Sales)
  FROM Sales WHERE Model = 'Ford'
  GROUP BY Model
  UNION
  SELECT Model, Year, 'ALL', SUM(Sales)
  FROM Sales WHERE Model = 'Ford'
  GROUP BY Model, Year
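The unioned GROUP BYs on this slide can be tried out with a minimal sketch using Python's built-in `sqlite3` module. The table layout and the helper name `unioned_group_bys` are assumptions made for illustration. The 1995 detail values (85 and 115) come from the slide; the 1994 per-color values are hypothetical and were chosen only so that they sum to the slide's 1994 total of 90.

```python
import sqlite3

# Detail rows. The 1994 per-color values (50, 40) are hypothetical;
# the 1995 values (85, 115) are the ones shown on the slide.
ROWS = [
    ("Ford", 1994, "Black", 50),
    ("Ford", 1994, "White", 40),
    ("Ford", 1995, "Black", 85),
    ("Ford", 1995, "White", 115),
]

# One SELECT per aggregation level, glued together with UNION.
QUERY = """
SELECT Model, Year, Color, SUM(Sales) FROM Sales
 WHERE Model = 'Ford' GROUP BY Model, Year, Color
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales
 WHERE Model = 'Ford' GROUP BY Model, Year
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales
 WHERE Model = 'Ford' GROUP BY Model
"""

def unioned_group_bys(detail_rows):
    """Load the detail rows into an in-memory table and run the unioned query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", detail_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

result = unioned_group_bys(ROWS)
for row in result:
    print(row)
```

Every additional aggregation level needs its own SELECT, so the number of unioned queries grows with the number of dimension combinations. This is the cost that motivates the cube operator discussed later in the deck.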

Key demand of OLAP
Queries must be answered quickly!
- Pre-calculation: what is an efficient way to compute multidimensional aggregates?
- Tradeoffs between storage and performance: how can response performance be maximized while using a reasonable amount of storage?
Analytical query for decision making: how do this year's sales at the Gangnam-area stores compare with last year's?
- Gangnam area: 20 stores
- Products per store: about 100
- Records to be processed (this year and last year) = 20 × 100 × 365 × 2 = 1,460,000

Approaches
- Requirement for simultaneous multidimensional aggregation
  - Cube operator [GBLP95]
  - PipeSort, PipeHash [AAD+96]
  - OVERLAP [AAD+96]
- Requirement for the right selection of the views to materialize
  - Greedy algorithm for selecting views [HRU96]
  - One-step algorithm combining the selection of views and indexes [GHRU97]
- An array-based method for MOLAP systems
  - Multi-way array-based algorithm [ZDN97]

Cubing Algorithms
Computing the data cube efficiently
Cube operator [GBLP95]
- N-dimensional generalization of the simple aggregate function and GROUP BY
- Computes every possible cell of the data cube
- Sparsity is not considered
A chain of unioned queries (SELECT ... FROM ... WHERE ... GROUP BY ... UNION SELECT ... FROM ... WHERE ... GROUP BY ... and so on) is replaced by a single GROUP BY CUBE:

  SELECT Model, Year, Color, SUM(Sales) AS Sales
  FROM Sales
  WHERE Model IN ('Ford', 'Chevy') AND Year BETWEEN 1990 AND 1992
  GROUP BY CUBE Model, Year, Color
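What the cube operator produces can be illustrated with a naive sketch that computes one group-by for every subset of the dimensions, 2^N in total. This is not the algorithm of [GBLP95] or any of the optimized methods in this deck; it only shows the result set. The function name `cube` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cube(rows, dims, measure):
    """Return {cell key: summed measure} for every subset of dims.

    Dimensions that are aggregated away are replaced by the marker "ALL".
    """
    cells = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):      # one group-by per subset
            for row in rows:
                key = tuple(row[d] if d in kept else "ALL" for d in dims)
                cells[key] += row[measure]
    return dict(cells)

# Hypothetical sample rows within the slide's filter (Ford/Chevy, 1990 to 1992).
SAMPLE = [
    {"Model": "Ford", "Year": 1990, "Color": "Black", "Sales": 10},
    {"Model": "Ford", "Year": 1991, "Color": "White", "Sales": 20},
    {"Model": "Chevy", "Year": 1990, "Color": "Black", "Sales": 5},
]

cells = cube(SAMPLE, ["Model", "Year", "Color"], "Sales")
print(cells[("ALL", "ALL", "ALL")])   # grand total over all rows
```

The sketch rescans the detail rows once per group-by, which is exactly the redundancy that PipeSort, PipeHash and OVERLAP aim to avoid.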


Cubing Algorithms (cont'd)
PipeSort, PipeHash [AAD+96]
- Fast algorithms for computing a collection of group-bys
- Optimizations for computing multiple group-bys: smallest-parent, cache-results, amortize-scans, share-sorts, share-partitions
- Both algorithms combine these optimizations to reduce the total cost
- PipeSort: reduces the problem to a minimum-weight matching problem on a bipartite graph
- PipeHash: decides the order of the group-bys and chooses shared partitions, taking the available memory into account
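Of the optimizations listed, smallest-parent is the easiest to show in isolation: a group-by is computed from the smallest group-by one level up that contains all of its dimensions, rather than from the raw data. The sketch below assumes three dimensions named A, B and C, hypothetical size estimates, and hypothetical helper names; it does not implement PipeSort or PipeHash themselves.

```python
from collections import defaultdict

# Hypothetical estimated sizes (in rows) of the group-bys over A, B and C.
SIZES = {"ABC": 1000, "AB": 400, "AC": 150, "BC": 600, "A": 20, "B": 50, "C": 10}

def smallest_parent(groupby, sizes):
    """Return the smallest group-by that has exactly one more dimension."""
    parents = [g for g in sizes
               if len(g) == len(groupby) + 1 and set(groupby) <= set(g)]
    return min(parents, key=lambda g: sizes[g])

def roll_up(parent_cells, parent_dims, child_dims):
    """Aggregate a parent's cells down to the child's dimensions."""
    keep = [parent_dims.index(d) for d in child_dims]
    child = defaultdict(int)
    for key, total in parent_cells.items():
        child[tuple(key[i] for i in keep)] += total
    return dict(child)

# Group-by A can come from AB (400 rows) or AC (150 rows); AC is cheaper.
parent = smallest_parent("A", SIZES)
ac_cells = {("a1", "c1"): 5, ("a1", "c2"): 7, ("a2", "c1"): 1}
a_cells = roll_up(ac_cells, parent, "A")
print(parent, a_cells)
```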

Cubing Algorithms (cont'd)
Performance results of PipeSort and PipeHash
- 2 to 8 times faster than the naive methods
- PipeHash is within 8% and PipeSort within 22% of the lower bound
[Figure: performance of PipeSort and PipeHash on different data sets]

Cubing Algorithms (cont'd)
OVERLAP [AAD+96]
- A particular sorting-based scheme
- Overlaps the computation of different cuboids and minimizes the number of scans (disk I/O) needed
PipeSort vs. OVERLAP
- PipeSort: chooses sort orders by taking the size of each group-by into account in order to reduce scanning and sorting cost, so it uses more than one sort order
- OVERLAP: uses a single sort order in a multi-pipelined fashion, does not consider the sizes of the group-bys, and uses partitions so that more of the computation overlaps

Cubing Algorithms (cont'd)
Selection of the views (group-bys)
Efficient implementation of the data cube [HRU96]
- Plan for the right selection of the views to materialize: what and how much to precompute
Lattice of views with their sizes in rows (in [HRU96], p = part, s = supplier, c = customer):
  psc 6M
  pc 6M, sc 6M, ps 0.8M
  p 0.2M, s 0.01M, c 0.1M
  none 1
- How many views must we materialize to get reasonable performance?
- Given that we have space S, which views do we materialize so that the average query cost is minimized?

Cubing Algorithms (cont'd)
The greedy algorithm for selection
- Polynomial-time and competitive (always gives a solution that is within a constant factor of the optimum)
- Guaranteed to achieve at least 63% of the optimal benefit
Indexes for selected views [GHRU97]
- Automated selection of summary tables and indexes
- A family of one-step algorithms that select which subcubes and indexes should be computed for improved query performance, given the space constraint
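The greedy selection can be sketched on the p/s/c lattice from the previous slide. The sketch assumes the usual linear cost model, in which answering a query costs as many rows as the smallest materialized view that contains its dimensions, and it fixes the number of views to pick (`k`) instead of a space budget S. The function and variable names are hypothetical, and view names are written as strings of dimension letters, with the empty string standing for "none".

```python
# View sizes in rows, taken from the lattice slide; "" is the view "none".
SIZES = {
    "psc": 6_000_000, "pc": 6_000_000, "sc": 6_000_000, "ps": 800_000,
    "p": 200_000, "s": 10_000, "c": 100_000, "": 1,
}

def greedy_select(sizes, top, k):
    """Pick k views, besides the always-materialized top view, by largest benefit."""
    def answers(v, w):
        # View v can answer queries on w when w's dimensions are a subset of v's.
        return set(w) <= set(v)

    selected = [top]
    # cost[w]: size of the smallest selected view that can answer w.
    cost = {w: sizes[top] for w in sizes}
    for _ in range(k):
        def benefit(v):
            # Total rows saved over all views that v could answer more cheaply.
            return sum(max(0, cost[w] - sizes[v]) for w in sizes if answers(v, w))

        candidates = [v for v in sizes if v not in selected]
        best = max(candidates, key=benefit)
        selected.append(best)
        for w in sizes:
            if answers(best, w):
                cost[w] = min(cost[w], sizes[best])
    return selected[1:]

picked = greedy_select(SIZES, "psc", 3)
print(picked)
```

With these sizes, pc and sc are never worth materializing because they are as large as psc itself. The 63% guarantee quoted on the slide corresponds to the bound (e − 1)/e ≈ 0.632 on the benefit of the greedy choice relative to the optimal one.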

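Cubing Algorithms (cont'd)
ROLAP vs. MOLAP

| ROLAP | MOLAP |
|---|---|
| Based on a relational database (RDB) | Based on a multidimensional database (MDDB) |
| Storage and analysis are separate | Storage and analysis are integrated |
| Stored as tables | Stored as arrays |
| Takes less space | Takes more space |
| Relatively slow | Fast response |
| Can accommodate partial changes | Must be rebuilt after partial changes |
| Scalable | Relatively less scalable |

Array storage example with product and store dimensions: in (Shoes, WestTown, 3-July-1996, $34), the dimension values (Shoes, WestTown, 3-July-1996) determine the POSITION of the cell and $34 is the VALUE stored there.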
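The difference between table storage and array storage can be made concrete with a small sketch. Only the fact (Shoes, WestTown, 3-July-1996, $34) comes from the slide; the other dimension members, the row-major layout and the helper name `position` are assumptions for illustration.

```python
# Hypothetical dimension members; only the first entry of each list is from the slide.
PRODUCTS = ["Shoes", "Socks"]
STORES = ["WestTown", "EastTown"]
DATES = ["3-July-1996", "4-July-1996"]

# ROLAP style: each fact is a table row that stores its dimension values explicitly.
fact_table = [("Shoes", "WestTown", "3-July-1996", 34)]

def position(product, store, date):
    """Row-major offset of a cell in a dense product x store x date array."""
    p, s, d = PRODUCTS.index(product), STORES.index(store), DATES.index(date)
    return (p * len(STORES) + s) * len(DATES) + d

# MOLAP style: the array holds only values; the position encodes the dimensions.
cells = [None] * (len(PRODUCTS) * len(STORES) * len(DATES))
for product, store, date, value in fact_table:
    cells[position(product, store, date)] = value

print(cells)
```

A lookup in the array is a direct offset computation, which fits the "fast response" row of the table, while the empty cells reserved for combinations with no sales illustrate why a dense array can take more space.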

Cubing Algorithms (cont'd)
Cube computation for MOLAP
An array-based method for MOLAP systems [ZDN97]
- Identifies the tradeoffs between MOLAP and ROLAP
- Multi-way array algorithm
  - Overlaps the computation of different group-bys, while using minimal memory for each group-by
  - The dimension order used by the algorithm minimizes its total memory requirement
- Performance results (response time): performs much better than previously published ROLAP algorithms (in this case, OVERLAP)
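The idea of overlapping several group-bys can be shown with a simplified sketch that updates the AB, AC and BC aggregates during a single scan of a dense three-dimensional array. It deliberately leaves out the chunking and the memory-minimizing dimension order that [ZDN97] describe, so it should be read as an illustration of simultaneous aggregation only. The function name and the sample array are hypothetical.

```python
def multiway_aggregate(array):
    """Compute the AB, AC and BC group-bys in one scan of an A x B x C array."""
    na, nb, nc = len(array), len(array[0]), len(array[0][0])
    ab = [[0] * nb for _ in range(na)]
    ac = [[0] * nc for _ in range(na)]
    bc = [[0] * nc for _ in range(nb)]
    for a in range(na):
        for b in range(nb):
            for c in range(nc):
                value = array[a][b][c]
                # Each cell is read once and contributes to all three group-bys.
                ab[a][b] += value
                ac[a][c] += value
                bc[b][c] += value
    return ab, ac, bc

# Hypothetical 2 x 2 x 2 array of measure values, indexed as array[a][b][c].
ARRAY = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
ab, ac, bc = multiway_aggregate(ARRAY)
print(ab, ac, bc)
```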

Cubing Algorithms (cont'd)
Performance comparison with ROLAP
[Figures: response time by number of valid cells; response time by number of dimensions]

Conclusions & future work
Conclusion
- OLAP is needed to support decision making
- Key demand of OLAP and multidimensional data analysis: response time
- Approaches for optimal response performance
  - Multidimensional aggregation
  - Selection of the views to materialize
  - Method for MOLAP systems
Future work
- Performance techniques for WRITE optimization
- Optimization techniques for the slicing, dicing, drill-down and roll-up operations
- Parallel processing of cube computation

References
[AAD+96] S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, S. Sarawagi. On the Computation of Multidimensional Aggregates. Proc. of the 22nd Int. VLDB Conf., 1996.
[GBLP95] J. Gray, A. Bosworth, A. Layman, H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Technical Report MSR-TR-95-22, Microsoft Research, Advanced Technology Division, Microsoft Corporation, Redmond, Washington, November 1995.
[GHRU97] H. Gupta, V. Harinarayan, A. Rajaraman, J.D. Ullman. Index Selection for OLAP. Proc. ICDE '97, 1997.
[HRU96] V. Harinarayan, A. Rajaraman, J.D. Ullman. Implementing Data Cubes Efficiently. Proc. ACM SIGMOD Int. Conf. on Management of Data, 1996.
[ZDN97] Y. Zhao, P.M. Deshpande, J.F. Naughton. An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. Proc. of the 1997 SIGMOD Conference, Tucson, Arizona, May 1997.