DWH, Prof. Bayer, SS 2002: Solution proposals for exercise sheet 1

Solution proposal for exercise 1 on sheet 1

Dimension tables (attribute, type, number of distinct values):

Caller
  Prefix     smallint   100
  Number     integer    10^7
  Name       string
  Address    string
  ...        ...

Callee
  Prefix     smallint   100
  Number     integer    10^7
  Name       string
  Address    string
  ...        ...

TimeOfCall
  Year       smallint   10
  Month      string     12
  Day        smallint   31
  Hour       smallint   24
  Minute     smallint   60

Rate
  Cents      decimal

LocCaller
  XCoord     integer    10^4
  YCoord     integer    10^4

LocCallee
  XCoord     integer    10^4
  YCoord     integer    10^4

Fact table:

CallsFacts
  Prefix ... YCoord   (13 key components, 7 dimensions)
  DurationSec         integer

Solution proposal for exercise 2 on sheet 1

Data volume: 10^8 calls/day * 365 days/year * 49 B/call = 1.79 * 10^12 B/year ~ 2 TB/year

The size of the space spanned by the dimensions is:
100 * 10^7 * 100 * 10^7 * 10 * 12 * 31 * 24 * 60 * 10^4 * 10^4 * 10^4 * 10^4 = 5.4 * 10^40

The number of tuples is 10^8 calls/day * 365 days/year = 3.65 * 10^10

Sparsity: 1 - (3.65 * 10^10) / (5.4 * 10^40) ~ 1 - 6.8 * 10^-31, i.e. extremely sparse, but not unusual for data mining
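The arithmetic of exercise 2 can be re-checked with a short Python sketch. The constant and dictionary names are illustrative and not part of the slides; the cardinalities are the ones listed for the 13 key components in exercise 1, and 49 B/call is taken from the solution text.

```python
from math import prod

# Figures taken from the solution text; the names are illustrative only.
CALLS_PER_DAY = 10**8
DAYS_PER_YEAR = 365
BYTES_PER_CALL = 49

# Number of distinct values for each of the 13 key components.
CARDINALITIES = {
    "Caller.Prefix": 100, "Caller.Number": 10**7,
    "Callee.Prefix": 100, "Callee.Number": 10**7,
    "Year": 10, "Month": 12, "Day": 31, "Hour": 24, "Minute": 60,
    "LocCaller.XCoord": 10**4, "LocCaller.YCoord": 10**4,
    "LocCallee.XCoord": 10**4, "LocCallee.YCoord": 10**4,
}

tuples_per_year = CALLS_PER_DAY * DAYS_PER_YEAR
bytes_per_year = tuples_per_year * BYTES_PER_CALL
space_size = prod(CARDINALITIES.values())   # exact integer arithmetic

# The sparsity 1 - occupied_fraction rounds to 1.0 in floating point,
# so the occupied fraction itself is the more informative number.
occupied_fraction = tuples_per_year / space_size

print(f"tuples per year:   {tuples_per_year:.3g}")
print(f"bytes per year:    {bytes_per_year:.3g}")
print(f"size of the space: {space_size:.3g}")
print(f"occupied fraction: {occupied_fraction:.3g}")
```

Because the occupied fraction is on the order of 10^-31, the sparsity cannot be told apart from 1 in double precision, which is why the sketch reports the fraction rather than the difference.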

Solution proposal for exercise 3 on sheet 1

The partially aggregated cube has 100 * 100 * 10 * 12 * 31 * 24 * 60 * ... = 5.4 * ... tuples; it cannot be computed or stored with foreseeable technology.

Solution proposal for exercise 4 on sheet 1

1. Bit vectors of length 3.65 * 10^10 bits = 36.5 * 10^9 bits ~ 5 * 10^9 B = 5 GB uncompressed.
   Bit vectors for TimeOfCall: 137 vectors (10 + 12 + 31 + 24 + 60 distinct values) of 5 GB ~ 685 GB
   Bit vectors for LocCaller: 10^4 vectors of 5 GB ~ 50 TB per coordinate
   → Bit vectors are usable at most for TimeOfCall, but not for the other dimensions.

2. Compound B-Trees: 45 B/compound key + 4 B/fact
   Relation size ~ 2 TB / 8 KB/page = 0.25 * 10^9 pages = 2.5 * 10^8 pages
   Height of B-Tree: at 45 B/key + 4 B/pointer → branching degree ~ 160 for 8 KB pages → height 5
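The index estimates of exercise 4 can be reproduced with the following Python sketch. It assumes one bit per fact tuple and one bit vector per distinct attribute value, and it counts an 8 KB page as 8000 B. The helper `btree_height` is hypothetical and models the height as the smallest number of levels whose fan-out covers all fact tuples, which is one plausible reading of how the height of 5 is obtained. Unrounded, a bit vector takes about 4.56 GB, so the totals come out slightly below the 685 GB and 50 TB that result from rounding to 5 GB first.

```python
# Figures from the solution text; names and the height model are assumptions.
TUPLES = 365 * 10**8          # fact tuples per year (3.65 * 10^10)
PAGE_BYTES = 8000             # an 8 KB page, counted as 8000 B
ENTRY_BYTES = 45 + 4          # compound key plus pointer

# One bit per fact tuple, one vector per distinct attribute value.
bitvector_bytes = TUPLES // 8
time_vectors = 10 + 12 + 31 + 24 + 60   # Year, Month, Day, Hour, Minute
coord_vectors = 10**4                    # a single location coordinate

time_index_bytes = time_vectors * bitvector_bytes
coord_index_bytes = coord_vectors * bitvector_bytes

# Branching degree of a B-Tree page.
fanout = PAGE_BYTES // ENTRY_BYTES


def btree_height(entries: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= entries."""
    height, capacity = 1, fanout
    while capacity < entries:
        capacity *= fanout
        height += 1
    return height


height = btree_height(TUPLES, fanout)

print(f"one bit vector:       {bitvector_bytes / 1e9:.2f} GB")
print(f"TimeOfCall vectors:   {time_vectors} -> {time_index_bytes / 1e9:.0f} GB")
print(f"one coordinate:       {coord_index_bytes / 1e12:.1f} TB")
print(f"branching degree:     {fanout}")
print(f"B-Tree height:        {height}")
```

The height is computed with integer multiplication rather than a floating-point logarithm, which avoids rounding problems near exact powers of the branching degree.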

Solution proposal for exercise 4 on sheet 1, continued

Example Query1:
select Name, Prefix, Number, Year, Month, sum(duration)
from Caller C, CallsFacts F
where C.Name = 'Rudolf Bayer' and C.Prefix = F.Prefix and C.Number = F.Number
group by Year, Month

Time estimate: ~ 10 calls/day = 3650 calls/year; 3650 calls/year * 49 B/call = 178850 B/year; 178850 B / 8000 B/page ~ 23 pages; 23 pages * 10 ms/page ~ 1/4 second with B-Tree index

Example Query2: ... from Callee C ... → B-Tree not usable → relation scan at 10 MB/s → ~ 2 days
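The two time estimates can be traced step by step in Python. This is a rough sketch under the assumptions the slide appears to make: the tuples of one caller lie on consecutive pages of the compound B-Tree, each page access costs 10 ms, and the full relation has about 2 TB. All names are illustrative.

```python
import math

# Figures from the solution text; names are illustrative only.
BYTES_PER_CALL = 49
PAGE_BYTES = 8000
SECONDS_PER_PAGE = 0.010             # 10 ms per page access
RELATION_BYTES = 2 * 10**12          # ~ 2 TB per year
SCAN_BYTES_PER_SECOND = 10 * 10**6   # 10 MB/s sequential scan

# Query1: about 10 calls per day for a single caller.
calls_per_year = 10 * 365
caller_bytes = calls_per_year * BYTES_PER_CALL
caller_pages = math.ceil(caller_bytes / PAGE_BYTES)
query1_seconds = caller_pages * SECONDS_PER_PAGE

# Query2: no usable index, so the whole relation is scanned.
query2_seconds = RELATION_BYTES // SCAN_BYTES_PER_SECOND
query2_days = query2_seconds / 86_400

print(f"Query1: {caller_pages} pages, {query1_seconds:.2f} s")
print(f"Query2: {query2_seconds} s, {query2_days:.1f} days")
```

The gap between the two results comes from the key order alone: the compound key starts with the caller's Prefix and Number, so a restriction on the callee cannot use it.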