
1 Advanced Querying OLAP Data Warehousing

2 Database Applications
Transaction processing
–Online setting
–Supports day-to-day operation of business
Decision support
–Offline setting
–Strategic planning (statistics)

3 Transaction Processing
Transaction processing (example: flight reservations)
–Operational setting: ticket sales
–Up-to-date = critical: do not sell a seat twice
–Simple data: reservation, date, name
–Simple queries: Give flight details of X; List flights to Y

4 Transaction Processing
Database must support
–simple data: tables
–simple queries: select from where …
–consistency & integrity: CRITICAL
–concurrency
Relational databases, Object-Oriented, Object-Relational

5 Decision Support
Decision support (example: flight company)
–Off-line setting: evaluate ROI of flights
–«Historical» data: flights of last year
–Summarized data: # passengers on line L
–Different databases: passengers, fuel costs, maintenance info
–Statistical queries: average % of seats sold/month/destination

6 Data Warehouse
A decision support DB that is maintained separately from the organization’s operational databases.
Why Separate Data Warehouse?
High performance for both systems
–DBMS: tuned for OLTP (access methods, indexing, concurrency control, recovery)
–Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view, consolidation)
Different functions and different data
–Missing data: decision support requires historical data which operational DBs do not typically maintain
–Data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources
–Data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled

7 Three-Tier Architecture
[Figure: Data Sources (operational DBs, other sources) → Extract, Transform, Load, Refresh → Data Storage (Data Warehouse, Data Marts, Metadata, Monitor & Integrator) → Serve → OLAP Engine (OLAP Server, ROLAP Server) → Front-End Tools (Analysis, Query/Reporting, Data Mining)]

8 OLAP
OLAP = OnLine Analytical Processing
–Online = no waiting for answers
OLAP system = system that supports analytical queries that are dimensional in nature.

9 This Lecture
Examples of decision support queries
Data Cubes
–Conceptual data model
–Typical operations
Implementation
–ROLAP vs MOLAP
–Indexing structures
SQL:1999 support for OLAP

10 Examples of Queries
Flight company: evaluate ticket sales
–give total, average, minimal, maximal amount
–per date: week, month, year
–by destination/source port/country/continent
–by ticket type
–by # of connections
–…

11 Characteristics
One special attribute: amount → measure
Other attributes: select relevant regions → dimensions
Different levels of generality (month, year, …) → hierarchies
Measure data is summarized: sum, min, max, average → aggregations

12 Supermarket example
Evaluate the sales of products
–Product cost in $
–Customer: ID, city, state, country
–Store: chain, size, location
–Product: brand, type, …
–…
What are the measure and dimensional attributes, where are the hierarchies?
[Slide annotations: measure, dim., hierarchies]

13 Why dimensions?
Multidimensional view on the data
[Figure: cube with axes customer, store, product; cells hold cost in $]

14 Cross Tabulation
Cross-tabulations are highly useful
–Sales of clothes, June → August ’06

          Blue   Red   Orange   Total
June        51    25      158     234
July        58    20      120     198
August      65    22       51     138
Total      174    67      329     570

Columns: Product: color. Rows: Date: month, June → August 2006.

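15 Data cubes
Extension of Cross-Tables to multiple dimensions
Conceptual notion

          Blue   Red   Orange   Total
June        51    25      158     234
July        58    20      120     198
August      65    22       51     138
Total      174    67      329     570

Labels in the figure:
–Dimensions: the row and column headers
–Data Points / 1st level of aggregation: the inner cells
–Aggregated w.r.t. X-dim, Aggregated w.r.t. Y-dim: the Total row and the Total column
–Aggregated w.r.t. X and Y: the grand total (570)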
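The totals in the cross-table can be recomputed from the nine inner cells. The following is a minimal Python sketch of that idea; the names `sales` and `cross_tab` are illustrative and do not come from the slides.

```python
from collections import defaultdict

# Base cells from the clothes-sales cross-tab: (month, color) -> units sold.
sales = {
    ("June", "Blue"): 51, ("June", "Red"): 25, ("June", "Orange"): 158,
    ("July", "Blue"): 58, ("July", "Red"): 20, ("July", "Orange"): 120,
    ("August", "Blue"): 65, ("August", "Red"): 22, ("August", "Orange"): 51,
}

def cross_tab(cells):
    """Add the aggregated cells, using "Total" as the marker value."""
    cube = defaultdict(int)
    for (month, color), amount in cells.items():
        cube[(month, color)] += amount        # data point
        cube[(month, "Total")] += amount      # aggregated over color
        cube[("Total", color)] += amount      # aggregated over month
        cube[("Total", "Total")] += amount    # aggregated over both
    return dict(cube)

cube = cross_tab(sales)
print(cube[("June", "Total")], cube[("Total", "Orange")], cube[("Total", "Total")])
# prints: 234 329 570
```

Each base cell contributes to four cells of the result, which is the two-dimensional case of the cube construction discussed on the later slides.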

16 Data Cubes
[Figure: 3-D data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum) and Country (Ireland, France, Germany, sum)]

17 Data Cubes
Base cuboid = the n-dimensional cuboid over all n dimensions (no aggregation)
The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid
The lattice of cuboids forms a data cube

18 Lattice of Cuboids
0-D (apex): all
1-D: product; date; country
2-D: product, date; product, country; date, country
3-D (base): product, date, country
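The lattice has one cuboid per subset of the dimensions, so n dimensions give 2^n cuboids. A small Python sketch that enumerates them level by level (the function name `cuboids` is an illustrative choice):

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every group-by set, from the apex () up to the base cuboid."""
    return [combo
            for k in range(len(dimensions) + 1)
            for combo in combinations(dimensions, k)]

lattice = cuboids(("product", "date", "country"))
for c in lattice:
    print(f"{len(c)}-D: {', '.join(c) if c else 'all'}")
```

For the three dimensions of the slide this lists 8 cuboids: the apex, three 1-D cuboids, three 2-D cuboids and the base cuboid.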

19 Operations with Data Cubes
Scenario: before starting the analysis task:
–what data?
   select a few relevant dimensions
   define hierarchy
   aggregation functions of interest
–Pre-materialize
   load data
   compute counts, max, min, avg, … beforehand

20 Operations with Data Cubes
What operations can you think of that an analyst might find useful? (e.g., store)

21 Operations with Data Cubes
What operations can you think of that an analyst might find useful? (e.g., store)
–only look at stores in the Netherlands
–look at cities instead of individual stores
–look at the cross-table for product-date
–restrict analysis to 2006, product O1
–go back to a finer granularity at the store level

22 Roll-Up
Move in one dimension from a lower granularity to a higher one
–store → city
–cities → country
–product → product type

23 Drill-down
Move in one dimension from a higher granularity to a lower one
–city → store
–country → cities
–product type → product
Drill-through:
–go back to the original, individual data records
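Roll-up along a hierarchy can be sketched as replacing each member by its parent and summing the cells that collide. The store names, hierarchy mappings and sales figures below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical hierarchy for the store dimension: store -> city -> country.
STORE_TO_CITY = {"s1": "Eindhoven", "s2": "Eindhoven", "s3": "Antwerp"}
CITY_TO_COUNTRY = {"Eindhoven": "Netherlands", "Antwerp": "Belgium"}

# Hypothetical cells at the finest granularity: (store, product) -> sales.
by_store = {("s1", "p1"): 10, ("s2", "p1"): 5, ("s3", "p1"): 7, ("s1", "p2"): 3}

def roll_up(cells, mapping):
    """Replace the first coordinate by its parent and sum colliding cells."""
    out = defaultdict(int)
    for (member, product), amount in cells.items():
        out[(mapping[member], product)] += amount
    return dict(out)

by_city = roll_up(by_store, STORE_TO_CITY)        # store -> city
by_country = roll_up(by_city, CITY_TO_COUNTRY)    # city -> country
print(by_city)
print(by_country)
```

Drill-down is not the inverse computation: the finer cells cannot be derived from `by_city`, so in this sketch drilling down simply means going back to `by_store`, which therefore has to be kept.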

24 Pivoting
Change the dimensions that are “displayed”; select a cross-tab.
–look at the cross-table for product-date
–display cross-table for date-customer

25 Slice & dice
Select a part of the cube by restricting one or more dimensions
–restrict analysis to “city = Eindhoven”
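Restricting dimensions amounts to filtering cells on their coordinates. A minimal Python sketch with made-up cells; the function name `dice` and the dimension order in `DIMS` are assumptions for the example.

```python
# Hypothetical cells: (city, product, year) -> sales.
cells = {
    ("Eindhoven", "O1", 2006): 12,
    ("Eindhoven", "O2", 2006): 4,
    ("Eindhoven", "O1", 2005): 9,
    ("Antwerp", "O1", 2006): 7,
}
DIMS = ("city", "product", "year")

def dice(cells, **restrictions):
    """Keep only cells whose coordinates match every given restriction."""
    wanted = {DIMS.index(dim): value for dim, value in restrictions.items()}
    return {key: amount for key, amount in cells.items()
            if all(key[i] == value for i, value in wanted.items())}

eindhoven = dice(cells, city="Eindhoven")        # one dimension fixed (a slice)
o1_2006 = dice(cells, product="O1", year=2006)   # several dimensions restricted
print(eindhoven)
print(o1_2006)
```

The second call corresponds to the earlier example "restrict analysis to 2006, product O1".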

26 Summary of Concepts
Cube: Multidimensional view on data
–dimensional attributes
–measure attribute
Operations:
–roll-up/drill-down
–pivoting
–slice and dice

27 Implementation
To make query answering more efficient: consolidate (materialize) aggregations
Obvious implementation: multidimensional array.
–Fast lookup: cell(prod. p, date d, prom. pr): look up index of p, index of d, index of pr:
   index = (p × D × PR) + (d × PR) + pr
   where D and PR are the number of dates and the number of promotions
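The lookup formula is the usual row-major layout of a three-dimensional array in one flat array. A short Python sketch, assuming zero-based indexes and hypothetical dimension sizes:

```python
def cell_index(p, d, pr, D, PR):
    """Row-major position of cell (p, d, pr); D and PR are dimension sizes."""
    return (p * D * PR) + (d * PR) + pr

# Hypothetical sizes: 4 products, 3 dates, 2 promotions -> 24 cells.
P, D, PR = 4, 3, 2
cube = [0] * (P * D * PR)
cube[cell_index(2, 1, 0, D, PR)] += 5   # record 5 units for (p=2, d=1, pr=0)
print(cell_index(2, 1, 0, D, PR))       # prints: 14
```

Every combination of indexes maps to a distinct position, so a lookup is a single array access once the three indexes are known.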

28 Implementation
Multidimensional array
–obvious problem: sparse data
–can easily be solved, though. Example: a binary search tree or a hash table keyed on the index.
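One way to read the hash-table suggestion: keep the flat index as the key and store only the cells that actually occur. The class below is an illustrative sketch, not an implementation taken from the lecture.

```python
def cell_index(p, d, pr, D, PR):
    """Row-major position of cell (p, d, pr); D and PR are dimension sizes."""
    return (p * D * PR) + (d * PR) + pr

class SparseCube:
    """Stores only non-empty cells in a hash table keyed on the flat index."""

    def __init__(self, D, PR):
        self.D, self.PR = D, PR
        self.cells = {}

    def add(self, p, d, pr, amount):
        key = cell_index(p, d, pr, self.D, self.PR)
        self.cells[key] = self.cells.get(key, 0) + amount

    def get(self, p, d, pr):
        # Absent cells are treated as empty (0) without occupying memory.
        return self.cells.get(cell_index(p, d, pr, self.D, self.PR), 0)

sparse = SparseCube(D=365, PR=10)   # hypothetical dimension sizes
sparse.add(1000, 200, 3, 5)
sparse.add(1000, 200, 3, 2)
print(sparse.get(1000, 200, 3), sparse.get(0, 0, 0), len(sparse.cells))
# prints: 7 0 1
```

Memory use now grows with the number of non-empty cells rather than with the product of the dimension sizes.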

29 Implementation
However: very quickly people were confronted with the Data Explosion Problem
Consolidating the summaries blows up the data enormously!
The reasons are often misunderstood.

30 Data Explosion Problem
Why? Suppose:
–n dimensions, every dimension has d values
–d^n possible tuples.
–Number of cells in the cube: (d+1)^n (each dimension gets one extra value, “all”)
–So, this is not the problem

31 Data Explosion Problem
Why? Suppose
–n dimensions, every dimension has d values
–every dimension has a hierarchy
–most extreme case: binary tree → 2d possibilities/dimension

32 Data Explosion Problem
Why? Suppose
–n dimensions, every dimension has d values
–every dimension has a hierarchy
–most extreme case: binary tree → 2d possibilities/dimension → (2d)^n = 2^n × d^n cells
Only partial explanation (factor 2^n comes from an extremely pathological case)

33 Data Explosion Problem
Why?
–The problem is that most data is not dense, but sparse.
–Hence, not all d^n combinations are possible.
Example: 10 dimensions with 10 values
–10^10 = 10 000 000 000 possibilities
Suppose «only» 1 000 000 are present

34 Data Explosion Problem
Example: 10 dimensions with 10 values
–10^10 = 10 000 000 000 possibilities
Suppose «only» 1 000 000 are present
Every tuple increases the count of 2^10 = 1 024 cells!
With hierarchies: effect even worse!
If every hierarchy has 5 items: 5^10 = 9 765 625 cells per tuple!
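The effect can be observed on a scaled-down example. The sketch below counts the non-empty cells of a cube without hierarchies, where every coordinate of a tuple is either kept or replaced by "all"; the sizes (6 dimensions, 10 values, 1 000 random tuples) are arbitrary choices for illustration and are much smaller than the slide's example.

```python
from itertools import product
import random

def materialized_cells(tuples):
    """Non-empty cube cells when every coordinate may also be 'all'."""
    cells = set()
    for t in tuples:
        # Each coordinate is either kept or replaced by "all": 2^n cells.
        for mask in product((False, True), repeat=len(t)):
            cells.add(tuple("all" if m else v for m, v in zip(mask, t)))
    return cells

random.seed(0)
n, d = 6, 10                                  # scaled-down, hypothetical sizes
base = {tuple(random.randrange(d) for _ in range(n)) for _ in range(1000)}
cube = materialized_cells(base)
print(len(base), "base tuples")
print(len(cube), "non-empty cells in the cube")
print((d + 1) ** n, "cells in a dense cube")
```

A single tuple alone yields 2^n cells. With sparse data, few of these cells are shared between tuples, so the materialized cube is many times larger than the base data even though it stays far below the dense bound (d+1)^n.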

35 View Selection Problem
Suffices to precompute some aggregates, and compute others on demand.
–Can compute aggregate on (item-name, color) from an aggregate on (item-name, color, size)
–For all but a few “non-decomposable” aggregates such as median
Several optimizations for computing multiple aggregates
–Compute aggregate on (item-name, color) from an aggregate on (item-name, color, size)
–Compute aggregates on (item-name, color, size), (item-name, color) and (item-name) in a single DB sort
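A sketch of why a coarser aggregate can be derived from a finer one for sums but not for medians. The rows and the helper `sum_by` are invented for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical base rows: (item_name, color, size, quantity).
rows = [("skirt", "blue", "S", 2), ("skirt", "blue", "M", 5),
        ("skirt", "red", "S", 1), ("shirt", "blue", "M", 4),
        ("skirt", "blue", "S", 3)]

def sum_by(cells, keep):
    """Sum the values of `cells`, keeping only the first `keep` key fields."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:keep]] += value
    return dict(out)

by_item_color_size = defaultdict(int)
for item, color, size, qty in rows:
    by_item_color_size[(item, color, size)] += qty

# Sums decompose: coarser aggregates come from the finer one, not from rows.
by_item_color = sum_by(by_item_color_size, 2)
by_item = sum_by(by_item_color, 1)

# Medians do not decompose: combining per-size medians gives a wrong answer.
blue_skirts = {}
for item, color, size, qty in rows:
    if (item, color) == ("skirt", "blue"):
        blue_skirts.setdefault(size, []).append(qty)
true_median = median(q for qs in blue_skirts.values() for q in qs)
median_of_medians = median(median(qs) for qs in blue_skirts.values())
print(by_item_color, by_item)
print(true_median, median_of_medians)   # prints: 3 3.75
```

For the median, the aggregate on (item-name, color) has to be computed from the underlying rows, which is what makes it "non-decomposable".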

36 View Selection Problem
[Figure: lattice of cuboids over product, date, country, as on slide 18]

37 View Selection Problem
[Figure: lattice of cuboids over product, date, country, as on slide 18]
Which views to select: hard research problem!

38 Implementation
Nowadays systems can be divided into three categories:
–ROLAP (Relational OLAP): OLAP supported on top of a relational database
–MOLAP (Multi-Dimensional OLAP): use of special multi-dimensional data structures
–HOLAP (Hybrid OLAP): combination of the previous two

39 ROLAP
Cubes can easily be represented in relational tables: special value “all”

Month  Prod.  Cust.      Price
Jan    p1     c1            10
Jan    p2     c1             8
Jan    p1     c2            10
Feb    p1     c1             9
…
all    p1     c1           102
Jan    all    c1            18
Jan    p1     all        1 230
all    all    c1         4 235
…
all    all    all    1 253 458
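The relational representation can be sketched by emitting, for every fact row, one row per combination of kept and "all" coordinates and summing the prices. Only the four base rows visible on the slide are used here, so apart from (Jan, all, c1) the totals differ from the slide, whose table elides further rows.

```python
from collections import defaultdict
from itertools import product

# The four base rows shown on the slide: (month, product, customer, price).
facts = [("Jan", "p1", "c1", 10), ("Jan", "p2", "c1", 8),
         ("Jan", "p1", "c2", 10), ("Feb", "p1", "c1", 9)]

def rolap_cube(rows):
    """Build the cube table, using the special value "all" for aggregates."""
    table = defaultdict(int)
    for *dims, price in rows:
        for mask in product((False, True), repeat=len(dims)):
            key = tuple("all" if m else v for m, v in zip(mask, dims))
            table[key] += price
    return dict(table)

cube = rolap_cube(facts)
print(cube[("Jan", "all", "c1")])   # prints: 18, as in the slide's table
for key in sorted(cube):
    print(*key, cube[key])
```

Each key of `cube` corresponds to one row of the relational table, with the summed price as the fourth column.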

40 ROLAP
Typical database schema:
–star schema
   fact table is central
   links to dimensional tables
–Extensions:
   snowflake schema: dimensions have hierarchy/extra information attached
   star constellation (fact constellation): multiple star schemas sharing dimensions

41 Example of a Star Schema
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, CategoryDescription, UnitPrice
Date: DateKey, Date
City: CityName, State, Country
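A typical query on a star schema joins the fact table with the dimension tables it needs and groups by dimension attributes. The sketch below uses Python's `sqlite3` module with a cut-down version of the schema (only the City and Product dimensions, and `TotalPrice` written without a space); the inserted rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE city    (CityName TEXT PRIMARY KEY, State TEXT, Country TEXT);
CREATE TABLE product (ProductNO INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE fact    (OrderNO INTEGER,
                      ProdNo INTEGER REFERENCES product(ProductNO),
                      CityName TEXT REFERENCES city(CityName),
                      Quantity INTEGER, TotalPrice REAL);
INSERT INTO city VALUES ('Eindhoven', 'Noord-Brabant', 'Netherlands'),
                        ('Antwerp', 'Antwerpen', 'Belgium');
INSERT INTO product VALUES (1, 'TV', 'Electronics'), (2, 'PC', 'Electronics');
INSERT INTO fact VALUES (100, 1, 'Eindhoven', 2, 800.0),
                        (101, 2, 'Eindhoven', 1, 600.0),
                        (102, 1, 'Antwerp', 1, 400.0);
""")

# Total price per country and product: fact table joined with two dimensions.
rows = con.execute("""
    SELECT c.Country, p.ProdName, SUM(f.TotalPrice)
    FROM fact f
    JOIN city c    ON c.CityName  = f.CityName
    JOIN product p ON p.ProductNO = f.ProdNo
    GROUP BY c.Country, p.ProdName
    ORDER BY c.Country, p.ProdName
""").fetchall()
for row in rows:
    print(row)
```

Grouping by `c.Country` instead of the city stored in the fact table is a roll-up along the City dimension, carried out by the join.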

42 Example of a Snowflake Schema
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
Year: Year
City: CityName, State, Country
State: StateName, Country

43 Example of Fact Constellation
Multiple fact tables share dimension tables
Sales Fact Table: Time_key, Item_key, Branch_key, Location_key; measures: Unit_sold, Euros_sold, Avg_sales
Shipping Fact Table: Time_key, Item_key, shipper_key, from_location, to_location; measures: Euros_sold, unit_shipped
Time: time_key, day, day_of_the_week, month, quarter, year
Item: item_key, item_name, brand, type, supplier_key
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city, Province/street, country
shipper: shipper_key, shipper_name, location_key, shipper_type

44 SQL:1999 support for OLAP
See other set of slides

