Download presentation
Presentation is loading. Please wait.
Published byJoella Conley Modified over 9 years ago
1
OLAP On Line Analytic Processing
2
OLTP On Line Transaction Processing –support for ‘real-time’ processing of orders, bookings, sales –typically access to single rows of tables –current data only - current products, available flights –supports Operational level of organisational hierarchy
3
Analysis Support for –tactical (e.g. stock ordering in view of forecast demand –strategic (e.g. where to open a new store) Need to ‘understand the data’ - ‘Business Intelligence’
4
Requirements Operations –Summation –Statistical analysis - variation for standard –Ranking and percentages –Comparison over time - last year-to-date ‘Static data picture’ to avoid ‘inconsistent reads’ during analysis production, and comparisons between analyses Addition data - forecasting models, external data, structural data (location of tills, of products on shelves)
5
Analysis pre database 50% of code used to create complex bulk reports on how the company was performing inflexible, costly, untimely ‘raw data’ too large to retain, so aggregation required - restricts year-to year comparison to level of prior aggregation
6
OLAP/Data warehouse Specialised tools, data structures to support analysis ‘on line’ i.e. on demand Data will be a snap-shot to ensure consistency Hold base ‘fact’ table - raw basic facts - credit card transaction, item sale, booking + multiple analytic dimensions - –time, product, customer, store
7
SuperMarket ‘Basket’ data Fact - a single line on a till receipt –basket no –date –customer no –product code –till no –quantity –price Dimensions –Customer, Product, Time, Till/Store
8
Star Schema
9
Kinds of Attributes Measures –continuously varying –interval,ratio scales –able to be summed, ranked etc Dimensions –nominal scales - no ordering –may be classified in multiple ways Date/Time –interval scale - i.e ordered only –can be treated as dimension and classified
10
Product dimension Product e.g. size 9 Bata slippers –Product category - shoes Product group - clothing –Size e.g. shoe9 –Range e.g Bata –Product class e.g. cheap
11
Time Dimension Date e.g. 11 Dec 2002 –Day of Week e.g. Wednesday –Week e.g. 49 –Monthe.g. December Qtr e.g.3 –Year e.g. 2002 –Promotion period e.g. Pre-Christmas Sales –Season e.g. Autumn
12
Snowflake schema
13
Query Processing Consider a typical analytic query –How do sales of clothing vary by day of week in stores in the SW region? –Select product.name,dow.name,sum(qty*price) –from sales, product, productCategory, productGroup,date,DOW –where ( the 5 join conditions) –and (productGroup.name=‘Clothing’) –group by DOW.name,product.name –order by DOW.name,product.name
14
Difficulties Fact table is huge - must be compressed as much as possible by reducing field sizes etc Since dimension data is smaller and more stable, OK to denormalise to reduce joins –date - add dow.id,month.id,year,fiscalYear,… –Balance required –Denormalised dimensions result in ‘Starflake schema’
15
Aggregation Simple case : –sales - date,product,cust,store,value –How many aggregations are possible –Store(5) Product(10) Customer(30)
16
Aggregation operations Cube - all possible aggregations –8 for 3 dimensions Roll-up - aggregate in order –e.g. Product,Store,Cust –4 for 3 dimensions Slice and Dice –Takes parts or slices of a cube of aggregations Drill down –Given an aggregation, expand an aggregated dimension e.g. Expand clothing sales analysis by City
17
Data Mining OLAP requires a priori assumptions about the categories of interest But a useful category may be ‘hidden’ - can it be discovered a posteriori ? –e.g. identify high risk motor insurance policies by attributes of policy - gender, age, type of vehicle, job, postcode Rule induction (machine learning) methods can be used
18
Finding relationships –looking for deviations from normal behaviour - e.g. to identify fraudulent transactions in a credit card company –looking for deviations from average e.g. Non- random combinations of goods in a basket - classic example is beer and nappies Requires heavy aggregations, statistical selection, rule induction
19
ETL Extract - Transform - Load OLAP databases are often said to be read only - but all need periodic updating with new data extracted from sources, validation, re-organising and load, whilst maintaining aggregations and index
20
Extract –new or changed data from the OLTP –changed product and structural data –external data External data may be in legacy systems, remote databases, flat files
21
Transform Filtering out bad transaction data Validating against database
22
Load Load new facts Re-de-normalising if product dimensions change Re-aggregation
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.