Download presentation
Presentation is loading. Please wait.
Published byDora Blanche Copeland Modified over 9 years ago
1
Bogdan Shishedjiev Data Analysis1 Data Analysis OLTP and OLAP Data Warehouse SQL for Data Analysis Data Mining
2
Bogdan Shishedjiev Data Analysis 2 Data Processing Data processing types –OLTP (On Line Transaction Processing) –OLAP (On-line Analytical Processing ) Database types –Transactional Numerous users Dynamic (flickering) Always maintaining current state Critical (very loaded) –Warehouses Aa few users (analyzers) Relatively stable Maintaining the data history (all states in the time) Not loaded
3
Bogdan Shishedjiev Data Analysis 3 Architecture
4
Bogdan Shishedjiev Data Analysis 4 Architecture Source compnents –Filter – separate and assure the coherency of data to be exported –Export – do the transfer of data portions in precise moments of time. Warehouse components –Loader – Initial loading and preparing the warehouse. –Refresh – loads the portions –достъп –data mining –Export – to other warehouses. This creates an hierarchy of warehouses
5
Bogdan Shishedjiev Data Analysis 5 Relational Schemes Star
6
Bogdan Shishedjiev Data Analysis 6 Relational Schemes Snowflake
7
Bogdan Shishedjiev Data Analysis 7 Data Warehouse Design Stages –Choose the activity processes to model –Choose the granularity of the activity procesus –Choose the dimensions that can be applied to every record of the fact table. –Choose the facts that must be recorded in the fact table
8
Bogdan Shishedjiev Data Analysis 8 Data Warehouse Design Fact types – the most valuable are numerical continuous values –Additive – they can be added along all dimensions (Money amounts) –Semi-additive – they can be added along some of dimensions (Precipitations, Product quantities) –Non-additive – they cannot be added along any dimension (Wind speed, wind direction)
9
Bogdan Shishedjiev Data Analysis 9 Data Warehouse Design Recommendations –Use continuous additive numerical values –The fact table is highly normalized –Don’t normalize the dimensions. The gain is < 1% –Design thoroughly the dimension attributes. Most often they are textual and discrete. They are used as headings and constraint sources in the answers to users
10
Bogdan Shishedjiev Data Analysis 10 Example – Hypermarket Chain time_key day_of_ week day_no_in_month day_no_overall week_no_ln_year week_no_overall month month_no_overall quarter fiscal_period holiday_flag weekday_flag last_day_in_month_flag season event Dimension time promotion_key promotion_name price__reduction_ type ad_type dlsplay_ type coupon_type ad_media_name display_provider promo_cost promo_begin_date promo_end-date..and others Promotion product_key SKU(stock keeping units )_description SKU_number package_size brand subcategory category departement package_ type diet_type weight weight_unit_of_mesure units_per_retail_case units_per_shipping_case cases_per_pallet shelf_width shelf_height shelf_depth..and others Dimension product stoie_key store_ name store_number store_street_address store_city store_county store_state store_zip store__manager store_phone store_FАX floor_plan_ type photo_processing_type finance_services_type first_opened_date last_remodel_date store_surface grocery_surface frozen-surface..and others Dimension shop Time_key product_key Store_key promotionkey dollar_sales units_sales dollar_cost customer-count Facts - Sales
11
Bogdan Shishedjiev Data Analysis 11 Hypermarket Chain Fact table –Granularity – each sell of a product (SKU – Stock Keeping Unit) –Values – Total cost (additive), SKU quantity (semi- additive), price (non-additive), customer count (non additive) Dimensions –Time –Product –Store –Promotion
12
Bogdan Shishedjiev Data Analysis 12 Hypermarket Chain Calculation of disk space needed –Dimension time : 2 years x 365 days = 730 days –Dimension shop : 300 shops, everyday records –Dimension product : 30.000 products in each shop; 3000 are sold every day in each shop. –Dimension promotion : An article can participate in only one promotion in a shop during one day. –Elementary fact records 300 x 730 x 3000 x 1 = 657.10 6 records –Key field number 4; Value field number 4 ; Total number osf fields =8 –Fact table size - 657.10 6 x 8 fields x 4B = 21 GB
13
Bogdan Shishedjiev Data Analysis 13 Data Operations for Data Analysis General form of a SQL statement select D1.C1,... Dn.Cn, Aggr1(F,Cl),…, Aggrn(F,Cn) from Fact as F, Dimension1 as D1,... DimensionN as Dn where join-condition (F, D1) and... and join-condition (F, Dn) and selection-condition group by D1.C1,... Dn.Cn order by D1.C1,... Dn.C
14
Bogdan Shishedjiev Data Analysis 14 Data Operations for Data Analysis Example select Time.Month, Product.Name, sum(Qty) from Sale, Time, Product, Promotion where Sale.TimeCode = Time.TimeCode and Sale.ProductCode = Product.ProductCode and Sale.PromoCode = Promotion.PromoCode and (Product. Name = ' Pasta' or Product.Name = 'Oil') and Time.Month between 'Feb' and 'Apr' and Promotion.Name = 'SuperSaver' group by Time.Month, Product.Name order by Time.Month, Product.Name pivot Time.Month FebMarApr Oil5K5K7K Pasta45K50K51K
15
Bogdan Shishedjiev Data Analysis 15 Data Cube The cube is used to represent data along some measure of interest. Although called a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional. Each dimension represents some attribute in the database and the cells in the data cube represent the measure of interest.
16
Bogdan Shishedjiev Data Analysis 16 Data Cube Data cube representation CombinationCount {P1, Calgary, Vance}2 {P2, Calgary, Vance}4 {P3, Calgary, Vance}1 {P1, Toronto, Vance}5 {P3, Toronto, Vance}8 {P5, Toronto, Vance}2 {P5, Montreal, Vance}5 {P1, Vancouver, Bob}3 {P3, Vancouver, Bob}5 {P5, Vancouver, Bob}1 {P1, Montreal, Bob}3 {P3, Montreal, Bob}8 {P4, Montreal, Bob}7 {P5, Montreal, Bob}3 {P2, Vancouver, Richard}11 CombinationCount {P3, Vancouver, Richard}9 {P4, Vancouver, Richard}2 {P5, Vancouver, Richard}9 {P1, Calgary, Richard}2 {P2, Calgary, Richard}1 {P3, Calgary, Richard}4 {P2, Calgary, Allison}2 {P3, Calgary, Allison}1 {P1, Toronto, Allison}2 {P2, Toronto, Allison}3 {P3, Toronto, Allison}6 {P4, Toronto, Allison}2
17
Bogdan Shishedjiev Data Analysis 17 Data Cube Totals - the value ANY or ALL or NULL
18
Bogdan Shishedjiev Data Analysis 18 Data Cube Drill down – adding a dimension for more detailed results Time. MonthProduct.Namesum(Qty) FebPasta48K MarPasta50K AprPasta51K TIme.MonihProduct.NameZonesum(Qty) FebPastaNorth18K FebPastaCentre18K FebPastaSouth12K MarPastaNorth18K MarPastaCentre18K MarPastaSouth14K AprPastaNorth18K AprPastaCentre17K AprPastaSouth16K
19
Bogdan Shishedjiev Data Analysis 19 Data Cube Roll-up - removing dimension TIme.MonihProduct.NameZonesum(Qty) FebPastaNorth18K FebPastaCentre18K FebPastaSouth12K MarPastaNorth18K MarPastaCentre18K MarPastaSouth14K AprPastaNorth18K AprPastaCentre17K AprPastaSouth16K Product.NameZonesum(Qty) PastaNorth54K PastaCentre53K PastaSouth42K
20
Bogdan Shishedjiev Data Analysis 20 Data Cube TIme.MonihProduct.NameZonesum(Qty) FebPastaNorth18K FebPastaCentre18K FebPastaSouth12K MarPastaNorth18K MarPastaCentre18K MarPastaSouth14K AprPastaNorth18K AprPastaCentre17K AprPastaSouth16K ALLPastaNorth54K ALLPastaCentre53K ALLPastaSouth42K FebPastaALL48K MarPastaALL50K AprPastaALL51K ALLPastaALL149K ALL 149K The whole data cube
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.