Data Warehousing COMP3017 Advanced Databases Dr Nicholas Gibbins – 2012-2013.

Presentation transcript:


Processing Styles – OLTP
On-Line Transaction Processing
– Traditional workloads, ‘bread and butter’ processing
– Volumes of data, transactions grow, networks getting larger

Processing Styles – OLAP
On-Line Analytical Processing
– includes the use of data warehouses
– multidimensional databases
– data analysis

Online Analytical Processing
OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models
– Exegetical (exegesis: critical explanation): how did we get to where we are?
– Contemplative: asking ‘what if?’ questions. How does the outcome change if we vary the parameters?
– Formulaic: which parameters must be varied in order to achieve a given outcome?

12 Rules for OLAP
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multi-user support
9. Unrestricted cross-dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels

Data Mining
Data mining is the process of discovering hidden patterns and relations in large databases using a variety of advanced analytical techniques.
Data mining attempts to use the computer to discover relationships that can be used to make predictions.
Data mining tools often find unsuspected relationships in data that other techniques will overlook.

Data Mining Approaches
– Rule-based analysis
– Neural networks
– Fuzzy logic
– K-nearest-neighbour
– Genetic algorithms
– Advanced visualisation
– Combination of any of the above

The Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making
– Subject-oriented: the data is organised according to subject instead of application and contains only the information necessary for ‘decision support’ processing.
– Integrated: data encoding is made uniform (e.g. sex = f or m, 1 or 2, b or g; it needs to be all the same in the warehouse). Data naming is made consistent.
– Time-variant: data is collected over time and can then be used for comparisons, trends and forecasting.
– Non-volatile: the data is not updated or changed once in the data warehouse, but is simply loaded and then accessed. The data warehouse is held quite separately from the operational database, which supports OLTP.
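The ‘integrated’ property can be illustrated with a minimal sketch of the cleansing step at load time. Everything here is an assumption made for illustration: the function names, the source field names, and in particular which digit or letter corresponds to which value (the slides only say that the codes differ between sources).

```python
# Hypothetical source encodings for one attribute. The pairing of codes
# (e.g. that "1" means male) is an assumption, not taken from the slides.
SEX_CODES = {
    "f": "F", "m": "M",   # source A: letters
    "2": "F", "1": "M",   # source B: digits
    "g": "F", "b": "M",   # source C: girl / boy
}

def normalise_sex(raw):
    """Map a source-specific code to the single warehouse encoding."""
    key = str(raw).strip().lower()
    if key not in SEX_CODES:
        raise ValueError(f"unknown sex code: {raw!r}")
    return SEX_CODES[key]

def load_record(record, source_field):
    """Return a cleaned copy with a consistent field name and encoding."""
    cleaned = {k: v for k, v in record.items() if k != source_field}
    cleaned["sex"] = normalise_sex(record[source_field])
    return cleaned

# Three sources that name and encode the same attribute differently.
rows = [
    load_record({"id": 1, "gender": "f"}, "gender"),
    load_record({"id": 2, "sex_code": 1}, "sex_code"),
    load_record({"id": 3, "Sex": "G"}, "Sex"),
]
```

After loading, every row carries the same field name (`sex`) and the same two codes, which is what makes cross-source comparison possible in the warehouse.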

Why a Separate Data Warehouse?
Performance
– Operational databases are optimised to support known transactions and workloads
– Special data organisation, access methods and implementation methods are needed
– Complex OLAP queries would degrade performance for operational transactions

Why a Separate Data Warehouse?
Missing data
– Decision support requires historical data, which operational databases do not typically maintain
Data consolidation
– Decision support requires consolidation (aggregation, summarisation) of data from many heterogeneous sources, including operational databases and external sources
Data quality
– Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled

Extracting Data
[Figure: extraction from OPERATIONAL data into the DATA WAREHOUSE. Labels: Car, Policy, Claim, Customers, Liability, Risk; operations: insert, change, replace; consumers: EIS, DSS, Analysis, etc.]
– Data is cleansed
– Data is restructured

The Data Warehouse
A Data Warehouse may be realised:
– via a front end to existing databases and files
– in a fresh relational database
– in a multidimensional database (MDDB)
– in a proprietary database format
– using a mixture of the above

The Data Warehouse
Data may be accessed in various ways:
– Decision Support Systems (DSS)
– Executive Information Systems (EIS)
– Data Mining
– On-Line Analytical Processing

Data Marts
A data mart focuses on
– only one subject area, or
– only one group of users
An organisation can have
– one enterprise data warehouse
– many data marts
Data marts do not contain operational data
Data marts are more easily understood and navigated

Multidimensional Analysis
Need to examine data in various ways
Produce views of multidimensional data for users:
– Slice
– Dice
– Pivot
– Drill down
– Roll up

Multidimensional Analysis – Slice
[Figure: train performance cube with 3 dimensions: Operators, Performance, Time]
Example slices:
– One operator's performance over time
– Overall performance on a particular day
– Overall performance for one criterion over time

Multidimensional Analysis – Dice
[Figure: data cube with dimensions Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (N, W, S) and Month]

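Multidimensional Analysis – Pivot
[Figure: data cube with dimensions Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (N, W, S) and Month, shown with a second view whose axes are labelled Month and Region]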

Multidimensional Analysis – Drill Down
[Figure: data cube with dimensions Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (N, W, S) and Month, with Product broken down by brand]

Multidimensional Analysis – Roll Up
[Figure: data cube with Product rolled up to categories (Beverages, Toiletries), Region (N, W, S) and Month]
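The five operations can be sketched on a toy cube held as a Python dictionary. This is only an illustration: the cell values are invented, the function names are hypothetical, and the mapping of products to the Beverages and Toiletries categories is assumed from the figure labels.

```python
from collections import defaultdict

# Cells of a tiny sales cube: (product, region, month) -> units.
# All figures are invented for illustration.
cube = {
    ("Juice", "N", "Jan"): 10, ("Juice", "S", "Jan"): 4,
    ("Cola", "N", "Jan"): 7, ("Cola", "N", "Feb"): 3,
    ("Soap", "W", "Jan"): 5, ("Soap", "S", "Feb"): 6,
}
DIMS = ("product", "region", "month")
# Assumed product hierarchy used for roll-up.
CATEGORY = {"Juice": "Beverages", "Cola": "Beverages", "Soap": "Toiletries"}

def slice_(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}

def dice(cells, **allowed):
    """Dice: keep cells whose coordinates fall inside the allowed sets."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}

def roll_up(cells, dim, mapping):
    """Roll up: aggregate one dimension to a coarser level."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for k, v in cells.items():
        out[k[:i] + (mapping[k[i]],) + k[i + 1:]] += v
    return dict(out)

def pivot(cells, row_dim, col_dim):
    """Pivot: sum into a 2-D view keyed by (row value, column value)."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    out = defaultdict(int)
    for k, v in cells.items():
        out[(k[r], k[c])] += v
    return dict(out)

feb = slice_(cube, "month", "Feb")
north_drinks = dice(cube, product={"Juice", "Cola"}, region={"N"})
by_category = roll_up(cube, "product", CATEGORY)
region_by_month = pivot(cube, "region", "month")
month_by_region = pivot(cube, "month", "region")  # same data, axes rotated
```

Drill down is the inverse of roll up: starting from `by_category`, it returns to the finer-grained cells in `cube`, which is only possible because the detailed data has been kept.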

Internal Aspects
Schemas
– Star schema
– Snowflake schema
– Fact constellation schema
Aggregated data
Specialised indexes
– Bit map indexes (see lecture on multidimensional indexes)
– Join indexes
Specialised join methods
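As a rough illustration of the bit map index idea, the sketch below keeps one bitmap per distinct column value, stored as a Python integer, so that a conjunctive selection becomes a bitwise AND. The rows, column names and helper names are hypothetical, and real systems typically compress their bitmaps, which this sketch does not attempt.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into a list of row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Invented rows standing in for a fact table.
rows = [
    {"region": "N", "product": "Cola"},
    {"region": "S", "product": "Juice"},
    {"region": "N", "product": "Juice"},
    {"region": "W", "product": "Cola"},
    {"region": "N", "product": "Cola"},
]
by_region = build_bitmap_index(rows, "region")
by_product = build_bitmap_index(rows, "product")

# region = 'N' AND product = 'Cola' is answered with one bitwise AND.
hits = matching_rows(by_region["N"] & by_product["Cola"])
```

This suits warehouse data because dimension attributes tend to have few distinct values and the data is loaded rather than updated in place.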

Star Schema
Sales (fact table): Geography Code, Time Code, Account Code, Product Code, Sterling Amount, Units
Time: Time Code, Quarter Code, Quarter Name, Date, Month Code, Month Name, Day Code, Day of Week, Season
Geography: Geography Code, Region Code, Region Manager, City Code, City Name, Post Code
Account: Account Code, Key Account Code, Key Account Name, Account Name, Account Type, Account Market
Product: Product Code, Product Name, Brand Manager, Brand Name, Prod Line Code, Prod Line Name, Prod Line Mgr, Product Colour, Product Model No

Fact Tables
Columns: Prod_Code, Time_Code, Acct_Code, Sales, Qty
– Prod_Code, Time_Code, Acct_Code: key columns joining the fact table to the dimension tables
– Sales, Qty: numerical measures
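A star join over such a fact table might look like the following sketch, which uses Python's built-in sqlite3 module with an in-memory database. The table names, the reduced column sets and all data values are assumptions for illustration; the account dimension is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (prod_code INTEGER PRIMARY KEY, prod_name TEXT, prod_line TEXT);
CREATE TABLE time_dim (time_code INTEGER PRIMARY KEY, month_name TEXT, quarter_name TEXT);
CREATE TABLE sales_fact (
    prod_code INTEGER REFERENCES product(prod_code),   -- key column
    time_code INTEGER REFERENCES time_dim(time_code),  -- key column
    sales REAL,    -- numerical measure
    qty INTEGER    -- numerical measure
);
INSERT INTO product VALUES
    (1, 'Cola', 'Beverages'), (2, 'Juice', 'Beverages'), (3, 'Soap', 'Toiletries');
INSERT INTO time_dim VALUES (10, 'Jan', 'Q1'), (11, 'Feb', 'Q1');
INSERT INTO sales_fact VALUES
    (1, 10, 20.0, 4), (2, 10, 9.0, 3), (3, 11, 5.0, 5), (1, 11, 10.0, 2);
""")

# Join the fact table to each dimension on its key column, then aggregate
# the measures at the chosen level of each dimension.
totals = conn.execute("""
    SELECT p.prod_line, t.quarter_name, SUM(f.sales), SUM(f.qty)
    FROM sales_fact AS f
    JOIN product AS p ON p.prod_code = f.prod_code
    JOIN time_dim AS t ON t.time_code = f.time_code
    GROUP BY p.prod_line, t.quarter_name
    ORDER BY p.prod_line
""").fetchall()
```

Grouping by `prod_line` and `quarter_name` rather than by product and month is a roll up expressed in SQL: the fact rows stay at the finest grain and the dimension tables supply the coarser levels.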

Part of a Snowflake Schema
Sales (fact table): Geography Code, Time Code, Account Code, Product Code, Sterling Amount, Units
Time: Time Code, Year Code, Quarter Code, Month Code, Day Code
Quarter: Quarter Code, Quarter Name
Month: Month Code, Month Name
Day: Day Code, Day of Week, Season
Product: Product Code, Prod Line Code, Brand Code, Product Name, Product Colour, Product Model No, Product Desc
Brand: Brand Code, Brand Manager, Brand Name
Product Line: Prod Line Code, Prod Line Name, Prod Line Mgr

Data Warehouse Databases
Relational and Specialised RDBMSs
– Specialised indexing techniques, join and scan methods
Relational OLAP (ROLAP) servers
– Explicitly developed to use a relational engine to support OLAP
– Include aggregation navigation logic, the ability to generate multi-statement SQL, and other additional services
Multidimensional OLAP (MOLAP) servers
– The storage model is an n-dimensional array
– May use a 2-level approach, with 2-D dense arrays indexed by B-Trees
– Time is often one of the dimensions
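The 2-level MOLAP approach can be sketched as an ordered index over one dimension whose entries point at dense 2-D arrays for the other two. Several things here are assumptions: the class and method names are hypothetical, time is chosen as the indexed dimension, and a sorted list searched with `bisect` stands in for the B-Tree, offering the same ordered lookup without being one.

```python
import bisect

PRODUCTS = ["Cola", "Juice", "Soap"]
REGIONS = ["N", "W", "S"]

class TwoLevelCube:
    """Ordered index over time codes, each pointing at a dense product x region block."""

    def __init__(self):
        self._keys = []     # sorted time codes (stand-in for B-Tree keys)
        self._blocks = []   # dense 2-D arrays, parallel to _keys

    def _block(self, time_code, create=False):
        i = bisect.bisect_left(self._keys, time_code)
        if i < len(self._keys) and self._keys[i] == time_code:
            return self._blocks[i]
        if not create:
            return None
        block = [[0] * len(REGIONS) for _ in PRODUCTS]
        self._keys.insert(i, time_code)
        self._blocks.insert(i, block)
        return block

    def add(self, time_code, product, region, units):
        block = self._block(time_code, create=True)
        block[PRODUCTS.index(product)][REGIONS.index(region)] += units

    def get(self, time_code, product, region):
        block = self._block(time_code)
        if block is None:
            return 0  # no block is stored for time codes without data
        return block[PRODUCTS.index(product)][REGIONS.index(region)]

    def total_between(self, start, end):
        """Sum all cells with start <= time_code <= end using the ordered index."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return sum(cell for b in self._blocks[lo:hi] for row in b for cell in row)

molap = TwoLevelCube()
molap.add(201301, "Cola", "N", 7)
molap.add(201303, "Soap", "S", 6)
molap.add(201301, "Cola", "N", 3)
```

The design choice is that only time codes with data get a block, so sparsity along the indexed dimension costs nothing, while lookups inside a block are plain array offsets.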