Data warehouse and OLAP


Data warehouse and OLAP Lecture 1

Goals of a Data Warehouse
- The DW provides access to corporate or organizational data.
- The data in the DW is consistent.
- The data in the DW can be separated and combined by means of every possible measure in the business.
- The DW is not just data, but also a set of tools to query, analyze, and present information.
- The DW is a place where we publish used data.
- The quality of data in the DW is a driver of business reengineering.

Data Warehousing
[Figure: diagram relating operational data (cost, products, brands, suppliers, marketing, customer care, sales, clients) to information and knowledge workers.]

What is a data warehouse
- DW: a database that is maintained separately from an organization's operational database.
- Provides a solid platform of consolidated historical data for analysis.
- A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.

Enterprise DW Architecture
[Figure: architecture diagram. Labels: operational environment (OLTP, legacy, external data); extract, integrate, transform, maintain; data warehouse and metadata; analysis environment (OLAP, reporting, data mining).]

Two types of systems:
- OLTP: covers most of the day-to-day operations of an organization (e.g., purchasing, inventory, manufacturing, banking, payroll, registration, accounting).
- OLAP: data analysis and decision making on historical data.

OLTP vs OLAP
Feature        | OLTP                           | OLAP
Characteristic | Operational processing         | Informational processing
Orientation    | Transaction                    | Analysis
User           | Clerk, DBA, client             | Knowledge worker (manager, analyst)
Function       | Day-to-day operations          | Historical information requirements, decision support
DB design      | ER-based, application-oriented | Star/snowflake, subject-oriented
Data           | Current, up-to-date            | Historical
Summarization  | Primitive, highly detailed     | Summarized, consolidated
View           | Detailed, flat relational      | Summarized, multidimensional
Unit of work   | Short, simple transaction      | Complex query
Access         | Read/write                     | Mostly read

OLTP vs OLAP
Feature                    | OLTP                      | OLAP
Focus                      | Data in                   | Information out
Operations                 | Index/hash on primary key | Lots of scans
Number of records accessed | Tens                      | Millions
Number of users            | Thousands                 | Hundreds
DB size                    | 1 GB                      | 100 GB to 1 TB
Priority                   | High performance          | High flexibility, end-user autonomy
Metric                     | Transaction throughput    | Query throughput, response time

Data Warehouse - Subject-Oriented
- Organized around major subjects, such as customer, product, and sales.
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
- Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

Data Warehouse - Integrated
- Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
- Data cleaning and data integration techniques are applied.
- Consistency in naming conventions, encoding structures, attribute measures, etc. is ensured among the different data sources.
- When data is moved to the warehouse, it is converted.
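
To make the conversion step concrete, here is a minimal sketch of integrating two sources that disagree on naming, encoding, and units. The source names (`crm_rows`, `billing_rows`), field names, and encodings are hypothetical and chosen only for illustration; they are not from the lecture.

```python
# Two hypothetical operational sources with different conventions.
crm_rows = [{"cust": "Ann", "gender": "F", "height_cm": 170}]
billing_rows = [{"customer_name": "Bob", "sex": 1, "height_in": 70}]

# One agreed encoding for the warehouse (assumed mapping for this sketch).
GENDER_CODES = {"F": "female", "M": "male", 0: "female", 1: "male"}


def from_crm(row):
    """Convert a CRM record to the warehouse naming, encoding, and units."""
    return {
        "customer": row["cust"],
        "gender": GENDER_CODES[row["gender"]],
        "height_cm": float(row["height_cm"]),
    }


def from_billing(row):
    """Convert a billing record; inches are converted to centimetres."""
    return {
        "customer": row["customer_name"],
        "gender": GENDER_CODES[row["sex"]],
        "height_cm": round(row["height_in"] * 2.54, 1),
    }


# After conversion every row has the same attribute names, codes, and units.
warehouse_rows = [from_crm(r) for r in crm_rows] + [
    from_billing(r) for r in billing_rows
]
for row in warehouse_rows:
    print(row)
```

Each source gets its own small converter, so adding a new source does not touch the rows already loaded.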

Data Warehouse - Time-Variant
- Data warehouse data provide information from a historical perspective (e.g., the past 5-10 years).
- Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
- The key of operational data, by contrast, may or may not contain a time element.

Data Warehouse - Non-Volatile
- A physically separate store of data transformed from the operational environment.
- Operational updates of data do not occur in the data warehouse environment.
- Does not require transaction processing, recovery, or concurrency control mechanisms.
- Requires only two operations in data accessing: initial loading of data and access of data.

Relational Database
- A collection of tables.
- Each table consists of a set of attributes (fields), each of which is assigned a unique name.
- Each table stores a large set of tuples (records).

Multidimensional Data Model
- OLAP tools are based on a multidimensional data model.
- Data are viewed in the form of a data cube.
- A cube is defined by dimensions and facts.
- Dimensions: entities with respect to which an organization wants to keep records (time, item, branch, location, etc.).

Multidimensional Data Model
- Typically organized around a central theme, such as sales.
- The theme is represented by facts, which are numerical measures.
- Examples: dollars_sold (sales amount in dollars), units_sold (number of units sold).
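
As a rough illustration of dimensions and facts, a data cube can be sketched as a mapping from a tuple of dimension values to its measures. The sample rows below are made up for this sketch; only the dimension names and the measures dollars_sold and units_sold come from the slides.

```python
from collections import defaultdict

# Hypothetical sales records: (time, item, location, dollars_sold, units_sold).
facts = [
    ("Q1", "phone", "Vancouver", 400.0, 4),
    ("Q1", "phone", "Toronto", 300.0, 3),
    ("Q1", "laptop", "Vancouver", 1500.0, 1),
    ("Q2", "phone", "Vancouver", 500.0, 5),
]

# Each cube cell is addressed by its dimension values and holds the measures
# as [dollars_sold, units_sold].
cube = defaultdict(lambda: [0.0, 0])
for time, item, location, dollars, units in facts:
    cell = cube[(time, item, location)]
    cell[0] += dollars
    cell[1] += units

# Look up one cell of the 3-D cube.
print(cube[("Q1", "phone", "Vancouver")])
```

The dimensions act as coordinates and the facts are the values stored at each coordinate.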

Table view (2-D)

3-D view by adding Location dimension

In a data warehouse the data cube is n-dimensional
- Suppose we would like to view the data with an additional fourth dimension, such as supplier.

Degrees of summarization
- We may display any n-D data as a series of (n-1)-D cubes.
- We can show the data at different degrees of summarization.
- In SQL, the GROUP BY clause is used for this purpose.
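
The following sketch shows GROUP BY producing three degrees of summarization over the same data. It uses Python's built-in `sqlite3` module with an in-memory database; the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (time TEXT, item TEXT, location TEXT, dollars_sold REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Q1", "phone", "Vancouver", 400.0),
        ("Q1", "phone", "Toronto", 300.0),
        ("Q1", "laptop", "Vancouver", 1500.0),
        ("Q2", "phone", "Vancouver", 500.0),
    ],
)

# 2-D summary: location is aggregated away, time and item remain.
by_time_item = conn.execute(
    "SELECT time, item, SUM(dollars_sold) FROM sales "
    "GROUP BY time, item ORDER BY time, item"
).fetchall()

# 1-D summary: only time remains.
by_time = conn.execute(
    "SELECT time, SUM(dollars_sold) FROM sales GROUP BY time ORDER BY time"
).fetchall()

# 0-D summary: a single grand total over all dimensions.
total = conn.execute("SELECT SUM(dollars_sold) FROM sales").fetchone()[0]

print(by_time_item)
print(by_time)
print(total)
```

Dropping a column from the GROUP BY list moves one level up in summarization.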

Lattice of cuboids

Levels of lattice
- The cuboid that holds the lowest level of summarization is called the base cuboid.
- The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. In this case it is the total sales summarized over all four dimensions (all).
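
The lattice can be enumerated directly: every subset of the dimensions is one cuboid. The sketch below assumes the four dimensions used in the running example (time, item, location, supplier).

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

# Every subset of the dimensions defines one cuboid, listed here from the
# most detailed (all four dimensions) down to the most summarized (none).
cuboids = [
    combo
    for k in range(len(dimensions), -1, -1)
    for combo in combinations(dimensions, k)
]

base_cuboid = cuboids[0]   # 4-D: lowest level of summarization
apex_cuboid = cuboids[-1]  # 0-D: everything summarized, shown as "all"

for cuboid in cuboids:
    print(f"{len(cuboid)}-D:", cuboid if cuboid else "(all)")
```

With n dimensions and no concept hierarchies there are 2^n cuboids, so 16 for the four dimensions here.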

Modeling paradigms
The most popular data models:
- Star schema
- Snowflake schema
- Fact constellation schema

Star Schema

Star Schema
- Fact table: a large central table containing the bulk of the data, with no redundancy
  - Big
  - Constantly growing
  - Stores measures (often aggregated in queries)
- Dimension tables: a set of smaller attendant tables, one for each dimension
  - Small
  - Infrequently updated

Star schema
- Each dimension is represented by only one table.
- Each table contains a set of attributes.
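
A minimal star schema can be sketched with `sqlite3` as one fact table that references one table per dimension. All table names, column names, and rows below are illustrative assumptions, not the schema from the lecture's figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per dimension.
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- Central fact table: foreign keys to each dimension plus the measures.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        dollars_sold REAL,
        units_sold INTEGER
    );

    INSERT INTO dim_time VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
    INSERT INTO dim_item VALUES (1, 'phone', 'Acme'), (2, 'laptop', 'Acme');
    INSERT INTO dim_location VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 400.0, 4), (1, 1, 2, 300.0, 3),
        (1, 2, 1, 1500.0, 1), (2, 1, 1, 500.0, 5);
    """
)

# A typical star query: join the fact table to the dimensions it needs,
# then aggregate the measure.
sales_by_quarter_city = conn.execute(
    """
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_key = t.time_key
    JOIN dim_location AS l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
    """
).fetchall()
print(sales_by_quarter_city)
```

Every dimension is reached from the fact table with a single join, which is what gives the schema its star shape.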

Snowflake schema
- A variant of the star schema with normalized dimension tables.
- Saves space.
- But involves a lot of joins.
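
The trade-off can be seen in a small sketch where the location dimension is normalized into a city table and a country table. The table names and rows are hypothetical; the point is that the country name is stored once, while a query by country needs one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The location dimension is split: each country name is stored only once.
    CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE dim_city (
        city_key INTEGER PRIMARY KEY,
        city_name TEXT,
        country_key INTEGER REFERENCES dim_country(country_key)
    );
    CREATE TABLE fact_sales (
        city_key INTEGER REFERENCES dim_city(city_key),
        dollars_sold REAL
    );

    INSERT INTO dim_country VALUES (1, 'Canada'), (2, 'USA');
    INSERT INTO dim_city VALUES (1, 'Vancouver', 1), (2, 'Toronto', 1), (3, 'Chicago', 2);
    INSERT INTO fact_sales VALUES (1, 400.0), (2, 300.0), (3, 250.0), (1, 500.0);
    """
)

# Reaching the country now takes two joins (fact -> city -> country).
sales_by_country = conn.execute(
    """
    SELECT co.country_name, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_city AS ci ON f.city_key = ci.city_key
    JOIN dim_country AS co ON ci.country_key = co.country_key
    GROUP BY co.country_name
    ORDER BY co.country_name
    """
).fetchall()
print(sales_by_country)
```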

Snowflake schema

Fact Constellation
- Multiple fact tables share dimension tables.
- Can be viewed as a collection of stars.
- Also known as a galaxy schema.
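
As a sketch of shared dimensions, the example below has two fact tables, one for sales and one for shipping, that both reference the same time and item dimension tables. The shipping fact table, the column names, and the rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables shared by both fact tables.
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);

    -- Two fact tables, each the center of its own star.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        dollars_sold REAL
    );
    CREATE TABLE fact_shipping (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        units_shipped INTEGER
    );

    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_item VALUES (1, 'phone'), (2, 'laptop');
    INSERT INTO fact_sales VALUES (1, 1, 700.0), (1, 2, 1500.0), (2, 1, 500.0);
    INSERT INTO fact_shipping VALUES (1, 1, 8), (1, 2, 1), (2, 1, 5);
    """
)

# Both queries group by the same shared item dimension.
sales_by_item = conn.execute(
    """
    SELECT i.item_name, SUM(f.dollars_sold)
    FROM fact_sales AS f JOIN dim_item AS i ON f.item_key = i.item_key
    GROUP BY i.item_name ORDER BY i.item_name
    """
).fetchall()
shipped_by_item = conn.execute(
    """
    SELECT i.item_name, SUM(f.units_shipped)
    FROM fact_shipping AS f JOIN dim_item AS i ON f.item_key = i.item_key
    GROUP BY i.item_name ORDER BY i.item_name
    """
).fetchall()
print(sales_by_item)
print(shipped_by_item)
```

Because the dimension tables are shared, results from the two fact tables line up on the same item names and quarters.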

Fact Constellation