Published by Harold Byrd. Modified over 6 years ago.
1
Why Data Warehouse
- Crisis of credibility
- Data storage is growing
- Future prediction through historical data
- Intelligent decision support system for efficient decision making
2
Questions for Data Warehouse
- What are the five most attractive resources on our site?
- Users from which country loaded a given resource most often over the previous year?
- Connections from which region generated the most outgoing traffic on the site over the last three months?
3
Level of Aggregation for Dimensions
- Geography, which organizes the data by the geographic locations the site users come from
- Resource, which categorizes the data related to the site resources
- Time, which is used to aggregate traffic data across time
4
Organization of Data to Answer the Questions
Hierarchy of levels (with the highest level listed first):
- Geography dimension: Region, Country
- Resource dimension: Group, Resource
- Time dimension: Year, Month, Day
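The three questions from the earlier slide can be answered by aggregating at different levels of these hierarchies. The following is a minimal sketch, not the presenter's implementation: the traffic records, the field order, and the function names are all hypothetical, and "most attractive" is assumed to mean "most requested".

```python
from collections import Counter
from datetime import date

# Hypothetical traffic records:
# (day, region, country, resource group, resource, outgoing bytes)
HITS = [
    (date(2023, 1, 5), "Europe", "France", "Docs", "install-guide", 1200),
    (date(2023, 1, 5), "Europe", "Germany", "Docs", "install-guide", 800),
    (date(2023, 2, 9), "Asia", "Japan", "Downloads", "installer", 5000),
    (date(2023, 2, 9), "Europe", "France", "Downloads", "installer", 4000),
    (date(2023, 3, 1), "Asia", "Japan", "Docs", "faq", 300),
    (date(2023, 3, 1), "Europe", "France", "Docs", "install-guide", 1100),
]


def top_resources(hits, n=5):
    """Resource level: the n most requested resources."""
    counts = Counter(resource for _, _, _, _, resource, _ in hits)
    return [name for name, _ in counts.most_common(n)]


def top_country_for(hits, resource, year):
    """Country level of Geography and Year level of Time, for one resource."""
    counts = Counter(
        country
        for day, _, country, _, res, _ in hits
        if res == resource and day.year == year
    )
    return counts.most_common(1)[0][0]


def traffic_by_region(hits, months):
    """Region level of Geography, restricted to a set of (year, month) pairs."""
    totals = Counter()
    for day, region, _, _, _, bytes_out in hits:
        if (day.year, day.month) in months:
            totals[region] += bytes_out
    return dict(totals)
```

Each function fixes one level per dimension (for example Country and Year) and aggregates everything below that level.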
5
Data warehouse
- Provides an integrated and total view of the enterprise by considering historical data for efficient decision making
- The operational system does not affect the decision support system, but works with it to support decisions
- Provides a consistent and logical view of information across the organization for effective short-term and long-term policies
- Presents a flexible and interactive source of strategic information
6
Data warehouse Definition
A data warehouse is a subject-oriented, integrated, nonvolatile (non-updateable), and time-variant collection of data in support of management’s decisions.
8
Data warehouse
Subject Oriented:
- Organizations usually store data according to their applications, such as order processing, student enrollment, or customer loans
- A data warehouse is instead organized around key subjects in the organization, such as customers, students, or patients
10
Data warehouse
Integrated Data:
- Data comes from various operational systems
- Each operational system may use different data formats or naming conventions
- To produce effective results, the data should be standardized and integrated into a single data warehouse
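As an illustration of integration, the sketch below maps records from two operational systems onto one common layout. The two systems, their field names, and their date formats are invented for this example.

```python
from datetime import datetime

# Hypothetical records from two operational systems with different
# naming conventions, types, and date formats.
ORDER_SYSTEM = [{"cust_id": "17", "order_dt": "2023-03-01", "amt": "250.00"}]
LOAN_SYSTEM = [{"CustomerNo": 17, "Date": "01/04/2023", "LoanAmount": 9000}]


def from_order_system(row):
    """Map an order-processing record onto the common warehouse layout."""
    return {
        "customer_id": int(row["cust_id"]),
        "event_date": datetime.strptime(row["order_dt"], "%Y-%m-%d").date(),
        "amount": float(row["amt"]),
        "source": "orders",
    }


def from_loan_system(row):
    """Map a loan record (day/month/year dates) onto the same layout."""
    return {
        "customer_id": int(row["CustomerNo"]),
        "event_date": datetime.strptime(row["Date"], "%d/%m/%Y").date(),
        "amount": float(row["LoanAmount"]),
        "source": "loans",
    }


# One integrated collection with a single naming convention and format.
integrated = [from_order_system(r) for r in ORDER_SYSTEM]
integrated += [from_loan_system(r) for r in LOAN_SYSTEM]
```

After the mapping, both records describe customer 17 with the same field names and types, so they can be queried together.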
12
Time Variant Data:
- Data in the data warehouse is stored along the dimension of time
- This is helpful because it:
  - allows analysis of the past
  - relates information to the present
  - enables forecasts of the future
13
Data warehouse
Non-updateable / Non-volatile:
- Data in the data warehouse is not frequently updated by end users
- Instead, data is refreshed as business requirements dictate, at specific intervals such as fortnightly or monthly
16
Data warehouse Applications
- Fraud Detection
- Profitability Analysis
- Credit Risk Prediction
- Customer Retention Modeling
- Yield Management
17
Data warehouse Architecture
Data warehouse architecture is the proper arrangement of its building blocks, or main components.
The three major components are:
- Data Acquisition
- Data Storage
- Information Delivery
These are further divided into:
- Source Data
- Data Staging
- Metadata
- Management and Control
21
Data warehouse Architecture
Source data falls into four major categories:
- Production Data: Operational systems access data in predictable ways, but the data warehouse holds a large volume of data coming from various units. The main challenge is handling the data disparity.
- Internal Data: Maintained as individual files and spreadsheets. This adds complexity, and a mechanism has to be developed for acquiring internal data.
- Archived Data: Operational systems store older data in archive files. These archives are essential for data warehousing.
- External Data: Data from external sources is required for efficient decision making, e.g. a car rental company acquires data from leading car manufacturers for fleet management.
22
Data warehouse Architecture
Data Staging
Motivation:
- Data in the data warehouse comes from different sources and is subject-oriented: it cuts across operational procedures according to the subject of interest
- Therefore, data acquired from different sources needs to be prepared, changed, converted, and made ready as a single source for queries and analysis
23
Data warehouse Architecture
Data Staging
The three main operations are:
- Data Extraction
- Data Transformation
- Data Loading
Together these are called ETL.
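The three operations can be sketched as three small functions chained together. This is only an illustrative outline under assumed inputs: the CSV text, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be a file or an
# operational database rather than an inline string.
SOURCE_CSV = "store,product,units\nNorth,Hats,3\nnorth ,Coats,2\nSouth,Hats,5\n"


def extract(text):
    """Extraction: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: standardize store names and convert units to integers."""
    return [
        (r["store"].strip().title(), r["product"].strip(), int(r["units"]))
        for r in rows
    ]


def load(conn, rows):
    """Loading: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, product TEXT, units INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# Because "north " was standardized to "North", both rows aggregate together.
units_by_store = dict(
    conn.execute("SELECT store, SUM(units) FROM sales GROUP BY store")
)
```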
24
Data warehouse Architecture
Data Extraction
- Data extraction deals with numerous data sources in which data resides in different formats
- Some of the data is retrieved from legacy systems
- Other data may come from different data models, such as network or hierarchical
- Data extraction tools may be purchased or developed in-house
25
Data warehouse Architecture
Data Transformation
- Data conversion is an important step in data warehousing
- Data is acquired from different sources
- Ongoing changes in the source data need to be captured over time
Clean Data
- Correction of spellings
- Resolution of conflicts between domain values, e.g. different zip codes from different data sources
- Provision of missing values
- Elimination of duplicate data acquired from different sources
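The four cleaning steps can be illustrated with a small sketch. The rows, the spelling table, the default value, and the rule that one source outranks another are all assumptions made for this example; real cleaning rules depend on the business.

```python
# Hypothetical customer rows acquired from two sources.
ROWS = [
    {"id": 1, "city": "Springfeild", "zip": "11111", "source": "crm"},
    {"id": 1, "city": "Springfield", "zip": "11112", "source": "billing"},
    {"id": 2, "city": "Riverton", "zip": None, "source": "crm"},
]

SPELLING_FIXES = {"Springfeild": "Springfield"}  # known misspellings
SOURCE_PRIORITY = {"billing": 0, "crm": 1}       # assumed rule: lower number wins
DEFAULT_ZIP = "UNKNOWN"                          # assumed placeholder for missing values


def clean(rows):
    """Correct spellings, fill missing values, resolve conflicts, drop duplicates."""
    best = {}
    for row in rows:
        row = dict(row)  # work on a copy so the input is left untouched
        row["city"] = SPELLING_FIXES.get(row["city"], row["city"])
        if row["zip"] is None:
            row["zip"] = DEFAULT_ZIP
        current = best.get(row["id"])
        # Keep one row per id; on conflict the higher-priority source wins.
        if current is None or (
            SOURCE_PRIORITY[row["source"]] < SOURCE_PRIORITY[current["source"]]
        ):
            best[row["id"]] = row
    return [best[key] for key in sorted(best)]


cleaned = clean(ROWS)
```

Here the two conflicting zip codes for customer 1 are resolved in favor of the billing system, and the duplicate row disappears as a side effect.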
26
Data warehouse Architecture
Data Standardization
Syntax Standardization
- Data types
- Data lengths
Semantic Standardization
- Synonyms: two terms for the same thing
- Homonyms: a single term for two different things
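A rough sketch of both kinds of standardization follows. The synonym and homonym tables, the field names, and the length limit are hypothetical choices for illustration only.

```python
# Synonyms: several source terms that mean the same thing.
SYNONYMS = {"client": "customer", "cust": "customer"}

# Homonyms: one term that means different things depending on the source
# system (assumed example), so the source decides the standard name.
HOMONYMS = {
    ("loans", "balance"): "loan_balance",
    ("accounts", "balance"): "account_balance",
}

NAME_LENGTH = 10  # assumed standard length for text values


def standard_field(source, field):
    """Semantic standardization: resolve homonyms first, then synonyms."""
    field = field.lower()
    if (source, field) in HOMONYMS:
        return HOMONYMS[(source, field)]
    return SYNONYMS.get(field, field)


def standard_value(field, value):
    """Syntax standardization: one data type and one length per kind of field."""
    if field.endswith("_balance"):
        return round(float(value), 2)
    return str(value).strip()[:NAME_LENGTH]


def standardize(source, record):
    """Apply both kinds of standardization to one source record."""
    out = {}
    for field, value in record.items():
        name = standard_field(source, field)
        out[name] = standard_value(name, value)
    return out
```

For example, `standardize("loans", {"Client": "  Alexandria Smith ", "Balance": "1200.50"})` yields a `customer` field truncated to ten characters and a numeric `loan_balance`.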
27
Data warehouse Architecture
At the final stage of data transformation we achieve a single collection of integrated data that is cleaned, standardized, and summarized.
28
Data warehouse Architecture
Data Loading:
- Initially, the data is loaded in large volume
- Subsequent incremental loads and revisions keep the data warehouse up to date
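The difference between the initial load and later incremental loads can be sketched with an in-memory SQLite table. The `customer` table and its rows are hypothetical, and treating a revision as an overwrite of the existing row is an assumption; a warehouse that keeps history would add a dated row instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")


def initial_load(rows):
    """Initial load: bulk insert into the empty table."""
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


def incremental_load(rows):
    """Incremental load: add new keys and revise rows whose key already exists."""
    conn.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


initial_load([(1, "Ann", "Oslo"), (2, "Bob", "Rome")])
# A later batch revises customer 2 and adds customer 3.
incremental_load([(2, "Bob", "Milan"), (3, "Cy", "Lima")])

rows = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
```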
29
Data warehouse or Data mart
A data mart is a bottom-up approach; a data warehouse is a top-down approach.
30
Data Mart
A data warehouse that is limited in scope, whose data are obtained by selecting and summarizing data from a data warehouse or from separate extract, transform, and load processes from source data systems.
31
Data Mart and Data warehouse
A data mart, in this practical approach, is a logical subset of the complete data warehouse, a sort of pie-wedge of the whole data warehouse. A data warehouse, therefore, is a conformed union of all data marts.
32
Data Mart and Data warehouse
Individual data marts are targeted to particular business groups in the enterprise, but the collection of all the data marts forms an integrated whole, called the enterprise data warehouse.
33
Three Dimensional Modeling as Informational Cubes
34
Query Steps in an Analysis
35
OLAP functions in database without moving data outside of database
36
OLAP Systems (Online Analytical Processing Systems)
Definition: On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Typical operations:
- Roll-up
- Drill-down
- Slice and dice
37
OLAP Systems (Online Analytical Processing Systems)
OLAP is a fancy name for multi-dimensional analysis.
38
Star Schema
A simple database design in which dimensional data are separated from fact or event data. A dimensional model is another name for a star schema.
39
Fact Table and Dimension Table
A star schema consists of two types of tables: one fact table and one or more dimension tables.
- A fact table holds factual, numerical, or measured data, such as the number of orders booked or the number of units sold
- A dimension table holds descriptive data; its attributes are used to aggregate or summarize the data in the fact table
- A data mart might contain any number of star schemas with similar dimensions but different types of facts
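A minimal star schema can be sketched with one fact table and two dimension tables. The table names, columns, and rows below are invented for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold the descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table holds the measure plus one key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        time_id INTEGER REFERENCES dim_time(time_id),
        units_sold INTEGER
    );

    INSERT INTO dim_product VALUES
        (1, 'Hats', 'Clothing'), (2, 'Coats', 'Clothing'), (3, 'Mugs', 'Kitchen');
    INSERT INTO dim_time VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO fact_sales VALUES
        (1, 1, 10), (2, 1, 4), (3, 1, 7), (1, 2, 6), (3, 2, 1);
    """
)

# A dimension attribute (category) is used to summarize the facts.
units_by_category = dict(
    conn.execute(
        """
        SELECT p.category, SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        """
    )
)
```

Swapping `p.category` for a column of `dim_time` summarizes the same facts along the time dimension instead.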
40
Fact Table and Dimension Table
Typical business dimensions are Product, Customer, and Time.
41
A simple star schema
43
Three dimensional display of data
44
Drill Down / Rollup
- Rollup: rolling up a dimension to see aggregate values at a higher level
- Drill Down: looking at the data in more detail through the dimensions of the cube
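Both operations can be sketched along a Year, Month, Day time hierarchy. The daily figures and function names are hypothetical; the point is only that roll-up shortens the key and drill-down lengthens it by one level.

```python
from collections import defaultdict

# Hypothetical daily sales keyed by (year, month, day).
DAILY = {
    (2023, 1, 5): 10,
    (2023, 1, 20): 4,
    (2023, 2, 3): 6,
    (2024, 1, 9): 8,
}


def roll_up(cells, depth):
    """Aggregate to the first `depth` levels of the Year > Month > Day hierarchy."""
    totals = defaultdict(int)
    for key, units in cells.items():
        totals[key[:depth]] += units
    return dict(totals)


def drill_down(cells, prefix):
    """Show the next level of detail below one aggregated cell."""
    detail = {k: v for k, v in cells.items() if k[: len(prefix)] == prefix}
    return roll_up(detail, len(prefix) + 1)


by_year = roll_up(DAILY, 1)                  # yearly totals
by_month_2023 = drill_down(DAILY, (2023,))   # months within 2023
days_in_jan = drill_down(DAILY, (2023, 1))   # days within January 2023
```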
45
Slice-and-Dice or Rotation
- Months are displayed as rows, products as columns, and stores as pages
- Each page shows the sales of one store
- The data model corresponds to a physical cube with these data elements as its primary edges
- Each page is a slice, or two-dimensional plane, of the cube
- In normalization, we analyze the associations between attributes and, based on that analysis, group the attributes into tables and relationships
46
Slice and Dice
47
Slice and Dice
Now rotate the cube so that products are along the Z-axis, months are along the X-axis, and stores are along the Y-axis. The slice we are considering also rotates. What happens to the display page that represents the slice? Months are now shown as columns and stores as rows. The display page represents the sales of one product, namely hats.
You can go to the next rotation so that months are along the Z-axis, stores are along the X-axis, and products are along the Y-axis. The slice we are considering also rotates. What happens to the display page that represents the slice? Stores are now shown as columns and products as rows. The display page represents the sales of one month, namely January.
What is the great advantage of all of this for the users? With each rotation, the users can look at page displays representing different versions of the slices in the cube. The users can view the data from many angles, understand the numbers better, and arrive at meaningful conclusions.
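The rotations described above can be sketched by choosing which dimension supplies the rows, which the columns, and which the page. The cube contents and the `page` helper are hypothetical; only the hats and January pages are taken from the slide text.

```python
# Hypothetical cube cells keyed by (month, product, store) -> sales.
AXES = ("month", "product", "store")
CUBE = {
    ("January", "Hats", "Store A"): 5, ("January", "Coats", "Store A"): 2,
    ("February", "Hats", "Store A"): 7, ("February", "Coats", "Store A"): 1,
    ("January", "Hats", "Store B"): 3, ("January", "Coats", "Store B"): 4,
    ("February", "Hats", "Store B"): 6, ("February", "Coats", "Store B"): 8,
}


def page(cube, rows, cols, pages, page_value):
    """Return one slice as {row label: {column label: value}} for a fixed page value."""
    r, c, p = AXES.index(rows), AXES.index(cols), AXES.index(pages)
    table = {}
    for key, value in cube.items():
        if key[p] == page_value:
            table.setdefault(key[r], {})[key[c]] = value
    return table


# Starting view: months as rows, products as columns, one store per page.
store_page = page(CUBE, "month", "product", "store", "Store A")
# First rotation: stores as rows, months as columns, one product per page.
hats_page = page(CUBE, "store", "month", "product", "Hats")
# Second rotation: products as rows, stores as columns, one month per page.
january_page = page(CUBE, "product", "store", "month", "January")
```

The underlying cells never change; each rotation only changes which dimension is held fixed and how the other two are laid out.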