Chapter 13 The Data Warehouse.

Name: Chapter 13 The Data Warehouse.
Uploaded: 2017-10-17T12:03:40+00:00
Duration: PTM15S4
Channel: Agatha Cameron
Description: Chapter 13 The Data Warehouse.

Chapter 13 The Data Warehouse

The Need for Data Analysis
Managers must be able to track daily transactions to evaluate how the business is performing. By tapping into the operational database, management can develop strategies to meet organizational goals. Data analysis can provide information about short-term tactical evaluations and strategies.

Solving Business Problems and Adding Value with Data Warehouse-Based Solutions

Solving Business Problems and Adding Value with Data Warehouse-Based Solutions (continued)

Decision Support Systems
Decision support: a methodology (or series of methodologies) designed to extract information from data and to use such information as a basis for decision making.
Decision support system (DSS): an arrangement of computerized tools used to assist managerial decision making within a business. A DSS usually requires extensive data “massaging” to produce information, is used at all levels within an organization, is often tailored to focus on specific business areas, and provides ad hoc query tools to retrieve data and to display it in different formats.

Decision Support Systems (continued)
A DSS is composed of four main components:
Data store component: basically a DSS database.
Data extraction and filtering component: used to extract and validate data taken from the operational database and external data sources.
End-user query tool: used to create queries that access the database.
End-user presentation tool: used to organize and present data.

Main Components of a Decision Support System (DSS)

Transforming Operational Data Into Decision Support Data

Contrasting Operational and DSS Data Characteristics

The Data Warehouse: an integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making.

A Comparison of Data Warehouse and Operational Database Characteristics

Creating a Data Warehouse

The ETL Process: Capture/Extract; Scrub or data cleansing; Transform; Load and Index. ETL = extract, transform, and load.

Figure 11-10: Steps in data reconciliation
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse. Static extract = capturing a snapshot of the source data at a point in time. Incremental extract = capturing changes that have occurred since the last static extract.
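A minimal Python sketch of the two extract styles, assuming (hypothetically) that every source row carries a `last_modified` timestamp. Real systems often detect changes from database logs or change-capture feeds instead, which this sketch does not model; `static_extract` and `incremental_extract` are illustrative names, not a product API.

```python
from datetime import datetime

# Hypothetical source table; each row records when it was last changed.
SOURCE_ROWS = [
    {"id": 1, "name": "Ann", "last_modified": datetime(2017, 10, 1)},
    {"id": 2, "name": "Bob", "last_modified": datetime(2017, 10, 5)},
    {"id": 3, "name": "Cy", "last_modified": datetime(2017, 10, 9)},
]

def static_extract(rows):
    """Snapshot of all source rows at this point in time (copied, not shared)."""
    return [dict(r) for r in rows]

def incremental_extract(rows, last_extract_time):
    """Only the rows changed since the previous extract."""
    return [dict(r) for r in rows if r["last_modified"] > last_extract_time]

snapshot = static_extract(SOURCE_ROWS)
changes = incremental_extract(SOURCE_ROWS, datetime(2017, 10, 4))
print(len(snapshot), [r["id"] for r in changes])  # 3 [2, 3]
```

The incremental extract is much smaller than the snapshot once the warehouse is initially loaded, which is why it is used for routine refreshes.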

Scrub/Cleanse (Figure 11-10: Steps in data reconciliation, cont.): uses pattern recognition and AI techniques to upgrade data quality.
Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies.
Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data.

Cleansing: a process to identify erroneous data, not to fix it; fixes are made at the source. Scrubbing: a technique using pattern recognition and other AI techniques to upgrade the quality of data.
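Cleansing in the sense above (report problems, fix them at the source) can be sketched as a rule-based checker that flags records without modifying them. The records, the rules, and the `cleanse` function are hypothetical, and the sketch uses simple hand-written rules rather than the pattern-recognition or AI techniques that scrubbing tools apply.

```python
from datetime import date

# Hypothetical customer records pulled from an operational source.
records = [
    {"cust_id": 101, "name": "Ann Lee", "birth_date": "1980-02-30", "zip": "30303"},
    {"cust_id": 102, "name": "", "birth_date": "1975-06-12", "zip": "30303"},
    {"cust_id": 101, "name": "Ann Lee", "birth_date": "1980-02-30", "zip": "30303"},
    {"cust_id": 103, "name": "Raj Patel", "birth_date": "1990-11-05", "zip": "3030"},
]

def valid_date(text):
    """True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

def cleanse(rows):
    """Return (row_index, problem) pairs; the rows themselves are not changed."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        if not row["name"]:
            problems.append((i, "missing name"))
        if not valid_date(row["birth_date"]):
            problems.append((i, "erroneous date"))
        if not (len(row["zip"]) == 5 and row["zip"].isdigit()):
            problems.append((i, "malformed zip"))
        if row["cust_id"] in seen_ids:
            problems.append((i, "duplicate key"))
        seen_ids.add(row["cust_id"])
    return problems

for index, problem in cleanse(records):
    print(index, problem)  # error log to send back to the source system's owners
```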

Transform = convert data from the format of the operational system to the format of the data warehouse (Figure 11-10: Steps in data reconciliation, cont.).
Record-level: selection (data partitioning), joining (data combining), aggregation (data summarization).
Field-level: single-field (from one field to one field) and multi-field (from many fields to one, or one field to many).
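The three record-level transformations can be illustrated on a few in-memory rows. This is a minimal sketch: the order lines, the product table, and the column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical operational data: order lines plus a product reference table.
order_lines = [
    {"order_id": 1, "product_id": "P1", "region": "East", "qty": 2},
    {"order_id": 2, "product_id": "P2", "region": "West", "qty": 1},
    {"order_id": 3, "product_id": "P1", "region": "East", "qty": 5},
]
products = {"P1": {"category": "Tools"}, "P2": {"category": "Paint"}}

# Selection (partitioning): keep only the rows for one region.
east = [r for r in order_lines if r["region"] == "East"]

# Joining (combining): attach each row's product category.
joined = [{**r, **products[r["product_id"]]} for r in order_lines]

# Aggregation (summarization): total quantity per category.
qty_by_category = defaultdict(int)
for r in joined:
    qty_by_category[r["category"]] += r["qty"]

print(len(east), dict(qty_by_category))  # 2 {'Tools': 7, 'Paint': 1}
```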

Load/Index = place transformed data into the warehouse and create indexes (Figure 11-10: Steps in data reconciliation, cont.).
Refresh mode: bulk rewriting of target data at periodic intervals.
Update mode: only changes in source data are written to the data warehouse.
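A sketch of the two load modes, assuming a warehouse table modeled as a Python list of `(load_date, key, row)` tuples. Update mode here appends dated versions instead of overwriting, which suits a time-variant, nonvolatile warehouse; that choice is an assumption, since the slide only says that changes are written. `build_index` is a hypothetical stand-in for index creation.

```python
def refresh_load(warehouse, snapshot, load_date):
    """Refresh mode: discard the target contents and bulk-rewrite them."""
    warehouse.clear()
    warehouse.extend((load_date, key, dict(row)) for key, row in snapshot.items())

def update_load(warehouse, changes, load_date):
    """Update mode: write only the changed source rows, as new dated versions."""
    warehouse.extend((load_date, key, dict(row)) for key, row in changes.items())

def build_index(warehouse):
    """Map each business key to the positions of its versions in the table."""
    index = {}
    for pos, (_, key, _) in enumerate(warehouse):
        index.setdefault(key, []).append(pos)
    return index

wh = []
refresh_load(wh, {"A": {"qty": 5}, "B": {"qty": 3}}, "2017-10-01")
update_load(wh, {"B": {"qty": 4}}, "2017-10-02")
print(build_index(wh))  # {'A': [0], 'B': [1, 2]}
```

Refresh mode is simpler but rewrites everything; update mode writes less data per load, at the cost of having to detect changes at the source.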

Figure 11-11: Single-field transformation
In general, some transformation function translates data from the old form to the new form. Algorithmic transformation uses a formula or logical expression. Table lookup, another approach, uses a separate table keyed by source record code.
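The two single-field approaches side by side, as a minimal sketch. The temperature conversion and the state-code table are illustrative choices, not taken from the figure.

```python
# Algorithmic transformation: a formula maps the old value to the new form.
def fahrenheit_to_celsius(temp_f):
    return round((temp_f - 32) * 5 / 9, 1)

# Table lookup: a separate table keyed by the source code supplies the new value.
# The entries here are illustrative examples.
STATE_LOOKUP = {"GA": "Georgia", "TX": "Texas", "WA": "Washington"}

def lookup_state(code):
    # Unknown codes are flagged rather than guessed.
    return STATE_LOOKUP.get(code, "UNKNOWN")

print(fahrenheit_to_celsius(212), lookup_state("GA"))  # 100.0 Georgia
```

A formula fits when the mapping is computable; a lookup table fits when the mapping is arbitrary and must be maintained as data.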

Figure 11-12: Multifield transformation
M:1 (many-to-one): from many source fields to one target field.
1:M (one-to-many): from one source field to many target fields.
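One hypothetical example of each multifield direction: address parts combined into a single field (M:1), and an ISO date split into year, month, and day fields (1:M).

```python
# M:1: several source fields become one target field.
def combine_address(street, city, postal_code):
    return f"{street}, {city} {postal_code}"

# 1:M: one source field becomes several target fields.
def split_date(iso_date):
    year, month, day = (int(part) for part in iso_date.split("-"))
    return {"year": year, "month": month, "day": day}

print(combine_address("12 Elm St", "Atlanta", "30303"))  # 12 Elm St, Atlanta 30303
print(split_date("2017-10-17"))  # {'year': 2017, 'month': 10, 'day': 17}
```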

Derived Data
Objectives: ease of use for decision support applications; fast response to predefined user queries; customized data for particular target audiences; ad hoc query support; data mining capabilities.
Characteristics: detailed (mostly periodic) data; aggregate (for summary); distributed (to departmental servers).
Most common data model = star schema (also called the “dimensional model”).

Star Schemas: a data modeling technique used to map multidimensional decision support data into a relational database. It creates the near equivalent of a multidimensional database schema from the existing relational database and yields an easily implemented model for multidimensional data analysis, while still preserving the relational structures on which the operational database is built. A star schema has four components: facts, dimensions, attributes, and attribute hierarchies.

Figure 11-13 Components of a star schema
Fact tables contain factual or quantitative data. There is a 1:N relationship between each dimension table and the fact table. Dimension tables contain descriptions about the subjects of the business and are denormalized to maximize performance. Star schemas are excellent for ad hoc queries but bad for online transaction processing.

Figure 11-14 Star schema example
The fact table provides statistics for sales broken down by the product, period, and store dimensions.
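A minimal star schema for this example can be built with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are hypothetical; the point is the shape: three dimension tables, one fact table whose composite key references each of them, and a typical star-join query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_code INTEGER PRIMARY KEY, description TEXT, category TEXT);
CREATE TABLE period  (period_code  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE store   (store_code   INTEGER PRIMARY KEY, city TEXT, region TEXT);
-- Fact table: one row per product/period/store combination.
CREATE TABLE sales (
    product_code INTEGER REFERENCES product(product_code),
    period_code  INTEGER REFERENCES period(period_code),
    store_code   INTEGER REFERENCES store(store_code),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_code, period_code, store_code)
);
INSERT INTO product VALUES (1, 'Hammer', 'Tools'), (2, 'Wrench', 'Tools'), (3, 'Primer', 'Paint');
INSERT INTO period  VALUES (1, 2017, 1), (2, 2017, 2);
INSERT INTO store   VALUES (1, 'Atlanta', 'East'), (2, 'Seattle', 'West');
INSERT INTO sales VALUES
    (1, 1, 1, 10, 150.0), (2, 1, 2, 4, 60.0),
    (3, 2, 1, 7, 70.0),   (1, 2, 2, 3, 45.0);
""")

# Ad hoc star-join: total sales dollars by store region and product category.
rows = conn.execute("""
    SELECT st.region, p.category, SUM(s.dollars_sold)
    FROM sales s
    JOIN store st  ON s.store_code = st.store_code
    JOIN product p ON s.product_code = p.product_code
    GROUP BY st.region, p.category
    ORDER BY st.region, p.category
""").fetchall()
print(rows)  # [('East', 'Paint', 70.0), ('East', 'Tools', 150.0), ('West', 'Tools', 105.0)]
```

Each query joins the fact table to only the dimensions it needs, which is what makes the design convenient for ad hoc analysis.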

Figure 11-15 Star schema with sample data

Size of fact table. Assume:
Total number of stores = 1,000
Total number of products = 10,000
Total number of periods = 24 (two years of monthly data)
50% of products record sales, i.e., 5,000 active products
Then total rows in the fact table = 1,000 stores × 5,000 active products × 24 months = 120,000,000 rows

Size of “fact” table: assume there are 6 fields, each 4 bytes long. Then the total size is 120,000,000 rows × 6 fields × 4 bytes/field = 2,880,000,000 bytes (2.88 gigabytes). If you record daily instead of monthly data, multiply the above by 30 (30 days per month), giving 86,400,000,000 bytes (86.4 gigabytes).
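The sizing arithmetic above as a small reusable function, so the assumptions (store count, product count, active share, periods, field count, bytes per field) can be varied. `fact_table_size` is a hypothetical helper, and it ignores index and storage overhead, as the slide does.

```python
def fact_table_size(stores, products, periods, active_share, fields, bytes_per_field):
    """Return (rows, bytes) for a fact table with one row per store/active product/period."""
    active_products = int(products * active_share)
    rows = stores * active_products * periods
    return rows, rows * fields * bytes_per_field

monthly = fact_table_size(1000, 10_000, 24, 0.5, 6, 4)
daily = fact_table_size(1000, 10_000, 24 * 30, 0.5, 6, 4)
print(monthly)  # (120000000, 2880000000)
print(daily)    # (3600000000, 86400000000)
```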

Online Analytical Processing
Online analytical processing (OLAP): an advanced data analysis environment that supports decision making, business modeling, and operations research. OLAP systems share four main characteristics: they use multidimensional data analysis techniques, provide advanced database support, provide easy-to-use end-user interfaces, and support client/server architecture.

Operational vs. Multidimensional View of Sales

Another example: Simple Star Schema

Possible Attributes for Sales Dimensions

Three-Dimensional View of Sales

Slice and Dice View of Sales

Location Attribute Hierarchy

Star Schema for Sales

Orders Star Schema

Normalized Dimension Tables

Multiple Fact Tables

On-Line Analytical Processing (OLAP) Tools
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
Relational OLAP (ROLAP): traditional relational representation.
Multidimensional OLAP (MOLAP): cube structure.
OLAP operations: cube slicing, which produces a 2-D view of the data; drill-down, which goes from summary to more detailed views.
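Cube slicing can be sketched by holding one dimension fixed. Here a tiny cube is a Python dictionary keyed by `(product, region, month)`; the data and the `slice_cube` name are hypothetical, and a MOLAP engine would use a dedicated cube structure rather than a dict.

```python
# A tiny sales cube: (product, region, month) -> units sold. Values are made up.
cube = {
    ("Hammer", "East", "Jan"): 10, ("Hammer", "West", "Jan"): 4,
    ("Hammer", "East", "Feb"): 6, ("Primer", "East", "Jan"): 7,
    ("Primer", "West", "Feb"): 3,
}

def slice_cube(cube, month):
    """Fix the month dimension, leaving a 2-D product x region view."""
    return {(p, r): units for (p, r, m), units in cube.items() if m == month}

jan = slice_cube(cube, "Jan")
for (product, region), units in sorted(jan.items()):
    print(product, region, units)
```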

Figure 11-23 Slicing a data cube

Figure 11-24 Example of drill-down (panels: summary report; drill-down with color added)
Starting with summary data, users can obtain details for particular cells.

Implementing a Data Warehouse
Numerous constraints apply: available funding; management’s view of the role played by an IS department and of the extent and depth of the information requirements; and corporate culture. No single formula can describe perfect data warehouse development.

Factors Common to Data Warehousing
A data warehouse is not a static database; it is a dynamic framework for decision support that is always a work in progress. Data warehouse data cross departmental lines and geographical boundaries. The warehouse must satisfy data integration and loading criteria, data analysis capabilities with acceptable query performance, and end-user data analysis needs. Apply database design procedures.

Data Mining
Tools that analyze data, uncover problems or opportunities hidden in data relationships, form computer models based on their findings, and then use the models to predict business behavior. They require minimal end-user intervention.

Extraction of Knowledge From Data

Data-Mining Phases

A Sample of Current Data Warehousing and Data-Mining Vendors

Data Mining and Visualization
Knowledge discovery using a blend of statistical, AI, and computer graphics techniques.
Goals: explain observed events or conditions; confirm hypotheses; explore data for new or unexpected relationships.
Techniques: statistical regression, decision tree induction, clustering and signal processing, affinity, sequence association, case-based reasoning, rule discovery, neural nets, fractals.
Data visualization: representing data in graphical/multimedia formats for analysis.
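Affinity analysis, one of the listed techniques, is commonly framed as market-basket analysis: how often items occur together (support) and how often one item implies another (confidence). This is a minimal sketch of those two measures on made-up baskets, not a full association-rule miner such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions; each basket is a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

def pair_support(baskets):
    """Fraction of baskets containing each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    with_a = [b for b in baskets if antecedent in b]
    if not with_a:
        return 0.0
    return sum(consequent in b for b in with_a) / len(with_a)

support = pair_support(baskets)
print(support[("bread", "milk")])  # 0.5
print(confidence(baskets, "bread", "butter"))  # 2 of the 3 bread baskets contain butter
```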

Summary
Data analysis is used to derive and interpret information from data. Decision support is a methodology designed to extract information from data and to use such information as a basis for decision making. A decision support system is an arrangement of computerized tools used to assist managerial decision making within a business. A data warehouse is an integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making.

Summary (continued)
Online analytical processing is an advanced data analysis environment that supports decision making, business modeling, and operations research. A star schema is a data-modeling technique used to map multidimensional decision support data into a relational database. The implementation of any company-wide information system is subject to conflicting organizational and behavioral factors.

Summary (continued)
Data mining automates the analysis of operational data with the intention of finding previously unknown data characteristics, relationships, dependencies, and/or trends. A data warehouse is the storage location for decision support data.
