Published by Brent Potter. Modified over 9 years ago.
1
Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor
Professor’s Notes: In this lecture you will learn about data warehouses. More and more businesses are creating decision support systems, in which the data warehouse is an essential component. A data warehouse has a different architecture than a transactional relational database, so we will explore the differences between the two and how a data warehouse is used to advance information into business knowledge.
2
In this lecture, you will learn:
How operational data and decision support data differ
What a data warehouse is and how its data are prepared
What star schemas are and how they are constructed
What steps are required to implement a data warehouse successfully
What data mining is and what role it plays in decision support

The key points of this lecture are how operational data from a transactional system differ from decision support data, the basic construct of the data warehouse, the steps to implementing a data warehouse, and how data mining is used in decision support.
3
The Need for Data Analysis
External and internal forces require tactical and strategic decisions
Search for competitive advantage
Business environments are dynamic
Decision-making cycle time is reduced
Different managers require different decision support systems (DSS)

Businesses need to make both tactical (short-term) and strategic (long-term) decisions, and they are always searching for a competitive advantage in the marketplace. Because the business environment is dynamic, the time a business has to make decisions is shortened. Decision support systems can help by providing data “analytics” that are positioned against various time intervals. Different managers within the organization require different types of business knowledge, or intelligence.
4
Decision Support Systems
Decision support
  Is a methodology
  Extracts information from data
  Uses information as basis for decision making

Decision support is based on a methodology: a combination of processes and technologies that extracts information from data, and knowledge from information. The idea is to use the decision support data as the basis for making decisions.
5
Decision Support Systems
Decision support system (DSS)
  Arrangement of computerized tools
  Used to assist managerial decisions
  Extensive data “massaging” to produce information
  Used at all levels in organization
  Tailored to focus on specific areas and needs
  Interactive
  Provides ad hoc query tools

A DSS is a complex environment that involves many computers and analytic software. Data are said to be “massaged” in order to advance the IT value proposition from data to information, and then from information to knowledge.
6
DSS Components
Figure 13.1
Here are the components of the DSS. It starts with operational data, the transactional data found in your organization’s relational database. The data is extracted and reformatted into the business data, or the data store. From the data store, business model data and rules can be applied, and the end user can generate graphic representations of the analyzed data using “data visualization” tools.
7
Operational vs. Decision Support Data
Operational data
  Relational, normalized database
  Optimized to support transactions
  Real-time updates
DSS data
  Snapshot of operational data
  Summarized
  Large amounts of data
  Data analyst viewpoint
    Timespan
    Granularity
    Dimensionality

There is a big difference between what is considered “operational” data and “DSS” data. Operational data is normalized and stored in a relational database, and is optimized for transactions from the customer or user. DSS data is a “snapshot” of the operational data. It is summarized and generally involves large amounts of data. DSS data have characteristics that operational data do not have: principally, the data is viewed from the standpoint of timespan, granularity, and dimensionality.
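The contrast between transaction-level rows and a summarized snapshot can be illustrated with a small sketch. The sample rows, the `summarize` helper, and the column layout below are all hypothetical and are not taken from the lecture; the point is only that the same operational data can be viewed at different granularities (month, year) and along different dimensions (region).

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational rows: (sale_date, region, product, amount).
operational_sales = [
    (date(2024, 1, 5), "East", "Widget", 120.0),
    (date(2024, 1, 17), "East", "Widget", 80.0),
    (date(2024, 1, 20), "West", "Gadget", 200.0),
    (date(2024, 2, 2), "East", "Gadget", 50.0),
    (date(2024, 2, 9), "West", "Widget", 75.0),
]

# Position of each dimension column inside a row.
COLUMN = {"region": 1, "product": 2}


def summarize(rows, time_key, dims):
    """Total the amounts per (time bucket, dimension values) combination."""
    totals = defaultdict(float)
    for row in rows:
        key = (time_key(row[0]),) + tuple(row[COLUMN[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


def by_month(d):
    return (d.year, d.month)


def by_year(d):
    return d.year


# Finer granularity, one dimension: monthly totals per region.
monthly_by_region = summarize(operational_sales, by_month, ("region",))
# Coarser granularity, no dimensions: one total per year.
yearly_total = summarize(operational_sales, by_year, ())

print(monthly_by_region)
print(yearly_total)
```

Changing `time_key` moves up or down in granularity, and changing `dims` adds or removes dimensions, without touching the underlying transaction rows.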
8
The DSS Database Requirements
Database schema
  Support complex (non-normalized) data
  Extract multidimensional time slices
Data extraction and filtering
End-user analytical interface
Database size
  Very large databases (VLDBs)
  Contains redundant and duplicated data

The database for the DSS has a different architecture, or schema, than the relational architecture found in the database that hosts the operational data. It has to support complex, non-normalized data, and it has to be able to extract multidimensional slices of data based on time. So data extraction and filtering become important, along with an end-user interface that is able to run “analytics” on the data, or “data mine” it. The database is also generally very large, because it holds the accumulation of the operational data.
9
Data Warehouse
Integrated
  Centralized
  Holds data retrieved from entire organization
Subject-Oriented
  Optimized to give answers to diverse questions
  Used by all functional areas
Time Variant
  Flow of data through time
  Projected data
Non-Volatile
  Data never removed
  Always growing

Here are some characteristics of the data warehouse. It is integrated: centralized, holding the data for the entire organization. It is subject oriented, optimized to answer diverse questions from all functional areas. It is time variant, showing the flow of the data through time. And it is non-volatile: data is never deleted or removed, so the warehouse is always growing from the continual feed of operational data.
10
Creating a Data Warehouse
Figure 13.3
This illustration shows the process of constructing a data warehouse. From the operational data sources, you use extraction, transformation, and loading (ETL) tools to extract, clean, and load the data into the data warehouse schema. The schema used in most data warehouses is not the normalized relational design used for operational data, but what is generally referred to as a “star” schema.
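The extract, clean, and load steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the process shown in Figure 13.3: the `raw_orders` records, the cleaning rules, and the `warehouse_orders` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extract from an operational source; values are deliberately messy.
raw_orders = [
    {"order_id": "1001", "region": " east ", "amount": "120.50"},
    {"order_id": "1002", "region": "WEST", "amount": "80"},
    {"order_id": "1002", "region": "WEST", "amount": "80"},   # duplicate row
    {"order_id": "1003", "region": "East", "amount": "n/a"},  # unusable amount
]


def extract():
    """Extract: pull rows from the (here, in-memory) operational source."""
    return list(raw_orders)


def transform(rows):
    """Transform: drop unusable and duplicate rows, normalize region names."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicates of an order already kept
        seen.add(order_id)
        clean.append((order_id, row["region"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO warehouse_orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT order_id, region, amount FROM warehouse_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Real ETL tools add scheduling, logging, and incremental loads; the sketch only shows where cleaning sits between the source and the warehouse.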
11
Data Marts
Single-subject data warehouse subset
Decision support to small group
Can serve as a test for exploring potential benefits of data warehouses
Address local or departmental problems

Data marts are single-subject “subsets” of the data warehouse that serve a single entity, usually a business entity such as a department. So the marketing department, the sales department, the finance department, the human resources department, and so on, would each have their own data mart.
12
DSS Architectural Styles
Traditional mainframe-based OLTP
Managerial information system (MIS) with 3GL
First-generation departmental DSS
First-generation enterprise data warehouse using RDBMS
Second-generation data warehouse using MDBMS

There are various styles and configurations of architecture for the DSS. They range from the legacy mainframe systems, to the third-generation language (3GL) management information systems, through the first-generation departmental DSS and the first-generation enterprise data warehouse using a relational design, to the second-generation data warehouse using multidimensional database management systems. Data warehouse concepts are continuing to grow.
13
Online Analytical Processing (OLAP)
Advanced data analysis environment
Supports decision making, business modeling, and operations research activities
Characteristics of OLAP
  Use multidimensional data analysis techniques
  Provide advanced database support
  Provide easy-to-use end-user interfaces
  Support client/server architecture

The online analytical processing (OLAP) environment is different from the online transaction processing (OLTP) environment. The OLAP environment supports decision making, business modeling, and operations research, and has the ability to use multidimensional data analysis techniques.
14
OLAP Client/Server Architecture
Figure 13.6
Here is the basic OLAP environment implemented on a client/server platform architecture. The OLAP system can take feeds from both operational data and data warehouse data. This allows the OLAP system to present to the user, through a GUI, both analytical processing and data processing logic drawn from the operational data and the data warehouse data.
15
OLAP Server Arrangement
Figure 13.7
This illustration depicts the OLAP server configuration. The thing to note here is that the OLAP server, or “engine,” provides the front end to the data warehouse.
16
OLAP Server with Multidimensional Data Store Arrangement
Figure 13.8
This illustration shows an advanced architecture for OLAP systems in which the OLAP “engine” can construct reporting from multidimensional data in the data warehouse. On the right, you will see the multiple GUIs that can be built to access the OLAP engine.
17
OLAP Server with Local Mini-Data-Marts
Figure 13.9
A final complexity added to the architecture is the addition of local data marts built to interface with the OLAP GUIs. The local data marts cache the data “cube” that is built when the data is represented in multidimensional form. The data cubes may therefore be built to support the business intelligence needs of various departments within the organization.
18
Relational OLAP (ROLAP)
OLAP functionality
Uses relational DB query tools
Extensions to RDBMS
  Multidimensional data schema support
  Data access language and query performance optimized for multidimensional data
  Support for very large databases (VLDBs)

When OLAP functionality uses relational query tools (SQL-based tools) that have “extensions” built in to support the data warehouse’s multidimensional data schema, we call it Relational Online Analytical Processing, or ROLAP for short.
19
Typical ROLAP Client/Server Architecture
Figure 13.10
A typical ROLAP environment implemented on client/server architecture would look like this.
20
Multidimensional OLAP (MOLAP)
OLAP functionality extended to multidimensional databases (MDBMS)
Data stored in multidimensional data cube
N-dimensional cubes called hypercubes
Cube cache memory speeds processing
Affected by how the database system handles sparsity, a measure of the density of the data held in the data cube

Then we have Multidimensional Online Analytical Processing, or MOLAP. This is when OLAP functionality extends to the multidimensional data cube. In this architecture, the caching of the data cube is done on the MOLAP server, on the MOLAP client, or both. MOLAP databases are typically known to be faster than ROLAP databases.
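To make the idea of cube density concrete, here is a small sketch with hypothetical dimension members and cell values. It stores only the populated cells of a three-dimensional cube and compares that with the full dense layout. Treating sparsity as the fraction of empty cells is an assumption made for this illustration; products and textbooks define the measure in slightly different ways.

```python
from itertools import product

# Hypothetical dimension members (not from the lecture).
products = ["Widget", "Gadget", "Gizmo"]
regions = ["East", "West"]
months = ["Jan", "Feb"]

# Sparse representation: only cells that actually hold a sales value are stored.
cube = {
    ("Widget", "East", "Jan"): 120.0,
    ("Widget", "West", "Feb"): 75.0,
    ("Gadget", "East", "Feb"): 50.0,
}

# Number of cells the full cube would have if every combination were stored.
total_cells = len(products) * len(regions) * len(months)

# Density: share of cells that are populated. Sparsity here: share that are empty.
density = len(cube) / total_cells
sparsity = 1 - density

# Dense representation: every combination gets a slot, empty ones hold 0.0.
dense = [cube.get(cell, 0.0) for cell in product(products, regions, months)]

print(total_cells, len(cube), density, sparsity)
```

With three populated cells out of twelve, the dense layout spends most of its space on empty slots, which is why the way a multidimensional engine handles sparse cubes affects both storage and speed.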
21
MOLAP Client/Server Architecture
Figure 13.11
Compare and contrast this client/server platform architecture implementation of a MOLAP system with that of the ROLAP system.
22
Star Schema
Data-modeling technique
Maps multidimensional decision support data into a relational database
Yields a model for multidimensional data analysis while preserving relational structure of operational DB
Four components:
  Facts
  Dimensions
  Attributes
  Attribute hierarchies

We said that the principal schema design used to host the data warehouse data is the “star” schema. The star schema is a different “topology” than the relational (normalized) schema. A star schema gives up what is effectively 100% indexing. It has four components: fact tables, dimension tables, attributes, and attribute hierarchies.
23
Simple Star Schema
Figure 13.12
Here is a simple example of a star schema. You see the fact table in the middle, with dimension tables (location, product, and time) linked to it.
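A fact table surrounded by location, product, and time dimensions can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are hypothetical and are not copied from Figure 13.12; the sketch only shows the shape of the design and a typical query that joins the fact table to two of its dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE location (loc_id  INTEGER PRIMARY KEY, region TEXT, city TEXT);
CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: numeric measures keyed by one foreign key per dimension.
CREATE TABLE sales (
    loc_id  INTEGER REFERENCES location(loc_id),
    prod_id INTEGER REFERENCES product(prod_id),
    time_id INTEGER REFERENCES time_dim(time_id),
    units   INTEGER,
    amount  REAL,
    PRIMARY KEY (loc_id, prod_id, time_id)
);

INSERT INTO location VALUES (1, 'East', 'Boston'), (2, 'West', 'Denver');
INSERT INTO product  VALUES (1, 'Tools', 'Widget'), (2, 'Toys', 'Gadget');
INSERT INTO time_dim VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO sales VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 1,  5, 250.0),
    (2, 1, 2,  8,  80.0),
    (1, 1, 2,  4,  40.0);
""")

# Total sales amount by region and month: the fact table joined to two dimensions.
sales_by_region_month = conn.execute("""
    SELECT l.region, t.month, SUM(s.amount)
    FROM sales s
    JOIN location l ON l.loc_id = s.loc_id
    JOIN time_dim t ON t.time_id = s.time_id
    GROUP BY l.region, t.month
    ORDER BY l.region, t.month
""").fetchall()

print(sales_by_region_month)
```

Attributes such as `region` and `city`, or `year` and `month`, are what allow the same facts to be grouped at different levels of an attribute hierarchy.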
24
Slice and Dice View of Sales
Figure 13.14
The nice thing about this construct is that you can see different “views” of the data: you can “slice and dice” the data into a variety of views.
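One way to picture slicing and dicing is to treat the cube as a mapping from (product, location, time) cells to sales values. The cube contents and the `slice_cube` and `dice_cube` helpers below are hypothetical names introduced for this sketch. It assumes the common usage in which a slice fixes one member of a dimension and a dice keeps a sub-cube defined by sets of members.

```python
# Hypothetical cube: (product, location, time) -> sales amount.
cube = {
    ("Widget", "East", "Jan"): 100.0,
    ("Widget", "West", "Jan"): 60.0,
    ("Gadget", "East", "Jan"): 250.0,
    ("Widget", "East", "Feb"): 40.0,
    ("Gadget", "West", "Feb"): 90.0,
}

# Order of the dimensions inside each cell key.
AXES = ("product", "location", "time")


def slice_cube(cube, **fixed):
    """Slice: keep the cells where each named dimension equals one member."""
    return {
        cell: value
        for cell, value in cube.items()
        if all(cell[AXES.index(axis)] == member for axis, member in fixed.items())
    }


def dice_cube(cube, **allowed):
    """Dice: keep the cells whose members fall inside the allowed sets."""
    return {
        cell: value
        for cell, value in cube.items()
        if all(cell[AXES.index(axis)] in members for axis, members in allowed.items())
    }


# A manager interested in one month looks at the January slice.
january = slice_cube(cube, time="Jan")

# A regional manager looks at a sub-cube: the East location, two products.
east_subcube = dice_cube(cube, location={"East"}, product={"Widget", "Gadget"})

print(sum(january.values()), sum(east_subcube.values()))
```

Each view is computed from the same underlying cells, which is what lets different managers see different perspectives without separate copies of the data.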
25
Star Schema Representation
Facts and dimensions represented by physical tables in data warehouse DB
Fact table related to each dimension table (M:1)
Fact and dimension tables related by foreign keys
Subject to the primary/foreign key constraints

Here are the basics of a star schema’s representation of the data. Facts and dimensions are stored in physical tables. The fact table is related to each dimension table in an M:1 relationship, and the tables are related by foreign keys that are subject to the primary/foreign key constraints.
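The M:1 relationship and its key constraints can be demonstrated with SQLite through Python's `sqlite3` module. The `product` and `sales` tables are hypothetical, and the `PRAGMA foreign_keys = ON` statement is specific to SQLite, which does not enforce foreign keys unless they are switched on for the connection. Other database systems enforce them differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    prod_id INTEGER NOT NULL REFERENCES product(prod_id),  -- many facts : one product
    amount  REAL
);
INSERT INTO product VALUES (1, 'Widget');
INSERT INTO sales VALUES (1, 1, 100.0), (2, 1, 40.0);
""")

# A fact row that points at a product that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO sales VALUES (3, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Many fact rows relate to the single dimension row for product 1.
facts_per_product = conn.execute(
    "SELECT prod_id, COUNT(*) FROM sales GROUP BY prod_id"
).fetchall()

print(orphan_rejected, facts_per_product)
```

The rejected insert shows why every fact must reference an existing dimension member: an orphaned fact could never be grouped or labeled by that dimension.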
26
Star Schema for Sales
Figure 13.17
This is an example of a star schema at the table level. You see the SALES “fact table” in the middle, with relations to the location, customer, product, and time dimension tables.
27
Performance-Improving Techniques for Star Schema
Normalization of dimensional tables
Multiple fact tables representing different aggregation levels
Denormalization of the fact tables
Table partitioning and replication

There are several ways to improve the performance of the star schema. One is to normalize the dimension tables. Another is to create multiple fact tables that represent different aggregation levels. A third is to denormalize the fact tables. Finally, there are “physical” methods, such as partitioning and replicating tables, for example partitioning the fact table.
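The idea of keeping fact tables at more than one aggregation level can be sketched as follows. The `sales_daily` and `sales_monthly` tables and their rows are hypothetical. The sketch precomputes a monthly summary from the daily detail and checks that a monthly question gets the same answer from either table, with the summary scanning fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail-level fact table: one row per day and region.
CREATE TABLE sales_daily (sale_date TEXT, region TEXT, amount REAL);
INSERT INTO sales_daily VALUES
    ('2024-01-05', 'East', 100.0),
    ('2024-01-17', 'East', 250.0),
    ('2024-01-20', 'West',  60.0),
    ('2024-02-02', 'East',  40.0),
    ('2024-02-09', 'West',  90.0);

-- Aggregate-level fact table: one row per month and region, computed once.
CREATE TABLE sales_monthly AS
SELECT substr(sale_date, 1, 7) AS sale_month, region, SUM(amount) AS amount
FROM sales_daily
GROUP BY substr(sale_date, 1, 7), region;
""")

# The same question answered from the detail table...
from_detail = conn.execute(
    "SELECT SUM(amount) FROM sales_daily "
    "WHERE sale_date LIKE '2024-01%' AND region = 'East'"
).fetchone()[0]

# ...and from the precomputed aggregate table.
from_aggregate = conn.execute(
    "SELECT amount FROM sales_monthly "
    "WHERE sale_month = '2024-01' AND region = 'East'"
).fetchone()[0]

monthly_rows = conn.execute("SELECT COUNT(*) FROM sales_monthly").fetchone()[0]

print(from_detail, from_aggregate, monthly_rows)
```

The trade-off is extra storage and the need to refresh the aggregate table whenever new detail rows are loaded.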
28
Data Warehouse Implementation Road Map
Figure 13.21
This is a road map to the implementation of a data warehouse. In effect, this is the life cycle model of development that you would follow if you were creating a data warehouse for a DSS.
29
Data Mining
Seeks to discover unknown data characteristics
Automatically searches data for anomalies and relationships
Data mining tools
  Analyze data
  Uncover problems or opportunities
  Form computer models based on findings
  Predict business behavior with models
  Require minimal end-user intervention

Data mining is what we do to the data once we have it in a multidimensional format. It seeks to discover patterns, trends, and things unknown about the data.
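As a very small illustration of automatically searching data for anomalies, the sketch below flags values that sit far from the mean of a series. The sales figures, the `find_anomalies` helper, and the threshold of two standard deviations are all assumptions made for this example. Real data mining tools use far richer models, so this is only a stand-in for the idea.

```python
from statistics import mean, pstdev

# Hypothetical daily sales totals for one store.
daily_sales = {
    "Mon": 102.0, "Tue": 98.0, "Wed": 101.0, "Thu": 99.0,
    "Fri": 100.0, "Sat": 240.0, "Sun": 97.0,
}


def find_anomalies(series, threshold=2.0):
    """Return the entries whose z-score magnitude exceeds the threshold."""
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}  # all values identical, so nothing stands out
    return {
        label: round((value - mu) / sigma, 2)
        for label, value in series.items()
        if abs(value - mu) / sigma > threshold
    }


anomalies = find_anomalies(daily_sales)
print(anomalies)
```

The analyst does not tell the routine which day to look at; the unusual value is surfaced from the data itself, which is the sense in which such tools require minimal end-user intervention.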
30
Extraction of Knowledge from Data
Figure 13.22
Study this illustration of the IT value proposition. At the bottom of the pyramid you have Data, which is transformed into Information, and the final transformation is to Knowledge. The illustration identifies the various technologies and processes associated with each transformation.
31
Data Mining Process
Figure 13.23
This illustration shows the data mining process. There are essentially four phases, beginning with the Data Preparation phase and ending with the Prognosis phase.