McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, All Rights Reserved Chapter 8 Accessing Organizational Information – Data Warehouse
8-2 Learning Outcomes 8.1 Describe the roles and purposes of data warehouses and data marts in an organization 8.2 Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases
8-3 Learning Outcomes 8.3 Identify the importance of ensuring the cleanliness of information throughout an organization 8.4 Explain the relationship between business intelligence and a data warehouse
8-4 Intro to Data Warehousing This video must be watched Data warehouse demystified
8-5 History of Data Warehousing Data warehouses extend the transformation of data into information In the 1990’s executives became less concerned with the day-to-day business operations and more concerned with overall business functions The data warehouse provided the ability to support decision making without disrupting the day-to-day operations
8-6 Data Warehouse Fundamentals Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes
8-7 Data Warehouse – Definitions A data warehouse is a type of computer based information system developed to provide an organization with business intelligence to support decision making and to monitor the operations in a company. Integrates data from many different sources and makes it available to end users in a what they can understand and use in a business context in a timely manner.
8-8 Primary difference between a database and data warehouse database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information
8-9 Data Warehouse Fundamentals Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse Data mart – contains a subset of data warehouse information. The ETL process also gathers data from the data warehouse and passes it to the data marts
8-10 Data Warehouse: Extended Definition Data Warehouse External Data Operation Data ……………… Reporting AnalysisSpreadsheets Charts “Sales”
8-11 Data Warehouse Fundamentals
8-12 Components of a Data Warehouse Metadata means data about data
8-13 Multidimensional Analysis and Data Mining Databases contain information in a series of two-dimensional tables In a data warehouse and data mart, information is multidimensional, it contains layers of columns and rows
8-14 Dimensions Dimensions could include such things as: –Products –Promotions –Stores –Category –Region –Stock price –Date –Time –Weather
8-15 slicing-and-dicing –The ability to look at information from different dimensions can add tremendous business insight –By slicing-and-dicing the information a business can uncover great unexpected insights
8-16 Multidimensional Analysis and Data Mining Cube – common term for the representation of multidimensional information
8-17 Data Warehouses Are Multidimensional A Multidimensional Data Warehouse with Information from Multiple Operational Databases
8-18 Online analytical processing (OLAP) OLAP is is an approach to quickly answer multi-dimensional analytical queries The typical applications of OLAP are in business reporting for sales, marketing, management reporting
8-19 OLAP The output of an OLAP query is typically displayed in a matrix (or pivot) format. Excel pivot table is a simple analogy to OLAP.
8-20 Data purchased from outside source for Data Warehousing
8-21 Data Marts – Smaller Data Warehouses Data mart - a subset of a data warehouse in which only a focused portion of the data warehouse information is kept.
8-22 Data Mart Examples
8-23 Multidimensional Analysis and Data Mining Data mining – the process of analyzing data to extract information not offered by the raw data alone To perform data mining users need data-mining tools –Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making
8-24 Data-Mining Systems Process data using statistical techniques like regression analysis and decision tree analysis Look for patterns and relationships to anticipate events or predict outcomes –Example: Market-basket analysis – computes correlation of items on past orders to determine items that are frequently purchased together.
8-25 Decision Tree Figure CE14-3
8-26 Example of Data Mining Several years ago, a large retailer implemented a data warehouse to analyze sales. Loaded huge volumes (Terabytes) of Point of Sale data into the warehouse Built an application, based on specialized ‘data mining’ software to perform ‘market basket analysis’ –What items are purchased with other items in the same in the same transactions
8-27 “Diapers and Beer” Noticed some unusual correlations, one was many transactions where beer in same market basket as diapers Analysis identified a ‘micro-segment’ of customer base – young fathers, buying diapers, deciding to get beer at same time Based on information, retailer reorganized diaper aisle – placed beer at end on aisle Beer sales increased.
8-28 Problems with Operational Data for BI Raw data usually unsuitable for sophisticated reporting or data mining Non-Standard Dirty data ( gender, age, phone #, misspelling, address) Values may be missing ( gender….) Inconsistent data ( time zone) Data can be too fine ( clickstream) or too coarse (totals)
8-29 Information Cleansing or Scrubbing An organization must maintain high-quality data in the data warehouse Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
8-30 Information Cleansing or Scrubbing Contact information in an operational system
8-31 Information Cleansing or Scrubbing Standardizing Customer name from Operational Systems
8-32 Cleaning of raw data Cleaning is a process used to remove unwanted ‘noise’ from data and to make data more presentable Consider the following shipping address: –Address 1: 123 Main * deliver to rear entrance * –City: Antigonish –Postal Code: B2G 2A1 –Province: NS –Cleaning process would remove the *comment * in the address field
8-33 Standardization of data Consider the following addresses: These should be standardized as: –123 Main, Antigonish, Nova Scotia, B2G 2A1 StreetCityPostal Code 123 MainAntigonish 123 MainB2G 2A1
8-34 Information Cleansing or Scrubbing Information cleansing activities
8-35 Information Cleansing or Scrubbing Accurate and complete information
8-36 Business Intelligence (BI) Systems Provide information for improving decision making hence competitive advantage Primary systems: –Reporting systems –Data-mining systems
8-37 What is a report? It’s a program that retrieves data from a database, formats and presents it to a user Can allow filtering, sorting, aggregation A report can be: –displayed online –exported to excel or other software – ed as an attachment –printed Data Report program Lists, charts, spreadsheets
8-38 Reporting Systems Integrate data from multiple sources Process data by sorting, grouping, summing, averaging, and comparing Results formatted into reports Improve decision making by providing right information to right user at right time
8-39 Tools for looking at data Digital Dashboard Instead of paper reports, warehouses give users tools for looking at data differently. They also allow those users to manipulate their data. There are times when a color coded ‘speedometer’ speaks volumes over a simple paper report. An interactive table that allows the user to drill down into detail data with the click of a mouse can answer questions that might take months to answer in a traditional system.
8-40 Freedom from IS department resource limitations One of the problems with computer systems is that they usually require computer experts to use them. When a report is needed, the requesting manager calls the IS department. IS then assigns a programmer to write a program to produce the report. The report can be created in a few days or, in extreme cases, in over a year. With a warehouse, users create most of their reports themselves. Thus, if a manager needs a report for a meeting in half an hour, they, or their assistant, can create that report in a matter of minutes.