CISB594 – Business Intelligence

1 CISB594 – Business Intelligence
Data Warehouse Part I

2 Reference Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.

3 Objectives
At the end of this lecture, you should be able to:
- Understand the basic definitions and concepts of data warehouses
- Understand how a data warehouse differs from an operational database
- Describe the characteristics of a data warehouse
- Describe the data warehouse process overview
- Describe the different types of data warehouse architectures

4 Data Warehouse
- “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time” (Inmon)
- A copy of transaction data specifically structured for query and analysis (Kimball)
- A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. (Wikipedia)

5 Data Warehouse
- A decision support database that is maintained separately from the organization’s operational database
- Supports information processing by providing a solid platform of consolidated, historical data for analysis
In your own words?

6 4 main characteristics of data warehousing
1. Subject oriented
- Organized around major subjects, such as sales progress
- Contains only information relevant for decision support
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
- Provides a simple and concise view around particular subject issues

7 4 main characteristics of data warehousing
1. Subject oriented
For example, to learn more about your company's sales, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

8 4 main characteristics of data warehousing
2. Integrated
- Constructed by integrating multiple, heterogeneous data sources
- Must place data from different sources into a consistent format; to do so, it must resolve naming conflicts and discrepancies
- Data cleaning and data integration techniques are applied
- Ensures consistency in naming conventions among different data sources
- When data is moved to the warehouse, it is converted
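
The integration step can be illustrated with a minimal sketch. It assumes two hypothetical source systems ("crm" and "billing") that describe the same facts under different field names and codes. All field names, codes, and the `to_warehouse_format` helper are made up for illustration and do not come from the lecture.

```python
# Hypothetical source records: the same kind of fact, named and coded differently.
crm_rows = [{"cust_id": 1, "gender": "M", "sale_amt": "120.50"}]
billing_rows = [{"CustomerNo": 2, "Sex": "female", "Amount": 80}]

# One warehouse naming convention; each source's field names are mapped onto it.
FIELD_MAPS = {
    "crm": {"cust_id": "customer_id", "gender": "gender", "sale_amt": "amount"},
    "billing": {"CustomerNo": "customer_id", "Sex": "gender", "Amount": "amount"},
}

# One warehouse coding convention for gender values.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def to_warehouse_format(row, source):
    """Convert one source row into the (assumed) consistent warehouse format."""
    out = {FIELD_MAPS[source][key]: value for key, value in row.items()}
    out["gender"] = GENDER_CODES[str(out["gender"]).lower()]  # resolve coding conflict
    out["amount"] = float(out["amount"])                      # resolve type discrepancy
    out["source"] = source                                    # keep track of origin
    return out


integrated = [to_warehouse_format(r, "crm") for r in crm_rows]
integrated += [to_warehouse_format(r, "billing") for r in billing_rows]

for row in integrated:
    print(row)
```

After conversion, both rows share the same field names, gender codes, and numeric type, so they can be stored and queried together.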

9 4 main characteristics of data warehousing
3. Time variant (time series)
- Maintains historical data; data for analysis, drawn from multiple sources, contain multiple time points
- A data warehouse focuses on change over time
- The time horizon for the data warehouse is significantly longer than that of operational systems
- Operational database: current-value data
- Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
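
The contrast between current-value data and historical data can be sketched as follows. The product, dates, and prices are invented for illustration, and the two in-memory structures only stand in for an operational database and a warehouse.

```python
from datetime import date

operational_price = {}   # stands in for the operational database: current value only
warehouse_history = []   # stands in for the warehouse: one row per time point


def record_price(product, snapshot_date, price):
    """Apply a price change to both (hypothetical) stores."""
    operational_price[product] = price                         # old value is overwritten
    warehouse_history.append((product, snapshot_date, price))  # old values are kept


record_price("widget", date(2022, 1, 1), 10.0)
record_price("widget", date(2023, 1, 1), 12.0)
record_price("widget", date(2024, 1, 1), 15.0)


def price_trend(product):
    """Return (year, price) pairs in date order: only the history can answer this."""
    rows = sorted((d, p) for prod, d, p in warehouse_history if prod == product)
    return [(d.year, p) for d, p in rows]


print(operational_price["widget"])
print(price_trend("widget"))
```

The operational store can only say what the price is now, while the history can show how it changed over time.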

10 4 main characteristics of data warehousing
4. Non-volatile
- After data are entered into a data warehouse, users cannot change or update the data
- Operational update of data does not occur in the data warehouse environment
- Does not require transaction processing, recovery, and concurrency control mechanisms
- Requires only two operations in data accessing: initial loading of data and access of data
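
The "two operations only" idea can be sketched with a small class that offers loading and reading but no update or delete. `NonVolatileStore` is a hypothetical name, and the restriction is enforced only by the interface, not by a real DBMS.

```python
class NonVolatileStore:
    """Hypothetical warehouse table exposing only the two operations named above."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Initial (or periodic) loading: rows are appended, never changed in place."""
        self._rows.extend(tuple(row) for row in rows)

    def query(self, predicate=None):
        """Read access: returns an immutable tuple of matching rows."""
        return tuple(r for r in self._rows if predicate is None or predicate(r))


store = NonVolatileStore()
store.load([("2023-01", "north", 100.0), ("2023-02", "north", 120.0)])
store.load([("2023-03", "south", 90.0)])

north = store.query(lambda row: row[1] == "north")
print(north)
```

Because users never update rows in place, the store needs none of the locking or rollback logic that an operational system requires.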

11 Summary of Data Warehouse
- Runs on a DBMS such as Oracle, SQL Server, DB2 …
- Keeps a large amount of data from different points in time for a long period of time
- Data in a data warehouse cannot be overwritten by users
- Data comes from various sources, both internal and external
- Carefully designed to allow for analysis and pattern discovery on an identified subject matter

12 OLTP
OLTP (online transaction processing)
- Major task of a traditional relational DBMS
- Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
- Database type: operational

13 OLAP
- Online Analytical Processing (OLAP) is a reporting application that provides high-performance analysis and easy reporting on large volumes of data
- The goal of OLAP: multidimensional data analysis, providing fast and flexible data summarization, analysis, and reporting capabilities
- Ability to view trends over time
- Type of database: data warehouse

14 OLTP vs OLAP
Aspect | OLTP | OLAP
Users | Clerk, IT professional | Knowledge worker
Function | Day-to-day operations | Decision support
DB Design | To suit the typical database functions of update, edit, delete; relational | Designed for reporting on subjects; data warehouse
Data | Current, up-to-date, detailed | Historical, summarized, multidimensional, integrated, consolidated
Usage | Repetitive, structured | Ad hoc, unstructured
Access | Read/write | Read; lots of scans
Type of Work | Short, simple transaction | Complex query
# Records Accessed | Tens | Millions
# Users | Thousands | Hundreds, tens
DB Size | 100MB-GB | 100GB-TB

15 What the database looks like for the two types
The operational database (relational):

16 What the database looks like for the two types
The data warehouse (star schema):
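
As a minimal sketch of a star schema, the example below uses Python's built-in `sqlite3` module with one fact table surrounded by three dimension tables. All table names, column names, and data values are hypothetical and chosen only to answer a question such as "Who was our best customer for this item last year?"

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context (hypothetical names and values).
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: numeric measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    amount       REAL
);

INSERT INTO dim_customer VALUES (1, 'Aminah'), (2, 'Bala');
INSERT INTO dim_product  VALUES (1, 'Laptop'), (2, 'Mouse');
INSERT INTO dim_date     VALUES (20230115, 2023, 1), (20230220, 2023, 2);
INSERT INTO fact_sales   VALUES
    (1, 1, 20230115, 3000.0),
    (2, 1, 20230220, 3200.0),
    (2, 2, 20230220, 50.0),
    (1, 1, 20230220, 2900.0);
""")

# Best customer for one product in one year: join the fact table to its dimensions.
best_customer = conn.execute("""
    SELECT c.name, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_date     d ON d.date_key     = f.date_key
    WHERE p.name = 'Laptop' AND d.year = 2023
    GROUP BY c.name
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

print(best_customer)
```

Every join goes from the central fact table to a single dimension, which is what gives the schema its star shape.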

17 Why …
Can we not operate on the operational database to obtain the answers to our business questions?
Answer: doing so requires complex query formulation and preparation of data to address the query, and running it against the operational database would be very slow because of complex joins and multiple scans.
- A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."
- A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
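
The two example queries above can be sketched against a single hypothetical `orders` table using Python's built-in `sqlite3` module. The table layout, the generated data, and the month labels are assumptions made for illustration. A real system would keep these workloads in separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, month TEXT, amount REAL)"
)

# 1000 made-up orders: odd order ids fall in '2024-05', even ids in '2024-04'.
rows = [(i, i % 10, "2024-05" if i % 2 else "2024-04", 10.0) for i in range(1, 1001)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# OLTP-style: "Retrieve the current order for this customer" returns a single row.
current_order = conn.execute(
    "SELECT order_id, amount FROM orders "
    "WHERE customer_id = ? ORDER BY order_id DESC LIMIT 1",
    (7,),
).fetchone()

# OLAP-style: "Find the total sales for all customers last month" aggregates
# over every row of that month.
total_last_month = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE month = ?", ("2024-05",)
).fetchone()[0]

print(current_order, total_last_month)
```

The first query returns one record, whereas the second must read half the table to produce a single number. That difference grows with data volume.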

18 Ask yourself
- Explain what a data warehouse is. How does it differ from an operational database? Provide an example to support your answer
- Explain the 4 main characteristics of a data warehouse
- Compare and contrast OLAP and OLTP

19 Data Warehousing - Concept
Data mart
- Smaller than a data warehouse and focused on a particular subject or department; a subset of a data warehouse, or a departmental data warehouse
- A data mart is a smaller DW designed around one problem, organizational function, topic, or other focus area
- Can be a dependent data mart:
  - A subset that is created directly from a data warehouse
  - Ensures that the end user is viewing the same version of the data that are accessed by all other data warehouse users
- Or an independent data mart:
  - A small data warehouse designed for a strategic business unit or a department
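
A dependent data mart can be sketched as a subset derived directly from the warehouse. The example uses Python's built-in `sqlite3` module. The `edw_facts` and `sales_mart` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise warehouse table covering several departments.
CREATE TABLE edw_facts (department TEXT, metric TEXT, value REAL);
INSERT INTO edw_facts VALUES
    ('sales', 'revenue',   100.0),
    ('sales', 'returns',     5.0),
    ('hr',    'headcount',  40.0);

-- Dependent data mart: created directly from the warehouse, sales rows only.
CREATE TABLE sales_mart AS
    SELECT metric, value FROM edw_facts WHERE department = 'sales';
""")

mart_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
print(mart_rows)
```

Because the mart is derived from the warehouse rather than loaded separately, its figures match what other warehouse users see. An independent data mart would instead be loaded from source systems on its own.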

20 Data Warehousing - Concept
Enterprise data warehouse (EDW)
- A large-scale data warehouse used across the enterprise for decision support
- Used to provide data for many types of DSS, including CRM, supply chain management, BPM, KMS, etc.
Metadata
- Data about data. In a data warehouse, metadata describe the contents of the data warehouse and the manner of its use
- Metadata in layman's terms: metadata describes other data. It provides information about a certain item's content. For example, an image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data

21 Data Warehousing Process Overview
The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources
2. Data are cleansed and organized consistently with the organization’s needs
3a. Data are loaded into the enterprise data warehouse
4a. If desired, data marts are created as subsets of the EDW
or
3b. Data are loaded into data marts
4b. The data marts are consolidated into the EDW
5. Analyses are performed as needed
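
The steps above, following the 3a/4a path, can be sketched as a chain of small functions. The source names, field names, cleansing rules, and function names are hypothetical and chosen only to make the flow concrete.

```python
def extract(sources):
    """Step 1: import rows from every internal and external source."""
    return [row for rows in sources.values() for row in rows]


def cleanse(rows):
    """Step 2: drop incomplete rows and put the rest into one consistent format."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # assumed rule: rows without an amount are unusable
        cleaned.append(
            {"region": row["region"].strip().title(), "amount": float(row["amount"])}
        )
    return cleaned


def load(edw, rows):
    """Step 3a: load cleansed rows into the enterprise data warehouse."""
    edw.extend(rows)
    return edw


def build_mart(edw, region):
    """Step 4a: create a data mart as a subset of the EDW."""
    return [row for row in edw if row["region"] == region]


def total(rows):
    """Step 5: a very simple analysis."""
    return sum(row["amount"] for row in rows)


sources = {
    "internal_pos": [{"region": " north ", "amount": "10"},
                     {"region": "south", "amount": None}],
    "external_feed": [{"region": "NORTH", "amount": 5}],
}

edw = load([], cleanse(extract(sources)))
north_mart = build_mart(edw, "North")
print(len(edw), total(north_mart))
```

The 3b/4b path would reverse the last two stages: marts are loaded first and then consolidated into the EDW.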

22 Data Warehousing - Process Overview
The major components of a data warehousing process:
- Data sources. Data are sourced from operational systems and possibly from external data sources.
- Data extraction. Data are extracted using custom-written or commercial software called ETL.
- Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
- Data warehouse/Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
- Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.

23 Data Warehousing - Process Overview

24 Data Warehousing Architectures
There are several basic architectures for data warehousing. To distinguish between the architectures, a data warehouse is divided into three parts:
- The data warehouse itself
- Data acquisition (back-end) software, which extracts data from legacy systems and external sources, consolidates them, and loads them into the data warehouse
- Client (front-end) software, which allows users to access and analyze data from the warehouse

25 Data Warehousing Architectures

26 Data Warehousing Architectures
Factors that potentially affect the architecture selection decision:
- Information interdependence between organizational units
- Upper management’s information needs
- Urgency of need for a data warehouse
4. Constraints on resources, funding
5. Strategic view of the data warehouse prior to implementation
6. Compatibility with existing systems
7. Perceived ability of the in-house IT staff
8. Technical issues, technology
9. Social/political factors, nature of users

27 Now ask if you are able to:
- Understand the basic definitions and concepts of data warehouses
- Understand how a data warehouse differs from a database
- Describe the characteristics of a data warehouse
- Describe the data warehouse process overview

