1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Introduction to Data Warehouse
Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business
2 Outline
Why Data Warehouse?
–Problems, causes and data warehouse solutions
What is Data Warehouse?
–Characteristics and components
Current Practices of Data Warehouse
3 Why Data Warehouse?
Knowledge Management Problems (drowning in data, starving for knowledge)
1. Can’t access data (easily). E.g., data from different branches, years, functional areas, etc.
2. Give me only what’s important (knowledge). E.g., regions and products that have upward sales trends over the last five years.
3. I need to reduce data to what’s important by slicing and dicing. E.g., by branch, product, year, etc.
4 Why Data Warehouse?
4. Data inconsistency and poor data quality. E.g., the 2001 PC sales amounts for SLC reported by the CFO and by the SLC Account Manager are not the same.
5. Need to improve the practice of making informed decisions. E.g., did the VP for Marketing decide on the advertising budgets for branches in the SW region based on their sales performance over the last five years?
6. Hard and slow to query the database. E.g., the VP for Marketing, the CFO and the Account Manager had to wait for the MIS Department to generate sales performance reports and analyses.
5 Why Data Warehouse?
ROI Problems
7. Can I get more value out of my data? Ans: Make informed, potent decisions using knowledge extracted from integrated and consistent data over a long period of time.
8. Can I do this cost-effectively? Options: federated (interoperable) databases vs. a data warehouse.
9. Can I easily scale up or change how I get knowledge out of my data? E.g., add more regions, functional areas or years in sales performance analyses.
6 Causes for the Problems
Cause 1: Isolated databases distributed in an enterprise
[Figure: separate Sales, CRM and Inventory databases]
A root cause for problems 1, 4, 5, 6, 7, 8 and 9
7 Why Data Warehouse
Cause 1: Isolated databases distributed in an enterprise
[Figure: separate Sales, CRM and Inventory databases]
Ad hoc access solutions cannot alleviate the problems
8 Why Data Warehouse
Cause 2: Historical data is archived in offline storage systems
[Figure: the Sales database archives Historical Sales Data]
Another root cause for problems 1, 4, 5, 6, 7, 8 and 9
9 Why Data Warehouse
Cause 2: Historical data is archived in offline storage systems
[Figure: the Sales database archives Historical Sales Data]
Ad hoc accesses are slow and inconvenient
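10 Cause 3: Metadata for Transaction DB Systems Is Not User Friendly
[Figure: entity-relationship diagram with entities Student (with Undergraduate and Graduate subtypes via IS-A), Course, Instructor and Dependent; relationships Take and Has; attributes including SSN, Address, Name, Phone, Major, Minor, Rank, C-Name, C-No, Relation, Sex and Grade; cardinalities M:M and M:1]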
12 Why Data Warehouse
Cause 4: Query and programming languages are even less user friendly
–DESB students’ academic grades and GPAs since the freshman year
–Sales amount distribution by product category, customer state and year
–Slicing and dicing
–SQL statements???
–Report/screen interface codes???
13 Why Data Warehouse
Cause 5: Transaction databases are optimized (normalized) to process transactions, not to answer decision support queries
–Decision support queries perform badly because they must join many normalized tables
–The databases already carry a heavy transaction processing workload
14 What is Data Warehouse
Designed to solve problems associated with current database practices:
Isolated, distributed databases
[Figure: the Sales, CRM and Inventory databases feed the Data Warehouse through extract, replicate, integrate, cleanse & load]
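The extract, integrate, cleanse & load step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source extracts, their field names and the cleansing rules (trim and title-case names, upper-case branch codes, drop exact duplicates) are invented here and do not come from the course material.

```python
from datetime import date

# Hypothetical extracts from two isolated source systems.
# Field names and formats are assumptions made for this sketch.
sales_rows = [
    {"cust": "ACME Corp ", "branch": "slc", "amt": "1,200.50", "day": "2001-03-15"},
    {"cust": "acme corp", "branch": "SLC", "amt": "1,200.50", "day": "2001-03-15"},
    {"cust": "Zion Books", "branch": "Provo", "amt": "310.00", "day": "2001-04-02"},
]
crm_rows = [
    {"customer_name": "Acme Corp", "state": "UT"},
    {"customer_name": "Zion Books", "state": "UT"},
]


def clean_name(raw):
    """Cleanse a customer name: collapse whitespace and normalize case."""
    return " ".join(raw.split()).title()


def transform(sales, crm):
    """Cleanse, de-duplicate and integrate the two extracts into one row format."""
    state_by_customer = {clean_name(r["customer_name"]): r["state"] for r in crm}
    seen, loaded = set(), []
    for r in sales:
        row = {
            "customer": clean_name(r["cust"]),
            "branch": r["branch"].strip().upper(),
            "amount": float(r["amt"].replace(",", "")),
            "sale_date": date.fromisoformat(r["day"]),
        }
        # Integrate: attach the customer's state from the CRM extract.
        row["state"] = state_by_customer.get(row["customer"])
        # Simplified duplicate rule: identical rows after cleansing are one sale.
        key = (row["customer"], row["branch"], row["amount"], row["sale_date"])
        if key not in seen:
            seen.add(key)
            loaded.append(row)
    return loaded


# "Load": in this sketch the warehouse is just an in-memory list.
warehouse_sales = transform(sales_rows, crm_rows)
```

The first two sales rows differ only in spacing and letter case, so after cleansing they collapse into a single warehouse row. A real staging process would need a more careful duplicate rule than exact equality.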
15 Why Data Warehouse
Historical data is archived in offline storage systems
[Figure: the Sales database and the archived Historical Sales Data both feed the Data Warehouse]
Integrate historical data with current data
16 What is Data Warehouse
Causes 3, 4 and 5: Hard-to-understand metadata and query and programming languages; poor decision support query performance
Solution: In a data warehouse, organize data in a subject-oriented way rather than a process-oriented way (dimensional modeling).
17 Dimensional Modeling (Star Schema)
[Figure: star schema]
Fact: Academic Performance (Grade)
Dimensions:
–Instructor (Name, Rank)
–Student (Name, UG/PG, Major)
–Course (Number, Title)
–Semester (Year, Length, Start date)
18 Dimensional Modeling (Star Schema)
[Figure: star schema]
Fact: Sales (Qty, Amt)
Dimensions:
–Branch (Name, State, City)
–Product (Name, Category)
–Customer (Name, State, City)
–Time (Year, Quarter, Month)
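A minimal sketch of the Sales star schema, using Python's built-in sqlite3 module. The table and column names follow the slide; the surrogate key columns (branch_id, product_id, customer_id, time_id), the name time_dim and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes users slice and dice by.
CREATE TABLE branch   (branch_id INTEGER PRIMARY KEY, name TEXT, state TEXT, city TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, state TEXT, city TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);

-- Fact table: one row per sale, with measures and a key into each dimension.
CREATE TABLE sales (
    branch_id   INTEGER REFERENCES branch(branch_id),
    product_id  INTEGER REFERENCES product(product_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    qty INTEGER,
    amt REAL
);

-- Made-up sample data.
INSERT INTO branch   VALUES (1, 'SLC Downtown', 'UT', 'Salt Lake City'), (2, 'Provo', 'UT', 'Provo');
INSERT INTO product  VALUES (1, 'Desktop A', 'PC'), (2, 'Laser B', 'Printer');
INSERT INTO customer VALUES (1, 'Acme Corp', 'UT', 'Salt Lake City');
INSERT INTO time_dim VALUES (1, 2000, 4, 12), (2, 2001, 1, 2);
INSERT INTO sales VALUES
    (1, 1, 1, 1, 2, 2000.0),
    (1, 1, 1, 2, 3, 3000.0),
    (2, 2, 1, 2, 1,  400.0),
    (1, 2, 1, 2, 2,  800.0);
""")

# Dice: sales amount by product category and year.
by_category_year = conn.execute("""
    SELECT p.category, t.year, SUM(s.amt)
    FROM sales s
    JOIN product  p ON p.product_id = s.product_id
    JOIN time_dim t ON t.time_id = s.time_id
    GROUP BY p.category, t.year
    ORDER BY p.category, t.year
""").fetchall()

# Slice: restrict to one year, then break down by branch.
by_branch_2001 = conn.execute("""
    SELECT b.name, SUM(s.amt)
    FROM sales s
    JOIN branch   b ON b.branch_id = s.branch_id
    JOIN time_dim t ON t.time_id = s.time_id
    WHERE t.year = 2001
    GROUP BY b.name
    ORDER BY b.name
""").fetchall()
```

Every analysis query has the same shape: join the fact table to the dimensions of interest, filter on dimension attributes, and aggregate the measures.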
19 One System for Multiple Uses
[Figure: a database system in which application programs and interactive queries/transactions all go through one Database Management System (DBMS), which manages the database and its metadata]
20 Two Worlds -> Two Systems
[Figure: two separate systems]
Operational: operational applications running on OLTP DBs
DSS: a data warehouse serving the Executive Information System, the Decision Support System (DSS) and reporting
21 What is Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process.
1. Subject-oriented means the data warehouse focuses on the high-level entities of the business, such as sales, products and customers. This is in contrast to operational database systems, which deal with processes such as placing an order.
22 What is Data Warehouse
2. Integrated means the data is integrated from distributed data sources and historical data sources and stored in a consistent format.
3. Time-variant means the data is associated with a point in time (e.g., a semester, fiscal year or pay period).
4. Non-volatile means the data doesn’t change once it gets into the warehouse.
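The time-variant and non-volatile properties can be illustrated with a small append-only store. This is a sketch, not a prescribed design: the SalesSnapshot and SnapshotStore names, the fields and the rule of rejecting a second row for the same branch and fiscal year are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class SalesSnapshot:
    branch: str
    fiscal_year: int     # time-variant: every row is tied to a point in time
    amount: float


class SnapshotStore:
    """Append-only store: rows are loaded once and never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot):
        key = (snapshot.branch, snapshot.fiscal_year)
        if any((r.branch, r.fiscal_year) == key for r in self._rows):
            # Non-volatile: reject an attempt to overwrite existing history.
            raise ValueError("snapshot already loaded for this branch and year")
        self._rows.append(snapshot)

    def history(self, branch):
        """Return all snapshots for a branch, oldest first."""
        rows = [r for r in self._rows if r.branch == branch]
        return sorted(rows, key=lambda r: r.fiscal_year)


store = SnapshotStore()
store.load(SalesSnapshot("SLC", 2001, 5200.0))
store.load(SalesSnapshot("SLC", 2000, 4100.0))
slc_history = store.history("SLC")
```

Because each year's figure stays in the store as a separate row, a trend over several years can be read directly from history(), which an operational database that overwrites current values would not allow.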
23 Characteristics of Data Warehouse
24 Data Warehouse and Data Mart
Data warehouse: defined by its decision support purpose and other characteristics
–Other characteristics: subject-oriented, integrated
Data mart: a data warehouse for a more limited business scope (e.g., a department)
A data warehouse may be built from several data marts
25 Basic Elements of a Data Warehouse System
[Figure: data flows from the source systems through the data staging area to the presentation servers and on to end user data access]
Source Systems (Legacy)
–Relational; flat files; spreadsheets; ERP; legacy
–Data is extracted into the data staging area
Data Staging Area
–Storage: flat files (fastest); RDBMS; other
–Processing: clean; prune; combine; remove duplicates; households; standardize; conform dimensions; store awaiting replication; archive; export to data marts
–No user query services
–Populate, replicate, recover (toward the presentation servers)
The Data Warehouse Presentation Servers
–Data Mart #1: OLAP (ROLAP and/or MOLAP) query services; dimensional; subject oriented; locally implemented; user group driven; may store atomic data; may be frequently refreshed; conforms to the DW Bus
–Data Mart #2
–Data Mart #3
–DW Bus: conformed dimensions and facts
–Feed end user data access
End User Data Access
–Ad hoc query tools
–Report writers
–End user applications
–Models: forecasting; scoring; allocating; data mining; other downstream systems; other parameters; special UI
Uploads back from end user data access: uploaded cleaned dimensions; uploaded model results
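Why conformed dimensions matter can be shown with two tiny data marts that share one product dimension. The sketch below is an assumption-laden illustration: the product keys, the two fact lists and the drill_across helper are invented for this example and are not part of the slide.

```python
# Conformed product dimension shared by both marts (hypothetical keys and names).
product_dim = {101: "Desktop A", 102: "Laser B"}

# Each mart keeps its own facts but uses the same product keys.
sales_mart = [(101, 5), (102, 2), (101, 1)]   # (product_key, units sold)
inventory_mart = [(101, 10), (102, 0)]        # (product_key, units on hand)


def total_by_product(fact_rows):
    """Aggregate a mart's facts by the shared product key."""
    totals = {}
    for key, value in fact_rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def drill_across(dim, sold_rows, on_hand_rows):
    """Combine results from two marts on the conformed dimension."""
    sold = total_by_product(sold_rows)
    on_hand = total_by_product(on_hand_rows)
    return {dim[k]: (sold.get(k, 0), on_hand.get(k, 0)) for k in dim}


report = drill_across(product_dim, sales_mart, inventory_mart)
```

Because both marts use the same keys and attribute definitions for products, their results line up row by row. If each mart defined products differently, combining them would require a reconciliation step for every query.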
26 Current Practice of DW *
The DW market was projected to grow to $113.5 billion by 2002.
Average DW development cost is $1.5 million and average maintenance cost is $0.5 million.
DW development time ranges from 1 to 3 years.
* Source: H.J. Watson, “Current Practices in Data Warehousing”, Information Systems Management, 2001
27 Current Practice of DW *
Sponsorship for the DW project
Sponsor                   Percentage
VP of a business unit     39.8
CIO                       26.9
Business unit manager     16.7
CEO                       11.1
Other                     25.0
* Source: H.J. Watson, “Current Practices in Data Warehousing”, Information Systems Management, 2001
28 Current Practice of DW *
DW Benefits
–Less effort to produce better information
–Better decisions
–Improvement of business processes
–Support for the accomplishment of strategic business objectives
Return on investment and cost of ownership?
* Source: H.J. Watson, “Current Practices in Data Warehousing”, Information Systems Management, 2001