Dr. Abdul Basit Siddiqui, Assistant Professor, FUIEMS (Lecture Slides, Week #2)

2 Why a Data Warehouse (DWH)? Data recording and storage is growing: almost every industry has a huge amount of operational data. Careful use and analysis of historical information may result in excellent predictions for the future: knowledge workers want to turn available data into useful information, which they use to support strategic decision making. It gives a total view of the organization: it is a platform for consolidated historical data for analysis, and it stores data of good quality so that knowledge workers can make correct decisions. Intelligent decision support is required for decision making. Data Warehouse & Mining- Spring 2012

3 Why a Data Warehouse? (Contd.) From a business perspective, it is the latest marketing weapon. It helps keep customers by learning more about their needs. It is a valuable tool in today’s competitive, fast-evolving world.

4 Reason-1: Why a Data Warehouse (DWH)? Data sets are growing. How much data is that?
1 MB = 2^20 ≈ 10^6 bytes: a small novel; a 3½-inch floppy disk.
1 GB = 2^30 ≈ 10^9 bytes: paper reams that could fill the back of a pickup van.
1 TB = 2^40 ≈ 10^12 bytes: 50,000 trees chopped, converted into paper, and printed.
2 PB (1 PB = 2^50 ≈ 10^15 bytes): the academic research libraries across the USA.
5 EB (1 EB = 2^60 ≈ 10^18 bytes): all words ever spoken by human beings.
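The sizes above mix two conventions: powers of 2 (2^20, 2^30, ...) and powers of 10 (10^6, 10^9, ...). The short Python sketch below computes both values for each unit. Its helper, `unit_sizes`, is illustrative and does not come from the slides. The output shows the gap between the two conventions widening from about 5% at MB to about 15% at EB.

```python
# Compare the binary (powers of 2) and decimal (powers of 10) sizes of each unit.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]


def unit_sizes(unit: str) -> tuple[int, int]:
    """Return (binary_bytes, decimal_bytes) for a unit such as "TB"."""
    k = UNITS.index(unit) + 1  # KB -> 1, MB -> 2, ..., EB -> 6
    return 2 ** (10 * k), 10 ** (3 * k)


for unit in UNITS[1:]:  # MB through EB, as on the slide
    binary, decimal = unit_sizes(unit)
    gap_pct = (binary - decimal) / decimal * 100
    print(f"1 {unit}: {binary:,} vs {decimal:,} bytes ({gap_pct:.1f}% apart)")
```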

5 Reason-1: Why a Data Warehouse (DWH)? The size of data sets is going up, while the cost of data storage is coming down. The amount of data the average business collects and stores is doubling every year. Total hardware and software cost to store and manage 1 MB of data: 1990: $15; 2002: 15¢ (down 100 times); 2010: < 1¢ (down more than 15 times). A few examples: Wal-Mart: 24+ TB; France Telecom: 100+ TB; CERN: up to 20 PB by 2006; Stanford Linear Accelerator Center (SLAC): 500 TB; Telenor, Ufone, Mobilink, Warid, Zong: ???
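The cost figures can be checked with simple ratios. This sketch takes the slide's numbers in cents and treats the 2010 value ("< 1¢") as exactly 1¢, so any ratio involving 2010 is a lower bound. It also projects the "doubling every year" claim. The helper names and the 1 TB starting volume are illustrative assumptions.

```python
# Cost in US cents to store and manage 1 MB, taken from the slide.
# The 2010 figure is an upper bound ("< 1 cent"), so using 1 cent
# makes every ratio involving 2010 a lower bound.
COST_CENTS_PER_MB = {1990: 1500.0, 2002: 15.0, 2010: 1.0}


def cost_reduction(from_year: int, to_year: int) -> float:
    """How many times cheaper 1 MB became between two years."""
    return COST_CENTS_PER_MB[from_year] / COST_CENTS_PER_MB[to_year]


def projected_volume(start_tb: float, years: int) -> float:
    """Data volume after `years` if it doubles every year."""
    return start_tb * 2 ** years


print(f"1990 -> 2002: {cost_reduction(1990, 2002):.0f}x cheaper")
print(f"2002 -> 2010: at least {cost_reduction(2002, 2010):.0f}x cheaper")
print(f"1990 -> 2010: at least {cost_reduction(1990, 2010):.0f}x cheaper")
print(f"1 TB after 10 years of doubling: {projected_volume(1, 10):,.0f} TB")
```

With these figures, storage became 100 times cheaper from 1990 to 2002 and at least 15 times cheaper from 2002 to 2010. Meanwhile, a volume that doubles yearly grows 1,024-fold in ten years.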

6 Caution! A Warehouse of Data is NOT a Data Warehouse.

7 Caution! Size is NOT Everything.

8 Reason-2: Why a Data Warehouse (DWH)?
DBMS approach: List of all items that were sold last month? List of all makeup items purchased by Sassi? Total sales of last month, grouped by branch? How many sales transactions occurred during the month of January?
Intelligent enterprise: Which items sell together? Which items should be stocked? Where and how should the items be placed? What discounts should be offered? How can customers best be targeted to increase sales at a branch? Which customers are most likely to respond to my next promotional campaign, and why?
Businesses demand business intelligence (BI): answers to complex questions from integrated data, i.e., an “Intelligent Enterprise”.
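The contrast between the two kinds of questions can be sketched with Python's built-in `sqlite3` module. The table, branch names, items, and prices below are hypothetical. The first query is a typical DBMS-style question with a fixed SQL answer. The second counts item pairs that appear in the same transaction. That count is a very simple stand-in for market-basket analysis, not a full association-rule algorithm.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical sales data: one row per item sold in a transaction.
ROWS = [
    (1, "North", "soap", 50), (1, "North", "shampoo", 200),
    (2, "North", "soap", 50), (2, "North", "shampoo", 200), (2, "North", "bread", 80),
    (3, "South", "bread", 80), (3, "South", "butter", 150),
    (4, "South", "soap", 50), (4, "South", "shampoo", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (txn_id INTEGER, branch TEXT, item TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)

# DBMS-style question: total sales grouped by branch (a known, fixed query).
sales_by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM sales GROUP BY branch ORDER BY branch"
).fetchall()

# BI-style question: which items sell together?
baskets: dict[int, set[str]] = {}
for txn_id, item in conn.execute("SELECT txn_id, item FROM sales"):
    baskets.setdefault(txn_id, set()).add(item)

# Count every unordered item pair that occurs within one basket.
pair_counts = Counter(
    pair for items in baskets.values() for pair in combinations(sorted(items), 2)
)

print(sales_by_branch)             # [('North', 580), ('South', 480)]
print(pair_counts.most_common(1))  # [(('shampoo', 'soap'), 3)]
```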

9 Reason-3: Why a Data Warehouse (DWH)? Businesses want much more: What happened? Why did it happen? What will happen? What is happening? What do you want to happen?

10 What is a Data Warehouse? A complete repository of historical corporate data, extracted from transaction systems, that is available for ad-hoc access by knowledge workers.

11 What is a Data Warehouse? Transaction system: a Management Information System (MIS); could be typed sheets (NOT a transaction system). Ad-hoc access: there is no fixed access pattern; queries are not known in advance, so it is difficult to write the SQL in advance. Knowledge workers: typically NOT IT-literate (executives, analysts, managers); NOT clerical workers; decision makers.

12 What is a Data Warehouse? Inmon’s definition: a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.

13 Another View of a DWH (figure): Subject-oriented, Integrated, Time-variant, Nonvolatile.

14 Subject-oriented: A data warehouse is organized around subjects such as sales, product, and customer. It focuses on the modeling and analysis of data for decision makers, and it excludes data that is not useful in the decision-support process.
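One way to picture "excluding data not useful in decision support" is as a projection of operational records onto a subject area. In this sketch, the record layout, the `terminal_id` and `session_token` fields, and the chosen list of sales attributes are all hypothetical.

```python
# Hypothetical operational records from an order-entry system. Some fields
# exist only to run the application and carry no decision-support value.
orders = [
    {"order_id": 1, "date": "2012-01-05", "customer": "C1", "product": "P1",
     "amount": 250.0, "terminal_id": "T-07", "session_token": "a9f3"},
    {"order_id": 2, "date": "2012-01-06", "customer": "C2", "product": "P1",
     "amount": 100.0, "terminal_id": "T-02", "session_token": "77c1"},
]

# Attributes kept for the "sales" subject area (an illustrative choice).
SALES_SUBJECT_FIELDS = ("date", "customer", "product", "amount")


def to_sales_subject(record: dict) -> dict:
    """Keep only the attributes relevant to analysing sales."""
    return {field: record[field] for field in SALES_SUBJECT_FIELDS}


sales = [to_sales_subject(r) for r in orders]
print(sales[0])
```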

15 Integration: A data warehouse is constructed by integrating multiple heterogeneous sources. Data preprocessing techniques are applied to ensure consistency. (Figure: an RDBMS, a legacy system, and a flat file feed the data warehouse through data processing and data transformation.)
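The integration step can be illustrated with three toy sources that encode the same customer attributes differently: an RDBMS row, a fixed-width legacy record, and a CSV flat file. All layouts, codes (for example `1`/`0` for gender in the legacy system), and ID prefixes below are assumptions made for illustration. The point is that each source gets its own transformation into one common format.

```python
import csv
import io
from datetime import datetime

# Common target format: {"customer_id": str, "gender": "M"/"F", "birth_date": date}


def from_rdbms(row: tuple) -> dict:
    """RDBMS row: (id, 'M'/'F', 'YYYY-MM-DD')."""
    cid, gender, dob = row
    return {"customer_id": f"R{cid}", "gender": gender,
            "birth_date": datetime.strptime(dob, "%Y-%m-%d").date()}


def from_legacy(line: str) -> dict:
    """Fixed-width legacy record: 4-char id, 1-char gender code (1=M, 0=F), DDMMYYYY."""
    cid, code, dob = line[:4], line[4], line[5:13]
    return {"customer_id": f"L{cid}", "gender": "M" if code == "1" else "F",
            "birth_date": datetime.strptime(dob, "%d%m%Y").date()}


def from_flat_file(text: str) -> list:
    """CSV with header id,gender,dob; gender spelled out, dob as MM/DD/YYYY."""
    return [{"customer_id": f"F{rec['id']}", "gender": rec["gender"][0].upper(),
             "birth_date": datetime.strptime(rec["dob"], "%m/%d/%Y").date()}
            for rec in csv.DictReader(io.StringIO(text))]


# Prefixing IDs by source keeps keys from different systems from colliding.
integrated = (
    [from_rdbms((1, "M", "1990-07-01"))]
    + [from_legacy("0042015031980")]
    + from_flat_file("id,gender,dob\n7,female,12/25/1985\n")
)
for rec in integrated:
    print(rec)
```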

16 Time-variant: A data warehouse provides information from a historical perspective (e.g., the past 5 to 10 years). Every key structure contains, either implicitly or explicitly, an element of time.
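To make "every key structure contains an element of time" concrete, the sketch below stores hypothetical monthly account-balance snapshots under the explicit key `(account_id, snapshot_month)`. A new month therefore adds a row instead of overwriting the old one. The names and values are illustrative only.

```python
from datetime import date

# Snapshot table keyed explicitly by (account_id, snapshot_month).
balances: dict[tuple[str, date], float] = {}


def record_snapshot(account_id: str, month: date, balance: float) -> None:
    """Add one month's balance; earlier months are left untouched."""
    balances[(account_id, month)] = balance


def history(account_id: str) -> list[tuple[date, float]]:
    """All snapshots for one account, oldest first."""
    return sorted((month, bal) for (acct, month), bal in balances.items()
                  if acct == account_id)


record_snapshot("ACC-1", date(2011, 11, 1), 5000.0)
record_snapshot("ACC-1", date(2011, 12, 1), 4200.0)
record_snapshot("ACC-1", date(2012, 1, 1), 6100.0)
record_snapshot("ACC-2", date(2012, 1, 1), 900.0)

for month, bal in history("ACC-1"):
    print(month.isoformat(), bal)
```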

17 Nonvolatile: Data, once recorded, cannot be updated. A data warehouse requires two operations in data accessing: the initial loading of data and access of data.
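A toy table that exposes only the two operations named above, load and access, might look like this. The class name `NonVolatileStore` is hypothetical. Each loaded row is wrapped in `types.MappingProxyType`, a read-only view from Python's standard library. This is one simple way to stop callers from modifying rows after loading.

```python
from types import MappingProxyType


class NonVolatileStore:
    """Toy warehouse table: rows can be loaded and read, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list = []

    def load(self, rows) -> None:
        """Append a batch; each row is copied into a read-only mapping."""
        self._rows.extend(MappingProxyType(dict(row)) for row in rows)

    def access(self, predicate=lambda row: True) -> list:
        """Return the rows matching `predicate` (all rows by default)."""
        return [row for row in self._rows if predicate(row)]


store = NonVolatileStore()
store.load([{"product": "P1", "qty": 3}, {"product": "P2", "qty": 5}])
print(store.access(lambda row: row["qty"] > 4))  # one matching row

try:
    store.access()[0]["qty"] = 99  # attempt to update a loaded row
except TypeError as err:
    print("update rejected:", err)
```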

18 Summary: What is a Data Warehouse? It is a blend of many technologies. The basic concept is to: take all data from the different operational systems; if necessary, add relevant data from the industry; transform all data and bring it into a uniform format; integrate all data as a single entity; store data in a format that supports easy access for decision support; create performance-enhancing indices; implement performance-enhancing joins; and run ad-hoc queries with low selectivity.
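The summary steps can be compressed into a miniature pipeline, sketched below. Everything in it is a hypothetical illustration. Two source layouts are transformed into one uniform format and integrated into a single table. The table then gets a simple product index and is queried with an ad-hoc aggregate that touches every row.

```python
from collections import defaultdict

# Extract: hypothetical records from two operational systems with different formats.
branch_a = [("2012-01-03", "P1", "3"), ("2012-01-04", "P2", "1")]  # units stored as text
branch_b = [{"day": "04/01/2012", "sku": "p1", "units": 2}]         # DD/MM/YYYY, lower-case SKU


def transform_a(rec: tuple) -> dict:
    day, sku, units = rec
    return {"date": day, "product": sku, "units": int(units)}


def transform_b(rec: dict) -> dict:
    d, m, y = rec["day"].split("/")
    return {"date": f"{y}-{m}-{d}", "product": rec["sku"].upper(), "units": rec["units"]}


# Transform and integrate into a single table with one uniform format.
warehouse = [transform_a(r) for r in branch_a] + [transform_b(r) for r in branch_b]

# Index: product -> row positions, to speed up product-level lookups.
by_product = defaultdict(list)
for pos, row in enumerate(warehouse):
    by_product[row["product"]].append(pos)
p1_rows = [warehouse[i] for i in by_product["P1"]]

# Ad-hoc query that scans every row: total units per product.
totals = defaultdict(int)
for row in warehouse:
    totals[row["product"]] += row["units"]
print(dict(totals))
```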

19 Benefits of a Data Warehouse: High return on investment. Substantial competitive advantage. Increased productivity of corporate decision makers. Fast reporting for the decision-making process. Reduced reporting load on transactional systems. Institutional data made more user-friendly and accessible for knowledge workers. Integrated data from different source systems. ‘Point-in-time’ analysis and trending over time. Help in identifying and resolving data-integrity issues, either in the warehouse itself or in the source systems that collect the data.

20 Data Warehouse: How is it Different? 1. Decision making is ad hoc.

21 Data Warehouse: How is it Different? 2. Different patterns of hardware utilization (figure: bus service vs. train).

22 Data Warehouse: How is it Different? 3. It combines operational and historical data. Don’t do data entry into a DWH; OLTP or ERP systems are the source systems. OLTP systems don’t keep history; for example, you cannot get a balance statement more than a year old. A DWH keeps historical data, even of bygone customers. Why? In the context of a bank, we want to know why a customer left. What events led to his/her leaving? Why? Customer retention.
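The bank example can be sketched as two stores. The first is an OLTP-style dictionary holding only current accounts, from which a closed account simply disappears. The second is a DWH-style append-only event log that keeps the departed customer's history. Customer IDs, event names, and dates are invented for illustration.

```python
# OLTP-style store: current state only. Customer C2 closed the account,
# so the record was deleted and its history is gone.
oltp_accounts = {"C1": 1200.0}

# DWH-style store: append-only history, kept even for bygone customers.
dwh_events = [
    ("C1", "2011-12-01", "deposit"),
    ("C2", "2011-10-02", "complaint"),
    ("C2", "2011-11-15", "fee_increase"),
    ("C2", "2011-12-20", "account_closed"),
]


def events_before_leaving(customer_id: str) -> list[str]:
    """Events recorded for a customer before the account was closed."""
    history = sorted((day, event) for cust, day, event in dwh_events
                     if cust == customer_id)  # ISO dates sort chronologically
    before = []
    for _, event in history:
        if event == "account_closed":
            break
        before.append(event)
    return before


print("C2" in oltp_accounts)        # False: the OLTP system has forgotten C2
print(events_before_leaving("C2"))  # ['complaint', 'fee_increase']
```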

23 Data Warehouse: How is it Different? How much history? It depends on the industry, the cost of storing historical data, and the economic value of historical data. Industry and history: Telecom: calls are far more numerous than bank transactions, so 18 months. Retailers: interested in analyzing yearly seasonal patterns, so 65 weeks (why?). Insurance companies: want to do actuarial analysis, using historical data to predict risk, so 7 years. Hence a DWH is NOT a complete repository of data.
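The retention periods on this slide translate directly into cutoff dates. This sketch hard-codes the three examples: 18 months, 65 weeks, and 7 years. The function names are illustrative choices, not from the slides. So is the month arithmetic, which clamps to the last day of a shorter month.

```python
import calendar
from datetime import date, timedelta


def months_back(d: date, months: int) -> date:
    """Same day `months` earlier, clamped to the last day of a shorter month."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) - months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def retention_cutoff(industry: str, today: date) -> date:
    """Oldest date still kept, using the slide's example history lengths."""
    if industry == "telecom":
        return months_back(today, 18)
    if industry == "retail":
        return today - timedelta(weeks=65)
    if industry == "insurance":
        return months_back(today, 7 * 12)
    raise ValueError(f"no retention rule for {industry!r}")


today = date(2012, 3, 15)
for industry in ("telecom", "retail", "insurance"):
    print(industry, retention_cutoff(industry, today).isoformat())
```

Rows dated before the cutoff would be candidates for archiving or purging, which is one concrete reason the warehouse is not a complete repository.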

24 Data Warehouse: How is it Different? How much history? Economic value of data vs. storage cost. Is a data warehouse a complete repository of data?
