
1 1 Sharif University Data Warehouse

2 2 Sharif University Objectives Need for Data Warehouse. What is Data Warehouse? Data Warehouse Properties. Data Warehouse Architectures. Data Marts. Corporate Information Factory. Extraction, Transportation, Loading and Transformation. Design in Data Warehouses. Data Warehousing Schemas.

3 3 Sharif University Decision support questions that enterprises need to have answered: How did sales representatives perform over different periods of time? What are the popular products? What types of customers buy what types of products? How much are the various internal organizations spending on what products?

4 4 Sharif University Cont. What were the variances between the amounts budgeted and the amounts spent? What positions are being filled by people with what types of background? What is the average pay for people within different age brackets?

5 5 Sharif University What is a Data Warehouse? A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon: – Subject Oriented – Integrated – Nonvolatile – Time Variant

6 6 Sharif University Data Warehouse Properties Subject Oriented Integrated Data Warehouse Non Volatile Time Variant

7 7 Sharif University Subject Oriented Data is categorized and stored by business subject rather than by application. For example, to learn more about your company’s sales data, you might ask: "Who was our best customer for this item, in this region last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented. [Figure labels: Operational Systems; Customer Financial Information; Data Warehouse Subject Area; Region, Time, Customer, Product]

8 8 Sharif University Integrated Data warehouses must put data from disparate sources into a consistent format.

9 9 Sharif University Time Variant (time series) Data is stored as a series of snapshots, each representing a period of time. [Figure: Data Warehouse, Time and Data: Jan/03 = Data for January; Feb/03 = Data for February; Mar/03 = Data for March]
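The snapshot idea can be illustrated with a minimal Python sketch. The table layout, the `load_snapshot` and `history` helpers, and the account data are all hypothetical and are not part of the original slides; the point is only that each load appends a new period and leaves earlier periods untouched.

```python
# Hypothetical in-memory "warehouse table": one row per (period, account).
snapshots = []

def load_snapshot(period, balances):
    """Append one period's data as a new snapshot; earlier rows are never modified."""
    for account, balance in balances.items():
        snapshots.append({"period": period, "account": account, "balance": balance})

def history(account):
    """Return the time series for one account, in load order."""
    return [(row["period"], row["balance"]) for row in snapshots if row["account"] == account]

# Three monthly loads, mirroring the Jan/03, Feb/03, Mar/03 periods on the slide.
load_snapshot("Jan/03", {"A": 100, "B": 50})
load_snapshot("Feb/03", {"A": 120, "B": 50})
load_snapshot("Mar/03", {"A": 90, "B": 75})

print(history("A"))
```

Because every row carries its period, a question such as "how did account A change over the quarter?" is answered by reading existing rows rather than by comparing against overwritten values.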

10 10 Sharif University Non Volatile Typically data in the data warehouse is not updated or deleted. Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred. [Figure: Operational Databases with INSERT, UPDATE, DELETE, and Read; Warehouse Database with Load and Read]

11 11 Sharif University Other Characteristics of Data Warehouse Summarized Not Normalized Meta Data Sources (both operational and external data are present)

12 12 Sharif University Summary Data –Provide fast access to pre-computed data –Reduce use of I/O, CPU, and memory –Are distilled from source systems (lightly summarized) and from pre-calculated summaries (highly summarized) –Determine requirements early

13 13 Sharif University Summary Data Summary types: Average, Maximum, Total, Percentage. [Table: fact data (Units Sold, Sales ($)) by dimension data (Store; Product A Total, Product B Total, Product C Total)]
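As a rough illustration of how such pre-computed summaries might be derived, the Python sketch below computes the four summary types named on the slide (total, average, maximum, percentage) per product. The fact rows, store names, and the `summarize_by_product` helper are invented for the example and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical detail-level fact rows: (store, product, units_sold, sales_dollars).
fact_rows = [
    ("Store 1", "Product A", 10, 100.0),
    ("Store 2", "Product A", 5, 50.0),
    ("Store 1", "Product B", 2, 40.0),
    ("Store 2", "Product C", 1, 10.0),
]

def summarize_by_product(rows):
    """Pre-compute total, average, maximum, and percentage of sales per product."""
    sales_by_product = defaultdict(list)
    for _store, product, _units, sales in rows:
        sales_by_product[product].append(sales)
    grand_total = sum(row[3] for row in rows)
    summary = {}
    for product, sales in sales_by_product.items():
        total = sum(sales)
        summary[product] = {
            "total": total,
            "average": total / len(sales),
            "maximum": max(sales),
            "percentage": 100.0 * total / grand_total,
        }
    return summary

summary = summarize_by_product(fact_rows)
print(summary["Product A"])
```

Storing the result as a summary fact lets later queries read a handful of rows instead of rescanning the detail data, which is where the I/O, CPU, and memory savings come from.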

14 14 Sharif University Summary Data Time Product Store Summary Fact (Derived)

15 15 Sharif University Normalization –Normalized data contains no: redundancy; repeating data; key-independent columns. –Denormalized data often: improves efficiency in OLAP systems; exists in data warehouse databases; comprises derived or summary data. –Star models are denormalized; snowflake models partially normalize the dimension tables.

16 16 Sharif University Meta Data (Data about Data) Provides information about the content of the warehouse. Meta Data includes: A guide to moving data to the warehouse Rules for summarization Business terms used to describe data Technical terminology Rules for data extractions

17 17 Sharif University Data Warehouse Architectures Data Warehouse Architecture (Basic) Data Warehouse Architecture (with a Staging Area) Data Warehouse Architecture (with a Staging Area and Data Marts)

18 18 Sharif University Data Warehouse Architecture (Basic) End users directly access data derived from several source systems through the data warehouse.

19 19 Sharif University Data Warehouse Architecture (with a Staging Area) You need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead.

20 20 Sharif University Data Warehouse Architecture (with a Staging Area and Data Marts) You may want to customize your warehouse’s architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business.

21 21 Sharif University Data Marts A Data Mart is a small warehouse designed for a strategic business unit or a department. Data Mart Advantages: The cost is low. Implementation time is shorter. They are controlled locally rather than centrally. They contain less information than the data warehouse and hence have more rapid response. They allow a business unit to build its own DSS without relying on a centralized IS department. Data Mart Types: Replicated Data Marts. Stand-alone Data Marts.

22 22 Sharif University Information Workshop Meta Data Management Operation & Administration Library & Toolbox Workbench Change Management Service Management Data Acquisition Management Systems Management Data Acquisition CIF Data Management Data Delivery Information Feedback API DSI TrI DSI Operational Systems Operational Data Store Data Warehouse Exploration Warehouse Data Mining Warehouse OLAP Data Mart Oper Mart External ERP Internet Legacy Other Corporate Information Factory

23 23 Sharif University Major Business Functions: Business Operations, Business Intelligence, Business Management

24 24 Sharif University Operational Systems Operational Systems are the internal and external core systems that run the day-to-day business operations. They are accessed through application program interfaces (APIs) and are the source of data for the data warehouse and operational data store.

25 25 Sharif University External Data External Data is any data outside the normal data collected through an enterprise’s internal applications. Generally, external data, such as demographic, credit, competitor, and financial information, is purchased by the enterprise from a vendor of such information.

26 26 Sharif University Data Acquisition Data Acquisition is the set of processes that capture, integrate, transform, cleanse, and load source data into the data warehouse and operational data store.

27 27 Sharif University Data Problems

28 28 Sharif University Data Warehouse The Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used to support the strategic decision-making process for the enterprise.

29 29 Sharif University Operational Data Store The Operational Data Store is a subject-oriented, integrated, current, volatile collection of data used to support the tactical decision-making process for the enterprise.

30 30 Sharif University Comparing an Operational Data Store and a Data Warehouse

31 31 Sharif University CIF Data Management CIF Data Management is the set of processes that protect the integrity and continuity of the data within and across the data warehouse and operational data store. It may employ a staging area for cleansing and synchronizing data.

32 32 Sharif University Transactional Interface The Transactional Interface is an easy-to-use and intuitive interface for the end user to access and manipulate data in the operational data store.

33 33 Sharif University Data Delivery Data Delivery is the set of processes that enables end users and their supporting IT groups to filter, format, and deliver data to data marts and oper-marts.

34 34 Sharif University Exploration Warehouse The Exploration Warehouse is a data mart whose purpose is to provide a safe haven for exploratory and ad hoc processing. An exploration warehouse may utilize specialized technologies to provide fast response times with the ability to access the entire database.

35 35 Sharif University Data Mining Warehouse The Data Mining Warehouse includes tasks known as knowledge extraction, data archaeology, data exploration, data pattern processing and data harvesting.

36 36 Sharif University OLAP Data Mart The OLAP (online analytical processing) Data Mart is aggregated and/or summarized data that is derived from the data warehouse and tailored to support the multidimensional requirements of a given business unit or business function.

37 37 Sharif University Oper-Mart The Oper-Mart is a subset of data derived from the operational data store, used in tactical analysis and usually stored in a multidimensional manner (star schema or hypercube). Oper-marts may be created in a temporary manner and dismantled when no longer needed.

38 38 Sharif University Decision Support Interface The Decision Support Interface is an easy-to-use, intuitive tool to enable end user capabilities such as exploration, data mining, OLAP, query, and reporting to distill information from data.

39 39 Sharif University Meta Data Management Meta Data Management is the set of processes for managing the information needed to promote data legibility, use, and administration.

40 40 Sharif University Information Feedback Information Feedback is the set of processes that transmit the intelligence gained through usage of the Corporate Information Factory to appropriate data stores.

41 41 Sharif University Information Workshop Information Workshop is the set of the facilities that optimize use of the Corporate Information Factory by organizing its capabilities and knowledge, and then assimilating them into the business process.

42 42 Sharif University Library and Toolbox The Library and Toolbox is the collection of meta data and capabilities that provides information to effectively use and administer the Corporate Information Factory. The library provides the medium from which knowledge is enriched. The toolbox is a vehicle for organizing, locating, and accessing capabilities.

43 43 Sharif University Workbench The Workbench is a strategic mechanism for automating the integration of capabilities and knowledge into the business process.

44 44 Sharif University Operations and Administration Operations and Administration is the set of activities required to ensure smooth daily operations, to ensure that resources are optimized, and to ensure that growth is managed.

45 45 Sharif University Systems Management Systems Management is the set of processes for maintaining, versioning, and upgrading the core technology on which the data, software, and tools operate.

46 46 Sharif University Data Acquisition Management Data Acquisition Management is the set of processes that manage and maintain the processes used to capture source data and prepare it for loading into the data warehouse or operational data store.

47 47 Sharif University Service Management Service Management is the set of processes for promoting user satisfaction and productivity within the Corporate Information Factory. It includes processes that manage and maintain service level agreements, requests for change, user communications, and the data delivery mechanisms.

48 48 Sharif University Change Management Change Management is the set of processes coordinating modifications to the Corporate Information Factory.

49 49 Sharif University Extraction, Transportation, Loading and Transformation (ETL) Extraction - select data using different methods. Transportation - move data into the warehouse. Loading and Transformation - validate, clean, integrate, and time stamp data. Purchase specialist tools, or develop programs. [Figure labels: OLTP Databases; Staging File; Warehouse Database]
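The stages on this slide can be sketched as three small Python functions. This is only an illustration under assumed names: the `extract`, `transform`, and `load` helpers, the row layout, and the validation rules are hypothetical and stand in for whatever specialist tool or custom program a real project would use.

```python
from datetime import datetime, timezone

def extract(oltp_rows):
    """Extraction: select the rows to move (here, simply everything)."""
    return list(oltp_rows)

def transform(staged_rows, load_time):
    """Validate, clean, and time-stamp staged rows; invalid rows are set aside."""
    clean, rejected = [], []
    for row in staged_rows:
        name = (row.get("customer") or "").strip()
        amount = row.get("amount")
        if not name or not isinstance(amount, (int, float)):
            rejected.append(row)  # failed validation
            continue
        clean.append({
            "customer": name.title(),   # cleaned to a consistent format
            "amount": float(amount),
            "loaded_at": load_time,     # time stamp added at load
        })
    return clean, rejected

def load(warehouse, rows):
    """Loading: append the prepared rows to the warehouse table."""
    warehouse.extend(rows)

# Hypothetical OLTP data with one clean row and two problem rows.
oltp = [
    {"customer": "  alice ", "amount": 10},
    {"customer": "", "amount": 5},
    {"customer": "bob", "amount": "n/a"},
]
staging = extract(oltp)  # stands in for the staging file
warehouse = []
clean, rejected = transform(staging, datetime(2003, 1, 31, tzinfo=timezone.utc))
load(warehouse, clean)
print(len(warehouse), "loaded;", len(rejected), "rejected")
```

Keeping rejected rows instead of silently dropping them makes it possible to report data quality problems back to the owners of the source systems.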

50 50 Sharif University Data Quality - Importance Ensure data is: relevant, useful, quality, accurate, accessible. Large, time-consuming task. [Figure labels: Operational systems; Change, Clean up, Restructure; Warehouse]

51 51 Sharif University An Example [Figure labels: Customers; Browser: http://; Hollywood]
First listing:
Sale 1/2/98 12:00:01 Ham Pizza $10.00
Sale 1/2/98 12:00:02 Cheese Pizza $15.00
Sale 1/2/98 12:00:02 Anchovy Pizza $12.00
Return 1/2/98 12:00:03 Anchovy Pizza -$12.00
Sale 1/2/98 12:00:04 Sausage Pizza $11.00
Second listing (same records, different order):
Sale 1/2/98 12:00:02 Anchovy Pizza $12.00
Return 1/2/98 12:00:03 Anchovy Pizza -$12.00
Sale 1/2/98 12:00:01 Ham Pizza $10.00
Sale 1/2/98 12:00:02 Cheese Pizza $15.00
Sale 1/2/98 12:00:04 Sausage Pizza $11.00

52 52 Sharif University Extraction in Data Warehouses Logical Extraction Methods –Full Extraction: The data is extracted completely from the source system. –Incremental Extraction: At a specific point in time, only the data that has changed since a well-defined event back in history is extracted. Physical Extraction Methods –Online Extraction: The data is extracted directly from the source system itself. –Offline Extraction: flat files, dump files, redo and archive logs, transportable tablespaces.
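The difference between the two logical methods can be shown with a small Python sketch. It assumes each source row carries a `modified_at` timestamp and that the "well-defined event" is the time of the previous extraction; both are assumptions made for the example, and real systems may rely on change logs or triggers instead.

```python
from datetime import datetime

# Hypothetical source table; modified_at is an assumed change-tracking column.
source_rows = [
    {"id": 1, "amount": 10.0, "modified_at": datetime(2003, 1, 5)},
    {"id": 2, "amount": 15.0, "modified_at": datetime(2003, 2, 10)},
    {"id": 3, "amount": 12.0, "modified_at": datetime(2003, 3, 1)},
]

def full_extract(rows):
    """Full extraction: take every row, with no change tracking needed."""
    return list(rows)

def incremental_extract(rows, last_extract_time):
    """Incremental extraction: take only rows changed since the last extraction.

    Returns the changed rows and the new watermark for the next run.
    """
    changed = [row for row in rows if row["modified_at"] > last_extract_time]
    new_watermark = max((row["modified_at"] for row in changed), default=last_extract_time)
    return changed, new_watermark

changed, watermark = incremental_extract(source_rows, datetime(2003, 1, 31))
print([row["id"] for row in changed], watermark)
```

A second run that passes the returned watermark back in finds nothing new until the source changes again, which is what keeps refresh loads small.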

53 53 Sharif University Changing Data Operational Databases Warehouse Database First time load Refresh Refresh Refresh Purge or Archive

54 54 Sharif University Transportation in Data Warehouses Transportation Mechanisms in Data Warehouses –Transportation Using Flat Files –Transportation Through Distributed Operations –Transportation Using Transportable Tablespaces

55 55 Sharif University Transportation in Data Warehouses Transportation Using Flat Files –The most common method for transporting data is by the transfer of flat files, using mechanisms such as FTP or other remote file system access protocols. Transportation Through Distributed Operations – Distributed queries, either with or without gateways, can be an effective mechanism for extracting data. These mechanisms also transport the data directly to the target system. Transportation Using Transportable Tablespaces –Some databases, such as Oracle and DB2, provide an important mechanism for transporting data: transportable tablespaces. This feature can be the fastest way of moving large volumes of data between two databases.

56 56 Sharif University Loading and Transformation in Data Warehouses Loading Mechanisms –SQL*Loader – External Tables – OCI and Direct-Path APIs – Export/Import Transformation Mechanisms – Transformation Using SQL – Transformation Using PL/SQL – Transformation Using Table Functions

57 57 Sharif University Incremental Development Increments: –Focus on business functionality –Deliver business benefit –Are suited to warehouse evolution. –Once an increment is complete the selection and scope of the next increment is defined –Each increment follows the same phase sequence. [Figure labels: Strategy; Project and Program Management; Enterprise Technical Architecture (ETA); Definition, Analysis, Design, Build, Transition to Production, Discovery]

58 58 Sharif University Roles –The project team: roles and responsibilities –Common roles Analyst, Database Administrator, Programmer, Tester –Warehouse specific roles DW Architect, Metadata Architect, Data Quality Administrator, DW Administrator

59 59 Sharif University Design in Data Warehouses Logical Design in Data Warehouses –Data Warehousing Schemas: Star, Snowflake, Constellation. Physical Design in Data Warehouses –Physical Design Structures: Tablespaces, Tables and Partitioned Tables, Views, Integrity Constraints, Dimensions, Indexes and Partitioned Indexes, Materialized Views.

60 60 Sharif University Data Warehousing Schemas Star Snowflake Constellation

61 61 Sharif University Star Schema The center of the star consists of one or more fact tables and the points of the star are the dimension tables. Store Table (Store_id, District_id, ...); Item Table (Item_id, Item_desc, ...); Time Table (Day_id, Month_id, Period_id, Year_id); Product Table (Product_id, Product_desc, ...); Sales Fact Table (Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...)
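A star schema like this one can be tried out with Python's built-in `sqlite3` module. The sketch below is a simplified, assumed version of the slide's tables: the Item dimension is left out for brevity, the time dimension is named `time_dim`, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star.
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER,
                       period_id INTEGER, year_id INTEGER);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_desc TEXT);

-- Fact table: the center of the star, one foreign key per dimension.
CREATE TABLE sales_fact (
    product_id    INTEGER REFERENCES product(product_id),
    store_id      INTEGER REFERENCES store(store_id),
    day_id        INTEGER REFERENCES time_dim(day_id),
    sales_dollars REAL,
    sales_units   INTEGER
);

INSERT INTO store    VALUES (1, 10), (2, 20);
INSERT INTO time_dim VALUES (1, 1, 1, 2003), (2, 2, 1, 2003);
INSERT INTO product  VALUES (1, 'Ham Pizza'), (2, 'Cheese Pizza');
INSERT INTO sales_fact VALUES (1, 1, 1, 10.0, 1), (2, 1, 1, 15.0, 1), (1, 2, 2, 20.0, 2);
""")

# A typical star query: join the fact table to the dimensions it needs, then aggregate.
rows = conn.execute("""
    SELECT p.product_desc, t.year_id, SUM(f.sales_dollars), SUM(f.sales_units)
    FROM sales_fact AS f
    JOIN product  AS p ON p.product_id = f.product_id
    JOIN time_dim AS t ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.year_id
    ORDER BY p.product_desc
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is what keeps star queries simple to write and to optimize.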

62 62 Sharif University Snowflake Schema Sales Fact Table (Item_id, Store_id, Sales_dollars, Sales_units); Store Table (Store_id, Store_desc, District_id); District Table (District_id, District_desc); Item Table (Item_id, Item_desc, Dept_id); Dept Table (Dept_id, Dept_desc, Mgr_id); Mgr Table (Dept_id, Mgr_id, Mgr_name); Time Table (Week_id, Period_id, Year_id); Product Table (Product_id, Product_desc)
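In a snowflake schema, a dimension such as Store points to a further table (District) instead of repeating the district description on every store row. The Python `sqlite3` sketch below shows only that Store to District branch, with invented sample rows; the other tables from the slide are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The Store dimension is normalized: district details live in their own table.
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
CREATE TABLE store (
    store_id    INTEGER PRIMARY KEY,
    store_desc  TEXT,
    district_id INTEGER REFERENCES district(district_id)
);
CREATE TABLE sales_fact (
    item_id       INTEGER,
    store_id      INTEGER REFERENCES store(store_id),
    sales_dollars REAL,
    sales_units   INTEGER
);

INSERT INTO district VALUES (10, 'North'), (20, 'South');
INSERT INTO store VALUES (1, 'Store 1', 10), (2, 'Store 2', 10), (3, 'Store 3', 20);
INSERT INTO sales_fact VALUES (1, 1, 10.0, 1), (1, 2, 5.0, 1), (2, 3, 7.0, 2);
""")

# Reaching the district description now takes two joins: fact -> store -> district.
rows = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN store    AS s ON s.store_id = f.store_id
    JOIN district AS d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()
print(rows)
```

The trade-off is visible in the query: the district name is stored once, but each extra level of the snowflake adds a join.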

63 63 Sharif University Constellation Sales Fact Table (Item_id, Store_id, Sales_dollars, Sales_units); Inventory Fact Table (Product_id, Shelf_id, Cost_dollars, Qty_on_hand); Warehouse Table (Warehouse_id, Warehouse_loc); Store Table (Store_id, District_id); Item Table (Item_id, Dept_id); Time Table (Week_id, Period_id, Year_id); Product Table (Product_id, Product_desc)

64 64 Sharif University Summary Need for Data Warehouse. What is Data Warehouse? Data Warehouse Properties. Data Warehouse Architectures. Data Marts. Corporate Information Factory. Extraction, Transportation, Loading and Transformation. Design in Data Warehouses. Data Warehousing Schemas.

65 65 Sharif University Q & A Data warehouse Internal and external systems Decision makers

