Recap of Day 1 1 Dr. Chaitali Basu Mukherji
2 Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? What impact will new products/services have on revenue and margins? What product prom- -otions have the biggest impact on revenue? What is the most effective distribution channel? A producer wants to know….
Data, Data everywhere yet... 3 I can’t find the data I need – data is scattered over the network – many versions, subtle differences zI can’t get the data I need yneed an expert to get the data zI can’t understand the data I found yavailable data poorly documented zI can’t use the data I found yresults are unexpected ydata needs to be transformed from one form to other
What is a Data Warehouse? 4 A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context. [Barry Devlin]
What are the users saying... 5 Data should be integrated across the enterprise Summary data has a real value to the organization Historical data holds the key to understanding data over time What-if capabilities are required
What is Data Warehousing? 6 A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research, April 1996] Data Information
Evolution 7 60’s: Batch reports – hard to find and analyze information – inflexible and expensive, reprogram every new request 70’s: Terminal-based DSS and EIS (executive information systems) – still inflexible, not integrated with desktop tools 80’s: Desktop data access and analysis tools – query tools, spreadsheets, GUIs – easier to use, but only access operational databases
Evolution 8 90’s: Data warehousing with integrated OLAP engines and tools 91: Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse. 95: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded. 2000: Daniel Linstedt releases the Data Vault, enabling real time auditable Data Warehouses warehouse.
Advantages of using Data warehousing 1.Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis. 2.Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems. 9
Disadvantages of using data warehousing 1.Data warehouses are not the optimal environment for unstructured data. 2.Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data. 10
Data Warehouse Architecture 11 Data Warehouse Engine Optimized Loader Extraction Cleansing Analyze Query Metadata Repository Relational Databases Legacy Data Purchased Data ERP Systems
Data warehousing methodologies Bottom-up design – Ralph Kimball, a well-known author on data warehousing, is a proponent of an approach to data warehouse design – In this approach data marts are first created to provide reporting and analytical capabilities for specific business processes. Top-down design – Bill Inmon, is one of the leading proponents of the top- down approach to data warehouse design. – In this approach data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. 12
13
Application Areas 14 IndustryApplication FinanceCredit Card Analysis InsuranceClaims, Fraud Analysis TelecommunicationCall record analysis TransportLogistics management Consumer goodspromotion analysis Data Service providersValue added data UtilitiesPower usage analysis