CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman


2 By Professor Ronald J Norman of Grossmont College, CA, USA
Prof Norman used these slides in his data mining course, based on Data Mining Techniques (Second Edition) by Michael J. A. Berry and Gordon S. Linoff, 2004, John Wiley & Sons.
Management-oriented textbook...

3 Introduction
Data, data, data… everywhere!
Information… that's another story!
Especially the right information at the right time!
Data warehousing's goal is to make the right information available at the right time.
Data warehousing is a data store (e.g., a filestore or database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes.

4 Introduction
Data warehouses are natural allies for data mining (they work together well).
Data mining can help fulfill some of the goals of data warehouses: the right information at the right time.
Relational database management systems (RDBMS) such as Oracle, DB2, Sybase, Informix, Focus, SQL Server, etc. can be used for data warehousing; alternatively, the data can simply be stored as text/HTML.

5 Definitions of a Data Warehouse
1. W.H. Inmon: a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
2. Ralph Kimball: a copy of transaction data, specifically structured for query and analysis.

6 CW corpus as a Data Warehouse
1. W.H. Inmon:
– Subject-oriented: English terminology on WWW
– Integrated: harvested from many sources, into a single standard format and file-store
– Time-variant: WWW pages change!
– Non-volatile: corpus is a static snapshot
2. Ralph Kimball:
– Copy of transaction data: cache
– Structured for query and analysis: raw text yields word-frequency list
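
As a rough illustration of "raw text yields word-frequency list", the following minimal sketch counts word tokens in a tiny, made-up two-page corpus. The page texts, the tokenising rule and the function name `word_frequencies` are assumptions for illustration only; the slides do not say how the coursework corpus was actually processed.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lower-cased word tokens in raw text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical harvested pages (one UK spelling, one US spelling).
pages = [
    "The colour of the harbour",
    "The color of the harbor",
]

freq = word_frequencies(" ".join(pages))
print(freq.most_common(2))  # [('the', 4), ('of', 2)]
```

Comparing counts such as `freq["colour"]` and `freq["color"]` is one simple way a word-frequency list could be queried for the UK v US question.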

7 Data Warehouse
For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way; hence, Data Warehousing (DW).
DW allows an organization to archive snapshots of its data, and what it has noticed about its data.
Data Mining techniques make use of the data in a Data Warehouse.

8 Data Warehouse
[Diagram: transactions from the Enterprise Database (Customers, Vendors, Orders, etc.) are copied, organized and summarized into the Data Warehouse, which is used for Data Mining.]
Data Miners:
– Farmers: they know
– Explorers: unpredictable

9 Data Warehouse
A data warehouse is a copy of transaction data specifically structured for querying, analysis, reporting, and more rigorous data mining.
Note that the data warehouse contains a copy of the transactions, which are not updated or changed later by the transaction system.
Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse.

10 Data Mart
A Data Mart is a smaller, more focused Data Warehouse: a mini-warehouse.
A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
Example: for the question "Which English dominates the WWW, UK or US?", each student captured a Data Mart for one domain.

11 Data Warehouse to Data Mart
[Diagram: Data Warehouse, Data Mart, Decision Support Information (shown three times)]

12 Open source Data Warehouses
A company may keep its DW private!
Large data-sets are valuable
Gold Standards for research and development
Some universities host public DWs
E.g. ICAME: International Computer Archive of Modern English
ICAME also runs the CORPORA forum
Martin Krallinger etc. on UK v US English: archive/2006-November/ html archive/2006-November/ html

13 Other Data repositories
– UPenn: Linguistic Data Consortium
– European equivalents: ELRA, ELDA
– Leeds Electronic Text Centre
– Leeds Centre for Translation Studies

14 Generic Architecture of Data (synonym) Transaction data

15 Transaction (Operational) Data
Operational (production) systems create a massive number of transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site hits, web site text, etc.
Transactions are the base level of data: the raw material for understanding customer behavior.
Unfortunately, operational systems change (e.g., new formats) due to changing business needs.
Data warehousing strategies need to be aware of operational system changes.

16 Operational Summary Data
Summaries are for a specific time period and utilize the transaction data for that time period.
Other examples???
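
To make the idea of a per-period summary concrete, here is a minimal sketch that rolls individual transactions up into monthly counts and totals. The transaction tuples, their values and the function name `summarize_by_month` are invented for illustration; the slides do not prescribe a particular summary layout.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (date, kind, amount).
transactions = [
    (date(2004, 1, 5), "sale", 120.0),
    (date(2004, 1, 20), "refund", -20.0),
    (date(2004, 2, 3), "sale", 75.0),
]


def summarize_by_month(rows):
    """Return {(year, month): {"count": n, "total": amount}}."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for day, _kind, amount in rows:
        bucket = summary[(day.year, day.month)]
        bucket["count"] += 1
        bucket["total"] += amount
    return dict(summary)


monthly = summarize_by_month(transactions)
print(monthly[(2004, 1)])  # {'count': 2, 'total': 100.0}
```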

17 Decision Support Summary Data
The data that are used to help make decisions about the business:
– Financial data, such as Income Statements (Profit & Loss) and Balance Sheets (Assets – Liabilities = Net Worth)
– Sales summaries
– Other examples???
Data warehouses maintain this type of data; however, financial data of record (for audit purposes) usually comes from databases and not the data warehouse (confusing???).
Generally, it is a bad idea to use the same system for analytic and operational purposes.

18 Database Schema
A database schema defines the structure of the data, not the values of the data (e.g., first name, last name = structure; Ron Norman = values of the data).
In an RDBMS:
– Columns = fields = attributes (A, B, C)
– Rows = records = tuples (1-7)
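
The distinction between structure and values can be sketched with Python's built-in `sqlite3` module. The table name `person` and its column names are assumptions chosen to mirror the slide's "first name, last name" example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: structure only (column names and types), no values yet.
conn.execute("CREATE TABLE person (first_name TEXT, last_name TEXT)")

# A row (record, tuple): the values that fill that structure.
conn.execute("INSERT INTO person VALUES (?, ?)", ("Ron", "Norman"))

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
rows = conn.execute("SELECT first_name, last_name FROM person").fetchall()

print(columns)  # ['first_name', 'last_name']
print(rows)     # [('Ron', 'Norman')]
```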

19 Logical & Physical Database Schema
Logical schema: describes data in a way that is familiar to business users.
Physical schema: describes the data the way it will be stored in an RDBMS, which might differ from the way the logical schema shows it.

20 Metadata
General definition: data about data!
Examples:
– A library's card catalog (metadata) describes publications (data)
– A file system maintains permissions (metadata) about files (data)
A form of system documentation, including:
– Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.)
– Description of the contents of each field (e.g., start date)
– Date when data were loaded
– Indication of currency of the data (last updated)
– Mappings between systems (e.g., A.this = B.that)
Invaluable: without it, you have to research to find this information.
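
A minimal sketch of how such metadata might be recorded and used follows. The field name, description text, dates and the mapping target `B.state_code` are all hypothetical; only the list of allowed state codes comes from the slide.

```python
from datetime import date

# Hypothetical metadata record describing one field of a warehouse table.
state_field = {
    "name": "state",
    "description": "Two-letter state code",     # contents of the field
    "allowed_values": {"AZ", "CA", "OR", "UT", "WA"},
    "loaded_on": date(2004, 3, 1),               # date when data were loaded
    "last_updated": date(2004, 3, 15),           # currency of the data
    "maps_to": "B.state_code",                   # mapping between systems
}


def is_valid(value, field_meta):
    """Check a data value against the metadata's allowed values."""
    return value in field_meta["allowed_values"]


print(is_valid("CA", state_field))  # True
print(is_valid("NY", state_field))  # False
```

Keeping the metadata separate from the data means the same check can be reused for every field that has a record like this.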

21 Business Rules
Highest level of abstraction from operational (transaction) data.
Describe why relationships exist and how they are applied.
Examples:
– Need to have 3 forms of ID for credit
– Only allow a maximum daily withdrawal of $200
– After the 3rd log-in attempt, lock the log-in screen
– Accept no bills larger than $20
– Others???
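
Two of the example rules can be sketched as small checks. The function names are hypothetical, and the sketch assumes that "3rd log-in attempt" means three failed attempts, which the slide does not state explicitly.

```python
MAX_DAILY_WITHDRAWAL = 200  # dollars, from the slide's example rule
MAX_LOGIN_ATTEMPTS = 3      # from the slide's example rule


def can_withdraw(withdrawn_today, requested):
    """Allow a withdrawal only if the daily total stays within the limit."""
    return withdrawn_today + requested <= MAX_DAILY_WITHDRAWAL


def is_locked(failed_attempts):
    """Lock the log-in screen once the third failed attempt is reached (assumed reading)."""
    return failed_attempts >= MAX_LOGIN_ATTEMPTS


print(can_withdraw(150, 50))  # True
print(can_withdraw(150, 51))  # False
print(is_locked(3))           # True
```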

22 OLAP – Online Analytical Processing
A definition: data representation for ease of visualization.
OLAP goes beyond SQL with its analysis capabilities.
Key feature of OLAP: relevant multi-dimensional views, such as products, time, geography.
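
The notion of a multi-dimensional view can be illustrated with a toy roll-up over the three dimensions named on the slide. This is only a sketch in plain Python, not an OLAP engine; the fact rows, the dimension names and the function `roll_up` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, quarter, region, sales).
facts = [
    ("widget", "Q1", "UK", 10),
    ("widget", "Q1", "US", 15),
    ("widget", "Q2", "UK", 7),
    ("gadget", "Q1", "US", 4),
]
DIMENSIONS = ("product", "quarter", "region")


def roll_up(rows, *keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


print(roll_up(facts, "product"))
# {('widget',): 32, ('gadget',): 4}
print(roll_up(facts, "quarter", "region"))
# {('Q1', 'UK'): 10, ('Q1', 'US'): 19, ('Q2', 'UK'): 7}
```

Each call is a different view of the same facts: by product, or by time and geography together.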

23 OLAP Architecture

24 General Architecture for Data Warehousing
– Source systems
– Extraction, (Clean), Transformation, & Load (ETL)
– Central repository
– Metadata repository
– Data marts
– Operational feedback
– End users: analysis, OLAP, Data-Mining
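
The Extraction, (Clean), Transformation, & Load step can be sketched as four small functions chained together. The record layout, the cleaning rule (drop rows with no amount) and the use of a plain list as the "central repository" are all simplifying assumptions, not something the slides specify.

```python
def extract(source_rows):
    """Extraction: copy rows out of a source system."""
    return list(source_rows)


def clean(rows):
    """Clean: drop rows whose amount is missing (assumed rule)."""
    return [r for r in rows if r.get("amount") not in (None, "")]


def transform(rows):
    """Transformation: normalise names and convert amounts to numbers."""
    return [
        {"customer": r["customer"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]


def load(rows, warehouse):
    """Load: append the prepared rows to the central repository."""
    warehouse.extend(rows)
    return warehouse


# Hypothetical source data with inconsistent formatting.
source = [
    {"customer": " alice ", "amount": "12.50"},
    {"customer": "bob", "amount": ""},
]

warehouse = load(transform(clean(extract(source))), [])
print(warehouse)  # [{'customer': 'Alice', 'amount': 12.5}]
```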

25 DM vs. OLAP
Data Mining:
– can handle complex data types of the attributes and their aggregations
– a more automated process
Online Analytical Processing (visualization):
– restricted to a small number of dimension and measure types
– user-controlled process

26 DM + visualization
Data Mining:
– can handle complex data types of the attributes and their aggregations
– reduces data to a smaller number of patterns
Visualization:
– restricted to a small number of patterns
– user-controlled process to select patterns which are interesting or useful

27 Q: Is it a Data Warehouse?
Is ANY data-set a Data Warehouse?
– SIS?
– Library Catalogue?
– VLE?
– Text in a textbook?

28 Definitions of a Data Warehouse
1. W.H. Inmon: a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
2. Ralph Kimball: a copy of transaction data, specifically structured for query and analysis.

29 CW corpus as a Data Warehouse
1. W.H. Inmon:
– Subject-oriented: English terminology on WWW
– Integrated: harvested from many sources, into a single standard format and file-store
– Time-variant: WWW pages change!
– Non-volatile: corpus is a static snapshot
2. Ralph Kimball:
– Copy of transaction data: cache
– Structured for query and analysis: raw text yields word-frequency list