Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics

Presentation transcript:

Introduction to Data Warehouse and Data Mining. MIS 2502 Data Analytics. From previous classes, we have learned that data is a valuable resource for businesses.

Importance of data. What do organizations do with data? Transaction processing: e-commerce (Amazon.com, PNC Bank); B2B systems (supply chain management); web search. Decision making: financial reporting; inventory management; budget allocation; customer relationship management; target marketing; product design and promotions; fraud detection.

Unnormalized data set. Columns: Patient ID, Name, Address, DOB, Doc, Appt Date, Location, DX. Values: 111111 Cindy Marselis 2320 Edge Hill Road 1/11/64 Armstrong 9/1/09 11:00 AM Alter 2011 Herniated Disc Flu 9331 Rising Sun Avenue Morningstar Allen 11/1/09 10:00 AM Alter 2012 Psoriasis 222222 Kathryn Marselis 11/3/04 Dershaw 8/1/09 11:00 AM Speakman 105 Well baby check Cindy Schwartz 8/11/09 3:00 PM Alter 105

Normalized db - before

Normalized db - after

Decision Making with Databases. Databases are used for transaction processing. Data from transaction processing is used for tactical decision making. The database provides a basic reporting function. But…

The Need for Data Analysis. Different managers require different data, and that data may come from other parts of the organization or from outside the organization. External and internal forces require tactical and strategic decisions: the search for competitive advantage; dynamic business environments; reduced decision-making cycle time.

Some Questions Analysts Need to Answer. Sales analysis: What are the sales by quarter and geography? How do sales compare in two different stores in the same state? Profitability analysis: Which is the most profitable store in the state of CA? Which product lines are the highest revenue producers this year? Which products and product lines are the most profitable this quarter? Sales force analysis: Which salesperson is the best revenue producer this year? Does salesperson X meet the sales target this quarter?
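
A question such as "sales by quarter and geography" can be illustrated with a grouped query. The following is only a minimal sketch using Python's built-in sqlite3 module; the table layout, store names, and amounts are invented for illustration and are not taken from the slides.

```python
import sqlite3

# Hypothetical flat sales records: (sale_date, state, store, amount).
# All values are invented for illustration.
ROWS = [
    ("2009-02-10", "CA", "Store A", 120.0),
    ("2009-02-15", "CA", "Store B", 80.0),
    ("2009-05-03", "CA", "Store A", 200.0),
    ("2009-05-20", "PA", "Store C", 50.0),
    ("2009-08-01", "PA", "Store C", 75.0),
]


def sales_by_quarter_and_state(rows):
    """Return (quarter, state, total) tuples, ordered by quarter then state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_date TEXT, state TEXT, store TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    # Integer division maps months 1-3 to quarter 1, 4-6 to quarter 2, and so on.
    query = """
        SELECT (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
               state,
               SUM(amount) AS total
        FROM sales
        GROUP BY quarter, state
        ORDER BY quarter, state
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for quarter, state, total in sales_by_quarter_and_state(ROWS):
        print(quarter, state, total)
```

The same pattern (group by the dimensions of interest, then sum a measure) would apply to the profitability and sales force questions, given tables that carry the relevant columns.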

From transaction processing to supporting decision making

Operational vs. Decision Support Data. Operational data: relational, normalized database; optimized to support transactions; real-time updates. DSS data: snapshot of operational data; summarized; large amounts of data; data analyst viewpoint (timespan, granularity, dimensionality).

Data Warehouse. Integrated: centralized; holds data retrieved from the entire organization. Subject-oriented: optimized to give answers to diverse questions; used by all functional areas. Time-variant: flow of data through time; projected data. Non-volatile: data never removed; always growing.

Data Warehouse. Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse. Data mart – contains a subset of data warehouse information.

ETL – Extraction, Transformation, Load. Extract: data from source systems. Transform: cleanse data for consistency and output exceptions; apply business rules; select certain columns to load (not-null records); translate coded values (1, M, male = 0); derive new calculated values (sale_amount = qty * unit_price); join data from multiple sources (lookup, merge); aggregate (roll up/summarize data); transpose/pivot (turning columns into rows); validate data. Load: data into the repository.
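
A few of the transform steps listed on this slide (rejecting null records, translating coded values, deriving sale_amount = qty * unit_price, and aggregating) can be sketched in plain Python. This is an illustrative sketch, not a real ETL tool: the field names are hypothetical, and the mapping of female codes to 1 is an assumption, since the slide only states that 1, M, and male map to 0.

```python
# Hypothetical source records; field names and values are invented.
SOURCE_ROWS = [
    {"customer": "A", "gender": "1", "qty": 2, "unit_price": 5.0},
    {"customer": "B", "gender": "M", "qty": 1, "unit_price": 3.5},
    {"customer": "C", "gender": "female", "qty": 4, "unit_price": 2.0},
    {"customer": None, "gender": "F", "qty": 1, "unit_price": 9.0},
]

# 1 / M / male -> 0 comes from the slide; the codes that map to 1 are assumed.
GENDER_CODES = {"1": 0, "m": 0, "male": 0, "2": 1, "f": 1, "female": 1}


def transform(rows):
    """Cleanse rows; return (clean_rows, exceptions)."""
    clean, exceptions = [], []
    for row in rows:
        code = GENDER_CODES.get(str(row["gender"]).lower())
        # Reject records with a missing customer or an unknown gender code.
        if row["customer"] is None or code is None:
            exceptions.append(row)
            continue
        clean.append(
            {
                "customer": row["customer"],
                "gender": code,  # translated coded value
                "sale_amount": row["qty"] * row["unit_price"],  # derived value
            }
        )
    return clean, exceptions


def aggregate_by_gender(rows):
    """Roll up sale_amount per translated gender code."""
    totals = {}
    for row in rows:
        totals[row["gender"]] = totals.get(row["gender"], 0.0) + row["sale_amount"]
    return totals


def load(rows, repository):
    """Append cleansed rows to a list that stands in for the repository."""
    repository.extend(rows)
    return repository


if __name__ == "__main__":
    clean_rows, rejected = transform(SOURCE_ROWS)
    warehouse = load(clean_rows, [])
    print(len(warehouse), "loaded;", len(rejected), "rejected")
    print(aggregate_by_gender(warehouse))
```

Keeping rejected rows in a separate list mirrors the "output exceptions" step: bad records are set aside for review rather than silently dropped.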

Data in a Data Warehouse. Data for a data warehouse is obtained from a variety of databases, e.g. customer database, transaction database, accounts database. Data in a data warehouse is multidimensional.

Multidimensional Analysis. Cube – common term for the representation of multidimensional information.
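
One way to picture a cube is as a mapping from a coordinate in each dimension to a measure. The sketch below assumes three hypothetical dimensions (product, location, quarter) and invented sales amounts; it shows a slice (fixing one dimension) and a roll-up (summing over the others). It is a toy model, not how an OLAP server stores data.

```python
from collections import defaultdict

# Hypothetical facts: (product, location, quarter, amount). Values are invented.
FACTS = [
    ("Widget", "East", "Q1", 100),
    ("Widget", "West", "Q1", 150),
    ("Gadget", "East", "Q1", 200),
    ("Widget", "East", "Q2", 120),
    ("Gadget", "West", "Q2", 80),
]
DIMENSIONS = ("product", "location", "quarter")


def build_cube(facts):
    """Map each (product, location, quarter) cell to its total amount."""
    cube = defaultdict(int)
    for product, location, quarter, amount in facts:
        cube[(product, location, quarter)] += amount
    return dict(cube)


def slice_cube(cube, dimension, value):
    """Keep only the cells whose coordinate on `dimension` equals `value`."""
    axis = DIMENSIONS.index(dimension)
    return {key: amount for key, amount in cube.items() if key[axis] == value}


def roll_up(cube, dimension):
    """Sum amounts over every dimension except `dimension`."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)


if __name__ == "__main__":
    cube = build_cube(FACTS)
    q1 = slice_cube(cube, "quarter", "Q1")
    print(roll_up(cube, "product"))
    print(roll_up(q1, "location"))
```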

Star Schema. Data-modeling technique that maps multidimensional decision support data into a relational database. Yields a model for multidimensional data analysis while preserving the relational structure of the operational DB. Four components: facts, dimensions, attributes, attribute hierarchies.

Simple Star Schema Figure 13.12

Slice and Dice View of Sales Figure 13.14

Star Schema Representation. Facts and dimensions are represented by physical tables in the data warehouse DB. The fact table is related to each dimension table (M:1). Fact and dimension tables are related by foreign keys and are subject to the primary/foreign key constraints.

Star Schema for Sales. Sales fact table and its four dimensions: location, time, product, and customer. Allows sales to be aggregated by time, geographic location, product, and customer.
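
The sales star schema described here can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are assumptions chosen for illustration; the slides name only the fact table and its four dimensions (location, time, product, customer). Each fact row references one row in every dimension table, which gives the M:1 relationship.

```python
import sqlite3

# Hypothetical star schema: one fact table, four dimension tables.
SCHEMA = """
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, region TEXT, city TEXT);
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, product_line TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    location_id INTEGER NOT NULL REFERENCES dim_location(location_id),
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    amount      REAL NOT NULL
);
"""


def build_warehouse():
    """Create an in-memory warehouse and load a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign key constraints
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_location VALUES (?, ?, ?)",
        [(1, "West", "Los Angeles"), (2, "East", "Philadelphia")],
    )
    conn.executemany(
        "INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2009, 1), (2, 2009, 2)]
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")],
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        [(1, "Customer A"), (2, "Customer B")],
    )
    # Fact rows: (location_id, time_id, product_id, customer_id, amount).
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 1, 100.0),
            (1, 2, 2, 2, 250.0),
            (2, 1, 1, 2, 75.0),
            (2, 2, 1, 1, 125.0),
        ],
    )
    conn.commit()
    return conn


def sales_by_region(conn):
    """Aggregate the fact table along the location dimension."""
    return conn.execute(
        """
        SELECT l.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_location AS l ON l.location_id = f.location_id
        GROUP BY l.region
        ORDER BY l.region
        """
    ).fetchall()


def sales_by_quarter(conn):
    """Aggregate the fact table along the time dimension."""
    return conn.execute(
        """
        SELECT t.year, t.quarter, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time AS t ON t.time_id = f.time_id
        GROUP BY t.year, t.quarter
        ORDER BY t.year, t.quarter
        """
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse()
    print(sales_by_region(warehouse))
    print(sales_by_quarter(warehouse))
```

Aggregating by product or customer follows the same shape: join the fact table to the relevant dimension table and group by one of its attributes.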

Data Warehouse to Data Marts. Given the large size of a data warehouse, organizations create data marts: subject-oriented data; a subset of the data in a data warehouse; used for focused decision making.

Online Analytical Processing (OLAP). Advanced data analysis environment. Supports decision making, business modeling, and operations research activities. Characteristics of OLAP: uses multidimensional data analysis techniques; provides advanced database support; provides easy-to-use end-user interfaces; supports client/server architecture.

OLAP Client/Server Architecture Figure 13.6

Data Mining. Seeks to discover patterns or relationships within the data. Data mining tools automatically search data for patterns and relationships. Data mining tools: analyze data; uncover problems or opportunities; form computer models based on findings; predict business behavior with models; require minimal end-user intervention.
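
As a toy illustration of searching data for patterns, the sketch below counts which pairs of items appear together in hypothetical shopping baskets and keeps the pairs that reach a minimum count. This simple co-occurrence count is a stand-in chosen for illustration; the slides do not say which techniques a data-mining tool uses, and the baskets are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; item names are invented.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def frequent_pairs(baskets, min_count):
    """Return {(item_a, item_b): count} for pairs seen in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    for pair, n in sorted(frequent_pairs(BASKETS, 3).items()):
        print(pair, n)
```

Pairs that pass the threshold are candidate relationships for an analyst to examine; real tools would go further, for example by building predictive models from such findings.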

What Are Data-Mining Tools?

Data Mining Process

Business Intelligence AB113 - Information Technology

MS SQL 2008 Architecture Relational Model Dimensional Model/Star Schema

Back Room: data prepared from many sources. Front Room: information presented.

Multidimensional Analysis and Data Mining. Differences between databases and data warehouse/data mart? Data mining – the process of analyzing data to extract information not offered by the raw data alone. To perform data mining, users need data-mining tools. Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making. Business intelligence – taking data from multiple sources and turning it into useful and easy-to-understand information to support decision-making efforts for various kinds of people.