Intro to Data Mining: Extracting Information and Knowledge from Data.

Topics
Relationships between DSS/BI, databases, and data management
DSS/BI: transforming data into information to support decision making
How operational data and DSS/BI data differ
What a data warehouse is, how data for it are prepared, and how it is implemented
Multidimensional databases
Database technology for BI: OLAP, OLTP
Examples of applications in healthcare

BI: Extraction Of Knowledge From Data

DSS/BI Architecture: Learning and Predicting (courtesy of Tim Graettinger)

DSS/BI
DSS/BI are technologies designed to extract information from data and to use that information as a basis for decision making
Decision support system (DSS)
◦ Arrangement of computerized tools used to assist managerial decision making within a business
◦ Usually requires extensive data “massaging” to produce information
◦ Used at all levels within an organization
◦ Often tailored to focus on specific business areas
◦ Provides ad hoc query tools to retrieve data and to display data in different formats

DSS/BI Components
Data store component
◦ Basically a DSS database
Data extraction and data filtering component
◦ Used to extract and validate data taken from the operational database and external data sources
End-user query tool
◦ Used to create queries that access the database
End-user presentation tool
◦ Used to organize and present data

Main Components Of A DSS/BI

DSS/BI: Needs a Different Type of Database
A specialized DBMS tailored to provide fast answers to complex queries
Database schema
◦ Must support complex data representations
◦ Must contain aggregated and summarized data
◦ Queries must be able to extract multidimensional time slices
Database size: the DBMS must support very large databases (VLDBs); Wal-Mart's data warehouse is measured in petabytes (1 petabyte = 1,000 terabytes)
Technology: data warehouse and OLAP

Operational vs. DSS/BI Data

Operational vs DSS Data

What Is a Data Warehouse?
An integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making
Usually a read-only database optimized for data analysis and query processing
Centralized, consolidated database
Periodically updated; data are never removed
Requires time, money, and considerable managerial effort to create

OLAP (Online Analytical Processing)
Advanced data analysis environment that supports decision making, business modeling, and operations research
The “engine” or platform for a DSS or data warehouse
OLAP systems share four main characteristics:
◦ Use multidimensional data analysis techniques
◦ Provide advanced database support
◦ Provide easy-to-use end-user interfaces
◦ Support client/server architecture

OLAP vs OLTP
Online Transactional Processing (OLTP)
◦ Emphasizes speed, security, and flexibility; reduces redundancy and anomalies
Online Analytical Processing (OLAP)
◦ Multidimensional data analysis
◦ Advanced database support
◦ Easy-to-use end-user interface
◦ Support for client/server architecture
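
The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is an illustration only: the orders table, its columns, the helper names, and the sample rows are assumptions that do not come from the slides.

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)


def record_order(region, year, amount):
    """OLTP-style work: one short transaction that writes a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
            (region, year, amount),
        )


def sales_by_region():
    """OLAP-style work: a read-only query that scans and aggregates many rows."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


for row in [("East", 2010, 100.0), ("East", 2011, 150.0),
            ("West", 2010, 200.0), ("West", 2011, 50.0)]:
    record_order(*row)

summary = sales_by_region()
```

In practice the two workloads usually run on separate databases, so that long analytical scans do not slow down the transactional system.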

Multidimensional Data Analysis
Goal: analyze data from different dimensions and at different levels of aggregation

Multidimensional Data Analysis Techniques
Data are processed and viewed as part of a multidimensional structure
Particularly attractive to business decision makers
Augmented by the following functions:
◦ Advanced data presentation functions
◦ Advanced data aggregation, consolidation, and classification functions
◦ Advanced computational functions
◦ Advanced data modeling functions
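
As a minimal sketch of viewing the same data along different dimensions and at different levels of aggregation, the following plain-Python example groups a few records by any chosen subset of dimensions. The sales records, the dimension names, and the aggregate helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, quarter, amount).
SALES = [
    ("East", "Widget", "Q1", 100),
    ("East", "Gadget", "Q1", 150),
    ("West", "Widget", "Q1", 200),
    ("East", "Widget", "Q2", 120),
    ("West", "Gadget", "Q2", 80),
]
DIMENSIONS = ("region", "product", "quarter")


def aggregate(records, by):
    """Sum the amount, grouped by the named dimensions (in the given order)."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[i] for i in positions)
        totals[key] += record[-1]  # last field is the measure
    return dict(totals)


by_region = aggregate(SALES, ["region"])                     # one dimension
by_region_quarter = aggregate(SALES, ["region", "quarter"])  # finer level
grand_total = aggregate(SALES, [])                           # fully aggregated
```

Grouping by fewer dimensions gives a coarser summary; grouping by more gives a more detailed one.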

Multidimensional Data Analysis: Operational vs multidimensional view

Integration of OLAP with Spreadsheets

Easy-to-Use End-User Interface
Many interface features are “borrowed” from previous generations of data analysis tools that are already familiar to end users
◦ Makes OLAP easily accepted and readily used

Client/Server Architecture
Provides a framework within which new systems can be designed, developed, and implemented
◦ Enables an OLAP system to be divided into several components that define its architecture
◦ OLAP is designed to meet ease-of-use as well as system flexibility requirements

OLAP Architecture
Designed to use both operational and data warehouse data
Defined as an “advanced data analysis environment that supports decision making, business modeling, and operations research activities”
In most implementations, the data warehouse and OLAP are interrelated and complementary environments

OLAP Architecture: OLAP engine provides ETL (DTS) functions

Relational OLAP
Provides OLAP functionality by using relational databases and familiar relational query tools to store and analyze multidimensional data
Adds the following extensions to a traditional RDBMS:
◦ Multidimensional data schema support within the RDBMS
◦ Data access language and query performance optimized for multidimensional data

Relational OLAP (ROLAP)

Multidimensional OLAP (MOLAP)
Extends OLAP functionality to multidimensional database management systems (MDBMSs)
◦ MDBMS end users visualize stored data as a 3D cube, known as a data cube
◦ Data cubes can grow to n dimensions, becoming hypercubes
◦ To speed access, data cubes are held in memory in a cube cache
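
The idea of an in-memory data cube can be sketched as a dictionary keyed by coordinates, with every subtotal precomputed. This is a toy illustration under stated assumptions, not how any particular MDBMS stores its cubes; the facts, the ALL marker, and build_cube are hypothetical.

```python
from itertools import product

ALL = "*"  # marker meaning "aggregated over this dimension"

# Hypothetical facts: ((year, region, product), sales amount).
FACTS = [
    ((2010, "East", "Widget"), 100),
    ((2010, "West", "Widget"), 200),
    ((2011, "East", "Gadget"), 50),
    ((2011, "East", "Widget"), 70),
]


def build_cube(facts):
    """Precompute every cell of the cube, including all subtotals.

    Each fact contributes to 2**n cells, because each of its n
    coordinates is either kept or replaced by ALL.
    """
    cube = {}
    for coords, value in facts:
        for mask in product((False, True), repeat=len(coords)):
            key = tuple(ALL if hide else c for c, hide in zip(coords, mask))
            cube[key] = cube.get(key, 0) + value
    return cube


cube = build_cube(FACTS)
total_sales = cube[(ALL, ALL, ALL)]   # grand total
sales_2010 = cube[(2010, ALL, ALL)]   # one year, all regions and products
east_2011 = cube[(2011, "East", ALL)]  # one year and region, all products
```

Once the cube is built, every lookup is a single dictionary access, which is the trade-off behind a cube cache: more memory in exchange for faster answers.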

Multidimensional OLAP

Relational vs. Multidimensional OLAP

Star Schemas
Data modeling technique used to map multidimensional decision support data into a relational database
Creates a near equivalent of a multidimensional database schema from an existing relational database
Yields an easily implemented model for multidimensional data analysis, while still preserving the relational structures on which the operational database is built
Has four components: facts, dimensions, attributes, and attribute hierarchies

Facts
Numeric measurements (values) that represent a specific business aspect or activity
◦ Normally stored in a fact table that is the center of the star schema
The fact table contains facts that are linked through their dimensions
Metrics are facts computed or derived at run time
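
A star schema with a central fact table can be sketched using Python's built-in sqlite3 module. The table names, columns, and sample rows below are assumptions chosen for illustration; revenue_per_unit is an example of a metric derived at query time rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes (hypothetical).
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, state TEXT, city TEXT);

-- Fact table: numeric measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_time VALUES (1, 2010, 'Q1'), (2, 2010, 'Q2');
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_location VALUES (1, 'PA', 'Pittsburgh'), (2, 'PA', 'Philadelphia');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 1, 5, 75.0),
    (2, 1, 2, 20, 200.0),
    (2, 2, 2, 4, 60.0);
""")

# Join the fact table to two dimensions and aggregate.
# revenue_per_unit is a metric computed at run time from stored facts.
rows = conn.execute("""
    SELECT t.quarter, p.name,
           SUM(f.revenue) AS revenue,
           SUM(f.revenue) / SUM(f.units) AS revenue_per_unit
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY t.quarter, p.name
    ORDER BY t.quarter, p.name
""").fetchall()
```

Every analytical query follows the same shape: start from the fact table, join only the dimensions needed, then group by their attributes.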

Dimensions: simple star schema

Attributes
Used to search, filter, or classify facts
Dimensions provide descriptive characteristics about the facts through their attributes

Attributes: Three-dimensional view of sales

Attributes: slice-and-dice view of sales

Attribute Hierarchies
Provide top-down data organization
Provide the capability to perform drill-down and roll-up searches in a data warehouse
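
Roll-up and drill-down along an attribute hierarchy can be sketched in a few lines of plain Python. The country, state, city hierarchy, the sample values, and both helper functions are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical location hierarchy, most general level first.
HIERARCHY = ("country", "state", "city")

# Hypothetical facts: (location attributes, measure).
FACTS = [
    ({"country": "USA", "state": "PA", "city": "Pittsburgh"}, 120),
    ({"country": "USA", "state": "PA", "city": "Philadelphia"}, 200),
    ({"country": "USA", "state": "OH", "city": "Cleveland"}, 90),
]


def summarize(facts, level):
    """Total the measure at one hierarchy level.

    Keys hold the full path from the top of the hierarchy down to `level`.
    """
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for location, value in facts:
        key = tuple(location[name] for name in HIERARCHY[:depth])
        totals[key] += value
    return dict(totals)


def drill_down(facts, parent):
    """Break one member (given by its path) into its children one level below."""
    child_level = HIERARCHY[len(parent)]
    children = summarize(facts, child_level)
    return {k: v for k, v in children.items() if k[:len(parent)] == parent}


by_country = summarize(FACTS, "country")       # roll-up to the top level
by_state = summarize(FACTS, "state")           # one level more detailed
pa_cities = drill_down(FACTS, ("USA", "PA"))   # drill down into one state
```

Rolling up simply shortens the path used as the grouping key; drilling down lengthens it by one level for a chosen member.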

Attribute Hierarchies in multidimensional analysis

Star Schema Representation
Each dimension record is related to thousands of fact records
Facilitates data retrieval functions

Slice and Dice

Star Schema Representation: order star schema

Apply Database Design Procedures: DW design and implementation

Data Warehouse Vendors

OLAP Market Size

OLAP Market Share

Market Consolidation

Latest Developments
Oracle-Hyperion merger
Cognos was bought by IBM
SPSS was bought by IBM

Application 1: Rehab Outcome Data Warehouse
Rehabilitation Outcome Database, Center for Rehabilitation Service (CRS), UPMC
More than fifty community rehabilitation centers contributed to this database
547,719 transactions
13 outcome indicators, 72,541 episodes of treatment, 17,205 patients, 108 therapists, 48 institutions

Multidimensional Database
Fact table: P_id, D_id, A_id, T_id, number of patients
Dimension tables (each dimension record relates to many fact records, 1:N):
◦ Demographic: D_id, gender, age
◦ Diagnosis: P_id, disease, status
◦ Area: A_id, country, state, city
◦ Time: T_id, year, month, week
Diagram legend: fact, dimension, attribute
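
The tables named on this slide could be declared and queried roughly as follows, using Python's built-in sqlite3 module. Only the table and column names come from the slide; the data types, the name time_period (used in place of Time), and every sample row are placeholders invented for this sketch, not data from the rehabilitation database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demographic (d_id INTEGER PRIMARY KEY, gender TEXT, age INTEGER);
CREATE TABLE diagnosis (p_id INTEGER PRIMARY KEY, disease TEXT, status TEXT);
CREATE TABLE area (a_id INTEGER PRIMARY KEY, country TEXT, state TEXT, city TEXT);
CREATE TABLE time_period (t_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, week INTEGER);

-- Fact table: one foreign key per dimension plus the measure.
CREATE TABLE fact (
    p_id INTEGER REFERENCES diagnosis(p_id),
    d_id INTEGER REFERENCES demographic(d_id),
    a_id INTEGER REFERENCES area(a_id),
    t_id INTEGER REFERENCES time_period(t_id),
    no_of_patient INTEGER
);

-- Placeholder rows, invented for illustration only.
INSERT INTO demographic VALUES (1, 'F', 30), (2, 'M', 70);
INSERT INTO diagnosis VALUES (1, 'Disease A', 'improved'), (2, 'Disease B', 'stable');
INSERT INTO area VALUES (1, 'USA', 'PA', 'Pittsburgh');
INSERT INTO time_period VALUES (1, 2010, 1, 1), (2, 2010, 2, 5);
INSERT INTO fact VALUES
    (1, 1, 1, 1, 12),
    (1, 2, 1, 1, 8),
    (2, 1, 1, 2, 5),
    (1, 1, 1, 2, 10);
""")

# Patient counts by disease and month (detailed level of the Time dimension).
by_month = conn.execute("""
    SELECT dg.disease, t.month, SUM(f.no_of_patient)
    FROM fact AS f
    JOIN diagnosis AS dg ON dg.p_id = f.p_id
    JOIN time_period AS t ON t.t_id = f.t_id
    GROUP BY dg.disease, t.month
    ORDER BY dg.disease, t.month
""").fetchall()

# The same measure rolled up to the year level.
by_year = conn.execute("""
    SELECT dg.disease, t.year, SUM(f.no_of_patient)
    FROM fact AS f
    JOIN diagnosis AS dg ON dg.p_id = f.p_id
    JOIN time_period AS t ON t.t_id = f.t_id
    GROUP BY dg.disease, t.year
    ORDER BY dg.disease, t.year
""").fetchall()
```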

Star Schema

Output Example: Hierarchy of a dimension: drill-down and roll-up

Power of a visual presentation

Difference in Improvement: Young and Old patients

“radar” display

Application 2: Clinical Research Management

Application 3: Public Health
Combining Data Warehouse (OLAP) and GIS
OLAP: handles large data sets; fast retrieval; multidimensional, multilevel aggregation; analyses and data mining on huge, complex databases
GIS: visualization and spatial analyses
Visualization and analysis: charts and maps plus statistical analysis

SOVAT (Spatial OLAP Viz and Analytical Tool)

Linkage of OLAP Cube and Spatial Data
Figure labels: cube; geography dimension

Multidimensional Database
Functions: drill-up/drill-down, slice/dice, pivot
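
Slice, dice, and pivot can be sketched on a small in-memory cube. The cube contents, the axis names, and the three helper functions are assumptions for illustration, not the functions of any specific OLAP product.

```python
# Hypothetical cube: cells keyed by (year, region, product).
AXES = ("year", "region", "product")
CUBE = {
    (2010, "East", "Widget"): 100,
    (2010, "West", "Widget"): 200,
    (2011, "East", "Gadget"): 50,
    (2011, "East", "Widget"): 70,
}


def slice_cube(cube, axis, value):
    """Slice: fix one dimension to a single value and drop it from the keys."""
    i = AXES.index(axis)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets.

    The number of dimensions stays the same; only the cell range shrinks.
    """
    def keep(key):
        return all(key[AXES.index(a)] in values for a, values in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


def pivot(cube, row_axis, col_axis):
    """Pivot: cross-tabulate two dimensions, summing over the remaining ones."""
    r, c = AXES.index(row_axis), AXES.index(col_axis)
    table = {}
    for key, value in cube.items():
        row = table.setdefault(key[r], {})
        row[key[c]] = row.get(key[c], 0) + value
    return table


sales_2010 = slice_cube(CUBE, "year", 2010)
east_widgets = dice_cube(CUBE, region={"East"}, product={"Widget"})
region_by_year = pivot(CUBE, "region", "year")
year_by_region = pivot(CUBE, "year", "region")  # same data, rotated view
```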

Star Schema

Snowflake schema

Spatial Drill-Up, Spatial Drill-Down, Spatial Drill-Out

Comparison and Border Analysis: “Compare Allegheny County’s cancer incidence rate against its bordering counties.”

Ranking and Sorting Massive Data

Comparing two arbitrarily defined communities: “Compare the incidence, death rate, or procedures related to a certain cancer or specific diagnosis between the two metropolitan areas of Philadelphia and Pittsburgh”

Time Series Example: “Compare Cancer Incidence of Allegheny County to Erie County from ”

Statistical Analysis

Red nodes show toxic industrial places in Allegheny County

Buffer within 2.5 miles of CLEARWATER INC and the affected municipalities
Figure labels: set the radius here; list of affected municipalities; buffer within 2.5 miles
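
A buffer query of this kind can be approximated by a great-circle distance test. The sketch below is a simplification and not SOVAT's implementation: it treats each municipality as a single centroid point, whereas a GIS would normally intersect the buffer with municipality boundaries. The site coordinates, municipality names, and centroids are invented placeholders.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def municipalities_in_buffer(site, centroids, radius_miles):
    """Return names of municipalities whose centroid lies within the radius."""
    return sorted(
        name for name, (lat, lon) in centroids.items()
        if haversine_miles(site[0], site[1], lat, lon) <= radius_miles
    )


# Placeholder coordinates, not real locations.
SITE = (40.00, -80.00)
CENTROIDS = {
    "Municipality A": (40.02, -80.00),  # roughly 1.4 miles north of the site
    "Municipality B": (40.05, -80.00),  # roughly 3.5 miles north of the site
    "Municipality C": (40.00, -80.03),  # roughly 1.6 miles west of the site
}

affected = municipalities_in_buffer(SITE, CENTROIDS, radius_miles=2.5)
```

Changing radius_miles plays the role of the "set the radius here" control on the slide.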

Authentication for accessing iSOVAT

Multidimensional view: cancer incidence in urban & rural areas

Drill-down: Washington County