Data warehousing and mining

2 Introduction
Organizations are getting larger and amassing ever-increasing amounts of data.
Historic data encodes useful information about the working of an organization.
However, the data is scattered across multiple sources, in multiple formats.
Data warehousing: the process of consolidating data in a centralized location.
Data mining: the process of analyzing data to find useful patterns and relationships.

3 Typical data analysis tasks
Report the per-capita deposits broken down by region and profession.
Are deposits from rural coastal areas increasing over the last five years?
What percent of small business loans were cleared? Why is it less than last year's?
How did similar businesses that did not take loans perform?
What should be the new rules for loan eligibility?

4 [Diagram: warehouse architecture]
Data sources: Bombay branch, Delhi branch, Calcutta branch, census data, GIS data (operational data, detailed transactional data).
Data warehouse: merge, clean, summarize.
Decision support tools: direct query, reporting tools, OLAP, mining tools.
Products named in the diagram: Oracle, SAS, IMS, Relational DBMS+ (e.g. Redbrick), Crystal Reports, Essbase, Intelligent Miner.

5 Data warehouse construction
Heterogeneous data integration
– merge from various sources, fuzzy matches
– remove inconsistencies
Data cleaning
– missing data, outliers, clean fields, e.g. names/addresses
– data mining techniques
Data loading: summarize, create indices
Products: Prism warehouse manager, Platinum info refiner, info pump, QDB, Vality
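The fuzzy-matching step of integration can be sketched as follows. This is only an illustration using Python's standard `difflib`, not the method of any product named on the slide; the helper names (`normalize`, `similarity`, `fuzzy_merge`), the sample customer names, and the 0.8 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two cleaned names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_merge(left, right, threshold=0.8):
    """Pair each left record with its best right match, if it is close enough."""
    matches = []
    for l in left:
        best = max(right, key=lambda r: similarity(l, r), default=None)
        if best is not None and similarity(l, best) >= threshold:
            matches.append((l, best))
    return matches


# Hypothetical customer names held by two branches (not from the slides).
bombay = ["Anil K. Sharma", "Meera Iyer"]
delhi = ["anil k sharma", "Mira Iyer", "Rohit Verma"]
pairs = fuzzy_merge(bombay, delhi)
```

In practice the threshold would need tuning against known duplicates, and records left unmatched would go to manual review.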

6 Warehouse maintenance
Data refresh
– when to refresh, and in what form to send updates?
Materialized view maintenance with batch updates.
Query evaluation using materialized views.
Monitoring and reporting tools
– HP intelligent warehouse advisor
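Materialized view maintenance with batch updates can be illustrated with a small sketch. It assumes the view is a simple per-branch sum of deposits, so a batch of inserted and deleted rows can be folded in without rescanning the base data. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def build_view(transactions):
    """Full computation: total deposits per branch from all base rows."""
    view = defaultdict(int)
    for branch, amount in transactions:
        view[branch] += amount
    return view


def apply_batch(view, inserted=(), deleted=()):
    """Incremental refresh: adjust the stored sums using only the delta rows."""
    for branch, amount in inserted:
        view[branch] += amount
    for branch, amount in deleted:
        view[branch] -= amount
    return view


# Hypothetical base data and one nightly batch of changes.
base = [("Bombay", 100), ("Delhi", 40), ("Bombay", 60)]
batch_inserted = [("Delhi", 25), ("Calcutta", 10)]
batch_deleted = [("Bombay", 60)]

view = apply_batch(build_view(base), batch_inserted, batch_deleted)

# The incremental result should equal a full recomputation over the new base data.
new_base = [("Bombay", 100), ("Delhi", 40), ("Delhi", 25), ("Calcutta", 10)]
recomputed = build_view(new_base)
```

Sums and counts can be maintained this way. Aggregates such as MIN or MAX generally need extra bookkeeping when rows are deleted.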

7 [Diagram: warehouse architecture, repeated from slide 4]

8 OLAP
Fast, interactive answers to large aggregate queries.
Multidimensional model: dimensions with hierarchies
– Dim 1: Bank location: branch --> city --> state
– Dim 2: Customer: sub-profession --> profession
– Dim 3: Time: month --> quarter --> year
Measures: loan amount, #transactions, balance
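A roll-up along the dimension hierarchies can be sketched in a few lines. The example below aggregates the loan-amount measure from (branch, month) up to a coarser level such as (city, quarter). The branch names, hierarchy tables, and fact rows are invented for illustration, and the data is assumed to cover a single year.

```python
from collections import defaultdict

# Hypothetical location hierarchy: branch --> city --> state.
BRANCH_TO_CITY = {"Andheri": "Bombay", "Dadar": "Bombay", "Karol Bagh": "Delhi"}
CITY_TO_STATE = {"Bombay": "Maharashtra", "Delhi": "Delhi"}


def location_at(branch, level):
    """Map a branch to its ancestor at the requested hierarchy level."""
    if level == "branch":
        return branch
    city = BRANCH_TO_CITY[branch]
    if level == "city":
        return city
    if level == "state":
        return CITY_TO_STATE[city]
    raise ValueError(f"unknown location level: {level}")


def time_at(month, level):
    """Map a month number (1-12) to a label at the requested level."""
    if level == "month":
        return f"M{month}"
    if level == "quarter":
        return f"Q{(month - 1) // 3 + 1}"
    raise ValueError(f"unknown time level: {level}")


def roll_up(facts, location_level, time_level):
    """Sum loan amounts after lifting each fact to the requested levels."""
    totals = defaultdict(int)
    for branch, month, loan_amount in facts:
        key = (location_at(branch, location_level), time_at(month, time_level))
        totals[key] += loan_amount
    return dict(totals)


# Hypothetical fact rows: (branch, month, loan amount).
facts = [("Andheri", 1, 100), ("Dadar", 2, 50), ("Andheri", 4, 70), ("Karol Bagh", 2, 30)]
by_city_quarter = roll_up(facts, "city", "quarter")
by_state_quarter = roll_up(facts, "state", "quarter")
```

OLAP servers typically precompute or cache such aggregates so that moving between levels stays interactive.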

9 OLAP
Navigational operators: pivot, drill-down, roll-up, select.
Hypothesis-driven search, e.g. factors affecting defaulters
– view the defaulting rate by age, aggregated over the other dimensions
– for a particular age segment, drill into the detail along profession
Need interactive response to aggregate queries.
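The two steps of the defaulter example, an aggregate view by age followed by a drill-down along profession, might look like this in plain Python. The records, age segments, and professions are made up, and `default_rate` is a hypothetical helper.

```python
from collections import defaultdict


def default_rate(records, key):
    """Fraction of defaulting loans per group, where key(record) names the group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [defaults, total]
    for r in records:
        group = key(r)
        counts[group][0] += r["defaulted"]
        counts[group][1] += 1
    return {group: d / n for group, (d, n) in counts.items()}


# Hypothetical loan records; defaulted is 1 for a default, 0 otherwise.
records = [
    {"age_segment": "18-30", "profession": "Exec", "defaulted": 0},
    {"age_segment": "18-30", "profession": "Trader", "defaulted": 1},
    {"age_segment": "18-30", "profession": "Trader", "defaulted": 1},
    {"age_segment": "31-50", "profession": "Exec", "defaulted": 0},
    {"age_segment": "31-50", "profession": "Trader", "defaulted": 0},
]

# Step 1: defaulting rate by age, aggregated over all other dimensions.
by_age = default_rate(records, key=lambda r: r["age_segment"])

# Step 2: select one age segment, then drill down along profession.
young = [r for r in records if r["age_segment"] == "18-30"]
by_profession = default_rate(young, key=lambda r: r["profession"])
```

Each step is a fresh aggregate query over the fact data, which is why the slide stresses interactive response times.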

10 OLAP products
About 30 OLAP vendors.
Dominant ones:
– Oracle Express: largest market share (20%)
– Arbor Essbase: technology leader
– Microsoft Plato: introduced late last year, rapidly taking over...

11 Microsoft OLAP strategy
Plato: OLAP server: powerful, integrating various operational sources
OLE DB for OLAP: emerging industry standard based on MDX --> extension of SQL for OLAP
Pivot-table services: integrate with Office 2000
– Every desktop will have OLAP capability.
Client-side caching and calculations
Partitioned and virtual cubes
Hybrid relational and multidimensional storage

12 Data mining
The process of semi-automatically analyzing large databases to find interesting and useful patterns.
Overlaps with machine learning, statistics, artificial intelligence and databases, but is
– more scalable in the number of features and instances
– more automated, to handle heterogeneous data

13 Some basic operations
Predictive:
– Regression
– Classification
Descriptive:
– Clustering / similarity matching
– Association rules and variants
– Deviation detection

14 Classification
Given old data about customers and payments, predict a new applicant's loan eligibility.
[Diagram: previous customers (age, salary, profession, location, customer type) --> classifier --> decision rules (Salary > 5 L, Prof. = Exec). The rules are applied to the new applicant's data to give a good/bad decision.]
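Applying learned decision rules to a new applicant can be sketched as below. The slide does not say how its two example conditions combine, so this sketch assumes they are joined with AND and reads "5 L" as 5 lakh (500,000). The `Applicant` fields follow the attribute list on the slide, while the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    age: int
    salary: int          # annual salary in rupees (assumed unit)
    profession: str
    location: str
    customer_type: str


FIVE_LAKH = 500_000      # "5 L" read as 5 lakh; an assumption about the slide's notation


def classify(applicant: Applicant) -> str:
    """Apply the example rules; joining them with AND is an assumption."""
    if applicant.salary > FIVE_LAKH and applicant.profession == "Exec":
        return "good"
    return "bad"


# Hypothetical new applicants.
a1 = Applicant(age=35, salary=800_000, profession="Exec", location="Bombay", customer_type="retail")
a2 = Applicant(age=28, salary=300_000, profession="Exec", location="Delhi", customer_type="retail")
decisions = [classify(a1), classify(a2)]
```

A real classifier would learn such rules from the previous customers' data rather than have them written by hand.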

15 Classification methods
Nearest neighbor
Regression (linear or any polynomial)
– a*salary + b*age + c = eligibility score
Decision tree classifier
Probabilistic/generative models
Neural networks
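Two of the listed methods are simple enough to sketch directly: a 1-nearest-neighbor classifier and the linear eligibility score from the slide. The training points, feature scaling (salary in lakhs, age in decades), and coefficient values are assumptions chosen for illustration, not fitted values.

```python
import math


def nearest_neighbor(train, query):
    """Return the label of the training point closest to query (Euclidean distance)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


def linear_score(salary, age, a, b, c):
    """Eligibility score a*salary + b*age + c, as written on the slide."""
    return a * salary + b * age + c


# Hypothetical previous customers: ((salary in lakhs, age in decades), label).
train = [
    ((8.0, 4.0), "good"),
    ((6.5, 3.5), "good"),
    ((2.0, 2.5), "bad"),
    ((1.5, 5.0), "bad"),
]

label = nearest_neighbor(train, (6.0, 3.5))
score = linear_score(salary=6, age=3, a=2, b=1, c=-10)
```

Nearest neighbor is sensitive to feature scaling, which is why the features here are put on comparable ranges. In the regression approach, a, b and c would be estimated from the old data.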

16 Clustering
Unsupervised learning, used when old data with class labels is not available, e.g. when introducing a new product.
Group/cluster existing customers based on the time series of their payment history, so that similar customers fall in the same cluster.
Key requirement: a good measure of similarity between instances.
Identify micro-markets and develop policies for each.
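As a rough sketch of grouping customers by payment history, the code below runs a basic k-means loop. The slide does not name a clustering algorithm or a similarity measure, so k-means with Euclidean distance is a swapped-in choice. The payment series and initial centroids are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(component) / len(points) for component in zip(*points))


def k_means(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters, centroids


# Hypothetical payment histories: amount paid in each of three months.
points = [(100, 100, 100), (90, 100, 110), (0, 50, 0), (10, 40, 0)]
clusters, centroids = k_means(points, centroids=[points[0], points[2]])
```

Euclidean distance on raw series treats each month independently. A distance that tolerates time shifts may suit payment data better, which is the point of the slide's "key requirement".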

17 Association rules
Given a set T of groups of items. Example: the set of item sets purchased.
Goal: find all rules on itemsets of the form a --> b such that
– the support of a and b > user threshold s
– the conditional probability (confidence) of b given a > user threshold c
Examples: Milk --> bread; purchase of product A --> service B
T (example item sets, as listed on the slide): Milk, cereal; Tea, milk; Tea, rice, bread; cereal
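Support and confidence can be computed directly from their definitions. The sketch below enumerates only single-item rules a --> b by brute force. It is not an efficient miner such as Apriori, and the transactions and thresholds are hypothetical rather than the slide's table.

```python
from itertools import permutations


def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def find_rules(transactions, min_support, min_confidence):
    """Return (a, b, support, confidence) for single-item rules a --> b above both thresholds."""
    all_items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(all_items, 2):
        s = support(transactions, {a, b})
        if s > min_support and s > 0:
            confidence = s / support(transactions, {a})  # estimate of P(b | a)
            if confidence > min_confidence:
                rules.append((a, b, s, confidence))
    return rules


# Hypothetical market baskets.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "tea"},
    {"milk", "cereal"},
    {"tea", "rice"},
]
rules = find_rules(transactions, min_support=0.4, min_confidence=0.6)
```

Here milk and bread appear together in half of the baskets. The rule bread --> milk has confidence 1.0, while milk --> bread has confidence 2/3, so the same itemset can give rules of different strength in each direction.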

18 Mining market
Around 20 to 30 mining tool vendors.
Major players:
– Clementine
– IBM's Intelligent Miner
– SGI's MineSet
– SAS's Enterprise Miner
All offer pretty much the same set of tools.
Many embedded products: fraud detection, electronic commerce applications

19 Conclusions
The value of warehousing and mining in effective decision making based on concrete evidence from old data.
Challenges of heterogeneity and scale in warehouse construction and maintenance.
Grades of data analysis tools: straight querying, reporting tools, multidimensional analysis and mining.