CIT 858: Data Mining and Data Warehousing Course Instructor: Bajuna Salehe Web:

Introduction to Data Mining and Data Warehousing

Data Mining and Data Warehousing Agenda
- What is Data Mining?
- What is Data Warehousing?
- The source of the invention of Data Mining and Data Warehousing
- Drowning in Data, Starving for Knowledge
- Evolution of Database Technology to the current state (Homework)

What Is Data Mining?
Data mining (knowledge discovery from data)
- Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
- Data mining: a misnomer? It should arguably have been named "knowledge mining from data", which is too long, or "knowledge mining", which does not reflect the emphasis on mining from huge amounts of data

What Is Data Mining?
Many people treat data mining as a synonym for another popular term, Knowledge Discovery from Data/Databases (KDD). The KDD process is depicted below:

The KDD Process
[Figure: Databases -> Cleaning & Integration -> Data Warehouse -> Selection & Transformation -> Data Mining -> Evaluation & Presentation -> Knowledge]

KDD Process
1) Data cleaning
- To remove noise and inconsistent data
2) Data integration
- Where multiple data sources may be combined
3) Data selection
- Where data relevant to the analysis task are retrieved from the database

KDD Process
4) Data transformation
- Where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations
5) Data mining
- An essential process where intelligent methods are applied in order to extract data patterns

KDD Process
6) Pattern evaluation
- To identify the truly interesting patterns representing knowledge
7) Knowledge presentation
- Where visualization and knowledge representation techniques are used to present the mined knowledge to the users
8) Use of discovered knowledge
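
The KDD steps listed in the slides above can be sketched as a chain of small functions. This is only an illustrative toy, not a method prescribed by the course: the records, field names, function names, and the `min_total` threshold are all assumptions, and the "mining" step is reduced to ranking items by total sales.

```python
from collections import defaultdict

# Hypothetical raw records from two sources (all values are illustrative).
store_a = [
    {"customer": "c1", "item": "Bread", "amount": 2.0},
    {"customer": "c2", "item": "cheese ", "amount": 5.0},
    {"customer": "c3", "item": None, "amount": 1.0},       # missing item: noise
]
store_b = [
    {"customer": "c1", "item": "cheese", "amount": 4.5},
    {"customer": "c4", "item": "bread", "amount": -3.0},   # negative amount: inconsistent
    {"customer": "c5", "item": "bread", "amount": 2.5},
]

def clean(records):
    """Step 1: drop noisy or inconsistent records and normalize item names."""
    cleaned = []
    for r in records:
        if r["item"] is None or r["amount"] <= 0:
            continue
        cleaned.append({**r, "item": r["item"].strip().lower()})
    return cleaned

def integrate(*sources):
    """Step 2: combine multiple data sources into one collection."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 4: consolidate by aggregating the amount per item."""
    totals = defaultdict(float)
    for r in records:
        totals[r["item"]] += r["amount"]
    return dict(totals)

def mine(totals):
    """Step 5: a deliberately trivial 'pattern': items ranked by total sales."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def evaluate(patterns, min_total):
    """Step 6: keep only patterns that pass an interestingness threshold."""
    return [p for p in patterns if p[1] >= min_total]

def present(patterns):
    """Step 7: render the surviving patterns for the user."""
    return [f"{item}: {total:.2f}" for item, total in patterns]

data = integrate(clean(store_a), clean(store_b))
data = select(data, ["item", "amount"])
totals = transform(data)
report = present(evaluate(mine(totals), min_total=5.0))
print(report)
```

In practice each step is far richer (for example, real mining algorithms replace the ranking in `mine`), but the order of the calls mirrors steps 1 to 7.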

Data Mining: On What Kinds Of Data?
- Relational database
- Data warehouse
- Transactional database
- Advanced database and information repository
  - Spatial and temporal data
  - Stream data
  - Multimedia database
  - Text databases & WWW

Data Mining Functionalities
Association (correlation and causality)
- Example: cheese & bread (items that tend to appear together in the same transaction)
Classification and Prediction
- Construct models that describe and distinguish classes or concepts for future prediction
- Predict some unknown or missing numerical values
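
To make the cheese & bread association concrete, the sketch below computes two commonly used measures for a rule "cheese -> bread": support (the fraction of transactions containing both items) and confidence (among transactions containing cheese, the fraction that also contain bread). The slides do not name these measures or give data, so the transactions and function names here are assumptions for illustration only.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"cheese", "bread", "milk"},
    {"cheese", "bread"},
    {"bread", "butter"},
    {"cheese", "milk"},
    {"bread", "cheese", "butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, fraction that also have the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

rule_support = support({"cheese", "bread"}, transactions)
rule_confidence = confidence({"cheese"}, {"bread"}, transactions)
print(f"cheese -> bread: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

With this toy data, three of the five transactions contain both items, and three of the four transactions containing cheese also contain bread.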

Data Mining Functionalities (cont…)
Cluster analysis
- Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
Outlier analysis
- Outlier: a data object that does not comply with the general behavior of the data
- Noise or exception? Not necessarily: outliers are useful in fraud detection and rare-event analysis
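
As a rough illustration of cluster analysis on the "cluster houses" example, here is a minimal k-means sketch. The slides do not name a clustering algorithm, so k-means is a swapped-in choice; the house coordinates and the starting centroids are made up, and the starting centroids are passed in explicitly so the run is deterministic.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Group points around centroids; no class labels are given in advance."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical house locations as (x, y) map coordinates.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(houses, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centers)
```

The two groups that emerge play the role of the "new classes" mentioned in the slide: nothing told the algorithm in advance which house belongs where.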

Necessity Is The Mother Of Invention
Data explosion problem
- Automated data collection tools and mature database technology lead to huge amounts of accumulated data
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
- Data warehousing and on-line analytical processing
- Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Evolution Of Database Technology
1960s:
- Data collection, database creation, IMS and network DBMS
1970s:
- Relational data model, relational DBMS implementation
1980s:
- RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
- Application-oriented DBMS (spatial, scientific, engineering, etc.)

Evolution Of Database Technology
1990s:
- Data mining, data warehousing, multimedia databases, and Web databases
2000s:
- Stream data management and mining
- Data mining with a variety of applications
- Web technology and global information systems

Potential Applications
Data analysis and decision support
- Market analysis and management
- Risk analysis and management
- Fraud detection and detection of unusual patterns
Other applications
- Text mining (e.g., documents) and Web mining
- Stream data mining
- DNA and bio-data analysis

Fraud Detection & Mining Unusual Patterns
Applications: health care, retail, credit card services, telecommunications
- Auto insurance: ring of collisions
- Money laundering: suspicious monetary transactions
- Medical insurance
  - Professional patients, ring of doctors, and ring of references
  - Unnecessary or correlated screening tests
- Telecommunications: phone-call fraud
  - Phone call model: destination of the call, duration, time of day or week
  - Analyze patterns that deviate from an expected norm
- Retail industry
  - Analysts estimate that 38% of retail shrink is due to dishonest employees
- Anti-terrorism
Approaches: clustering, model construction, outlier analysis, etc.
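
The idea of analyzing calls that "deviate from an expected norm" can be sketched with a simple z-score check: build a baseline from a customer's past call durations, then flag new calls that lie many standard deviations away. The threshold of 3, the data, and the field names are assumptions for illustration; a real phone-call model would also use destination and time of day, as the slide notes.

```python
import statistics

# Hypothetical history of one customer's call durations, in minutes.
history = [3, 4, 5, 4, 3, 5, 4, 4]

# Hypothetical new calls to score against that history.
new_calls = [
    {"destination": "local", "duration": 4},
    {"destination": "local", "duration": 5},
    {"destination": "international", "duration": 45},
]

def flag_unusual(history, calls, threshold=3.0):
    """Return the calls whose duration deviates strongly from the baseline."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)  # population standard deviation
    flagged = []
    for call in calls:
        z = abs(call["duration"] - mean) / spread
        if z > threshold:
            flagged.append(call)
    return flagged

suspicious = flag_unusual(history, new_calls)
print(suspicious)
```

Keeping the baseline separate from the calls being scored means an extreme new call cannot inflate the spread it is measured against. A flagged call is a candidate for review, not proof of fraud.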

Other Applications
Sports
- IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain a competitive advantage for the New York Knicks and Miami Heat
Internet Web Surf-Aid
- IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, which helps in analyzing the effectiveness of Web marketing, improving Web site organization, etc.

What Is a Data Warehouse?
Defined in many different ways, but not rigorously
- A decision support database that is maintained separately from the organization’s operational database
- Supports information processing by providing a solid platform of consolidated, historical data for analysis
“A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process” (Bill Inmon)

The Source of Invention of DW and Data Mining
Data explosion problem
- Automated data collection tools and mature database technology lead to huge amounts of accumulated data
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
- Data warehousing and on-line analytical processing
- Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Drowning In Data, Starving For Knowledge
[Figure labels: DATA, KNOWLEDGE]

Importance of Data Mining
By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can then be applied to the decision-making process.