GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland.

Slides:



Advertisements
Similar presentations
Data Mining Tools Overview Business Intelligence for Managers.
Advertisements

Clustering Basic Concepts and Algorithms
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Civil and Environmental Engineering Carnegie Mellon University Sensors & Knowledge Discovery (a.k.a. Data Mining) H. Scott Matthews April 14, 2003.
Week 9 Data Mining System (Knowledge Data Discovery)
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
SLIDE 1IS 257 – Fall 2008 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
Basic concepts of Data Mining, Clustering and Genetic Algorithms Tsai-Yang Jea Department of Computer Science and Engineering SUNY at Buffalo.
Data Mining By Archana Ketkar.
Introduction. 1.Data Mining and Knowledge Discovery 2.Data Mining Methods 3.Supervised Learning 4.Unsupervised Learning 5.Other Learning Paradigms 6.Introduction.
Data Mining – Intro.
Oracle Data Mining Ying Zhang. Agenda Data Mining Data Mining Algorithms Oracle DM Demo.
Data Mining: A Closer Look
Chapter 5 Data mining : A Closer Look.
Computer Science Universiteit Maastricht Institute for Knowledge and Agent Technology Data mining and the knowledge discovery process Summer Course 2005.
Enterprise systems infrastructure and architecture DT211 4
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
CSCI 347 / CS 4206: Data Mining Module 01: Introduction Topic 03: Stages in Data Mining.
Data Mining : Introduction Chapter 1. 2 Index 1. What is Data Mining? 2. Data Mining Functionalities 1. Characterization and Discrimination 2. MIning.
Data Warehouse Fundamentals Rabie A. Ramadan, PhD 2.
Data Mining. 2 Models Created by Data Mining Linear Equations Rules Clusters Graphs Tree Structures Recurrent Patterns.
Data Mining Dr. Chang Liu. What is Data Mining Data mining has been known by many different terms Data mining has been known by many different terms Knowledge.
Intelligent Systems Lecture 23 Introduction to Intelligent Data Analysis (IDA). Example of system for Data Analyzing based on neural networks.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Techniques As Tools for Analysis of Customer Behavior
Data Mining Chun-Hung Chou
Tang: Introduction to Data Mining (with modification by Ch. Eick) I: Introduction to Data Mining A.Short Preview 1.Initial Definition of Data Mining 2.Motivation.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Data Mining Joyeeta Dutta-Moscato July 10, Wherever we have large amounts of data, we have the need for building systems capable of learning information.
Data Mining and Application Part 1: Data Mining Fundamentals Part 2: Tools for Knowledge Discovery Part 3: Advanced Data Mining Techniques Part 4: Intelligent.
COMP3503 Intro to Inductive Modeling
Chapter 1 Introduction to Data Mining
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Data Mining – A First View Roiger & Geatz. Definition Data mining is the process of employing one or more computer learning techniques to automatically.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Some working definitions…. ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably Data mining = –the discovery of interesting,
HW#2: A Strategy for Mining Association Rules Continuously in POS Scanner Data.
Data MINING Data mining is the process of extracting previously unknown, valid and actionable information from large data and then using the information.
Data Mining By Dave Maung.
Principles of Data Mining. Introduction: Topics 1. Introduction to Data Mining 2. Nature of Data Sets 3. Types of Structure Models and Patterns 4. Data.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
1 Improving quality of graduate students by data mining Asst. Prof. Kitsana Waiyamai, Ph.D. Dept. of Computer Engineering Faculty of Engineering, Kasetsart.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
3-1 Data Mining Kelby Lee. 3-2 Overview ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling.
Copyright R. Weber Machine Learning, Data Mining INFO 629 Dr. R. Weber.
1 STAT 5814 Statistical Data Mining. 2 Use of SAS Data Mining.
DATA MINING By Cecilia Parng CS 157B.
1 Introduction to Data Mining C hapter 1. 2 Chapter 1 Outline Chapter 1 Outline – Background –Information is Power –Knowledge is Power –Data Mining.
Introduction to Data Mining by Yen-Hsien Lee Department of Information Management College of Management National Sun Yat-Sen University March 4, 2003.
MIS2502: Data Analytics Advanced Analytics - Introduction.
An Introduction Student Name: Riaz Ahmad Program: MSIT( ) Subject: Data warehouse & Data Mining.
Data Mining and Decision Support
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
CENG 770. Data mining (knowledge discovery from data) – Extraction of interesting ( non-trivial, implicit, previously unknown and potentially useful)
Data Mining – Introduction (contd…) Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
The KDD Process for Extracting Useful Knowledge from Volumes of Data Fayyad, Piatetsky-Shapiro, and Smyth Ian Kim SWHIG Seminar.
Data Mining Functionalities
Data Mining.
MIS2502: Data Analytics Advanced Analytics - Introduction
MIS 451 Building Business Intelligence Systems
Adrian Tuhtan CS157A Section1
Data Mining: Concepts and Techniques Course Outline
Prepared by: Mahmoud Rafeek Al-Farra
Data Warehousing Data Mining Privacy
CSE591: Data Mining by H. Liu
Presentation transcript:

GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland

Data Mining in a Nutshell Knowledge discovery in databases (KDD) was initially defined as the ‘non-trivial extraction of implicit, previously unknown, and potentially useful information from data’ [Frawley, Piatetsky- Shapiro, Matheus, 1991]. A revised version of this definition states that ‘KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data’ [Fayyad, Piatetsky-Shapiro, Smyth, 1996]. According to this definition, data mining is a step in the KDD process concerned with applying computational techniques (i.e., data mining algorithms implemented as computer programs) to actually find patters in the data. In a sense, data mining is the central step in the KDD process. The other steps in the KDD process are concerned with preparing data for data mining, as well as evaluating the discovered patterns, the results of data mining. I Data. The input to a data mining algorithm is most commonly a single flat table comprising a number of fields (columns) and records (rows). In general, each row represents an object and columns represent properties of objects. II Typical data mining tasks. - Classification and regression; the task is to predict the value of one field from other fields. If the class is continuous, the task is called regression. If the class is discrete the task is called classification. - Clustering is concerned with grouping objects into classes of similar objects. A cluster is a collection of objects that are similar to each other and are dissimilar to objects in other clusters. - Association analysis is the discovery of association rules. Association rules specify correlation between frequent item sets. - Data characterisation sums up the general characteristics or features of the target class of data: this class is typically collected by a database query.

- Outlier detection is concerned with finding data objects that do not fit the general behaviour or model of the data: these are called outliers. - Evaluation analysis describes and models regularities or trends whose behaviour changes over time. III Outputs of data mining procedures can be - Equations e.g. TotalSpent = x Age [€] - Decision trees, e.g. Income  € > € Yes Age  58 > 58 Yes No - Predictive rules of a form IF Conjunction of conditions THEN Conclusion, e.g. IF income is  € and Gender = Male THEN not a Big Spender - Association rules e.g. {Gender = ‘Female’, Age = ‘>52’}  {Big Spender = ‘Yes’} - Distance and similarity measures e.g. - Probabilistic models e.g. Bayesian networks (For more details see Saso Dzeroski’s Relational Data Mining) Our aim is to study in details a particular data mining method called GUHA and it’s computer implementation called LISp Miner. This approach is essentially as association analysis, however, classification, clustering and outlier detection tasks can be carried out by this method.