Copyright R. Weber Machine Learning, Data Mining INFO 629 Dr. R. Weber.

Presentation transcript:


The picnic game
How did you reason to find the rule?
According to Michalski (1983), A theory and methodology of inductive learning, in Machine Learning, chapter 4: “inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances.”

Learning
Rote Learning
- Learn multiplication tables
Supervised Learning
- Examples are used to help a program identify a concept
- Examples are typically represented with attribute-value pairs
- The notion of supervision originates from the guidance given by examples
Unsupervised Learning
- Human efforts at scientific discovery, theory formation

Inductive Learning
Learning by generalization
Performance of classification tasks
- Classification, categorization, clustering
Rules indicate categories
Goal:
- Characterize a concept

Concept Learning is a Form of Inductive Learning
The learner uses:
- positive examples (instances that ARE examples of a concept) and
- negative examples (instances that ARE NOT examples of a concept)

Concept Learning
Needs empirical validation
Whether the data are dense or sparse determines the quality of different methods

Validation of Concept Learning i
The learned concept should be able to correctly classify new instances of the concept
- When it succeeds on a real instance of the concept, it finds a true positive
- When it fails on a real instance of the concept, it finds a false negative

Validation of Concept Learning ii
The learned concept should be able to correctly classify new instances of the concept
- When it succeeds on a counterexample, it finds a true negative
- When it fails on a counterexample, it finds a false positive
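The four outcomes named on the two validation slides can be tallied in a few lines. This is a minimal sketch in Python; the function name `confusion_counts` and the sample labels are hypothetical, chosen only for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally the four validation outcomes for a learned concept.

    actual[i] is True when instance i really is an example of the concept;
    predicted[i] is True when the learned concept classifies it as one.
    """
    counts = Counter({"TP": 0, "FN": 0, "TN": 0, "FP": 0})
    for is_member, says_member in zip(actual, predicted):
        if is_member:
            # Real instance of the concept: success is TP, failure is FN.
            counts["TP" if says_member else "FN"] += 1
        else:
            # Counterexample: success is TN, failure is FP.
            counts["FP" if says_member else "TN"] += 1
    return dict(counts)

# Hypothetical validation set: three real instances, three counterexamples.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, False, True, False]
counts = confusion_counts(actual, predicted)
print(counts)
```

Keeping the four counts separate, rather than reporting a single accuracy figure, shows whether the learned concept errs by missing real instances or by accepting counterexamples.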

Basic classification tasks
Classification
Categorization
Clustering

Categorization

Classification

Clustering

Clustering
Data analysis method applied to data
Data should naturally possess groupings
Goal: group data into clusters
Resulting clusters are collections in which objects within a cluster are similar to each other
Objects outside the cluster are dissimilar to objects inside
Objects from one cluster are dissimilar to objects in other clusters
Distance measures are used to compute similarity
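To make the role of the distance measure concrete, here is a small sketch of one common clustering procedure, k-means with Euclidean distance. The slide does not name a specific algorithm, so k-means is an assumption made for illustration, and the points and starting centroids are made up.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]

def recompute(points, labels, k):
    """Move each centroid to the mean of the points assigned to it."""
    centroids = []
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        # Assumes no cluster ends up empty, which holds for the toy data below.
        centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centroids

def k_means(points, centroids, rounds=10):
    """Alternate assignment and centroid updates until labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(rounds):
        centroids = recompute(points, labels, len(centroids))
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical data with two natural groupings.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

Objects that end up with the same label are close to each other under the chosen distance, and far from the other cluster's centroid, which is the property the slide describes.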

Rule Learning
Learning widely used in data mining
Version Space Learning is a search method to learn rules
Decision Trees

Version Space i
A=1, B=1, C=1 → Outcome=1
A=0, B=.5, C=.5 → Outcome=0
A=0, B=0, C=.3 → Outcome=.5
Creates a tree that includes all possible combinations
Does not learn rules with disjunctions (i.e., OR statements)
Incremental method: trains on additional data without the need to retrain on all data
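The incremental and conjunction-only character of this kind of learner can be illustrated with a deliberately reduced sketch. It maintains only the most specific hypothesis consistent with the positive examples seen so far (the specific boundary, as in the Find-S procedure); it is not a full version space or candidate elimination implementation. The wildcard symbol and the binary examples are assumptions for illustration.

```python
WILDCARD = "?"  # hypothetical marker meaning "any value is acceptable"

def generalize(hypothesis, example):
    """Minimally generalize a conjunctive hypothesis to cover a positive example."""
    if hypothesis is None:
        # First positive example: the most specific hypothesis is the example itself.
        return tuple(example)
    return tuple(h if h == e else WILDCARD for h, e in zip(hypothesis, example))

def covers(hypothesis, example):
    """True when every attribute test in the hypothesis accepts the example."""
    return all(h == WILDCARD or h == e for h, e in zip(hypothesis, example))

# Positive examples over attributes (A, B, C) arrive one at a time;
# earlier examples never need to be revisited.
hypothesis = None
for example in [(1, 1, 1), (1, 0, 1)]:
    hypothesis = generalize(hypothesis, example)
print(hypothesis)

# A disjunctive concept such as "A=1 OR B=1" cannot be expressed: generalizing
# over its positives yields a hypothesis that also covers the negative (0, 0, 0).
too_general = generalize(generalize(None, (1, 0, 0)), (0, 1, 0))
print(too_general, covers(too_general, (0, 0, 0)))
```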

Decision trees
Knowledge representation formalism
Represent mutually exclusive rules (disjunction)
A way of breaking up a data set into classes or categories
Classification rules that determine, for each instance with attribute values, whether it belongs to one or another class

Decision trees consist of:
- leaf nodes (classes)
- decision nodes (tests on attribute values)
- branches that grow from each decision node, one for each possible outcome of the test
From Cawsey, 1997
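The three ingredients listed above map directly onto a small data structure. The following Python sketch is one possible representation; the class names and the example tree are hypothetical and are not taken from Cawsey.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    """Leaf node: holds a class."""
    label: str

@dataclass
class Decision:
    """Decision node: tests one attribute, with one branch per possible outcome."""
    attribute: str
    branches: dict = field(default_factory=dict)

def classify(node, instance):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, Decision):
        node = node.branches[instance[node.attribute]]
    return node.label

# Hypothetical tree, for illustration only.
tree = Decision("first_last_year", {
    "yes": Leaf("first"),
    "no": Decision("drinks", {"yes": Leaf("no first"), "no": Leaf("first")}),
})
print(classify(tree, {"first_last_year": "no", "drinks": "yes"}))
```

Each path from the root to a leaf reads as one rule, and because every instance follows exactly one path, the rules are mutually exclusive.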

Decision tree induction
Goal is to correctly classify all example data
Several algorithms to induce decision trees: ID3 (Quinlan 1979), CLS, ACLS, ASSISTANT, IND, C4.5
Constructs decision tree from past data
Not incremental
Attempts to find the simplest tree (not guaranteed because it is based on heuristics)

ID3 algorithm
From:
- a set of target classes
- training data containing objects of more than one class
ID3 uses tests to refine the training data set into subsets that contain objects of only one class each
Choosing the right test is the key

How does ID3 choose tests?
Information gain or ‘minimum entropy’
Maximizing information gain corresponds to minimizing entropy
Predictive features (good indicators of the outcome)

ID3 algorithm
No.  Student  First last year?  Male?  Works hard?  Drinks?  First this year?
1    Richard  yes               yes    no           yes      yes
2    Alan     yes               yes    yes          no       yes
3    Alison   no                no     yes          no       yes
4    Jeff     no                yes    no           yes      no
5    Gail     yes               no     yes          yes      yes
6    Simon    no                yes    yes          yes      no
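The test selection described for ID3 can be sketched in Python as follows. The rows are transcribed from the student table as best it can be read from the flattened slide, so the individual values should be treated as an assumption. The function names are illustrative, and the sketch omits refinements found in later algorithms such as C4.5.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["first_last_year", "male", "works_hard", "drinks"]

# Assumed reading of the slide table; the last column is "first this year?".
RAW = [
    ("yes", "yes", "no", "yes", "yes"),   # Richard
    ("yes", "yes", "yes", "no", "yes"),   # Alan
    ("no", "no", "yes", "no", "yes"),     # Alison
    ("no", "yes", "no", "yes", "no"),     # Jeff
    ("yes", "no", "yes", "yes", "yes"),   # Gail
    ("no", "yes", "yes", "yes", "no"),    # Simon
]
ROWS = [(dict(zip(ATTRIBUTES, r[:4])), r[4]) for r in RAW]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute):
    """Entropy before the test minus the weighted entropy of the subsets after it."""
    labels = [label for _, label in rows]
    remainder = 0.0
    for value in {features[attribute] for features, _ in rows}:
        subset = [label for features, label in rows if features[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, attributes):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:
        return labels[0]                      # subset contains a single class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # no tests left: majority class
    best = max(attributes, key=lambda a: information_gain(rows, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({features[best] for features, _ in rows}):
        subset = [(f, l) for f, l in rows if f[best] == value]
        branches[value] = id3(subset, rest)
    return (best, branches)

tree = id3(ROWS, ATTRIBUTES)
print(tree)
```

On these rows, "First last year?" has the highest information gain (about 0.46 bits), so it becomes the root test: every student who got a first last year also got one this year, which leaves only the remaining three students to be split further.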

Explanation-based learning
Incorporates domain knowledge into the learning process
Feature values are assigned a relevance factor if their values are consistent with domain knowledge
Features that are assigned relevance factors are considered in the learning process

Familiar Learning Task
Learn relative importance of features
Goal: learn individual weights
Commonly used in case-based reasoning
Feedback methods: use a similarity measure to get feedback about (verify) the features' relative importance
Search methods: gradient descent
ID3

Classification using Naive Bayes
The Naïve Bayes classifier uses two sources of information to classify a new instance
- The distribution of the training dataset (prior probability)
- The region surrounding the new instance in the dataset (likelihood)
It is “naïve” because it assumes conditional independence, which is not always applicable; the assumption is made to simplify the computation
Conditional independence reduces the requirement for a large number of observations
Bias in estimating probabilities often may not make a difference in practice: it is the order of the probabilities, not their exact values, that determines the classifications
Comparable in performance with classification trees and with neural networks
Highly accurate and fast when applied to large databases
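A compact sketch of the prior-times-likelihood computation, assuming categorical attributes and plain frequency estimates with no smoothing. The training rows, attribute names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, attribute) -> Counter of values
    for features, label in rows:
        for attribute, value in features.items():
            value_counts[(label, attribute)][value] += 1
    return class_counts, value_counts

def classify(model, features):
    """Score each class as prior x product of per-attribute likelihoods."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior probability of the class
        for attribute, value in features.items():
            # Conditional independence: attributes are multiplied one by one.
            score *= value_counts[(label, attribute)][value] / n
        scores[label] = score
    # Scores are left unnormalized; only their order decides the class.
    return max(scores, key=scores.get), scores

# Hypothetical training data.
rows = [
    ({"outlook": "sunny", "windy": "no"}, "yes"),
    ({"outlook": "sunny", "windy": "yes"}, "no"),
    ({"outlook": "rainy", "windy": "yes"}, "no"),
    ({"outlook": "overcast", "windy": "no"}, "yes"),
    ({"outlook": "rainy", "windy": "no"}, "yes"),
    ({"outlook": "sunny", "windy": "yes"}, "yes"),
    ({"outlook": "rainy", "windy": "no"}, "no"),
]
model = train(rows)
label, scores = classify(model, {"outlook": "sunny", "windy": "yes"})
print(label, scores)
```

Because the decision is an argmax over the scores, a consistent bias in the estimated probabilities leaves the chosen class unchanged, which is the point the slide makes about order versus exact values.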

KDD: definition
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, and potentially useful and understandable patterns in data (R. Feldman, 2000).
KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro, Smyth 1996, p. 6).
Data mining is one of the steps in the KDD process.
Text mining concerns applying data mining techniques to unstructured text.

The KDD Process (diagram)
Data states: DATA, SELECTED DATA, PROCESSED DATA, TRANSFORMED DATA, patterns, KNOWLEDGE
Steps: filtering, preprocessing, transformation, data mining, interpretation, browsing

Data mining tasks i
Predictive modeling/risk assessment: classification, decision trees
Database segmentation: Kohonen nets, clustering techniques

Data mining tasks ii
Link analysis: rules (association generation), relationships between entities
Deviation detection: how things change over time, trends

KDD applications
Fraud detection
- Telecom (calling cards, cell phones)
- Credit cards
- Health insurance
Loan approval
Investment analysis
Marketing and sales data analysis
Identify potential customers
Effectiveness of sales campaign
Store layout

Text mining
The problem starts with a query, and the solution is a set of information (e.g., patterns, connections, profiles, trends) contained in several different texts that are potentially relevant to the initial query.

Text mining applications
IBM Text Navigator
- Cluster documents by content
- Each document is annotated by the 2 most frequently used words in the cluster
Concept Extraction (Los Alamos)
- Text analysis of medical records
- Uses a clustering approach based on trigram representation
- Documents in vectors, cosine for comparison
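The last two points, a trigram representation and cosine comparison of document vectors, can be sketched as below. The slide does not say whether the trigrams are over characters or words; character trigrams are assumed here, and the sample strings are made up.

```python
from collections import Counter
from math import sqrt

def trigrams(text):
    """Represent a document as a vector of character-trigram counts."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical record fragments.
query = trigrams("chest pain")
print(cosine(query, trigrams("chest pains")))
print(cosine(query, trigrams("broken arm")))
```

A similarity of this kind can then serve as the comparison inside a clustering procedure, with documents that share many trigrams landing in the same cluster.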