CSE 8392 Spring 1999
DATA MINING: CORE TOPICS
Classification
Professor Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University, Dallas, Texas
January 1999

Classification
"Classify a set of data based on their values in certain attributes" (R[2], p. 868)
- Each grouping is a class (an equivalence class)
- Given a set K = {k1, k2, k3, ..., ky}, classification is a mapping kx -> {A, B, C, D, E, ...} that partitions K
- Similar to estimation

Classification Examples
- Financial market trends (bull, bear)
- Images (raster, vector)
- Loan approval (yes, no)
- Medical diagnosis
- Detecting faults in industry applications

Basic Classification Techniques (Kennedy, Ch. 3)
- Boundaries for decision regions
  - Ex: loan threshold
- Probability techniques
  - Class-conditional density p(x|C) (Figure 3-4, p. 3-5, Kennedy)
  - May also use prior knowledge of class membership probabilities P(C)
  - Select the class that maximizes P(C) p(x|C)
  - The probability that x belongs to class C is proportional to the probability that any input belongs to C times the probability that class C produces x
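To make the decision rule concrete, here is a minimal sketch of maximizing P(C) p(x|C), assuming a single numeric attribute and Gaussian class-conditional densities. The class names, priors, and distribution parameters are invented for illustration; they are not from Kennedy.

```python
import math

# Hypothetical one-attribute example: classify a loan applicant by income.
# Each class has a prior P(C) and a Gaussian class-conditional density p(x|C).
classes = {
    "approve": {"prior": 0.7, "mean": 60.0, "std": 15.0},
    "deny":    {"prior": 0.3, "mean": 30.0, "std": 10.0},
}

def gaussian_pdf(x, mean, std):
    """Class-conditional density p(x|C), modeled as a 1-D Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def classify(x):
    """Select the class that maximizes P(C) * p(x|C)."""
    return max(classes,
               key=lambda c: classes[c]["prior"] *
                             gaussian_pdf(x, classes[c]["mean"], classes[c]["std"]))

print(classify(55.0))  # -> 'approve'
```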

Supervised Induction Techniques
- Output the probability of class membership based on input values, using some estimation technique (decision trees, neural nets)
  - Use a sample database as a training set
  - Analyze the training data
  - Develop a model using the attributes of the data
  - Use these class descriptions on the rest of the data
- Note: there may be many different class descriptions

Posterior Probabilities (Kennedy)
- P(C|x): probability that input x belongs to class C
- Suppose there are m classes; look at P(C1|x), ..., P(Cm|x)
- Classify by assigning x to the class with the highest posterior probability
- Look at training data and assign posterior probabilities to example patterns (Fig 3-7, p. 3-9)
- May not work well with complex tuples in large databases
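The slide does not show how the posteriors are computed; by Bayes' rule they follow directly from the priors P(C) and class-conditional densities p(x|C) of the previous slide:

```latex
P(C_i \mid x) = \frac{P(C_i)\, p(x \mid C_i)}{\sum_{j=1}^{m} P(C_j)\, p(x \mid C_j)}
```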

Linear Regression
- Linear mapping of input attributes to the desired output; an error term may exist:
  y = w0 + w1 x1 + ... + wn xn + e, where the xi are input attributes
- Least-squares minimization: the sum of the squared error terms is minimized over the database (training set) to find the weights wi (EQ3, EQ4, Kennedy)
- May be used as a baseline comparison approach
- May not work well with complex databases: not all data values are known, and values may not be numeric
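As a concrete (hypothetical) illustration of least-squares weight fitting: the sketch below performs the same minimization on made-up data via numpy's least-squares solver. It is not Kennedy's EQ3/EQ4, only the technique they describe.

```python
import numpy as np

# Fit y ~ w0 + w1*x1 + w2*x2 by least squares on invented training data.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 1.0],
              [1.0, 6.0, 5.0],
              [1.0, 8.0, 2.0]])   # leading column of ones gives the intercept w0
y = np.array([10.0, 14.0, 26.0, 28.0])

# Solve for the weights that minimize the sum of squared errors over the set.
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("weights:", w)              # [w0, w1, w2]
print("predictions:", X @ w)
```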

Similarity Measures
- Describe each input tuple as a vector D1 = <d11, d12, ..., d1n> of attribute values
- Define Sim(D1, D2) where:
  - Values are normalized (0 = no similarity, 1 = identical)
  - Usually assumes all values are numeric
- Represent each class with a vector Ci
  - May be determined as the centroid of the class's vectors from the training set
- Assign each tuple Dj to the class i for which Sim(Dj, Ci) is maximized
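The slide leaves Sim unspecified; cosine similarity is one common normalized measure. A minimal sketch, with made-up training vectors and centroid class representatives:

```python
import math

def cosine_sim(u, v):
    """Normalized similarity: 0 = no similarity, 1 = identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Represent a class by the centroid of its training vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Hypothetical training data: two classes of numeric tuples.
training = {
    "A": [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2]],
    "B": [[0.1, 0.2, 1.0], [0.2, 0.1, 0.9]],
}
centroids = {label: centroid(vecs) for label, vecs in training.items()}

def classify(tuple_vec):
    # Assign the tuple to the class whose centroid it is most similar to.
    return max(centroids, key=lambda c: cosine_sim(tuple_vec, centroids[c]))

print(classify([0.8, 0.9, 0.3]))  # -> 'A'
```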

K Nearest Neighbors (Kennedy)
- Store all input-output pairs in the training set
- Define a distance function
- When a tuple needs to be classified, determine the distance between it and all items in the training set (Fig 10-12, p. 10-38)
- Base the answer on the K nearest items in the training set (algorithm in Kennedy)
- Memory intensive and slow
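A minimal sketch of the procedure, assuming Euclidean distance and a majority vote over the K nearest items; the training pairs are made up:

```python
import math
from collections import Counter

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(query, training, k=3):
    """Label a tuple by majority vote among its k nearest training items."""
    # Distance from the query to every stored input-output pair.
    neighbors = sorted(training, key=lambda pair: euclidean(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set: ([attributes], class label) pairs.
training = [([1.0, 1.0], "yes"), ([1.2, 0.8], "yes"),
            ([5.0, 5.0], "no"),  ([4.8, 5.2], "no")]
print(knn_classify([1.1, 0.9], training, k=3))  # -> 'yes'
```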

Decision Trees (Kennedy)
- Similar to Twenty Questions (Fig 8-5, p. 144, Barquin)
- Internal nodes: decision points based on one attribute
- Leaves: identify classes
- Classification process: input a tuple and move through the tree based on its attribute values (see the sketch below)
- Difficult part: constructing the tree so that it is efficient (try playing twenty questions with a young child!)
- Training set used to build the tree
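The classification process itself is simple once a tree exists. Below is a sketch with a tiny hand-built tree; the attributes, thresholds, and classes are invented, and building the tree (the hard part) is deferred to ID3 and CART later in the section.

```python
# Internal nodes test one attribute against a threshold; leaves name a class.
tree = {
    "attr": "income",
    "threshold": 40,
    "low":  {"leaf": "deny"},
    "high": {
        "attr": "debt",
        "threshold": 20,
        "low":  {"leaf": "approve"},
        "high": {"leaf": "deny"},
    },
}

def classify(node, tuple_):
    """Move an input tuple through the tree based on its attribute values."""
    while "leaf" not in node:
        branch = "low" if tuple_[node["attr"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["leaf"]

print(classify(tree, {"income": 55, "debt": 10}))  # -> 'approve'
```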

Decision Tree Issues
- Attribute splits (categorical, discrete, continuous)
- Ordering of attributes in the tree
- Determining when to stop
- Must it perform perfectly on the training set?
- Pruning of the tree (removing branches)

Decision Tree Advantages/Disadvantages
Advantages
- Easy to understand
- Efficient (time)
Disadvantages
- May be difficult to use with continuous data
- Limited to problems that can be solved by dividing the input space into subrectangles
- Not flexible (no automatic revision if incorrect)
- No way to handle missing data
- May overfit
- Pruning combats overfitting (but it may induce other errors)

ID3 (R[2])
- Decision tree learning system based on information theory
- Attempts to minimize the expected number of tests on a tuple
- Formalizes the approach adults take to twenty questions!
- Picks the attribute with the highest information gain first
- Entropy: H(S) = -sum_i p_i log2(p_i), where p_i is the proportion of class i in set S
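A small sketch of entropy and information gain as ID3 uses them to rank attributes; the loan-style data are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum(p_i * log2 p_i) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Entropy of the whole set minus the weighted entropy after the split."""
    total = len(rows)
    gain = entropy(labels)
    for value in set(row[attr_index] for row in rows):
        subset = [labels[i] for i, row in enumerate(rows)
                  if row[attr_index] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Hypothetical data: attributes (employed, owns_home) and class labels.
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no")]
labels = ["approve", "approve", "deny", "deny"]
print(information_gain(rows, 0, labels))  # splitting on 'employed' -> 1.0
```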

CART (Kennedy, p. 10-56)
- Builds a binary decision tree
- Exhaustive search to determine the best tree, where "best" is defined by a goodness-of-split measure used to choose the optimal split at each node (see below)
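The slide's goodness-of-split and optimal-splitting equations were figures and are not reproduced in the transcript. A reasonable assumption is the standard CART measure of Breiman et al., which may differ in notation from Kennedy's:

```latex
% Goodness of a split s at node t, with left/right children t_L, t_R
% reached with probabilities P_L, P_R, over J classes:
\Phi(s, t) = 2\, P_L P_R \sum_{j=1}^{J} \bigl| P(C_j \mid t_L) - P(C_j \mid t_R) \bigr|
% Optimal splitting: choose the split that maximizes the measure:
s^{*} = \arg\max_{s}\, \Phi(s, t)
```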

Neural Networks
- Determine predictions "using 'neurons' (computations) and their interconnections (weighted inputs)" (p. 142, Barquin)
- Example: Fig 8-4, p. 143, Barquin
  - Input values of attributes at the left
  - A weight associated with each link between nodes
  - Classification produced at the output on the right
- Neural net and decision tree comparison: Peter Cabena, Pablo Hadjinian, Rolf Stadler, Jaap Verhees, and Alessandro Zanasi, Discovering Data Mining: From Concept to Implementation, Prentice-Hall, 1998, pp. 71, 74.

Neural Nets
- A number of processing layers between input and output
- Each processing unit (node) is connected to all units in the next layer
- Constructing a neural net:
  - The network structure is determined by the modeler
  - The weights are "learned" by applying the net to the training set
  - Backpropagation is used to adjust the weights:
    - The desired output is provided with the training data
    - The actual network output is subtracted from the desired output to produce an error
    - Connection weights are changed based on a minimization method called gradient descent (see the sketch below)
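A minimal sketch of gradient-descent weight learning for a single sigmoid unit (the one-node case of backpropagation), trained on a made-up AND-function dataset; this is not the network of Fig 8-4.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training pairs: ([x1, x2], desired output). Invented: the AND function.
training = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0),
            ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]

weights, bias, rate = [0.1, 0.1], 0.0, 0.5

for epoch in range(5000):
    for x, desired in training:
        actual = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        error = desired - actual                  # desired minus actual output
        delta = error * actual * (1.0 - actual)   # squared-error gradient through sigmoid
        # Gradient-descent weight update.
        weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
        bias += rate * delta

# Predictions after training: close to [0, 0, 0, 1].
print([round(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias), 2)
       for x, _ in training])
```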

Historical Note
Neural network weights are adjusted based on whether or not the prediction is good. Earlier IR systems used "feedback" to adjust weights in document vectors based on precision/recall values.

Neural Nets Advantages/Disadvantages
Advantages
- Low classification error rates
- Robust in noisy environments
- Rules may be more concise if there are strong relationships among attributes
- Provides a high degree of accuracy
Disadvantages
- Multiple passes over the database: very expensive
- Classification process not apparent ("black box")
  - Embedded within the graph structure and link weights
  - May be difficult to generate rules
  - Difficult to infuse domain knowledge into a neural net
- May overfit
- May fail to converge

Rule Extraction (RX) Algorithm (R[4])
- Cluster
- Enumerate a ruleset for the network outputs
- Enumerate a ruleset for the network inputs
- Merge the input and output rulesets
- Algorithm: p. 958
- Neural net mining example: Lu, section 2.2

Neural Net Data Mining Research Issues
- Network training time
- Possible methods for incremental training of the network
- Use of domain experts
- Reduction of inputs to the network

Bayesian Classification (Fayyad, Ch. 6)
- In Bayesian statistics, we measure the likelihood of observed data y given each value of x, e.g. f(y|x)
- Data mining goal: find the most probable set of class descriptions
- NASA AutoClass
  - Discovers automatic classifications in data
  - Like clustering
  - No prior definition of classes
  - Classes may be unknown to experts

AutoClass
- Records represented as vectors of values
- pdf: gives the "probability of observing an instance possessing any particular attribute value vector" (p. 156, Cheeseman)
- Model: finite mixture distribution
- Interclass mixture probability: P(Xi ∈ Cj | Vc, Tc, S, I)
- Two levels of search: maximum posterior parameters (MAP), most probable density function
- Each model is a product of independent probability distributions over attribute subsets
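AutoClass's actual two-level search is not shown here; the sketch below only illustrates class-membership probabilities under a finite mixture, with two hypothetical 1-D Gaussian components and invented parameters:

```python
import math

# Finite mixture: membership probability of record x in class j is
#   pi_j * p(x|j) / sum_k pi_k * p(x|k)
components = [
    {"pi": 0.6, "mean": 0.0, "std": 1.0},
    {"pi": 0.4, "mean": 5.0, "std": 2.0},
]

def pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def memberships(x):
    """Posterior probability of each mixture component given x."""
    weighted = [c["pi"] * pdf(x, c["mean"], c["std"]) for c in components]
    total = sum(weighted)
    return [w / total for w in weighted]

print([round(m, 3) for m in memberships(4.0)])  # mostly the second class
```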

AutoClass Case Studies
- IRAS spectra classification
  - "Minimum" information vs. full access
  - "Good" outputs are domain specific
- DNA codes
  - Results may extend beyond the limits of the database
- LandSat pixels
  - Parallelization
  - Undocumented preprocessing

AutoClass Issues
- Interaction between domain experts and the machine
  - Essential for good results
  - Experts and machine each have unique strengths
  - Iterative process
- Ockham factor
  - If a particular parameter does not noticeably improve the model, reject models with that parameter

Classification Summary
- Watch out for preprocessing; key words:
  - Calibration
  - Corrected
  - Averaged
  - Normalized
  - Adjusted
  - Compressed
  - Subsets
  - Partitioned
  - Representative
- Uncover biases in data collection
- Engage in full partnership with experts and obtain domain-specific results!