Data Mining Techniques Outline


Data Mining Techniques Outline
Goal: Provide an overview of basic data mining techniques.
- Statistical methods: point estimation, models based on summarization, Bayes theorem, hypothesis testing, regression and correlation
- Similarity measures
- Decision trees
- Neural networks and activation functions
- Genetic algorithms

Point Estimation
Point estimate: an estimate of a population parameter. It may be made by calculating the parameter for a sample, and may be used to predict a value for missing data. Ex: relation R contains 100 employees; 99 have salary information, and the mean salary of these is $50,000. Use $50,000 as the value of the remaining employee's salary. Is this a good idea?
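A minimal sketch of this imputation in Python, with a small stand-in sample instead of the 99 salaries (the numbers are illustrative, not from the slides):

```python
# Point estimation for imputation: use the sample mean of the known
# salaries as a point estimate for the one missing salary.
known_salaries = [48_000, 52_000, 50_500, 49_500]  # stand-in for the known values

mean_salary = sum(known_salaries) / len(known_salaries)  # point estimate of the mean
missing_salary_estimate = mean_salary                    # impute the missing value

print(f"Imputed salary: ${missing_salary_estimate:,.0f}")
```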

Estimation Error
Bias: the difference between the expected value of the estimator and the actual parameter value. Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value, MSE = E[(estimate - actual)^2]. Why square? Squaring makes every error positive and penalizes large errors more heavily. Root Mean Square Error (RMSE): the square root of the MSE, expressed in the same units as the parameter.
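A quick numeric illustration of these error measures, using made-up estimates against a known true value:

```python
import math

true_value = 100.0
estimates = [98.0, 103.0, 99.5, 101.0, 97.0]  # hypothetical repeated estimates

bias = sum(estimates) / len(estimates) - true_value
mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
rmse = math.sqrt(mse)

print(f"bias={bias:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
```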

Jackknife Estimate
Jackknife estimate: an estimate of a parameter obtained by omitting one value from the set of observed values. Ex: the estimate of the mean for X = {x1, …, xn} with xi omitted is the sum of the remaining xj divided by (n - 1).
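A sketch of the leave-one-out means, assuming a small illustrative sample:

```python
X = [4.0, 5.0, 6.0, 7.0, 8.0]  # illustrative observations
n = len(X)

# Jackknife estimates of the mean: recompute the mean with each value omitted.
jackknife_means = [(sum(X) - X[i]) / (n - 1) for i in range(n)]
print(jackknife_means)
```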

Maximum Likelihood Estimate (MLE)
Obtain parameter estimates that maximize the probability that the sample data occurs for the specific model. The joint probability of observing the sample data is found by multiplying the individual probabilities. Likelihood function: L(p | x1, …, xn) = product over i of p(xi | p). Maximize L.

MLE Example
A coin is tossed five times: {H, H, H, H, T}. Assuming a fair coin with H and T equally likely, the likelihood of this sequence is (1/2)^5 = 0.03125. However, if the probability of an H is 0.8, then the likelihood is (0.8)^4 (0.2) = 0.08192, so the second model fits the observed data better.

MLE Example (cont'd)
General likelihood formula: L(p | x1, …, xn) = product over i of p^xi (1 - p)^(1 - xi), where xi = 1 for heads and 0 for tails. Maximizing L gives the estimate p = (sum of the xi) / n; here p = 4/5 = 0.8.
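A brute-force check of this result, scanning candidate values of p for the observed sequence (a sketch, not how MLE is usually computed):

```python
observations = [1, 1, 1, 1, 0]  # H=1, T=0
k, n = sum(observations), len(observations)

def likelihood(p):
    return p ** k * (1 - p) ** (n - k)

# Evaluate the likelihood on a grid and pick the maximizer.
candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=likelihood)
print(p_hat)  # 0.8, matching k/n = 4/5
```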

Expectation-Maximization (EM)
Solves estimation problems with incomplete data. Obtain initial estimates for the parameters; then iterate: use the current estimates to fill in expected values for the missing data, re-estimate the parameters from the completed data, and continue until the estimates converge.

EM Example

EM Algorithm
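The algorithm itself appears as a figure in the original slides; as a stand-in, here is a minimal sketch of the idea for estimating a mean when some values are missing (the data and starting guess are made up for illustration):

```python
observed = [1.0, 5.0, 10.0, 4.0]   # known values (illustrative)
n_missing = 2                       # number of missing values
n_total = len(observed) + n_missing

mu = 3.0  # initial guess for the mean
for _ in range(100):
    # E step: replace each missing value with its expected value, the current mean.
    completed_sum = sum(observed) + n_missing * mu
    # M step: re-estimate the mean from the completed data.
    new_mu = completed_sum / n_total
    if abs(new_mu - mu) < 1e-9:     # converged
        break
    mu = new_mu

print(round(mu, 4))  # converges to the mean of the observed values, 5.0
```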

Models Based on Summarization
Visualization: frequency distribution, mean, variance, median, mode, etc. Box plot: a graphical summary of a distribution showing its extremes, quartiles, and median.

Scatter Diagram
A plot of the paired values of two variables; the shape of the resulting point cloud suggests whether and how the variables are related.

Bayes Theorem
Posterior probability: P(h1 | xi). Prior probability: P(h1). Bayes theorem assigns probabilities to hypotheses given a data value: P(hj | xi) = P(xi | hj) P(hj) / P(xi).

Bayes Theorem Example
Credit authorization hypotheses: h1 = authorize purchase, h2 = authorize after further identification, h3 = do not authorize, h4 = do not authorize but contact police. Assign twelve data values x1, …, x12, one for each combination of credit and income. From the training data: P(h1) = 60%, P(h2) = 20%, P(h3) = 10%, P(h4) = 10%.

Bayes Example (cont'd)
Training data: the table of twelve credit/income combinations with their assigned hypotheses.

Bayes Example (cont'd)
Calculate P(xi | hj) and P(xi). Ex: P(x7 | h1) = 2/6; P(x4 | h1) = 1/6; P(x2 | h1) = 2/6; P(x8 | h1) = 1/6; P(xi | h1) = 0 for all other xi. To predict the class for x4, calculate P(hj | x4) for all hj and place x4 in the class with the largest value. Ex: P(h1 | x4) = P(x4 | h1) P(h1) / P(x4) = (1/6)(0.6)/0.1 = 1, so x4 is in class h1.
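A direct transcription of this calculation; the probability tables below include only the values quoted on the slide, with P(x4) = 0.1 taken from the worked example and the zero entries for h2–h4 assumed for illustration:

```python
# Priors from the training data.
prior = {"h1": 0.6, "h2": 0.2, "h3": 0.1, "h4": 0.1}

# Conditional probabilities P(x4 | hj). Only the h1 value is given on
# the slide; the others are assumed to be 0 here for illustration.
cond_x4 = {"h1": 1 / 6, "h2": 0.0, "h3": 0.0, "h4": 0.0}

p_x4 = 0.1  # P(x4), from the worked example

# Posterior P(hj | x4) = P(x4 | hj) P(hj) / P(x4); pick the largest.
posterior = {h: cond_x4[h] * prior[h] / p_x4 for h in prior}
best = max(posterior, key=posterior.get)
print(posterior, "->", best)  # h1 wins with posterior 1.0
```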

Hypothesis Testing
Find a model to explain behavior by creating and then testing a hypothesis about the data. This is the exact opposite of the usual data mining approach, where the model is derived from the data rather than proposed in advance. H0 – null hypothesis: the hypothesis to be tested. H1 – alternative hypothesis.

Chi Squared Statistic
χ² = Σ (O − E)² / E, where O is an observed value and E is the expected value based on the hypothesis. Ex: O = {50, 93, 67, 78, 87} with E = 75 for each cell gives χ² = 15.55, and the deviation is therefore significant.
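The slide's numbers, checked in a few lines:

```python
observed = [50, 93, 67, 78, 87]
expected = 75  # expected count for every cell under the null hypothesis

chi_squared = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi_squared, 2))  # 15.55
```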

Regression
Predict future values based on past values. Linear regression assumes a linear relationship exists: y = c0 + c1 x1 + … + cn xn. Find the coefficient values that best fit the data.

Linear Regression
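A least-squares fit of the single-variable case y = c0 + c1 x, on made-up points, as one way to find the best-fitting coefficients:

```python
# Simple linear regression by ordinary least squares (one predictor).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8]  # roughly y = 2x, with noise

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Slope c1 = covariance(x, y) / variance(x); intercept c0 from the means.
c1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
     sum((x - x_mean) ** 2 for x in xs)
c0 = y_mean - c1 * x_mean
print(f"y = {c0:.3f} + {c1:.3f} x")
```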

Correlation
Examine the degree to which the values for two variables behave similarly, summarized by the correlation coefficient r:
- r = 1: perfect correlation
- r = -1: perfect but opposite correlation
- r = 0: no correlation
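One common choice for r is Pearson's coefficient; a sketch, reusing the data from the regression example:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8]

n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n

# Pearson r: covariance normalized by the two standard deviations.
cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
r = cov / math.sqrt(sum((x - x_mean) ** 2 for x in xs) *
                    sum((y - y_mean) ** 2 for y in ys))
print(round(r, 4))  # close to 1: strong positive correlation
```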

Similarity Measures
Determine the similarity between two objects. Similarity characteristics: sim(ti, ti) = 1 for every object, and sim(ti, tj) approaches 0 when the two objects are completely unalike. Alternatively, a distance measure measures how unlike or dissimilar objects are.

Similarity Measures

Distance Measures
Measure the dissimilarity between objects; common examples are Euclidean and Manhattan distance.
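Sketches of a few standard measures (cosine similarity, plus Euclidean and Manhattan distance); these are common textbook choices, not necessarily the exact ones pictured in the original slides:

```python
import math

def cosine_similarity(a, b):
    """Similarity in [0, 1] for nonnegative vectors; 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = [1.0, 2.0, 3.0], [2.0, 2.0, 5.0]
print(cosine_similarity(p, q), euclidean(p, q), manhattan(p, q))
```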

Twenty Questions Game
The familiar game is a natural analogy for a decision tree: each question splits the remaining possibilities, and the sequence of answers leads to a final guess.

Decision Trees
Decision tree (DT): a tree where the root and each internal node are labeled with a question, the arcs represent the possible answers to the associated question, and each leaf node represents a prediction of a solution to the problem. A popular technique for classification, where the leaf node indicates the class to which the corresponding tuple belongs.

Decision Tree Example

Decision Trees
A decision tree model is a computational model consisting of three parts:
1. A decision tree.
2. An algorithm to create the tree.
3. An algorithm that applies the tree to data.
Creation of the tree is the most difficult part. Processing is basically a search similar to that in a binary search tree (although a DT may not be binary).

Decision Tree Algorithm
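The original slide shows the tree-building algorithm as a figure; as a stand-in, here is a sketch of the simpler third component, applying an already-built tree to a tuple (the credit-style tree below is invented for illustration):

```python
# A tree node is either a leaf (a class label) or a dict:
# {"question": attribute_name, "answers": {answer_value: subtree, ...}}
tree = {
    "question": "income",
    "answers": {
        "high": "authorize",
        "medium": {"question": "credit", "answers": {"good": "authorize",
                                                     "bad": "do not authorize"}},
        "low": "do not authorize",
    },
}

def classify(node, tuple_):
    """Walk from the root, following the arc matching each answer, to a leaf."""
    while isinstance(node, dict):
        node = node["answers"][tuple_[node["question"]]]
    return node

print(classify(tree, {"income": "medium", "credit": "good"}))  # authorize
```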

DT Advantages/Disadvantages
Advantages:
- Easy to understand.
- Easy to generate rules.
Disadvantages:
- May suffer from overfitting.
- Classifies by rectangular partitioning.
- Does not easily handle nonnumeric data.
- Can be quite large, so pruning is necessary.

Neural Networks
Based on the observed functioning of the human brain: artificial neural networks (ANN). Our view of neural networks is very simplistic: we view a neural network (NN) from a graphical viewpoint, although a NN may alternatively be viewed from the perspective of matrices. Used in pattern recognition, speech recognition, computer vision, and classification.

Neural Networks
A neural network (NN) is a directed graph F = <V, A> with vertices V = {1, 2, …, n} and arcs A = {<i, j> | 1 <= i, j <= n}, with the following restrictions:
- V is partitioned into a set of input nodes VI, hidden nodes VH, and output nodes VO.
- The vertices are also partitioned into layers, and any arc <i, j> must have node i in layer h-1 and node j in layer h.
- Arc <i, j> is labeled with a numeric weight wij.
- Node i is labeled with a function fi.

Neural Network Example

NN Node

NN Activation Functions
The functions associated with the nodes in the graph; each maps a node's weighted input sum to its output. Output may be in the range [-1, 1] or [0, 1].

NN Activation Functions
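Common textbook choices (threshold, sigmoid, hyperbolic tangent) are sketched below; these are standard options, not necessarily the exact list in the original figure:

```python
import math

def threshold(s, t=0.0):
    """Step function: output 1 once the weighted sum s exceeds threshold t."""
    return 1.0 if s > t else 0.0

def sigmoid(s):
    """Smooth squashing function with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def tanh_activation(s):
    """Hyperbolic tangent: like the sigmoid, but with output in (-1, 1)."""
    return math.tanh(s)

for s in (-2.0, 0.0, 2.0):
    print(s, threshold(s), round(sigmoid(s), 3), round(tanh_activation(s), 3))
```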

NN Learning
Propagate input values through the graph, compare the output to the desired output, and adjust the weights in the graph accordingly.
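A minimal single-node illustration of that loop (gradient-descent weight updates on a sigmoid node; the OR-like training data and learning rate are arbitrary choices):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Tiny training set: learn OR-like behavior on two inputs (illustrative).
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w, bias, rate = [0.0, 0.0], 0.0, 0.5

for _ in range(5000):
    for x, target in data:
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + bias)       # propagate
        delta = (target - out) * out * (1.0 - out)            # compare (sigmoid gradient)
        w = [wi + rate * delta * xi for wi, xi in zip(w, x)]  # adjust weights
        bias += rate * delta

print([round(sigmoid(w[0]*x[0] + w[1]*x[1] + bias)) for x, _ in data])  # [0, 1, 1, 1]
```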

Neural Networks
A neural network model is a computational model consisting of three parts:
1. A neural network graph.
2. A learning algorithm that indicates how learning takes place.
3. Recall techniques that determine how information is obtained from the network.
We will look at propagation as the recall technique.

NN Advantages
- Learning: can continue learning even after the training set has been applied.
- Easy parallelization.
- Solves many problems.

NN Disadvantages
- Difficult to understand.
- May suffer from overfitting.
- Structure of the graph must be determined a priori.
- Input values must be numeric.
- Verification is difficult.

Genetic Algorithms
Optimization search algorithms: create an initial feasible solution and iteratively create new, "better" solutions, following the biological model of evolution and survival of the fittest. A solution must be represented as an individual: a string I = I1, I2, …, In where each Ij is drawn from a given alphabet A. Each character Ij is called a gene. A population is a set of individuals.

Genetic Algorithms
A genetic algorithm (GA) is a computational model consisting of five parts:
1. A starting set of individuals, P.
2. Crossover: a technique to combine two parents to create offspring.
3. Mutation: randomly change an individual.
4. Fitness: determine the best individuals.
5. An algorithm which applies the crossover and mutation techniques to P iteratively, using the fitness function to determine the best individuals in P to keep.

Crossover Examples

Genetic Algorithm
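Both of the preceding figures are omitted from the transcript; as a stand-in, here is a compact sketch of the five parts working together on a toy problem (maximize the number of 1-bits in a string). Single-point crossover and the bit-count fitness function are illustrative choices:

```python
import random

random.seed(0)
ALPHABET, LENGTH, POP_SIZE = "01", 10, 8

def fitness(ind):
    """Toy fitness: count of '1' genes (to be maximized)."""
    return ind.count("1")

def crossover(p1, p2):
    """Single-point crossover: swap the tails of the two parents."""
    cut = random.randrange(1, LENGTH)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(ind, rate=0.05):
    """Randomly replace genes with a small probability."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else g
                   for g in ind)

# Starting population P of random individuals.
P = ["".join(random.choice(ALPHABET) for _ in range(LENGTH)) for _ in range(POP_SIZE)]

for _ in range(50):
    P.sort(key=fitness, reverse=True)
    parents = P[: POP_SIZE // 2]                 # keep the fittest individuals
    children = []
    while len(children) < POP_SIZE - len(parents):
        c1, c2 = crossover(*random.sample(parents, 2))
        children += [mutate(c1), mutate(c2)]
    P = parents + children[: POP_SIZE - len(parents)]

print(max(P, key=fitness))  # best individual found, likely all 1s
```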

GA Advantages/Disadvantages
Advantages:
- Easily parallelized.
Disadvantages:
- Difficult to understand and explain to end users.
- Abstraction of the problem and the method to represent individuals are quite difficult.
- Determining the fitness function is difficult.
- Determining how to perform crossover and mutation is difficult.