Data Mining – Algorithms: Linear Models (Chapter 4, Section 4.6)

Numeric Attributes
Numeric prediction and/or numeric attributes as predictors
Linear regression is a well-established statistical technique
– Designed to predict a numeric value based on numeric attributes
– Determines the optimal set of coefficients for the linear equation: pred = w0 + w1·a1 + w2·a2 + … + wn·an
– "Optimal" means the sum of squared prediction errors is minimized
– For data mining, the fitting is done on training data so that the model can be tested on test data
– I hope that a CS major could read a statistics book and then write the code to do this (a sketch follows below)
– However, there is no need to, since this method is so widely available, unless you are seeking to create an improved version of it
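
To make the fitting step concrete, here is a minimal least-squares sketch in Python/NumPy; the attribute values and data shapes are made up for illustration, and this is not the spreadsheet or WEKA implementation used in the examples.

```python
import numpy as np

# Made-up training data: each row is an instance, each column a numeric attribute.
A = np.array([[36.0, 28.9],
              [29.0, 30.1],
              [38.0, 25.6],
              [31.0, 27.3]])
y = np.array([0.58, 0.46, 0.60, 0.49])  # numeric value to predict

# Prepend a column of 1s so the first weight acts as the intercept w0.
X = np.hstack([np.ones((A.shape[0], 1)), A])

# Least squares: choose w to minimize the sum of squared errors ||Xw - y||^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction for a new instance: pred = w0 + w1*a1 + ... + wn*an
new_instance = np.array([1.0, 33.0, 27.0])  # leading 1 matches the intercept
print("weights:", w)
print("prediction:", new_instance @ w)
```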

Example
<Show Basketball Spreadsheet – Baskball sheet>
NOTE – input values, weights, prediction vs. actual
<Show testReg sheet – test on separate instances>
NOTE – how it did – prediction vs. actual – difference, correlation

Using Regression for Classification
Perform a regression for each class
– Set the output to be predicted = 1 for training instances that belong to the class
– Set the output to be predicted = 0 for training instances that do NOT belong to the class
Do this for each class, and you will have a "membership function" equation for each class
At test time, plug the new instance into each equation; the highest value produced is the prediction to make (see the sketch below)
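
A minimal sketch of this one-regression-per-class scheme, reusing NumPy least squares; the class names and data are hypothetical, not taken from the basketball example.

```python
import numpy as np

def fit_membership_functions(X, labels, classes):
    """Fit one least-squares regression per class; the target is 1
    where the instance belongs to the class and 0 everywhere else."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # intercept column
    weights = {}
    for c in classes:
        y = (labels == c).astype(float)  # 1/0 membership target
        w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        weights[c] = w
    return weights

def classify(x, weights):
    """Plug the instance into each membership equation; predict the largest."""
    xb = np.concatenate([[1.0], x])
    return max(weights, key=lambda c: xb @ weights[c])

# Made-up example: two attributes, three classes.
X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0],
              [5.0, 4.0], [8.0, 9.0], [9.0, 8.0]])
labels = np.array(["low", "low", "medium", "medium", "high", "high"])
w = fit_membership_functions(X, labels, ["low", "medium", "high"])
print(classify(np.array([4.5, 4.5]), w))  # expected: "medium"
```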

Example
<Show discretized sheet>
NOTE – prep of data – discretized into low, medium, high
NOTE – weights for 3 regressions: high, medium, low
<Show Test sheet>
NOTE – calculations for Hi, Med, Low (doesn't do that well; I suspect that the data may not be from the same source (NBA), and that the discretization was a bit of a problem – very few "low" instances)

More Sophisticated: Pairwise Classification
Do as many pairwise competitions as necessary
Training – pit two classes against each other:
– Temporarily toss out training instances that belong to neither of the two classes
– Set output = 1 for the class to be predicted and -1 for the other
Testing – run all pairwise competitions; the winner of each gets a vote
– E.g., say Medium beats High, Medium beats Low, and High beats Low – Medium wins with two votes
A conservative approach would be to predict nothing if no prediction dominates (see the sketch below)
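
A hypothetical sketch of pairwise (one-vs-one) voting built on the same least-squares fit; the function names and data shapes are assumptions, not code from the chapter.

```python
import numpy as np
from itertools import combinations

def fit_pairwise(X, labels, classes):
    """Train one +1/-1 regression per pair of classes, using only
    the training instances that belong to one of the two."""
    models = {}
    for c1, c2 in combinations(classes, 2):
        mask = (labels == c1) | (labels == c2)
        Xb = np.hstack([np.ones((mask.sum(), 1)), X[mask]])
        y = np.where(labels[mask] == c1, 1.0, -1.0)
        w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        models[(c1, c2)] = w
    return models

def vote(x, models, classes):
    """Each pairwise competition casts one vote; the class with the
    most votes wins.  A conservative variant would return None on ties."""
    xb = np.concatenate([[1.0], x])
    votes = {c: 0 for c in classes}
    for (c1, c2), w in models.items():
        votes[c1 if xb @ w > 0 else c2] += 1
    return max(votes, key=votes.get)
```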

In Context
Linear regression has been used for decades in various applications (e.g., social science research)
Bias – it only searches for linear equations: no squares, cubes, etc.
To work well, the data must fit a linear model – e.g., for classification the classes must be "linearly separable": divisible by a line in 2D, a plane in 3D, a hyperplane in higher dimensions (see the boundary equation below)
To work well, attributes should not be highly correlated with each other
Depends on numeric attributes
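
For reference, the boundary a linear model can represent is the hyperplane where the weighted sum crosses zero; in standard notation (implied by the slides rather than stated in them):

```latex
\[
  w_0 + w_1 a_1 + w_2 a_2 + \dots + w_n a_n = 0
\]
% Instances with a positive weighted sum fall on one side of the
% hyperplane (one class); negative sums fall on the other side.
```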

Let's Look at WEKA
Linear regression with the Basketball data
No "percent correct" measures for numeric prediction – instead:
– Correlation
– Error measures
Discretize points per minute
– Try logistic regression – a categorical prediction approach (see the sketch below)
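
This is not WEKA's code, but a minimal sketch of what logistic regression does underneath (gradient descent on the log-loss); the data and learning-rate settings are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Binary logistic regression via batch gradient descent.
    Unlike linear regression, the output is squashed into (0, 1)
    and can be read as a class-membership probability."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                 # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)   # gradient of the log-loss
    return w

# Made-up binary task: class 1 goes with larger attribute values.
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w = fit_logistic(X, y)
print(sigmoid(np.array([1.0, 4.5]) @ w))  # near 0.5 at the class boundary
```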

End Section 4.6