COMP3503 Intro to Inductive Modeling

Slides from: Doug Gray, David Poole

COMP3503 Intro to Inductive Modeling with Daniel L. Silver

Agenda
- Deductive and Inductive Modeling
- Learning Theory and Generalization
- Common Statistical Methods

The KDD Process
[Figure: the KDD pipeline — Data Sources → Data Consolidation → Consolidated Data (Data Warehouse) → Selection and Preprocessing → Prepared Data → Data Mining → Patterns & Models → Interpretation and Evaluation → Knowledge.]

Deductive and Inductive Modeling

Induction versus Deduction
[Figure: Deduction works top-down, verifying a model or general rule against examples A, B, C; Induction works bottom-up, constructing the model or general rule from the examples.]

Deductive Modeling
- Top-down (toward the data) verification of a hypothesis
- The hypothesis is generated within the mind of the data miner
- Exploratory tools such as OLAP and data visualization software are used
- Models tend to be used for description

Inductive Modeling
- Bottom-up (from the data) development of a hypothesis
- The hypothesis is generated by the technology directly from the data
- Statistical and machine learning tools such as regression, decision trees and artificial neural networks are used
- Models can be used for prediction

Inductive Modeling
Objective: develop a general model or hypothesis from specific examples
- Function approximation (curve fitting)
- Classification (concept learning, pattern recognition)

Learning Theory and Generalization

Inductive Modeling = Learning
Basic framework for inductive learning: the environment supplies training examples (x, f(x)) to an inductive learning system, which induces a model or hypothesis h. The model's output classification h(x) is then checked against previously unseen testing examples: does h(x) = f(x)? Learning is a problem of representation and search for the best hypothesis, h(x).

Inductive Modeling = Data Mining
Ideally, a hypothesis (model) is:
- Complete – covers all potential examples
- Consistent – no conflicts
- Accurate – able to generalize to previously unseen examples
- Valid – presents a truth
- Transparent – human-readable knowledge

Inductive Modeling – Generalization
The objective of learning is to achieve good generalization to new cases; otherwise, just use a look-up table. Generalization can be defined as a mathematical interpolation or regression over a set of training points.
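The contrast above can be made concrete with a toy sketch (illustrative only, not from the slides): a look-up table can only answer for x values it has already seen, while interpolating between training points lets a model answer for unseen x.

```python
def interpolate(train, x):
    """Predict f(x) by linear interpolation over (x, f(x)) training pairs."""
    pts = sorted(train)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)       # position between the two points
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the training range")

train = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
print(interpolate(train, 1.5))  # unseen point -> 3.0
```

A look-up table would have no answer at x = 1.5; the interpolated model generalizes from the surrounding training points.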

Inductive Modeling – Generalization
- Generalization accuracy can be guaranteed for a specified confidence level, given a sufficient number of examples
- Models can be validated for accuracy by using a previously unseen test set of examples

Learning Theory
Probably Approximately Correct (PAC) theory of learning (Leslie Valiant, 1984)
Poses questions such as:
- How many examples are needed for good generalization?
- How long will it take to create a good model?
Answers depend on:
- The complexity of the actual function
- The desired level of accuracy of the model (e.g. 75%)
- The desired confidence in finding a model with this accuracy (e.g. 19 times out of 20 = 95%)

Learning Theory
[Figure: the space of all possible examples X, showing regions where the true class c and the hypothesis h disagree.]
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X: error(h) = P[c(x) ≠ h(x)]

Learning Theory
Three notions of error:
- Training error – how often the training set is misclassified
- Test error – how often an independent test set is misclassified
- True error – how often the entire population of possible examples would be misclassified; it must be estimated from the test error
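The three notions above can be sketched in a few lines (the threshold hypothesis and the data are assumed for illustration, not from the slides): the true error over the whole population is unknown, so we estimate it with the misclassification rate on a held-out test set.

```python
def error_rate(h, examples):
    """Fraction of (x, c) pairs that hypothesis h misclassifies."""
    wrong = sum(1 for x, c in examples if h(x) != c)
    return wrong / len(examples)

h = lambda x: 1 if x > 5 else 0          # a simple threshold hypothesis
train = [(2, 0), (4, 0), (6, 1), (8, 1)]  # examples used to pick h
test = [(1, 0), (5, 1), (7, 1), (9, 1)]   # independent held-out examples

print(error_rate(h, train))  # training error: 0.0
print(error_rate(h, test))   # test error: 0.25 (x=5 is misclassified)
```

The gap between the two numbers is exactly why the test error, not the training error, is used to estimate the true error.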

Linear and Non-Linear Problems
Linear problems:
- Linear functions
- Linearly separable classifications
Non-linear problems:
- Non-linear functions
- Classifications that are not linearly separable

Inductive Bias
Every inductive modeling system has an inductive bias. Consider a simple set of training examples (see the generalize.xls spreadsheet demo).

Inductive Bias
- Can you think of any biases that you commonly use when you are learning something new?
- Is there one best inductive bias?
- KISS – Occam's Razor: e.g., a simple linear function, least-squares fit

Inductive Modeling Methods
Automated exploration/discovery
- e.g., discovering new market segments
- distance-based and probabilistic clustering algorithms
Prediction/classification
- e.g., forecasting gross sales given current factors
- statistics (regression, k-nearest neighbour), artificial neural networks, genetic algorithms
Explanation/description
- e.g., characterizing customers by demographics
- inductive decision trees/rules, rough sets, Bayesian belief nets
- example rule: if age > 35 and income < $35k then ...

Common Statistical Methods

Linear Regression
Y = b0 + b1 X1 + b2 X2 + ...
- The coefficients b0, b1, ... determine a line (or a hyperplane in higher dimensions) that fits the data
- Closed-form solution via least squares (minimizes the sum of squared distances between the examples and the predicted values of Y)
- Inductive bias: the solution can be modeled by a straight line or hyperplane
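A minimal sketch of the closed-form least-squares fit, restricted to one input variable (Y = b0 + b1 X) so the idea stays visible; fitting multiple inputs needs the matrix normal equations. The data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of Y = b0 + b1*X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of X and Y divided by variance of X
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx                     # intercept through the means
    return b0, b1

# Points lying exactly on Y = 1 + 2X recover b0 = 1, b1 = 2.
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

Because the solution is closed-form, no iterative search is needed — one pass over the data yields the coefficients.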

Linear Regression
Y = b0 + b1 X1 + b2 X2 + ...
A great way to start, since it assumes you are modeling a simple function … Why?

Logistic Regression
Y = 1/(1 + e^(-Z)), where Z = b0 + b1 X1 + b2 X2 + ...
- The output lies in [0,1] and represents a probability
- The coefficients b0, b1, ... determine an S-shaped non-linear curve that best fits the data
- The coefficients are estimated using an iterative maximum-likelihood method
- Inductive bias: the solution can be modeled by this S-shaped non-linear surface

Logistic Regression
Can be used for classification problems:
- The output can be used as the probability of being of the class (or positive)
- Alternatively, any value above a cut-off (typically 0.5) is classified as a positive example
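The two uses above — probability output and cut-off classification — can be sketched as follows. The coefficients b0 and b1 are assumed values for illustration; in practice they would come from the iterative maximum-likelihood fit.

```python
import math

def logistic(z):
    """The S-shaped logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, b0, b1, cutoff=0.5):
    """Return (class label, probability) for a one-input logistic model."""
    p = logistic(b0 + b1 * x)            # probability of the positive class
    return (1 if p >= cutoff else 0), p

label, p = classify(2.0, b0=-4.0, b1=3.0)
print(label, round(p, 3))  # 1 0.881  (Z = 2, so p = 1/(1 + e^-2))
```

The same model thus serves both roles: report p directly as a probability, or threshold it at 0.5 to get a hard class label.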

THE END danny.silver@acadiau.ca

Learning Theory
[Figure: the example space X of pairs (x, c(x)), showing regions where c and h disagree.]
- x = input attributes
- c = true class function (e.g., "likes product")
- h = hypothesis (model)
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X: err(h) = P[c(x) ≠ h(x)]

Generalization: PAC – A Probabilistic Guarantee
- |H| = number of possible hypotheses in the modeling system
- ε = desired true error, where 0 < ε < 1
- δ = acceptable probability of failure, giving confidence 1 − δ, where 0 < δ < 1
The number of training examples m required to select (with confidence 1 − δ) a hypothesis h with err(h) < ε is given by:
m ≥ (1/ε)(ln|H| + ln(1/δ))
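The standard PAC bound for a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)), can be solved for the required number of examples in a few lines. The values of |H|, ε and δ below are illustrative choices, not from the slides.

```python
import math

def pac_sample_size(h_count, eps, delta):
    """Examples needed so that, with probability at least 1 - delta,
    a consistent hypothesis h has true error err(h) < eps."""
    return math.ceil((math.log(h_count) + math.log(1 / delta)) / eps)

# |H| = 2^10 hypotheses, 25% error tolerance, 95% confidence (delta = 0.05):
print(pac_sample_size(2 ** 10, eps=0.25, delta=0.05))  # 40
```

Note how the bound grows only logarithmically in the size of the hypothesis space but linearly in 1/ε: demanding higher accuracy is far more expensive than enlarging the space of candidate models.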