CS 478 – Tools for Machine Learning and Data Mining: The Need for and Role of Bias

Learning
- Rote: until you discover the rule/concept(s), the very best you can ever expect to do is remember what you observed and guess on everything else.
- Inductive: what you do when you GENERALIZE from your observations and make (accurate) predictions.
- Claim: "All [most of] the laws of nature were discovered by inductive reasoning."

The Big Question
- All you have is what you have OBSERVED.
- Your generalization should at least be consistent with those observations, but beyond that:
  - How do you know that your generalization is any good?
  - How do you choose among various candidate generalizations?

The Answer Is BIAS
Bias: any basis for choosing one decision over another, other than strict consistency with past observations.

Why Bias?
- If you have no bias, you cannot go beyond mere memorization (Mitchell's proof using the UGL and VS, below).
- The power of a generalization system follows directly from its biases.
- Progress towards understanding learning mechanisms depends upon understanding the sources of, and justification for, various biases.

Concept Learning
Given:
- A language of observations/instances
- A language of concepts/generalizations
- A matching predicate
- A set of observations
Find generalizations that:
1. Are consistent with the observations, and
2. Classify instances beyond those observed

Claim
The absence of bias makes it impossible to solve part 2 of the concept learning problem; i.e., learning is limited to rote learning.

Unbiased Generalization Language
- A generalization is identified with the set of instances it matches.
- An Unbiased Generalization Language (UGL), relative to a given language of instances, allows describing every possible subset of instances.
- UGL = the power set of the instance language.

Unbiased Generalization Procedure
- Uses the Unbiased Generalization Language.
- Computes the Version Space (VS) relative to the UGL.
- VS = the set of all expressible generalizations consistent with the training instances.
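To make this concrete, here is a minimal Python sketch of the unbiased procedure; the four-element instance space and the instance names are hypothetical, not from the slides. The language is the power set of the instances, and the version space is simply every subset consistent with the observations.

```python
from itertools import combinations

def power_set(instances):
    """All subsets of the instance space: the unbiased language (UGL)."""
    return [frozenset(c) for r in range(len(instances) + 1)
            for c in combinations(sorted(instances), r)]

def version_space(instances, positives, negatives):
    """Every UGL generalization consistent with the observations."""
    return [g for g in power_set(instances)
            if positives <= g and not (negatives & g)]

instances = {"i1", "i2", "i3", "i4"}   # hypothetical instance space
pos, neg = {"i1"}, {"i2"}              # one positive, one negative observed
vs = version_space(instances, pos, neg)
print(len(vs))  # 4 = 2^2: one member per choice on the unobserved i3 and i4
```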

Version Space (I)
Let:
- S be the set of maximally specific generalizations consistent with the training data
- G be the set of maximally general generalizations consistent with the training data

Version Space (II)
Intuitively:
- S keeps generalizing to accommodate new positive instances.
- G keeps specializing to avoid new negative instances.
- The key point is that each does so only to the smallest extent necessary to maintain consistency with the training data: G remains as general as possible and S remains as specific as possible.

Version Space (III)
- The sets S and G precisely delimit the version space (i.e., the set of all plausible versions of the emerging concept).
- A generalization g is in the version space represented by S and G if and only if:
  - g is more specific than or equal to some member of G, and
  - g is more general than or equal to some member of S.

Version Space (IV)
- Initialize G to the most general concept in the space.
- Initialize S to the first positive training instance.
- For each new positive training instance p:
  - Delete all members of G that do not cover p.
  - For each s in S: if s does not cover p, replace s with its most specific generalizations that cover p.
  - Remove from S any element more general than some other element in S.
  - Remove from S any element not more specific than some element in G.
- For each new negative training instance n:
  - Delete all members of S that cover n.
  - For each g in G: if g covers n, replace g with its most general specializations that do not cover n.
  - Remove from G any element more specific than some other element in G.
  - Remove from G any element more specific than some element in S.
- If G = S and both are singletons: a single concept consistent with the training data has been found.
- If G and S become empty: there is no concept consistent with the training data.
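The procedure above is Mitchell's candidate elimination algorithm. Below is a sketch of it in Python for conjunctive hypotheses over nominal features, with '?' as the wildcard. The feature domains, the EnjoySport-style data, and all function names are hypothetical, and the sketch assumes, as the slide does, that the first training example is positive.

```python
def covers(h, x):
    """True if hypothesis h matches instance x ('?' is a wildcard)."""
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    """Most specific generalization of s covering x (unique for conjunctions)."""
    return tuple(sv if sv == xv else '?' for sv, xv in zip(s, x))

def min_specializations(g, x, domains):
    """Most general specializations of g that exclude x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, gv in enumerate(g) if gv == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    G = [('?',) * len(domains)]   # most general boundary
    S = []                        # seeded by the first positive instance
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]       # drop g's that miss p
            S = [x] if not S else [
                s if covers(s, x) else min_generalization(s, x) for s in S]
            S = [s for s in S                        # keep s's below some g
                 if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not covers(s, x)]   # drop s's that cover n
            G = [h for g in G
                 for h in ([g] if not covers(g, x) else
                           [h for h in min_specializations(g, x, domains)
                            if any(more_general_or_equal(h, s) for s in S)])]
            G = list(dict.fromkeys(G))               # remove duplicates
            G = [g for g in G                        # prune redundant g's
                 if not any(h != g and more_general_or_equal(h, g) for h in G)]
    return S, G

# EnjoySport-style toy run (hypothetical data):
domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High')]
data = [(('Sunny', 'Warm', 'Normal'), True),
        (('Sunny', 'Warm', 'High'), True),
        (('Rainy', 'Cold', 'High'), False)]
S, G = candidate_elimination(data, domains)
print(S)  # [('Sunny', 'Warm', '?')]
print(G)  # [('Sunny', '?', '?'), ('?', 'Warm', '?')]
```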

Lemma 1
Any new instance, NI, is classified as positive if and only if NI is identical to some observed positive instance.

Proof of Lemma 1
(⇐) If NI is identical to some observed positive instance, then NI is classified as positive.
- Follows directly from the definition of VS: every member of VS contains every observed positive instance.
(⇒) If NI is classified as positive, then NI is identical to some observed positive instance.
- Let g = {p : p is an observed positive instance}.
- Because the language is unbiased, g ∈ VS.
- NI is classified as positive only if NI matches all of VS, so NI matches g; hence NI is an observed positive instance.

Lemma 2
Any new instance, NI, is classified as negative if and only if NI is identical to some observed negative instance.

Proof of Lemma 2
(⇐) If NI is identical to some observed negative instance, then NI is classified as negative.
- Follows directly from the definition of VS: no member of VS contains an observed negative instance.
(⇒) If NI is classified as negative, then NI is identical to some observed negative instance.
- Suppose NI is not an observed negative instance, and let g = {p : p is an observed positive instance} ∪ {NI}.
- Because the language is unbiased and g contains no observed negative instance, g ∈ VS.
- NI matches g ∈ VS, so NI is not classified as negative, a contradiction.

Lemma 3
If NI is any instance that was not observed, then NI matches exactly one half of VS, and so cannot be classified.

Proof of Lemma 3
- Let g = {p : p is an observed positive instance}, and let G' = {all subsets of the unobserved instances}.
- Because the language is unbiased, VS = {g ∪ g' : g' ∈ G'}.
- If NI was not observed, then NI belongs to exactly half of the subsets in G', and therefore matches exactly half of VS.
- An instance that matches exactly half of the version space can be classified neither positive nor negative.
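All three lemmas can be checked by brute force on a toy instance space (hypothetical names again; the same power-set construction as the earlier sketch):

```python
from itertools import combinations

instances = ["i1", "i2", "i3", "i4"]                  # toy instance space
pos, neg = {"i1"}, {"i2"}                             # the observations
subsets = [frozenset(c) for r in range(len(instances) + 1)
           for c in combinations(instances, r)]       # the UGL
vs = [g for g in subsets if pos <= g and not (neg & g)]

for ni in instances:
    matches = sum(1 for g in vs if ni in g)
    print(ni, f"{matches}/{len(vs)}")
# i1 4/4 -> matches all of VS: classified positive (Lemma 1)
# i2 0/4 -> matches none of VS: classified negative (Lemma 2)
# i3 2/4 -> exactly half: cannot be classified (Lemma 3)
# i4 2/4 -> exactly half: cannot be classified (Lemma 3)
```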

Theorem
An unbiased generalization procedure can never make the inductive leap necessary to classify instances beyond those it has observed.

Proof of the Theorem
- The result follows immediately from Lemmas 1, 2, and 3: observed instances are merely reproduced, and unobserved instances cannot be classified at all.
- Practical consequence: if a learning system is to be useful, it must have some form of bias.

Sources of Bias in Learning
1. The representation language cannot express all possible classes of observations.
2. The generalization procedure is biased:
   - Domain knowledge (e.g., double bonds rarely break)
   - Intended use (e.g., ICU: relative cost of errors)
   - Shared assumptions (e.g., crown, bridge: dentistry)
   - Simplicity and generality (e.g., "white men can't jump")
   - Analogy (e.g., heat vs. water flow, thin ice)
   - Commonsense (e.g., social interactions, pain, etc.)

Representation Language
- Decreasing the number of expressible generalizations increases the ability to make the inductive leap.
- Example: restrict generalizations to conjunctive constraints on features in a Boolean domain.

Proof of Concept (I)
- Let N = the number of features; there are 2^(2^N) possible subsets of instances.
- Let GL = conjunctions over {0, 1, *} (one symbol per feature, * a wildcard); GL can only denote subsets of size 2^p, for 0 ≤ p ≤ N.

Proof of Concept (II)
For each p, there are only C(N, N−p) · 2^(N−p) expressible subsets of size 2^p:
- Fix N−p features (there are C(N, N−p) ways of choosing which).
- Set values for the selected features (there are 2^(N−p) possible settings).

Proof of Concept (III)
Excluding the empty set, the ratio of expressible to total generalizations is given by:

  [ Σ_{p=0..N} C(N, N−p) · 2^(N−p) ] / (2^(2^N) − 1) = 3^N / (2^(2^N) − 1)

Proof of Concept (IV)
- For example, if N = 5, then only about 1 in 10^7 subsets can be represented.
- This is a strong bias, and a two-edged sword: the representation could be too sparse to contain the target concept.
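A quick arithmetic check of this claim, assuming the ratio 3^N / (2^(2^N) − 1) reconstructed above:

```python
N = 5
expressible = 3 ** N          # conjunctive strings over {0, 1, *}
total = 2 ** (2 ** N) - 1     # nonempty subsets of the 2^N instances
print(total // expressible)   # 17674762 -> roughly 1 in 10^7
```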

Generalization Procedure
- Domain knowledge (e.g., double bonds rarely break)
- Intended use (e.g., ICU: relative cost of errors)
- Shared assumptions (e.g., crown, bridge: dentistry)
- Simplicity and generality (e.g., "white men can't jump")
- Analogy (e.g., heat vs. water flow, thin ice)
- Commonsense (e.g., social interactions, pain, etc.)

Conclusion
- Absence of bias = rote learning.
- Efforts should focus on the combined use of prior knowledge and observations in guiding the learning process.
- Make biases and their use as explicit as observations and their use.