CS 60050 Machine Learning 15 Jan 2008. Inductive Classification.

Classification (Categorization)
Given:
- A description of an instance, x ∈ X, where X is the instance language or instance space.
- A fixed set of categories: C = {c1, c2, …, cn}
Determine:
- The category of x: c(x) ∈ C

Training
Given a set of training examples, D, consisting of pairs <x, c(x)>, find a hypothesized categorization function, h(x), such that h(x) = c(x) for every training example (i.e. h is consistent with the training data).
Learned functions are restricted a priori to a given hypothesis space, H, of functions h(x) that can be considered as definitions of c(x).

Bias
Any criterion other than consistency with the training data that is used to select a hypothesis.
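As a minimal sketch of this setup (the helper names and the toy hypothesis below are illustrative, not from the slides), a hypothesis is just a function from instances to categories, and it is consistent iff it reproduces every training label:

# Illustrative sketch of the categorization setup: an instance x is a tuple of
# attribute values, c(x) is its true category, and a hypothesis h is consistent
# with training data D iff h(x) == c(x) for every training pair.

def consistent(h, D):
    """True iff hypothesis h reproduces the label of every training example."""
    return all(h(x) == c_x for x, c_x in D)

# Toy hypothesis (an assumption for illustration): "red circles are positive".
def h(x):
    size, color, shape = x
    return "positive" if (color == "red" and shape == "circle") else "negative"

D = [(("small", "red", "circle"),   "positive"),
     (("large", "red", "circle"),   "positive"),
     (("small", "red", "triangle"), "negative"),
     (("large", "blue", "circle"),  "negative")]

print(consistent(h, D))  # True: h agrees with every training example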

Sample Category Learning Problem
Instance language: size ∈ {small, medium, large}, color ∈ {red, blue, green}, shape ∈ {square, circle, triangle}
C = {positive, negative}
D:
Example  Size   Color  Shape     Category
1        small  red    circle    positive
2        large  red    circle    positive
3        small  red    triangle  negative
4        large  blue   circle    negative
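Written as data, the training set D and one conjunctive hypothesis consistent with it might look like this (a sketch; the "?" wildcard stands for "any value", matching the conjunctive-feature-vector representation used on the later slides):

# The training set D from the table above, plus one conjunctive hypothesis
# written as a feature vector in which "?" means "any value is acceptable".

D = [({"size": "small", "color": "red",  "shape": "circle"},   "positive"),
     ({"size": "large", "color": "red",  "shape": "circle"},   "positive"),
     ({"size": "small", "color": "red",  "shape": "triangle"}, "negative"),
     ({"size": "large", "color": "blue", "shape": "circle"},   "negative")]

h = {"size": "?", "color": "red", "shape": "circle"}   # <?, red, circle>

def matches(h, x):
    """A conjunctive hypothesis matches x iff every non-'?' feature agrees."""
    return all(v == "?" or x[f] == v for f, v in h.items())

def consistent(h, D):
    """Consistent iff h matches exactly the positive training examples."""
    return all(matches(h, x) == (label == "positive") for x, label in D)

print(consistent(h, D))  # True: <?, red, circle> covers both positives and no negatives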

Inductive Learning Hypothesis
Any function that is found to approximate the target concept well on a sufficiently large set of training examples will also approximate the target function well on unobserved examples.
Assumes that the training and test examples are drawn independently from the same underlying distribution.

Evaluation of Classification Learning
- Classification accuracy (% of instances classified correctly), measured on independent test data.
- Training time (efficiency of the training algorithm).
- Testing time (efficiency of subsequent classification).
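A minimal sketch of the accuracy measurement (the hypothesis and the held-out test examples below are invented purely for illustration):

# Classification accuracy = fraction of independent test instances whose
# predicted category equals the true category.

def accuracy(h, test_set):
    correct = sum(1 for x, c_x in test_set if h(x) == c_x)
    return correct / len(test_set)

def h(x):  # toy hypothesis: "red circles are positive"
    return "positive" if (x["color"] == "red" and x["shape"] == "circle") else "negative"

# Hypothetical held-out test data (not from the slides), disjoint from training.
test = [({"color": "red",   "shape": "circle"}, "positive"),
        ({"color": "green", "shape": "circle"}, "negative"),
        ({"color": "red",   "shape": "square"}, "positive")]

print(accuracy(h, test))  # 2 of 3 correct, about 0.67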

Category Learning as Search
Category learning can be viewed as searching the hypothesis space for one (or more) hypotheses that are consistent with the training data.
Consider an instance space consisting of n binary features: there are 2^n instances.
The target binary categorization function in principle could be any of the 2^(2^n) possible functions on n input bits.
For conjunctive hypotheses, there are 4 choices for each feature (Ø, T, F, ?), so there are 4^n syntactically distinct hypotheses.
However, all hypotheses with one or more Øs are equivalent, so there are 3^n + 1 semantically distinct hypotheses.
Therefore, conjunctive hypotheses are a small subset of the space of possible functions, but both spaces are intractably large.
All reasonable hypothesis spaces are intractably large or even infinite.
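These counts are easy to verify for small n; a quick sketch:

# Hypothesis-space sizes for n binary features, as stated above:
#   instances              : 2**n
#   all binary functions   : 2**(2**n)
#   syntactic conjunctions : 4**n       (each feature: Ø, T, F, ?)
#   semantic conjunctions  : 3**n + 1   (all Ø-containing hypotheses collapse to one)

for n in (2, 3, 10):
    print(n, 2**n, 2**(2**n), 4**n, 3**n + 1)
# For n = 10: 1024 instances, ~1.8e308 functions, but only 1,048,576 syntactic
# and 59,050 semantic conjunctive hypotheses.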

Using the Generality Structure
By exploiting the structure imposed by the generality of hypotheses, the hypothesis space can be searched for consistent hypotheses without enumerating or explicitly exploring all hypotheses.
An instance, x ∈ X, is said to satisfy a hypothesis, h, iff h(x) = 1 (positive).
Given two hypotheses h1 and h2, h1 is more general than or equal to h2 (h1 ≥ h2) iff every instance that satisfies h2 also satisfies h1.
Given two hypotheses h1 and h2, h1 is (strictly) more general than h2 (h1 > h2) iff h1 ≥ h2 and it is not the case that h2 ≥ h1.
Generality defines a partial order on hypotheses.
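For the conjunctive-feature-vector representation, the generality relation can be tested feature by feature rather than by enumerating instances; a sketch (the hypotheses a, b, c below are illustrative, and the per-feature test coincides with the semantic definition above as long as the never-matching value Ø is excluded):

# h1 >= h2 iff every instance that satisfies h2 also satisfies h1.  For
# conjunctive feature vectors without Ø this reduces to: wherever h1 commits
# to a value, h2 must commit to the same value.

def more_general_or_equal(h1, h2):
    return all(v1 == "?" or v1 == h2[f] for f, v1 in h1.items())

def strictly_more_general(h1, h2):
    return more_general_or_equal(h1, h2) and not more_general_or_equal(h2, h1)

a = {"color": "red", "shape": "?"}
b = {"color": "red", "shape": "circle"}
c = {"color": "?",   "shape": "circle"}

print(strictly_more_general(a, b))      # True:  <red, ?> > <red, circle>
print(more_general_or_equal(a, c),      # False: neither is more general than
      more_general_or_equal(c, a))      # False  the other (a partial order)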

Examples of Generality
Conjunctive feature vectors: a vector that leaves a feature unconstrained (?) is more general than an otherwise identical vector that fixes that feature to a specific value; two vectors that each constrain a feature the other leaves open are incomparable (neither is more general than the other).
Axis-parallel rectangles in 2-d space: rectangle A, which contains rectangle B, is more general than B; neither of A and C is more general than the other.

Sample Generalization Lattice
Size: {sm, big}   Color: {red, blue}   Shape: {circ, squr}
Number of semantically distinct conjunctive hypotheses = 3^3 + 1 = 28

Version Space
Given a hypothesis space, H, and training data, D, the version space is the complete subset of H that is consistent with D.
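For a hypothesis space this small, the version space can simply be computed by brute force; a sketch over the sample problem above (hypotheses containing Ø are omitted, since they match nothing and so cannot be consistent with any positive example):

from itertools import product

# Brute-force version space: enumerate every conjunctive hypothesis over the
# sample problem's features and keep those consistent with the training data D.

values = {"size":  ["small", "medium", "large"],
          "color": ["red", "blue", "green"],
          "shape": ["square", "circle", "triangle"]}

D = [({"size": "small", "color": "red",  "shape": "circle"},   "positive"),
     ({"size": "large", "color": "red",  "shape": "circle"},   "positive"),
     ({"size": "small", "color": "red",  "shape": "triangle"}, "negative"),
     ({"size": "large", "color": "blue", "shape": "circle"},   "negative")]

def matches(h, x):
    # A conjunctive hypothesis matches x iff every non-"?" feature agrees.
    return all(v == "?" or x[f] == v for f, v in h.items())

def consistent(h, D):
    # Consistent iff h matches exactly the positive training examples.
    return all(matches(h, x) == (label == "positive") for x, label in D)

features = list(values)
version_space = []
for combo in product(*[values[f] + ["?"] for f in features]):
    h = dict(zip(features, combo))
    if consistent(h, D):
        version_space.append(h)

for h in version_space:
    print(h)   # here only {'size': '?', 'color': 'red', 'shape': 'circle'} survives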

Inductive Bias
A hypothesis space that does not include all possible classification functions on the instance space incorporates a bias in the type of classifiers it can learn.
Any means that a learning system uses to choose between two functions that are both consistent with the training data is called inductive bias.
Inductive bias can take two forms:
- Language bias: The language for representing concepts defines a hypothesis space that does not include all possible functions (e.g. conjunctive descriptions).
- Search bias: The language is expressive enough to represent all possible functions (e.g. disjunctive normal form), but the search algorithm embodies a preference for certain consistent functions over others (e.g. syntactic simplicity).

Unbiased Learning
For instances described by n features each with m values, there are m^n instances. If these are to be classified into c categories, then there are c^(m^n) possible classification functions.
For n = 10 and m = c = 2, there are 2^1024 ≈ 1.8x10^308 possible functions, of which only 59,049 (= 3^10) can be represented as conjunctions (an incredibly small percentage!).
However, unbiased learning is futile: if we consider all possible functions, then simply memorizing the data without any real generalization is as good an option as any.

Futility of Bias-Free Learning
A learner that makes no a priori assumptions about the target concept has no rational basis for classifying any unseen instances.
Inductive bias can also be defined as the assumptions that, when combined with the observed training data, logically entail the subsequent classification of unseen instances:
    Training data + inductive bias ⊢ novel classifications

No Panacea
No Free Lunch (NFL) Theorem (Wolpert, 1995); Law of Conservation of Generalization Performance (Schaffer, 1994).
One can prove that improving generalization performance on unseen data for some tasks will always decrease performance on other tasks (which require different labels on the unseen instances).
Averaged across all possible target functions, no learner generalizes to unseen data any better than any other learner.
There does not exist a learning method that is uniformly better than another for all problems. Given any two learning methods A and B and a training set, D, there always exists a target function for which A generalizes better than (or at least as well as) B:
- Train both methods on D to produce hypotheses hA and hB.
- Construct a target function that labels all unseen instances according to the predictions of hA.
- Test hA and hB on any unseen test data for this target function and conclude that hA is better.

Logical View of Induction
Deduction is inferring sound specific conclusions from general rules (axioms) and specific facts.
Induction is inferring general rules and theories from specific empirical data.
Induction can be viewed as inverse deduction: find a hypothesis h from data D such that h ∧ B ⊢ D, where B is (optional) background knowledge.
Abduction is similar to induction, except it involves finding a specific hypothesis, h, that best explains a set of evidence, D, or inferring cause from effect. Typically, in this case B is quite large compared to induction and h is smaller and more specific to a particular event.

Induction and the Philosophy of Science
Bacon (1561-1626), Newton (1643-1727) and the sound deductive derivation of knowledge from data.
Hume (1711-1776) and the problem of induction: inductive inferences can never be proven and are always subject to disconfirmation.
Popper (1902-1994) and falsifiability: inductive hypotheses can only be falsified, not proven, so pick hypotheses that are most subject to being falsified.
Kuhn (1922-1996) and paradigm shifts: falsification is insufficient; an alternative paradigm that is clearly more elegant and more explanatory must be available.
- Ptolemaic epicycles and the Copernican revolution
- Orbit of Mercury and general relativity
- Solar neutrino problem and neutrinos with mass
Postmodernism: objective truth does not exist; relativism; science is a social system of beliefs that is no more valid than others (e.g. religion).

Ockham (Occam)'s Razor
William of Ockham (c. 1287-1347) was a Franciscan friar who applied this criterion to theology:
"Entities should not be multiplied beyond necessity" (the classical formulation, though not an actual quote).
"The supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." (Einstein)
Requires a precise definition of simplicity.
Acts as a bias which assumes that nature itself is simple.
The role of Occam's razor in machine learning remains controversial.