Consistency of learning processes


Introduction

The theory of consistency of learning processes aims to explain when a learning machine that minimizes empirical risk can achieve a small value of actual risk (that is, can generalize) and when it cannot. Equivalently, it aims to describe necessary and sufficient conditions for the consistency of learning processes that minimize the empirical risk.

It is extremely important to use concepts that describe necessary and sufficient conditions for consistency. This guarantees that the constructed theory is general and cannot be improved from the conceptual point of view.

The classical definition of consistency

Let $Q(z, \alpha_l)$ be a function that minimizes the empirical risk functional

$$R_{emp}(\alpha) = \frac{1}{l} \sum_{i=1}^{l} Q(z_i, \alpha)$$

for a given set of i.i.d. observations $z_1, \ldots, z_l$, and let $R(\alpha) = \int Q(z, \alpha)\, dF(z)$ denote the actual risk.

The ERM principle is consistent for the set of functions $Q(z, \alpha)$, $\alpha \in \Lambda$, and for the probability distribution function $F(z)$ if the following two sequences converge in probability to the same limit:

$$R(\alpha_l) \xrightarrow[l \to \infty]{P} \inf_{\alpha \in \Lambda} R(\alpha), \quad (1)$$

$$R_{emp}(\alpha_l) \xrightarrow[l \to \infty]{P} \inf_{\alpha \in \Lambda} R(\alpha). \quad (2)$$

Equation (1) asserts that the values of achieved risks converge to the best possible value. Equation (2) asserts that the minimal possible value of the risk can be estimated from the values of empirical risk.
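The classical definition can be illustrated numerically. The following is a minimal sketch that is not part of the original slides: it assumes a toy problem (a noisy threshold rule on [0, 1]), a finite grid of candidate thresholds playing the role of the parameter set, and the 0-1 loss as Q. The names `NOISE`, `TRUE_T`, `GRID`, and `run` are hypothetical. Under these assumptions the actual risk has a closed form, so the achieved risk and the empirical risk of the ERM solution can be compared with the infimum of the risk.

```python
import random

NOISE = 0.1    # label-flip probability (assumed for this toy problem)
TRUE_T = 0.3   # threshold that generates the clean labels (assumed)
GRID = [i / 10 for i in range(11)]  # finite set of candidate parameters alpha


def sample(l, rng):
    """Draw l i.i.d. observations z = (x, y) with x ~ U[0, 1] and noisy labels."""
    data = []
    for _ in range(l):
        x = rng.random()
        y = int(x > TRUE_T)
        if rng.random() < NOISE:
            y = 1 - y
        data.append((x, y))
    return data


def loss(z, t):
    """0-1 loss Q(z, alpha) of the threshold classifier 1[x > t]."""
    x, y = z
    return int(int(x > t) != y)


def empirical_risk(data, t):
    return sum(loss(z, t) for z in data) / len(data)


def actual_risk(t):
    """Closed-form risk: the classifier disagrees with the clean rule on |t - TRUE_T|."""
    return NOISE + (1 - 2 * NOISE) * abs(t - TRUE_T)


def erm(data):
    """Return the grid threshold that minimizes the empirical risk."""
    return min(GRID, key=lambda t: empirical_risk(data, t))


def run(sizes=(10, 100, 1000, 10000), seed=0):
    """For each sample size return (l, R(alpha_l), R_emp(alpha_l))."""
    rng = random.Random(seed)
    rows = []
    for l in sizes:
        data = sample(l, rng)
        t_l = erm(data)
        rows.append((l, actual_risk(t_l), empirical_risk(data, t_l)))
    return rows


if __name__ == "__main__":
    best = min(actual_risk(t) for t in GRID)
    for l, r, r_emp in run():
        print(f"l={l:6d}  R(alpha_l)={r:.3f}  R_emp(alpha_l)={r_emp:.3f}  inf R={best:.3f}")
```

In this sketch both printed sequences should approach the same value, the infimum of the risk over the grid, as the sample size grows; for small samples the empirical risk of the ERM solution tends to be optimistic.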

Goal

To obtain conditions of consistency for the ERM method in terms of general characteristics of the set of functions and the probability measure. Under the classical definition this is an impossible task, because that definition includes cases of trivial consistency.

Trivial consistency
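The equations for this slide were not captured in the transcript. As a hedged reconstruction of the standard argument from Vapnik's book (the notation $\phi$, $R$, and $R_{emp}$ is assumed here), suppose one function $\phi(z)$ that lies below every function of the set is added to it. ERM then always selects $\phi$, so the classical definition is satisfied regardless of the properties of the original set:

```latex
% Assumed notation: R(\alpha) = \int Q(z,\alpha)\,dF(z),
% R_{emp}(\alpha) = \frac{1}{l}\sum_{i=1}^{l} Q(z_i,\alpha).
% Add to the set Q(z,\alpha), \alpha \in \Lambda, a minorizing function \phi:
\[
  \phi(z) < \inf_{\alpha \in \Lambda} Q(z, \alpha) \quad \text{for all } z .
\]
% Then, for every sample z_1, \ldots, z_l,
\[
  \frac{1}{l} \sum_{i=1}^{l} \phi(z_i) < \inf_{\alpha \in \Lambda} R_{emp}(\alpha),
  \qquad
  \int \phi(z)\, dF(z) \le \inf_{\alpha \in \Lambda} R(\alpha),
\]
% so the empirical minimum and the risk minimum are both attained on \phi, and
\[
  \frac{1}{l} \sum_{i=1}^{l} \phi(z_i) \xrightarrow[l \to \infty]{P} \int \phi(z)\, dF(z)
\]
% by the law of large numbers applied to the single function \phi
% (assuming its expectation is finite).
```

Consistency of this kind says nothing about the capacity of the original set of functions, which is why it is called trivial.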

ERM consistency

To create a theory of consistency of the ERM method that depends only on the general properties (capacity) of the set of functions, a definition of consistency that excludes the trivial cases is needed. The definition of non-trivial (strict) consistency serves this purpose.

Non-trivial consistency
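The definition itself was an image on the slide and is missing from the transcript. A hedged reconstruction, following the definition in Vapnik's book, is sketched below; the exact form of the inequality that defines $\Lambda(c)$ is an assumption and should be checked against the book.

```latex
% Assumed notation: R(\alpha) = \int Q(z,\alpha)\,dF(z),
% R_{emp}(\alpha) = \frac{1}{l}\sum_{i=1}^{l} Q(z_i,\alpha).
% For c \in (-\infty, \infty) define the subset of parameters
\[
  \Lambda(c) = \left\{ \alpha \in \Lambda : \int Q(z, \alpha)\, dF(z) \ge c \right\}.
\]
% ERM is non-trivially (strictly) consistent if, for every non-empty \Lambda(c),
\[
  \inf_{\alpha \in \Lambda(c)} R_{emp}(\alpha)
  \xrightarrow[l \to \infty]{P}
  \inf_{\alpha \in \Lambda(c)} R(\alpha).
\]
```

Requiring convergence on every subset $\Lambda(c)$ removes the trivial cases: once the functions with the smallest risk are excluded, the remaining functions must still satisfy the convergence.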

Key theorem of learning theory
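The statement of the theorem is missing from the transcript. The following is a hedged restatement of the key theorem as given in Vapnik's book, with the bounds $A$ and $B$ and the symbols $R$ and $R_{emp}$ as assumed notation:

```latex
% Assumed notation: R(\alpha) = \int Q(z,\alpha)\,dF(z),
% R_{emp}(\alpha) = \frac{1}{l}\sum_{i=1}^{l} Q(z_i,\alpha).
% Let the set of functions satisfy, for some constants A and B,
\[
  A \le \int Q(z, \alpha)\, dF(z) \le B, \qquad \alpha \in \Lambda .
\]
% Then the ERM principle is strictly consistent if and only if the empirical
% risk converges to the actual risk uniformly in the one-sided sense:
\[
  \lim_{l \to \infty}
  P\left\{ \sup_{\alpha \in \Lambda} \bigl( R(\alpha) - R_{emp}(\alpha) \bigr) > \varepsilon \right\} = 0,
  \qquad \forall \varepsilon > 0 .
\]
```

The theorem turns the question of consistency into a question about the worst function in the set, so any analysis of ERM has to be a worst-case analysis.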

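Consistency of the ERM principle

Consistency of the ERM principle requires only uniform one-sided convergence, in contrast to the so-called uniform two-sided convergence defined by the equation

$$\lim_{l \to \infty} P\left\{ \sup_{\alpha \in \Lambda} \left| R(\alpha) - R_{emp}(\alpha) \right| > \varepsilon \right\} = 0, \quad \forall \varepsilon > 0,$$

where $R(\alpha)$ is the actual risk and $R_{emp}(\alpha)$ is the empirical risk.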

Consistency of the ML method

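Empirical process

Consistency of an empirical process

The necessary and sufficient conditions for an empirical process to converge in probability to zero imply that the equality

$$\lim_{l \to \infty} P\left\{ \sup_{\alpha \in \Lambda} \left| \int Q(z, \alpha)\, dF(z) - \frac{1}{l} \sum_{i=1}^{l} Q(z_i, \alpha) \right| > \varepsilon \right\} = 0, \quad \forall \varepsilon > 0,$$

holds true.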
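For one simple set of functions the supremum in an empirical process can be computed exactly. The sketch below is an illustration under stated assumptions, not the slides' own example: it takes the indicator functions Q(z, alpha) = 1[z <= alpha] with z uniform on [0, 1], so the actual risk is alpha and the empirical risk is the empirical distribution function. The helper name `sup_deviations` is hypothetical.

```python
import random


def sup_deviations(zs):
    """Return (two_sided, one_sided) suprema over alpha for Q(z, alpha) = 1[z <= alpha].

    Assumes z ~ U[0, 1], so R(alpha) = alpha on [0, 1] and R_emp is the
    empirical distribution function of the sample.
    """
    l = len(zs)
    zs = sorted(zs)
    # R(alpha) - R_emp(alpha) approaches its supremum just below an order statistic.
    plus = max(z - i / l for i, z in enumerate(zs))
    # R_emp(alpha) - R(alpha) is largest exactly at an order statistic.
    minus = max((i + 1) / l - z for i, z in enumerate(zs))
    return max(plus, minus), plus


if __name__ == "__main__":
    rng = random.Random(0)
    for l in (10, 100, 1000, 10000):
        two_sided, one_sided = sup_deviations([rng.random() for _ in range(l)])
        print(f"l={l:6d}  sup|R - R_emp|={two_sided:.4f}  sup(R - R_emp)={one_sided:.4f}")
```

For this set of functions the two-sided supremum is the Kolmogorov-Smirnov statistic, and it should shrink toward zero as the sample grows, which is the convergence in probability described above.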

Law of large numbers and its generalization

Entropy

Entropy of the set of indicator functions (Diversity)

Entropy of the set of indicator functions (Random entropy)
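The formulas for these slides are missing from the transcript. As a hedged sketch of the usual definitions, the diversity of a set of indicator functions on a sample is the number of distinct 0/1 vectors the set can produce on it, and the random entropy is the natural logarithm of that number. The helper names below are hypothetical, and threshold functions are used only as an assumed example.

```python
import math


def diversity(functions, sample):
    """Number of distinct vectors (Q(z_1, a), ..., Q(z_l, a)) over the given functions."""
    return len({tuple(f(z) for z in sample) for f in functions})


def random_entropy(functions, sample):
    """Natural logarithm of the diversity on this particular sample."""
    return math.log(diversity(functions, sample))


def threshold_functions(sample):
    """One indicator 1[z > t] per distinct behaviour on the sample (hypothetical helper)."""
    cuts = [float("-inf")] + sorted(sample)
    return [lambda z, t=t: int(z > t) for t in cuts]


if __name__ == "__main__":
    sample = [0.2, 0.5, 0.9]
    functions = threshold_functions(sample)
    n = diversity(functions, sample)
    h = random_entropy(functions, sample)
    # Thresholds split l distinct points in only l + 1 ways, far fewer than 2**l.
    print(f"diversity={n}  random entropy={h:.4f}  entropy per observation={h / len(sample):.4f}")
```

Averaging the random entropy over samples gives the entropy of the set of functions on samples of size l. For thresholds on distinct points the diversity is always l + 1, so the entropy per observation tends to zero.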

Entropy of the set of real functions (Diversity)

VC entropy of the set of real functions

VC entropy of the set of real functions (Random entropy and VC entropy)

Conditions for uniform two-sided convergence

Conditions for uniform two-sided convergence (Corollary)
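The conditions themselves are missing from the transcript. The block below is a hedged restatement of the theorem and its corollary as they appear in Vapnik's book, followed by a short worked example; the symbols $H^{\Lambda}(\varepsilon, l)$ and $H^{\Lambda}(l)$ for the VC entropy are assumed notation, and the threshold example is an illustration that is not taken from the slides.

```latex
% Theorem (bounded real functions): uniform two-sided convergence holds
% if and only if the VC entropy grows sublinearly in l:
\[
  \lim_{l \to \infty} \frac{H^{\Lambda}(\varepsilon, l)}{l} = 0,
  \qquad \forall \varepsilon > 0 .
\]
% Corollary (indicator functions):
\[
  \lim_{l \to \infty} \frac{H^{\Lambda}(l)}{l} = 0 .
\]
% Worked example (assumed): threshold indicators on a continuous distribution
% split l distinct points in exactly l + 1 ways, so
\[
  H^{\Lambda}(l) = \ln (l + 1),
  \qquad
  \frac{\ln (l + 1)}{l} \xrightarrow[l \to \infty]{} 0 ,
\]
% and the condition of the corollary is satisfied.
```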

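Uniform one-sided convergence

Uniform two-sided convergence can be described as

$$\lim_{l \to \infty} P\left\{ \left[ \sup_{\alpha \in \Lambda} \bigl( R(\alpha) - R_{emp}(\alpha) \bigr) > \varepsilon \right] \text{ or } \left[ \sup_{\alpha \in \Lambda} \bigl( R_{emp}(\alpha) - R(\alpha) \bigr) > \varepsilon \right] \right\} = 0, \quad (9)$$

where $R(\alpha)$ is the actual risk and $R_{emp}(\alpha)$ is the empirical risk. This expression includes uniform one-sided convergence and is therefore a sufficient condition for consistency of the ERM method. For consistency of the ERM method, however, the second condition on the left-hand side of (9) can be violated: consistency is required for minimizing the empirical risk, not for maximizing it.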

Conditions for uniform one-sided convergence

Models of reasoning

Deductive: moving from the general to the particular. The ideal approach is to obtain corollaries (consequences) using a system of axioms and inference rules. This guarantees that true consequences are obtained from true premises.

Inductive: moving from the particular to the general, that is, the formation of general judgments from particular assertions. Judgments obtained from particular assertions are not always true.

Demarcation problem

Proposed by Kant, the demarcation problem is a central question of inductive theory: what is the difference between the cases with a justified inductive step and those for which the inductive step is not justified? Is there a formal way to distinguish between true theories and false theories?

Example

Assume that meteorology is a true theory and astrology is a false one. What is the formal difference between them?

- The complexity of their models?
- The predictive ability of their models?
- Their use of mathematics?
- The level of formality of inference?

None of the above gives a clear advantage to either of these theories.

Criterion for demarcation

In the 1930s Popper suggested that a necessary condition for the justifiability of a theory is the feasibility of its falsification. By falsification, Popper means the existence of a collection of particular assertions that cannot be explained by the given theory although they fall into its domain. If the given theory can be falsified, it satisfies the necessary conditions of a scientific theory.

Nature of the ERM principle

What happens if the condition of one-sided convergence (Theorem 11) is not valid? Why is the ERM method not consistent in this case?

Answer: if uniform two-sided convergence does not take place, then the method of minimizing the empirical risk is non-falsifiable.

Complete (Popper's) non-falsifiability
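The content of these slides was not captured in the transcript. As a hedged illustration of the idea, the sketch below compares two sets of indicator functions on a small finite domain: the set of all 0/1-valued functions, which can reproduce any labeling of a sample and so can never be contradicted by data, and the set of threshold functions, which some labelings do contradict. The domain, the helper names, and the choice of function sets are assumptions made for the example.

```python
from itertools import product


def min_empirical_risk(functions, xs, ys):
    """Smallest fraction of errors that any function in the set makes on (xs, ys)."""
    return min(sum(int(f(x) != y) for x, y in zip(xs, ys)) / len(xs) for f in functions)


def all_indicators(domain_size):
    """Every 0/1-valued function on {0, ..., domain_size - 1}, stored as lookup tables."""
    return [lambda x, table=table: table[x] for table in product((0, 1), repeat=domain_size)]


def thresholds(domain_size):
    """Indicator functions 1[x >= t] for t = 0, ..., domain_size."""
    return [lambda x, t=t: int(x >= t) for t in range(domain_size + 1)]


def falsifying_labelings(functions, xs):
    """Labelings of xs that no function in the set reproduces exactly."""
    return [ys for ys in product((0, 1), repeat=len(xs))
            if min_empirical_risk(functions, xs, ys) > 0]


if __name__ == "__main__":
    xs = [0, 1, 2]
    print("all indicators:", len(falsifying_labelings(all_indicators(4), xs)), "falsifying labelings")
    print("thresholds:    ", len(falsifying_labelings(thresholds(4), xs)), "falsifying labelings")
```

A set that fits every one of the 2^l labelings of a sample has diversity 2^l, so its entropy per observation stays at ln 2 instead of tending to zero, and the minimum of the empirical risk is zero whatever the data are.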

Partial non-falsifiability

Potential non-falsifiability (Definition)

Potential non-falsifiability (Graphical representation)
Figure: A potentially non-falsifiable learning machine

Potential non-falsifiability (Generalization)

References

Vapnik, Vladimir. “The Nature of Statistical Learning Theory”, 2000.
Veganzones, Miguel A. “Consistency and bounds on the rate of convergence for ERM methods”.