Statistics between Inductive Logic and Empirical Science. Jan Sprenger, University of Bonn / Tilburg Center for Logic and Philosophy of Science. 3rd PROGIC Workshop, Canterbury.



I. The Logical Image of Statistics

Inductive Logic
Deductive logic discerns valid, truth-preserving inferences:
P; P → Q ∴ Q
Inductive logic generalizes this idea to non-truth-preserving inferences:
P; P supports Q ∴ (more) probably Q

Inductive Logic
Inductive logic: truth of premises indicates truth of conclusions
Inductive inference: objective and independent of external factors
Main concepts: confirmation, evidential support

The Logical Image of Statistics
Statistics infers from particular data to general models
A formal theory of inductive inference, governed by general, universally applicable principles
Separation of statistics and decision theory (statistics summarizes data in a way that makes a decision-theoretic analysis possible)

The Logical Image of Statistics
Statistics contains theoretical elements (mathematics, logic) as well as empirical elements (problem-based engineering of useful methods, interaction with "real science")
Where should statistics be located on that scale?

The Logical Image of Statistics
Pro: mathematical, "logical" character of theoretical statistics
Pro: mechanical character of a lot of statistical practice (SPSS & Co.)
Pro: connection between Bayesian statistics and probabilistic logic
Cons: presented in this work...

II. Parameter Estimation

A Simple Experiment
Five random numbers are drawn from {1, 2, ..., N} (N unknown):
21, 4, 26, 18, 12
What is the optimal estimate of N on the basis of the data?
That depends on the loss function!
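The experiment on this slide is the setup of the classic "German tank problem", and it shows concretely why the answer depends on how closeness to the truth is scored: standard estimators of N already disagree on the same data. A minimal sketch, for illustration only:

```python
# The slide's data: five draws from {1, ..., N}, N unknown.
data = [21, 4, 26, 18, 12]
m, k = max(data), len(data)

# Maximum likelihood estimate: the sample maximum (tends to underestimate N).
mle = m  # 26

# Classical minimum-variance unbiased estimate (for sampling without
# replacement): m + m/k - 1.
umvu = m + m / k - 1  # 30.2
```

That the two estimates differ by more than four on such a small dataset is exactly the kind of gap a loss function has to adjudicate.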

Estimation and Loss Functions
Aim: estimated parameter value close to true value
Loss function measures distance between estimated and true value
Choice of loss function sensitive to external constraints
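The sensitivity to the loss function can be made concrete: under squared-error loss the posterior mean is the optimal point estimate, under absolute-error loss the posterior median, and the two can differ substantially. A minimal sketch with an assumed toy posterior sample:

```python
from statistics import mean, median

# Toy posterior sample for a parameter (assumed, skewed on purpose).
posterior_samples = [1.0, 2.0, 2.0, 3.0, 10.0]

# Squared-error loss E[(theta - est)^2] is minimized by the mean ...
est_squared = mean(posterior_samples)    # 3.6

# ... absolute-error loss E[|theta - est|] by the median.
est_absolute = median(posterior_samples)  # 2.0
```

The single outlier at 10 pulls the squared-error estimate far from the absolute-error one, so external constraints on which errors are costly really do change the "optimal" answer.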

A Bayesian Approach
Elicit a prior distribution for the parameter N
Use incoming data for updating via conditionalization
Summarize the data in a posterior distribution (credal set, etc.)
Perform a decision-theoretic analysis
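A minimal sketch of this workflow on the slide's estimation experiment, under modelling assumptions the slide does not fix (draws with replacement, a uniform prior truncated at N = 1000):

```python
data = [21, 4, 26, 18, 12]
m, k = max(data), len(data)
N_MAX = 1000  # assumed prior cutoff

# Prior: uniform on {1, ..., N_MAX}.
# Likelihood of the data given N: (1/N)^k if N >= max(data), else 0.
unnorm = {N: N ** -k if N >= m else 0.0 for N in range(1, N_MAX + 1)}

# Conditionalization: normalize prior x likelihood into a posterior.
Z = sum(unnorm.values())
posterior = {N: w / Z for N, w in unnorm.items()}

# Decision-theoretic step, e.g. posterior mean under squared-error loss.
post_mean = sum(N * p for N, p in posterior.items())
```

Only the last line involves a loss function; up to the posterior, the analysis is decision-free, which is the separation the next sections put under pressure.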

III. Model Selection

Model Selection
True model usually "out of reach"
Main idea: minimizing the discrepancy between the approximating and the true model
Discrepancy can be measured in various ways (cf. choice of a loss function): Kullback-Leibler divergence, Gauß distance, etc.
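The first discrepancy measure named above, Kullback-Leibler divergence, can be written down in a few lines; note that unlike a metric it is not symmetric, which is part of why the choice of measure is substantive. A sketch for discrete distributions:

```python
from math import log

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete p, q."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # stand-in for the "true" model
q = [0.9, 0.1]  # candidate approximating model
```

Here kl_divergence(p, q) and kl_divergence(q, p) give different numbers, so "distance to the truth" already depends on which direction of the divergence one takes.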

Model Selection
Many model selection procedures focus on estimating the discrepancy between the candidate model and the true model
Choose the model with the lowest estimated discrepancy to the true model
That is easier said than done...

Problem-specific Premises
Asymptotic behavior
Small or large candidate model set?
Nested vs. non-nested models
Linear vs. non-linear models
Random error structure
Scientific understanding required to fix the premises!

Bayesian Model Selection
Idea: search for the most probable model (or the model that has the highest Bayes factor)
Variety of Bayesian methods (BIC, intrinsic and fractional Bayes factors, ...)
Does Bayes show a way out of the problems?
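BIC, the first of the methods listed, is easy to state: BIC = k·ln(n) − 2·ln(L̂), with smaller values preferred. A sketch with hypothetical fitted log-likelihoods (the numbers are assumptions for illustration, not from the talk):

```python
from math import log

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L_hat)."""
    return n_params * log(n_obs) - 2 * log_likelihood

# Two hypothetical models fitted to n = 100 observations:
bic_simple = bic(log_likelihood=-120.0, n_params=2, n_obs=100)
bic_complex = bic(log_likelihood=-118.5, n_params=5, n_obs=100)
# The complex model's small likelihood gain does not offset its
# complexity penalty, so BIC prefers the simpler model here.
```

Whether such a score tracks closeness to the true model is exactly the question raised on the next slides.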

Bayesian Model Selection
If the true model is not contained in the set of candidate models: must Bayesian methods be justified by their distance-minimizing properties?
It is not trivial that a particular distance function (e.g. K-L divergence) is indeed minimized by the model with the highest posterior!
Bayesian probabilities = probabilities of being close to the true model?

Model Selection and Parameter Estimation
In the elementary parameter estimation case, posterior distributions were independent of decision-theoretic elements (utilities/loss functions)
The reasonableness of a posterior distribution in Bayesian model selection is itself relative to the choice of a distance/loss function

IV. Conclusions

Conclusions (I)
Quality of a model selection method subject to a plethora of problem-specific premises
Model selection methods must be adapted to a specific problem ("engineering")
Bayesian methods in model selection should have an instrumental interpretation
Difficult to separate proper statistics from decision theory

Conclusions (II)
Optimality of an estimator is a highly ambiguous notion
Is statistics more akin to scientific modelling than to a branch of mathematics?
More empirical science than inductive logic?

Thanks a lot for your attention!!! © by Jan Sprenger, Tilburg, September 2007