1 Information Geometry of Self-organizing Maximum Likelihood. Shinto Eguchi, ISM, GUAS. This talk is based on joint research with Dr Yutaka Kano, Osaka University. Bernoulli 2000 Conference at RIKEN, 27 October 2000.

2 Consider a statistical model. Maximum likelihood estimation (MLE) (Fisher, 1922): consistency, efficiency, sufficiency, unbiasedness, invariance, information. Ψ-MLE: take an increasing function Ψ.

3 [Figure] Normal density: MLE vs. Ψ-MLE for the given data.

4 [Figure] Normal density with an outlier: MLE vs. Ψ-MLE.
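
To make the point of slides 3-4 concrete: the sketch below (illustrative data invented here, not the talk's example) shows how a single gross outlier drags the ordinary Gaussian MLE of the mean and standard deviation, which is the situation the Ψ-MLE is designed to handle.

    import numpy as np

    # Minimal sketch: effect of one outlier on the Gaussian MLE (mean, sd).
    rng = np.random.default_rng(0)
    clean = rng.normal(loc=0.0, scale=1.0, size=50)   # bulk of the data
    contaminated = np.append(clean, 10.0)             # one gross outlier

    for name, x in [("clean", clean), ("with outlier", contaminated)]:
        mu_hat = x.mean()        # MLE of the mean
        sigma_hat = x.std()      # MLE of the sd (1/n version)
        print(f"{name:13s}  mu_hat = {mu_hat:6.3f}  sigma_hat = {sigma_hat:6.3f}")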

5 [Formulas/figures only]

6 Examples (1), (2), (3) [formulas only]; Ψ-divergence, KL-divergence.

7 [Formulas/figures only]

8 Pythagorean theorem. [Diagram with densities f, g, h and points (0,0), (1,0), (0,1), (1,1), (t, s).]

9 (Proof) [Formulas only]

10 Differential geometry of the Ψ-divergence: Riemannian metric, affine connection, conjugate affine connection. Csiszár's divergence.

11 Ψ-divergence and Amari's α-divergence.

12 Ψ-likelihood function. M-estimation (Huber, 1964, 1983). Kullback-Leibler divergence and maximum likelihood.

13 Another definition of Ψ-likelihood: take a positive function of (x, θ) and define the Ψ-likelihood equation as a weighted score satisfying an integrability condition.
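
As a rough illustration of the "weighted score" idea (a sketch only; the weight function below is an arbitrary Gaussian-kernel choice, not the talk's Ψ): a location estimate defined by sum_i w(x_i, theta) (x_i - theta) = 0 can be computed by fixed-point iteration and is barely moved by an outlier.

    import numpy as np

    # Weighted-score estimating equation for a location parameter,
    #   sum_i w(x_i, theta) * (x_i - theta) = 0,
    # solved by fixed-point iteration.  The weight w is an ad-hoc choice
    # (Gaussian kernel, bandwidth c) used only to illustrate down-weighting.
    def weighted_location(x, c=3.0, n_iter=100):
        theta = np.median(x)                              # robust start
        for _ in range(n_iter):
            w = np.exp(-0.5 * ((x - theta) / c) ** 2)     # down-weight far points
            theta = np.sum(w * x) / np.sum(w)             # weighted-mean update
        return theta

    rng = np.random.default_rng(1)
    x = np.append(rng.normal(size=50), 10.0)              # N(0,1) sample plus one outlier
    print("plain MLE (mean):       ", x.mean())
    print("weighted-score estimate:", weighted_location(x))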

14 Consistency of Ψ-MLE.

15 Influence function; Fisher consistency; ε-contamination model; asymptotic efficiency. Robustness or efficiency?
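
An empirical sensitivity curve makes the robustness/efficiency trade-off visible: add one contaminating point z, move it around, and track each estimate. A small sketch (toy setting, same ad-hoc weight as above, redefined here so the snippet is self-contained):

    import numpy as np

    # Empirical sensitivity: how the estimates react as one added point z moves.
    def weighted_location(x, c=3.0, n_iter=100):
        theta = np.median(x)
        for _ in range(n_iter):
            w = np.exp(-0.5 * ((x - theta) / c) ** 2)
            theta = np.sum(w * x) / np.sum(w)
        return theta

    rng = np.random.default_rng(2)
    x = rng.normal(size=50)
    for z in (-10, -5, 0, 5, 10):
        xz = np.append(x, z)
        print(f"z = {z:4d}   mean = {xz.mean():6.3f}   weighted = {weighted_location(xz):6.3f}")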

16 Generalized linear model: regression model and estimating equation.

17 Bernoulli regression: logistic regression.
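
For the Bernoulli GLM with the logit link, the ordinary maximum-likelihood estimating equation is X^T (y - p(beta)) = 0. A minimal Newton-Raphson (IRLS) sketch on simulated data (plain MLE only, no Ψ-weighting):

    import numpy as np

    # Logistic regression by Newton-Raphson / IRLS, solving the score equation
    #   X^T (y - p) = 0   with   p_i = 1 / (1 + exp(-x_i^T beta)).
    rng = np.random.default_rng(3)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])    # intercept + 1 covariate
    beta_true = np.array([-0.5, 2.0])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

    beta = np.zeros(2)
    for _ in range(25):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                                 # score
        hess = X.T @ (X * (p * (1.0 - p))[:, None])          # Fisher information
        beta = beta + np.linalg.solve(hess, grad)            # Newton step

    print("true beta:", beta_true, " estimated beta:", np.round(beta, 3))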

18 Misclassification model; MLE.

19 Logistic discrimination with mislabelling. [Scatter plot: Group I (5 data) and Group II (35 data); some points from Group I appear mislabelled as Group II.]

20 Misclassification: Group I, 5 data; Group II, 35 data. [Figure]
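
Slides 18-20 concern logistic discrimination when a few labels are wrong. The sketch below is invented for illustration (group sizes 5 and 35 echo the slide; the weighting rule is an ad-hoc stand-in, not the talk's Ψ-estimator): it mislabels a few points and compares an ordinary logistic fit with a refit that down-weights points whose observed label the first fit finds very unlikely.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy mislabelling experiment: Group I (5 data) vs Group II (35 data),
    # of which 3 points labelled "Group II" were actually drawn from Group I.
    rng = np.random.default_rng(4)
    n1, n2, n_mis = 5, 35, 3
    XI   = rng.normal(loc=[+1.5, 0.0], size=(n1, 2))          # Group I, labelled I
    XII  = rng.normal(loc=[-1.5, 0.0], size=(n2 - n_mis, 2))  # Group II, labelled II
    Xmis = rng.normal(loc=[+1.5, 0.0], size=(n_mis, 2))       # from Group I, mislabelled II
    X = np.vstack([XI, XII, Xmis])
    y_obs = np.r_[np.ones(n1), np.zeros(n2)]                  # observed (partly wrong) labels

    plain = LogisticRegression().fit(X, y_obs)

    # Down-weight points whose observed label the plain fit finds very unlikely.
    p1 = plain.predict_proba(X)[:, 1]
    p_obs = np.where(y_obs == 1, p1, 1.0 - p1)
    w = p_obs ** 0.5                                          # ad-hoc weights, not the talk's Psi
    reweighted = LogisticRegression().fit(X, y_obs, sample_weight=w)

    print("plain coef:     ", np.round(plain.coef_[0], 3))
    print("reweighted coef:", np.round(reweighted.coef_[0], 3))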

21 Poisson regression: Ψ-likelihood function, canonical link, ε-contamination model.
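
For the Poisson GLM with the canonical log link, the ordinary ML estimating equation is X^T (y - exp(X beta)) = 0; the Ψ-version would weight this score. A minimal Newton sketch of the plain fit on simulated, uncontaminated data:

    import numpy as np

    # Poisson regression with canonical log link by Newton-Raphson:
    #   score X^T (y - mu),  information X^T diag(mu) X,  mu = exp(X beta).
    rng = np.random.default_rng(5)
    n = 300
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])
    beta_true = np.array([0.5, 1.2])
    y = rng.poisson(np.exp(X @ beta_true))

    beta = np.zeros(2)
    for _ in range(25):
        mu = np.exp(X @ beta)
        beta = beta + np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))

    print("true beta:", beta_true, " estimated beta:", np.round(beta, 3))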

22 Neural network

23 [Network diagram: input, output]

24 Ψ-maximum likelihood vs. maximum likelihood.

25 Classical procedure for PCA vs. self-organizing procedure, given off-line data.

26 [Formulas/figures only]

27 Classical procedure vs. self-organizing procedure. [Formulas]
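
Slides 25-27 contrast classical PCA (a one-shot eigendecomposition) with a self-organizing, iteratively reweighted procedure. The sketch below is only a toy version of that contrast (the Gaussian weight on the reconstruction error is an arbitrary choice, not the talk's Ψ): the self-organizing loop re-estimates the leading direction while down-weighting points that the current fit explains poorly.

    import numpy as np

    # Classical PCA vs. a toy "self-organizing" (iteratively reweighted) PCA
    # for the leading principal direction.
    rng = np.random.default_rng(6)
    Z = rng.normal(size=(200, 2)) @ np.diag([3.0, 0.5])           # elongated cloud
    X = np.vstack([Z, rng.normal(loc=[8.0, 8.0], size=(10, 2))])  # plus a clump of outliers

    def leading_direction(X, w=None):
        w = np.ones(len(X)) if w is None else w
        mu = np.average(X, axis=0, weights=w)
        C = (X - mu).T @ ((X - mu) * w[:, None]) / w.sum()        # weighted covariance
        vals, vecs = np.linalg.eigh(C)
        return mu, vecs[:, -1]                                    # leading eigenvector

    mu, v = leading_direction(X)                                  # classical PCA (uniform weights)
    v_classical = v
    for _ in range(20):                                           # self-organizing reweighting
        resid = (X - mu) - np.outer((X - mu) @ v, v)              # residual off the fitted line
        w = np.exp(-0.5 * np.linalg.norm(resid, axis=1) ** 2)     # ad-hoc Gaussian weights
        mu, v = leading_direction(X, w)

    print("classical direction      :", np.round(v_classical, 3))
    print("self-organizing direction:", np.round(v, 3))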

28 Independent Component Analysis (Minami & Eguchi, 2000). [Formulas only]

29 Theorem (semiparametric consistency). (Proof) [Formulas only]

30 The Ψ-likelihood satisfies the semiparametric consistency.
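
For orientation only (this is not the Ψ-likelihood / Minami-Eguchi procedure of slides 28-30): a minimal two-source ICA toy that whitens the mixtures and grid-searches the rotation angle maximizing a crude non-Gaussianity contrast, the sum of squared excess kurtoses.

    import numpy as np

    # Minimal two-source ICA sketch: prewhiten, then rotate to maximize
    # a simple contrast (sum of squared excess kurtoses).  Illustration only.
    rng = np.random.default_rng(7)
    n = 5000
    s = np.vstack([np.sign(rng.normal(size=n)) * rng.exponential(size=n),  # super-Gaussian source
                   rng.uniform(-1, 1, size=n)])                            # sub-Gaussian source
    A = np.array([[1.0, 0.6], [0.4, 1.0]])                                 # mixing matrix
    x = A @ s

    x = x - x.mean(axis=1, keepdims=True)                                  # centre
    d, E = np.linalg.eigh(np.cov(x))
    z = np.diag(d ** -0.5) @ E.T @ x                                       # whitened mixtures

    def rotation(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    def contrast(theta):
        y = rotation(theta) @ z
        return np.sum(((y ** 4).mean(axis=1) - 3.0) ** 2)                  # squared excess kurtoses

    thetas = np.linspace(0.0, np.pi / 2, 200)
    y = rotation(thetas[np.argmax([contrast(t) for t in thetas])]) @ z
    # Cross-correlation of recovered vs. true sources (up to order/sign/scale).
    print(np.round(np.corrcoef(np.vstack([y, s]))[:2, 2:], 2))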

31 [Formulas/figures only]

32 Usual method vs. self-organizing method. [Figure: blue dots; blue & red dots]

33 The exponential power. [Figure]

34 Concluding remarks. Bias potential function?! Ψ-regression analysis, Ψ-discriminant analysis, Ψ-PCA, Ψ-ICA, Ψ-sufficiency, Ψ-factorizable, Ψ-exponential family, Ψ-EM algorithm.