Estimation of the parameters of a truncated Pareto distribution when the sample is contaminated by measurement errors
Lwando Kondlo (Supervisor: Prof. Chris Koen), University of the Western Cape
12/3/2008 SKA SA Postgraduate Bursary Conference

Introduction
• The Pareto distribution is a simple model for positive data.
• The truncated version has a wide range of applications in several fields of data analysis [1].
• In astronomy and in many physical and social sciences, the parameters of the truncated Pareto distribution are estimated to draw inferences about the processes underlying the phenomena:
1. to scale up local observations to global patterns;
2. to test theoretical models.

Introduction (Cont’d)
It is therefore essential that these parameters be estimated accurately. Unfortunately, the binning-based methods traditionally used in astronomy and other fields perform quite poorly [2]. In this presentation, we discuss a more sophisticated method for fitting these parameters, based on maximum likelihood estimation (MLE).

Measurement error model
The model for a variable measured with error is
$Y = X + \varepsilon$,
where the measurement error $\varepsilon$ is assumed to be independent of $X$. $X$ is the true value, but it is not observed directly; $Y$ is observed instead.

Measurement error model (Cont’d)
• The PDF or parameters of X are of interest when the objective is to estimate characteristics of the population free of the within-unit (measurement) variability.
• Estimating the PDF or parameters of X in the presence of measurement error is also known as deconvolution.

Objectives
• Develop a numerical methodology for deconvolution when the distribution is of Pareto form.
• Apply the methodology to real data: masses of giant molecular clouds (GMCs) in various galaxies.

Convolution
If X has the PDF g(·) and ε has the PDF h(·), then the sum Y = X + ε has the PDF given by the convolution integral
$f(y) = \int_{-\infty}^{\infty} g(x)\, h(y - x)\, dx$.
In this case, the forms of the densities g(·) and h(·) are assumed known.
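The following is a minimal numerical sketch of the convolution integral $f(y) = \int g(x)\,h(y-x)\,dx$. It assumes the composite trapezoidal rule on a finite interval that covers essentially all of the support of g. The helpers `convolve_at` and `std_normal_pdf` are hypothetical names for illustration, not the authors' code. The check uses two standard normals: their sum is N(0, 2), so the exact answer is known.

```python
import math


def convolve_at(y, g, h, lo, hi, num=2001):
    """Approximate f(y) = integral of g(x) * h(y - x) dx over [lo, hi].

    Composite trapezoidal rule on `num` equally spaced nodes; [lo, hi]
    must cover (essentially) all of the support of g.
    """
    step = (hi - lo) / (num - 1)
    total = 0.0
    for k in range(num):
        x = lo + k * step
        weight = 0.5 if k in (0, num - 1) else 1.0  # trapezoid end weights
        total += weight * g(x) * h(y - x)
    return total * step


def std_normal_pdf(z):
    """N(0, 1) density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Sanity check: X, eps ~ N(0, 1) independent gives Y ~ N(0, 2),
# whose density at y = 0 is 1 / sqrt(4 * pi), roughly 0.28209.
f0 = convolve_at(0.0, std_normal_pdf, std_normal_pdf, -10.0, 10.0)
```

The same routine works for any pair of densities, provided the interval and node count resolve both of them.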

Convolution (Cont’d)
g(·) has a power-law form, i.e. it is a truncated Pareto density:
$g(x) = \dfrac{a L^{a} x^{-(a+1)}}{1 - (L/U)^{a}}, \qquad L \le x \le U,\; a > 0$.
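Below is a sketch of the truncated Pareto density $g(x) = a L^a x^{-(a+1)} / (1 - (L/U)^a)$ on [L, U], together with its closed-form CDF. It assumes that standard parameterisation: lower limit L, upper limit U, exponent a > 0. The function names and parameter values are hypothetical. The final lines check that the numerical derivative of the CDF reproduces the density.

```python
def trunc_pareto_pdf(x, L, U, a):
    """g(x) = a L^a x^-(a+1) / (1 - (L/U)^a) for L <= x <= U, else 0."""
    if x < L or x > U:
        return 0.0
    return a * L**a * x ** (-(a + 1.0)) / (1.0 - (L / U) ** a)


def trunc_pareto_cdf(x, L, U, a):
    """G(x) = (1 - (L/x)^a) / (1 - (L/U)^a) on [L, U]; 0 below L, 1 above U."""
    if x <= L:
        return 0.0
    if x >= U:
        return 1.0
    return (1.0 - (L / x) ** a) / (1.0 - (L / U) ** a)


# Hypothetical parameter values, for illustration only.
L, U, a = 1.0, 100.0, 1.5
# The CDF's central-difference derivative should match the density.
h = 1e-5
slope = (trunc_pareto_cdf(3.0 + h, L, U, a) - trunc_pareto_cdf(3.0 - h, L, U, a)) / (2 * h)
```

The closed-form CDF is also what makes inverse-CDF sampling of this distribution straightforward.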

Convolution (Cont’d)
h(·) is a normal density with zero mean:
$h(\varepsilon) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\dfrac{\varepsilon^{2}}{2\sigma^{2}}\right)$.

Error-contaminated distribution

Non-contaminated distribution

Convolution
Because g(·) vanishes outside [L, U], the density f(·) is given by
$f(y) = \int_{L}^{U} g(x)\, h(y - x)\, dx$.
This is called an error-contaminated truncated Pareto density function. The only unknowns are the parameter values L, U, a and σ.
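The error-contaminated density $f(y) = \int_L^U g(x)\,h(y-x)\,dx$ generally has no simple closed form, so it must be evaluated numerically. This is a minimal sketch under two assumptions:
- h is the N(0, σ²) density;
- the integral is computed with the trapezoidal rule after the substitution x = e^u. We chose this substitution because g is steepest near L.

Names and parameter values are hypothetical, not the authors' code.

```python
import math


def trunc_pareto_pdf(x, L, U, a):
    """Truncated Pareto density g(x) on [L, U] (assumed standard form)."""
    if x < L or x > U:
        return 0.0
    return a * L**a * x ** (-(a + 1.0)) / (1.0 - (L / U) ** a)


def normal_pdf(z, sigma):
    """Zero-mean normal density h(z) with standard deviation sigma."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def contaminated_pdf(y, L, U, a, sigma, num=801):
    """f(y) = int_L^U g(x) h(y - x) dx, via x = exp(u) (dx = x du) and the
    trapezoidal rule in u, which places more nodes near L."""
    u_lo, u_hi = math.log(L), math.log(U)
    du = (u_hi - u_lo) / (num - 1)
    total = 0.0
    for k in range(num):
        # Clamp so round-off cannot push the end nodes outside [L, U].
        x = min(max(math.exp(u_lo + k * du), L), U)
        weight = 0.5 if k in (0, num - 1) else 1.0
        total += weight * trunc_pareto_pdf(x, L, U, a) * normal_pdf(y - x, sigma) * x
    return total * du


# Hypothetical parameter values, for illustration only.
L, U, a, sigma = 1.0, 10.0, 1.5, 0.5
# Sanity check: f should integrate to about 1 over [L - 8 sigma, U + 8 sigma].
lo, hi, m = L - 8 * sigma, U + 8 * sigma, 401
step = (hi - lo) / (m - 1)
mass = step * sum((0.5 if k in (0, m - 1) else 1.0) * contaminated_pdf(lo + k * step, L, U, a, sigma)
                  for k in range(m))
```

The node spacing near U is roughly U·du. If σ is not much larger than that, `num` should be increased.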

Parameter Estimation
MLE is the preferred method for estimating the parameter values. It determines the parameter values that maximise the likelihood of the observed data under the model. Specifically, MLE finds the values of L, U, a and σ that maximise the product of the densities of the observed values $y_1, \dots, y_n$:
$\mathcal{L}(L, U, a, \sigma) = \prod_{i=1}^{n} f(y_i; L, U, a, \sigma)$.
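Working with the product of densities directly is numerically fragile, because it underflows even for moderate n. The sketch below assumes the model density is supplied as a Python callable, which is a hypothetical interface. It computes the likelihood as a sum of logs, and returns −∞ for parameter values that give any observation zero density.

```python
import math


def log_likelihood(data, density):
    """ln of the product of density(y_i), computed as a sum of logs.

    Returns -inf if any observation has zero density under the model, so an
    optimiser can simply reject such parameter values.
    """
    total = 0.0
    for y in data:
        p = density(y)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


# Why the log scale: 400 densities of about 1e-3 multiply to 1e-1200, which
# underflows to 0.0 in double precision, while the sum of logs stays finite.
raw_product = 1.0
for _ in range(400):
    raw_product *= 1e-3
ll = log_likelihood(range(400), lambda y: 1e-3)
```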

Parameter Estimation (Cont’d)
The log-likelihood is
$\ell(L, U, a, \sigma) = \sum_{i=1}^{n} \ln f(y_i; L, U, a, \sigma)$.
The best parameter values are obtained by maximising $\ell$; the optimisation is an iterative procedure.
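The slides say only that the optimisation is iterative, without naming an algorithm. As one possible choice, here is a small standard-library Nelder-Mead simplex minimiser applied to the negative log-likelihood. This is an assumption, not necessarily the authors' method. It is checked on a normal sample, whose MLEs are known in closed form: the sample mean and the 1/n standard deviation. σ is optimised on the log scale so that it stays positive.

```python
import math


def nelder_mead(fun, x0, step=0.5, ftol=1e-10, xtol=1e-7, max_iter=5000):
    """Minimise fun(x), x a list of floats, with the Nelder-Mead simplex method
    (reflection 1, expansion 2, contraction 0.5, shrink 0.5)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [fun(v) for v in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest objective) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        spread = max(abs(c - b) for v in simplex[1:] for c, b in zip(v, simplex[0]))
        if values[-1] - values[0] < ftol and spread < xtol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = fun(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = fun(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            if f_r < values[-1]:
                # Outside contraction: halfway between centroid and reflected point.
                trial = [c + 0.5 * (r - c) for c, r in zip(centroid, reflected)]
                limit = f_r
            else:
                # Inside contraction: halfway between centroid and worst vertex.
                trial = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
                limit = values[-1]
            f_t = fun(trial)
            if f_t <= limit:
                simplex[-1], values[-1] = trial, f_t
            else:
                # Shrink every vertex halfway towards the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]]
                values = [values[0]] + [fun(v) for v in simplex[1:]]
    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]


# Check on a normal sample, parameterised as theta = (mu, log sigma).
data = [1.0, 2.0, 4.0, 7.0]


def neg_log_likelihood(theta):
    mu, log_sigma = theta
    sigma = math.exp(log_sigma)
    return sum(0.5 * ((y - mu) / sigma) ** 2 + log_sigma + 0.5 * math.log(2.0 * math.pi) for y in data)


theta_hat, _ = nelder_mead(neg_log_likelihood, [0.0, 0.0])
mu_hat, sigma_hat = theta_hat[0], math.exp(theta_hat[1])
```

For the contaminated Pareto model, one would minimise −ℓ(L, U, a, σ) in the same way, returning `math.inf` for invalid values such as U ≤ L or a ≤ 0. A sensible starting point (for example, the sample minimum and maximum for L and U) helps a derivative-free method like this one.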

Simulation
To validate the method, we generated data sets of n = 200, 400 and 600 points drawn from a truncated Pareto distribution, with normal errors added to simulate the effect of measurement error.
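Here is a sketch of the simulation set-up. It assumes inverse-CDF sampling for the truncated Pareto part and Python's `random.gauss` for the normal errors. The true parameter values below are hypothetical placeholders; the values behind the simulation results are not given here.

```python
import random


def sample_trunc_pareto(n, L, U, a, rng):
    """Inverse-CDF sampling: solving G(x) = u for u ~ Uniform[0, 1) gives
    x = L * (1 - u * (1 - (L/U)^a))^(-1/a), which lies in [L, U]."""
    c = 1.0 - (L / U) ** a
    return [L * (1.0 - rng.random() * c) ** (-1.0 / a) for _ in range(n)]


def simulate_contaminated(n, L, U, a, sigma, seed=None):
    """Return (true X values, observed Y = X + eps) with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = sample_trunc_pareto(n, L, U, a, rng)
    ys = [x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


# Hypothetical true values, for illustration only.
TRUE = dict(L=1.0, U=100.0, a=1.5, sigma=0.5)
datasets = {n: simulate_contaminated(n, seed=n, **TRUE) for n in (200, 400, 600)}
```

Only the observed `ys` would be passed to the fitting procedure. The `xs` are kept so that the effect of the added errors can be inspected.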

Simulation Results
Bias errors of the estimated L, U, a and σ relative to their true values (one row per simulated data set):

Data set    L       U       a        σ
1           2.87%   1.77%   24.08%   25.00%
2           0.35%   0.53%   21.70%   28.45%
3           0.94%   0.88%   22.54%   5.20%

Application
We apply the method to the statistical analysis of cloud masses from a radio-telescope survey of various galaxies. Cloud masses are known to follow a power-law (Pareto) distribution, but the methods used to measure them are subject to measurement error [5]. Sources of error include instrumental error, chemical evolution, temperature, etc.

Second survey of molecular clouds, by Fukui et al.
The figure shows the M33 region.

Frequency Distribution

Results
Table: MLE values of L, U and a for the GMCs in M33, with standard errors from the information matrix and from the jackknife.
• The lowest mass L we measure for a GMC in M33 is …
• The highest mass U for a GMC in M33 is …
• The power-law exponent a of the GMC masses in M33 is …
The standard errors give an indication of the size of the uncertainty; formally, they are used to construct confidence intervals.
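Both kinds of standard error can be sketched generically. The sketch makes three assumptions:
- the observed information (the numerical Hessian of the negative log-likelihood at the MLE) stands in for the Fisher information;
- the Hessian is taken by central differences;
- all function names are hypothetical.

The checks use a normal sample, for which the answers are known. The jackknife standard error of the mean equals s/√n exactly, and the inverse information gives σ̂/√n for μ and σ̂/√(2n) for σ.

```python
import math
import statistics


def jackknife_se(data, estimator):
    """Jackknife standard error: sqrt((n-1)/n * sum_i (t_(i) - t_bar)^2),
    where t_(i) is the estimate with observation i left out."""
    n = len(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    t_bar = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - t_bar) ** 2 for t in leave_one_out))


def numerical_hessian(fun, theta, h=1e-4):
    """Central-difference Hessian of a scalar function of a list of floats."""
    k = len(theta)

    def at(i, di, j, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return fun(t)

    return [[(at(i, h, j, h) - at(i, h, j, -h) - at(i, -h, j, h) + at(i, -h, j, -h)) / (4.0 * h * h)
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(k)] for r, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def information_se(neg_log_likelihood, theta_hat):
    """Square roots of the diagonal of the inverse Hessian of -log L at the MLE."""
    cov = invert(numerical_hessian(neg_log_likelihood, theta_hat))
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]


# Check on a normal sample, where both answers are known analytically.
data = [1.0, 2.0, 4.0, 7.0, 3.0, 5.0]
n = len(data)


def nll(theta):
    mu, sigma = theta
    return sum(0.5 * ((y - mu) / sigma) ** 2 + math.log(sigma) for y in data) + 0.5 * n * math.log(2.0 * math.pi)


mu_hat, sigma_hat = statistics.fmean(data), statistics.pstdev(data)
se_mu, se_sigma = information_se(nll, [mu_hat, sigma_hat])
se_mean_jack = jackknife_se(data, statistics.fmean)
```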

Assessing quality of fit
The cumulative distribution function (CDF) of Y is given as
$F(y) = \int_{-\infty}^{y} f(t)\, dt$.
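Under the normal error model, $P(Y \le y \mid X = x) = \Phi((y - x)/\sigma)$. The CDF can therefore also be written as a single integral over [L, U], $F(y) = \int_L^U g(x)\,\Phi\big((y - x)/\sigma\big)\,dx$, which avoids integrating f a second time. This is a minimal sketch using the trapezoidal rule in u = ln x. Function names and parameter values are hypothetical.

```python
import math


def trunc_pareto_pdf(x, L, U, a):
    """Truncated Pareto density g(x) on [L, U] (assumed standard form)."""
    if x < L or x > U:
        return 0.0
    return a * L**a * x ** (-(a + 1.0)) / (1.0 - (L / U) ** a)


def normal_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def contaminated_cdf(y, L, U, a, sigma, num=801):
    """F(y) = int_L^U g(x) Phi((y - x) / sigma) dx, trapezoidal rule in u = ln x."""
    u_lo, u_hi = math.log(L), math.log(U)
    du = (u_hi - u_lo) / (num - 1)
    total = 0.0
    for k in range(num):
        # Clamp so round-off cannot push the end nodes outside [L, U].
        x = min(max(math.exp(u_lo + k * du), L), U)
        weight = 0.5 if k in (0, num - 1) else 1.0
        total += weight * trunc_pareto_pdf(x, L, U, a) * normal_cdf((y - x) / sigma) * x
    return total * du


# Hypothetical parameter values, for illustration only.
L, U, a, sigma = 1.0, 10.0, 1.5, 0.5
cdf_values = [contaminated_cdf(0.5 * k, L, U, a, sigma) for k in range(25)]
```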

Assessing quality of fit
• Graphical assessment: probability-probability (P-P) plots
• Goodness-of-fit test: Kolmogorov-Smirnov (K-S) test

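P-P Plots
A P-P plot compares the theoretical and empirical CDFs in terms of their probabilities. The coordinates of a point on a P-P plot are
$\left(F(y_{(i)}),\ \dfrac{i}{n}\right), \qquad i = 1, \dots, n,$
where $y_{(1)} \le \dots \le y_{(n)}$ are the ordered observations and $i/n$ is the empirical CDF at $y_{(i)}$.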
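Below is a sketch of the P-P coordinates. It assumes the plotting position i/n; other conventions, such as (i − 0.5)/n, are also common. `pp_points` and `max_diagonal_gap` are hypothetical helpers. The demo uses a Uniform(0, 1) CDF in place of the fitted contaminated CDF.

```python
def pp_points(data, cdf):
    """P-P plot coordinates (F(y_(i)), i/n), i = 1..n, for the ordered sample.

    If the fitted model is right, the points scatter about the diagonal.
    """
    ys = sorted(data)
    n = len(ys)
    return [(cdf(y), i / n) for i, y in enumerate(ys, start=1)]


def max_diagonal_gap(points):
    """Largest vertical distance of a P-P point from the diagonal."""
    return max(abs(p_emp - p_theo) for p_theo, p_emp in points)


def uniform_cdf(y):
    """Uniform(0, 1) CDF, standing in for the fitted model in this demo."""
    return min(max(y, 0.0), 1.0)


pts = pp_points([0.3, 0.1, 0.2], uniform_cdf)
```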

K-S GoF test
The K-S test statistic is based on the maximum distance between the theoretical CDF F and the empirical CDF F_n:
$D_n = \sup_{y} \left| F_n(y) - F(y) \right|$.
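The empirical CDF is a step function, so the supremum in $D_n$ is reached at, or just below, an ordered observation. For a continuous F it is therefore enough to check $i/n - F(y_{(i)})$ and $F(y_{(i)}) - (i-1)/n$ for each i. This is a minimal sketch with a hypothetical `ks_statistic` helper; it computes only the statistic, not a p-value.

```python
def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic D_n = sup_y |F_n(y) - F(y)| for a continuous CDF."""
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # Gap at/just after the jump at y_(i), and just before it.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(y):
    """Uniform(0, 1) CDF, standing in for the fitted model in this demo."""
    return min(max(y, 0.0), 1.0)


d_small = ks_statistic([0.1, 0.2, 0.3, 0.4], uniform_cdf)
```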

K-S GoF Test

Conclusion
• The deconvolution method recovers the properties of the truncated Pareto distribution with little or no bias.
• It produces reasonable error estimates from the inverse Fisher information matrix and from the jackknife.
• The probability plot is approximately linear, indicating that the sample is consistent with the postulated distribution.

Future objectives
• Other distributions
• Truncated data
• Comparison with results from other methods

Acknowledgments
We acknowledge everyone who contributed to the work presented, and the SKA for funding.

References
[1] Zaninetti, L. and Ferraro, M. (2008). "On the truncated Pareto distribution with applications." Central European Journal of Physics, Vol. 6(1), 1-6.
[2] White, E. P., et al. (2008). "On estimating the exponent of power-law frequency distributions." Ecology, Vol.
[3] Engargiola, G., et al. "Giant molecular clouds in M33. I. BIMA all-disk survey." ApJS 149.
[4] Cordy, C. B. and Thomas, D. (1997). "Deconvolution of a distribution function." American Statistical Association, Vol.
[5] Rosolowsky, E. (2005). "The Mass Spectra of Giant Molecular Clouds in the Local Group." The Astronomical Society of the Pacific, Vol. 117.

Thank you