Additional Slides on Bayesian Statistics for STA 101, Prof. Jerry Reiter, Fall 2008.


Can we use this method to learn about means and percentages? So far, to learn about population averages and percentages, we have used data (like the DNA test results) but not prior information (like the list of suspects). We show how to combine data and prior information in class.

Combining the prior beliefs and the data using Bayes Rule In the Bayes rule problem from before break, we combined the prior beliefs and the data using Bayes rule. Pr(p|X=1) represents our posterior beliefs about p.

Estimation of unknown parameters in statistical models (Bayesian and non-Bayesian) Suppose we posit a probability distribution to model data. How do we estimate its unknown parameters? Example: assume the data follow a regression model. Where do the estimates of the regression coefficients come from? Classical statistics: maximum likelihood estimation. Bayesian statistics: Bayes rule.

Estimating percentage of Dukies who plan to get advanced degree Suppose we want to estimate the percentage of Duke students who plan to get an advanced degree (MBA, JD, MD, PhD, etc.). Call this percentage p. We sample 20 people at random, and 8 of them say they plan to get an advanced degree. What should be our estimate of p?

Estimating the average IQ of Duke professors Let µ be the population average IQ of Duke profs. Suppose we randomly sample 25 Duke profs and record their IQs. What should be our estimate of µ?

Maximum likelihood estimation: A principled approach to estimation Usually we can use subject-matter knowledge to specify a distribution for the data. But we don't know the parameters of that distribution. 1) Number out of 20 who want an advanced degree: binomial distribution. 2) Profs' IQs: normal distribution.

Maximum likelihood estimation We need to estimate the parameters of the distribution. Why do we care? A) So we can make probability statements about future events. B) The parameters themselves may be important.

Maximum likelihood estimation The maximum likelihood estimate of the unknown parameter is the value for which the data were most likely to have occurred. Let’s see how this works in the examples.

Advanced degree example Let Y be the random variable for the number of people out of 20 who plan to get an advanced degree. Y has a binomial distribution with n = 20 and unknown probability p. In the data, Y = 8. If we knew p, the value of the probability distribution function at Y = 8 would be Pr(Y = 8) = \binom{20}{8} p^8 (1-p)^{12}.

MLE for degree example Let’s graph Pr(Y = 8) as a function of the unknown p. Label the function L(p). L(p) is called the likelihood function for p.
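As a concrete illustration, here is a minimal numerical sketch of L(p), assuming numpy and scipy are available; n = 20 and y = 8 come from the example:

```python
import numpy as np
from scipy.stats import binom

# L(p): the probability of observing y = 8 successes in n = 20 trials,
# viewed as a function of the unknown p.
def likelihood(p, n=20, y=8):
    return binom.pmf(y, n, p)

# Evaluate L(p) on a grid of candidate values and find where it peaks.
grid = np.linspace(0.01, 0.99, 99)
L = likelihood(grid)
print(grid[np.argmax(L)])  # 0.4, the sample proportion 8/20
```

Plotting L against grid reproduces the graph described on this slide: a single-peaked curve with its maximum at p = 0.4.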

Maximum likelihood The maximum likelihood estimate of p is the value of p that maximizes L(p). This is a reasonable estimate because it is the value of p for which the observed data (y = 8) had the greatest chance of occurring.

Finding the MLE for degree example To maximize the likelihood function, we take the derivative of L(p) with respect to p, set it equal to zero, and solve for p. You get the sample percentage, 8/20 = 40%!
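The calculus is easiest on the log of the likelihood, which peaks at the same value of p. A worked version of the step just described:

\log L(p) = \log \binom{20}{8} + 8 \log p + 12 \log(1-p)

\frac{d}{dp} \log L(p) = \frac{8}{p} - \frac{12}{1-p} = 0 \implies 8(1-p) = 12p \implies \hat{p} = \frac{8}{20} = 0.40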

Estimating the average IQ of Duke professors Let µ be the population average IQ of Duke profs. Suppose we randomly sample 25 Duke profs and record their IQs. What should be our estimate of µ?

Model for Professors' IQs The mathematical function for a normal curve for any prof's IQ, which we label Y, is f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right). All normal curves have this form, with different means and SDs. Here, we'll assume σ = 15. We don't know µ, which is what we're after.

Model for all 25 IQs We need the function for all 25 IQs. Assuming each prof's IQ is independent of other profs' IQs, we have f(y_1, \ldots, y_{25} \mid \mu) = \prod_{i=1}^{25} \frac{1}{15\sqrt{2\pi}} \exp\left( -\frac{(y_i-\mu)^2}{2 \cdot 15^2} \right).

Model for all 25 IQs With some algebra and simplifications, the likelihood function is L(\mu) \propto \exp\left( -\frac{25(\bar{y}-\mu)^2}{2 \cdot 15^2} \right), where \bar{y} is the sample average of the 25 IQs.
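The "algebra and simplifications" is the standard decomposition of the sum of squares; the first term below does not involve µ, so it is absorbed into the proportionality constant:

\sum_{i=1}^{25} (y_i - \mu)^2 = \sum_{i=1}^{25} (y_i - \bar{y})^2 + 25(\bar{y} - \mu)^2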

Likelihood function and maximum likelihood estimates A graph of the likelihood function is a single-peaked, bell-shaped curve in µ. The function is maximized when µ equals the sample average \bar{y}. So, we use \bar{y} as our estimate of the average Duke prof's IQ. The sample average is the MLE for µ in any normal curve.
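A quick numerical check of this fact, using hypothetical IQ data (the actual class data are not in the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
iqs = rng.normal(120, 15, size=25)  # hypothetical sample of 25 professor IQs

# Log-likelihood of mu under the normal model with sigma = 15 (constants dropped).
def log_lik(mu):
    return -np.sum((iqs - mu) ** 2) / (2 * 15**2)

grid = np.linspace(100, 140, 40001)
mle = grid[np.argmax([log_lik(m) for m in grid])]
print(mle, iqs.mean())  # the two agree up to the grid spacing
```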

The Bayesian approach to estimation of means Let’s show how to combine data and prior information to address the following motivating question: What is a likely range for the average IQ of Duke professors?

Combining the prior beliefs and the data using Bayes Rule We combine our prior beliefs and the data using Bayes rule. f(µ|data) represents our posterior beliefs about µ.
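In symbols, Bayes rule for a continuous parameter is

f(\mu \mid \text{data}) = \frac{f(\text{data} \mid \mu)\, f(\mu)}{f(\text{data})} \propto f(\text{data} \mid \mu)\, f(\mu),

the likelihood times the prior, rescaled so that the result integrates to one.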

Formalizing a model for prior information Let's assign a distribution for µ that reflects our a priori beliefs about its likely range. Label this f(µ). Using the data you supplied in class, the curve describing our beliefs about µ is the normal curve with mean = 128 and SD = 15.

Mathematical equation for normal curve We can write down the equation for this normal curve: f(\mu) = \frac{1}{15\sqrt{2\pi}} \exp\left( -\frac{(\mu-128)^2}{2 \cdot 15^2} \right).

Model for the data (25 IQs) If we knew µ, the model for the data (the professors' IQs) is the same joint normal curve used for maximum likelihood: f(\text{data} \mid \mu) = \prod_{i=1}^{25} \frac{1}{15\sqrt{2\pi}} \exp\left( -\frac{(y_i-\mu)^2}{2 \cdot 15^2} \right).

Estimating the average IQ of Duke professors Let µ be the population average IQ of Duke profs. Suppose we randomly sample 25 Duke profs and record their IQs. What should be our estimate of µ?

Combining the prior beliefs and the data using Bayes Rule We combine the model for the prior beliefs and the model for the data using Bayes rule. f(µ|data) represents our posterior beliefs about µ.

Posterior distribution Using calculus (completing the square in the exponent), one can show that f(µ|data) is a normal curve with

mean = \frac{\mu_0/\sigma_0^2 + n\bar{y}/\sigma^2}{1/\sigma_0^2 + n/\sigma^2} \qquad SD = \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right)^{-1/2}

where \mu_0 and \sigma_0 are the prior mean and SD, \sigma is the data SD, n is the sample size, and \bar{y} is the sample average. The posterior mean is a precision-weighted average of the prior mean and the sample average.

Posterior distribution For our data and prior beliefs, the posterior f(µ|data) is a normal curve whose mean and SD come from plugging the class numbers into the formulas above; the SD works out to 2.314.

Using the posterior distribution to summarize beliefs about µ Because f(µ|data) describes beliefs about µ, we can make probability statements about µ. For example, using a normal curve with the posterior mean and an SD of 2.314, Pr(µ > 130 | data) = .813. A 95% posterior interval for µ stretches from 1.96 SDs below the posterior mean to 1.96 SDs above it, a range of about ±4.5 IQ points.
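A short sketch of the whole normal-normal update, assuming hypothetical values where the transcript omits them (the slides give the prior mean, 128, and the data SD, 15; the sample average used in class is assumed here for illustration, so the printed numbers will not reproduce the 2.314 above):

```python
from scipy.stats import norm

mu0, sd0 = 128.0, 15.0   # prior mean and SD for mu, as stated on the prior slide
sigma, n = 15.0, 25      # data SD and sample size from the slides
ybar = 132.0             # hypothetical sample average of the 25 IQs

# Conjugate normal-normal update: precisions (1/variance) add.
prior_prec = 1 / sd0**2
data_prec = n / sigma**2
post_sd = (prior_prec + data_prec) ** -0.5
post_mean = (prior_prec * mu0 + data_prec * ybar) / (prior_prec + data_prec)

print("posterior mean:", post_mean, "posterior SD:", post_sd)
print("Pr(mu > 130 | data):", 1 - norm.cdf(130, post_mean, post_sd))
print("95% posterior interval:", norm.interval(0.95, loc=post_mean, scale=post_sd))
```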

Bayesian statistics in general Bayesian methods exist for any population parameter, including percentiles, maxima and minima, ratios, etc. The method is general:
1) Specify a mathematical curve that reflects prior beliefs about the population parameter.
2) Specify a mathematical curve that describes the distribution of the data, given a value of the population parameter.
3) Combine the curves from 1 and 2 mathematically to get posterior beliefs for the parameter, updated for the data.
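When the calculus is intractable, the three steps can be carried out numerically on a grid. A minimal sketch for the advanced-degree example, with an assumed flat prior on p (the slides do not specify a prior for that example):

```python
import numpy as np
from scipy.stats import binom

# Step 1: prior beliefs about p, evaluated on a grid (flat prior assumed here).
grid = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(grid)

# Step 2: distribution of the data given p: likelihood of y = 8 out of n = 20.
likelihood = binom.pmf(8, 20, grid)

# Step 3: combine and rescale so the posterior sums to 1 over the grid.
posterior = prior * likelihood
posterior /= posterior.sum()

# Posterior beliefs support probability statements, e.g. Pr(p > 0.5 | data).
print("Pr(p > 0.5 | data):", posterior[grid > 0.5].sum())
```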

Differences between frequentist and Bayesian approaches:
FREQUENTIST: Parameters are not random. Confidence intervals.
BAYESIAN: Parameters are random. Posterior distributions.