Probability Model Fitting Steps

Presentation transcript:

Probability Model Fitting Steps
For given data x1, x2, …, xN:
Plot the histogram with the default bin width:
> hist(x, probability=T, …, …)
Also plot the boxplot.
Select candidate PDFs based on the histogram and boxplot.
Fit the PDFs, i.e., compute the parameters of the PDFs.
Evaluate the PDF at the observation points or on a finer grid (for better plotting).
E.g., to fit a Normal PDF to the data:
> theta1 = mean(x)
> theta2 = sd(x)
> fitnormpdf = dnorm(sort(x), theta1, theta2)
Similarly, fit all the candidate PDFs to the data using the appropriate commands.
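The fitting steps above can be sketched end to end in R. This is only an illustration under stated assumptions: the data vector x is synthetic (simulated here, not the data used in the slides), and the Gamma candidate is fitted by the method of moments, which is one possible choice and not necessarily the one intended by the slides.

```r
# Synthetic, positive, right-skewed data (assumption: stands in for real observations)
set.seed(1)
x <- rgamma(200, shape = 4, scale = 2)

# When run as a script, draw to a null device instead of writing a file
if (!interactive()) pdf(NULL)

# Step 1: histogram on the density scale, plus a boxplot
boxplot(x, horizontal = TRUE)
hist(x, probability = TRUE, main = "Histogram with fitted PDFs")

# Step 2-3: fit candidate PDFs by computing their parameters
xs <- sort(x)
theta1 <- mean(x)   # Normal: mean
theta2 <- sd(x)     # Normal: standard deviation

# Gamma by method of moments (assumed estimator): mean = shape/rate, var = shape/rate^2
gshape <- mean(x)^2 / var(x)
grate  <- mean(x) / var(x)

# Step 4: evaluate each fitted PDF at the sorted observations
fitnormpdf  <- dnorm(xs, theta1, theta2)
fitgammapdf <- dgamma(xs, shape = gshape, rate = grate)

# Overlay both candidates on the histogram for a first visual comparison
lines(xs, fitnormpdf, col = "red")
lines(xs, fitgammapdf, col = "blue")
```

Evaluating at sort(x) rather than x matters only for plotting: lines() connects points in the order given, so unsorted inputs would produce a tangled curve.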

Goodness of Fit
The goodness of fit of the fitted probability model can be evaluated in two ways: visually and quantitatively.
Visual
Overlay the fitted PDF on the histogram:
> lines(sort(x), fitnormpdf, col="red")
Quantile plots: compute the quantiles from the fitted PDFs and plot them against the empirical quantiles. If the points fall on a straight line, the model is a good fit.
Empirical quantile (plotting position): P_i = (i - a)/(N + 1 - 2a), where i is the rank of the observation x_i and a = 0, so that P_i = i/(N + 1).
If the observations are sorted, their ranks are simply their sequential order:
> N = length(x)
> empquant = (1:N)/(N+1)
> fitnormquant = qnorm(empquant, theta1, theta2)
> plot(fitnormquant, sort(x), xlab="Model Quantiles", ylab="Empirical Quantiles")
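The plotting-position formula and the quantile plot can be written out as a short R sketch. The helper name plotting_position and the simulated data are assumptions made for illustration; the 1:1 reference line is an added convenience for judging how close the points lie to a straight line.

```r
# Hypothetical helper: P_i = (i - a) / (N + 1 - 2a) for ranks i = 1..N
plotting_position <- function(N, a = 0) {
  i <- seq_len(N)
  (i - a) / (N + 1 - 2 * a)
}

# Synthetic data (assumption: stands in for real observations)
set.seed(2)
x <- rnorm(100, mean = 30, sd = 5)
N <- length(x)

# With a = 0 this reduces to i / (N + 1)
empquant <- plotting_position(N, a = 0)

# Fit the Normal model and compute its quantiles at the same probabilities
theta1 <- mean(x)
theta2 <- sd(x)
fitnormquant <- qnorm(empquant, theta1, theta2)

# When run as a script, draw to a null device instead of writing a file
if (!interactive()) pdf(NULL)

# Quantile plot: sorted observations against model quantiles
plot(fitnormquant, sort(x),
     xlab = "Model Quantiles", ylab = "Empirical Quantiles")
abline(0, 1, col = "red")  # points near this line indicate a good fit
```

Because the probabilities i/(N + 1) never reach 0 or 1, qnorm() always returns finite quantiles, which is one practical reason for using a plotting position rather than i/N.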

Goodness of Fit
Quantitative
Perform a Kolmogorov-Smirnov (K-S) test on the empirical CDF and the fitted model CDF:
> ksnorm = ks.test(x, "pnorm", mean=theta1, sd=theta2)
This compares the empirical CDF with the fitted Normal CDF.
If ksnorm$p.value is greater than or equal to 0.05, the test does not reject, at the 5% significance level (95% confidence), the hypothesis that the data come from the fitted distribution.
Use the quantitative and visual metrics to decide on the best model.
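As a rough sketch of how the K-S test can separate a plausible candidate from a poor one, the R code below tests the same sample against a fitted Normal and a fitted Exponential. The data are simulated and the Exponential candidate is chosen here purely as a deliberately bad fit; neither comes from the slides. One caveat: because the parameters are estimated from the same data being tested, the reported p-values are only approximate.

```r
# Synthetic data (assumption: stands in for real observations)
set.seed(3)
x <- rnorm(100, mean = 30, sd = 5)

# Candidate 1: Normal with parameters estimated from the data
theta1 <- mean(x)
theta2 <- sd(x)
ksnorm <- ks.test(x, "pnorm", mean = theta1, sd = theta2)

# Candidate 2: Exponential with rate = 1/mean (a deliberately poor candidate)
ksexp <- ks.test(x, "pexp", rate = 1 / theta1)

# Collect the test statistic D and the p-value for each candidate
results <- data.frame(
  model   = c("normal", "exponential"),
  D       = c(unname(ksnorm$statistic), unname(ksexp$statistic)),
  p.value = c(ksnorm$p.value, ksexp$p.value)
)
# TRUE means the candidate is not rejected at the 5% significance level
results$not_rejected <- results$p.value >= 0.05
print(results)
```

A smaller D means the fitted CDF stays closer to the empirical CDF, so D can also be used to rank candidates that all pass the test.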

Boxplot
One step = 1.5*IQR
IQR = 33 - 26 = 7 °F
One step = 1.5*7 = 10.5 °F
Lower inner fence = 26 - 10.5 = 15.5 °F
Upper inner fence = 33 + 10.5 = 43.5 °F
The whiskers are shortened to extend only to the last observation within one step beyond either end of the box (the "adjacent values"). Here they are drawn to the most extreme temperatures inside the inner fences, 17 and 37 °F.
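The fence arithmetic above can be checked with a few lines of R. The quartiles 26 and 33 are taken from the slide; the function names and the short temps vector are hypothetical and made up only to show how the adjacent values are picked out.

```r
# Hypothetical helper: inner fences lie one step (1.5 * IQR) beyond the quartiles
inner_fences <- function(q1, q3) {
  step <- 1.5 * (q3 - q1)
  c(lower = q1 - step, upper = q3 + step)
}

# Hypothetical helper: adjacent values are the most extreme observations
# that still lie inside the inner fences (where the whiskers end)
adjacent_values <- function(x, q1, q3) {
  f <- inner_fences(q1, q3)
  inside <- x[x >= f[["lower"]] & x <= f[["upper"]]]
  range(inside)
}

# Quartiles from the slide, in degrees Fahrenheit
fences <- inner_fences(q1 = 26, q3 = 33)
print(fences)

# Made-up temperatures (not the Ithaca data); 9 and 45 fall outside the fences
temps <- c(9, 17, 26, 30, 33, 37, 45)
whiskers <- adjacent_values(temps, q1 = 26, q3 = 33)
print(whiskers)
```

Note that R's own boxplot() computes the box from hinges, which can differ slightly from other quartile definitions, so the quartiles are passed in explicitly here.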

Histogram Histogram of Ithaca temperature, January 1987.

Histogram and Probability Density Function