From Probability to Distributions

Presentation transcript:

Practice & Communication of Science: From Probability to Distributions

Where Does Variation Come From? Measurements in complex systems (eg living creatures) vary: between organisms, and within them (eg over time); as a result of our measuring them (eg error); or sometimes as a result of experiments we do. For example, aspects of lung function… Forced Vital Capacity (FVC): the total volume of air you can shift in one maximal, forced exhalation; dependent on size/elasticity of lungs/thorax. Forced Expiratory Volume at 1 second (FEV1): the volume of air you shift in the first second of that exhalation; dependent on airway resistance.

Thought Experiment. Measure the lung function of all 7 billion people on the planet with a very, very, very sensitive spirometer (sub-µl)… 7 billion different sets of readings? Variation due to innumerable influences… Age, height, weight, gender, blood pressure, blood glucose, hormone levels, nutrition at age 5, nutrition at age 6, gene variant tvc15, etc, etc. Sitting, standing, air pressure, temperature, voltage, tubing angle, experimenter tiredness, enthusiasm, hydration, position of Mars, etc, etc.

Visualising the Variation. Here are some actual readings for FVC (litres)… 2.159, 2.065, 1.518, 2.227, 2.09, 2.451, 1.871, 2.571, 2.532, 2.545, 2.538, 2.795, 2.102, 1.804, 2.432, 2.704, 2.258, 2.282, 1.663, 2.795, 2.238, 1.953, 2.382, 2.344, 2.967, 2.68, 2.413, 2.444, 1.953, 2.314, 2.15, 2.634, 2.598, 2.09, 2.641, 2.92, 2.727, 2.307, 2.76, 2.439, 2.259, 2.111, 2.58, 2.602, 2.461, 3.128, 2.241, 2.602, 3.177. We could plot each value along an x-axis… Note how they bunch in the middle (and, if very close together, get displaced upward).

Frequency Distributions. Divide the x-axis up into ‘bins’ of a given width. Put each reading in the appropriate ‘bin’. In each bin, points sit on top of each other, so the height of the stack represents the ‘frequency’ of the range of readings represented by the bin. This frequency distribution has a curved shape: bunching in the middle, with a few extreme values.
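The binning steps above can be sketched in a few lines of Python. This is only an illustration: the bin width of 0.25 L and the function names are assumptions, not taken from the slides, and the readings are the 49 values as they appear in this transcript.

```python
from collections import Counter

# FVC readings (litres) as listed in the transcript (49 values).
readings = [
    2.159, 2.065, 1.518, 2.227, 2.09, 2.451, 1.871, 2.571, 2.532, 2.545,
    2.538, 2.795, 2.102, 1.804, 2.432, 2.704, 2.258, 2.282, 1.663, 2.795,
    2.238, 1.953, 2.382, 2.344, 2.967, 2.68, 2.413, 2.444, 1.953, 2.314,
    2.15, 2.634, 2.598, 2.09, 2.641, 2.92, 2.727, 2.307, 2.76, 2.439,
    2.259, 2.111, 2.58, 2.602, 2.461, 3.128, 2.241, 2.602, 3.177,
]


def bin_counts(values, width):
    """Map each bin's left edge to the number of values in [edge, edge + width)."""
    counts = Counter(int(v // width) for v in values)
    return {round(k * width, 10): counts[k] for k in sorted(counts)}


def print_histogram(values, width=0.25):
    """Print one row per bin; the stack of '*' is the frequency."""
    for left, n in bin_counts(values, width).items():
        print(f"{left:4.2f}-{left + width:4.2f} | {'*' * n}")


print_histogram(readings)  # bin width of 0.25 L is an arbitrary choice
```

Changing `width` changes how smooth or ragged the shape looks, which is why the bin width is worth reporting alongside any histogram.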

Distribution of a Big Dataset. The 50 readings are part of a larger set of data: around 2400 readings. Each point represents up to 10 readings. The result is the famous ‘Bell Curve’ (an idealised probability density).

Origin of the Bell Curve. An individual’s FVC is determined by lots of things; not all influences are equally influential, but… each will have either a positive or a negative effect on the reading. So, an individual’s FVC is determined by the concerted action of countless, seemingly random, +ve and –ve ‘nudges’ throughout their life. For a population, the influence of the large number of ‘random’ +ve or –ve effects on each individual → Bell Curve, aka Gaussian or Normal Distribution.

Modelling Effect of +/- Nudges. Falling balls: a simple model (quincunx). With no obstacles, the ball comes to rest directly below: no ‘nudges’ left or right on the way down. An obstacle placed in its path will deflect it L or R, at ‘random’. Layers of obstacles → cumulative effect, eg ++-+--+-++ = ++, governed by probability.
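A minimal simulation of the falling-ball model, assuming each obstacle deflects the ball left (-1) or right (+1) with equal probability. The number of balls, the number of layers and the seed are arbitrary illustrative choices.

```python
import random
from collections import Counter


def net_nudge(path):
    """Net displacement of a path written as '+' and '-' nudges."""
    return path.count("+") - path.count("-")


def drop_ball(layers, rng):
    """Final position after one left/right nudge per layer of obstacles."""
    return sum(rng.choice((-1, 1)) for _ in range(layers))


def quincunx(balls=2000, layers=10, seed=1):
    """Drop many balls and count how many come to rest at each position."""
    rng = random.Random(seed)
    return Counter(drop_ball(layers, rng) for _ in range(balls))


print(net_nudge("++-+--+-++"))  # the slide's example path: 6 plus, 4 minus

slots = quincunx()
for position in sorted(slots):
    # One '#' per 10 balls, so the column of bars sketches the distribution.
    print(f"{position:+3d} | {'#' * (slots[position] // 10)}")
```

With 10 layers the ball can only finish at an even offset between -10 and +10, and the central slots fill fastest because many more paths lead to them.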

Rolling Dice → Same Outcome. One die… outcome values are 1, 2, 3, 4, 5 or 6, each equally probable (1 in 6). Distribution is… boring!

Rolling Dice. Two dice… outcome values are 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. There are 36 ways of making these, and the outcomes are not equally probable: only 1 way to get 2 (1+1), 3 ways to get 4, etc. Distribution is… slightly less boring!

Rolling Dice. Three dice… outcome values are 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18. There are 216 ways of making these, and the outcomes are not equally probable: 27 ways each to throw a 10 or an 11, only 1 each to get a 3 or an 18. Distribution is… starting to curve.

Rolling Dice. Four dice… outcome values are 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24. There are 1296 ways of making these, and the outcomes are not equally probable. Distribution is… looking familiar!

Rolling Dice. 24 dice… outcome values are 24 to 144. There are 4.73838134 × 10^18 ways of making these, and the outcomes are not equally probable! Distribution is… looking very familiar! 121 ‘discrete’ outcomes approximate the Normal Distribution (and become it, if ‘infinite’).
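The counts on the dice slides can be checked by building up the distribution one die at a time. The sketch below assumes fair six-sided dice; `sum_counts` is an illustrative name. For n dice there are 6^n equally likely rolls in total, spread unevenly over the possible sums.

```python
from collections import Counter


def sum_counts(n_dice, faces=6):
    """Number of ways to roll each total with n_dice fair dice."""
    counts = Counter({0: 1})  # one way to score 0 with no dice
    for _ in range(n_dice):
        updated = Counter()
        for total, ways in counts.items():
            for face in range(1, faces + 1):
                updated[total + face] += ways
        counts = updated
    return counts


for n in (1, 2, 3, 4, 24):
    c = sum_counts(n)
    print(f"{n:2d} dice: {len(c)} outcomes, {sum(c.values())} ways, "
          f"most likely total made in {max(c.values())} ways")
```

Printing `sum_counts(n)` for increasing `n` shows the flat one-die distribution rounding off into the bell shape described above.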

Why Do Balls & Dice Model FVC?!!! The models generating the ND are uniform: ‘pegs’ & layers are the same; balls hit pegs one at a time, one layer at a time; all dice are the same. The factors influencing FVC are not like that! So why does a simple model mirror complex reality? The Central Limit Theorem: horribly technical, but basically… “any time you have a quantity which is bumped around by a large number of random processes, you end up with a bell curve distribution for that quantity. And it really doesn’t matter what those random processes are. They themselves don’t have to follow the Gaussian distribution. So long as there’s lots of them and they’re small, the overall effect is Gaussian” (http://scienceblogs.com/builtonfacts/2009/02/05/the-central-limit-theorem-made/)
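A rough numerical illustration of the idea, not a proof: each simulated ‘measurement’ below is the sum of many small nudges drawn from two deliberately non-Gaussian sources (a flat one and a skewed one). The choice of sources, the counts and the seed are all assumptions made for the sketch. One simple check is the fraction of values within one standard deviation of the mean, which is about 0.683 for a normal distribution.

```python
import random
import statistics


def noisy_quantity(n_processes, rng):
    """One 'measurement': the sum of many small, non-Gaussian random nudges."""
    total = 0.0
    for i in range(n_processes):
        if i % 2 == 0:
            total += rng.uniform(-1.0, 1.0)       # flat nudge
        else:
            total += rng.expovariate(1.0) - 1.0   # skewed nudge, mean zero
    return total


def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of their mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mu) <= sd for v in values) / len(values)


rng = random.Random(42)
samples = [noisy_quantity(200, rng) for _ in range(5000)]
print(round(fraction_within_one_sd(samples), 3))  # compare with ~0.683 for a normal
```

Reducing `n_processes` to 1 or 2 leaves the underlying flat or skewed shapes visible, which is the ‘lots of them’ condition in the quote.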

The CLT/ND is everywhere! A picture of the steps in the old Magistrates Court in Lampeter, Wales. Each footstep → minuscule erosion. Many factors influence the position of each footstep; they are essentially ‘random’ and conform to the CLT → ND.

Uses of The Normal Distribution. Many things in biosystems are normally distributed. A normal distribution is fully defined by 2 things: a mean (average value) and a standard deviation (an indication of ‘spread’). So, the 2445 FVC measurements can be summarised as 3.53 ± 0.92 L (n=2445). If we know our data fits a distribution, then do an experiment, statistics can estimate the probability that the distribution has (not) changed, ie an ‘independent’ test of whether our expt worked. So distributions/stats are central to the scientific method!
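To see what ‘fully defined by 2 things’ buys us, here is a short sketch using Python's standard `statistics.NormalDist` with the mean and standard deviation quoted above. It assumes, for illustration only, that FVC follows an exactly normal distribution; `prob_between` is a helper name invented here.

```python
from statistics import NormalDist

# Mean and SD from the slide (3.53 ± 0.92 L); exact normality is an assumption.
fvc = NormalDist(mu=3.53, sigma=0.92)


def prob_between(dist, low, high):
    """Probability that a value drawn from dist lies between low and high."""
    return dist.cdf(high) - dist.cdf(low)


within_1sd = prob_between(fvc, fvc.mean - fvc.stdev, fvc.mean + fvc.stdev)
within_2sd = prob_between(fvc, fvc.mean - 2 * fvc.stdev, fvc.mean + 2 * fvc.stdev)
print(round(within_1sd, 3))  # about 0.683 for any normal distribution
print(round(within_2sd, 3))  # about 0.954 for any normal distribution
```

Once the two numbers are known, any such probability follows without going back to the 2445 individual readings.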

Distributions, Experiments & Stats. Expts ‘prod’ nature to see if s/he responds. We need to compare ‘before’ and ‘after’ measurements to decide if/how nature responded. The starting point is that nature didn’t respond: the Null Hypothesis, ie the distribution of observations is identical before & after. Things like the CLT tell us how measurements in our expts should distribute. Statistics looks at the ‘overlap’ between the before & after distributions to estimate the probability that… both belong to the same distribution (Null Hypo), or the two distributions are different (Alternative Hypo), ie nature responded to whatever we did!
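One way to make the before/after comparison concrete is a permutation test, sketched below. This is a swapped-in technique chosen because it needs no formulas, not a method the slides name, and all the readings are invented for illustration. It estimates how often a difference in means at least as large as the observed one would arise if the ‘before’ and ‘after’ labels were meaningless, ie if the Null Hypothesis held.

```python
import random
from statistics import fmean


def permutation_p_value(before, after, n_shuffles=10000, seed=0):
    """Estimate how often shuffled labels give a mean difference >= the observed one."""
    rng = random.Random(seed)
    observed = abs(fmean(after) - fmean(before))
    pooled = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign 'before'/'after' labels at random
        diff = abs(fmean(pooled[n_before:]) - fmean(pooled[:n_before]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical FVC readings (litres), made up purely for this example.
before = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.5, 3.2]
after_no_change = [3.2, 3.3, 3.0, 3.5, 3.4, 3.0, 3.6, 3.1]
after_shifted = [3.9, 4.2, 3.7, 4.4, 4.1, 3.8, 4.3, 4.0]

p_same = permutation_p_value(before, after_no_change)
p_diff = permutation_p_value(before, after_shifted)
print(p_same)  # large: the data are consistent with the Null Hypothesis
print(p_diff)  # small: such a shift would be rare if nature had not responded
```

A small value does not prove that nature responded; it only says the observed difference would be unusual if nothing had changed.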

Summary. Variation often arises due to the complexity of the system studied (eg humans), eg lung function varies since the factors affecting it vary. Displaying data as frequency distributions often → Bell Curve/Normal Distribution. Simple systems (quincunx, dice) → Norm Dist. Complex systems also → Norm Dist, because of the Central Limit Theorem. If we know how we expect our data to distribute, we can estimate the probability that the distribution changes when we do experiments, ie we can do science!