Introduction to Distributions and Probability


Introduction to Distributions and Probability
Statistics for Health Research
Peter T. Donnan, Professor of Epidemiology and Biostatistics

Overview
Distributions; history of probability; definitions of probability; random variables; probability density functions; the Normal, Binomial and Poisson distributions.

Introduction to Probability Density Functions
Normal distribution (also called the Gaussian distribution or bell curve); Poisson distribution (named after a French mathematician); Binomial distribution (related to binary factors, i.e. Bernoulli trials).

Early use of the Normal Distribution
Gauss, a German mathematician, solved the mystery of where Ceres would reappear after it disappeared behind the Sun. By assuming that the observational errors followed a Normal distribution, he accurately predicted the orbit of Ceres.

What is the relationship between the Normal or Gaussian distribution and probability?

Probability
“I cannot believe that God plays dice with the cosmos” (Albert Einstein)
“The probable is what usually happens” (Aristotle)

Origins of Probability
- Early interest in permutations: Vedic literature, 400 BC
- Distinguished origins in betting and gambling! Pascal and Fermat studied the division of stakes in gambling (1654)
- Enlightenment: probability seen as helping public policy and social equity
- Astronomy: Gauss (1801)
- Social and genetic applications: Galton (1885)
- Experimental design: Fisher (1936)

Types of Probability
There are two basic definitions:
1) Frequentist (classical): the proportion of times an event occurs in a long series of ‘trials’.
2) Subjectivist (Bayesian): the strength of belief in an event happening.

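Frequentists
Consider tossing a fair coin. In any trial the event may be a ‘head’ or a ‘tail’, i.e. it is binary. Repeated tossing gives a series of ‘events’, and in the long run the probability of heads = 0.5. An example series of 49 tosses:
THTTHHHHTHHHTHHHTTHTTTHHTTHTTHHHTTTHHTHHHTTTTTHHH
[Figure: running proportion of heads plotted against the number of tosses, vertical axis from about 0.52 to 0.6]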
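A short Python sketch of the long-run idea, assuming a simulated fair coin (the seed and the number of tosses are arbitrary choices, not taken from the slides). It also counts the heads in the 49-toss example series.

```python
import random

def running_proportion_of_heads(n_tosses, seed=2024):
    """Simulate a fair coin and return the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed only so the run is reproducible
    heads = 0
    proportions = []
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True adds 1 (a head), False adds 0
        proportions.append(heads / i)
    return proportions

# The example series from the slide
series = "THTTHHHHTHHHTHHHTTHTTTHHTTHTTHHHTTTHHTHHHTTTTTHHH"
print(series.count("H"), len(series), round(series.count("H") / len(series), 3))

props = running_proportion_of_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))  # drifts towards 0.5 as n grows
```

A short series can sit well away from 0.5; only the long-run proportion settles down.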

Frequentist Probability
Note the difference between the ‘long run’ probability and an individual trial. In an individual trial a head either occurs (X = 1) or does not occur (X = 0). Similarly, a patient either survives or dies following a myocardial infarction (MI). The probability of dying after an MI (≈ 30%) is based on a previous long series from a population of individuals who experienced an MI.

Random Variable
Consider rolling 2 dice, where we want to summarise the probabilities of all possible outcomes. We call the outcome (the sum of the two dice) a random variable X, which in this case can take any value from 2 to 12. Enumerate the probabilities of all outcomes in the sample space S: P(2) = 1/6 × 1/6 = 1/36, P(3) = 2/36, P(4) = 3/36, etc.
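The enumeration can be sketched in Python with exact fractions: list the 36 equally likely outcomes of two fair dice and count how many give each sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) outcomes in the sample space S
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each value of X = die1 + die2
counts = Counter(a + b for a, b in outcomes)

# Exact probabilities P(X = x); Fraction reduces them, e.g. 2/36 becomes 1/18
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x in pmf:
    print(x, f"{counts[x]}/36")  # print as counts out of 36, as on the slide
```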

Probability Density Function for rolling two dice
(For a discrete variable such as this, the function is strictly called a probability mass function.)
x:     2     3     4     5     6     7     8     9     10    11    12
P(x):  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Probability Density Function for rolling two dice
What is the probability of getting 12? Answer: 1/36.
What is the probability of getting more than 8? Answer: P(9) + P(10) + P(11) + P(12) = (4 + 3 + 2 + 1)/36 = 10/36.
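The same enumeration answers both questions exactly; the helper name prob_sum is our own, used for illustration only.

```python
from fractions import Fraction
from itertools import product

def prob_sum(condition):
    """Exact probability that the sum of two fair dice satisfies `condition`."""
    outcomes = list(product(range(1, 7), repeat=2))
    favourable = sum(1 for a, b in outcomes if condition(a + b))
    return Fraction(favourable, len(outcomes))

p12 = prob_sum(lambda s: s == 12)  # 1/36
p_gt8 = prob_sum(lambda s: s > 8)  # 10/36, which Fraction shows reduced as 5/18
print(p12, p_gt8)
```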

Probability Density Function for a continuous variable
[Figure: the discrete two-dice distribution, shown again for comparison]

Consider the distribution of weight in kg: all values are possible, not just discrete ones.
[Figure: probability density curve, probability against weight in kilograms from 20 to 120 kg]

Probability Density Function in SPSS
Use Analyze / Descriptive Statistics / Frequencies, choose not to display the frequency table, and use the Charts box to request a chart, as below.

Probability Density Function in SPSS
Distribution of baseline LDL cholesterol, using data from ‘LDL Data.sav’.

Normal Distribution
Note that a Normal or Gaussian curve is defined by two parameters, the mean µ and the standard deviation σ, and is often written as N(µ, σ²) (some texts write N(µ, σ)). Hence every Normal distribution has the same mathematical form, fixed entirely by µ and σ. The density cannot be integrated in closed form, so areas under the curve are obtained by numerical integration and tabulated!
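To illustrate what ‘numerical integration’ means here, the sketch below applies composite Simpson's rule (our choice of method; the slides do not say how the tables were computed) to the standard Normal density between −1.96 and +1.96, giving an area close to 0.95.

```python
import math

def std_normal_pdf(z):
    """Standard Normal density, N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] with n (even) sub-intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, 2, ...
    return total * h / 3

area = simpson(std_normal_pdf, -1.96, 1.96)
print(round(area, 4))  # close to 0.95
```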

Normal Distribution
The curve is symmetrical about the mean, so P(X > µ) = 0.5 (50%) and P(X < µ) = 0.5 (50%). Also P(a < X < b) = P(X < b) − P(X < a).

Normal Distribution and Probabilities
So we now have a way of working out the probability of any range of values of a variable, IF a Normal distribution is a reasonable fit to the data: P(a < X < b) = P(X < b) − P(X < a), which is the area under the curve between a and b.
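A minimal sketch of P(a < X < b) = P(X < b) − P(X < a) in Python. It writes the Normal cumulative probability through the error function math.erf; the mean of 70 and SD of 10 are hypothetical values, not taken from the LDL data.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), written via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_interval_prob(a, b, mu, sigma):
    """P(a < X < b) = P(X < b) - P(X < a): the area under the curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Hypothetical example: a variable assumed to be N(mu=70, sigma=10)
print(round(normal_interval_prob(60, 80, mu=70, sigma=10), 4))  # within 1 SD of the mean
```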

Normal Distribution
Most of the area (about 68%) lies between −1 and +1 SD of the mean. The large majority (about 95%) lies between −2 and +2 SDs (more precisely, ±1.96 SD covers 95%).
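These areas can be checked with Python's standard-library statistics.NormalDist (Python 3.8+), which supplies the standard Normal cumulative distribution function:

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1

# Area within +/- k SDs of the mean: P(-k < Z < k) = P(Z < k) - P(Z < -k)
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 1.96, 2, 3)}
for k, p in coverage.items():
    print(f"within ±{k} SD: {p:.4f}")
```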

Normal Distribution Probability Density Function (PDF)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
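The density written as a small Python function (the name normal_pdf is our own), with a check of the peak value 1/√(2π) for N(0, 1) and of the symmetry about the mean:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(round(normal_pdf(0), 4))          # peak of N(0, 1): 1/sqrt(2*pi), about 0.3989
print(normal_pdf(1) == normal_pdf(-1))  # symmetric about the mean
```

Note that f(x) is a density, not a probability: probabilities come from areas under it.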

How well does my data fit a Normal Distribution?
Note that the median and mean are virtually the same. Skewness = 0.039, close to zero (skewness is a measure of symmetry; 0 = perfect symmetry). Eyeball test: the fitted Normal curve looks good!
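A sketch of one common skewness estimator, the bias-adjusted coefficient G1. We assume it is comparable to the figure SPSS reports, but that is worth checking against your own output; the two small data sets are made up for illustration.

```python
import math

def sample_skewness(data):
    """Bias-adjusted sample skewness G1 = n/((n-1)(n-2)) * sum(((x - mean)/s)^3).

    0 means perfect symmetry; > 0 means a longer right tail; < 0 a longer left tail.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

print(sample_skewness([1, 2, 3, 4, 5]))       # symmetric data: approximately 0
print(sample_skewness([1, 1, 1, 2, 10]) > 0)  # long right tail: positive skew
```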

Try a Q-Q plot via Analyze / Descriptive Statistics / Q-Q Plots. The plot compares the expected Normal distribution with the real data; if the data lie on the line y = x, the Normal distribution is a good fit. Note that this is still an eyeball test! Is this a good fit?
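The points behind a Normal Q-Q plot can be computed as below. This is a sketch under stated assumptions: the expected quantiles use Blom's plotting positions (i − 3/8)/(n + 1/4), one common convention that may differ from the one SPSS uses; the data are standardised so that a good fit lies near y = x; and the seven values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Return (expected z, observed z) pairs for a Normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    z = NormalDist()
    # Expected Normal quantile for the i-th smallest value (Blom's positions)
    expected = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    # Standardise the observed values so a good fit lies near the line y = x
    observed = [(x - m) / s for x in xs]
    return list(zip(expected, observed))

for e, o in normal_qq_points([4.1, 5.0, 5.3, 5.9, 6.2, 6.8, 7.7]):
    print(f"{e:6.2f} {o:6.2f}")
```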

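I used to be Normal until I discovered Kolmogorov-Smirnov!
The eyeball test indicates the distribution is approximately Normal, but the Kolmogorov-Smirnov (K-S) test is significant, indicating a discrepancy compared with the Normal distribution. WARNING: DO NOT RELY ON THIS TEST. With large samples, even small departures from Normality that do not matter in practice give a significant result.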
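For illustration, the K-S statistic D is the largest vertical gap between the empirical cumulative distribution of the data and the Normal cumulative distribution. The sketch computes D only, not a p-value, and the six values are hypothetical. When µ and σ are estimated from the same data, the standard K-S critical values no longer apply (the Lilliefors correction addresses this).

```python
from statistics import NormalDist

def ks_statistic_normal(data, mu, sigma):
    """Kolmogorov-Smirnov D against N(mu, sigma^2); returns D only, no p-value."""
    xs = sorted(data)
    n = len(xs)
    cdf = NormalDist(mu, sigma).cdf
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1)/n to i/n at x: check the gap on both sides
        d = max(d, i / n - cdf(x), cdf(x) - (i - 1) / n)
    return d

# Hypothetical standardised values compared with N(0, 1)
print(round(ks_statistic_normal([-1.2, -0.4, 0.1, 0.3, 0.9, 1.8], 0, 1), 3))
```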

Consider the distribution of survival times following surgery for colorectal cancer. Note median = 835 days and mean = 848 days. Skewness = 2.081: very skewed (> 1.0), with a strong tail to the right! Approximately Normal?

For right (positively) skewed data, try a log transformation? Better, but now slightly skewed to the left!
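A sketch of the effect of a log transformation on skewness, using hypothetical doubling values rather than the survival data. For these values the logs are equally spaced, so the skew vanishes entirely; real data, like the survival times above, may overshoot and end up slightly left-skewed.

```python
import math

def sample_skewness(data):
    """Bias-adjusted sample skewness (0 = symmetric, > 0 = long right tail)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# Hypothetical right-skewed values (each double the last), not real survival data
raw = [1, 2, 4, 8, 16, 32]
logged = [math.log(x) for x in raw]  # natural log; requires all values > 0

print(round(sample_skewness(raw), 3))     # clearly positive (> 1)
print(round(sample_skewness(logged), 3))  # about 0: the log has removed the skew
```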

Examples of skewed distributions in Health Research
Discrete random variables: hospital admissions, cigarettes smoked, alcohol consumption, costs.
Continuous random variables: BMI, cholesterol, blood pressure (BP).

The Binomial Distribution
‘Binomial’ means ‘two numbers’. Outcomes of health research are often measured by whether an event has occurred or not, i.e. they are binary: for example, recovered from disease, admitted to hospital, died, etc. These may be modelled by assuming that the number of events X in n trials has a binomial distribution with a fixed probability p of the event in each trial.

The Binomial Distribution
Based on the work of Jakob Bernoulli, a Swiss mathematician who refused a church appointment and instead studied mathematics. Early uses were for games of chance, but it is now used very widely. A single trial (n = 1) is called a Bernoulli trial, and the binomial distribution is the distribution of the number of events in a series of independent Bernoulli trials.

The Binomial Distribution
The binomial distribution is written as B(n, p), where n is the total number of trials and p is the probability of an event in each trial.
[Figure: a Binomial distribution with p = 0.25 and n = 20]
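A minimal Python sketch of the binomial probabilities P(X = k) = C(n, k) p^k (1 − p)^(n − k) for the B(20, 0.25) example, using math.comb (Python 3.8+):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.25
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
mode = max(range(n + 1), key=lambda k: probs[k])
print(mode, round(sum(probs), 10))  # most likely count and total probability
```

The most likely count sits at the mean np = 5, and the probabilities over k = 0, ..., 20 sum to 1.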

The Binomial Distribution
Binomial distributions are used for binary factors and so are used to assess percentages or proportions; they underlie cross-tabulation and logistic regression. Note that as n gets larger the Binomial distribution becomes approximately Normal, and the approximation is better when p is close to 0.5: B(n, p) ≈ N(np, np(1 − p)), where the second parameter is the variance.
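To see the approximation at work, the sketch compares exact binomial cumulative probabilities with N(np, np(1 − p)). Two details are our own additions: a continuity correction of 0.5, which is standard practice but not mentioned on the slide, and the particular (n, p, k) cases. statistics.NormalDist takes the SD, hence the square root of the variance.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """P(X <= k) from N(np, np(1 - p)), with a continuity correction of +0.5."""
    mu, var = n * p, n * p * (1 - p)
    return NormalDist(mu, sqrt(var)).cdf(k + 0.5)

for n, p, k in [(20, 0.5, 12), (100, 0.5, 55), (20, 0.05, 2)]:
    exact, approx = binom_cdf(k, n, p), normal_approx_cdf(k, n, p)
    print(f"n={n:3d} p={p:.2f}  exact={exact:.4f}  normal={approx:.4f}")
```

The agreement is close for p = 0.5 and noticeably worse for the small-p case, in line with the slide.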

The Poisson Distribution
The Poisson distribution (1838) is named after its inventor, Siméon Poisson, a French mathematician. He found that if we have a rare event (i.e. p is small) and we know the expected or mean number of occurrences (λ or µ), the probabilities of 0, 1, 2, ... events are given by:
$$P(X = k) = \frac{e^{-\mu}\,\mu^{k}}{k!}, \qquad k = 0, 1, 2, \ldots$$
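The Poisson formula P(X = k) = e^(−µ) µ^k / k! written in Python; µ = 2 is a hypothetical mean number of events per period.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) = e^(-mu) mu^k / k! for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

mu = 2.0  # hypothetical mean number of events per period
for k in range(6):
    print(k, round(poisson_pmf(k, mu), 4))
```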

The Poisson Distribution
Note the similarity to the Binomial. In fact, when p is small and n is large, B(n, p) ≈ Poisson(µ = np). Also, for large values of µ, Poisson(µ) ≈ N(µ, µ). Hence if n and p are not known, we could use the Poisson instead.
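A quick numerical check of B(n, p) ≈ Poisson(np), for a hypothetical rare event with n = 1000 and p = 0.002 (so µ = 2):

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """Poisson probability P(X = k) = e^(-mu) mu^k / k!."""
    return math.exp(-mu) * mu**k / math.factorial(k)

n, p = 1000, 0.002  # hypothetical: many trials, rare event
mu = n * p          # Poisson mean = np = 2
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(11))
print(f"largest difference for k = 0..10: {max_gap:.5f}")
```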

The Poisson Distribution
In health research it is often used to model the number of events assumed to occur at random, e.g.: number of hip replacement failures; number of cases of C. difficile infection; diagnoses of leukaemia around nuclear power stations; number of H1N1 cases in Scotland; etc.

Summary
Many of the variables measured in health research form distributions that approximate common distributions with known mathematical properties: Normal, Poisson, Binomial, etc. Note that all of them are built around the exponential function e^x, where e ≈ 2.718, and all belong to the exponential family of distributions. These probability distributions are critical to applying statistical methods.

SPSS Practical
- Read in the data file ‘LDL Data.sav’.
- Consider adherence to statins, baseline LDL, minimum cholesterol achieved, BMI and duration of statin use.
- Assess the distributions for normality; if non-normal, consider a transformation.
- Try to carry out Q-Q plots.