USC3002 Picturing the World Through Mathematics Wayne Lawton Department of Mathematics S14-04-04, 65162749 Theme for Semester I, 2007/08.


USC3002 Picturing the World Through Mathematics Wayne Lawton Department of Mathematics S14-04-04, 65162749 Theme for Semester I, 2007/08: The Logic of Evolution, Mathematical Models of Adaptation from Darwin to Dawkins

MOTIVATION Probability and Statistics play an increasingly crucial role in evolution research.

SP2170 Doing Science Lecture 3: Random Variables, Distributions, Inductive & Abductive Reasoning, Experiments SOURCE OF LECTURE VUFOILS

REFERENCES
[1] Rudolf Carnap, An Introduction to the Philosophy of Science, Dover, N.Y.
[2] Leong Yu Kang, Living With Mathematics, McGraw Hill, Singapore. (GEM Textbook) (1 Reasoning, 2 Counting, 3 Graphing, 4 Clocking, 5 Coding, 6 Enciphering, 7 Chancing, 8 Visualizing)
MATLAB Demo: Random Variables & Distributions.
Discuss Topics in Chap. 2-4 in [1], Chap. 1, 7 in [2]: Bayes' Theorem & The Envelope Problem; Deductive, Inductive, and Abductive Reasoning.
Assign computational tutorial problems.

RANDOM VARIABLES The number that faces up on an ‘unloaded’ die rolled on a flat surface is in the set { 1, 2, 3, 4, 5, 6 }, and each number is equally probable, hence has probability 1/6. After rolling a die, the number is fixed to those who know it but remains an unknown, or random, variable to those who do not know it. Even while it is still rolling, a person with a laser sensor connected to a sufficiently powerful computer may be able to predict with some accuracy the number that will come up. This happened and the Casino was not amused !

MATLAB PSEUDORANDOM VARIABLES The MATLAB (software) function rand generates decimal numbers d/10000 that behave as if d is a random variable with values in the set {0,1,2,…,9999} with equal probability. It is a pseudorandom variable. It provides an approximation of a random variable x with values in the interval [0,1] of real numbers such that for all 0 < a < b < 1 the probability that x is in the interval [a,b] equals b-a = length of [a,b]. Such an x is called a uniformly distributed random variable.
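The interval property can be checked empirically. The sketch below uses Python's standard `random` module as a stand-in for MATLAB's `rand` (an assumption, since the slides use MATLAB); the seed, the sample size and the endpoints a and b are arbitrary illustrative choices.

```python
import random

def fraction_in_interval(a, b, n_samples=100_000, seed=0):
    """Estimate Prob(a <= x <= b) for a pseudorandom x uniform on [0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_samples) if a <= rng.random() <= b)
    return hits / n_samples

if __name__ == "__main__":
    a, b = 0.2, 0.7  # hypothetical endpoints with 0 < a < b < 1
    print(fraction_in_interval(a, b), "should be close to", b - a)
```

The estimate fluctuates around b-a, and the fluctuation shrinks as the number of samples grows.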

PROBABILITY DISTRIBUTIONS Random variables with values in a set of integers are described by discrete distributions.
Uniform (Dice): Prob(x = k) = 1/6 for k = 1,…,6.
Poisson: Prob(x = k) = a^k exp(-a) / k! for k = 0,1,2,…, where x = k is the event that k atoms of radium decay and a is the average number of atoms expected to decay.
Binomial: Prob(x = k) = a^k (1-a)^(n-k) n! / ((n-k)! k!) for k = 0,1,…,n, where x = k means that an event that has probability a occurs k times out of a maximum of n times, and k! = 1*2*…*(k-1)*k is called k factorial.
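As a quick sanity check on the Poisson and Binomial formulas, the following Python sketch (Python rather than MATLAB is an assumption made here) evaluates both and confirms that the probabilities add up to 1. The function names and the parameter values are hypothetical.

```python
import math

def poisson_pmf(k, a):
    """Prob(x = k) for a Poisson distribution with average a."""
    return a**k * math.exp(-a) / math.factorial(k)

def binomial_pmf(k, n, a):
    """Prob(x = k): an event of probability a occurs k times in n trials."""
    return math.comb(n, k) * a**k * (1 - a) ** (n - k)

if __name__ == "__main__":
    # Each distribution should sum to 1 (the Poisson sum is truncated at k = 59).
    print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
    print(sum(poisson_pmf(k, 2.5) for k in range(60)))
    # Two heads in four fair coin flips: 6 * (1/2)^4
    print(binomial_pmf(2, 4, 0.5))
```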

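PROBABILITY DISTRIBUTIONS Random variables with values in a set of real numbers are described by continuous distributions.
Uniform over the interval [0,1]: $p(x) = 1$ for $0 \le x \le 1$ and $p(x) = 0$ otherwise.
Gaussian or Normal: $p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, here $\mu$ is the mean and $\sigma^2$ is the variance.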

MATLAB HELP COMMAND
>> help hist
HIST Histogram. N = HIST(Y) bins the elements of Y into 10 equally spaced containers and returns the number of elements in each container. If Y is a matrix, HIST works down the columns. N = HIST(Y,M), where M is a scalar, uses M bins.
>> help rand
RAND Uniformly distributed random numbers. RAND(N) is an N-by-N matrix with random entries, chosen from a uniform distribution on the interval (0.0,1.0). RAND(M,N) is a M-by-N matrix with random entries.

MATLAB DEMONSTRATION 1 Why do these histograms look different ?

MATLAB DEMONSTRATION 2
>> x = rand(10000,1);
>> hist(x,41)

MORE MATLAB HELP COMMANDS
>> help randn
RANDN Normally distributed random numbers. RANDN(N) is an N-by-N matrix with random entries, chosen from a normal distribution with mean zero, variance one and standard deviation one. RANDN(M,N) is a M-by-N matrix with random entries.
>> help sum
SUM Sum of elements. For vectors, SUM(X) is the sum of the elements of X. For matrices, SUM(X) is a row vector with the sum over each column.

MATLAB DEMONSTRATION 3
>> s = -4:.001:4;
>> plot(s,exp(-s.^2/2)/(sqrt(2*pi)))
>> grid

MATLAB DEMONSTRATION 3
>> x = randn(10000,1);
>> hist(x,41)

MATLAB DEMONSTRATION 3
>> x = rand(5000,10000);
>> y = sum(x);
>> hist(y,41)

CENTRAL LIMIT THEOREM The sum of N real-valued random variables y = x(1) + x(2) + … + x(N) is a random variable. If the x(j) are independent and have the same distribution (with finite variance) then as N increases the distribution of y is approximated better and better by a Gaussian distribution. The mean of this Gaussian distribution = N times the (common) mean of the x(j). The variance of this Gaussian distribution = N times the (common) variance of the x(j).
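The mean and variance rules can be checked by simulation. This is a minimal sketch in Python (an assumption, the slides use MATLAB) that sums N uniform pseudorandom numbers many times over; N = 50, the number of repetitions and the seed are arbitrary choices, and the uniform distribution on [0,1] is taken to have mean 1/2 and variance 1/12.

```python
import random
import statistics

def sums_of_uniforms(n_terms, n_sums, seed=0):
    """Return n_sums independent values of y = x(1) + ... + x(n_terms)."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_sums)]

if __name__ == "__main__":
    N = 50
    y = sums_of_uniforms(N, 5000)
    print("mean of y    ", statistics.mean(y), "expected about", N / 2)
    print("variance of y", statistics.pvariance(y), "expected about", N / 12)
```

A histogram of `y` would show the bell shape that the theorem predicts.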

CONDITIONAL PROBABILITY Recall that on my die the ‘numbers’ 1 and 4 are red and the numbers 2, 3, 5, 6 are blue. I roll one die without letting you see how it rolls. What is the probability that I rolled a 4 ? I repeat the procedure BUT tell you that the number is red. What is the probability that I rolled a 4 ? This probability is called the conditional probability that x = 4 given that x is red (i.e. x in {1,4}).

CONDITIONAL PROBABILITY If A and B are two events then $A \cap B$ denotes the event that BOTH event A and event B happen. Common sense implies the following LAW: $\mathrm{Prob}(A \cap B) = \mathrm{Prob}(A \mid B)\,\mathrm{Prob}(B)$. Example Consider the roll of a die. Let A be the event x = 4 and let B be the event x is red (= 1 or 4). Question What does the LAW say here ?
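The die example can be worked by counting faces. The sketch below is in Python (an assumption) and takes the law to be the usual product rule Prob(A and B) = Prob(A | B) Prob(B), which is also an assumption about the slide; the helper names are hypothetical.

```python
from fractions import Fraction

FACES = [1, 2, 3, 4, 5, 6]   # outcomes of one fair die
RED = {1, 4}                 # faces painted red on the lecturer's die

def prob(event):
    """Probability that a fair die lands on a face satisfying `event`."""
    favourable = [x for x in FACES if event(x)]
    return Fraction(len(favourable), len(FACES))

def conditional_prob_four_given_red():
    """Prob(A | B) = Prob(A and B) / Prob(B) with A: x = 4, B: x is red."""
    p_b = prob(lambda x: x in RED)
    p_a_and_b = prob(lambda x: x == 4 and x in RED)
    return p_a_and_b / p_b

if __name__ == "__main__":
    print("Prob(x = 4)            =", prob(lambda x: x == 4))
    print("Prob(x = 4 | x is red) =", conditional_prob_four_given_red())
```

Exact fractions are used so that the two answers can be compared without rounding.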

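BAYES' THEOREM $\mathrm{Prob}(A \mid B) = \frac{\mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A)}{\mathrm{Prob}(B)}$. Prob(A) and Prob(B) are called marginal distributions. For an event A, $A^c$ denotes the event not A. Question Why does $\mathrm{Prob}(B) = \mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A) + \mathrm{Prob}(B \mid A^c)\,\mathrm{Prob}(A^c)$ ?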
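Bayes' theorem can be tried on the same red and blue die. In this Python sketch (Python is an assumption), the choice of A as "x is even" and B as "x is red" is hypothetical and made only for illustration; Prob(B) is obtained by splitting on A and not A.

```python
from fractions import Fraction

FACES = [1, 2, 3, 4, 5, 6]
RED = {1, 4}

def prob(event, given=lambda x: True):
    """Prob(event | given) for a fair die, found by counting faces."""
    pool = [x for x in FACES if given(x)]
    return Fraction(sum(1 for x in pool if event(x)), len(pool))

def is_even(x):   # event A (hypothetical choice)
    return x % 2 == 0

def is_odd(x):    # event not A
    return x % 2 == 1

def is_red(x):    # event B
    return x in RED

def bayes_posterior():
    """Prob(A | B) = Prob(B | A) Prob(A) / Prob(B)."""
    p_a = prob(is_even)
    p_b_given_a = prob(is_red, given=is_even)
    p_b_given_not_a = prob(is_red, given=is_odd)
    # Prob(B) from the two cases A and not A
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

if __name__ == "__main__":
    print("via Bayes      :", bayes_posterior())
    print("direct counting:", prob(is_even, given=is_red))
```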

INDUCTIVE & ABDUCTIVE REASONING Inductive reasoning is the process of reasoning in which the premises of an argument support the conclusion but do not ensure it. This is in contrast to deductive reasoning, in which the conclusion is necessitated by, or reached from, previously known facts. The philosopher Charles Peirce introduced abduction into modern logic. In his works before 1900, he mostly uses the term to mean the use of a known rule to explain an observation, e.g., “if it rains the grass is wet” is a known rule used to explain that the grass is wet. He later used the term to mean creating new rules to explain new observations, emphasizing that abduction is the only logical process that actually creates anything new. Namely, he described the process of science as a combination of abduction, deduction and induction, stressing that new knowledge is only created by abduction. Abductive reasoning is the process of reasoning to the best explanation. In other words, it is the reasoning process that starts from a set of facts and derives their most likely explanations.

EXPERIMENTS Carnap p. 41 [1]: “One of the great distinguishing features of modern science, as compared to the science of earlier periods, is its emphasis on what is called the ‘experimental method’.” Question How does the experimental method differ from the method of observation ? Question What fields favor the experimental method and what fields do not, and why ? Ideal Gas Law - one of the greatest experiments !

TUTORIAL QUESTIONS
Question 1. The uniform distribution on [0,1] has mean ½ and variance 1/12. Use the Central Limit Theorem to compute the mean and variance of the random variable y whose histogram is shown in vufoil # 13.
Question 2. I roll a die to get a random variable x in {1,2,3,4,5,6}, then put x dollars in one envelope and 2x dollars in another envelope, then flip a coin to decide which envelope to give you (so that you receive the smaller or larger amount with equal probability). Use Bayes' Theorem to compute the probability that you received the smaller amount CONDITIONED on YOUR FINDING THAT YOU HAVE 1,2,3,4,5,6,8,10,12 dollars. Then use these conditional probabilities to explain the Envelope Paradox.

USC2170 Lecture 4: Hypothesis Testing SOURCE OF LECTURE VUFOILS

PLAN FOR LECTURE
1. Populations and Samples
2. Sample Population Statistics
3. Statistical Hypothesis
4. Test Statistics for Gaussian Hypotheses: Sample Mean for Parameter Estimation; z-Test and t-Test Statistics; Rejection/Critical Region for z-Test Statistic; Hypothesis Test for Mean Height
5. General Hypotheses Tests: Type I and Type II Errors; Null and Alternative Hypotheses
6. Assign Tutorial Problems

POPULATIONS AND SAMPLES
Population - a specified collection of quantities, e.g. heights of males in a country, glucose levels of a collection of blood samples, batch yields of an industrial compound for a chemical plant over a specified time with and without the use of a catalyst.
Sample Population - a population from which samples are taken to be used for statistical inference.
Sample - the subset of the sample population consisting of the samples that are taken.

SAMPLE POPULATION PARAMETERS Sample: $x_1, \dots, x_n$. Sample Size: $n$. Sample Parameters: Sample Mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, Sample Variance $s^2$, Sample Standard Deviation $s = \sqrt{s^2}$.

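SAMPLE POPULATION PARAMETERS Theorem 1 The variance of a population is related to its mean and average squared values by $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$, where $x_1, \dots, x_N$ are the members of the population, $\mu$ is its mean and $\sigma^2$ is its variance. Proof Since $(x_i - \mu)^2 = x_i^2 - 2\mu x_i + \mu^2$, the variance is $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i^2 - 2\mu x_i + \mu^2\right)$. Question How can the proof be completed ? Why ?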
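Theorem 1 is easy to check numerically. This Python sketch (Python is an assumption) assumes the variance is the average squared deviation from the mean, with divisor N, and compares it with the average of the squares minus the square of the mean; the data values are made up.

```python
import statistics

def variance_via_theorem(values):
    """Average of the squares minus the square of the average."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean**2

if __name__ == "__main__":
    population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
    print(variance_via_theorem(population))
    print(statistics.pvariance(population))  # divisor N, same definition
```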

STATISTICAL HYPOTHESES are assertions about a population that describe some statistical properties of the population. Typically, statistical hypotheses assert that a population consists of independent samples of a random variable that has a certain type of distribution, and some of the parameters that describe this distribution may be specified. For Gaussian distributions there are four possibilities:
Neither the mean nor the variance is specified.
Only the variance is specified.
Only the mean is specified.
Both the mean and the variance are specified.

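TEST STATISTICS for Hypotheses with Gaussian Distributions The sample mean $\bar{x}$, for $\mu$ unknown and $\sigma$ known, is Gaussian: the normalized error $(\bar{x} - \mu)/\sigma$ has mean 0 and variance $1/n$. Proof (Outline) We let $\langle Y \rangle$ denote the mean of a random variable $Y$. Then clearly $\langle \bar{x} \rangle = \mu$. Independence and Theorem 1 give $\langle (\bar{x} - \mu)^2 \rangle = \sigma^2/n$.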

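PARAMETER ESTIMATION for Hypotheses with Gaussian Distributions The sample mean $\bar{x}$, for $\mu$ unknown and $\sigma$ known, can be used to estimate the mean $\mu$ since the estimate error $\bar{x} - \mu$ is unbiased (its mean is 0) and converges in the statistical sense that its mean square, $\sigma^2/n$, tends to 0 as $n \to \infty$.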
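The behaviour of the sample mean can be seen in a simulation. The Python sketch below (Python is an assumption) draws many samples of size n from a Gaussian population and measures how the sample means scatter; the population mean 170, standard deviation 8, sample size 20 and seed are hypothetical values chosen for illustration, and the variance of the sample mean is assumed to be the population variance divided by n.

```python
import random

def variance_of_sample_means(mu, sigma, n, trials=4000, seed=0):
    """Return (average, variance) of `trials` sample means of size n."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    grand_mean = sum(means) / trials
    variance = sum((m - grand_mean) ** 2 for m in means) / trials
    return grand_mean, variance

if __name__ == "__main__":
    mu, sigma, n = 170.0, 8.0, 20  # hypothetical height population, in cm
    grand_mean, variance = variance_of_sample_means(mu, sigma, n)
    print("average of sample means ", grand_mean, "expected about", mu)
    print("variance of sample means", variance, "expected about", sigma**2 / n)
```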

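MORE TEST STATISTICS for Hypotheses with Gaussian Distributions The One Sample z-Test statistic, for $\mu$ and $\sigma$ known, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, is a Gaussian random variable with mean 0, variance 1. The One Sample t-Test statistic, for $\mu$ known and $\sigma$ unknown, $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ with $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, is a t-distributed random variable with n-1 degrees of freedom.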
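Both statistics are short computations. The sketch below is in Python (an assumption) and takes the standard forms z = (sample mean - mu) / (sigma / sqrt(n)) and t = (sample mean - mu) / (s / sqrt(n)), where s is the sample standard deviation with divisor n-1; these forms and the tiny data set are assumptions, not taken from the slide.

```python
import math
import statistics

def z_statistic(sample, mu, sigma):
    """One sample z-test statistic: mean mu and std deviation sigma hypothesized."""
    n = len(sample)
    return (statistics.mean(sample) - mu) / (sigma / math.sqrt(n))

def t_statistic(sample, mu):
    """One sample t-test statistic: only the mean mu is hypothesized."""
    n = len(sample)
    s = statistics.stdev(sample)  # divisor n - 1
    return (statistics.mean(sample) - mu) / (s / math.sqrt(n))

if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5]  # hypothetical data
    print("z =", z_statistic(sample, mu=2, sigma=1))
    print("t =", t_statistic(sample, mu=2))
```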

z-TEST STATISTIC ALPHAS

CRITICAL REGION FOR alpha=0.05

HEIGHT HISTOGRAMS

HYPOTHESIS TEST FOR MEAN HEIGHT You suspect that the height of males in a country has increased due to diet or a Martian conspiracy. You aim to support your Alternative Hypothesis by testing the Null Hypothesis. You compute a sample mean using 20 samples, then compute If the Null Hypothesis is true the probability that is Question Should the Null Hypothesis be rejected ?
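A test of this kind can be sketched in Python (an assumption, the slides use MATLAB). The numbers are hypothetical and are not the ones on the slide: the null hypothesis is taken to be Gaussian heights with mean 170 cm and known standard deviation 8 cm, the observed sample mean of 20 samples is taken to be 173.5 cm, and the test is one-sided at significance level 0.05.

```python
import math

def upper_tail_probability(z):
    """Prob(Z >= z) for a Gaussian Z with mean 0 and variance 1."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def one_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, probability of a value this large under the null, reject?)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = upper_tail_probability(z)
    return z, p_value, p_value < alpha

if __name__ == "__main__":
    # All four numbers below are hypothetical illustration values.
    z, p_value, reject = one_sided_z_test(173.5, mu0=170.0, sigma=8.0, n=20)
    print("z =", z, " probability =", p_value, " reject null:", reject)
```

The null hypothesis is rejected when the probability of seeing a sample mean at least this large, assuming the null hypothesis is true, falls below the chosen significance level.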

GENERAL HYPOTHESES TESTS involve more complicated test statistics, such as the One Sample t-Test statistic, whose distribution is determined even though the distribution of the Gaussian random samples used to compute it is not.
Type I Error: rejecting the null hypothesis if it is true; its probability is also called the significance level.
Type II Error: failing to reject the null hypothesis if it is false; 1 minus its probability is called the power of the test. Computing it requires an Alternative Hypothesis that determines the distribution of the test statistic.
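The link between the two error types can be made concrete for a one-sided z-test. In this Python sketch (Python is an assumption) power is taken to mean the probability of rejecting the null hypothesis when the alternative is true, that is, 1 minus the Type II error probability; the means, standard deviation and sample size are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Prob(reject null) when the true mean is mu1 and sigma is known."""
    standard = NormalDist()                 # mean 0, standard deviation 1
    z_crit = standard.inv_cdf(1 - alpha)    # reject when z > z_crit
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    # Under the alternative, z is Gaussian with mean `shift` and variance 1.
    return 1 - standard.cdf(z_crit - shift)

if __name__ == "__main__":
    for mu1 in (170.0, 172.0, 175.0):  # hypothetical true means, in cm
        print(mu1, power_one_sided_z(170.0, mu1, sigma=8.0, n=20))
```

When the true mean equals the null mean the result is the significance level itself, and it grows as the true mean moves away from the null value.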

TUTORIAL QUESTIONS
1. Compute the power of a hypothesis test whose null hypothesis is that in vufoil #13, the alternative hypothesis asserts that heights are normally distributed with where and are the same as for the null hypothesis and 20 samples are used and the significance Suggestion: if the alternative hypothesis is true, what is the distribution of test statistic What is the probability that
2. Use a t-statistic table to describe how to test the null hypothesis that heights are normal with mean and unknown variance based on 20 samples.

EXTRA TOPIC: CONFIDENCE INTERVALS Given a sample mean $\bar{x}$ for large $n$ we can assume, by the central limit theorem, that it is Gaussian with mean $\mu$ = mean of the original population and variance $\sigma^2/n$, where $\sigma^2$ = variance of the original population. We say that $\mu \in [\bar{x} - c, \bar{x} + c]$ with confidence $\int_{-c}^{c} p(x)\,dx$, where p(x) is the probability density of a Gaussian with mean 0 and standard deviation $\sigma/\sqrt{n}$. Furthermore, $\sigma^2 \approx$ sample variance, and if the population is {0,1}-valued then $\sigma^2 = \mu(1-\mu)$. Theorem If is a random variable unif. on [-L,L] then Bayes Theorem ⇒
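A confidence interval for a mean can be sketched as follows in Python (an assumption). The sketch assumes the large-sample Gaussian approximation, replaces the unknown population standard deviation by the sample standard deviation, and uses the numbers 1 to 100 as stand-in data; the 95% level is an arbitrary choice.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Interval centered on the sample mean, Gaussian approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Half-width in units of the standard error, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

if __name__ == "__main__":
    data = list(range(1, 101))  # hypothetical sample
    print(mean_confidence_interval(data))
```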

EXTRA TOPIC: TWO SAMPLE TESTS A null hypothesis may assert that two populations have the same mean; a special case for {0,1}-valued populations asserts equality of population proportions. Under these assumptions, and if the variances of both populations are known, hypothesis testing uses the Two-Sample z-Test Statistic $z = \frac{\bar{x} - \tilde{x}}{\sqrt{\sigma^2/n + \tilde{\sigma}^2/\tilde{n}}}$, where $\bar{x}, \sigma^2, n$ are the sample mean, variance, and sample size for one population, and tildes mark the same quantities for the other. For unknown variances and other cases consult:
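The two-sample statistic is a one-line computation. This Python sketch (Python is an assumption) uses the standard form, the difference of the sample means divided by the square root of the sum of each variance over its sample size, which is assumed here to match the slide; the input numbers are hypothetical.

```python
import math

def two_sample_z(mean_1, var_1, n_1, mean_2, var_2, n_2):
    """Two-sample z statistic for populations with known variances."""
    return (mean_1 - mean_2) / math.sqrt(var_1 / n_1 + var_2 / n_2)

if __name__ == "__main__":
    # Hypothetical summaries of two samples: (mean, variance, size).
    print(two_sample_z(10.0, 4.0, 25, 8.0, 9.0, 36))
```

Under the null hypothesis of equal means this statistic is Gaussian with mean 0 and variance 1, so it is compared with the same critical values as the one sample z-test.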

EXTRA TOPIC: CHI-SQUARED TESTS are used to determine goodness-of-fit for various distributions. They employ test statistics of the form $\chi^2 = \sum_{i=1}^{d} \frac{(o_i - e_i)^2}{e_i}$, where the $o_i$ are observations & null hyp. ⇒ expected values $e_i$. If the observations are independent, the statistic is approximately chi-squared distrib. with d-1 degrees of freedom.
Example [1,p.216] A geneticist claims that four species of fruit flies should appear in the ratio 1:3:3:9. Suppose that the sample of 4000 flies contained 226, 764, 733, and 2277 flies of each species, respectively. For alpha = .1, is there sufficient evidence to reject the geneticist’s claim ?
Answer: The expected values are 250, 750, 750, 2250, hence $\chi^2 = \frac{24^2}{250} + \frac{14^2}{750} + \frac{17^2}{750} + \frac{27^2}{2250} \approx 3.27$. NO, since 3 deg. freed. & alpha = .1 ⇒ critical value ≈ 6.25 > 3.27.
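The fruit fly example can be reproduced in a few lines of Python (an assumption, the slides use MATLAB). The statistic is assumed to be the usual sum of (observed - expected)^2 / expected, and the critical value 6.251 for 3 degrees of freedom at alpha = 0.1 is copied from a standard chi-squared table rather than computed.

```python
def expected_counts(total, ratio):
    """Split `total` observations according to the claimed ratio."""
    return [total * r / sum(ratio) for r in ratio]

def chi_squared_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

OBSERVED = [226, 764, 733, 2277]
EXPECTED = expected_counts(4000, [1, 3, 3, 9])
CRITICAL_VALUE = 6.251  # table value: 3 degrees of freedom, alpha = 0.1

if __name__ == "__main__":
    statistic = chi_squared_statistic(OBSERVED, EXPECTED)
    print("expected counts:", EXPECTED)
    print("chi-squared statistic:", statistic)
    print("reject the claim:", statistic > CRITICAL_VALUE)
```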

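EXTRA TOPIC: POISSON APPROXIMATION The Binomial Distribution $B(k; n, p) = \frac{n!}{(n-k)!\,k!}\, p^k (1-p)^{n-k}$ is the probability that k events happen in n trials if each event has probability $p$. It has mean $np$ and variance $np(1-p)$. If $n \to \infty$ and $p \to 0$ with $np = a$ fixed, then $B(k; n, p) \to \frac{a^k e^{-a}}{k!}$. The right side is the Poisson Distribution.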
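The quality of the approximation can be inspected numerically. This Python sketch (Python is an assumption) compares the binomial probabilities for many trials and a small per-trial probability with Poisson probabilities of the same mean; n = 1000 and p = 0.003 are hypothetical values, and only k up to 30 is compared because both probabilities are negligible beyond that.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k events in n trials, each with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, a):
    """Poisson probability of k events when the average is a."""
    return a**k * math.exp(-a) / math.factorial(k)

def largest_gap(n, p, k_max=30):
    """Largest difference between the two distributions for k = 0..k_max."""
    a = n * p  # matching mean
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, a))
               for k in range(k_max + 1))

if __name__ == "__main__":
    print("largest gap for n=1000, p=0.003:", largest_gap(1000, 0.003))
    print("largest gap for n=10,   p=0.3  :", largest_gap(10, 0.3, k_max=10))
```

Both cases have mean 3, but the gap is much smaller when n is large and p is small.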

REFERENCES
1. Martin Sternstein, Statistics, Barron's College Review Series, New York. Survey textbook covers probability distributions, hypotheses tests, populations, samples, chi-squared analysis, regression.
2. E. L. Lehmann, Testing Statistical Hypotheses, New York. Detailed development of the Neyman-Pearson theory of hypotheses testing.
3. J. Neyman and E. S. Pearson, Joint Statistical Papers, Cambridge University Press. Source materials.
4. Jan von Plato, Creating Modern Probability, Cambridge University Press. Charts the history and development of modern probability theory.