Statistics flavors Descriptive Inferential – Non-parametric / Monte Carlo – Parametric Model estimation (sometimes used for inference) – Maximum likelihood.

Slides:



Advertisements
Similar presentations
1 COMM 301: Empirical Research in Communication Lecture 15 – Hypothesis Testing Kwan M Lee.
Advertisements

Statistics Review – Part II Topics: – Hypothesis Testing – Paired Tests – Tests of variability 1.
10-3 Inferences.
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
FTP Biostatistics II Model parameter estimations: Confronting models with measurements.
Regression Analysis Once a linear relationship is defined, the independent variable can be used to forecast the dependent variable. Y ^ = bo + bX bo is.
Statistics: Purpose, Approach, Method. The Basic Approach The basic principle behind the use of statistical tests of significance can be stated as: Compare.
© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
1 Analysis of Variance This technique is designed to test the null hypothesis that three or more group means are equal.
Elementary hypothesis testing
Elementary hypothesis testing
Elementary hypothesis testing Purpose of hypothesis testing Type of hypotheses Type of errors Critical regions Significant levels Hypothesis vs intervals.
MARE 250 Dr. Jason Turner Hypothesis Testing III.
1 Hypothesis Testing In this section I want to review a few things and then introduce hypothesis testing.
Aaker, Kumar, Day Seventh Edition Instructor’s Presentation Slides
Chapter 9 Hypothesis Testing.
Independent Sample T-test Classical design used in psychology/medicine N subjects are randomly assigned to two groups (Control * Treatment). After treatment,
Regression and Correlation Methods Judy Zhong Ph.D.
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
Introduction to Linear Regression and Correlation Analysis
Inference for regression - Simple linear regression
Statistical Analysis A Quick Overview. The Scientific Method Establishing a hypothesis (idea) Collecting evidence (often in the form of numerical data)
Inference for Linear Regression Conditions for Regression Inference: Suppose we have n observations on an explanatory variable x and a response variable.
Introduction to MCMC and BUGS. Computational problems More parameters -> even more parameter combinations Exact computation and grid approximation become.
6.1 - One Sample One Sample  Mean μ, Variance σ 2, Proportion π Two Samples Two Samples  Means, Variances, Proportions μ 1 vs. μ 2.
MARE 250 Dr. Jason Turner Hypothesis Testing III.
+ Chapter 12: Inference for Regression Inference for Linear Regression.
1 rules of engagement no computer or no power → no lesson no SPSS → no lesson no homework done → no lesson GE 5 Tutorial 5.
Collaboration and Data Sharing What have I been doing that’s so bad, and how could it be better? August 1 st, 2010.
Inference and Inferential Statistics Methods of Educational Research EDU 660.
Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
PCB 3043L - General Ecology Data Analysis. OUTLINE Organizing an ecological study Basic sampling terminology Statistical analysis of data –Why use statistics?
+ Chapter 12: More About Regression Section 12.1 Inference for Linear Regression.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
Lecture 17 Dustin Lueker.  A way of statistically testing a hypothesis by comparing the data to values predicted by the hypothesis ◦ Data that fall far.
Chapter 8 Delving Into The Use of Inference 8.1 Estimating with Confidence 8.2 Use and Abuse of Tests.
Three Frameworks for Statistical Analysis. Sample Design Forest, N=6 Field, N=4 Count ant nests per quadrat.
KNR 445 Statistics t-tests Slide 1 Introduction to Hypothesis Testing The z-test.
1 URBDP 591 A Lecture 12: Statistical Inference Objectives Sampling Distribution Principles of Hypothesis Testing Statistical Significance.
Chapter 8: Simple Linear Regression Yang Zhenlin.
PCB 3043L - General Ecology Data Analysis.
Inferential Statistics Introduction. If both variables are categorical, build tables... Convention: Each value of the independent (causal) variable has.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 12 More About Regression 12.1 Inference for.
Introduction to Medical Statistics. Why Do Statistics? Extrapolate from data collected to make general conclusions about larger population from which.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
1 Lecture 5 Introduction to Hypothesis Tests Slides available from Statistics & SPSS page of Social Science Statistics Module.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 12 More About Regression 12.1 Inference for.
Statistical principles: the normal distribution and methods of testing Or, “Explaining the arrangement of things”
Collaboration and data sharing in the Distributed Graduate Seminars Role of NCEAS in the DGS Coordination Collaborative webspace Grad Research Associate.
15 Inferential Statistics.
STAT 312 Chapter 7 - Statistical Intervals Based on a Single Sample
Chapter 13 Simple Linear Regression
Chapter 14 Introduction to Multiple Regression
Statistical Quality Control, 7th Edition by Douglas C. Montgomery.
Correlation and Simple Linear Regression
STA 291 Spring 2010 Lecture 18 Dustin Lueker.
PCB 3043L - General Ecology Data Analysis.
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Simulation-Based Approach for Comparing Two Means
CHAPTER 12 More About Regression
Chapter 9 Hypothesis Testing.
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8…
Quantitative Methods in HPELS HPELS 6210
Chapter 10 Analyzing the Association Between Categorical Variables
Hypothesis tests for the difference between two proportions
CHAPTER 12 More About Regression
Hypothesis Tests for a Standard Deviation
CHAPTER 12 More About Regression
STA 291 Summer 2008 Lecture 18 Dustin Lueker.
Presentation transcript:

Statistics flavors Descriptive Inferential – Non-parametric / Monte Carlo – Parametric Model estimation (sometimes used for inference) – Maximum likelihood – Bayesian

What is an average? X=(140,150,160,170,180) mean = sum(X)/N mean = that value X* such that sum(X-X*) = 0 mean = that value X* such that sum(X-X*)^2 is minimized

Descriptive vs. inferential statistics Are the five students shorter (on average) than the average Berkeley student? Our 5 have a mean height of 160 cm Let’s say that the average height of Berkeley students is 180 cm Is this group of people shorter than the average?

Inferential statistics Rephrase our question: are the five students shorter than would be expected for a random sample drawn from the student population? We can think of the random sample being drawn from 1) all actual students or 2) all theoretically possible students, i.e. the general properties of the population of people who could be students at Berkeley

(1000 actual students)

mean = 182.3

random samples

454 are less than or equal to 160

What is the conclusion? What was the question? Initial question: are the five students shorter than would be expected for a random sample drawn from the student population? Revised question: Can we reject the null hypothesis that these 5 students represent a random sample from the general population? Alternative hypothesis 1: students are non- random sample (2-tailed) Alternative hypothesis 2: students are non- random and shorter than average (1-tailed)

454 are less than or equal to are greater than or equal to random samples NON-PARAMETRIC TEST

Conclusion 2-tailed: The probability of drawing a random sample of 5 with a mean height that is as – or more – different from the general population of 100 students, relative to our actual sample with mean = 160: ( )/ = tailed: The probability of drawing this random sample that is as – or more – different from the general population AND is smaller than the mean is 454/ = You can only test a 1-tailed hypothesis is you are willing to ignore the alternative possibility – if you had started with the hypothesis that this group would be larger than expected, you cannot then conclude from this sample that they are actually smaller

(1000 actual students) mean = sd = 19.5

What is the standard deviation sd = sum(X-X*)^2 sum of squared residuals it’s the value we minimized to find the mean!

Theoretical distribution (i.e. normal distribution) of height for an infinite population of students with mean = and sd =19.5

Standard deviation of the distribution of means for samples with N=5 drawn from the overall population: this is called the standard error of the mean SE = SD/sqrt(N)

Here is our observed mean of 160 for our sample of 5 students

area under lower tail = The probability that these 5 students represent a random sample from a population with mean = and sd = 19.5 is: 1-tail: p <= tail: p <= PARAMETRIC TEST

454 are less than or equal to are greater than or equal to random samples NON-PARAMETRIC TEST

Is this result significant? We have decided as a sociological convention that we don’t want to accept random patterns as evidence that something is actually going on more than 5% of the time!

Statistics flavors Descriptive Inferential – Non-parametric / Monte Carlo – Parametric Model estimation (sometimes used for inference) – Maximum likelihood – Bayesian

Parametric vs. likelihood Parametric: What the data aren’t – rejecting null hypotheses Likelihood: What the data are – what is the most likely model that these data represent

The relative likelihood of values drawn from a normal distribution with mean = 150 and sd = 20 (i.e., the density function of the normal distribution)

The relative likelihood of values drawn from a normal distribution with mean = 150 and sd = 20 (i.e., the density function of the normal distribution) Relative likelihoods of our five observations

Calculating likelihoods What is the overall likelihood of a model with mean = 150 and sd = 20 based on our 5 observations? The joint probability of independent events is the product of their independent probabilities P = π(p i ) The log of a product = the sum of the log: –log(xy) = log(x) + log(y) log(P) = π(p i ) = sum(log(p i ))

Log likelihood of mean = 150 and sd = 20 based on our 5 observations =

m = 150 sd = 20 L = m = 160 sd = 20 L = m = 150 sd = 10 L = m = 160 sd = 15.8 L = -20.4

Likelihood surfaces for the mean and sd Assume sd = 10Assume mean = 160

max likelihood for the mean and sd of X

What is Bayesian stats all about? A philosophy – we have prior knowledge of systems, and our prior beliefs should (and do) inform how we interpret data An efficient algorithm for complex likelihood problems – Gibbs samplers, MCMC searches

What is R? R is a system for statistical computation and graphics. R: initially written by Ross Ihaka and Robert Gentleman at Dep. of Statistics of U of Auckland, New Zealand during 1990s. Since 1997: international “R-core” team of ca. 15 people with access to common CVS archive. From Wolfgang Huber – An introduction to R

What is R? GNU General Public License (GPL) – can be used by anyone for any purpose – contagious – not platform specific! Open Source – quality control! – efficient bug tracking and fixing system supported by the user community From Wolfgang Huber – An introduction to R

“Data Entropy” Michener et al. 1997, Ecological Monographs Information content Time Time of publication Specific details General details Retirement or career change Death Accident

Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize metadata Data analysis - document Data & analysis preservation - protect Collaboration and data sharing

Why use R? Analysis automation: code your analyses once and rerun as your data increases Analysis repeatability: provide R script with data and anyone can run through and repeat the analyses

Code in R-Editor (or Tinn-R/notepad or from your head) R console Graphics window Files Output R console

### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r) Compare this method to: Copy and paste from Excel Log-transform in Excel Copy and paste new file Graph in SigmaPlot Analyze in Systat Graph in SigmaPlot

The benefits of R are: R is free. R is open-source and runs on UNIX, Windows and Macintosh. R has an excellent built-in help system. R has excellent graphing capabilities, and with experience one can produce publication quality figures. R’s language has a powerful syntax with many built-in statistical functions. The language is easy to extend with user-written functions. Numerous functions and libraries are being written and introduced each year, including many focused on ecology and evolution. R is a computer programming language. For programmers it will feel more familiar than others and for new computer users, the next leap to programming will not be so large. Analyses in R are written in computer scripts which can be saved and easily rerun. This greatly simplifies: reanalysis of updated data; running the same analyses on new data sets; documenting and sharing workflows

What are disadvantages of R? It has a steep initial learning curve. It has a limited graphical interface (S-Plus has a good one). It has a steep initial learning curve. There is no commercial support. (Although one can argue the international mailing list is even better) Did I mention that it has a steep initial learning curve? The command language is a programming language so students must learn to appreciate syntax issues, and must learn and remember many functions and commands.

Brownian motion steps from N(0,1) trait value time

Brownian motion steps from N(0,1) trait value time

Brownian motion steps from N(0,1) trait value time trait variance

Brownian motion on a phylogeny Traits of related species will be similar in proportion to the fraction of their history that’s shared

cophenetic.phylo(): D = phyletic distances: vcv() = variance-covariance matrix = 1 – D/max(D)

2000 reps, brownian motion t4 vs. t6 values trait values have an expected covariance, also called ‘non-independence of species’