Central Tendency Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.

Resources Crawley, M. J. (2005) Statistics: An Introduction Using R. Wiley. Gonick, L., and Smith, W. (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

Clustering Most data cluster around an intermediate value. If each value you measure is the sum of many independent random variables, this clustering can be proved. The result is the Central Limit Theorem: the sum of a large number of independent random variables with finite variance has an approximately normal (bell-shaped) distribution. In particular, this is why estimates of the mean (or ‘average’) obtained from repeated experiments are approximately normally distributed.
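The Central Limit Theorem can be checked by simulation. The sketch below is an illustration only: the exponential distribution, the sample size, the number of repeats, and the seed are all assumptions chosen for the demonstration, not values from the slides.

```r
# Sketch: means of samples from a skewed distribution are approximately normal.
set.seed(1)                      # arbitrary seed, for reproducibility only
n_per_sample <- 50               # assumed size of each sample
n_samples    <- 2000             # assumed number of repeated experiments

# Each column is one "experiment" drawn from an exponential distribution.
draws <- matrix(rexp(n_per_sample * n_samples, rate = 1), nrow = n_per_sample)
sample_means <- colMeans(draws)  # one mean per experiment

hist(sample_means)               # roughly bell-shaped, centred near 1
qqnorm(sample_means)
qqline(sample_means, lty = 2)    # points should lie close to the dashed line
```

The individual draws are strongly skewed, yet the means of the repeated experiments cluster symmetrically around the true mean.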

Example: Normal Distribution

Other Measures of Clustering The median is the middle value of a sample or a distribution. The mode is the most frequent value in a sample or a distribution. These can be convenient to use, especially if the data are not normally distributed.
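A small made-up sample shows how these measures differ. The data and the helper name stat_mode are hypothetical; R's built-in mode() reports the storage type of an object, not the most frequent value, so the sketch tabulates the values instead. It assumes numeric data.

```r
y <- c(2, 3, 3, 5, 7, 8, 40)     # made-up sample with one extreme value

mean(y)      # pulled upward by the value 40
median(y)    # middle value of the sorted sample

# Hypothetical helper: the most frequent value(s) in a numeric sample.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])  # returns all ties
}
stat_mode(y)
```

Here the median and mode stay close to the bulk of the data, while the mean is dragged toward the extreme value.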

Application to Experimental Design One way to reject a null hypothesis:
– Show that the mean (average) value of your experimental data is so far from the mean value implied by the null hypothesis that the chance of such a difference occurring under the null hypothesis is very small.
– To estimate this chance, you first need to show that your data are normally distributed.
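One common way to carry out this comparison in R is a one-sample t-test, optionally preceded by a Shapiro-Wilk normality check. This is a sketch under stated assumptions, not necessarily the procedure the slides have in mind: the data are simulated, and the null mean mu0 is a hypothetical value.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
y   <- rnorm(30, mean = 5.5, sd = 1)  # simulated stand-in for experimental data
mu0 <- 5                              # hypothetical mean implied by the null hypothesis

shapiro.test(y)                # a small p-value would cast doubt on normality
result <- t.test(y, mu = mu0)  # one-sample t-test of the mean against mu0
result$p.value                 # chance, under the null, of a mean at least this far from mu0
```

A very small p-value is the "very small chance" referred to above; how small counts as small enough is a decision to make before running the experiment.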

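To Check That the Data Are Normal
yvals <- read.table("c:\\wherever\\yvalues.txt", header = TRUE)  # file with a column named y
attach(yvals)         # makes the column y visible by name
hist(y)               # histogram: look for a bell shape
qqnorm(y)             # normal quantile-quantile plot
qqline(y, lty = 2)    # dashed reference line; normal data lie close to it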

What it Looks Like

Normal Data
y <- rnorm(1000)      # 1000 draws from a standard normal distribution
hist(y)
qqnorm(y)
qqline(y, lty = 2)

Appearance of Normal Data

Non-Normal Data
y <- seq(0, 1.0, 0.001)   # 1001 evenly spaced values, giving a flat histogram
hist(y)
qqnorm(y)
qqline(y, lty = 2)

Appearance of Non-Normal Data

Geometric Mean This is used when the data are generated as the product rather than the sum of independent random variables. An example might be a series of risks, each being the product of a rate, a probability of success, and an estimate of the consequences. The geometric mean is calculated as $\left(\prod_{i=1}^{n} y_i\right)^{1/n}$, where $n$ is the number of values being averaged and all values must be positive. In R, you calculate this as exp(mean(log(data)))
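A short worked example, using made-up values, shows that the log-based expression agrees with the product definition and how it differs from the arithmetic mean.

```r
y <- c(10, 100, 1000)          # made-up values spanning several orders of magnitude

geo_mean <- exp(mean(log(y)))  # geometric mean via logs
geo_mean                       # approximately 100
prod(y)^(1 / length(y))        # same quantity from the definition
mean(y)                        # arithmetic mean, 370, dominated by the largest value
```

The log form is generally preferable for long vectors, since the raw product can overflow or underflow.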

Harmonic Mean If your concern is not the absolute values of the random variables but rather their ratios, the mean of interest is the harmonic mean. An example might be current population relative to the ‘carrying capacity’ of a region. The harmonic mean is the ‘reciprocal of the average of the reciprocals’. To calculate this in R, use 1/mean(1/data)
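As a quick illustration with made-up numbers (a standard textbook case, not one taken from the slides), consider two legs of equal distance travelled at different speeds. The overall average speed is the harmonic mean of the two speeds, not the arithmetic mean.

```r
speeds <- c(30, 60)              # made-up speeds over two legs of equal distance

harm_mean <- 1 / mean(1 / speeds)
harm_mean                        # approximately 40
mean(speeds)                     # arithmetic mean, 45
```

The harmonic mean is lower because more time is spent on the slower leg.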

R Demonstrations of all this… From the book.
