Statistical Data Analysis - Lecture 06 14/03/03


1 Bivariate Data
Bivariate data is simply data where we have measurements on two variables. The variables can be either discrete or continuous, leading to three possible combinations: discrete/discrete, continuous/discrete, and continuous/continuous. The type of analysis depends on the combination you have. Note that if one of your variables is in continuous time units (such as years or seconds), then time series analysis is usually the appropriate treatment. We will not cover this subject.

2 Displaying discrete/discrete data
Some texts recommend a three-dimensional barplot.

4 3D perspective bar charts
Difficult to read. An increase in categories leads to an increase in complexity. An increase in categories also leads to more obscuration of data and features. Perspective distorts the information. Don't do it!

5 Displaying discrete/discrete data
Two-way tables are usually the best. Think about what your aims are. If the data are for someone else to use in an analysis, or for reference in an appendix, then apart from organising the data in a legible fashion, don't do anything to them. If the data are for presentation or summary, then apply Ehrenberg's rules.

6 Displaying tabular data – Ehrenberg’s rules
Round drastically ("2 busy digits"). Arrange the numbers so that comparisons are made column-wise and not row-wise. Order the columns by size or use ordinal information. Use row and column averages or sums as a focus. Use white space well. Provide verbal summaries.
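A minimal sketch of three of these rules in Python, assuming a small table of hypothetical values (the labels and numbers below are made up for illustration and are not from the lecture). It rounds to two significant digits, orders the rows by size, and appends row averages as a focus; the same ordering idea applies to columns.

```python
from math import floor, log10

def round_sig(x, digits=2):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - floor(log10(abs(x))))

# Hypothetical table: one row per category, three measurements each.
table = {
    "North": [1234.5, 987.6, 456.7],
    "South": [234.5, 187.6, 156.7],
    "East": [5234.5, 4987.6, 3456.7],
}

# Row averages act as a focus for the reader's eye.
averages = {name: sum(vals) / len(vals) for name, vals in table.items()}

# Order the rows by size, largest average first.
ordered = sorted(table, key=lambda name: averages[name], reverse=True)

# Print the rounded table with the row average in the last column.
for name in ordered:
    cells = "".join(f"{round_sig(v):>8.0f}" for v in table[name])
    print(f"{name:<6}{cells}{round_sig(averages[name]):>10.0f}")
```

The rounding helper is the part most likely to be reused: it keeps only the "busy" digits so that differences between cells can be read at a glance.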

9 Statistical analysis of discrete data
There are two possible situations when dealing with bivariate discrete data. The first is a single sample cross-classified by two variables, e.g. 221 students classified by age group and birth order. The second is multiple samples classified by a single variable, e.g. the genotypes of individuals in different ethnic DNA databases. As you would expect, the methods of analysis are quite different.

10 Multiple samples classified by a single categorical variable
If the samples are independent (and they usually are), then the category proportions can be compared across samples. For example, in the US the FBI has DNA databases for African Americans, Caucasians and South Western Hispanics (Florida keeps its own database for South Eastern Hispanics).

11 Analysing multiple sample discrete data
Say we wished to compare genotype frequencies at a particular DNA locus called Gc. Gc has 3 alleles: A, B, and C. We have one allele from our mother and one from our father to make a genotype, so there are six possible genotypes: A/A, A/B, A/C, B/B, B/C, and C/C. How do we go about comparing the proportions of each genotype? We can plot them, which is always a good idea.
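The count of six genotypes can be checked by enumerating unordered pairs of alleles. This is a small illustrative sketch; the variable names are assumptions, not part of the lecture.

```python
from itertools import combinations_with_replacement

alleles = ["A", "B", "C"]

# Unordered pairs with repetition allowed: A/B and B/A are the same genotype,
# and homozygotes such as A/A are included.
genotypes = ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]

print(genotypes)       # ['A/A', 'A/B', 'A/C', 'B/B', 'B/C', 'C/C']
print(len(genotypes))  # 6
```

In general, a locus with k alleles has k(k + 1)/2 possible genotypes.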

13 Analysing multiple sample discrete data
How about a statistical test? It boils down to the question we want to answer. If that question is "Is there a difference in genotype probabilities between the races?", then we can answer it. How? A chi-square analysis. Note that this turns out to be the same analysis that we apply to discrete data from a single sample cross-classified by two (or more) categorical variables, so we will assume we are talking about either case from now on.

14 Chi-square Analysis
1. First, we need to construct a hypothesis to test. In this case it is H0: the genotype proportions are the same regardless of race.
2. To test this hypothesis we need some way of measuring evidence against it. This is what the chi-square test statistic does.

15 Statistical Data Analysis - Lecture 06 14/03/03
For any chi-square test the test statistic is given by $X^2 = \sum_i (O_i - E_i)^2 / E_i$, where $E_i$ are the expected counts and $O_i$ are the observed counts. What are the expected counts? Let's look at it step by step.
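As a rough sketch, the statistic can be computed directly from two equal-length lists of counts. The function name and the example counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: three cells, each with an expected count of 20.
observed = [18, 22, 20]
expected = [20, 20, 20]

# (4 + 4 + 0) / 20 = 0.4
x2 = chi_square_statistic(observed, expected)
print(x2)
```

Cells whose observed counts are far from the expected counts contribute most to the total, which is why a large value counts as evidence against the hypothesis that generated the expected counts.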

16 Expected counts
The expected counts are calculated as though H0 is true. If H0 is true, then we have one set of genotype probabilities for every race. That is, $p_j$ does not depend on race. Therefore, if the probability that a randomly selected individual belongs to race $i$ is $r_i$, then the expected number of individuals from race $i$ with genotype $j$ is $N r_i p_j$ (where $N$ is the number of people in the whole population).

17 Estimated expected counts
The problem is we don't know either $r_i$ or $p_j$. The best we can do is estimate them from the data. We have a sample of $n$ people, each with a race and a genotype. If we let the rows of our table represent race and the columns genotype, then the row totals divided by $n$ will give us the race proportions. Let $R_i$ represent the row totals. If H0 is true, then the column totals, $C_j$, divided by $n$ give us the estimated genotype frequencies. That is, $\hat{r}_i = R_i / n$ and $\hat{p}_j = C_j / n$.

18 Estimated expected counts
Therefore, the expected count in the $i$th row and the $j$th column of our table, $E_{ij}$, is $E_{ij} = n (R_i / n)(C_j / n) = R_i C_j / n$. In order to test our null hypothesis we need to know about the distribution of our test statistic under the null hypothesis. Why?
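The estimated expected counts depend only on the row totals, the column totals and the sample size. A minimal sketch, assuming the table is stored as a list of rows; the 2 x 3 table below is hypothetical and is not the lecture's genotype data.

```python
def expected_counts(table):
    """Estimated expected counts E_ij = R_i * C_j / n for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical observed counts: 2 samples (rows) by 3 categories (columns).
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_counts(observed)
print(expected)  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]

# Chi-square statistic summed over every cell of the table.
x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))
print(x2)
```

Both rows get the same expected counts here because the two hypothetical samples have the same size; with unequal row totals the expected counts scale with each row total.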

19 Hypothesis testing
We evaluate our test statistic with respect to the null hypothesis. If the test statistic is abnormal (usually this means it is of large magnitude), then the probability of observing such a statistic if the null hypothesis were true would be small → a small P-value → evidence against the null hypothesis.

20 Chi-Square degrees of freedom
Recall that the chi-square distribution has an extra parameter that describes its shape: the degrees of freedom. For a two-way table, in general, the degrees of freedom are given by $df = (I - 1)(J - 1)$, where $I$ is the number of rows and $J$ the number of columns.

21 Chi-square df
In our example we have 3 races ($I = 3$) and 6 genotypes ($J = 6$). Therefore we would test our hypothesis with $(3 - 1)(6 - 1) = 2 \times 5 = 10$ df.
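To turn a test statistic into a P-value we need the upper tail of the chi-square distribution with the right degrees of freedom. The sketch below uses only the standard library and a closed form that holds for even degrees of freedom, which covers the 10 df case; a statistics library would be the usual choice for general df. The function names are assumptions, and the value 18.307 is the tabulated 5% critical value for 10 df, not a statistic computed from the lecture's data.

```python
from math import exp

def degrees_of_freedom(n_rows, n_cols):
    """df = (I - 1)(J - 1) for a two-way table."""
    return (n_rows - 1) * (n_cols - 1)

def chi_square_p_value_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return exp(-half) * total

df = degrees_of_freedom(3, 6)               # 3 races, 6 genotypes
p = chi_square_p_value_even_df(18.307, df)  # close to 0.05
print(df, round(p, 3))
```

A statistic larger than 18.307 on 10 df would give a P-value below 0.05, which is the sense in which a large statistic is evidence against H0.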

