Some statistical basics. Marian Scott.


Why bother with statistics?
We need statistical skills to: make sense of numerical information, summarise data, present results (graphically), test hypotheses, and construct models.

Variables: number and type
Univariate: one variable of interest is measured on the individuals in the sample. We may ask: What is the distribution of results? This may be further resolved into questions about the mean (average) value of the variable and the scatter (variability) in the results.

Bivariate: two variables of interest are measured on each member of the sample. We may ask: How are the two variables related? If one variable is time, how does the other variable change? How can we model the dependence of one variable on the other?

Multivariate: many variables of interest are measured on the individuals in the sample. We might ask: What relationships exist between the variables? Is it possible to reduce the number of variables, but still retain 'all' the information? Can we identify any grouping of the individuals on the basis of the variables?

Data types
Numerical: a variable may be either continuous or discrete. For a discrete variable, the values taken are whole numbers (e.g. number of chromosome abnormalities, number of eggs). For a continuous variable, the values taken are real numbers (positive or negative, and including fractional parts), e.g. blood lead level, alkalinity, weight, temperature.

Categorical: a limited number of categories or classes exist, and each member of the sample belongs to one and only one of the classes. Sex, for example, is a nominal categorical variable, since the categories are unordered. Dose of a drug or level of diluent (e.g. recorded as low, medium, high) would be an ordinal categorical variable, since the classes are ordered.

Inference and statistical significance
Inference means using a sample to draw conclusions about the population. Is the sample representative? Is the population homogeneous? Since only a sample has been taken from the population, we cannot be 100% certain, which is why we use significance testing.

Hypothesis Testing II
Null hypothesis: usually no effect. Alternative hypothesis: an effect. We make a decision based on the evidence (the data). There is a risk of getting it wrong! Two types of error:
- Type I: reject the null when we shouldn't
- Type II: don't reject the null when we should

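Significance levels
We cannot reduce the probabilities of both Type I and Type II errors to zero, so we control the probability of a Type I error. This probability is referred to as the significance level. The p-value computed from the data is compared with it: the null hypothesis is rejected when the p-value falls below the significance level. Generally a significance level of 0.05 (rejecting when p < 0.05) is considered a reasonable risk of a Type I error (beyond reasonable doubt).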
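The link between the significance level and the Type I error rate can be checked by simulation. The sketch below is illustrative only: it assumes a two-sided z-test with known standard deviation, and the function names, sample size and seed are hypothetical choices, not part of the original slides. Data are generated with the null hypothesis true, so every rejection is a Type I error.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, with sigma known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, n=20, n_sims=4000, seed=1):
    """Fraction of simulated experiments that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # The null is true here: the data really have mean 0 and sd 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / n_sims


rate = type_i_error_rate()
print(f"Observed Type I error rate at alpha = 0.05: {rate:.3f}")
```

Under these assumptions the observed rejection rate should settle close to the chosen significance level as the number of simulations grows.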

Statistical significance vs. practical importance
Statistical significance is concerned with the ability to discriminate between treatments given the background variation. Practical importance relates to the scientific domain and is concerned with scientific discovery and explanation.

Power
Power is related to the probability of a Type II error: power = 1 - probability of making a Type II error. Aim: to keep power as high as possible.

Sample size calculations
- What is the objective of the experiment?
- How much of a difference is it important to be able to detect (the effect size)?
- At what significance level do you want to conduct the test? (Decreasing the significance level reduces power.)
- What is the power of the experiment (the probability that you will detect such a difference when it actually exists)?
- How variable is the population? Greater variation needs a larger sample size to achieve the same power.
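These questions can be turned into a rough calculation. The sketch below is not from the slides: it assumes a two-sided comparison of two group means using the Normal approximation with a common known standard deviation, and the function names and the example values (effect 0.5, sigma 1.0) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sample(n_per_group, effect, sigma, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test."""
    std_error = sigma * (2.0 / n_per_group) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / std_error
    # Probability that the test statistic lands in either rejection region.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def sample_size_per_group(effect, sigma, alpha=0.05, power=0.80):
    """Smallest n per group reaching the target power (Normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / effect) ** 2)


n = sample_size_per_group(effect=0.5, sigma=1.0)
print(f"n per group for 80% power: {n}")

# A crude power curve: power rises with sample size.
for size in (10, 20, 40, 80, 160):
    print(size, round(power_two_sample(size, effect=0.5, sigma=1.0), 3))
```

With these functions one can also confirm the two remarks in the list: lowering alpha lowers the power for a fixed sample size, and a larger sigma raises the required sample size.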

Power Curves

Modelling continuous variables: checking Normality
Compare a histogram of the data with the Normal density function. Check for symmetry. Another possibility: a Normal probability plot.

Modelling continuous variables: checking Normality
In a Normal probability plot, the points should show a straight line. The p-value of a test is also reported (null hypothesis: the data are Normally distributed).
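The idea behind the Normal probability plot can be sketched numerically: sort the data, pair each value with the Normal quantile expected at its rank, and see how close the pairs are to a straight line. The code below is an illustration under assumptions of my own (plotting positions (i + 0.5)/n, correlation as a straightness score, simulated data). It does not compute the p-value of a formal Normality test.

```python
import random
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Return (theoretical Normal quantiles, ordered data) for a probability plot."""
    n = len(data)
    ordered = sorted(data)
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return quantiles, ordered


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5


def straightness(data):
    """Correlation between the plot's axes; values near 1 suggest a straight line."""
    quantiles, ordered = normal_plot_points(data)
    return pearson_r(quantiles, ordered)


rng = random.Random(3)
normal_sample = [rng.gauss(50.0, 5.0) for _ in range(200)]   # simulated, Normal
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]   # simulated, skewed

r_normal = straightness(normal_sample)
r_skewed = straightness(skewed_sample)
print(f"Normal data: {r_normal:.3f}   skewed data: {r_skewed:.3f}")
```

The skewed sample should give a visibly lower score, which corresponds to the curvature one would see in the plot.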

Statistical inference
- Hypothesis testing and the p-value
- Statistical significance vs real-world importance
- Confidence intervals

Confidence intervals: an alternative to hypothesis testing
A confidence interval is a range of credible values for the population parameter. The confidence coefficient is the percentage of times that the method will, in the long run, capture the true population parameter. A common form is: sample estimate ± 2 × estimated standard error.
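Both ideas, the interval formula and the long-run interpretation of the confidence coefficient, can be sketched in a few lines. The example assumes the parameter is a population mean and the estimated standard error is s / sqrt(n); the function names, the simulated population (mean 10, sd 2) and the seed are hypothetical.

```python
import random
from statistics import mean, stdev


def approx_95_ci(sample, multiplier=2.0):
    """Interval of the form estimate -/+ multiplier * estimated standard error."""
    estimate = mean(sample)
    std_error = stdev(sample) / len(sample) ** 0.5
    return estimate - multiplier * std_error, estimate + multiplier * std_error


def long_run_coverage(true_mean=10.0, sigma=2.0, n=30, n_sims=2000, seed=7):
    """Fraction of repeated samples whose interval captures the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lower, upper = approx_95_ci(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / n_sims


lower, upper = approx_95_ci([1.0, 2.0, 3.0, 4.0, 5.0])
coverage = long_run_coverage()
print(f"Interval for the toy sample: ({lower:.3f}, {upper:.3f})")
print(f"Long-run coverage in the simulation: {coverage:.3f}")
```

In the simulation the coverage should come out close to 95%, which is what the multiplier of 2 is meant to approximate.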

Statistical models
Outcomes or responses: these are the results of the practical work and are sometimes referred to as dependent variables.
Causes or explanations: these are the conditions or environment within which the outcomes or responses have been observed. They are sometimes referred to as independent variables, but are more commonly known as covariates.

Statistical models
In experiments, many of the covariates have been determined by the experimenter, but some may be aspects that the experimenter has no control over and that are nevertheless relevant to the outcomes or responses. In observational studies, the covariates are usually not under the control of the experimenter but are recorded as possible explanations of the outcomes or responses.

Specifying a statistical model
Models specify the way in which outcomes and causes link together, e.g.
Metabolite = Temperature
The = sign does not indicate equality in a mathematical sense, and there should be an additional item on the right-hand side, giving the formula:
Metabolite = Temperature + Error

Statistical model interpretation
Metabolite = Temperature + Error
The outcome Metabolite is explained by Temperature and by other things that we have not recorded, which we call Error. The task in the data analysis is then simply to find out whether the effect of Temperature is large in comparison to that of Error, so that we can say whether or not the Metabolite that we observe is explained by Temperature.

Correlations and linear relationships
The correlation coefficient measures the strength of a linear relationship. It is a simple indicator lying between -1 and +1. Check your plots for linearity.

gene correlations

Interpreting correlations
The correlation coefficient is a measure of the strength of the linear association between two variables. If the relationship is non-linear, the coefficient can still be evaluated and may appear sensible, so beware: plot the data first.
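A small made-up example shows why plotting matters. In the sketch below (hypothetical data and function name), y is completely determined by x in both cases, yet the correlation coefficient only reflects the linear case.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
linear = [2.0 * v + 1.0 for v in x]   # exact straight line
curved = [v ** 2 for v in x]          # exact, but U-shaped

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(f"straight line: r = {r_linear:.3f}")
print(f"U-shaped curve: r = {r_curved:.3f}")
```

The U-shaped relationship is perfectly predictable from x, but its correlation with x is zero because the association is not linear.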

Simple regression model
The basic regression model assumes:
- The average value of the response x is linearly related to the explanatory variable t.
- The spread of the response x about the average is the SAME for all values of t.
- The VARIABILITY of the response x about the average follows a NORMAL distribution for each value of t.

Simple regression model
- The model is typically fitted using least squares.
- Goodness of fit of the model is assessed using the residual sum of squares and R^2.
- Assumptions are checked using residual plots.
- Inference about model parameters is carried out using hypothesis tests or confidence intervals.
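The least squares fit, the residual sum of squares and R^2 can be written out directly. This is a minimal sketch using the slide's notation (response x, explanatory variable t); the function name and the temperature and metabolite values are invented for illustration and are not data from the presentation.

```python
from statistics import mean


def fit_simple_regression(t, x):
    """Least squares fit of x = intercept + slope * t.

    Returns (slope, intercept, residuals, r_squared).
    """
    t_bar, x_bar = mean(t), mean(x)
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    slope = s_tx / s_tt
    intercept = x_bar - slope * t_bar
    residuals = [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]
    residual_ss = sum(r ** 2 for r in residuals)
    total_ss = sum((xi - x_bar) ** 2 for xi in x)
    r_squared = 1 - residual_ss / total_ss
    return slope, intercept, residuals, r_squared


# Hypothetical data: Metabolite = Temperature + Error
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
metabolite = [2.1, 2.9, 4.2, 4.8, 6.0]

slope, intercept, residuals, r_squared = fit_simple_regression(temperature, metabolite)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

The residuals returned here are what one would plot against t (or against the fitted values) to check the constant-spread and Normality assumptions.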

Statistical model interpretation
The traditional statistical tests such as t-tests, ANOVA, ANCOVA and regression are each special cases of a more general type of model, and each makes a number of assumptions: t-tests work where there are two groups, ANOVA works with categorical explanatory variables, and regression assumes that explanatory variables are continuous. Our explanatory variables are often not like this; they are mixtures of continuous and categorical, so we need a more flexible approach: the G(eneral) L(inear) M(odel).

General linear models
General Linear Models (GLMs) are a comprehensive set of techniques that cover a wide range of analyses. Problems that make use of a number of specific techniques may be specified as GLM problems using a unified specification called a Model Syntax. The form of the Model Syntax varies a little from statistics package to statistics package, but it is essentially just a way of unambiguously specifying the relationship between variables (categorical or continuous).

Examples
Each entry gives the example, the traditional test, and the GLM word equation.
- Comparing the effect of burning and clipping on bracken. Traditional test: two-sample t-test. GLM word equation: SHOOTS = MANAGEMENT
- Comparing the effect of two different drugs with a placebo. Traditional test: one-way analysis of variance. GLM word equation: EFFECT = DRUG
- Comparing the yield between fertilisers, conducting the experiment in several fields. Traditional test: one-way analysis of variance with blocking. GLM word equation: YIELD = FIELD + FERTILISER
- Investigating the relationship between height and weight in people. Traditional test: regression. GLM word equation: WEIGHT = HEIGHT
- Investigating the relationship between oxygen consumption and weight in scampi, taking level of activity into account. Traditional test: analysis of covariance, with emphasis on regression. GLM word equation: OXYGEN = WEIGHT + ACTIVITY or, under different assumptions (an interaction between the terms), OXYGEN = WEIGHT | ACTIVITY
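The first example, SHOOTS = MANAGEMENT, illustrates how a traditional test is a special case of a linear model. The sketch below uses invented shoot counts and hypothetical function names: it codes MANAGEMENT as a 0/1 variable, fits a least squares line, and compares the t statistic for the slope with the pooled two-sample t statistic. It assumes equal variances in the two groups, as the pooled t-test does.

```python
from statistics import mean


def pooled_t_statistic(a, b):
    """Classical two-sample t statistic with pooled variance (b minus a)."""
    mean_a, mean_b = mean(a), mean(b)
    within_ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    pooled_var = within_ss / (len(a) + len(b) - 2)
    std_error = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean_b - mean_a) / std_error


def regression_slope_t_statistic(d, y):
    """t statistic for the slope in the least squares fit y = intercept + slope * d."""
    d_bar, y_bar = mean(d), mean(y)
    s_dd = sum((di - d_bar) ** 2 for di in d)
    s_dy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    slope = s_dy / s_dd
    intercept = y_bar - slope * d_bar
    residual_ss = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y))
    residual_var = residual_ss / (len(y) - 2)
    return slope / (residual_var / s_dd) ** 0.5


# Hypothetical shoot counts under the two management regimes.
burned = [12.0, 15.0, 11.0, 14.0, 13.0]
clipped = [17.0, 16.0, 19.0, 15.0, 18.0]

# MANAGEMENT coded as a dummy variable: 0 = burned, 1 = clipped.
management = [0.0] * len(burned) + [1.0] * len(clipped)
shoots = burned + clipped

t_classical = pooled_t_statistic(burned, clipped)
t_model = regression_slope_t_statistic(management, shoots)
print(f"two-sample t: {t_classical:.4f}   model slope t: {t_model:.4f}")
```

The two statistics agree, which is the sense in which the word equation and the traditional test describe the same analysis.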

Summary
- Hypothesis tests and confidence intervals are used to make inferences.
- We build statistical models to explore relationships and explain variation.
- The modelling framework is a general one: general linear models, generalised additive models.
- Assumptions should be checked.