Statistical basics Marian Scott Dept of Statistics, University of Glasgow August 2010.

What shall we cover?
– Why might we need some statistical skills?
– Statistical inference: what is it?
– How to handle variation
– Exploring data
– Probability models
– Inferential tools: hypothesis tests and confidence intervals

Why bother with Statistics?
We need statistical skills to:
– Make sense of numerical information
– Summarise data
– Present results (graphically)
– Test hypotheses
– Construct models

Statistical language
– variable: a single aspect of interest
– population: a large group of individuals
– sample: a subset of the population
– parameter: a single number summarising the variable in the population
– statistic: a single number summarising the variable in the sample

Statistical language: radiation protection, C-14 in fish
– variable: radiocarbon level (Bq/kg C)
– population: all fish caught for human consumption in W Scotland
– sample: 20 fish bought in local markets
– parameter: population mean C-14 level
– statistic: sample mean C-14 level
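
The distinction between a parameter and a statistic can be sketched with a small simulation. This is only an illustration: the population below is simulated, and the mean of 250 and standard deviation of 20 are made-up values, not measurements from the C-14 study.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated C-14 levels (Bq/kg C) for 10,000 fish.
# The values 250 and 20 are invented for illustration only.
population = [random.gauss(250, 20) for _ in range(10_000)]

# Parameter: a single number summarising the variable in the population.
population_mean = statistics.mean(population)

# Sample: 20 fish drawn at random from the population.
sample = random.sample(population, 20)

# Statistic: the same summary, computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

In practice the parameter is unknown because the whole population is never measured; the statistic is what we can compute, and it changes from one sample to the next.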

Questions
– Univariate: What is the distribution of results? This may be further resolved into questions concerning the mean or average value of the variable and the scatter or variability in the results.
– Bivariate: How are the two variables related? How can we model the dependence of one variable on the other?
– Multivariate: What relationships exist between the variables? Is it possible to reduce the number of variables, but still retain 'all' the information? Can we identify any grouping of the individuals on the basis of the variables?

Data types
Numerical: a variable may be either continuous or discrete.
– For a discrete variable, the values taken are whole numbers (e.g. number of invertebrates).
– For a continuous variable, the values taken are real numbers (e.g. pH, alkalinity, DOC, temperature).
Categorical: a limited number of categories or classes exist, and each member of the sample belongs to one and only one of the classes.
– Compliance is a nominal categorical variable, since the categories are unordered.
– Level of diluent (e.g. recorded as low, medium, high) would be an ordinal categorical variable, since the different classes are ordered.

Inference and Statistical Significance
Inference runs from the sample to the population.
– Is the sample representative?
– Is the population homogeneous?
– Since only a sample has been taken from the population, we cannot be 100% certain.
– Significance testing

The statistical process
A process that allows inferences about properties of a large collection of things (the population) to be made based on observations on a small number of individuals belonging to the population (the sample). The use of valid statistical sampling techniques increases the chance that a set of specimens (the sample, in the collective sense) is collected in a manner that is representative of the population.

Variation
Soil or sediment samples taken side-by-side, samples from different parts of the same plant, and samples from different animals in the same environment all exhibit different activity densities of a given radionuclide. The distribution of values observed will provide an estimate of the variability inherent in the population of samples that, theoretically, could be taken.

What is the population?
The population is the set of all items that could be sampled, such as all fish in a lake, all people living in the UK, all trees in a spatially defined forest, or all 20-g soil samples from a field. Appropriate specification of the population includes a description of its spatial extent and perhaps its temporal stability.

What are the sampling units?
In some cases, sampling units are discrete entities (e.g. animals, trees), but in others the sampling unit might be investigator-defined and arbitrarily sized.
Example: technetium in shellfish
The objective here is to provide a measure (the average) of technetium in shellfish (e.g. lobsters for human consumption) for the west coast of Scotland.
– Population: all lobsters on the west coast
– Sampling unit: an individual animal

Summarising data- means, medians and other such statistics

plotting data- histograms, boxplots, stem and leaf plots, scatterplots

median lower quartile upper quartile
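
As a minimal sketch of these summaries, the snippet below computes the mean, median and quartiles with Python's standard `statistics` module. The data are made up and deliberately right-skewed, with one outlying value; note also that several quartile conventions exist, and `statistics.quantiles` uses its default ("exclusive") method here.

```python
import statistics

# Made-up, right-skewed data with one outlying value (30).
data = [2, 4, 4, 5, 7, 9, 30]

mean = statistics.mean(data)
median = statistics.median(data)

# With n=4, statistics.quantiles returns the three quartile cut points.
lower_q, middle_q, upper_q = statistics.quantiles(data, n=4)

print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"lower quartile = {lower_q}, upper quartile = {upper_q}")
print(f"interquartile range = {upper_q - lower_q}")
```

The single large value pulls the mean well above the median, which is one reason the median and quartiles are often preferred for skewed data.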

Preliminary Analysis
There is considerable variation
– Across different sites
– Within the same site across different years
Distribution of data is highly skewed, with evidence of outliers and in some cases bimodality

probability models- the Normal especially

checking distributional assumptions

Modelling Continuous Variables: checking normality
– Normal probability plot
– Should show a straight line
– p-value of test is also reported (null: data are Normally distributed)
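
The idea behind a Normal probability plot can be sketched without a plotting library: pair the ordered observations with the corresponding standard Normal quantiles and see how close the points are to a straight line. The function names, the plotting positions (i - 0.5)/n and the simulated data are assumptions made for this illustration, and the correlation used here is only a rough straightness measure. It is not the formal test whose p-value statistical packages report.

```python
import math
import random
from statistics import NormalDist, mean


def normal_plot_points(values):
    """Return (theoretical Normal quantile, ordered observation) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention among several.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


def straightness(points):
    """Pearson correlation of the plotted points; near 1 means near-linear."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(2)
normal_data = [random.gauss(7.0, 0.5) for _ in range(200)]      # simulated
skewed_data = [random.expovariate(1.0) for _ in range(200)]     # simulated

r_normal = straightness(normal_plot_points(normal_data))
r_skewed = straightness(normal_plot_points(skewed_data))
print(f"Normal data: r = {r_normal:.3f}")
print(f"Skewed data: r = {r_skewed:.3f}")
```

The simulated Normal data should give points very close to a line, while the skewed data bend away from it, which is what the eye picks up on the plot itself.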

Statistical inference
– Confidence intervals
– Hypothesis testing and the p-value
– Statistical significance vs real-world importance

a formal statistical procedure- confidence intervals

Confidence intervals: an alternative to hypothesis testing
A confidence interval is a range of credible values for the population parameter. The confidence coefficient is the percentage of times that the method will, in the long run, capture the true population parameter. A common form is:
sample estimate ± 2 × estimated standard error
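
A minimal sketch of the common form, the sample estimate plus or minus twice its estimated standard error, applied to a sample mean. The function name and the data are hypothetical. The multiplier 2 approximates a 95% interval for reasonably large samples; for a sample as small as the one below, a multiplier from the t distribution would be larger.

```python
import math
import statistics


def approx_confidence_interval(sample, multiplier=2.0):
    """Interval for the population mean: estimate +/- multiplier * std error."""
    n = len(sample)
    estimate = statistics.mean(sample)
    # Estimated standard error of the mean: sample sd divided by sqrt(n).
    std_error = statistics.stdev(sample) / math.sqrt(n)
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


# Made-up measurements, for illustration only.
measurements = [10, 12, 14, 16, 18]
low, high = approx_confidence_interval(measurements)
print(f"estimate +/- 2 SE: ({low:.2f}, {high:.2f})")
```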

another formal inferential procedure- hypothesis testing

Hypothesis Testing
– Null hypothesis: usually no effect
– Alternative hypothesis: effect
– Make a decision based on the evidence (the data)
– There is a risk of getting it wrong!
Two types of error:
– reject the null when we shouldn't: Type I
– don't reject the null when we should: Type II

Significance Levels
We cannot reduce the probabilities of both Type I and Type II errors to zero, so we control the probability of a Type I error. This probability is referred to as the significance level. The p-value computed from the data is compared with it: generally a p-value of <0.05 is taken as sufficient evidence against the null hypothesis, with a reasonable risk of a Type I error (beyond reasonable doubt).
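
What "controlling the probability of a Type I error" means can be sketched by simulation: generate many data sets for which the null hypothesis is true, test each at the 0.05 level, and count how often the null is wrongly rejected. The test used here is a two-sided z-test with known standard deviation, chosen only because it needs nothing beyond the standard library; the sample size, number of trials and function name are assumptions for the illustration.

```python
import math
import random
from statistics import NormalDist, mean

ALPHA = 0.05  # significance level: the Type I error probability we accept
std_normal = NormalDist()


def z_test_p_value(sample, null_mean, sigma):
    """Two-sided p-value for H0: population mean == null_mean (sigma known)."""
    z = (mean(sample) - null_mean) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - std_normal.cdf(abs(z)))


random.seed(3)
n_trials = 2000
rejections = 0
for _ in range(n_trials):
    # Data simulated with the null hypothesis true (mean 0, sd 1).
    sample = [random.gauss(0.0, 1.0) for _ in range(25)]
    if z_test_p_value(sample, null_mean=0.0, sigma=1.0) < ALPHA:
        rejections += 1  # a Type I error: the null is true but was rejected

type_i_rate = rejections / n_trials
print(f"proportion of true nulls rejected: {type_i_rate:.3f}")
```

The proportion of rejections should be close to 0.05, since that is the rate the significance level is designed to fix.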

Statistical Significance vs. Practical Importance
Statistical significance is concerned with the ability to discriminate between treatments given the background variation. Practical importance relates to the scientific domain and is concerned with scientific discovery and explanation.

Power
Power is related to the Type II error:
power = 1 - probability of making a Type II error
Aim: to keep power as high as possible
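
The relationship power = 1 - P(Type II error) can be made concrete for a simple case. The sketch below assumes a two-sided z-test of a mean with known standard deviation; the function name and its parameters (`true_shift`, `sigma`, `n`, `alpha`) are hypothetical names chosen for this illustration.

```python
import math
from statistics import NormalDist


def z_test_power(true_shift, sigma, n, alpha=0.05):
    """Power of a two-sided z-test when the true mean differs by true_shift."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Mean of the test statistic when the alternative is true.
    delta = true_shift * math.sqrt(n) / sigma
    # Type II error: the statistic falls inside (-z_crit, z_crit) anyway.
    beta = std_normal.cdf(z_crit - delta) - std_normal.cdf(-z_crit - delta)
    return 1 - beta


for n in (10, 32, 50):
    print(f"n = {n:2d}: power = {z_test_power(0.5, 1.0, n):.3f}")
```

For a fixed effect size and significance level, power increases with the sample size, which is why power considerations usually enter at the design stage.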

relationships- linear or otherwise

Correlations and linear relationships
– Pearson correlation
– Strength of linear relationship
– Simple indicator lying between –1 and +1
– Check your plots for linearity

Interpreting correlations
The correlation coefficient is a measure of the strength of the linear association between two variables. If the relationship is non-linear, the coefficient can still be evaluated and may appear sensible, so beware: plot the data first.
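
A small sketch of why plotting matters. The Pearson coefficient is computed by hand below for three made-up relationships: an exactly linear one, a cubic one (non-linear, yet the coefficient still looks respectable), and a quadratic one (a perfect relationship that the coefficient misses entirely). The function name and data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # exactly linear
cubic = [v ** 3 for v in x]       # non-linear but increasing
curved = [v ** 2 for v in x]      # perfect, but not linear, relationship

r_linear = pearson_r(x, linear)
r_cubic = pearson_r(x, cubic)
r_curved = pearson_r(x, curved)
print(f"linear: r = {r_linear:.2f}")
print(f"cubic:  r = {r_cubic:.2f}")
print(f"curved: r = {r_curved:.2f}")
```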

what is a statistical model?

Statistical models
Outcomes or Responses: these are the results of the practical work and are sometimes referred to as dependent variables.
Causes or Explanations: these are the conditions or environment within which the outcomes or responses have been observed; they are sometimes referred to as independent variables, but are more commonly known as covariates.

Specifying a statistical model
Models specify the way in which outcomes and causes link together, e.g.
Metabolite ~ Temperature
There should be an additional item on the right-hand side, giving the formula:
Metabolite ~ Temperature + Error

Statistical model interpretation
Metabolite ~ Temperature + Error
The outcome Metabolite is explained by Temperature and other things that we have not recorded, which we call Error. The task that we then have in terms of data analysis is simply to find out if the effect that Temperature has is large in comparison to that which Error has, so that we can say whether or not the Metabolite that we observe is explained by Temperature.
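
One way to sketch the comparison of the Temperature effect with Error is to fit a straight line by least squares and split the total variation in Metabolite into a part explained by Temperature and a residual (Error) part. A straight-line form is an assumption made here for illustration, and the data are simulated: the intercept 2.0, slope 0.5 and error standard deviation 1.0 are invented values.

```python
import random
from statistics import mean

random.seed(4)
# Simulated data; intercept 2.0, slope 0.5 and error sd 1.0 are made up.
temperature = [random.uniform(10, 30) for _ in range(50)]
metabolite = [2.0 + 0.5 * t + random.gauss(0, 1.0) for t in temperature]

# Least-squares estimates for Metabolite = intercept + slope * Temperature.
t_bar, m_bar = mean(temperature), mean(metabolite)
sxx = sum((t - t_bar) ** 2 for t in temperature)
sxy = sum((t - t_bar) * (m - m_bar) for t, m in zip(temperature, metabolite))
slope = sxy / sxx
intercept = m_bar - slope * t_bar

# Split the total variation into a Temperature part and an Error part.
fitted = [intercept + slope * t for t in temperature]
ss_total = sum((m - m_bar) ** 2 for m in metabolite)
ss_error = sum((m - f) ** 2 for m, f in zip(metabolite, fitted))
ss_temperature = ss_total - ss_error
r_squared = ss_temperature / ss_total

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"variation explained by Temperature: {ss_temperature:.1f}")
print(f"variation left as Error:            {ss_error:.1f}")
print(f"proportion explained (R^2):         {r_squared:.2f}")
```

When the Temperature sum of squares is large relative to the Error sum of squares, Temperature explains much of what is observed in Metabolite; a formal comparison would use an F-test or a t-test on the slope.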

Summary
– Hypothesis tests and confidence intervals are used to make inferences
– We build statistical models to explore relationships and explain variation
– A general linear modelling framework is very flexible
– Assumptions should be checked