1. Introduction 2.Course Information 3.Study Design 4.Looking at Data Today’s Topics Introduction to the Practice of Statistics Ch. 1, 2.5, 3.2 MBP1010.

Presentation transcript:

1. Introduction 2.Course Information 3.Study Design 4.Looking at Data Today’s Topics Introduction to the Practice of Statistics Ch. 1, 2.5, 3.2 MBP1010 – Jan. 4, 2011

(1) How can we describe and draw meaning from a collection of data? (2) How can we infer information about the whole population when we know data from only some of the population (a sample)? Meaning from Data

- science of understanding data and making decisions in the face of variability and uncertainty - statistics is NOT a field of mathematics

Statistical Thinking -humans are good at recognizing patterns and there is real danger of over-interpreting patterns that are merely due to the play of chance (false leads) - role of statistics - to reject chance as an explanation so that we can have reasonable assurance that patterns seen are worthy of interpretation

Statistical Thinking - explore data prior to analysis - think about context and design - reasoning behind standard statistical methods Interpretation/Conclusions

Course Overview 1. Study designs/Looking at data 2. Concepts of statistical inference and hypothesis testing 3. Specific statistical tests - 1 and 2 sample test for continuous and categorical data - correlation, regression and ANOVA 4. Other Topics - eg sensitivity/specificity, survival analysis, logistic regression 5. Bioinformatics

Changes to MBP1010 this year Good news: doing less actual statistical analysis focus more on concepts/interpretation Bad news: short time frame to implement changes Department has made attendance at lectures mandatory. Good news/Bad news!

Information Requested: What statistical software is available in your lab? What software does your supervisor recommend? What statistical software have you used? to: by Mon Jan 10 at the latest

Course Information Tutorials: Thursdays 2 to 3:30 pm OCI First tutorial: Jan 13, 2011 TA: Dave Stock Lectures: Tuesdays 1 to 3 pm 620 University, Course Website – U of T Blackboard UTORiD and password; U of T address

Updated course information and schedule posted at website - no lecture or tutorial (Jan 25/27). Updated marking scheme: 3 Biostatistics Assignments: 35%; Biostatistics Exam: 30%; Bioinformatics Assignment: 30%; Participation: 5%

Resources - see website for electronic resources. Introduction to the Practice of Statistics (5th Edition), by Moore, DS and McCabe, GP. Presenting medical statistics from proposal to publication: A step-by-step guide, by Janet Peacock and Sally Kerry.

Can what we eat influence our risk of cancer? The case of dietary fat and breast cancer Study Design Posted on website: New York Times article Searching for clarity: A primer on medical studies

What should we do next?

An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. Observational Studies

Case/control and cohort studies common in cancer research (epidemiology) - outcome is binary: cancer/ no cancer Observational studies often examine factors associated with continuous outcome variables - eg association of body weight or diet with hormone levels - calcium intake and blood pressure

Case Control Study (diagram). Exposure: eg diet

Cohort Study (diagram). Exposure: eg diet. Outcome: cancer (yes/no)

Relative Risk. Compare risk of disease in those with highest versus lowest intake. RR = 1.0: no association. RR = 1.4: 1.4 times the risk, 40% higher risk. RR below 1.0: lower risk.
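To make the relative risk arithmetic concrete, here is a minimal sketch. The counts are hypothetical and are not taken from any of the studies in the lecture; the function names `relative_risk` and `odds_ratio` are made up for this illustration.

```python
def relative_risk(cases_exposed, total_exposed, cases_unexposed, total_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = cases_exposed / total_exposed
    risk_unexposed = cases_unexposed / total_unexposed
    return risk_exposed / risk_unexposed


def odds_ratio(cases_exposed, noncases_exposed, cases_unexposed, noncases_unexposed):
    """Cross-product ratio of a 2x2 table, the usual measure in case control studies."""
    return (cases_exposed * noncases_unexposed) / (noncases_exposed * cases_unexposed)


# Hypothetical cohort: 70 cancers among 1,000 women with the highest intake,
# 50 cancers among 1,000 women with the lowest intake.
rr = relative_risk(70, 1000, 50, 1000)
print(f"RR = {rr:.2f}")  # RR = 1.40, i.e. 40% higher risk
print(f"OR = {odds_ratio(70, 930, 50, 950):.2f}")  # OR = 1.43
```

When the disease is uncommon, as in this made-up example, the odds ratio is close to the relative risk, which is why the two are often shown on the same axis.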

a. Total Fat: Odds Ratio or Relative Risk (forest plot)
Case Control: Challier (1998), DeStefani (1998), Ewertz (1990), Franceschi (1996), Graham (1982), Graham (1991), Hirohata (1985), Hirohata (1987) (Caucasian), Hirohata (1987) (Japanese), Ingram (1991), Katsouyanni (1988), Katsouyanni (1994), Landa (1994), Lee (1991), Levi (1993), Mannisto (1999), Martin-Moreno (1994), Miller (1978), Núñez (1996), Potischman (1998), Pryor (1989), Richardson (1991), Rohan (1988), Shun-Zhang (1990), Toniolo (1989), Trichopoulou (1995), van't Veer (1990,1991), Wakai (2000), Witte (1997), Yuan (1995), Zaridze (1991); Case Control Summary
Cohort: Gaard (1995), Graham (1992), Holmes (1999), Howe (1991), Jones (1987), Knekt (1990), Kushi (1992), Thiébaut (2001), Toniolo (1994), van den Brandt (1993), Velie (2000), Wolk (1998); Cohort Summary
All Studies Summary
Bingham (2003), Cho (2003)

Interpretation Suppose we find that women who eat a low fat diet tend to have lower risk of breast cancer. Can we conclude that the fat in the diet is responsible for the lower risk of breast cancer? No. Other factors may be responsible for the association with dietary fat (confounding)

Problem of Confounding. Suppose A is associated with B. This may be because: (1) A causes B; (2) B causes A; (3) X is associated with both A and B. X need not be a cause of either A or B.

Problem of Confounding. In our dietary fat example: - women who eat more dietary fat may differ from those who eat less fat (eg weight, exercise, other dietary factors) - these factors may influence the risk of breast cancer

Trying to control for confounding - measure potential confounders eg. measure weight and physical activity -“control” for possible confounders in analysis - but…what about confounding with variables we don’t know exist or can’t measure?
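A small numerical sketch of what "controlling" for a measured confounder can look like. All counts are invented for illustration: body weight stands in as the confounder, and the table is constructed so that dietary fat has no effect within either weight group even though the crude comparison suggests one.

```python
# (cases, total) for each diet group, split by a hypothetical confounder (body weight).
strata = {
    "overweight":    {"high_fat": (80, 800), "low_fat": (20, 200)},
    "normal_weight": {"high_fat": (10, 200), "low_fat": (40, 800)},
}


def risk_ratio(high, low):
    """Risk ratio from two (cases, total) pairs."""
    return (high[0] / high[1]) / (low[0] / low[1])


def pooled(group):
    """Ignore the confounder: add up cases and totals across strata."""
    cases = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return cases, total


# Crude comparison, ignoring body weight.
crude_rr = risk_ratio(pooled("high_fat"), pooled("low_fat"))

# Stratum-specific comparison, i.e. comparing like with like.
stratum_rr = {name: risk_ratio(s["high_fat"], s["low_fat"]) for name, s in strata.items()}

print(f"crude RR = {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"{name}: RR = {rr:.2f}")
```

In this constructed table the crude RR is about 1.5 while the RR within each weight group is 1.0. This only works for confounders that were measured, which is the limitation the slide raises.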

Observational Studies. An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. Association between an explanatory variable and a response variable, even if it is very strong, is not good evidence of a cause and effect link between the variables. Correlation is not causation.

Basic principles of experimental design 1.Formulate question/goal in advance 2. Comparison/control 3. Replication 4. Randomization 5. Stratification (or blocking)

Example. Question: Does salted drinking water affect blood pressure (BP) in mice? Experiment: 1. Provide treatment - water containing 1% NaCl for 14 days 2. Measure outcome - BP

Comparison/control. Good experiments are comparative. Compare BP in mice fed salt water to BP in mice fed plain water. Ideally, the experimental group is compared to concurrent controls (rather than to historical controls).

Why replicate? Reduce the effect of uncontrolled variation (i.e., increase precision). Quantify uncertainty. A related point: An estimate of effect is of no value without some statement of the uncertainty in the estimate.
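One way to see how replication increases precision is through the standard error of a mean, which shrinks with the square root of the number of replicates. The sketch below uses hypothetical BP values and helper names (`standard_error`, `se_from_sd`) invented for this note.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def se_from_sd(sd, n):
    """Same quantity when only the SD and the number of replicates are known."""
    return sd / math.sqrt(n)


# Hypothetical BP readings from 4 mice.
print(round(standard_error([100, 110, 120, 130]), 2))

# With the same SD of 10, quadrupling the replicates halves the standard error.
print(se_from_sd(10, 4), se_from_sd(10, 16))
```

The standard error is also the simplest "statement of the uncertainty" that can accompany an estimated mean.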

Randomization. Experimental subjects (“units”) should be assigned to treatment groups at random. At random does not mean haphazardly. One needs to explicitly randomize using a computer, or coins, dice or cards.

Why randomize? Avoid bias. - For example: the first six mice you grab may have intrinsically higher BP. Control the role of chance. - Randomization allows the later use of probability theory, and so gives a solid foundation for statistical analysis.
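Explicit randomization by computer can be as short as the sketch below. The mouse labels, group names and the `randomize` helper are hypothetical; the seed is fixed only so that the allocation can be reproduced and audited.

```python
import random


def randomize(units, groups=("salt water", "plain water"), seed=None):
    """Shuffle the units, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, unit in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(unit)
    return assignment


# Twelve hypothetical mice, six per group.
mice = [f"mouse{i:02d}" for i in range(1, 13)]
assignment = randomize(mice, seed=2011)
for group, members in assignment.items():
    print(group, sorted(members))
```

Because the shuffle decides who goes where, the order in which mice are grabbed from the cage has no influence on group membership.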

Stratification Suppose that measurements will be made in males and females AND You anticipate a difference in response between males and females – Randomize within males and females separately - any systematic difference by sex removed - this is sometimes called “blocking”. -Take account of the difference between males and females in analysis: - helps control variability

Randomization and stratification. If you can (and want to), fix a variable. - e.g., study only men or women or a single strain of animal. If you don’t fix a variable, stratify on it. - e.g., randomize treatment within men and within women. If you can neither fix nor stratify a variable, randomize to treatment.
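Stratified (blocked) randomization amounts to running a separate randomization inside each stratum. The following is a sketch under assumed names: `stratified_randomize`, the subject tuples and the group labels are all hypothetical.

```python
import random


def stratified_randomize(subjects, stratum_of, groups=("treatment", "control"), seed=None):
    """Randomize to groups separately within each stratum (block)."""
    rng = random.Random(seed)
    strata = {}
    for subject in subjects:
        strata.setdefault(stratum_of(subject), []).append(subject)
    assignment = {}
    for stratum in sorted(strata):
        members = strata[stratum]
        rng.shuffle(members)
        # Deal shuffled members out in turn so each stratum is balanced.
        for i, subject in enumerate(members):
            assignment[subject] = groups[i % len(groups)]
    return assignment


# Six hypothetical males and six females, identified as (sex, id).
subjects = [("M", i) for i in range(6)] + [("F", i) for i in range(6)]
assignment = stratified_randomize(subjects, stratum_of=lambda s: s[0], seed=1)
for subject in subjects:
    print(subject, assignment[subject])
```

Each sex ends up with the same number of treated and control subjects, so a systematic difference between males and females cannot masquerade as a treatment effect.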

Other points. Blinding - Measurements made by people can be influenced by unconscious biases. - Ideally, measurements should be made without knowledge of the treatment applied. Internal controls - use the subjects themselves as their own controls (e.g., consider the response after vs. before treatment). - Why? Increased precision.

Other points. Representativeness - Are the subjects/tissues you are studying really representative of the population you want to study? - Ideally, your study material is a random sample from the population of interest.

Summary. Characteristics of good experiments: Comparative: control group. Unbiased: randomization, blinding. High precision: replication, blocking. Simple: protect against mistakes. Able to estimate uncertainty: replication, randomization.

Jackson et al. Nutr.Cancer, 1998 Dietary fat and mammary tumors in Sprague-Dawley rats (n=30 per diet group) Randomized Design

Dietary fat and fiber and mammary tumors in Sprague-Dawley rats (n=30) Factorial Experiment

Randomized Clinical Trials in Humans - Dietary Fat and Breast Cancer. Women’s Health Initiative (US) - 48,835 postmenopausal women - followed for 8-12 years. Diet and Breast Cancer Prevention Study - high risk women - followed for 7-17 years

Women’s Health Initiative - Postmenopausal women (50-79 years of age) - n=48,835; follow-up 8-12 years - randomized 40:60 intervention and control - group dietary counselling - follow up for breast cancer

Prentice, R. L. et al. JAMA 2006;295: Kaplan-Meier Estimates of the Cumulative Hazard for Invasive Breast Cancer

Study flow: eligible subjects identified (> 50% density); prerandomization assessment; randomization to Intervention (n=2,343) or Control (n=2,350); annual visits (demo/anthro data, diet records, non fasting serum); follow up until Dec 2005 (7-17 years per subject) for breast cancer incidence

Cumulative breast cancer hazards and odds ratios according to randomized group. Martin L J et al. Cancer Res 2011;71: ©2011 by American Association for Cancer Research

Randomized Clinical Trials in Humans. Practical issues: - long (particularly for cancer outcomes!) - expensive - limited in “treatment” options

Randomized Clinical Trials in Humans. Other issues: - highly selected subjects - selection criteria and motivation - subject/investigator blinding - subjects drop out - compliance? - other changes with intervention?

salt intake? food intake? weight? activity? Does salted drinking water affect BP in mice?

Main Points - primary interest is causal relationships between variables - observational studies show associations only - randomized studies best for causation but are not without challenges - totality of evidence important

What’s in the dataset? What are the observations (individuals)? Eg people, animals, cells, countries. How many observations are in the dataset? How many observations should there be? Are the observations independent? - repeated in an individual?

What are the variables? What is their exact definition? How were they measured? What are the units of measurement? What type of variables? What’s in the dataset?

Main Types of Variables Categorical: - include nominal and dichotomous variables - qualitative difference between values - eg sex (male/female), smoker/non smoker Continuous: - quantitative - equal distance between each value - eg blood pressure, age, dietary fat Ordinal variables can be ordered but they do not have specific numeric values, eg scales, ratings

Continuous Variables Examining a distribution: overall pattern can be described by shape, centre and spread in a graph of data look for overall pattern and striking deviations from the pattern outlier – individual value that falls outside the overall pattern

Stem and Leaf Plots - displays distribution of small/moderate amounts of data - includes the actual numerical values. Example data: blood pressure data in 21 patients (stem plot). Stem (all but last digit) : Leaf (last digit)
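A stem and leaf plot is simple enough to build by hand or in a few lines of code. The sketch below assumes non-negative whole-number readings and uses ten made-up blood pressure values, not the 21 from the lecture; `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem plot lines: stem = all but last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3} : {row}")
    return lines


# Hypothetical blood pressure readings.
bp = [98, 102, 105, 110, 110, 113, 118, 121, 124, 136]
print("\n".join(stem_and_leaf(bp)))
```

Every original value can be read back from the plot, which is the property the slide highlights.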

FIG 1 (left): Serum albumin values in 248 adults. FIG 2 (right): Normal distribution with the same mean and standard deviation as the serum albumin values. Altman D G, Bland J M. BMJ 1995;310:298. ©1995 by British Medical Journal Publishing Group

Importance of Normal Distribution* 1. Distributions of real data are often close to normal. 2. Mathematically easy to work with, so many statistical tests are designed for normal (or close to normal) distributions. 3. If the mean and SD of a normal distribution are known, you can make quantitative predictions about the population. * also called Gaussian curve
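Point 3 can be illustrated with Python's standard library. The mean of 115 and SD of 10 below are assumed values for a hypothetical blood pressure distribution, chosen only for the example.

```python
from statistics import NormalDist

# Hypothetical population: mean 115, SD 10.
bp = NormalDist(mu=115, sigma=10)

within_1sd = bp.cdf(125) - bp.cdf(105)  # mean +/- 1 SD
within_2sd = bp.cdf(135) - bp.cdf(95)   # mean +/- 2 SD
above_140 = 1 - bp.cdf(140)             # more than 2.5 SD above the mean

print(f"within 1 SD: {within_1sd:.1%}")
print(f"within 2 SD: {within_2sd:.1%}")
print(f"above 140:   {above_140:.2%}")
```

About 68% of values fall within one SD of the mean and about 95% within two, whatever the particular mean and SD are. These predictions are only as good as the assumption that the data are close to normal.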

Describing Distributions with Numbers

Blood Pressure Data: n= 21 measurements mean = 2395/21 = 114 median = observation 11 =

Mean versus Median - skewed data (stem plot): Mean = 16.7, Median = 11
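The pull of a long tail on the mean is easy to demonstrate. The two small data sets below are invented for illustration and are not the data behind the slide.

```python
import statistics

symmetric = [8, 9, 10, 11, 12]
skewed = [8, 9, 10, 11, 62]  # same values except for one extreme observation

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

A single extreme value doubles the mean (10 to 20) while the median stays at 10, which is why the median is preferred for skewed data.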

BP data; n = 10 Min Q1 Median Q3 Max

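Boxplot: the box runs from the 25% quantile to the 75% quantile (the IQR), with the median marked inside. Whiskers extend up to 1.5 x IQR beyond the box. Everything above or below is considered an outlier.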
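The numbers behind a boxplot can be computed directly. This is a sketch with ten hypothetical BP values. Quartile conventions differ between textbooks and software, so the "inclusive" method used here is an assumption and other packages may give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def outlier_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical BP data, n = 10.
bp = [98, 104, 108, 110, 112, 115, 118, 121, 126, 160]
low, high = outlier_fences(bp)
outliers = [v for v in bp if v < low or v > high]

print("five-number summary:", five_number_summary(bp))
print("fences:", (low, high))
print("potential outliers:", outliers)
```

Values outside the fences are flagged as potential outliers to investigate, not values to delete automatically.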

Dot Plot

Measures of Spread - range of data set: largest - smallest value - interquartile range (IQR): 3rd minus 1st quartile - sample variance and standard deviation

Deviation from the Mean
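The sample variance is built from deviations from the mean: square each deviation, add them up, and divide by n - 1. The standard deviation is the square root. A minimal sketch with five made-up values:

```python
import math

values = [4, 8, 6, 5, 7]
n = len(values)
mean = sum(values) / n

# Deviations from the mean always sum to zero, which is why they are squared.
deviations = [v - mean for v in values]
sum_of_squares = sum(d * d for d in deviations)

variance = sum_of_squares / (n - 1)  # sample variance, s squared
std_dev = math.sqrt(variance)        # sample standard deviation, s

print("deviations:", deviations)
print("variance:", variance, "SD:", round(std_dev, 2))
```

Dividing by n - 1 rather than n is the usual convention for a sample, and it is what `statistics.variance` in the Python standard library does as well.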

Choosing a summary. Five-number summary: - skewed distribution - outliers. x̄ and s (mean and std dev.): - reasonably symmetric - free of outliers

Extreme Observations or Outliers - rule of thumb 1.5 x IQR for potential outliers - observations that stand apart from the overall pattern (not just extreme values) - do not automatically delete outliers - try to explain them - an error in measurement or in recording data - an unusual occurrence - describe outliers, what you do with them and what their effect is

Energy expenditure in 29 women measured by doubly labelled water (MJ per day). 1.5 x IQR = 1.5 x 3.5 = 5.25 MJ; 75th percentile (11.46) + 5.25 = 16.71 MJ

What did we do about the outlier? - checked recording/calculations/data entry - unusual occurrence? - biologically plausible? - re-measured laboratory samples - analysis with and without outlier - described all above in paper

Data Display

Data presentation: Bad plot, Good plot

Figure: Dietary fat intake (% dietary fat) in the intervention and control groups (n=150 intervention and 187 control)

How to Display Data Badly H Wainer (1984) How to display data badly. American Statistician 38(2): posted at website -use of Microsoft Excel and Powerpoint has resulted in remarkable advances in the field (of poor data display)

The aim of good data graphics: Display data accurately and clearly. Some rules for displaying data badly: – Display as little information as possible. – Obscure what you do show (with chart junk). – Use pseudo-3d and color gratuitously. – Make a pie chart (preferably in color and 3d). – Use a poorly chosen scale. General principles

Pay attention to scale! Same data, different scale

Displaying data well Be accurate and clear. Let the data speak. – Show as much information as possible, taking care not to obscure the message. Science not sales. – Avoid unnecessary frills — esp. gratuitous 3d. In tables, every digit should be meaningful.

Further reading – Data Display ER Tufte (1983) The visual display of quantitative Information. (1990) Envisioning information. (1997) Visual explanations. WS Cleveland (1993) Visualizing data. Hobart Press. WS Cleveland (1994) The elements of graphing data. CRC Press.
