Statistical Reasoning and Analysis


Statistical Reasoning and Analysis
Tony Panzarella, Department of Biostatistics, Princess Margaret Cancer Center
Tony.Panzarella@uhnres.utoronto.ca
September 2014

“Lies, damned lies, and statistics” (British Prime Minister Benjamin Disraeli,1804–1881) “Talked politics, scandal, and the three classes of witnesses—liars, d—d liars, and experts.” (The Life and Letters of Thomas Henry Huxley by L. Huxley, 1900) https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

Statistics: the new Sexy!?
Hal Varian, Google's chief economist, says statistician will be 'the sexy job in the next 10 years.' Chad Schafer explains why. (August 4, 2013, Daniel Marsula/Post-Gazette)
http://www.post-gazette.com/opinion/Op-Ed/2013/08/04/The-Next-Page-Data-Driven-Why-statistics-is-sexy/stories/201308040172#ixzz3CFcEr9IV
The prediction, in Varian's own words: “I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding.”

Some Keywords
Study designs, confounding
Pitfalls of data analysis
Bias (representative sampling, statistical assumptions)
Errors in methodology (statistical power, multiple comparisons, measurement error)
Interpretation (precision and accuracy, causality, graphical representation)
Era of Big Data

Prelude: Design and Analysis
Objective: Design the ultimate Intro to PHS talk… and the worst one that I can still get away with…
Methods: Identify topic(s), and delivery with visuals
Examples; no formulas
Take-home messages

Correlations, Cluster analysis
Source: Sebastian Wernicke, 2010, TED Talks

Nonparametric methods using ranks, Discriminant analysis
Source: Sebastian Wernicke, 2010, TED Talks

Confidence Intervals, Associations, Time-to-event analysis
Source: Sebastian Wernicke, 2010, TED Talks

Data Mining, Pattern Recognition
Source: Sebastian Wernicke, 2010, TED Talks

Pattern Recognition (Evidence… Hans Rosling)
Source: Sebastian Wernicke, 2010, TED Talks; Hans Rosling, “The best stats you’ve ever seen” and “New insights on poverty”

Motivating Example: Smoking & Survival
20-year follow-up of a survey in Whickham, UK (Tunbridge et al. 1977)
Original survey: 1972-1974, one-in-six sample of the electoral roll, largely concerned with thyroid disease and heart disease
For simplicity, consider women aged 45 to 75 at the start of the study
Smoking status: current smoker (Y/N)
20-year survival status: determined for all women in the study

Smoking & Survival (Cont’d) Protective effect of smoking? (data adapted from Appleton et al. 1996, Am. Stat.)

Smoking & Survival (Cont’d)
Consider the age ranges 45-54, 55-64, and 65-75 separately.
The non-smoking group does better in each age range!

Gender Bias, or Not?
In 1973, UC Berkeley was sued for discrimination against women in graduate school admissions.
Acceptance rates: 44% for male applicants vs. 35% for female applicants

Gender Bias, or Not? (cont’d)
Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975). Sex Bias in Graduate Admissions: Data from Berkeley. Science, 187(4175), 398-404.

Message #1
Be aware of the dangers of ignoring a covariate that is correlated with both the outcome variable and an explanatory variable.
This is Simpson’s Paradox; there are many other examples.
Simpson, E. H. (1951). “The Interpretation of Interaction in Contingency Tables”, Journal of the Royal Statistical Society, Series B, 13, 238-241.
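A minimal sketch of how such a reversal can arise. The counts below are made up for illustration (they are NOT the Whickham or Berkeley data), and the names `strata` and `crude_rates` are hypothetical. The numbers are chosen so that age is related to both smoking and survival.

```python
# Hypothetical counts, for illustration only: (survivors, total) per age stratum.
strata = {
    "45-54": {"smoker": (170, 200), "non-smoker": (90, 100)},
    "55-64": {"smoker": (65, 100), "non-smoker": (70, 100)},
    "65-75": {"smoker": (10, 50), "non-smoker": (50, 200)},
}


def crude_rates(strata):
    """Pool all strata and return the overall survival rate per group."""
    totals = {}
    for groups in strata.values():
        for group, (survived, total) in groups.items():
            s, t = totals.get(group, (0, 0))
            totals[group] = (s + survived, t + total)
    return {group: s / t for group, (s, t) in totals.items()}


# Survival rate within each age stratum.
stratum_rates = {
    age: {group: s / t for group, (s, t) in groups.items()}
    for age, groups in strata.items()
}

crude = crude_rates(strata)

print("Crude survival:", {g: round(r, 3) for g, r in crude.items()})
for age, rates in stratum_rates.items():
    print(age, {g: round(r, 3) for g, r in rates.items()})
```

With these illustrative numbers the pooled comparison favours smokers, yet non-smokers do better within every age range. The reversal comes from most smokers sitting in the youngest stratum, where survival is high for everyone.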

Guard Against Biases

Biases due to …
Selection of subjects: web surveys
Responses: e.g. question on income
Contamination in controls: non-blind study
Recall: food intake
Attrition: drop-out
Reporting: negative findings
Publication: meta-analysis
Over thirty kinds of biases

Guard Against Biases [BACKUP REFERENCE]
Bias in design: Concato et al. (2001). A nested case–control study of the effectiveness of screening for prostate cancer: research design.
Concato et al. (2001) report another type of design bias in prostate cancer detection studies: the groups were asymptomatic men who received digital rectal examination (DRE), screening by prostate-specific antigen (PSA), and transrectal ultrasound, but there was no control group with no screening. Thus the effectiveness of screening could not be evaluated.
Although PSA and DRE are commonly used to screen for prostate cancer, available data do not confirm that either test improves survival. The report describes the methodological aspects of a nested case–control study addressing whether PSA screening, with or without DRE, is effective in increasing survival. Potential sources of bias are discussed, as well as the strategies used to avoid them.

Possible steps to minimize bias
Assess the validity of the identified target population and of the groups to be included in the study, in the context of the objectives and the methodology.
Evaluate the reliability and validity of the measurements required to assess the antecedents and outcomes, and of any other tools you plan to deploy.
Carry out a pilot study and pretest the tools. Make changes as needed.
Identify possible confounding factors and other sources of bias; develop an appropriate design that can take care of most of these biases. Use matching, blinding, masking, and random allocation as needed.
Analyze the data with proper statistical methods. Use standardized or adjusted rates where needed, do a stratified analysis, or use mathematical models such as regression to take care of biases that could not be ruled out by design.
Report only evidence-based results, enthusiastically but dispassionately.
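One of the analysis steps above, standardized rates, can be sketched as follows. This is a minimal illustration of direct standardization, not a prescribed method. The counts, group labels, and function name `directly_standardized_rate` are hypothetical, and the standard population is assumed to be the two groups combined.

```python
# Hypothetical counts, for illustration only: (events, total) per age stratum.
data = {
    "45-54": {"exposed": (170, 200), "unexposed": (90, 100)},
    "55-64": {"exposed": (65, 100), "unexposed": (70, 100)},
    "65-75": {"exposed": (10, 50), "unexposed": (50, 200)},
}


def directly_standardized_rate(data, group):
    """Weight each stratum-specific rate by the stratum's share of a
    standard population (assumed here: both groups combined)."""
    standard = {
        stratum: sum(total for _, total in groups.values())
        for stratum, groups in data.items()
    }
    n_standard = sum(standard.values())
    weighted = 0.0
    for stratum, groups in data.items():
        events, total = groups[group]
        weighted += standard[stratum] * (events / total)
    return weighted / n_standard


adjusted = {g: directly_standardized_rate(data, g) for g in ("exposed", "unexposed")}
print({g: round(r, 3) for g, r in adjusted.items()})
```

Because both groups are weighted by the same age distribution, the adjusted rates compare like with like. In this made-up example the unexposed group has the higher adjusted rate.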

Multiple Testing (Large p & small n)

Data Dredging
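A short sketch of why running many tests invites false findings. It assumes all null hypotheses are true and the tests are independent (a simplification), and the function names are hypothetical. Under those assumptions the chance of at least one false positive among m tests at level alpha is 1 - (1 - alpha)^m.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent
    tests, each run at significance level alpha, when every null is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


for m in (1, 5, 20, 100):
    uncorrected = familywise_error_rate(0.05, m)
    corrected = familywise_error_rate(bonferroni_threshold(0.05, m), m)
    print(f"m={m:>3}  uncorrected={uncorrected:.3f}  Bonferroni={corrected:.3f}")
```

With 20 tests at the 0.05 level, the chance of at least one spurious "significant" result is roughly 64% under these assumptions. The Bonferroni correction keeps it at or below 5%, at the cost of reduced power.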

“Deming, data and observational studies: a process out of control and needing fixing”
Young and Karr (2011). Significance, pp. 116-120.

“Deming, data and observational studies: a process out of control and needing fixing”
Young and Karr (2011). Significance, pp. 116-120.
Young, S. S. and Yu, M. (2009). To the Editor. Journal of the American Medical Association, 301, 720–721.

Visual Display of Quantitative Information
Effectiveness of traffic enforcement in 1955-6, before vs. after
Source: Tufte (1983)

In the Age of BIG Data
Does Big Data make Statistics obsolete? NO!
Big data, big mistake? (Google Flu)
http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3CHWGduc7

Statistical Truisms
Correlation does not imply causation. Even so, causal relationships are among the most significant discoveries that analytics practitioners seek from big data: finding the causes of observed effects would be a gold mine of value for any business, science, government, healthcare, or security group analyzing big data.
Sample variance does not go to zero, even with Big Data.
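A small simulation sketch of the variance truism. It assumes a normal population with mean 10 and standard deviation 2 (both chosen arbitrarily for illustration). As n grows, the standard error of the mean shrinks, but the sample variance settles near the population variance of 4 rather than near zero.

```python
import math
import random


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n in (100, 10_000, 200_000):
    xs = [rng.gauss(10.0, 2.0) for _ in range(n)]
    var = sample_variance(xs)
    std_error = math.sqrt(var / n)  # standard error of the sample mean
    results[n] = (var, std_error)
    print(f"n={n:>7}  sample variance={var:.3f}  SE of mean={std_error:.4f}")
```

More data makes the estimate of the mean more precise. It does not make the individuals in the population any less variable.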

Statistical Truisms (cont’d)
Sample bias does not necessarily go to zero, even with Big Data. Sample bias can lead to models with biased results, slanted against the wonderful diversity of the original population.
Absence of evidence is not the same as evidence of absence.
Reference: http://www.amstat.org/publications/jse/v10n3/chance.html
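A small simulation sketch of the sample-bias truism, under an assumed (hypothetical) selection mechanism. Values are uniform on [0, 1], so the true mean is 0.5, and a unit with value x is included with probability x. The function name `biased_sample_mean` is made up for this example.

```python
import random


def biased_sample_mean(n_population, rng):
    """Draw n_population values uniformly on [0, 1] and keep each value x
    with probability x, so larger values are over-represented."""
    kept = []
    for _ in range(n_population):
        x = rng.random()
        if rng.random() < x:  # assumed selection mechanism
            kept.append(x)
    return sum(kept) / len(kept), len(kept)


rng = random.Random(7)  # fixed seed so the run is repeatable
estimates = {}
for n in (1_000, 100_000, 200_000):
    mean, n_kept = biased_sample_mean(n, rng)
    estimates[n] = mean
    print(f"population={n:>7}  sampled={n_kept:>7}  sample mean={mean:.3f}  (true mean 0.5)")
```

Under this mechanism the sample mean converges to about 2/3, not 0.5. Collecting more data only makes the wrong answer more precise.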

Acknowledgements Prof. Wendy Lou

Q & A