Jul-15H.S.1 Short overview of statistical methods Hein Stigum Presentation, data and programs at: courses.



Agenda
Concepts
Bivariate analysis
–Continuous symmetrical
–Continuous skewed
–Categorical
Multivariable analysis
–Linear regression
–Logistic regression
Outcome variable decides analysis

CONCEPTS

Precision and bias
Measures of populations
–Precision: random error (statistics)
–Bias: systematic error (epidemiology)
[Figure: true value and estimate, showing precision and bias]

Precision: Estimation
[Figure: population and sample; estimate with confidence interval]
95% confidence interval: 95% of intervals from repeated samples will contain the true value
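The repeated-sampling reading of a 95% confidence interval can be checked by simulation. The following is a minimal Python sketch, not part of the original slides; the population mean, standard deviation, and sample size are hypothetical, and the interval uses the normal multiplier 1.96.

```python
import random
import statistics

def coverage(n_repeats=2000, n=100, true_mean=3500.0, sd=500.0, seed=1):
    """Fraction of 95% intervals (mean +/- 1.96*SE) that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        # Draw one sample from a hypothetical normal population
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / n_repeats

# Should land close to 0.95
print(coverage())
```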

Precision: Testing
[Figure: population and sample; group 1 versus group 2]
p-value = P(observing this difference or a larger one, when the true difference is zero)
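One way to make the p-value definition concrete is an exact permutation test: if the true difference is zero, group labels are exchangeable, so the p-value is the share of relabelings that give a difference at least as large as the one observed. This is a Python sketch of that idea, not the t-test used later in the slides; the function name and data are hypothetical.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(group1, group2):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    observed = abs(fmean(group1) - fmean(group2))
    total = sum(pooled)
    as_extreme = 0
    splits = 0
    # Enumerate every way of assigning n1 of the pooled values to group 1
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        diff = abs(s1 / n1 - (total - s1) / n2)
        splits += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / splits

# Hypothetical toy data: only 2 of the 20 possible splits are this extreme
print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```

Full enumeration is only practical for small groups; for larger samples a random subset of relabelings is the usual substitute.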

Precision: Significance level
Birth weight, 500 newborns, observed difference
H0: boys = girls
Ha: boys ≠ girls
[Table: observed differences in grams (first: 10) with p-values (last: 0.02)]
Significance level: p<0.05

Precision: Test situations
1 sample test: Weight = 10
2 independent samples: Weight by sex
K independent samples: Weight by age groups
2 dependent samples: Weight last year = Weight today

Bias: DAGs
[DAG: exposure E = gestational age, outcome D = birth weight, covariates C1 = sex, C2 = parity]
Associations: bivariate (unadjusted)
Causal effects: multivariable (adjusted)
Draw your assumptions before your conclusions

WHY USE GRAPHS?

Problem example
Lunch meals per week
–Table of means (around 5 per week)
–Linear regression

Problem example 2
Iron level by sex
–Both linear and logistic regression
–Opposite results
[Figure: iron level in blood]

Data types
Categorical data
–Nominal: married/single/divorced
–Ordinal: small/medium/large
Numerical data
–Discrete: number of children
–Continuous: weight

Outcome data type dictates type of analysis

BIVARIATE ANALYSIS 1
Continuous symmetric outcome: Birth weight

Distribution
kdensity weight
drop if weight<2000
kdensity weight

Central tendency and dispersion
Mean and standard deviation:
Mean with confidence interval:

Compare groups, equal variance?
[Figure panels: Equal; Not equal]

2 independent samples
Are birth weights the same for boys and girls?
[Figures: scatterplot; density plot]

2 independent samples test
ttest weight, by(sex)
Unequal variances: ttest weight, by(sex) unequal
Paired test: ttest var1==var2
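For readers who want to see what the unequal-variances option computes, here is a minimal Python sketch of the Welch t statistic and its Satterthwaite degrees of freedom. It is an illustration, not the slides' Stata output; the birth weights are hypothetical, and no p-value is computed because the standard library has no t distribution.

```python
from statistics import fmean, variance

def welch_t(a, b):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    # Squared standard error contributed by each group
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

boys = [3600, 3400, 3800, 3500, 3700]   # hypothetical birth weights, grams
girls = [3300, 3500, 3200, 3400, 3100]  # hypothetical birth weights, grams
t, df = welch_t(boys, girls)
print(t, df)
```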

K independent samples
Is birth weight the same over parity?
[Figures: scatterplot; density plot]

K independent samples test
Equal means?
Equal variances?

Continuous by continuous
Does birth weight depend on gestational age?
[Figures: scatterplot; scatterplot with outlier dropped]

Continuous by continuous tests
Cut gestational age into groups, then use a t-test or ANOVA
or
Use linear regression with 1 covariate

Test situations
1 sample test: ttest weight==10
2 independent samples: ttest weight, by(sex)
K independent samples: oneway weight parity
2 dependent samples (paired): ttest weight_last_year==weight_today

BIVARIATE ANALYSIS 2
Continuous skewed outcome: Number of sexual partners

Distribution
kdensity partners if partners<=50

Central tendency and dispersion
Median and percentiles:

2 independent samples
Do males and females have the same number of partners?
[Figures: scatterplot; density plot]

2 independent samples test
Equal medians?

K independent samples
Do partners vary with age?
[Figures: scatterplot; density plot]

K independent samples test
Equal medians?

Table of descriptives

Table of tests
Remarks:
–Categorical ordered: use nonparametric tests
–If N is large: may use parametric tests
–If unequal variance in ANOVA: use linear regression with robust variance estimation

BIVARIATE ANALYSIS 3
Categorical outcome: Being bullied

Frequency and proportion
Frequency:
Proportion with CI:

Proportion, confidence interval
proportion:
standard error:
confidence interval:
x = "disease", n = total number
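The formulas on this slide did not survive extraction. A common choice, and only an assumption about what the slide showed, is the normal-approximation (Wald) interval: p = x/n, se = sqrt(p(1-p)/n), and p ± z·se. A minimal Python sketch with hypothetical counts:

```python
from statistics import NormalDist

def proportion_ci(x, n, level=0.95):
    """Wald interval for a proportion: x events out of n observations."""
    p = x / n
    se = (p * (1 - p) / n) ** 0.5
    # Normal quantile for the requested level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return p, se, (p - z * se, p + z * se)

# Hypothetical example: 17 of 100 children report being bullied
p, se, (low, high) = proportion_ci(17, 100)
print(p, se, low, high)
```

The Wald interval behaves poorly when p is near 0 or 1 or when n is small; exact or Wilson intervals are the usual alternatives in that case.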

Crosstables
Are boys bullied as much as girls?
Equal proportions?
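A test of equal proportions in a 2x2 crosstable is typically Pearson's chi-square test. The sketch below is a hedged Python illustration with hypothetical counts, not the slide's actual output; it uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so the p-value can be obtained from the normal distribution.

```python
from statistics import NormalDist

def chi2_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # 1 df: chi2 is a squared standard normal, so use the two-sided normal tail
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p_value

# Hypothetical counts: rows = boys, girls; columns = bullied, not bullied
chi2, p_value = chi2_2x2([[20, 80], [10, 90]])
print(chi2, p_value)
```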

Ordered categories, trend
Trend?
Equal proportions?

Table of tests
Remarks:
–Categorical ordered: use nonparametric tests
–If N is large: may use parametric tests
–If unequal variance in ANOVA: use linear regression with robust variance estimation

MULTIVARIABLE ANALYSIS 1
Continuous outcome: Linear regression, Birth weight

Regression idea

Model and assumptions
Model
[Equation: linear regression model]
Association measure
β1 = increase in y for one unit increase in x1
Assumptions
–Independent errors
–Linear effects
–Constant error variance
Robustness
–Influence
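The model equation itself was lost from the transcript. Under the assumptions listed on the slide, the standard multiple linear regression model would read as follows; the notation (k covariates, error term ε with variance σ²) is an assumption, not recovered from the slide.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon,
\qquad \mathrm{E}[\varepsilon] = 0,\quad \mathrm{Var}(\varepsilon) = \sigma^2

\mathrm{E}[y \mid x_1 + 1, x_2, \dots, x_k] - \mathrm{E}[y \mid x_1, x_2, \dots, x_k] = \beta_1
```

The second line restates the association measure: β1 is the expected change in y for a one-unit increase in x1 with the other covariates held fixed.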

Workflow
DAG
Scatterplots
Bivariate analysis
Regression
–Model estimation
–Test of assumptions: independent errors, linear effects, constant error variance
–Robustness: influence
[DAG: exposure E = gestational age, outcome D = birth weight, covariates C1 = sex, C2 = parity]

Categorical covariates
2 categories
–OK
3+ categories
–Use "dummies"
"Dummies" are 0/1 variables used to create contrasts
Want 3 categories for parity: 0, 1 and 2-7 children
Choose 0 as reference
Make dummies for the two other categories
generate Parity1 =(parity==1) if parity<.
generate Parity2_7 =(parity>=2) if parity<.
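The same dummy coding can be sketched outside Stata. This Python version mirrors the two generate commands, with None standing in for Stata's missing value; the function name is hypothetical.

```python
def parity_dummies(parity):
    """Return (Parity1, Parity2_7) for one parity value; parity 0 is the reference."""
    if parity is None:
        # Mirrors "if parity<.": missing parity gives missing dummies
        return None, None
    return int(parity == 1), int(parity >= 2)

# Hypothetical parity values, including one missing
for value in [0, 1, 3, None]:
    print(value, parity_dummies(value))
```

A woman with parity 0 gets (0, 0), so both regression coefficients are contrasts against the reference group.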

Create meaningful constant
Expected birth weight at:
–gest=0, sex=0, parity=0: not meaningful
–gest=280, sex=1, parity=0: meaningful
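Centering a covariate is one way to get a meaningful constant: subtracting 280 from gestational age makes the constant the expected birth weight at 280 days instead of at 0 days, while the slope is unchanged. A minimal Python sketch with one covariate and hypothetical data (the helper fit_line is not from the slides):

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

gest = [266, 273, 280, 287, 294]         # hypothetical gestational ages, days
weight = [3200, 3350, 3500, 3650, 3800]  # hypothetical birth weights, grams

raw_const, raw_slope = fit_line(gest, weight)
centered = [g - 280 for g in gest]       # gestational age measured from 280 days
cen_const, cen_slope = fit_line(centered, weight)

print(raw_const, raw_slope)   # constant = expected weight at 0 days
print(cen_const, cen_slope)   # constant = expected weight at 280 days
```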

Model estimation

Test of assumptions
Plot residuals versus predicted y
–Independent residuals?
–Linear effects?
–Constant variance?

Violations of assumptions
Dependent residuals: use mixed models or GEE
Non-linear effects: add square term
Non-constant variance: use robust variance estimation

Influence

Measures of influence
Measure change in:
–Predicted outcome
–Deviance
–Coefficients (beta): delta beta
Remove obs 1, see change
Remove obs 2, see change

Delta beta for gestational age
If observation number 539 is removed, beta will change from 6 to 16
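The leave-one-out idea behind delta beta can be sketched in a few lines of Python. This version reports the raw change in the slope of a one-covariate regression when each observation is dropped; it is an illustration on hypothetical data, and statistical packages usually report a standardized variant rather than the raw change.

```python
from statistics import fmean

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def delta_betas(x, y):
    """Change in slope when each observation is removed in turn."""
    full = slope(x, y)
    changes = []
    for i in range(len(x)):
        # Refit without observation i and record the change from the full fit
        changes.append(slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) - full)
    return changes

x = [1, 2, 3, 4, 5, 6]    # hypothetical covariate
y = [1, 2, 3, 4, 5, 20]   # hypothetical outcome; the last point is an outlier
changes = delta_betas(x, y)
most_influential = max(range(len(changes)), key=lambda i: abs(changes[i]))
print(changes, most_influential)
```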

Removing outlier
[Output panels: Full model; Outlier removed; Final model]
One outlier affected two estimates

MULTIVARIABLE ANALYSIS 2
Binary outcome: Logistic regression, Being bullied

Ordered categories and model
Categories    Regression model
2             Logistic
3-7           Ordinal logistic
>7            Linear (treat as interval)
Interval versus ordered scale:
Interval scale: 1, 2, 3
Ordered scale: low, medium, high

Logistic model and assumptions
Association measure
Odds ratio in y for 1 unit increase in x1
Assumptions
–Independent errors
–Linear effects on the log odds scale
Robustness
–Influence
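The slide does not show the model equation. The standard logistic regression model, written here with assumed notation (p is the probability that y = 1 given the covariates), makes the "linear on the log odds scale" assumption and the odds ratio explicit:

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad p = P(y = 1 \mid x_1, \dots, x_k)

\mathrm{OR}_1 = e^{\beta_1}
```

Here OR1 is the odds ratio for a one-unit increase in x1 with the other covariates held fixed.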

Being bullied
We want the total effect of country on being bullied.
–The risk of being bullied depends on age and sex.
–The age and sex distribution may differ between countries.
Should we adjust for age and sex?
[DAG: exposure E = country, outcome D = bullied, C1 = age, C2 = sex]
No, age and sex are mediating variables

Logistic: being bullied
OR ≈ RR if the outcome is rare
OR > RR (further from 1) if the outcome is common
Prevalence of being bullied = 17%
Roughly:
–Same risk of being bullied in Iceland as in Sweden.
–2 times the risk in Norway as in Sweden.
–3 times the risk in Finland as in Sweden.
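The relation between OR and RR can be checked numerically. This Python sketch converts a risk ratio into the implied odds ratio for a given risk in the reference group; the function name is hypothetical, and using 17% as the reference-group risk is an assumption for illustration, since the slide reports 17% as the overall prevalence.

```python
def odds_ratio(rr, p0):
    """Odds ratio implied by a risk ratio rr and a reference-group risk p0."""
    p1 = rr * p0
    if not 0 < p0 < 1 or not 0 < p1 < 1:
        raise ValueError("risks must lie strictly between 0 and 1")
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Rare outcome: the OR stays close to the RR of 2
or_rare = odds_ratio(2, 0.01)
# Common outcome (assumed 17% reference risk): the OR moves further from 1
or_common = odds_ratio(2, 0.17)
print(or_rare, or_common)
```

With a common outcome, an odds ratio from logistic regression therefore overstates the risk ratio, which is why the slide reads the results only roughly as risks.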

Summing up
DAGs
–State prior knowledge. Guide analysis
Plots
–Linearity, variance, outliers
Bivariate analysis
–Continuous symmetrical: mean, t-test, ANOVA
–Continuous skewed: median, nonparametric
–Categorical: freq, cross, chi-square
Multivariable analysis
–Continuous: linear regression
–Binary: logistic regression