Statistical Analysis I Mosuk Chow, PhD Senior Scientist and Professor Department of Statistics December 8, 2015 CTSI BERD Research Methods Seminar Series.



Biostatistics, Epidemiology, Research Design (BERD)
BERD Goals:
- Match the needs of investigators to the appropriate biostatisticians/epidemiologists/methodologists
- Provide BERD support to investigators
- Offer BERD education to students and investigators via in-person, videoconferenced, and online classes

Statistics Encompasses
- Study design
  - Selection of efficient design (cohort study/case-control study)
  - Sample size
  - Randomization
- Data collection
- Summarizing data
  - Important first step in understanding the data collected
- Analyzing data to draw conclusions
- Communicating the results of analyses

Keys to Successful Collaboration Between Statistician and Investigator: A Two-Way Street
- Involve statistician at beginning of project (planning/design phase)
- Specific objectives
- Communication
  - Avoid jargon
  - Willingness to explain details

Keys to Successful Collaboration: A Two-Way Street
- Respect
  - Knowledge
  - Skills
  - Experience
  - Time
- Embrace statistician as a member of the research team
- Fund statistician on grant application for best collaboration
  - Most statisticians are supported by grants, not by institutional funds

Statistical Analysis
- Describing data
  - Numeric or graphic
- Statistical Inference
  - Estimation of parameters of interest
  - Hypothesis testing
  - Regression modeling
- Interpretation and presentation of the results

Describing data: Basic Terms
- Measurement: assignment of a number to a characteristic of an object or event
- Data: collection of measurements
- Sample: collected data
- Population: all possible data
- Variable: a property or characteristic of the population/sample, e.g., gender, weight, blood pressure

Example of data set/sample
[Table: data on albumin and bilirubin levels before and after treatment with a study drug]

Describing Data
- Types of data
- Summary measures (numeric)
- Visually describing data (graphical)

Types of Variables
- Qualitative or Categorical
  - Binary (or dichotomous): True/False, Yes/No
  - Nominal: no natural ordering (e.g., ethnicity)
  - Ordinal: categories have natural ranks
    - Degree of agreement (strong, modest, weak)
    - Size of tumor (small, medium, large)
- Quantitative
  - Ratio: ordered, constant scale, natural zero (age, weight)
  - Interval: ordered, constant scale, no natural zero
    - Differences make sense, but ratios do not
    - Temperature in Celsius (30° - 20° = 20° - 10°, but 20° is not twice as hot as 10°)

Types of Measurements for Quantitative Variables
- Continuous: Weight, Height, Age
- Discrete: a countable number of values
  - The number of births, Age in years
- Likert scale: "agree", "strongly agree", etc. Somewhere between ordinal and discrete
  - Scales with <= 4 possibilities are usually considered to be ordinal.
  - Scales with >= 7 possibilities are usually considered to be discrete.

Descriptive Statistics
Quantitative variable
- Measure(s) of central location/tendency
  - Mean
  - Median
  - Mode
- Measure(s) of variability (dispersion)
  - Describe the spread of the distribution

Descriptive Statistics (cont.)
- Summary measures of dispersion/variation
  - Minimum and Maximum
  - Range = Maximum - Minimum
  - Sample variance (abbreviated s²) and standard deviation (s or SD), with denominator = n - 1

Other Measures of Variation
- Interquartile range (IQR): 75th percentile - 25th percentile
- MAD: median absolute deviation
- CV: coefficient of variation
  - Ratio of SD over sample mean
  - Measures relative variability
  - Independent of measurement units
  - Useful for comparing two or more sets of data
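The summary measures listed on the descriptive-statistics slides can be computed with Python's standard `statistics` module. This is a minimal sketch: the blood pressure readings are invented for illustration and are not the seminar's data, and the MAD is taken here as the median of absolute deviations from the median, which is one common definition.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
sbp = [112, 118, 121, 125, 127, 130, 134, 141, 150]

# Measures of central location
mean = statistics.mean(sbp)
median = statistics.median(sbp)

# Measures of dispersion
data_range = max(sbp) - min(sbp)
variance = statistics.variance(sbp)   # sample variance, denominator n - 1
sd = statistics.stdev(sbp)            # square root of the sample variance

# Interquartile range: 75th percentile minus 25th percentile.
# statistics.quantiles uses its default "exclusive" method here; other
# software may use a slightly different percentile convention.
q1, _, q3 = statistics.quantiles(sbp, n=4)
iqr = q3 - q1

# Median absolute deviation (about the median) and coefficient of variation
mad = statistics.median([abs(x - median) for x in sbp])
cv = sd / mean                        # unit-free ratio of SD to mean

print(f"mean={mean:.1f} median={median} range={data_range}")
print(f"s^2={variance:.1f} s={sd:.1f} IQR={iqr} MAD={mad} CV={cv:.3f}")
```

Because the CV is a ratio, it stays the same if the readings are converted to another unit, which is what makes it useful for comparing data sets measured on different scales.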

Describing data graphically
Tell whole story of data, detect outliers
- Histogram
- Stem and Leaf Plot
- Box Plot

Histogram
- Divide range of data into intervals (bins) of equal width.
- Count the number of observations in each class.
[Figure: histogram of SBP for 113 men. Each bar spans a width of 5 mmHg. The height represents the number of individuals in that range of SBP.]

Histogram of SBP
[Figure panels: bin width = 20 mmHg; bin width = 1 mmHg]
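The two histogram steps (divide the range into equal-width bins, then count the observations in each) can be sketched in a few lines. The function name `bin_counts`, the starting point of 110 mmHg, and the readings are assumptions made for illustration; the 113-man data set from the slides is not reproduced here.

```python
from collections import Counter

def bin_counts(values, width, start):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = Counter((v - start) // width for v in values)
    # Key each bin by its lower edge; empty bins are simply omitted.
    return {start + k * width: counts[k] for k in sorted(counts)}

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
sbp = [112, 118, 121, 125, 127, 130, 134, 141, 150]

narrow = bin_counts(sbp, width=5, start=110)    # many bins, few observations in each
wide = bin_counts(sbp, width=20, start=110)     # few bins, shape is smoothed over

for label, bins in (("width 5", narrow), ("width 20", wide)):
    print(label)
    for lower_edge, count in bins.items():
        print(f"  {lower_edge:>3} | {'*' * count}")
```

Comparing the two printouts shows the trade-off behind the bin-width panels: very narrow bins give a jagged picture, very wide bins hide the shape.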

Stem and Leaf Plot
- Provides a good summary of data structure
- Easy to construct and much less prone to error than the tally method of finding a histogram
- "Stem": the first digit or digits of the number.
- "Leaf": the trailing digit.
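A stem-and-leaf plot is simple enough to build by hand, and equally simple in code. The sketch below assumes integer data with the last digit as the leaf; the function name `stem_and_leaf` and the readings are invented for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group integers by stem (all digits but the last) with sorted leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
sbp = [112, 118, 121, 125, 127, 130, 134, 141, 150]

plot = stem_and_leaf(sbp)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, the plot keeps every original value readable: stem 12 with leaves 1, 5, 7 stands for 121, 125, and 127.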

Box Plot: SBP for 113 Males
[Figure: box plot of blood pressure, marking the largest observation, 75th percentile, sample median, 25th percentile, and smallest observation]

Descriptive Statistics (cont.)
Categorical variable
- Frequency (counts) distribution
- Relative frequency (percentages)
- Pie chart
- Bar graph

Describe relationship between two variables
One quantitative and one categorical
- Descriptive statistics within each category
- Side by side boxplots/histograms
Both quantitative
- Scatter plot
Both categorical
- Contingency table
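Two of these numeric summaries, a contingency table for two categorical variables and descriptive statistics within each category, can be sketched with the standard library alone. The records below (treatment group, outcome, albumin level) are hypothetical and chosen only to echo the study-drug example earlier in the deck.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: (treatment group, outcome, albumin level in g/dL).
records = [
    ("drug", "improved", 4.1),
    ("drug", "improved", 3.9),
    ("drug", "not improved", 3.4),
    ("placebo", "improved", 3.8),
    ("placebo", "not improved", 3.2),
    ("placebo", "not improved", 3.0),
]

# Both categorical: contingency table of group by outcome (cell counts).
table = Counter((group, outcome) for group, outcome, _ in records)

# One quantitative, one categorical: mean albumin within each group.
by_group = {}
for group, _, albumin in records:
    by_group.setdefault(group, []).append(albumin)
group_means = {group: mean(values) for group, values in by_group.items()}

for (group, outcome), count in sorted(table.items()):
    print(f"{group:<8} {outcome:<13} {count}")
print(group_means)
```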

Statistical Inference
A process of making inference (an estimate, prediction, or decision) about a population (parameters) based on a sample (statistics) drawn from that population.
[Diagram: a sample is drawn from the population; inference runs from the sample back to the population. Parameters (population): fixed, unknown. Statistics (sample): vary from sample to sample.]

Statistical Inference
Questions to ask in selecting appropriate methods
- Are observation units independent?
- How many variables are of interest?
- Type and distribution of variable(s)?
- One-sample or two-sample problem?
- Are samples independent?
- Parameters of interest (mean, variance, proportion)?
- Sample size sufficient for the chosen method?
(see decision making flow chart in the handout)

Estimation of population mean
- We don't know the population mean μ but would like to estimate it.
- We draw a sample from the population.
- We calculate the sample mean X̄.
- How close is X̄ to μ?
- Statistical theory will tell us how close X̄ is to μ.
- Statistical inference is the process of trying to draw conclusions about the population from the sample.

Key Statistical Concept
- Question: How close is the sample mean to the population mean?
- Statistical inference for sample mean
  - Sample mean will change from sample to sample
  - We need a statistical model to quantify the distribution of sample means (sampling distribution)
  - Sometimes, need "normal distribution" for the population data

Normal Distribution
- Normal distribution, denoted by N(µ, σ²), is characterized by two parameters
  - µ: The mean is the center.
  - σ: The standard deviation measures the spread (variability).
[Figure: probability density functions illustrating the mean and the standard deviation]

Distribution of Blood Pressure in Men (population)
Y: Blood pressure, Y ~ N(µ, σ²)
Parameters: Mean, µ = 125 mmHg; SD, σ = 14 mmHg
[Figure: the 68%, 95%, 99.7% rule for the normal distribution applied to the distribution of systolic blood pressure in men]
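The 68%, 95%, 99.7% figures are the probabilities that a normal variable falls within 1, 2, and 3 standard deviations of its mean. A quick check with `statistics.NormalDist`, using the slide's parameters (mean 125 mmHg, SD 14 mmHg); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 125, 14                    # population parameters from the slide (mmHg)
bp = NormalDist(mu=mu, sigma=sigma)

coverage = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    # P(low < Y < high) for Y ~ N(mu, sigma^2)
    coverage[k] = bp.cdf(high) - bp.cdf(low)
    print(f"within {k} SD: {low} to {high} mmHg, probability {coverage[k]:.4f}")
```

Under this model, roughly 95% of men have a systolic blood pressure between 97 and 153 mmHg.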

Sampling Distribution
- The sampling distribution refers to the distribution of the sample statistics (e.g. sample means) over all possible samples of size n that could have been selected from the study population.
- If the population data follow a normal distribution N(µ, σ²), then the sample means follow a normal distribution N(µ, σ²/n).
- What if the population data do not come from a normal distribution?

Central Limit Theorem (CLT)
- If the sample size is large, the distribution of sample means approximates a normal distribution: X̄ ~ N(µ, σ²/n), approximately.
- The Central Limit Theorem works even when the population is not normally distributed (or even not continuous).
- For sample means, the standard rule is n > 60 for the Central Limit Theorem to kick in, depending on how "abnormal" the population distribution is. 60 is a worst-case scenario.
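A small simulation makes the theorem concrete. The sketch below draws repeated samples from a strongly skewed population and checks that the sample means centre on µ with spread close to σ/√n. The exponential population, the seed, and the number of replicates are arbitrary choices for illustration, not part of the seminar.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)    # fixed seed so the sketch is reproducible
n = 60                    # sample size, matching the slide's rule of thumb
replicates = 2000         # number of simulated samples (arbitrary)

# Exponential population with rate 1: mean 1, SD 1, strongly right-skewed.
pop_mean, pop_sd = 1.0, 1.0

sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

center = mean(sample_means)            # should be close to pop_mean
spread = stdev(sample_means)           # should be close to pop_sd / sqrt(n)
theoretical_se = pop_sd / n ** 0.5

# Fraction of sample means within two standard errors of the population mean.
within_2se = sum(
    abs(m - pop_mean) <= 2 * theoretical_se for m in sample_means
) / replicates

print(f"mean of sample means: {center:.3f} (population mean {pop_mean})")
print(f"SD of sample means:   {spread:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
print(f"within 2 SE:          {within_2se:.3f}")
```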

Sampling Distribution
- By CLT, about 95% of the time, the sample mean will be within two standard errors of the population mean.
  - This tells us how "close" the sample statistic should be to the population parameter.
- Standard errors (SE) measure the precision of your sample statistic.
- A small SE means it is more precise.
- The SE is the standard deviation of the sampling distribution of the statistic.

Standard Error of Sample Mean
- The standard error of sample mean (SEM) is a measure of the precision of the sample mean.
  - σ: standard deviation (SD) of population distribution.
  - SEM = σ/√n
- The standard deviation is not the standard error of a statistic!

Example
- Measure systolic blood pressure on a random sample of 100 students
  - Sample size n = 100
  - Sample mean x̄ = 125 mm Hg
  - Sample SD s = 14.0 mm Hg
- Population SD (σ) can be replaced by sample SD for large sample
  - SEM = s/√n = 14.0/√100 = 1.4 mm Hg
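The arithmetic for this example, using the usual definition SEM = s/√n with the sample SD standing in for σ, is a one-liner. The second calculation is an added illustration, not from the slides, of how the SEM shrinks as the sample grows while the SD does not.

```python
import math

n = 100        # sample size from the example
s = 14.0       # sample SD in mm Hg from the example

sem = s / math.sqrt(n)                  # standard error of the sample mean
print(f"SEM = {sem:.1f} mm Hg")

# Illustration only: quadrupling the sample size halves the SEM.
sem_400 = s / math.sqrt(4 * n)
print(f"SEM with n = 400: {sem_400:.1f} mm Hg")
```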

Confidence Interval for population mean
- An approximate 95% confidence interval for the population mean µ is X̄ ± 2×SEM, or more precisely X̄ ± 1.96×SEM.
- X̄ is a random variable (it varies from sample to sample), so the confidence interval is random, and it has a 95% chance of covering µ before a sample is selected.
- Once a sample is taken, we observe x̄; then either µ is within the calculated interval or it is not.
- The confidence interval gives the range of plausible values for µ.
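The sketch below first computes the interval for the blood pressure example (n = 100, sample mean 125, sample SD 14.0), then simulates repeated sampling to show what "95% chance of covering µ" means. The helper name `mean_ci`, the seed, and the number of replicates are assumptions for illustration; the simulated population reuses the N(125, 14²) model from the earlier slide.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_ci(xbar, s, n, level=0.95):
    """Large-sample normal confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    sem = s / math.sqrt(n)
    return xbar - z * sem, xbar + z * sem

# Interval for the worked example: about (122.26, 127.74) mm Hg.
low, high = mean_ci(125, 14.0, 100)
print(f"95% CI: ({low:.2f}, {high:.2f}) mm Hg")

# Coverage simulation: draw many samples from a known population and count
# how often the interval computed from each sample contains the true mean.
rng = random.Random(1)                  # fixed seed for reproducibility
mu, sigma, n, replicates = 125, 14, 100, 1000
covered = 0
for _ in range(replicates):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lo, hi = mean_ci(mean(sample), stdev(sample), n)
    covered += lo <= mu <= hi
coverage = covered / replicates
print(f"share of intervals covering mu: {coverage:.3f}")
```

Each simulated interval either contains µ or it does not; the 95% describes the long-run share of intervals that do.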