Introduction to biostatistics

Presentation transcript:

1 Introduction to biostatistics. Lecture plan: 1. Basics. 2. Variable types. 3. Descriptive statistics: categorical data, numerical data. 4. Inferential statistics: confidence intervals, hypothesis testing.

2 DEFINITIONS. STATISTICS can mean two things: (1) the numbers we get when we measure and count things (data); (2) a collection of procedures for describing and analysing data. BIOSTATISTICS: the application of statistics in the natural sciences, when biomedical problems are analysed.

3 Why do we need statistics? ????

4 Basic parts of statistics: descriptive, inferential.

5 Terminology: population, sample, variables.

6 Variable types: categorical (qualitative), numerical (quantitative), combined.

7 Categorical data. Nominal: 2 categories or >2 categories. Ordinal.

8 Numerical data: continuous, discrete.

9 Description of categorical data: arranging data; frequencies and tables; visualization (graphical presentation).

10 Frequencies and contingency tables. Of those who were unsatisfied, 4 were males and 6 were females.

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)      4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)
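The column percentages in this table can be reproduced from the raw counts. The following is a minimal sketch; the function name `column_percentages` and the dictionary layout are illustrative choices, not part of the lecture.

```python
# Counts taken from the slide's table: rows are satisfaction, columns are sex.
counts = {
    "Satisfied": {"Males": 14, "Females": 26},
    "Unsatisfied": {"Males": 4, "Females": 6},
}


def column_percentages(table):
    """Return each cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / totals[c] for c in columns}
        for label, row in table.items()
    }


percentages = column_percentages(counts)
for label, row in percentages.items():
    print(label, {c: f"{v:.2f}%" for c, v in row.items()})
```

Percentages are taken within each column (sex), which is what allows the satisfaction rate to be compared between males and females.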

11 Graphical presentation

12 Graphical presentation

13 Graphical presentation

14 Graphical presentation

15 Graphical presentation. Other: maps, Chernoff faces, star plots, etc.

16 Description of numerical data: arranging data; frequencies (relative and cumulative) and graphical presentation; measures of central tendency and variance; assessing normality.

17 Grouping. Sorting data. Groups (5-17 groups) according to the researcher's criteria. Used to assess the distribution and for graphical presentation in Excel.

18 Frequencies, their comparison and calculation 197 students were asked about the amount of money (litas) they had in cash at the moment.
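The students' actual answers are not included in the transcript, so the sketch below uses a small made-up list of amounts to show how absolute, relative and cumulative frequencies could be computed for grouped data. The data, the group width and the function name `frequency_table` are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical cash amounts (litas); NOT the data from the lecture.
amounts = [5, 12, 18, 25, 25, 31, 40, 47, 52, 60]
bin_width = 20  # assumed group width


def frequency_table(values, width):
    """Return (lower, upper, count, relative %, cumulative %) for each non-empty group."""
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    rows, cumulative = [], 0
    for lower in sorted(counts):
        cumulative += counts[lower]
        rows.append((lower, lower + width, counts[lower],
                     100 * counts[lower] / n, 100 * cumulative / n))
    return rows


table = frequency_table(amounts, bin_width)
for lower, upper, count, relative, cumulative in table:
    print(f"{lower:>3}-{upper:<3} n={count}  {relative:5.1f}%  cumulative {cumulative:5.1f}%")
```

Each group covers the half-open range from its lower bound up to, but not including, its upper bound.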

19 Graphical presentation of frequencies

20 Normal distributions. Most observations lie around the center. Fewer lie above and below the central values, in approximately the same proportions. Most often a Gaussian distribution.

21 Not normal distributions. More observations in one part.

22 Asymmetrical distribution

23 How would you describe/present your respondents if the data are numeric? 2 groups of measures: 1. Central tendency (central value, average) 2. Variance

24 MEASURES OF CENTRAL TENDENCY. Means/averages (arithmetic, geometric, harmonic, etc.). Mode. Median. Quartiles.

25 MEASURES OF CENTRAL TENDENCY. Arithmetic mean (X, μ).

26 MEASURES OF CENTRAL TENDENCY. Median (Me): the middle value or 50th percentile (the value of the observation that divides the sorted data into two almost equal parts). It is found as follows. When n is odd, the median is the middle observation. When n is even, the median is the average of the two middle observations.
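The odd/even rule for the median can be written out directly. This is a minimal sketch of that rule; Python's standard library offers the same calculation as `statistics.median`.

```python
def median(values):
    """Median by the odd/even rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation.
        return ordered[mid]
    # Even n: average of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # odd number of observations
print(median([7, 1, 5, 3]))  # even number of observations
```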

27 MEASURES OF CENTRAL TENDENCY. Mode (Mo): the most common value. There can be more than one mode.
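A short illustration of a sample with more than one mode, using made-up values and the standard library's `statistics.multimode` (available from Python 3.8):

```python
from statistics import multimode

# Hypothetical observations in which two values tie for most frequent.
values = [2, 3, 3, 5, 5, 7]

# multimode returns every value that shares the highest frequency.
modes = multimode(values)
print(modes)
```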

28 MEASURES OF CENTRAL TENDENCY. Quartiles (Q1, Q2, Q3, Q4): the sorted sample is divided into 4 equal parts, with 25% of the observations in each.
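As a sketch, the three cut points that split a sorted sample into four equal parts can be obtained with `statistics.quantiles`. The data below are made up, and the interpolation rule is the library's default ("exclusive") method, which is one of several conventions in use; other software may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample of 11 observations.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 asks for the cut points between four equal-sized parts: Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
print(q1, q2, q3)
```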

29 Is a measure of central tendency enough to describe respondents?

30 MEASURES OF VARIANCE. Min and max. Range. Variance: $V = \sum (x_i - \bar{x})^2 / (n - 1)$. Standard deviation (SD): the square root of the variance. Interquartile range (Q3 - Q1, or 75% - 25%): IQRT.
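The measures on this slide can be computed as follows. This is a sketch on made-up data; the variance uses the n - 1 denominator given on the slide, and the quartiles use the default method of `statistics.quantiles`, which is an assumption about the convention intended.

```python
import math
import statistics

# Hypothetical sample.
values = [2, 4, 4, 4, 5, 5, 7, 9]


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


variance = sample_variance(values)
sd = math.sqrt(variance)                # standard deviation
value_range = max(values) - min(values)
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                           # interquartile range

print(f"variance={variance:.3f} SD={sd:.3f} range={value_range} IQR={iqr}")
```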

31 What measures are to be used for sample description? If the distribution is NORMAL: mean; variance (or standard deviation). If the distribution is NOT NORMAL: median; IQRT or min/max. The latter measures are also used with numeric ordinal data.

32 X, Mo, Me: Mean ≈ Median ≈ Mode; SD and the empirical rule

33 EMPIRICAL RULE. Percentage of observations within 1, 2 and 2.5 SD of the mean if the distribution is normal.

34 Example: X = 8, SD = 2.5 (the figure marks X, -2SD and +2SD).
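For the example values X = 8 and SD = 2.5, the interval of plus or minus 2 SD around the mean, and the share of a normal distribution expected within 1, 2 and 2.5 SD, can be worked out as sketched below. The shares come from the normal distribution function via `math.erf`; the helper name `share_within` is illustrative.

```python
import math


def share_within(k):
    """Proportion of a normal distribution lying within k SD of its mean."""
    return math.erf(k / math.sqrt(2))


mean, sd = 8, 2.5  # values from the slide's example
low, high = mean - 2 * sd, mean + 2 * sd
print(f"mean ± 2 SD: {low} to {high}")

for k in (1, 2, 2.5):
    print(f"within {k} SD: {share_within(k):.1%}")
```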

35 Normality assessment: summary. Graphical. Comparison of measures of central tendency; empirical rule (mean and standard deviation). Skewness and kurtosis (= 0 if Gaussian). Kolmogorov-Smirnov test.
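Skewness and kurtosis can be computed from the central moments of the sample. The sketch below uses the simple moment-based definitions and reports excess kurtosis (kurtosis minus 3), which is the version that equals 0 for a Gaussian distribution. Statistical packages often apply small-sample corrections, so their output may differ slightly; the data are made up.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

A clearly positive skewness, as in the second sample, indicates a longer tail towards high values.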

Boxplot: median, mean (*), 25th percentile, 75th percentile, outliers.

Boxplot example

Central limit theorem
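The slide gives only the title, so the following simulation is offered as one possible illustration rather than the lecturer's own example. It draws repeated samples from a strongly skewed (exponential) population and shows that the sample means cluster around the population mean with a spread close to SD divided by the square root of n. The population, sample size and number of samples are all assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30   # assumed size of each sample
N_SAMPLES = 2000   # assumed number of repeated samples

# Exponential population with rate 1: mean 1, SD 1, clearly not normal.
means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean 1)")
print(f"SD of sample means:   {statistics.stdev(means):.3f} "
      f"(theory: {1 / math.sqrt(SAMPLE_SIZE):.3f})")
```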

39 Inferential statistics: confidence intervals, hypothesis testing.

40 Confidence intervals. The interval in which the “true” value is most likely to lie.

41 The variance of samples and their measures. Population: μ, σ, p₀. Samples: X₁, SD₁, p₁; X₂, SD₂, p₂; X₃, SD₃, p₃; X₄, SD₄, p₄.

42 The variance of samples and confidence intervals (population values μ, p₀).

43 Confidence interval. Statistical definition: if the study were carried out 100 times, giving 100 results and 100 CIs, the “true” value would lie inside the interval in about 95 of the 100 cases and outside it in about 5.
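This repeated-study definition can be checked by simulation. The sketch below assumes a normal population with a known mean, repeats a "study" many times, builds a 95% CI from each sample as mean ± 1.96 SE, and counts how often the interval contains the true mean. All numeric settings are assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 50.0, 10.0     # assumed population values
SAMPLE_SIZE, N_STUDIES = 40, 1000   # assumed study size and number of repeats


def ci_95(sample):
    """95% CI for the mean: mean ± 1.96 * SD / sqrt(n)."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se


covered = 0
for _ in range(N_STUDIES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    low, high = ci_95(sample)
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / N_STUDIES
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The observed share should be close to 95%, though it varies from run to run and tends to fall slightly below 95% when the sample is small.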

44 Confidence intervals (general, most common calculation). 95% CI: X ± 1.96 SE, giving (X min; X max). Note: for a normal distribution, when n is large. 95% CI: p ± 1.96 SEp, giving (p min; p max). Note: when p and 1 - p > 5/n.
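A minimal sketch of both calculations, assuming the usual standard errors SD/√n for a mean and √(p(1 - p)/n) for a proportion. The function names are illustrative. The proportion example reuses the 40 satisfied respondents out of 50 from the contingency table on slide 10.

```python
import math
import statistics

Z_95 = 1.96  # multiplier for a 95% confidence level


def mean_ci(values):
    """95% CI for a mean: X ± 1.96 * SD / sqrt(n). Intended for large n."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - Z_95 * se, mean + Z_95 * se


def proportion_ci(successes, n):
    """95% CI for a proportion: p ± 1.96 * sqrt(p * (1 - p) / n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - Z_95 * se, p + Z_95 * se


p_low, p_high = proportion_ci(40, 50)
print(f"satisfied: 80% (95% CI {p_low:.1%} to {p_high:.1%})")
```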

45 Standard error (SE): for numeric data (X) and for categorical data (p).
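The formulas on this slide did not survive in the transcript. The usual definitions, which are consistent with the intervals on slide 44, are given here as an assumption about what the slide showed. SD is the sample standard deviation, n the sample size and p the sample proportion.

```latex
SE_{X} = \frac{SD}{\sqrt{n}}, \qquad SE_{p} = \sqrt{\frac{p\,(1 - p)}{n}}
```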

46 The width of a confidence interval depends on: a) sample size; b) confidence level (usually 95%, but any level can be chosen); c) dispersion.

47 Hypothesis testing. H₀: μ₁ = μ₂; p₁ = p₂ (RR = 1, OR = 1, difference = 0). H_A: μ₁ ≠ μ₂; p₁ ≠ p₂ (two-sided, one-sided).

48 Hypothesis testing. Significance level α (by convention 0.05). A test is used to obtain the P value (t-test, χ², etc.). The P value is the probability of getting a difference (association) at least as large as the one observed if the null hypothesis is true, that is, due to chance alone.

49 Statistical conventions. If P < 0.05, we say that the results cannot be explained by chance alone, so we reject H₀ and accept H_A. If P ≥ 0.05, we say that the difference found can be due to chance alone, so we do not reject H₀.
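As an illustration of this decision rule, the sketch below compares the proportion satisfied among males (14 of 18) and females (26 of 32) from slide 10 using a two-sided z test for two proportions. The lecture does not say which test was applied to these data, so the choice of test is an assumption. With groups this small, an exact test such as Fisher's would normally be preferred.

```python
import math

ALPHA = 0.05  # conventional significance level


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(14, 18, 26, 32)
decision = "reject H0" if p_value < ALPHA else "do not reject H0"
print(f"z = {z:.3f}, P = {p_value:.3f}: {decision}")
```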

50 Tests. The choice of test depends on: study design; variable type; distribution; number of groups, etc. Tests (probability distributions): z test; t test (one sample, two independent samples, paired); χ² (and χ² for trend); F test; Fisher exact test; Mann-Whitney; Wilcoxon; and others.

51 Inferential statistics: summary. The P value tells whether there is a statistically significant difference (association). The CI gives the interval in which the true value can lie.

52 Inferential statistics: summary. Neither the P value nor the CI accounts for other explanations of the result (bias and confounding). Neither the P value nor the CI tells anything about the biological, clinical or public health meaning of the results.