Session III: Introduction to Basic Data Analysis. Dr. L. Jeyaseelan, Professor, Department of Biostatistics, Christian Medical College, Vellore.

About Statistics Class: Someone said, “If I had only one day to live, I would live it in my statistics class -

About Statistics Class (Contd..): “it would seem so much longer”

What is Statistics? A science of: collecting numerical information (data); evaluating the numerical information (classify, summarize, organize, analyze); drawing conclusions based on the evaluation.

Statistical Applications. Descriptive Statistics: summarizes or describes the data set at hand; evaluates the data set for patterns and reduces information to a convenient form. Inferential Statistics: uses sample data to make estimates or predictions about a larger set of data.

Types of Data. Qualitative Data: Nominal, Ordinal. Quantitative Data: Discrete, Continuous; Interval, Ratio.

Terms Describing Data. Quantitative Data: there is a natural numeric scale (can be subdivided into interval and ratio data). Example: age, height, weight. Qualitative Data: measures a characteristic for which there is no natural numeric scale (can be subdivided into nominal and ordinal data). Example: gender, eye color.

Quantitative data Discrete Data : Values are distinct and separate. Values are invariably whole numbers. Example: Number of children in a family. Continuous Data : Those which have uninterrupted range of values. Can assume either integral or fractional values. Example : Height, Weight, Age

Scales of Measurement (Qualitative data) Nominal data : To classify characteristics of people, objects or events into categories. Example: Gender (Male / Female). Ordinal data (Ranking scale) : Characteristics can be put into ordered categories. Example: Socio-economic status (Low/ Medium/ High).

DESCRIPTIVE STATISTICS

Descriptive Statistics Measures of central tendency are statistics that summarize a distribution of scores by reporting the most typical or representative value of the distribution. Measures of dispersion are statistics that indicate the amount of variety or heterogeneity in a distribution of scores.

Descriptive Statistics. Measures of Central Tendency: Mean, Median, Mode. Measures of Dispersion: Range, Variance, Standard Deviation.

Mean: A single value that can describe the characteristics of the entire data set. Most representative. Arithmetic mean or average. Examples: mean birth weight, mean DBP.

Merits: Easy to Understand and compute Based on the value of every item in the series Limitations: Affected by extreme values Not useful for the study of qualities like intelligence, honesty and character

Computing Mean - Sample Problem. Consider the number of children in 6 families. In the first family there are 4 children, in the second there are 2, in the third 5, in the fourth and fifth 3 each, and in the sixth 4. Find the average number of children per family. Step 1: Sum the scores, i.e., 4 + 2 + 5 + 3 + 3 + 4 = 21. Step 2: Divide by the number of families, i.e., 21 ÷ 6 = 3.5.

Interpretation: The average number of children per family is 3.5
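
The two steps of the sample problem can be checked with a short Python sketch. The variable names are illustrative; the data are the six family sizes given above.

```python
from statistics import mean

# Number of children in each of the six families from the sample problem.
children = [4, 2, 5, 3, 3, 4]

total = sum(children)            # Step 1: sum the scores
average = total / len(children)  # Step 2: divide by the number of families

print(total, average)
print(mean(children))  # the standard-library helper gives the same value
```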

Median: Arrange the data in ascending or descending order. Middle value is median. Not influenced by extreme values Unique and easy to calculate More appropriate when the measure is Duration (survival), age etc

Computing the Median To compute the median, we sort the values from low to high. The median is the middle score. If the number of cases in the sample is an odd number, the middle case is the case above and below which the same number of cases occur. ( e.g ) If the number of cases in the sample is an even number, there will be two middle scores and the median is halfway between these two middle scores. (e.g )
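
A minimal sketch of the odd and even cases described above. The two samples are hypothetical values chosen for illustration (the slide's own examples are not preserved in this transcript), and `middle_value` is an illustrative helper name.

```python
from statistics import median

odd_sample = [7, 3, 9, 5, 1]       # hypothetical values, n = 5 (odd)
even_sample = [7, 3, 9, 5, 1, 11]  # hypothetical values, n = 6 (even)

def middle_value(values):
    """Sort low to high, then take the middle score or the midpoint of the two middle scores."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(middle_value(odd_sample), middle_value(even_sample))
print(median(odd_sample), median(even_sample))  # standard-library equivalent
```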

Mode: Most commonly occurring observation. Not Unique. Not very frequently used. Used in investigation of an epidemic.

Computing the Mode The mode can be read directly from the frequency distribution table. The mode for Race is the category 5 = White which has the largest frequency (231).
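
Reading the mode from a frequency table can be sketched as follows. The race table is not reproduced in this transcript, so the sketch reuses the family sizes from the mean example as an assumption. It also shows that the mode need not be unique.

```python
from collections import Counter

# Family sizes from the mean example, reused here for illustration.
children = [4, 2, 5, 3, 3, 4]

frequency = Counter(children)   # frequency distribution table: value -> count
top = max(frequency.values())   # the largest frequency
modes = sorted(value for value, count in frequency.items() if count == top)

print(dict(frequency))
print(modes)  # two values share the largest frequency
```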

Is that Enough? Mean, Median and Mode

Example: Two sleep-producing drugs were administered to two groups of patients. Drug A: 6, 2, 4, 3, 5, 2; mean = 3.7 hours. Drug B: 1, 6, 7, 1, 2, 6; mean = 3.8 hours.
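
A short sketch of why the average alone is not enough. Computed from the values as listed, the two means come out close together (about 3.7 and 3.8 hours), while the ranges differ noticeably. The names `drug_a`, `drug_b` and `value_range` are illustrative.

```python
from statistics import mean

drug_a = [6, 2, 4, 3, 5, 2]  # hours of sleep, Drug A
drug_b = [1, 6, 7, 1, 2, 6]  # hours of sleep, Drug B

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

for name, hours in (("Drug A", drug_a), ("Drug B", drug_b)):
    print(name, round(mean(hours), 2), value_range(hours))
```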

How do we measure the variability? 1. Measure the deviation from the mean for each observation. Example: 4, 5, 3. Mean = 12/3 = 4.
x_i | x_i - Mean
4 | 0
5 | 1
3 | -1

2. Square the deviations to get rid of the sign problem and find the total (sum). Example: 4, 5, 3. Mean = 12/3 = 4.
x_i | x_i - Mean | (x_i - Mean)^2
4 | 0 | 0
5 | 1 | 1
3 | -1 | 1
Total | 0 | 2

3. Find the average of the squared deviations: $\text{Variance} = \frac{\sum (x_i - \text{Mean})^2}{n} = \frac{2}{3} \approx 0.67$. $\text{Standard Deviation} = \sqrt{\text{Variance}} = \sqrt{0.67} \approx 0.82$.
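
The three steps can be reproduced in Python as a rough check. The slide divides by n, which corresponds to `statistics.pvariance`. The sample variance with divisor n - 1 (`statistics.variance`) is shown only for comparison and is not part of the slide.

```python
from math import sqrt
from statistics import pvariance, variance

data = [4, 5, 3]
m = sum(data) / len(data)

deviations = [x - m for x in data]      # Step 1: deviation from the mean
squared = [d ** 2 for d in deviations]  # Step 2: square each deviation
var_n = sum(squared) / len(data)        # Step 3: average of squared deviations
sd_n = sqrt(var_n)

print(deviations, squared, sum(squared))
print(round(var_n, 2), round(sd_n, 2))
print(pvariance(data), variance(data))  # divisor n versus divisor n - 1
```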

Variance or Standard Deviation: On an average, how far each and every observation deviates from the mean. About the study itself.

Standard Error: Sample mean is an estimate of the population mean. Mean birth weight of 100 babies is 2700g (sd=200). Can we say that the population mean is also 2700g? Uncertainty associated with our estimate 2700g How do we measure the uncertainty ?

Standard Error (contd..): Take many samples of the same size from the population. Assess the variability of such means. These means follow a Normal distribution. The mean of these means is the population mean. This variability can be estimated from a single study: $SE = \sigma / \sqrt{n}$.
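
As a sketch, the formula can be applied to the birth-weight example from the previous slide (100 babies, sd = 200 g). Using the sample SD in place of the population value is an assumption of this sketch, and `standard_error` is an illustrative helper name.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / sqrt(n)

# Birth-weight example: sd = 200 g, n = 100 babies.
se = standard_error(200, 100)
print(se)
```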

Distributions of 16 samples of size 50 from the Normal distribution.

Normal Distribution Bell shaped Symmetrical about its mean Mean, Median and Mode are same Total area is one square unit

Point Estimate: The prevalence of HIV in Tamil Nadu was 1.8% in 1998 and 0.7% in 2003.

As a special honeymoon offer we will provide you a double bedroom at the cost of a single room.

Confidence Interval: Means of different samples follow a normal distribution. Mean ± 1.96 SE (the SD of the sample means) covers 95% of the area. These limits will cover the population mean 95% of the time; 5% of the time they may not.
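
A minimal sketch of a 95% confidence interval, reusing the birth-weight example (mean 2700 g, sd 200 g, n = 100) as an assumption. The multiplier 1.96 is the rounded 97.5th percentile of the standard normal distribution, which `statistics.NormalDist` can confirm.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, sd, n = 2700, 200, 100  # birth-weight example from the Standard Error slide
se = sd / sqrt(n)

z = NormalDist().inv_cdf(0.975)      # about 1.96 for a two-sided 95% interval
lower = sample_mean - 1.96 * se
upper = sample_mean + 1.96 * se

print(round(z, 2))
print(round(lower, 1), round(upper, 1))
```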

Scatter Plot of 95% CI: Confidence intervals for mean serum albumin constructed from 100 random samples of size 25. The vertical lines show the range within which 95% of sample means are expected to fall.

The Distribution of Data (Rule of thumb): The statistical and clinical applications of the term “normal” are often confused and vague. SD > 1/2 mean indicates skewed/non-normal data. Note: applicable only for variables where negative values are impossible. Altman, BMJ 1991.

Comparison of TLV by Ultrasound and BSA (Western Population)

Data described with an SD that exceeds one-half the mean are non-normally distributed (assuming that negative values are impossible) and should be described with the median and range/interquartile range. Subtracting the median from the mean produces a crude estimate of the skewness of the data: the larger the difference, the greater the skewness.
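
The rule of thumb is simple enough to sketch in code. The mean and SD pairs below are taken from the 3 day treatment column of Table 1 later in this talk. The function name and the small skewed sample are hypothetical and only illustrate the idea.

```python
from statistics import mean, median

def sd_exceeds_half_mean(mean_value, sd):
    """Rule of thumb for variables that cannot take negative values."""
    return sd > mean_value / 2

# (mean, SD) pairs from the 3 day treatment column of Table 1.
table1 = {
    "age (months)": (17.0, 13.3),
    "height (cm)": (74.8, 10.98),
    "duration of illness (days)": (4.7, 3.43),
}
flags = {name: sd_exceeds_half_mean(m, sd) for name, (m, sd) in table1.items()}
print(flags)

# Hypothetical right-skewed values: the mean is pulled above the median.
skewed = [1, 2, 2, 3, 12]
crude_skew = mean(skewed) - median(skewed)
print(crude_skew)
```

By this rule, age and duration of illness would be better summarised with the median and range or interquartile range, while height would not.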

Presentation of Summary Statistics: SD or SE. The terms “standard error” and “standard deviation” are often confused. The contrast between these two terms reflects the important distinction between data description and precision/inference. SD: a measure of variability that describes how widely scattered the measurements in a group are. SE: applicable for large samples; indicates the uncertainty around the estimate of the mean measurement.

Standard Deviation. Description of data. Example: If the mean weight of a sample of 100 men is 72 kg and the SD is 8 kg, then, assuming a normal distribution, about 68% of the men are expected to weigh between 64 and 80 kg.
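
The 68% figure can be checked against the normal distribution directly. This sketch assumes the weights are exactly normal with mean 72 kg and SD 8 kg, as the example does.

```python
from statistics import NormalDist

weights = NormalDist(mu=72, sigma=8)

# Share of the distribution within one SD of the mean (64 kg to 80 kg).
share = weights.cdf(80) - weights.cdf(64)
print(round(share, 3))
```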

Standard Error: 72 kg is also the best estimate of the mean weight of all men in the population. How precise is the estimate of 72 kg? The SE is also used when testing hypotheses about differences in means or proportions between groups.

Table 1: Baseline characteristics of 2188 children with non-severe pneumonia randomised to 3 days or 5 days of treatment with amoxicillin. Values are numbers (percentages) of patients unless stated otherwise.
Characteristic | 3 day treatment (n=1095) | 5 day treatment (n=1093)
Mean (SD) age (months) | 17.0 (13.3) | 16.9 (13.0)
Mean (SD) height (cm) | 74.8 (10.98) | 74.8 (10.75)
Mean (SD) weight (kg) | 8.7 (2.49) | 8.7 (2.4)
Mean (SD) duration of illness (days) | 4.7 (3.43) | 4.5 (3.12)
Mean (SD) temperature (°C) | 37.1 (0.66) | 37.2 (0.67)
Mean (SD) respiratory rate (breaths/minute): 2 – 11 months old; 12 – 59 months old | (5.02); (5.58) | (4.54); (6.1)
Male | 685 (62.6) | 676 (61.8)
Age (months): 2 – – | (43.7); (56.3) | (43.5); (56.5)
Weight for height z score*: -2 to | (27.4); (17.2) | (27.7); (16.7)

Table 1 (contd.)
Characteristic | 3 day treatment (n=1095) | 5 day treatment (n=1093)
Duration of illness (days): 3 | (49.1); (50.9) | (49.4); (50.6)
Fever | 833 (76.1) | 850 (77.8)
Cough | 1081 (98.7) | 1078 (98.6)
Difficulty in breathing | 417 (38.1) | 387 (35.4)
Vomiting | 135 (12.3) | 141 (12.9)
Diarrhoea | 71 (6.5) | 55 (5.0)
Excess respiratory rate (breaths/minute): 10 | (82.5); (17.5) | (80.6); (19.4)
Wheeze present | 140 (12.8) | 147 (13.4)
Adherence to treatment: At day 3; At day | (94.2); (85.6) | (93.9); (84.9)
RSV positive | 252 (23.0) | 261 (23.9)
*Z score given as number of standard deviations from normal value. †Rate above the age specific cut off. RSV = respiratory syncytial virus.

ISCAP Study Group. BMJ 2004;328:791.

THANKS