Bios 101 Lecture 4: Descriptive Statistics Shankar Viswanathan, DrPH. Division of Biostatistics Department of Epidemiology and Population Health Albert.

Presentation transcript:

Bios 101 Lecture 4: Descriptive Statistics Shankar Viswanathan, DrPH. Division of Biostatistics Department of Epidemiology and Population Health Albert Einstein College of Medicine, NY October 25, 2011

Definition: Statistics is the science of variation. It involves the collection, classification, analysis, and interpretation of data. Biostatistics is the branch of statistics that deals with data arising from the biological sciences, especially medicine and population-based experiments.

Variables: independent and dependent. Scales:
- Nominal, e.g., sex, race, study group
- Ordinal, e.g., severity of disease, attitude, birth order
- Continuous, e.g., height, age, forced expiratory volume

Measurement scales:
- Nominal: numbers or text representing unordered categories (e.g., 0 = male, 1 = female)
- Ordinal: numbers or text representing categories where order counts (e.g., grade of cancer: 1 = Grade I, 2 = Grade II, 3 = Grade III)
- Continuous: numerical data where any conceivable value is, in theory, attainable (e.g., height, weight, FEV)

Summary of Data. There are two distinct steps in processing the data:
- describe the sample by means of descriptive statistics
- infer whether the observed results can be generalized to other samples or to the population (inferential statistics)
Descriptive measures:
- Nominal/ordinal: frequencies, percentages, proportions
- Continuous: measures of location, measures of spread
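The descriptive measures for nominal data listed above can be sketched in R as follows. The `group` vector is hypothetical data invented for illustration; it does not come from the lecture.

```r
# Hypothetical study-group assignments for 10 subjects (illustration only)
group <- c("treatment", "control", "treatment", "treatment", "control",
           "control", "treatment", "control", "treatment", "treatment")

freq <- table(group)         # frequencies (counts per category)
prop <- prop.table(freq)     # proportions (counts divided by total)
pct  <- round(100 * prop, 1) # percentages

print(freq)
print(prop)
print(pct)
```

Printing the counts alongside the percentages keeps the numerators and denominators visible, which matters for small samples like this one.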

Descriptive Statistics: graphical and numerical approaches to summarizing data. Measures of location:
- Arithmetic mean
- Median
- Mode

Measures of location. Arithmetic mean: the most frequently used measure of location. The mean is calculated by summing all the observations in a set of data and dividing by the total number of observations.

Example: Mean. Listed are the initial measurements of forced expiratory volume in 1 second (FEV1, liters) for the 13 subjects involved in the study.
> fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05, 2.25, 2.68, 3.00, 4.02, 2.85, 3.38)
> mean(fev)
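As a check on the definition, the mean can also be computed by hand as the sum of the observations divided by their number. This is a minimal sketch; the variable names `n` and `mean_by_hand` are introduced here for illustration only.

```r
# FEV1 measurements (liters) for the 13 subjects, as listed in the example
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

n <- length(fev)              # total number of observations
mean_by_hand <- sum(fev) / n  # sum of observations divided by n

print(mean_by_hand)
print(mean(fev))              # built-in function, same result
```

The observations sum to 38.35 liters, so both lines give a mean of 2.95 liters.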

Measures of location. Median: defined as the 50th percentile of a set of measurements. If a list of observations is ranked from smallest to largest, half the values are greater than or equal to the median and the other half are less than or equal to it. If n is odd, the median is the [(n+1)/2]th ranked value; if n is even, the median is the average of the two middle values.

Example: Median. Listed are the initial measurements of forced expiratory volume in 1 second (FEV1, liters) for the 13 subjects involved in the study. Arrange the data in ascending order: 2.15, 2.25, 2.30, 2.60, 2.68, 2.75, 2.82, 2.85, 3.00, 3.38, 3.50, 4.02, 4.05. Find the [(n+1)/2]th value, i.e., the [(13+1)/2]th = 7th value, which is 2.82 liters. To illustrate the even-n case, keep only the first 12 subjects and take the median of those values.
> median(fev)
> fev1 <- fev[1:12]
> sort(fev1)
> median(fev1)
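The ranking rule can be written out explicitly. The helper `median_by_rank` below is a hypothetical function written for this illustration, not part of R or of the lecture; it should agree with the built-in `median()`.

```r
# FEV1 measurements (liters) for the 13 subjects
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

# Hypothetical helper: median from the ranked observations
median_by_rank <- function(x) {
  s <- sort(x)                    # rank from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                # odd n: the middle value
  } else {
    mean(s[c(n / 2, n / 2 + 1)])  # even n: average of the two middle values
  }
}

med_odd  <- median_by_rank(fev)        # all 13 subjects
med_even <- median_by_rank(fev[1:12])  # first 12 subjects only

print(med_odd)
print(med_even)
```

With all 13 subjects the result is the 7th ranked value, 2.82 liters. With the first 12 subjects it is the average of 2.75 and 2.82, i.e., 2.785 liters.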

Measures of location. Mode: can be used as a summary measure for all types of data. The mode of a set of values is the observation that occurs most frequently. It is mostly used for nominal scale variables.
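R has no built-in function for the statistical mode (the base function `mode()` returns an object's storage type instead), so one common approach is to tabulate the values and pick the most frequent. The `blood_group` data below are hypothetical and serve only as an illustration.

```r
# Hypothetical nominal data (illustration only, not from the lecture)
blood_group <- c("A", "O", "B", "O", "AB", "O", "A")

counts <- table(blood_group)                   # frequency of each category
modes  <- names(counts)[counts == max(counts)] # category (or categories) with the highest count

print(counts)
print(modes)
```

Comparing against `max(counts)` returns every tied category, so data with more than one mode are handled as well.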

Measures of Dispersion or Spread. The most common measures of spread (variability) of the data are:
- Variance
- Standard deviation
- Range
- Interquartile range

Measures of Spread. Variance: the average of the squared deviations of the observations from the mean (for a sample, the sum of squared deviations is divided by n - 1, which is what R's var() computes). Standard deviation: the square root of the variance. It is attractive because it is expressed in the same units as the mean.
> var(fev)
> sd(fev)
> sqrt(var(fev))
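The definitions of variance and standard deviation can be checked step by step. This is a minimal sketch; the names `dev_sq`, `var_sample`, `var_average`, and `sd_sample` are introduced here for illustration.

```r
# FEV1 measurements (liters) for the 13 subjects
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

n      <- length(fev)
dev_sq <- (fev - mean(fev))^2          # squared deviations from the mean

var_sample  <- sum(dev_sq) / (n - 1)   # sample variance, matches var()
var_average <- sum(dev_sq) / n         # literal average of squared deviations
sd_sample   <- sqrt(var_sample)        # standard deviation, in liters

print(c(var_sample, var(fev)))
print(c(sd_sample, sd(fev)))
print(var_average)
```

The n - 1 divisor makes the sample variance slightly larger than the plain average of the squared deviations; the difference shrinks as n grows.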

Measures of Spread. Range: the difference between the largest and the smallest observations; it can also be reported as the pair (minimum, maximum).
- FEV data: range = 4.05 - 2.15 = 1.90 liters
Interquartile range: calculated by subtracting the 25th percentile from the 75th percentile; it encompasses the middle 50% of the data.
- 25th percentile = [(n+1)/4]th value
- 75th percentile = [3(n+1)/4]th value
- FEV data: interquartile range = 3.38 - 2.60 = 0.78 liters (2.60 and 3.38 are the 4th and 10th ordered values, which is what R's default quantile method returns for n = 13; applying the (n+1)/4 rule strictly would use the 3.5th and 10.5th values instead)
> range(fev)
> stem(fev)
> summary(fev)
> boxplot(fev)
> boxplot(fev, main="FEV data", horizontal=TRUE)
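Quartiles can be computed under more than one convention, and the reported interquartile range depends on which is used. The sketch below compares R's default (`type = 7`) with `type = 6`, which places the p-th quantile at position p(n+1) and so corresponds to the [(n+1)/4]th-value rule. The variable names are introduced here for illustration.

```r
# FEV1 measurements (liters) for the 13 subjects
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

range_width <- max(fev) - min(fev)   # largest minus smallest observation

# R's default quantile method (type 7)
q_default   <- quantile(fev, c(0.25, 0.75))
iqr_default <- IQR(fev)

# Position p * (n + 1), i.e. the [(n+1)/4]th and [3(n+1)/4]th values (type 6)
q_rule   <- quantile(fev, c(0.25, 0.75), type = 6)
iqr_rule <- IQR(fev, type = 6)

print(range_width)
print(q_default); print(iqr_default)
print(q_rule);    print(iqr_rule)
```

For these data the default method gives quartiles of 2.60 and 3.38 (interquartile range 0.78 liters), while the p(n+1) rule interpolates to 2.45 and 3.44 (interquartile range 0.99 liters). Stating which method was used avoids ambiguity when reporting.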

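Standard Deviation vs. Standard Error. Standard error: the standard deviation of the distribution of sample means is known as the standard error. It is calculated by dividing the standard deviation of the sample by the square root of the number of observations. The standard deviation describes the variability of the individual observations, whereas the standard error describes the precision of the sample mean.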
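Base R has no dedicated function for the standard error of the mean, so it is usually computed directly from the definition. A minimal sketch using the FEV data; the name `se_mean` is introduced here for illustration.

```r
# FEV1 measurements (liters) for the 13 subjects
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

n       <- length(fev)
sd_fev  <- sd(fev)            # spread of the individual observations
se_mean <- sd_fev / sqrt(n)   # standard error of the sample mean

print(sd_fev)
print(se_mean)
```

Because the standard deviation is divided by sqrt(n), the standard error is always smaller than the standard deviation when n > 1, which is one reason it should not stand in for the standard deviation when describing the spread of the data.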

Guidelines for reporting descriptive statistics (Lang TA and Secic M, 2006):
- Report all numbers with the appropriate degree of precision (two digits)
- When reporting percentages, always give the numerators and denominators of the calculation
- Specify the denominators of rates, ratios, proportions, and percentages
- Provide appropriate measures of central tendency and dispersion
  - Approximately normally distributed data: mean, SD
  - Other distributions: median, range, interquartile range
- Do NOT summarize continuous data with the mean and the standard error of the mean
- Avoid using percentages to summarize small samples