Data: Presentation and Description
Peter T. Donnan, Professor of Epidemiology and Biostatistics
Statistics for Health Research



Overview
– What is data?
– Summarising data
– Displaying data
– SPSS

Why have you collected data?
– Most important question!
– Related to testing hypotheses
– If you have not got any hypotheses, get some!
– Return to this later

DATA – Where from?
– All data is a sample: a subset of the population
– How was it collected?
– Potential for bias?
– What does it represent?

Extrapolating from the sample to the population
[Figure. Illustrations: Ian Christie, Orthopaedic & Trauma Surgery. Copyright 2002 University of Dundee]

Quantitative Data?
– Observation or measurement of one or more variables
– A variable is any quantity measured on a scale
– Unit of analysis can be person, group (e.g. practice, specimen, cell, time…)
– Multilevel: patient and practice

Cross-classified 3-level multilevel model
[Figure: patient level i, practice level j, hospital level k]

Statistics
Statistics encompasses:
1. Design of study
2. Methods of collecting and summarising data
3. Analysing and drawing appropriate conclusions from data

Variable types
– Categorical (qualitative)
  – E.g. type of drug, eye colour, smoking status
– Numerical (quantitative)
  – E.g. age, birth weight, BP

Categorical
– Nominal: categories are mutually exclusive and unordered, e.g. blood group (A/B/AB/O)
– Ordinal: categories are mutually exclusive and ordered, e.g. disease stage (mild/moderate/severe)
– Binary: two categories (yes, no)

Numerical
– Discrete: integer values, often counts, e.g. number of cigarettes smoked, number of days in hospital
– Continuous: takes any value in a range of values, e.g. height in m, cholesterol, creatinine

Organisation of data
Generally each variable is in a separate column, with one row per subject.
Example columns: Subject, Age, Gender, Score

1st step in analysis? Look at the data!

Display and summarise data
– To get a feel for the data
– To spot errors and missing data
– To assess the range of values
– Also…

Summarising categorical data
 1. Campylobacter       21. Giardia
 2. Campylobacter       22. Cryptosporidium
 3. Escherichia coli    23. Cryptosporidium
 4. Shigella sonnei     24. Campylobacter
 5. Cryptosporidium     25. Shigella sonnei
 6. Giardia             26. SRSV
 7. Cryptosporidium     27. Cryptosporidium
 8. Campylobacter       28. Campylobacter
 9. Campylobacter       29. Giardia
10. Cryptosporidium     30. Giardia
11. Giardia             31. Escherichia coli
12. Shigella sonnei     32. Shigella sonnei
13. SRSV                33. Cryptosporidium
14. Giardia             34. SRSV
15. Escherichia coli    35. Campylobacter
16. Campylobacter       36. Campylobacter
17. Giardia             37. Campylobacter
18. SRSV                38. Giardia
19. Campylobacter       39. Escherichia coli
20. Cryptosporidium     40. Campylobacter

Infection            N (%)
Campylobacter        12 (30.0)
Cryptosporidium       9 (22.5)
Giardia               8 (20.0)
SRSV                  5 (12.5)
Escherichia coli      3 (7.5)
Shigella              3 (7.5)
Total                40 (100)

Summarised by frequencies or percentages.
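
A frequency table like this can be produced by counting each category and dividing by the total. The following is a minimal Python sketch rather than the SPSS procedure used in the course. The function name `frequency_table` and the short list of infections are hypothetical and are not the 40 cases from the slide.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, round(100 * n / total, 1))
            for category, n in counts.most_common()]

# Hypothetical data for illustration only
infections = ["Giardia", "SRSV", "Giardia", "Campylobacter",
              "Giardia", "Campylobacter", "SRSV", "Giardia"]

rows = frequency_table(infections)
for category, n, percent in rows:
    print(f"{category:<15}{n:>3} ({percent})")
print(f"{'Total':<15}{len(infections):>3} (100)")
```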

Numerical data
– Frequency distributions for a continuous variable can be unfeasibly large
– Grouping may be necessary for presentation

Frequency distribution for a continuous variable
[Table columns: Age group (years) | Frequency | Relative frequency (%) | Cumulative relative frequency (%)]

Baseline measure of cholesterol
[Table: N (%) for each recorded value; percentages (3.1) (3.0) (2.9) (3.9) (3.6) (4.8) (5.2) (5.9) (5.6) (5.0) (4.1) (3.9) (4.7) (4.4) (4.5) (4.5) (4.2) (3.6)]

Baseline group    N (%)
4.0 to            (16.5)
4.5 to            (26.5)
5.0 to            (21.6)
5.5 to            (20.3)
6.0 to            (15.1)
Total             1677

Guide for grouping data
– Obtain min and max values
– Choose between 5 and 15 intervals
– Summarise but do not obscure the data, especially continuous data
– Intervals of equal width
  – Good but not essential
  – Remember to label tables!
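
The grouping guide above can be sketched in code: find the minimum and maximum, split the range into equal-width intervals, then count. This is an illustrative Python sketch under simple assumptions. The function name `group_equal_width`, the choice to put the maximum value in the last interval, and the ages are all hypothetical.

```python
def group_equal_width(values, n_intervals):
    """Frequency, relative % and cumulative % for equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; nothing to group")
    width = (hi - lo) / n_intervals
    counts = [0] * n_intervals
    for v in values:
        # The maximum value is placed in the last interval
        index = min(int((v - lo) / width), n_intervals - 1)
        counts[index] += 1
    rows, cumulative = [], 0
    for i, n in enumerate(counts):
        cumulative += n
        rows.append((lo + i * width, lo + (i + 1) * width, n,
                     round(100 * n / len(values), 1),
                     round(100 * cumulative / len(values), 1)))
    return rows

# Hypothetical ages in years
ages = [21, 25, 34, 38, 42, 47, 51, 55, 63, 71]
table = group_equal_width(ages, 5)
print("Age group (years)  Frequency  Relative (%)  Cumulative (%)")
for lower, upper, n, rel, cum in table:
    print(f"{lower:.0f} to {upper:.0f}{n:>12}{rel:>12}{cum:>14}")
```

Five intervals is the low end of the 5 to 15 suggested on the slide. With real data the interval boundaries would usually be rounded to convenient values.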

Take care with missing values
– SPSS gives % missing in output if missing values are left blank in the data
– Be careful to report whether a % is a percentage of observed values or of all subjects
– These will differ!
– Can use a missing code (often 9) to make missing explicit in output
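
The difference between the two percentages is easy to see with a small calculation. The Python sketch below is illustrative and is not SPSS output. The function name `percent_yes`, the use of `None` for a missing value, and the responses are hypothetical.

```python
def percent_yes(responses):
    """Percentage answering 'yes', with and without missing values in the denominator."""
    n_all = len(responses)
    observed = [r for r in responses if r is not None]  # None marks a missing value
    n_yes = sum(1 for r in observed if r == "yes")
    return {
        "pct_of_observed": round(100 * n_yes / len(observed), 1),
        "pct_of_all": round(100 * n_yes / n_all, 1),
        "pct_missing": round(100 * (n_all - len(observed)) / n_all, 1),
    }

# Hypothetical smoking status for 10 subjects, 2 of them missing
smoker = ["yes", "no", None, "yes", "no", "no", None, "yes", "no", "no"]
result = percent_yes(smoker)
print(result)
```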

Graphs
– Simplicity
– Consistency
– Not duplicating tables or text
– Remember a title
– Remember to label axes

Graphs – Categorical data
– Bar charts
– Pie charts

Bar charts
– Used to display categorical (or discrete numerical) data
– One bar per category
– Height of bar equals its frequency
– Each bar same width and equally spaced
– Space between each bar
– Vertical axis must start at zero

Most common cancer deaths in UK, 2009
[Figure. Plots and statistics from CRUK website]

Pie charts
– Display one variable only
– Compare 2 groups using 2 charts

But avoid 3-dimensional plots!

Graphs – Numerical data
– Histograms
– Frequency polygon
– Scatter plots
– Box plots

Histograms
– Like bar charts but no spaces
– y axis always begins at zero
– Area of bar represents the frequency in each group
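
Because area, not height, represents frequency, the bar height for a group is its frequency divided by the interval width. This only differs from the raw frequency when intervals have unequal widths. A small Python sketch, with a hypothetical function name `bar_heights` and made-up age groups:

```python
def bar_heights(groups):
    """groups: (lower, upper, frequency) tuples. Height = frequency / interval width."""
    return [freq / (upper - lower) for lower, upper, freq in groups]

# Hypothetical age groups; the last interval is twice as wide as the others
age_groups = [(0, 10, 20), (10, 20, 30), (20, 40, 30)]
heights = bar_heights(age_groups)
for (lower, upper, freq), height in zip(age_groups, heights):
    print(f"{lower} to {upper}: frequency {freq}, bar height {height}")
```

The second and third groups have the same frequency, but the wider group gets a bar half as tall, so the two areas are equal.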

Check data carefully

Florence Nightingale’s ‘Coxcomb’ diagram of mortality in the Crimean War

Summary measures – Numerical data
– Central location (average)
– Spread or variability (distance of each data point from the average)

Central Location
– Mean
– Median
– Mode: most frequent value

Mean
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
Often written as $\bar{x} = \sum x_i / n$, where sigma ($\sum$) is ‘sum of’ and $n$ is the number of values.

$\bar{x} = \dots = 3.00$

Mean
– Advantages
  – Uses all data values
  – Very amenable to statistical analysis; most models use the mean
– Disadvantages (advantages to politicians and estate agents!)
  – Distorted by outliers
  – Distorted by skewed data

Median
– Arrange values in increasing order
– Median is the middle value
– Easy if there is an odd number of values; for an even number, take the mean of the two middle values:
  Median = (… + …) / 2 = 2.96 litres

Median
– Advantages
  – Not distorted by outliers
  – Not distorted by skewed data
– Disadvantages
  – Ignores most of the information
  – Less amenable to statistical modelling
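
The contrast between the two measures shows up as soon as one extreme value is added. The Python sketch below is for illustration. The function `median_by_hand` and the lengths of stay are hypothetical, and the standard-library `statistics` module is used only as a cross-check.

```python
from statistics import mean, median

def median_by_hand(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical days in hospital for five patients, then a sixth with a very long stay
days = [2, 3, 3, 4, 5]
days_with_outlier = days + [40]

for label, data in [("without outlier", days), ("with outlier", days_with_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median_by_hand(data)}")
```

The single long stay pulls the mean up sharply while the median barely moves, which is the sense in which the mean is distorted by outliers.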

Measures of spread
Data set 1: [… 47 52 …] Mean = 51.3, Median = …
Data set 2: [… 51 51 …] Mean = 51.3, Median = 51

Range
Data set 1: [… 47 52 …] Range = … or …
Data set 2: [… 51 51 …] Range = … or 5

Range from percentiles
– Data ordered from smallest to largest value, then divided into equal chunks:
  – Percentiles
  – Deciles: data in equal 10ths
  – Quartiles: data in equal 4ths

Interquartile range (IQR)
Data is ordered into quartiles: … (lower quartile) | … | 32.5 (upper quartile)
IQR = upper quartile − lower quartile = 24.5
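
Quartiles and the IQR can be computed with Python's standard library, as in the sketch below. Note that several conventions exist for interpolating quartiles, so other software (including SPSS) may give slightly different values for the same data. The helper name `quartiles_and_iqr` and the ages are hypothetical.

```python
from statistics import quantiles

def quartiles_and_iqr(values):
    """Lower quartile, upper quartile and IQR using statistics.quantiles defaults."""
    q1, _q2, q3 = quantiles(values, n=4)  # default method is "exclusive"
    return q1, q3, q3 - q1

# Hypothetical ages in years
ages = [3, 5, 7, 8, 12, 13, 14, 18, 21, 25, 30]
lower_q, upper_q, iqr = quartiles_and_iqr(ages)
print(f"Lower quartile = {lower_q}, upper quartile = {upper_q}, IQR = {iqr}")
```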

Median, range and IQR in multiple box plots
[Figure: box plots with upper quartile, lower quartile and an outlier labelled]

Distribution of data values around the mean
[Figure: data values plotted around the mean]

Variance
Mean = 34.16 years
[Table: deviation of each value from the mean, $(x - \bar{x})$]

Variance
Mean = 34.16 years
[Table: columns $(x - \bar{x})$ and $(x - \bar{x})^2$]

Variance ($s^2$)
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = 182.16$$

Mean = 34.16 years, Variance = 182.2

Standard deviation ($s$)
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = 13.49$$

Mean = 34.16 years, SD = 13.49
Coefficient of variation (CV) = SD / Mean = 0.39
Measure of variability for comparing different scales
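
The variance, standard deviation and CV follow directly from their definitions. The Python sketch below uses the n − 1 divisor shown on the variance slide. The function name `sample_variance` and the five ages are hypothetical and are not the data behind the figures above.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Hypothetical ages in years
ages = [20, 25, 30, 35, 40]
mean_age = sum(ages) / len(ages)
s2 = sample_variance(ages)
s = sqrt(s2)
cv = s / mean_age  # unit-free, so comparable across different scales

print(f"Mean = {mean_age}, variance = {s2}, SD = {s:.2f}, CV = {cv:.2f}")
```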

Which central measure goes with which measure of spread?
– Mean (SD)
– Median (IQR or Range)

Summary
– Do not underestimate the value of looking at the data
– Gives a feel for the data before testing or modelling
– Check for missing data
– Check for outliers