Published by Cora Susanna May. Modified over 9 years ago.
1
Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc Department of Surgery Department of Clinical Epidemiology and Biostatistics March 18, 2009
2
Objectives To understand and recognize different types of variables To learn how to explore your data ◙How to display data with numbers and tables ◙How to display data using graphs To understand the fundamental concept of variability To learn the notion of the distribution of a variable
3
Why and how are statistics relevant to medicine? Prevention – What causes a disease? Diagnosis – What symptoms and signs do patients with a given disease present with? Treatment – What treatments are effective for a given disease and for which patients? Prognosis – How will specific patients with a given disease fare in the long term?
4
Statistics – Why do we need it?
5
B A E W D S A Q P B B W E O N F O H E E R D T TY E D T E Q O N E G G O L T S D G F E W G E G G V B A Y A O E E D Y H E J U E G D E T E W W E T H E F E O P L U M R HOW MANY ‘E’s?
6
Descriptive and Inferential statistics? Descriptive statistics are concerned with the presentation, organization, and summarization of data. Inferential statistics allow us to generalize from a sample to a larger group of subjects.
7
What is data? Data are collected for some purpose, and each piece of collected information has a meaning in some context. Data are a set of information or observations about a group of individuals or subjects. This information is organized in the form of variables. A variable is any characteristic of a person or subject that can be measured or categorized and whose value varies from individual to individual.
8
Dependent and Independent Variables?
Independent variable
◙Is the explanatory variable that explains the changes in the dependent variable: demographics (age, gender, height), risk factors (diabetes, CAD)
◙Is the intervention or exposure that causes the changes in the dependent variable: drug, surgery, radiation, smoking …
Dependent variable
◙Is the outcome of interest, which changes in response to some intervention or exposure: mortality, survival, post-op pain, quality of life, post-op complications
9
Types of variables …?
Qualitative or attribute variable
◙Nonnumeric: gender, severity of injury, type of injury, tumour grade
◙Categorical variables …
Quantitative variable
◙Numeric
◙Discrete variable can assume only whole numbers: number of accidents, number of injuries, pain score
◙Continuous variable may take any value within a defined range: weight, height, age, blood pressure, level of cholesterol, pain score
10
Level of measurement … There are four levels of measurement: ◙Nominal (qualitative/categorical) ◙Ordinal (qualitative/categorical) ◙Interval (quantitative/numeric) ◙Ratio (quantitative/numeric)
11
Level of measurement … cont’d
Variable type: Assumptions
◙Nominal: named categories
◙Ordinal: same as nominal plus ordered categories
◙Interval: same as ordinal plus equal intervals
◙Ratio: same as interval plus meaningful zero
12
Level of measurement … cont’d
A nominal variable consists of named categories, with no implied order among the categories.
- gender, mortality (dichotomous or binary)
- type of injury, type of fracture, blood type
An ordinal variable consists of ordered categories, where the differences between categories cannot be considered to be equal.
- Tumour stage (I, II, III, IV), tumour grade (I, II, III, IV)
- Likert scale (excellent, very good, good, fair, poor)
13
Level of measurement … cont’d
An interval variable has equal distances between values but no meaningful ‘zero’ value.
- IQ test (the differences between numbers are meaningful but the ratios between them are not)
A ratio variable has equal intervals between values and a meaningful zero point, so the ratio between two values makes sense.
- height, weight, laboratory test values, age
14
For example
Primary objective: To compare the post-operative pain between laparoscopic and open surgery in patients with colorectal cancer
Secondary objective: To compare the post-operative complications between laparoscopic and open surgery in patients with colorectal cancer
15
Independent (Explanatory) variables: Age, Sex, Pre-op pain Severity
Dependent/outcome variables: Changes in pain, Complication
Independent (Comparison) variable
16
Data Editing
Validity edits: Ensure that:
◘essential fields have been completed and there is no missing information
◘specified units of measure have been properly used and the measurements are within the acceptable range.
Duplication edits: Ensure that each case/patient has been entered into the database only once.
Statistical edits: Identify and double-check all extreme values, suspicious data and outliers.
17
Descriptive Statistics … are a means of organizing and summarizing observations. We examine variables in order to describe their main features. The following basic strategies help us organize our exploration of a set of data: ◙Begin by examining each variable. ◙Examine the distribution of each variable by creating frequency tables, numerical summaries and graphs. ◙Study the relationships between the variables.
18
Examining Distributions: Categorical …
Numbers
◙Frequencies (counts), cumulative frequencies
◙Relative frequencies (%), cumulative relative frequencies (%)
Graphs
◙Bar charts
◙Pie charts
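The numerical summaries listed for categorical data can be sketched in a few lines of Python. This is only an illustration: the `surgery` sample and the helper name `frequency_table` are assumptions made for the example, not part of the presentation.

```python
from collections import Counter

# Hypothetical sample: type of surgery for 10 patients (illustrative values only).
surgery = ["open", "laparoscopic", "open", "open", "laparoscopic",
           "open", "laparoscopic", "open", "open", "laparoscopic"]


def frequency_table(values):
    """Return rows of (category, count, cumulative count, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for category in sorted(counts):
        cumulative += counts[category]
        rows.append((category,
                     counts[category],
                     cumulative,
                     100 * counts[category] / total,
                     100 * cumulative / total))
    return rows


table = frequency_table(surgery)
for row in table:
    print(row)
```

Cumulative frequencies are most informative when the categories have a natural order (ordinal data); for nominal data such as surgery type the ordering used here is simply alphabetical.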
19
Cross-tabulation of categorical data
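A cross-tabulation counts how often each combination of two categorical variables occurs. The sketch below assumes hypothetical paired records (surgery type, complication); the data and the function name `cross_tabulate` are invented for illustration.

```python
from collections import Counter

# Hypothetical paired observations: (surgery type, complication yes/no).
records = [
    ("open", "yes"), ("open", "no"), ("laparoscopic", "no"),
    ("open", "no"), ("laparoscopic", "yes"), ("laparoscopic", "no"),
]


def cross_tabulate(pairs):
    """Return a nested dict: row category -> column category -> count."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: cells[(r, c)] for c in cols} for r in rows}


crosstab = cross_tabulate(records)
for row_label, row in crosstab.items():
    print(row_label, row)
```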
21
Examining Distributions: Categorical …
Numbers
◙Frequencies (counts), cumulative frequencies
◙Relative frequencies (%), cumulative relative frequencies (%)
Graphs
◙Bar charts
◙Pie charts
22
Bar Charts
24
Bar charts … A bar chart can be used to depict any level of measurement (nominal, ordinal, interval, or ratio). It is a series of separated bars (vertical or horizontal), one per category. Bars represent the frequency (count) or relative frequency (percent or proportion) of each category. A bar chart is also useful for showing data for more than one group.
25
Pie Charts
26
Pie charts … Used primarily for nominal and ordinal data. Used to display relative frequency distribution. The circle is divided proportionally using relative frequency of each category. A pie chart is useful for showing data for one group but it is useless for graphic illustration of two or more groups.
27
Examining Distributions: Quantitative …
Numbers
◙Measures of central tendency – mean, median, mode
◙Measures of variation around mean – variance, standard deviation, standard error of mean
◙Measures of variation around median – percentiles, quintiles, quartiles
Graphs
◙Histograms
◙The five-number summary
◙Box plots
28
Measures of central tendency
Mean: the sum of the observations divided by the number of observations
Median: the midpoint of a distribution after arranging all observations in order of size, from smallest to largest
Mode: the most frequent value (the highest peak)
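As a small worked example of the three measures, the following sketch uses Python's standard `statistics` module. The `los` values (length of stay in days) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical length of stay in days for six patients (illustrative values only).
los = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(los)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median_value = statistics.median(los)  # average of the two middle values, 3 and 5
mode_value = statistics.mode(los)      # most frequent value

print(mean_value, median_value, mode_value)
```

With an even number of observations the median is the average of the two middle values, which is why it need not equal any value in the sample.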
29
Properties of mean … It is used for interval or ratio data. A set of data has only one mean. All values are included in the computation. It is the only measure of central tendency where the sum of deviations of each value from the mean will always be zero. The mean is a useful measure for comparing two or more sets of data. The mean is sensitive to extreme values.
30
Properties of median … It is used for ordinal, interval or ratio data. There is a unique median for each data set. The median is not necessarily equal to one of the sample values. It is resistant (insensitive) to extreme values. It is useful for summarising skewed data.
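Two of the properties above, that deviations from the mean sum to zero and that the median resists extreme values while the mean does not, can be checked with a short sketch. The scores below are hypothetical.

```python
from statistics import mean, median

# Hypothetical post-operative pain scores (illustrative values only).
scores = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]  # same data with one extreme value

# Deviations from the mean sum to zero (exactly here, because the data are
# integers; with non-integer data expect a tiny rounding error instead).
deviation_sum = sum(x - mean(scores) for x in scores)

print(mean(scores), median(scores))              # mean and median agree
print(mean(with_outlier), median(with_outlier))  # mean shifts, median does not
print(deviation_sum)
```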
31
Measures of variation around mean
Variance: the average of the squared deviations of the data from their mean
Standard deviation: the square root of the variance
Standard error of the mean: the standard deviation divided by the square root of the number of observations
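A minimal sketch of these measures follows, using hypothetical data. Note one assumption made explicit here: the literal "average of squared deviations" divides by n (population variance, `pvariance`), whereas sample statistics usually divide by n − 1 (`variance`); both are shown, and the standard error is computed from the sample standard deviation.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only); their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_variance = statistics.pvariance(data)    # divides by n
pop_sd = statistics.pstdev(data)             # square root of pop_variance
sample_variance = statistics.variance(data)  # divides by n - 1
sample_sd = statistics.stdev(data)           # square root of sample_variance

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = sample_sd / math.sqrt(len(data))

print(pop_variance, pop_sd)
print(sample_variance, sample_sd, standard_error)
```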
32
Properties of variance … All values are used in the calculation. The units are not the same as those of the data; they are the square of the original units.
33
Properties of standard deviation … The units are the same as those of the data. It is used for the Empirical Rule. For any symmetrical, bell-shaped (approximately normal) distribution: ◘About 68% of the observations will lie within 1 s.d. of the mean. ◘About 95% of the observations will lie within 2 s.d. of the mean. ◘About 99.7% of the observations will lie within 3 s.d. of the mean.
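The percentages in the Empirical Rule come from the normal distribution. As a quick check, the sketch below computes the probability of lying within k standard deviations of the mean for a standard normal distribution, using `statistics.NormalDist` from the standard library; the helper name `within` is an assumption for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def within(k):
    """Probability that a normal observation lies within k s.d. of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * within(k), 1))
```

The rule is only approximate for real data and does not hold for distributions that are far from bell-shaped.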
34
The Empirical Rule
35
Measures of variation around median
Percentiles: Arrange the observations from smallest to largest and divide them into 100 equal parts; for example, the 5th percentile of a distribution is the value below which 5% of the observations fall and above which 95% fall.
Quartiles: 25th, 50th and 75th percentiles
Quintiles: 20th, 40th, 60th and 80th percentiles
Deciles: 10th, 20th, 30th, 40th, 50th, …, 90th percentiles
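Quartiles, quintiles and deciles are all cut points that split the ordered data into equal-sized groups. The sketch below uses `statistics.quantiles` on hypothetical data. Software packages use slightly different interpolation rules, so values from other tools may differ a little; the default "exclusive" method is assumed here.

```python
import statistics

# Hypothetical ordered observations 1..11 (illustrative values only).
observations = list(range(1, 12))

quartiles = statistics.quantiles(observations, n=4)   # 25th, 50th, 75th percentiles
quintiles = statistics.quantiles(observations, n=5)   # 20th, 40th, 60th, 80th
deciles = statistics.quantiles(observations, n=10)    # 10th, 20th, ..., 90th

print(quartiles)
print(quintiles)
print(deciles)
```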
37
Examining Distributions: Quantitative …
Numbers
◙Measures of central tendency – mean, median, mode
◙Measures of variation around mean – variance, standard deviation, standard error of mean
◙Measures of variation around median – percentiles, quintiles, quartiles
Graphs
◙Histograms
◙The five-number summary
◙Box plots
38
Histogram Outliers??
39
Histograms … Used for interval and ratio data. A histogram is a graph in which each bar spans a range of values on the horizontal axis; the size of that range is called the interval width. The vertical axis represents the frequency of each interval. There are no spaces between bars. A histogram is useful for graphic illustration of one group.
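The counting behind a histogram can be sketched as follows. The ages, the starting value, and the interval width of 10 years are assumptions chosen for illustration; a plotting library would normally draw the bars.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts


# Hypothetical patient ages in years (illustrative values only).
ages = [44, 52, 57, 59, 61, 63, 65, 66, 70, 71, 75, 82]
counts = histogram_counts(ages, start=40, width=10, n_bins=5)

# Crude text histogram: one '#' per observation in the interval.
for i, count in enumerate(counts):
    low = 40 + 10 * i
    print(f"{low}-{low + 9}: {'#' * count}")
```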
40
Box plot: five-number summary
Range = Max – Min; IQR = Q3 – Q1
[Figure: box plot labelled with whiskers, Q1, median (Q2), Q3, inner fence, outliers, 1st and 100th]
41
Box plot of change in pain score
42
Box Plots … Used for interval and ratio data. Uses the five-number summary measures: median, Q1, Q3, minimum and maximum. It is useful for detecting outliers. It is useful for illustrating the distribution of more than one group.
43
What are outliers … ? Outliers are extreme data values that fall outside of distribution of the data set.
44
Box plot: five-number summary
IQR = Q3 – Q1
[Figure: box plot labelled with whiskers, Q1, median (Q2), Q3, inner fence, 1st and 100th]
45
1.5 × IQR Criterion for Outliers
Interquartile range (IQR) is the distance between the first and third quartiles: IQR = Q3 – Q1
From data: Q1 = 59 yrs, Q3 = 70 yrs, IQR = 70 – 59 = 11
1.5 × IQR = 1.5 × 11 = 16.5
Q1 – 1.5 × IQR = 59 – 16.5 = 42.5
Q3 + 1.5 × IQR = 70 + 16.5 = 86.5
From data: Min = 44 and Max = 82, so both lie inside the fences and no value is flagged as an outlier.
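The same arithmetic can be written as a small function. The quartiles, minimum and maximum are the values quoted on the slide; the function name `iqr_fences` and its parameter `k` are assumptions for this sketch.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) inner fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Values quoted on the slide (age in years).
q1, q3 = 59, 70
minimum, maximum = 44, 82

lower, upper = iqr_fences(q1, q3)
has_outliers = minimum < lower or maximum > upper

print(lower, upper, has_outliers)
```

Because the minimum and maximum both fall inside the fences, checking them is enough to conclude that no observation in the data set is flagged.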
46
Properties of quartiles, quintiles… They are used for interval or ratio data. They are resistant (insensitive) to extreme values. They are useful for summarising skewed data.
47
How to deal with skewed data
Transform the data:
◙Square/square root – (Poisson) count data
◙Log(x) or ln(x) – data skewed toward the right
◙Reciprocal (1/x) – data skewed toward the left
Transformation:
◙Makes skewed data more symmetric
◙Makes the distribution more normal
◙Stabilizes variability
◙Linearizes a relationship between two or more variables
Show summary statistics in the original units but analyse the transformed data
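To illustrate how a log transformation makes right-skewed data more symmetric, the sketch below compares a simple moment-based skewness coefficient before and after taking natural logs. The data and the helper name `skewness` are assumptions for this example; positive values indicate a tail to the right.

```python
import math
import statistics


def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation / (population s.d.) ** 3."""
    centre = statistics.mean(values)
    spread = statistics.pstdev(values)
    return sum((x - centre) ** 3 for x in values) / len(values) / spread ** 3


# Hypothetical right-skewed measurements (illustrative values only).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
logged = [math.log(x) for x in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)

print(round(skew_raw, 2), round(skew_logged, 2))
```

The transformed data are still somewhat skewed in this example, but much less so than the raw values, which is the sense in which the transformation makes the distribution "more symmetric".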
48
Summary of what we have learned ….
Always plot your data: make a graph, e.g. a histogram or box plot
Look for the overall pattern (shape, centre and spread) and for striking deviations such as outliers
Check whether the overall pattern of the distribution can be described by a normal distribution. If not, transform the data to make skewed data more symmetric
Calculate an appropriate numerical summary to describe centre and spread