Ana Jerončić, PhD
Department for Research in Biomedicine and Health
Location: main building, 5th floor, room 512 Phone:
1. Describing data - Central tendency and variability
2. Estimation - Accuracy, precision, standard error, confidence intervals
3. Hypothesis testing - Test statistics, P-value, choice of a statistical test
4. Interpretation of data - Causality and association, odds ratio, risk, correlation, linear regression
5. Sources of error - Type 1 and type 2 errors, power, bias, confounding
Course focus: critical appraisal of scientific papers, NOT implementation of data analysis
To identify the best available treatment
To prevent “medical zombies”
To perform your own research
1. How the data should be organized prior to data analysis
2. Data types
3. Graphical & tabular techniques for description, summary statistics (qualitative data, quantitative data)
Height measurements among 1st year medical students
What is the unit of measurement?
How many observations per subject?
Entity     Height (cm)   Weight (kg)   Age (years)   Sex (category)
Person 1   *             *             *             Male
Person 2   *             *             *             Female
Person 3   *             *             *             Male
Rows are OBSERVATIONS (one per entity); columns are VARIABLES; each cell is a measurement/observation.
Variable               Features of variables                       Example                Descriptive statistics   Informativeness level
Categorical, Nominal   Unordered/unarranged categories             Gender, urbanization   Number, proportion       Low
Ordinal                Ordered/arranged categories                 Grades, scales         Median                   Medium
Numerical              Arranged categories with equal intervals    Height, weight         Mean or median           High
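As a minimal sketch of the table above, the snippet below applies the matching descriptive statistic to each data type: counts and proportions for nominal data, the median for ordinal data, and the mean or median for numerical data. All sample values are hypothetical and serve only as an illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample values, for illustration only.
sex = ["Male", "Female", "Male", "Female", "Female"]  # nominal
grades = [2, 3, 3, 4, 5]                               # ordinal (grades 1-5)
heights_cm = [160, 165, 170, 175, 180]                 # numerical

# Nominal: only counts and proportions are meaningful.
sex_counts = Counter(sex)
sex_proportions = {k: v / len(sex) for k, v in sex_counts.items()}

# Ordinal: categories can be ordered, so the median is meaningful.
grade_median = median(grades)

# Numerical: equal intervals make the mean meaningful as well.
height_mean = mean(heights_cm)
height_median = median(heights_cm)

print(sex_counts, sex_proportions)
print(grade_median, height_mean, height_median)
```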
Categorical (qualitative): nominal, ordinal
Numerical (quantitative)
Height, grades, age in years, weight, insulin concentration, blood glucose
How many cigarettes do you smoke a day? 1-5 6-10 21 and more
Have you ever had a heart attack? Yes No
Do you suffer from hypertension? Yes No ?
Gender: Male Female
Marital status: married divorced widowed single lives alone ?
Education: elementary school high school two-year college four-year college ?
Likert scale
Claim: Violence among the youth is becoming an increasing problem in Croatia.
I agree completely / I agree / Undecided / I disagree / I strongly disagree
Visual analogue scale
E.g. the pain level that the examinee experiences, marked on a line from "I don’t feel pain" to "I feel intolerable pain"
Numerical: distance is meaningful
Ordinal: attributes can be ordered
Nominal: attributes are only named; weakest
Person No.   Height [cm]
Person 1     148
Person 2     142
Person 3     154
Person 4     153
Person 5     160
Person 6     177
Person 7     204
Person 8     192
Person 9     191
Person 10    …
Person 11    …
Person 12    …
Person 13    177
Organized data are the input for graphical & tabular data representations.
In one study, researchers investigated the genotype of the YPEL5 gene in a population sample from Split. They obtained the following results for 10 examinees:

Individual   YPEL5 genotype
1            AA
2            B
3            BB
4
5            AB
6
7            BB
8            AA
9            AB
10           BB

Table: Frequency distribution of YPEL5 genotypes
Genotype   Frequency   Relative frequency (proportion)   Relative frequency [%] (percentage)
AA         2           0.2                               20%
AB         3           0.3                               30%
BB         5           0.5                               50%
Total      10          1.0                               100%
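The frequency table above can be reproduced with a short sketch. The list of genotypes is expanded from the tabulated counts (AA 2, AB 3, BB 5), not from the individual listing, and the variable names are illustrative.

```python
from collections import Counter

# Genotypes expanded from the frequency table (AA 2, AB 3, BB 5);
# the order of individuals is not meaningful here.
genotypes = ["AA"] * 2 + ["AB"] * 3 + ["BB"] * 5

counts = Counter(genotypes)   # frequency of each category
n = sum(counts.values())      # total number of examinees

print(f"{'Genotype':<10}{'Freq.':>6}{'Rel. freq.':>12}{'Percent':>9}")
for genotype in sorted(counts):
    freq = counts[genotype]
    rel = freq / n            # relative frequency (proportion)
    print(f"{genotype:<10}{freq:>6}{rel:>12.1f}{rel:>9.0%}")
print(f"{'Total':<10}{n:>6}{1.0:>12.1f}{1.0:>9.0%}")
```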
Bar charts are often used to display frequencies. One axis shows the category names, the other shows counts or percentages.
For nominal data, the only allowable calculation is to count the frequency of each category. We can summarize the data in a table that presents the categories and their counts, called a frequency distribution. A relative frequency distribution lists the categories and the proportion with which each occurs.
Nominal data have no order. However, it is sometimes useful to arrange the outcomes from the most frequently occurring to the least frequently occurring. This bar chart representation is called a "Pareto chart" (axes: category names, counts).
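A minimal sketch of this ordering, using hypothetical marital-status answers from 10 respondents: the categories are sorted from most to least frequent and drawn as text bars.

```python
from collections import Counter

# Hypothetical nominal data (marital status of 10 respondents).
status = ["married", "single", "married", "divorced", "married",
          "single", "widowed", "married", "single", "married"]

# most_common() sorts categories from most to least frequent,
# which is the bar order described above.
pareto_order = Counter(status).most_common()

for category, count in pareto_order:
    print(f"{category:<10}{'#' * count} {count}")
```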
A chart with relative frequencies is more informative (axes: category names, percentages).
Pie Charts show relative frequencies…
Authors can use percentages to hide the true size of the data. Saying that 50% of a sample has a certain condition when there are only four people in the sample clearly does not provide the same level of information as 50% of a sample of 400 people. So, percentages should be used as an additional help for the reader rather than as a replacement for the actual data.
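One simple way to follow this advice is to report the count and the sample size next to every percentage. The helper below is a hypothetical illustration, not a standard function.

```python
def describe(count: int, total: int) -> str:
    """Format a proportion as 'count/total (percent)' so the sample size stays visible."""
    return f"{count}/{total} ({count / total:.0%})"

small = describe(2, 4)       # 50% of four people
large = describe(200, 400)   # 50% of 400 people
print(small)
print(large)
```

Both results show 50%, but the reader can see at once how much data stands behind each of them.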
Height measurements among 1st year medical students (table columns: Individual, Height (cm))
Frequency distribution for quantitative data: building a histogram
Frequency distribution of height
Category limits [cm]   Freq.   Relative freq.   Percent relative freq.
>140; <=150            3       0.13             13%
…                      3       0.13             13%
…                      4       0.17             17%
…                      7       0.30             30%
…                      5       0.22             22%
…                      1       0.04             4%
Total                  23      1.00             100%
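The class limits above (>140; <=150) are open on the left and closed on the right. The sketch below shows how heights could be sorted into such classes and counted. The class width of 10 cm and the starting point of 140 cm are assumptions, and the ten heights are example values taken from the person-by-person height table in these slides, not the 23 measurements summarized above.

```python
import math
from collections import Counter

# Example heights [cm]; NOT the 23 measurements behind the table above.
heights = [148, 142, 154, 153, 160, 177, 204, 192, 191, 177]

CLASS_START = 140   # assumed lower limit of the first class
CLASS_WIDTH = 10    # assumed class width in cm

def class_limits(height, start=CLASS_START, width=CLASS_WIDTH):
    """Return (lower, upper) of the class with lower < height <= upper."""
    k = math.ceil((height - start) / width)
    upper = start + k * width
    return (upper - width, upper)

freq = Counter(class_limits(h) for h in heights)
n = len(heights)

# Walk through all classes from the lowest to the highest, including empty ones.
lowest = min(freq)[0]
highest = max(freq)[1]
for lower in range(lowest, highest, CLASS_WIDTH):
    count = freq[(lower, lower + CLASS_WIDTH)]
    print(f">{lower}; <={lower + CLASS_WIDTH}  {count:>2}  {count / n:.2f}  {'#' * count}")
```

A value that falls exactly on a limit, such as 160 cm, is counted in the lower class (>150; <=160), so every measurement belongs to exactly one class.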
There are several graphical methods that are used when the data are quantitative (numeric). The most important of these is the histogram. The histogram is not only a powerful graphical technique for summarizing interval data, but it is also used to help explain probabilities.
Qualitative: Frequency Distribution - tabular summary of data; Bar Chart; Pie Chart
Quantitative: Frequency Distribution - tabular summary of data; Histogram; Line Chart (Time-Series Plot); Stem and Leaf Display
To compare two variables we use:
Scatter plot/diagram (quantitative)
Cross table (qualitative)
Scatter plot, showing the strong association between enzyme activity at pH 5.5 and the 5α-reductase 2-specific mRNA expression, as expressed on the basis of β-actin (n = 30; r_s = 0.81; 95% confidence interval, 0.64–0.91; P < …).
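The r_s in the caption above is a Spearman rank correlation coefficient. As a rough sketch of how such a value is obtained, the code below ranks both variables (tied values share the average rank) and then computes the ordinary correlation coefficient on the ranks. The five data pairs are hypothetical and unrelated to the study in the figure.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Correlation coefficient of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: the correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical measurements of two quantitative variables.
x = [1.2, 2.5, 3.1, 4.8, 5.0]
y = [2.0, 1.5, 4.2, 3.9, 6.1]
r_s = spearman(x, y)
print(round(r_s, 2))
```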
Linearity and direction are the two concepts we are interested in. Example panels: positive linear relationship; negative linear relationship; weak or non-linear relationship.
Squamous cell carcinoma tumor and perilesional tissue display distinctly different scatter plots from normal tissue. Expression levels for gene subset 1 in patient 1.
A cross table is used to compare two qualitative variables. If the first variable has r categories and the second variable has c categories, then we have an r × c cross table.
                  Disease X
YPEL5 genotype    YES   NO   TOTAL
AA                2     0    2
AB                1     3    4
BB                0     4    4
TOTAL             3     7    10

Based on the data presented, do you think that YPEL5 could be associated with disease X?
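A cross table like the one above can be built by counting (genotype, disease) pairs. In the sketch below the pairs are expanded from the tabulated cell counts, and the variable names are illustrative. The last step computes the proportion of individuals with disease X within each genotype, which is one simple way to start answering the question.

```python
from collections import Counter

# (genotype, disease X) pairs expanded from the cell counts of the cross table.
observations = (
    [("AA", "YES")] * 2
    + [("AB", "YES")] * 1 + [("AB", "NO")] * 3
    + [("BB", "NO")] * 4
)

cells = Counter(observations)   # one count per cell of the cross table
genotypes = ["AA", "AB", "BB"]  # r = 3 row categories
outcomes = ["YES", "NO"]        # c = 2 column categories

print(f"{'Genotype':<10}{'YES':>5}{'NO':>5}{'TOTAL':>7}")
for g in genotypes:
    row = [cells[(g, o)] for o in outcomes]
    print(f"{g:<10}{row[0]:>5}{row[1]:>5}{sum(row):>7}")
col_totals = [sum(cells[(g, o)] for g in genotypes) for o in outcomes]
print(f"{'TOTAL':<10}{col_totals[0]:>5}{col_totals[1]:>5}{sum(col_totals):>7}")

# Proportion with disease X within each genotype (row proportions).
risk = {g: cells[(g, "YES")] / sum(cells[(g, o)] for o in outcomes)
        for g in genotypes}
print(risk)
```

With only 10 individuals, these row proportions rest on very few observations, so they should be read together with the counts.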
The results of measuring the height among medical students (table columns: Individual, Height (cm))