Community Dental Health Review Algonquin College - Jan Ladas

Statistics
Statistics is the field of study concerned with the art and science of data analysis: planning, collecting, organizing, analyzing, interpreting, summarizing and presenting data.
"Statistics", when used in the plural form, refers to the specific bits of data which either have been or are about to be gathered.

Introduction To BIOSTATISTICS
Biostatistics: the mathematics of collection, organization and interpretation of numeric data having to do with living organisms.
Techniques to manage data:
  Descriptive
  Inferential

Facts About Data
Two types of data:
Qualitative: labels used to identify an item when it cannot be numerically identified.
  e.g.: marital status, car colour, occupation (attributes)
  n.b.: has absolutely nothing to do with the quality of the data
Quantitative: characteristics that can be expressed numerically. Any mathematical manipulation that is carried out on them will have meaning.
  e.g.: height, length, volume, number of DMFTs (variates)

Data Management
Grouping data to make it easier to understand.
Descriptive technique:
  Used to describe and summarize a set of numerical data
  Uses tabular and graphical methods
  Applies to generalizations made about the group studied

Descriptive Data Display Types
An Array: a group of scores arranged from lowest to highest in value.
e.g.: Histology test results, 24 students, scores out of 50
Raw data (each distinct score listed once): 19 28 30 44 41 25 33 39 49 42 38 26 35 40 31 36 46
Array: 19, 25, 26, 28, 30, 30, 31, 33, 33, 35, 36, 38, 38, 38, 39, 40, 41, 41, 41, 42, 44, 44, 46, 49

Descriptive Data Display Types
Arrays are bulky and hard to read. An alternative is:
Frequency Distribution: an organization of scores from lowest to highest which includes the number of times each score value occurs in the data set.

Descriptive Data Display Types
Frequency Distribution: 3 types
1. Ungrouped: each possible score value of the variable being measured is represented in the display, and the frequency of occurrence of each value is recorded.
Sample: see the ungrouped display that follows.

Descriptive Data Display Types
Frequency Distribution, Ungrouped (sample of the histology scores; F = frequency, counted from the array):

Score  F    Score  F    Score  F
50     0    40     1    30     2
49     1    39     1    29     0
48     0    38     3    28     1
47     0    37     0    27     0
46     1    36     1    26     1
45     0    35     1    25     1
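
As a rough illustration of how an ungrouped frequency distribution can be tallied, the sketch below counts the 24 histology scores from the array example. The variable names are illustrative only, and the printed layout is an assumption rather than the slide's exact format.

```python
from collections import Counter

# Array of the 24 histology scores (out of 50), as given in the array example.
scores = [19, 25, 26, 28, 30, 30, 31, 33, 33, 35, 36, 38,
          38, 38, 39, 40, 41, 41, 41, 42, 44, 44, 46, 49]

# Count how many times each score value occurs.
frequency = Counter(scores)

# Ungrouped display: every score value in the observed range, highest first,
# including values that never occur (frequency 0).
print("Score   F")
for score in range(max(scores), min(scores) - 1, -1):
    print(f"{score:>5} {frequency[score]:>3}")
```

A Counter returns 0 for a score that never occurs, which is what lets the display include every possible value in the range.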

Descriptive Data Display Types
2. Grouped Frequency Distribution: when a broad range of values on the measurement is possible (i.e. > 30), the range is collapsed by grouping scores together into smaller value ranges.

Scores   Grouped   Cumulative
16-20    1         1
21-25    1         2
26-30    4         6
31-35    3         9
36-40    6         15
41-45    7         22
46-50    2         24

Descriptive Data Display Types
3. Cumulative Frequency Distribution: used with score groupings, where the frequency of any one group includes all instances of scores in that group plus all the groups of lower score values.

Scores   Grouped   Cumulative
16-20    1         1
21-25    1         2
26-30    4         6
31-35    3         9
36-40    6         15
41-45    7         22
46-50    2         24
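
The cumulative column is a running total of the grouped column. A minimal sketch, assuming the grouped counts implied by the cumulative figures in the table (1, 1, 4, 3, 6, 7, 2); the variable names are illustrative only.

```python
from itertools import accumulate

# Class intervals and grouped frequencies, taken from the table.
intervals = ["16-20", "21-25", "26-30", "31-35", "36-40", "41-45", "46-50"]
grouped = [1, 1, 4, 3, 6, 7, 2]

# Running total: each entry adds its own group to all groups of lower scores.
cumulative = list(accumulate(grouped))

print("Scores  Grouped  Cumulative")
for interval, f, cf in zip(intervals, grouped, cumulative):
    print(f"{interval:<7} {f:>7} {cf:>11}")
```

The last cumulative value should always equal the total number of scores, which is a quick check on the arithmetic.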

Central Tendency
A term in statistics that describes where the data set is located.
Measures of Central Tendency: used to describe what is typical in the sample group based on the data gathered.
Three main indicators: Mean, Median, Mode

Central Tendency
Mean: the arithmetic average of the scores; the most common measure of central tendency.
Symbol: x̄ (x-bar)
All scores are added, then the sum is divided by the number of scores.
e.g.: Data set {3, 7, 9, 4, 9, 16}: sum = 48, 48 / 6 = 8
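
The mean calculation for the slide's data set can be checked with a few lines of Python. This is only a sketch; `statistics.mean` is the standard-library equivalent of the add-then-divide rule.

```python
from statistics import mean

data = [3, 7, 9, 4, 9, 16]

total = sum(data)            # 48
average = total / len(data)  # 48 / 6 = 8.0

# The standard library applies the same rule.
print(average, mean(data))
```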

Central Tendency
Median: the point that divides the distribution of scores into 2 equal parts (50 / 50).
With an odd number of scores, the median is the datum in the middle:
  e.g.: {3, 7, 2, 5, 9} rearranged to {2, 3, 5, 7, 9}, median = 5
With an even number of scores, the median is the average of the two middle values:
  e.g.: {4, 7, 1, 3, 8, 2} rearranged to {1, 2, 3, 4, 7, 8}, (3 + 4) / 2 = 3.5, median = 3.5
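
The odd and even cases can be written out directly. The function below is an illustrative sketch (the name `median_by_hand` is not from the slides); `statistics.median` in the standard library follows the same rule.

```python
def median_by_hand(scores):
    """Median of a list of scores, following the odd/even rule."""
    ordered = sorted(scores)          # rearrange from lowest to highest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: take the single middle value.
        return ordered[middle]
    # Even number of scores: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_by_hand([3, 7, 2, 5, 9]))     # odd set
print(median_by_hand([4, 7, 1, 3, 8, 2]))  # even set
```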

Central Tendency
Mode: the most frequently occurring score in a distribution.
  e.g.: {4, 3, 4, 9, 7, 2}, mode = 4
  e.g.: {3, 8, 4, 2, 4, 9, 7, 4, 9, 1, 9} is a bimodal data set, modes = 4 and 9 (each occurs 3 times)
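
A short sketch of finding the mode, including the bimodal case. The helper name `modes` is illustrative; it returns every score tied for the highest frequency.

```python
from collections import Counter


def modes(scores):
    """Return all scores that share the highest frequency, in ascending order."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([4, 3, 4, 9, 7, 2]))                  # one mode
print(modes([3, 8, 4, 2, 4, 9, 7, 4, 9, 1, 9]))   # two modes (bimodal)
```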

BIOSTATISTICS Continued
Previously discussed:
  Descriptive statistical techniques
  The first measures: central tendency
Information about central tendency is important. Equally important is information about the spread of data in a set.

Variability / Dispersion
Three terms associated with variability / dispersion:
  Range
  Variance
  Standard Deviation
(They describe the spread around the central tendency)

Variability / Dispersion
Range: the numerical difference between the highest and lowest scores.
Subtract the lowest score from the highest score.
e.g.: c = {19, 21, 73, 4, 102, 88}, Range = 102 - 4 = 98
n.b.: easy to find but unreliable, because it depends only on the two extreme scores
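
A small sketch of the range calculation on the slide's data set. The second half, which drops the two extreme scores, is an added illustration of why the range is considered unreliable; it is not part of the original example.

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


c = [19, 21, 73, 4, 102, 88]
print(score_range(c))  # 102 - 4 = 98

# Dropping only the two extreme scores changes the result sharply,
# because the range ignores everything between the extremes.
trimmed = sorted(c)[1:-1]      # [19, 21, 73, 88]
print(score_range(trimmed))    # 88 - 19 = 69
```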

Variability / Dispersion
Variance: the average squared deviation of scores around the mean; a measure of spread based on each score in the set.
Calculation:
  1. Obtain the mean of the distribution
  2. Subtract the mean from each score to obtain a deviation score
  3. Square each deviation score
  4. Add the squared deviation scores
  5. Divide the sum of the squared deviation scores by the number of subjects in the sample
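
The five calculation steps can be followed one at a time in code. This sketch reuses the data set {3, 7, 9, 4, 9, 16} from the mean example (an assumption; the variance slide gives no data of its own). Because the last step divides by the number of subjects, the result matches `statistics.pvariance`; note that `statistics.variance` divides by n - 1 instead and would give a different number.

```python
from statistics import pvariance


def variance_by_steps(scores):
    n = len(scores)
    mean = sum(scores) / n                    # 1. obtain the mean
    deviations = [x - mean for x in scores]   # 2. subtract the mean from each score
    squared = [d ** 2 for d in deviations]    # 3. square each deviation score
    total = sum(squared)                      # 4. add the squared deviation scores
    return total / n                          # 5. divide by the number of subjects


data = [3, 7, 9, 4, 9, 16]       # mean = 8
print(variance_by_steps(data))   # squared deviations sum to 108; 108 / 6 = 18.0
print(pvariance(data))           # same rule, from the standard library
```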

Variability / Dispersion
Standard Deviation: the positive square root of the variance of a set of scores; a number which tells how much the data is spread around its mean.
Interpretation of variance and standard deviation:
“The greater the dispersion around the mean of the distribution, the greater the standard deviation and variance”
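
A brief sketch of the relationship between variance and standard deviation. Both data sets below are illustrative: `data` is reused from the mean example, and `wider` is a made-up set with the same mean (8) but more spread.

```python
import math
from statistics import pstdev, pvariance

data = [3, 7, 9, 4, 9, 16]     # mean 8
wider = [0, 7, 9, 4, 9, 19]    # hypothetical set: same mean, scores further from it

variance = pvariance(data)     # 18
std_dev = math.sqrt(variance)  # positive square root of the variance

print(round(std_dev, 2), round(pstdev(data), 2))

# Greater dispersion around the mean gives a greater variance and standard deviation.
print(pvariance(wider) > pvariance(data), pstdev(wider) > pstdev(data))
```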

Normal Curve (Bell)
A population distribution which appears very commonly in life science
A bell-shaped curve that is symmetrical around the mean of the distribution
Called “normal” because its shape occurs so often
May vary from a narrow (pointy) to a wide (flat) distribution
The mean of the distribution is the focal point from which all assumptions may be made
Think in terms of percentages: the distribution is easier to interpret that way
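
The slide does not list the percentages it has in mind. As one possible illustration, the sketch below uses the standard library's `NormalDist` to compute the share of a normal population lying within 1, 2 and 3 standard deviations of the mean (roughly 68%, 95% and 99.7%). The helper name `share_within` is an assumption for this example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Because the curve is symmetrical around the mean, the same percentages hold whether the curve is narrow or wide; only the width of one standard deviation changes.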

Research Techniques
Inferential Statistics (Statistical Inference)
Techniques used to provide a basis for generalizing about the probable characteristics of a large group when only a portion of the group is studied
The mathematical result can be applied to the larger population

Definitions Relating To Research Techniques
Population: the entire group of people, items, materials, etc. with at least one basic defined characteristic in common
  Contains all subjects of interest
  A complete set of actual or potential observations
  e.g.: all Ontario dentists or all brands of toothpaste
Sample: a subset (representative portion) of the population
  Does not have exactly the same characteristics as the population, but can be made truly representative by using probability sampling methods and an adequate sample size (5 types of “sampling”)

Definitions Relating To Research Techniques
Parameters: numerical descriptive measures of a population, obtained by collecting a specific piece of information from each member of the population
  A number inferred from sample statistics
  e.g.: 2,000 women over age 50 with heart disease

Definitions Relating To Research Techniques
Statistic: a number describing a sample characteristic
  Results from manipulation of sample data according to certain specified procedures
  A characteristic of a sample chosen for study from the larger population
  e.g.: 210 women out of 500 with diabetes have heart problems

Sampling Procedures
5 types of samples:
  A random sample: by chance
  A stratified sample: categorized, then random
  A systematic sample: every nth item
  A judgment sample: prior knowledge
  A convenience sample: readily available
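
The three probability-based procedures (random, stratified, systematic) lend themselves to a short sketch. Everything named below is hypothetical: the population of 100 "dentists", the urban/rural split, and the 10% sampling fraction are assumptions made only for illustration. Judgment and convenience samples depend on the researcher's choices and are not sketched.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame of 100 population members.
population = [f"dentist_{i:03d}" for i in range(1, 101)]

# Random sample: members are chosen purely by chance.
random_sample = rng.sample(population, 10)

# Systematic sample: every nth item after a random starting point.
n = 10
start = rng.randrange(n)
systematic_sample = population[start::n]

# Stratified sample: categorize first, then sample randomly within each category.
strata = {
    "urban": population[:60],   # assumed split, for illustration only
    "rural": population[60:],
}
stratified_sample = [
    member
    for group in strata.values()
    for member in rng.sample(group, len(group) // 10)
]

print(len(random_sample), len(systematic_sample), len(stratified_sample))
```

Sampling 10% within each stratum keeps the urban/rural proportions of the sample the same as in the population, which is the point of categorizing first.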

Concept Of Significance
Probability: P (symbol)
When using inferential statistics, we often deal with statistical probability.
Probability: the expected relative frequency of a particular outcome by chance, or the likelihood of something occurring
e.g.: coin toss

Probability
Rules of probability:
  The probability (P) of any one event occurring is some value from 0 to 1 inclusive
  The probabilities of all possible outcomes in an experiment must sum to 1
* Numerical values can never be negative nor greater than 1
  P = 0: the event never happens
  P = 1: the event always happens

Probability
Calculating probability:
  P = number of possible successful outcomes / number of all possible outcomes
e.g.: Coin flip: 1 successful outcome (heads) / 2 possible outcomes, P = 0.5 or 50%
e.g.: Throw of a die: 1 successful outcome / 6 possible outcomes, P ≈ 0.17 or about 16.7%
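
The ratio can be sketched with exact fractions, which avoids rounding until the final display. The function name `probability` and its input checks are assumptions made for this example.

```python
from fractions import Fraction


def probability(successful, possible):
    """Successful outcomes divided by all possible outcomes."""
    if possible <= 0 or not 0 <= successful <= possible:
        raise ValueError("need 0 <= successful <= possible and possible > 0")
    return Fraction(successful, possible)


coin = probability(1, 2)   # heads on one coin flip
die = probability(1, 6)    # one chosen face on one throw of a die

print(float(coin))             # 0.5
print(round(float(die), 3))    # 0.167

# The probabilities of all six faces together sum to exactly 1.
print(sum(probability(1, 6) for _ in range(6)))
```

The input check enforces the rule that a probability can never be negative nor greater than 1.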

Hypothesis Testing
The first step in determining statistical significance is to establish a hypothesis.
Purpose: to answer questions about differences or to test the credibility of a statement.
e.g.: Does brand X toothpaste really whiten teeth more than brand Y?

Hypothesis Testing
Null hypothesis (Ho) = there is no statistically significant difference between brand X and brand Y
Positive (alternative) hypothesis = brand X does whiten more
* Ho is most often used as the hypothesis
* Ho is assumed to be true
Therefore the purpose of most research is to examine the truth of a theory or the effectiveness of a procedure, and to make it seem more or less likely.

Hypothesis Characteristics
A hypothesis must have these characteristics in order to be researchable:
Feasible
  Adequate number of subjects
  Adequate technical expertise
  Affordable in time and money
  Manageable in scope
Interesting to the investigator
Novel
  Confirms or refutes previous findings
  Extends previous findings
  Provides new findings

Hypothesis Characteristics (continued)
Ethical
Relevant
  To scientific knowledge
  To clinical and health policy
  To future research direction

Significance Level
A number (α = alpha) that acts as a cut-off point: when P falls below it, we agree that a difference exists and Ho is rejected.
Alpha is almost always either 0.01, 0.05 or 0.10.
Represents the amount of risk we are willing to take of being wrong in our conclusion:
  P < 0.10 = 10% chance
  P < 0.05 = 5% chance
  P < 0.01 = 1% chance (cautious)
The critical value (cut-off point) is set before conducting the study (usually P < 0.05)
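
The decision rule amounts to a single comparison. In the sketch below the function name `reject_null` and the P value of 0.03 are hypothetical, chosen only to show how the same result is judged at the three common alpha levels.

```python
def reject_null(p_value, alpha=0.05):
    """Return True when P falls below the cut-off alpha, i.e. Ho is rejected."""
    if not 0 <= p_value <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return p_value < alpha


p_value = 0.03  # hypothetical result from a study

for alpha in (0.10, 0.05, 0.01):
    decision = "reject Ho" if reject_null(p_value, alpha) else "do not reject Ho"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The same P value leads to different conclusions at different cut-offs, which is why alpha is fixed before the study is conducted.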

Degrees Of Freedom (d.f.)
Most tests for statistical significance require application of the concept of d.f.
d.f. refers to the number of observed values which are free to vary after we have placed certain restrictions on the data collected
* d.f. usually equals the sample size minus 1
e.g.: 8, 2, 15, 10, 15, 7, 3, 12, 15, 13 (sum = 100): d.f. = number of values (10) minus 1 = 9
Takes chance into consideration
A penalty for uncertainty: the larger the sample, the smaller the penalty
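
One way to see why the slide's ten values have 9 degrees of freedom: once the total (100) is fixed as the restriction, nine values can vary freely but the tenth is forced. A minimal sketch, with illustrative variable names:

```python
values = [8, 2, 15, 10, 15, 7, 3, 12, 15, 13]
total = sum(values)  # 100, the restriction placed on the data

# Nine values are free to vary; given the total, the last one is determined.
free_values = values[:-1]
forced_last = total - sum(free_values)

degrees_of_freedom = len(values) - 1

print(forced_last)         # must equal the tenth value, 13
print(degrees_of_freedom)  # 10 - 1 = 9
```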