Ontology Representation of Biostatistics Terms


OBI Ann Arbor Workshop 2012: Ontology Representation of Biostatistics Terms
Yongqun "Oliver" He
Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics
University of Michigan Medical School, Ann Arbor, MI 48109
http://obi-ontology.org/page/Workshop_OBI_Vancouver_2010_Mar#Use_Case_.2B_Issues_Proposals

Advantages of Ontology-based Statistical Analyses
Allow data consistency checking, e.g., RB51 is a Brucella vaccine; BCG is a TB vaccine but not a Brucella vaccine
Data sharing in the Semantic Web
Advanced data analysis in the Semantic Web
Automated reasoning
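The consistency check mentioned on this slide can be illustrated with a minimal sketch. It assumes a plain Python dictionary standing in for the ontology; the names `VACCINE_CLASS` and `is_consistent` are hypothetical and are not part of OBI or any real ontology API. Only the two facts stated on the slide are encoded.

```python
# Hypothetical stand-in for an ontology: each vaccine maps to its class.
# The two facts below are the ones stated on the slide.
VACCINE_CLASS = {
    "RB51": "Brucella vaccine",
    "BCG": "TB vaccine",
}


def is_consistent(vaccine: str, claimed_class: str) -> bool:
    """Return True if the claimed class matches the recorded class."""
    return VACCINE_CLASS.get(vaccine) == claimed_class


# Two data records: the second one wrongly labels BCG as a Brucella vaccine.
records = [("RB51", "Brucella vaccine"), ("BCG", "Brucella vaccine")]
flags = [is_consistent(vaccine, cls) for vaccine, cls in records]
```

A real ontology-based check would rely on class hierarchies and a reasoner rather than exact string matching; the dictionary only shows the idea of validating data against declared knowledge.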

Ontological Representation of Statistical Analyses
OntoDM: ontological representation of data mining tasks and complex data types. Aligned with OBI. http://kt.ijs.si/panovp/doku.php?id=ontodm
OBI statistical analysis: provides a general top-level structure; continuing efforts towards more detail and a deeper hierarchy

Build an OBI Biostatistics subset?
Approach:
Get biostatistics terms
Use OntoFox to get the statistics subset. The OntoFox input and output files are in the OBI SVN, in my presentation folder.
Get all branch terms: data transformation, data visualization, intervention design, data item, data transformation objective

OBI Biostatistics Subset Design Pattern
[Diagram. Nodes: study design; hypothesis textual entity; hypothesis driven investigation; data item (input data set); data transformation; data transformation objective; data visualization; data item (e.g., p-value). Relations shown: is_about, realizes some (concretizes some 'study design'), has_specified_input, has_specified_output.]
Many statistical tests are already represented in OBI; many are missing. Check the next slides …

Measures of Central Tendency
(Arithmetic) mean: the sum of the values divided by the number of values; that is, the arithmetic average of a set of values or of a distribution. Web: http://en.wikipedia.org/wiki/Mean. Done in OBI: ‘average value’ OBI_0000679.
Median: the numerical value separating the higher half of a sample, a population, or a probability distribution from the lower half. Done in OBI: ‘center value’ OBI_0000674.
Suggestion: add "mean" and "median" as alternative terms to the existing OBI terms.
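As a small illustration of the two definitions above, the sketch below computes both measures with Python's standard `statistics` module. The sample values are made up for the example.

```python
import statistics

# Made-up sample; the single large value (10) pulls the mean upward
# but leaves the median unchanged.
values = [2, 4, 4, 5, 10]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle value of the sorted sample
```

The mean exceeds the median here because the mean is sensitive to the outlying value while the median is not.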

Measures of Dispersion (1)
Dispersion refers to the degree to which data are scattered around a specific value (e.g., the mean).
Standard deviation: measures the variability of data around the mean. It indicates how much variability can be expected among individuals within a population. In samples that follow a normal (Gaussian) distribution, about 68 and 95 percent of values fall within one and two standard deviations of the mean, respectively. Not in OBI yet. OBI has ‘standard deviation calculation’ OBI_0200121, which has_specified_output only 'data item'.
Standard error of the mean: describes how much variability can be expected when measuring the mean from several different samples. In OBI: OBI_0000235. Status: "pending final vetting".
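The following sketch shows how these two quantities relate, using only the Python standard library. The sample values are invented, and the standard error is computed with the usual formula (sample standard deviation divided by the square root of the sample size), which is an assumption since the slide does not give a formula.

```python
import math
import statistics
from statistics import NormalDist

# Made-up sample of eight observations.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation (divides by n - 1).
sd = statistics.stdev(values)

# Standard error of the mean: SD scaled down by sqrt(n).
sem = sd / math.sqrt(len(values))

# Share of a normal distribution within one and two SDs of the mean.
standard_normal = NormalDist()
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
```

The last two values reproduce the "68 and 95 percent" rule of thumb quoted on the slide.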

Measures of Dispersion (2)
Range: the difference between the largest and smallest observations. Not in OBI.
Percentile: the percentage of a distribution that is below a specific value. For example, a child is in the 90th percentile for weight if only 10 percent of children of the same age weigh more than she does.
Interquartile range: the upper and lower values defining the central 50 percent of observations. The boundaries are the 25th and 75th percentiles. The interquartile range can be depicted in a box-and-whiskers plot. Not in OBI. OBI has ‘interquartile-range calculation’ OBI_0200122.
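A minimal sketch of these three measures, using Python's standard library, might look as follows. The data are made up, `percent_below` is a hypothetical helper, and quartile values depend on the interpolation convention: `statistics.quantiles` uses its default "exclusive" method here, and other tools may give slightly different boundaries.

```python
import statistics

# Made-up sample: the integers 1 through 11.
values = list(range(1, 12))

# Range: largest minus smallest observation.
value_range = max(values) - min(values)

# Quartile cut points (25th, 50th, 75th percentiles).
q1, q2, q3 = statistics.quantiles(values, n=4)

# Width of the interval that holds the central 50 percent of observations.
iqr = q3 - q1


def percent_below(data, x):
    """Percentage of observations strictly below x (hypothetical helper)."""
    return 100 * sum(v < x for v in data) / len(data)
```

The slide describes the interquartile range by its two boundaries (`q1` and `q3`); `iqr` is the distance between them, which is how the term is also commonly reported.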

Terms Describing Event Frequency
Incidence: the number of new events that occur in a specific time interval divided by the population at risk at the beginning of that interval. The result gives the likelihood of developing an event in that time interval.
Prevalence: the number of individuals with a given disease at a given point in time divided by the population at risk at that point in time.
Point prevalence: the proportion of individuals with a condition at a specified point in time.
Period prevalence: the proportion of individuals with a condition during a specified interval (e.g., a year).
Neither term is in OBI yet.
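The two ratios defined above can be written out directly. This is a sketch with hypothetical function names and invented counts, not data from any study.

```python
def incidence(new_events: int, at_risk_at_start: int) -> float:
    """New events in an interval / population at risk at the interval start."""
    return new_events / at_risk_at_start


def prevalence(existing_cases: int, population_at_risk: int) -> float:
    """Individuals with the disease at a time point / population at risk then."""
    return existing_cases / population_at_risk


# Invented example: 50 new cases during one year among 1,000 people at risk,
# and 120 existing cases among 1,000 people at a single point in time.
one_year_incidence = incidence(50, 1000)
point_prevalence = prevalence(120, 1000)
```

Incidence counts only new events over an interval, while prevalence counts everyone who has the condition at the time of measurement.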

Terms Describing Magnitude of an Effect
Used to define the relationship among variables of interest in a data set.
Relative risk (or risk ratio): the incidence in exposed individuals divided by the incidence in unexposed individuals. The relative risk can be calculated from studies in which the proportion of patients exposed and unexposed to a risk is known, such as a cohort study. Not in OBI yet.
Odds ratio: the odds that an individual with a specific condition has been exposed to a risk factor divided by the odds that a control has been exposed. The odds ratio is used in case-control studies. It provides a reasonable estimate of the relative risk for uncommon conditions.
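Both measures can be computed from a 2x2 table. The sketch below assumes the conventional cell labels (a: exposed with outcome, b: exposed without outcome, c: unexposed with outcome, d: unexposed without outcome); the function names and counts are invented for illustration.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Incidence in the exposed divided by incidence in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Exposure odds among cases (a/c) divided by exposure odds among
    controls (b/d), which simplifies to (a*d) / (b*c)."""
    return (a * d) / (b * c)


# Invented counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
rr = relative_risk(20, 80, 10, 90)
odds = odds_ratio(20, 80, 10, 90)

# With a much rarer outcome the odds ratio moves close to the relative risk.
rare_rr = relative_risk(2, 998, 1, 999)
rare_odds = odds_ratio(2, 998, 1, 999)
```

Comparing `rare_rr` and `rare_odds` illustrates the slide's point that the odds ratio approximates the relative risk when the condition is uncommon.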

Terms Describing Quality of Measurements
Reliability: the extent to which repeated measurements of a relatively stable phenomenon fall close to each other.
Validity: the extent to which an observation reflects the "truth" of the phenomenon being measured.
Neither term is in OBI yet.

Measures of Test Performance (1)
Sensitivity: the number of patients with a positive test who have a disease divided by all patients who have the disease. A test with high sensitivity will not miss many patients who have the disease (i.e., few false negative results).
Specificity: the number of patients who have a negative test and do not have the disease divided by the number of patients who do not have the disease. A test with high specificity will infrequently identify patients as having a disease when they do not (i.e., few false positive results).
Neither term is in OBI yet.
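As a sketch of the two definitions, the functions below take counts of true positives (tp), false negatives (fn), true negatives (tn), and false positives (fp). The function names and the counts are invented for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Diseased patients with a positive test / all diseased patients."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Non-diseased patients with a negative test / all non-diseased patients."""
    return tn / (tn + fp)


# Invented counts: 100 diseased patients (90 test positive, 10 negative)
# and 100 non-diseased patients (80 test negative, 20 positive).
sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=80, fp=20)
```

High sensitivity corresponds to few false negatives, and high specificity to few false positives.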

Measures of Test Performance (2)
Likelihood ratio: a measure of the odds of having a disease relative to the prior probability of the disease. The estimate is independent of the disease prevalence. A positive likelihood ratio is calculated by dividing sensitivity by 1 minus specificity (sensitivity/(1-specificity)). A negative likelihood ratio is calculated by dividing 1 minus sensitivity by specificity ((1-sensitivity)/specificity). E.g., positive and negative likelihood ratios of 9 and 0.25 mean that a positive result is seen 9 times as frequently, and a negative result 0.25 times as frequently, in those with a specific condition as in those without it. Not in OBI yet. OBI has the term ‘Likelihood-ratio test’ OBI_0000861.
Accuracy: the number of true positives and true negatives divided by the total number of observations. Not in OBI yet.
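The formulas on this slide translate directly into code. The sketch below uses hypothetical function names and invented counts; it is meant only to make the arithmetic concrete.

```python
def positive_likelihood_ratio(sens: float, spec: float) -> float:
    """sensitivity / (1 - specificity)"""
    return sens / (1 - spec)


def negative_likelihood_ratio(sens: float, spec: float) -> float:
    """(1 - sensitivity) / specificity"""
    return (1 - sens) / spec


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(true positives + true negatives) / total number of observations."""
    return (tp + tn) / (tp + tn + fp + fn)


# Invented test with sensitivity 0.9 and specificity 0.8, based on
# 90 true positives, 10 false negatives, 80 true negatives, 20 false positives.
lr_pos = positive_likelihood_ratio(0.9, 0.8)
lr_neg = negative_likelihood_ratio(0.9, 0.8)
acc = accuracy(tp=90, tn=80, fp=20, fn=10)
```

A positive likelihood ratio well above 1 and a negative likelihood ratio well below 1 indicate that the test result shifts the odds of disease noticeably in either direction.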

Used in Making Inferences about Data (1)
Errors: two potential errors are commonly recognized when testing a hypothesis.
Type I error (also known as alpha): the probability of incorrectly concluding that there is a statistically significant difference in a dataset. Alpha is the significance threshold against which a p-value is compared. Thus, a statistically significant difference reported as p<0.05 means that, if there were truly no difference, a difference at least this large would arise by chance less than 5 percent of the time. Not in OBI yet.
Type II error (also known as beta): the probability of incorrectly concluding that there was no statistically significant difference in a dataset when a true difference exists. This error often reflects insufficient power of the study.

Used in Making Inferences about Data (2)
Confidence interval: the boundaries of a confidence interval give values within which there is a high probability (95 percent by convention) that the true population value can be found. The calculation of a confidence interval considers the standard deviation of the data and the number of observations. Thus, a confidence interval narrows as the number of observations increases or as the variance (dispersion) of the data decreases. Not in OBI yet.
Power (calculated as 1 - beta): the ability of a study to detect a true difference. Negative findings may reflect that the study was underpowered to detect a difference. A "power calculation" is performed to ensure that there are enough observations to detect a desired degree of difference. The larger the difference, the fewer observations are required.
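Both ideas can be sketched with normal approximations, which are an assumption here since the slide gives no formulas. `mean_ci` builds an interval for a mean as mean ± z × SD / √n, and `n_per_group` uses the textbook approximation for comparing two group means, n = 2 × ((z_alpha + z_beta) × SD / difference)². The function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci(mean: float, sd: float, n: int, level: float = 0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def n_per_group(delta: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate observations per group needed to detect a difference
    `delta` between two means (two-sided test, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# More observations give a narrower interval for the same mean and SD.
small_sample_ci = mean_ci(mean=10.0, sd=2.0, n=25)
large_sample_ci = mean_ci(mean=10.0, sd=2.0, n=100)

# A larger difference to detect requires fewer observations per group.
n_small_difference = n_per_group(delta=0.5, sd=1.0)
n_large_difference = n_per_group(delta=1.0, sd=1.0)
```

For small samples a t-distribution would normally replace the normal quantile; the sketch keeps the normal approximation to stay within the standard library.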

Study Design
Cohort study: starts with an exposure and moves forward to the outcome of interest, even if the data are collected retrospectively. For example, a group of patients who have variable exposure to a risk factor of interest can be followed over time for an outcome.
Case-control study: starts with the outcome of interest and works backward to the exposure. For instance, patients with a disease are identified and compared with controls for exposure to a risk factor.
Randomized controlled trial (RCT): an experimental design in which patients are randomly assigned to two or more interventions.
These terms are not in OBI yet.

More Statistics Terms to be Included in OBI
Philippe's suggestions:
Terms dealing with probability distributions: statistical tests make assumptions about the distribution, e.g., normal, Poisson, binomial, negative binomial, ....
Variance: OBI has "variance calculation" but no 'variance data item'.
Al Hero's suggestions:
Multivariate analysis (Hotelling test, canonical correlation analysis, multivariate non-parametrics)
Computational statistics (Fisher scoring, EM algorithm, iteratively reweighted least squares)
Variable selection (lasso, group lasso, elastic net, fused lasso)
Topic models to identify more concepts in the hierarchy

OBI Representation of ANOVA
Reference: OBI SIG 2010 paper. He Y, Xiang Z, Todd T, Courtot M, Brinkman R, Zheng J, Stoeckert CJ, Malone J, Rocca-Serra P, Sansone S, Fostel J, Soldatova LN, Peters B, Ruttenberg A. Ontology representation and ANOVA analysis of vaccine protection investigation. Proceedings of Bio-Ontologies 2010: Semantic Applications in Life Sciences, ISMB, July 9-10, 2010, Boston, MA, USA. Full-length paper.
Links:
https://obi.svn.sourceforge.net/svnroot/obi/trunk/docs/papers/SIG_2010/ANOVA_Vaccine_usecase_camera.pdf
https://obi.svn.sourceforge.net/svnroot/obi/trunk/docs/presentations/ANOVA_Vaccine_He_SIG2010.ppt

Ontology Design Pattern of ANOVA for a Literature Meta-analysis

Transfer Instance Data to OWL
Instance data in correct VO ontology hierarchy
Only related ontology terms are included
Ontobat: http://ontobat.hegroup.org/

Challenges
How to represent mathematical formulas using ontology?
How to represent a statistical null hypothesis?
How to run ontology-supported statistical analysis within the context of the Semantic Web?

More References
http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
http://dorakmt.tripod.com/mtd/glosstat.html
http://www.uth.tmc.edu/uth_orgs/educ_dev/oser/LGLOS1_0.HTM

Acknowledgements
OBI, IAO
Philippe Rocca-Serra, Alfred Hero, Jessica Turner
NIH-NIAID Grant: R01AI081062