Representation of biomedical sublanguages using symbolic notation

John MacMullen
SILS Bioinformatics Journal Club, Fall 2003

Terminology article assumptions
“Knowledge encoded in textual documents is organized around sets of domain-specific terms…” [938]
“Terms represent the most important concepts in a domain and characterize documents semantically.” [939]
“[T]he basic problem is to recognize domain-specific concepts and to extract instances of specific relationships among them.” [938]
Terms are ambiguous and vary in form; they are hardly ever mono-referential.
The lack of naming conventions (controlled vocabularies), the existence of acronyms, and the large, heterogeneous existing literature all increase complexity.
[From Nenadic, G., Spasic, I., & Ananiadou, S. (2003). Terminology-driven mining of biomedical literature. Bioinformatics, 19(8), 938-943.]
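
The recognition problem described above (variant spellings, acronyms that point to several concepts) can be illustrated with a minimal dictionary lookup. This is a sketch under stated assumptions, not the method of Nenadic et al.: the dictionary entries, the concept labels, and the helper names `normalize` and `lookup` are all hypothetical, and real term recognition has to cope with far more variation than case and spacing.

```python
import re

# Hypothetical term dictionary mapping normalized surface forms to concept
# labels. Entries are illustrative only, not taken from Nenadic et al. (2003).
TERM_DICTIONARY = {
    "nuclear factor kappa b": {"NF-kB"},
    "nf-kappa b": {"NF-kB"},
    "nf-kb": {"NF-kB"},
    # An acronym that is not mono-referential: one form, several concepts.
    "ra": {"rheumatoid arthritis", "retinoic acid"},
}


def normalize(surface):
    """Strip, collapse runs of whitespace, and lowercase a surface form."""
    return re.sub(r"\s+", " ", surface.strip()).lower()


def lookup(surface):
    """Return a copy of every concept a surface form may denote (empty if unknown)."""
    return set(TERM_DICTIONARY.get(normalize(surface), set()))


if __name__ == "__main__":
    print(lookup("NF-kappa  B"))  # variant spelling resolves to one concept
    print(lookup("RA"))           # ambiguous acronym resolves to two concepts
```

Returning a set rather than a single label keeps the ambiguity visible to later processing instead of silently picking one reading.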

Harris’ Assumptions
“[T]here is a particular structure to science information in general, and to the information of each subscience in particular”, [because] “for each subscience there are particular subsets of nouns that occur with particular subsets of verbs or other words” [215].
“[I]t is not intrinsic properties of sounds and meanings that determine the possible word-sequences of sentences.” […] “For each word, we find roughly stable inequalities of probability among the words in its required (positive probability) set” [216].
“What is common to the texts of a given subject matter is that first-level words of a given subset require zero-level words of only a particular subset” [217].
“We thus obtain for the science several statement-types […]” [217].
“What we have here is thus an information-theoretic approach to the structure of information, as against solely the amount of information” [217, emphasis added].
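
Harris’ claim that particular subsets of nouns occur with particular subsets of verbs suggests a simple distributional test: group nouns by the verbs they co-occur with. The sketch below assumes hand-made (noun, verb) pairs and a hypothetical function `noun_classes`; grouping by exact verb-set equality is a deliberate simplification of Harris’ “roughly stable” probabilities, not his procedure.

```python
from collections import defaultdict

# Hypothetical (noun, verb) pairs, as if extracted from sentences of an
# immunology sublanguage. Invented for illustration; not data from Harris.
PAIRS = [
    ("antigen", "injected"),
    ("antigen", "stimulates"),
    ("toxoid", "injected"),
    ("toxoid", "stimulates"),
    ("lymphocyte", "produces"),
    ("lymphocyte", "proliferates"),
    ("plasma cell", "produces"),
    ("plasma cell", "proliferates"),
]


def noun_classes(pairs):
    """Group nouns that occur with exactly the same set of verbs.

    Exact set equality is a simplification; Harris speaks of roughly
    stable probabilities, which a real analysis would need to model.
    """
    verbs_of = defaultdict(set)
    for noun, verb in pairs:
        verbs_of[noun].add(verb)
    classes = defaultdict(list)
    for noun, verbs in verbs_of.items():
        classes[frozenset(verbs)].append(noun)
    return {verbs: sorted(nouns) for verbs, nouns in classes.items()}


if __name__ == "__main__":
    for verbs, nouns in noun_classes(PAIRS).items():
        print(sorted(verbs), "->", nouns)
```

On this toy input the script prints two classes, `['injected', 'stimulates'] -> ['antigen', 'toxoid']` and `['produces', 'proliferates'] -> ['lymphocyte', 'plasma cell']`, which is the kind of noun subclass the quoted passage describes.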

Linguistic probabilities
“For each word, we find roughly stable inequalities of probability among the words in its required (positive probability) set” [216].
“The meaning of a word is indicated, and in part created, by the meanings of the words in respect to which it has higher than average probability” [217].
“Words with highest probability in respect to another word, or which otherwise can be shown structurally to have highest expectancy, add little or no information” [217].
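
One common way to make “highest probability adds little or no information” concrete is Shannon self-information, −log2 P(word | context). Using that measure here is an assumption; the slide quotes Harris but gives no formula. The counts and the names `OBJECT_COUNTS`, `conditional_probability`, and `information_bits` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical counts of the words that fill the object slot of
# "injected into" in a toy corpus; invented, not taken from Harris (2002).
OBJECT_COUNTS = Counter({"ear": 6, "footpad": 1, "vein": 1})


def conditional_probability(word, counts):
    """Relative frequency of `word` among all words observed in the slot."""
    return counts[word] / sum(counts.values())


def information_bits(word, counts):
    """Self-information -log2 P(word | slot), in bits.

    No smoothing is applied, so an unseen word (probability 0) makes
    math.log2 raise ValueError.
    """
    return -math.log2(conditional_probability(word, counts))


if __name__ == "__main__":
    for word in OBJECT_COUNTS:
        p = conditional_probability(word, OBJECT_COUNTS)
        bits = information_bits(word, OBJECT_COUNTS)
        print(f"{word}: p={p:.3f}, information={bits:.3f} bits")
```

With these toy counts, “ear” has probability 0.75 and carries about 0.415 bits, while “footpad” and “vein” each have probability 0.125 and carry 3 bits: the expected word adds the least information.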

Representing Meaning with Notation
A move toward structured, rather than natural, language
Representing sentences as propositions whose truth can be tested
Example [from 217-218]: if ‘G’ = “antigen”, ‘J’ = “injected into”, and ‘B’ = “ear”, then ‘GJB’ = “antigen injected into ear”.
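
The example above can be mirrored by a tiny encoder and decoder. This is a minimal sketch assuming a hand-built lexicon with one word per symbol and pre-segmented input; `LEXICON`, `encode`, and `decode` are hypothetical names. Decoding only round-trips because each symbol glosses a single word here; with true class symbols, the formula deliberately discards which member of the class was used.

```python
# Hypothetical lexicon built from the slide's example; in Harris' notation
# each symbol stands for a word class, not a single word.
LEXICON = {"antigen": "G", "injected into": "J", "ear": "B"}


def encode(segments, lexicon=LEXICON):
    """Map pre-segmented phrases to their symbols and join them into a formula."""
    return "".join(lexicon[segment] for segment in segments)


def decode(formula, lexicon=LEXICON):
    """Expand a formula back into phrases (assumes one-character symbols)."""
    reverse = {symbol: phrase for phrase, symbol in lexicon.items()}
    return " ".join(reverse[symbol] for symbol in formula)


if __name__ == "__main__":
    formula = encode(["antigen", "injected into", "ear"])
    print(formula)          # GJB
    print(decode(formula))  # antigen injected into ear
```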

Symbolic representation

Applications
Investigate “the possibilities of obtaining standard notations for science languages, not by fiat but by boiling down from actual use…” [220].
“[R]elate the information structure of a science to anything else that characterizes the field, in order to reach if possible a ‘structure’ of the science” [220].
“[S]ee how tabular or other two-dimensional displays can represent the data (or the Result statements) of articles, for human inspection or for computer processing” [220].
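
A tabular display of Result statements can be sketched by giving each class symbol its own column. The statements below are invented, and treating G, J, and B as class symbols with several members (including the made-up members “toxoid” and “footpad”) is an assumption, since the earlier slide glosses each symbol with a single word. The names `STATEMENTS`, `tabulate`, and `render` are hypothetical.

```python
# Hypothetical Result statements, each already reduced to (class symbol, word)
# slots. "toxoid" and "footpad" are invented class members for illustration.
STATEMENTS = [
    [("G", "antigen"), ("J", "injected into"), ("B", "ear")],
    [("G", "toxoid"), ("J", "injected into"), ("B", "footpad")],
    [("G", "antigen"), ("J", "injected into")],  # statement with no location
]


def tabulate(statements):
    """Return column symbols (first-seen order) and one row per statement."""
    columns = []
    for statement in statements:
        for symbol, _ in statement:
            if symbol not in columns:
                columns.append(symbol)
    rows = [[dict(statement).get(symbol, "") for symbol in columns]
            for statement in statements]
    return columns, rows


def render(columns, rows):
    """Format the table as aligned plain text for human inspection."""
    widths = [max(len(cell) for cell in [col] + [row[i] for row in rows])
              for i, col in enumerate(columns)]
    lines = [" | ".join(col.ljust(w) for col, w in zip(columns, widths))]
    for row in rows:
        lines.append(" | ".join(cell.ljust(w) for cell, w in zip(row, widths)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(*tabulate(STATEMENTS)))
```

The same `columns, rows` pair serves both uses named in the quotation: `render` produces a display for human inspection, while the lists themselves are ready for computer processing.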

Other Propositions
“[W]hen, in a given science, articles written in different languages are analyzed […], we obtain the same sentence-types and structures, with only small differences due to the languages.”
“The word class and subclass symbols, and the sentence-types, are therefore not just a sublanguage of a particular language, but an independent symbolic linguistic system” [219].
What is the difference between ‘equivalent’ and ‘equal’? Example:
La casa di Gianni è bianca. [original]
The house of Gianni is white. [literal, or equal]
John’s house is white. [equivalent]
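
Harris’ point that analyses in different languages yield the same sentence-types can be illustrated by two lexicons that share one set of class symbols. This is a sketch under stated assumptions: the Italian entries are given as uninflected dictionary forms (real Italian contracts “in” with the article, as in “nell’orecchio”), segmentation is done by hand, and `LEXICONS` and `sentence_type` are hypothetical names.

```python
# Hypothetical per-language lexicons that map words to shared class symbols.
# Italian segments are dictionary forms; contraction ("nell'") is ignored.
LEXICONS = {
    "en": {"antigen": "G", "injected into": "J", "ear": "B"},
    "it": {"antigene": "G", "iniettato in": "J", "orecchio": "B"},
}


def sentence_type(segments, lang):
    """Return the symbol formula for pre-segmented phrases in language `lang`."""
    lexicon = LEXICONS[lang]
    return "".join(lexicon[segment] for segment in segments)


if __name__ == "__main__":
    en = sentence_type(["antigen", "injected into", "ear"], "en")
    it = sentence_type(["antigene", "iniettato in", "orecchio"], "it")
    print(en, it, en == it)  # GJB GJB True
```

Both sentences reduce to the same formula, so at the level of the notation they are equivalent even though no word-for-word (equal) correspondence is required.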

Questions
Assume Harris’ notation method is valid and works well. How might it be implemented in practice? (This is both an algorithm question and a policy question.)
Who would apply it?
What would some of the barriers be?
Do Harris’ arguments hold in an interdisciplinary environment?

References
Harris, Z. S. (2002). The structure of science information. Journal of Biomedical Informatics, 35, 215-221.
Linguistic String Project @ NYU: http://www.cs.nyu.edu/cs/projects/lsp/
MedLEE project (Medical Language Extraction and Encoding System): http://cat.cpmc.columbia.edu/medleexml/
Zellig Harris homepage: http://www.dmi.columbia.edu/zellig/