
1 Representation of biomedical sublanguages using symbolic notation
John MacMullen SILS Bioinformatics Journal Club Fall 2003

2 Terminology article assumptions
“Knowledge encoded in textual documents is organized around sets of domain-specific terms…” [938]
“Terms represent the most important concepts in a domain and characterize documents semantically.” [939]
“[T]he basic problem is to recognize domain-specific concepts and to extract instances of specific relationships among them.” [938]
Terms are ambiguous and have variation; they are hardly ever mono-referential.
The lack of naming conventions (controlled vocabularies), the existence of acronyms, and the large existing heterogeneous literatures increase complexity.
[From Nenadic, G., Spasic, I., & Ananiadou, S. (2003). Terminology-driven mining of biomedical literature. Bioinformatics, 19(8).]

3 Harris’ Assumptions
“[T]here is a particular structure to science information in general, and to the information of each subscience in particular”, [because] “for each subscience there are particular subsets of nouns that occur with particular subsets of verbs or other words” [215].
“[I]t is not intrinsic properties of sounds and meanings that determine the possible word-sequences of sentences.” […] “For each word, we find roughly stable inequalities of probability among the words in its required (positive probability) set” [216].
“What is common to the texts of a given subject matter is that first-level words of a given subset require zero-level words of only a particular subset” [217].
“We thus obtain for the science several statement-types […]” [217].
“What we have here is thus an information-theoretic approach to the structure of information, as against solely the amount of information” [217, emphasis added].

4 Linguistic probabilities
“For each word, we find roughly stable inequalities of probability among the words in its required (positive probability) set” [216].
“The meaning of a word is indicated, and in part created, by the meanings of the words in respect to which it has higher than average probability” [217].
“Words with highest probability in respect to another word, or which otherwise can be shown structurally to have highest expectancy, add little or no information” [217].
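One rough way to make the "inequalities of probability" idea concrete is to count how often words share a sentence. The sketch below is an illustration only: the toy sentences and the function name `cooccurrence_probabilities` are assumptions made for this example, and plain sentence-level co-occurrence is a simplification, not the operator-argument analysis Harris describes.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; these sentences are not taken from Harris's data.
sentences = [
    "antigen injected into ear",
    "antigen injected into foot",
    "antibody found in lymph",
    "antibody found in serum",
    "antigen found in lymph",
]

def cooccurrence_probabilities(sentences):
    """Estimate P(other | word): the share of sentences containing
    `word` that also contain `other`."""
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence.split())
        for w in words:
            word_counts[w] += 1
            for other in words - {w}:
                pair_counts[w][other] += 1
    return {
        w: {o: c / word_counts[w] for o, c in others.items()}
        for w, others in pair_counts.items()
    }

probs = cooccurrence_probabilities(sentences)

# Words that never share a sentence with "antibody" are simply absent
# from its distribution, i.e. they fall outside its positive-probability set.
print(sorted(probs["antibody"]))
print(probs["antigen"].get("injected"), probs["antigen"].get("found"))
```

In this toy data "injected" occurs only with "antigen", so "antigen" is fully expected in that context and, in the sense of the third quote, adds little information there.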

5 Representing Meaning with Notation
Movement towards structured rather than natural language
Representing sentences as propositions whose truth can be tested
Example [from ]: If ‘G’ = “antigen”, and ‘J’ = “injected into”, and ‘B’ = “ear”, then ‘GJB’ = “antigen injected into ear”
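The G/J/B example can be sketched as a small lookup table. This is a minimal illustration using only the three symbols given on the slide; the helper names `decode` and `encode` and the greedy longest-match strategy are assumptions made for this sketch, not part of Harris's method.

```python
# Symbol table copied from the slide's example; a real sublanguage
# grammar would need many more symbols.
SYMBOLS = {"G": "antigen", "J": "injected into", "B": "ear"}

def decode(formula, symbols=SYMBOLS):
    """Expand a formula such as 'GJB' into its word-sequence reading."""
    return " ".join(symbols[ch] for ch in formula)

def encode(phrase, symbols=SYMBOLS):
    """Map a phrase to symbols by greedy longest match (hypothetical helper)."""
    by_phrase = {words: sym for sym, words in symbols.items()}
    tokens = phrase.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest remaining chunk first, then shrink it.
        for j in range(len(tokens), i, -1):
            chunk = " ".join(tokens[i:j])
            if chunk in by_phrase:
                out.append(by_phrase[chunk])
                i = j
                break
        else:
            raise ValueError(f"no symbol for {tokens[i]!r}")
    return "".join(out)

print(decode("GJB"))                        # antigen injected into ear
print(encode("antigen injected into ear"))  # GJB
```

A phrase containing a word outside the table raises `ValueError`, which makes visible how much of a text the notation actually covers.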

6 Symbolic representation

7 Applications
Investigate “the possibilities of obtaining standard notations for science languages, not by fiat but by boiling down from actual use…” [220].
“[R]elate the information structure of a science to anything else that characterizes the field, in order to reach if possible a ‘structure’ of the science” [220].
“[S]ee how tabular or other two-dimensional displays can represent the data (or the Result statements) of articles, for human inspection or for computer processing” [220].

8 Other Propositions
“[W]hen, in a given science, articles written in different languages are analyzed […], we obtain the same sentence-types and structures, with only small differences due to the languages.” “The word class and subclass symbols, and the sentence-types, are therefore not just a sublanguage of a particular language, but an independent symbolic linguistic system” [219].
Difference between ‘equivalent’ and ‘equal’?
Example:
La casa di Gianni è bianca. [original]
The house of Gianni is white. [literal or equal]
John’s house is white. [equivalent]

9 Questions
Assume Harris’ notation method is valid and works well.
How might it be implemented in practice? (This is both an algorithm question and a policy question.)
Who would apply it?
What would some of the barriers be?
Do Harris’ arguments hold in an interdisciplinary environment?

10 References
Harris, Zellig S. (2002). The structure of science information. Journal of Biomedical Informatics, 35.
Linguistic String NYU:
MedLEE project (Medical Language Extraction and Encoding System):
Zellig Harris homepage:

