Neurocognitive approach to clustering of PubMed query results P. Matykiewicz, Włodzisław Duch, Dept. of Informatics, Nicolaus Copernicus Uni, Toruń, Poland.

Presentation transcript:

Neurocognitive approach to clustering of PubMed query results P. Matykiewicz, Włodzisław Duch, Dept. of Informatics, Nicolaus Copernicus Uni, Toruń, Poland; P.M. Zender, K.A. Crutcher, J.P. Pestian, Cincinnati Children's Hospital Medical Center, Ohio, USA. Google: W. Duch. ICONIP 2008, Auckland, NZ

Plan How can we help medical professionals to find relevant information? Neurocognitive informatics. Semantic memory and other types of memory. Creating semantic memory. UMLS as a semantic memory. Spreading activation. Literature based discovery. Neurocognitive approach to literature based discovery. Plans for the future.

Neurocognitive informatics Computational Intelligence. An International Journal (1984) + 10 other journals with “Computational Intelligence” in the name. D. Poole, A. Mackworth, R. Goebel, Computational Intelligence - A Logical Approach (OUP 1998): a GOFAI book on logic and reasoning. CI: lower cognitive functions, perception, signal analysis, action control, sensorimotor behavior. AI: higher cognitive functions, thinking, reasoning, planning etc. Neurocognitive informatics: brain processes can be a great inspiration for AI algorithms, if we could only understand them... What are the neurons doing? Perceptrons, basic units in multilayer perceptron networks, use threshold logic – NN inspirations. What are the networks doing? Specific transformations, memory, estimation of similarity. How do higher cognitive functions map to brain activity? Neurocognitive informatics = abstractions of this process.

Types of memory Neurocognitive approach to NLP: at least 4 types of memory. Long term (LTM): recognition, semantic, episodic + working memory. Input (text, speech) is pre-processed using a recognition memory model to correct spelling errors, expand acronyms etc. For dialogue/text understanding, episodic memory models are needed. Working memory: an active subset of semantic/episodic memory. All 3 LTM types are mutually coupled, providing context for recognition. Semantic memory is a permanent storage of conceptual data. “Permanent”: data is collected throughout the whole lifetime of the system, and old information is overridden/corrected by newer input. “Conceptual”: contains semantic relations between words and uses them to create concept definitions.

Semantic Memory Models Endel Tulving, “Episodic and Semantic Memory”: semantic memory refers to the memory of meanings and understandings. It stores concept-based, generic, context-free knowledge. Permanent container for general knowledge (facts, ideas, words etc). Semantic network: Collins & Loftus, 1975. Hierarchical model: Collins & Quillian, 1969.

Semantic memory Hierarchical model of semantic memory (Collins and Quillian, 1969), followed by most ontologies. Connectionist spreading activation model (Collins and Loftus, 1975), with mostly lateral connections. Our implementation is based on the connectionist model and uses a relational database and an object access layer API. The database stores three types of data: concepts, or objects being described; keywords (features of concepts extracted from data sources); relations between them. The IS-A relation is used to build the ontology tree, which serves for activation spreading, i.e. inheritance of features down the ontology tree. Types of relations (like “x IS y”, or “x CAN DO y” etc.) may be defined when input data is read from dictionaries and ontologies.
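The storage scheme on this slide (concepts, keywords/features, typed relations, and feature inheritance along IS-A links) can be illustrated with a small in-memory sketch. This is not the authors' relational database or API; the class name `SemanticMemory`, its methods and the toy facts are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


class SemanticMemory:
    """Toy in-memory stand-in for a concept/keyword/relation store (hypothetical)."""

    def __init__(self):
        # (concept, relation type) -> set of related keywords or concepts
        self.relations = defaultdict(set)

    def add(self, concept, relation, target):
        """Store one typed relation, e.g. ("canary", "IS", "yellow")."""
        self.relations[(concept, relation)].add(target)

    def parents(self, concept):
        """Direct parents of a concept in the IS-A ontology tree."""
        return set(self.relations.get((concept, "IS-A"), ()))

    def features(self, concept, _seen=None):
        """Own features plus those inherited from all IS-A ancestors."""
        seen = set() if _seen is None else _seen
        if concept in seen:  # guard against cycles in the input data
            return set()
        seen.add(concept)
        own = {
            (rel, tgt)
            for (c, rel), targets in self.relations.items()
            if c == concept and rel != "IS-A"
            for tgt in targets
        }
        for parent in self.parents(concept):
            own |= self.features(parent, seen)
        return own


# Toy data in the spirit of the Collins & Quillian hierarchy (illustrative only).
sm = SemanticMemory()
sm.add("animal", "CAN DO", "move")
sm.add("bird", "IS-A", "animal")
sm.add("bird", "HAS", "wings")
sm.add("canary", "IS-A", "bird")
sm.add("canary", "IS", "yellow")

canary_features = sm.features("canary")
```

Here `canary_features` contains the canary's own feature together with those of `bird` and `animal`, which is what inheritance down the ontology tree amounts to in this simplified setting.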

SM & neural distances Activations of groups of neurons presented in activation space define similarity relations in a geometrical model (McClelland, McNaughton, O’Reilly, Why there are complementary learning systems, 1994).

Similarity between concepts Left: MDS on vectors from neural network. Right: MDS on data from psychological experiments with perceived similarity between animals. Vector and probabilistic models are approximations to this process. $S_{ij} \sim \langle \Phi(w_i, \mathrm{Cont}) \mid \Phi(w_j, \mathrm{Cont}) \rangle$
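One way to read the similarity relation on this slide is as an inner product between the activation vectors that two words evoke in a given context. The sketch below uses a normalized inner product (cosine similarity), which is an assumption rather than something the slide specifies; the vectors are invented toy values, not data from the experiments mentioned above.

```python
import math


def similarity(a, b):
    """Normalized inner product (cosine) of two activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical activation patterns over four units (illustrative values only).
dog = [1.0, 1.0, 0.0, 0.8]
wolf = [1.0, 0.9, 0.0, 0.2]
canary = [0.0, 0.1, 1.0, 0.3]

s_dog_wolf = similarity(dog, wolf)
s_dog_canary = similarity(dog, canary)
```

With these toy vectors `s_dog_wolf` comes out much larger than `s_dog_canary`, mirroring the kind of neighborhood structure that MDS plots make visible.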

Creating SM The API serves as a data access layer providing logical operations between raw data and higher application layers. Data stored in the database is mapped into application objects and the API allows for retrieving specific concepts/keywords. Two major types of data sources for semantic memory: 1. machine-readable structured dictionaries directly convertible into semantic memory data structures; 2. blocks of text, definitions of concepts from dictionaries/encyclopedias. 3 machine-readable data sources are used: the Suggested Upper Merged Ontology (SUMO) and the Mid-Level Ontology (MILO), over 20,000 terms and 60,000 axioms; the WordNet lexicon, more than 200,000 word-sense pairs; ConceptNet, a concise knowledge base with 200,000 assertions.

Creating SM – free text WordNet hypernymic (a kind of …) IS-A relation + hyponym and meronym relations between synsets (converted into concept/concept relations), combined with ConceptNet relations such as CapableOf, PropertyOf, PartOf, MadeOf... Relations were added only if present in both WordNet and ConceptNet. Free-text data: Merriam-Webster, WordNet and Tiscali. Whole word definitions are stored in SM linked to concepts. A set of the most characteristic words is extracted from the definitions of a given concept: for each concept definition, one set of words for each source dictionary is used and replaced with synset words, and the subset common to all 3 is mapped back to synsets; these are most likely related to the initial concept. They were stored as a separate relation type. Articles and prepositions were removed using a manually created stop-word list. Phrases were extracted using ApplePieParser; concept-phrase relations were compared with concept-keyword relations, and only phrases that matched keywords were used.

UMLS: Expert Semantic Memory Biomedical domain: hundreds of controlled vocabularies, hierarchies and ontologies. GO - gene ontology, used for gene annotation. ICD-9-CM - used for billing in US hospitals. SNOMED CT - used in electronic medical record systems. MeSH - used in annotation of biomedical literature in PubMed. Psychological Index Terms - used to annotate articles in the psychology/psychiatry domain in the PsycARTICLES citation database. All of these sources and ~90 other sources connected together create the Unified Medical Language System (UMLS). This is the most detailed description of concepts and relations between them created so far.

Some facts about UMLS UMLS version 2007AC has: 92 English sources, including SNOMED CT, MeSH, ICD-9-CM, ICD-10 etc.; 54,245 ambiguous phrases; 3,723,408 unique English phrases; 1,516,299 concepts. Concepts have: 16,918,281 unique structural (semantic) relations; 13,226,382 unique co-occurrence (associative) relations (e.g. PubMed medical subject headings co-occurrence); attributes, contexts, definitions, semantic types,... Is it a good basis for semantic/episodic memory and spreading activation networks approximating associations in an expert’s brain?

Enhancing representations Experts reading the text activate their semantic memory and add a lot of knowledge that is not explicitly present in the text. Semantic memory is difficult to create: co-occurrence statistics do not capture structural relations of real objects and features. Better approximation (not as good as SM): use ontologies, adding parent concepts to those discovered in the text. Ex: IBD => [C ] Inflammatory Bowel Diseases -> [C ] Disorder of small intestine -> [C ] Digestive System Disorders -> [C ] Inflammatory disorder of digestive tract -> [C ] Intestinal Precancerous Condition -> [C ] Gastrointestinal inflammatory disorders NEC -> [C ] Inflammation of specific body organs -> [C ] Intestinal Diseases -> [C ] [X]Non-infective enteritis and colitis. Ex: Methotrexate => [C ] Methotrexate (Pharmacologic Substance) -> [C ] Antirheumatic Agents -> [C ] Analgesic/antipyretic/antirheumatic
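Adding parent concepts to those found in a text can be sketched as a simple set expansion. The `PARENTS` table below is a hypothetical, hand-made fragment that reuses a few concept names from the slide; it is not actual UMLS content, and the function name `enhance` is an illustrative choice.

```python
# Hypothetical child -> parents table; a real system would query an ontology.
PARENTS = {
    "Inflammatory Bowel Diseases": ["Intestinal Diseases", "Digestive System Disorders"],
    "Methotrexate": ["Antirheumatic Agents"],
}


def enhance(concepts, parents, steps=1):
    """Return the concepts plus their ancestors up to `steps` levels up."""
    enhanced = set(concepts)
    frontier = set(concepts)
    for _ in range(steps):
        # Parents of the concepts added in the previous step, minus known ones.
        frontier = {p for c in frontier for p in parents.get(c, ())} - enhanced
        enhanced |= frontier
    return enhanced


# Two toy "documents" that share no concept until parents are added.
doc_a = {"Inflammatory Bowel Diseases"}
doc_b = {"Intestinal Diseases"}

overlap_before = doc_a & doc_b
overlap_after = enhance(doc_a, PARENTS) & enhance(doc_b, PARENTS)
```

In this toy case the two documents only become comparable after enhancement, which is the effect the slide attributes to expert background knowledge.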

Example without inhibition

Literature based discovery Biomedical research is divided into highly specialized fields and subfields, with poor communication between them. The rate of growth of publications makes it difficult for a researcher to derive connections between concepts from different research specialties. Mining hidden connections among biomedical concepts from large amounts of scientific literature is one of the important goals pursued in this field. Swanson explored biomedical literature to find novel connections between medical concepts. He proposed that “Fish Oil” may be used as a cure for “Raynaud's Disease”. Researchers followed up his finding and the hypothesis turned out to be true.

Literature based discovery example Swanson found the hidden connection between “Fish Oil” and “Raynaud's Disease” by finding a common set of concepts in the document sets on “Fish Oil” and “Raynaud's Disease”. Fish Oil; Raynaud’s disease; linking concepts: High blood viscosity, Platelet aggregation. You can make medical discoveries!
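The idea of finding a common set of concepts in two separate literatures can be written as a set intersection. The sketch below is a minimal illustration with made-up toy documents that reuse the concept names from the slide; it is not Swanson's actual procedure, and `linking_concepts` is a hypothetical name.

```python
def linking_concepts(a_docs, c_docs):
    """Concepts that occur in both literatures (each doc is a set of concepts)."""
    a_terms = set().union(*a_docs)
    c_terms = set().union(*c_docs)
    return a_terms & c_terms


# Toy literatures: no single document mentions both start concepts.
fish_oil_docs = [
    {"Fish Oil", "High blood viscosity"},
    {"Fish Oil", "Platelet aggregation"},
]
raynaud_docs = [
    {"Raynaud's disease", "High blood viscosity", "Platelet aggregation"},
]

links = linking_concepts(fish_oil_docs, raynaud_docs)
```

The resulting `links` are the intermediate concepts through which a connection between the two otherwise disjoint literatures can be hypothesized.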

Literature based discovery using Visual Language System (VLS) Hypothesis: quicker recognition of interesting relations when the graph is presented as icons. First, consistent graphs are needed.

Graphs of consistent concepts General GCC idea: when the text is read and understood, activation of a semantic subnetwork in the expert's brain spreads to new patterns corresponding to related concepts; new concepts automatically have to fit the active network, assuming meanings that increase overall network activation, or the consistency of text interpretation. Many approximations of this process may be defined. Success depends on the quality of the semantic network. Explicit competition/inhibition among network nodes is important. Steps: recognition of concepts; spreading activation from concepts that are in the text to related concepts; building the graph while inhibiting concepts that are irrelevant.
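The three steps on this slide can be approximated in many ways; the following is one minimal sketch, assuming a weighted concept graph, a fixed decay factor and a simple activation threshold as the inhibition rule. The function name `spread`, the parameters and the toy graph weights are all hypothetical and are not taken from the authors' system.

```python
def spread(graph, seeds, steps=2, decay=0.5, threshold=0.3):
    """Spread activation from seed concepts, pruning weakly activated nodes."""
    activation = {node: 1.0 for node in seeds}  # concepts recognized in the text
    for _ in range(steps):
        incoming = {}
        for node, level in activation.items():
            for neighbor, weight in graph.get(node, {}).items():
                incoming[neighbor] = incoming.get(neighbor, 0.0) + decay * weight * level
        for node, extra in incoming.items():
            activation[node] = min(1.0, activation.get(node, 0.0) + extra)
        # Inhibition (assumed rule): drop non-seed nodes below the threshold.
        activation = {
            n: a for n, a in activation.items() if n in seeds or a >= threshold
        }
    return activation


# Toy weighted graph; edges and weights are invented for illustration.
graph = {
    "Alzheimer Disease": {"Dementia": 1.0, "Humans": 0.2},
    "Apolipoproteins E": {"Lipoproteins": 1.0, "Dementia": 0.4},
    "Dementia": {"Brain Diseases": 1.0},
}
seeds = {"Alzheimer Disease", "Apolipoproteins E"}

active = spread(graph, seeds)
```

In this toy run the weakly connected node `Humans` never passes the threshold and is filtered out, while `Dementia`, which receives activation from both seeds, stays in the graph and passes activation on to `Brain Diseases`.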

PubMed queries Searching for: "Alzheimer disease"[MeSH Terms] AND "apolipoproteins e"[MeSH Terms] AND "humans"[MeSH Terms] returns 2899 citations with 1924 MeSH terms. Out of 16 MeSH hierarchical trees only 4 trees have been selected: Anatomy; Diseases; Chemicals & Drugs; Analytical, Diagnostic and Therapeutic Techniques & Equipment. The number of concepts is [...]. Loop over: cluster analysis; feature space enhancement through UMLS relations between MeSH concepts; inhibition, leading to filtering of concepts. Create graphical representation.
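The loop described on this slide (cluster analysis, feature space enhancement, inhibition) can be sketched on toy data. Everything below is an assumption made for illustration: citations are modeled as sets of concepts, clustering is single-link grouping on Jaccard similarity, enhancement uses a hand-made `related` table instead of UMLS, and inhibition simply drops concepts shared by every citation. The slides do not specify these choices.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, min_sim=0.3):
    """Single-link clustering: merge labels of sufficiently similar documents."""
    labels = list(range(len(docs)))
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(docs[i], docs[j]) >= min_sim:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


def enhance(docs, related):
    """Add concepts related to those already present in each document."""
    return [doc | {r for c in doc for r in related.get(c, ())} for doc in docs]


def inhibit(docs):
    """Drop concepts present in every document; they cannot separate clusters."""
    common = set.intersection(*docs) if docs else set()
    return [doc - common for doc in docs]


def iterate(docs, related, steps=2):
    """Repeat enhancement and inhibition, then cluster the resulting sets."""
    for _ in range(steps):
        docs = inhibit(enhance(docs, related))
    return docs, cluster(docs)


# Toy citations and a hypothetical relation table (not real UMLS data).
docs = [
    {"Humans", "Alzheimer Disease"},
    {"Humans", "Dementia"},
    {"Humans", "Lipoproteins"},
]
related = {"Alzheimer Disease": ["Dementia"]}

labels_before = cluster(docs)
final_docs, labels_after = iterate(docs, related)
```

Before the loop, the ubiquitous concept `Humans` glues all three toy citations into one cluster; after enhancement and inhibition, the first two citations group together and the third is separated.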

Initial step - MDS showing clusters

First step of activation: new concepts that are added

First step of activation: concepts that represent clusters/relations between them

First step of activation: clusters after enhancement

2nd step of activation: new concepts that are added

2nd step of activation: concepts that represent clusters/relations between them

6th step of activation: concepts that represent clusters/relations

6th step of activation: concepts that are used in all steps of spreading activation

6th step of activation: clusters

Future work Collaborative work with: Graphical designers: design glyphs as a basis for icons; design rules for how glyphs are connected to create an icon; design layout for consistent graphs. Computer scientists: study effects of inhibition (different feature selection methods); study properties of the spreading activation algorithm; apply to other fields (e.g. text classification). Field experts: study performance of experts when a text graph vs. an icon graph is presented; rate graphs based on their content.

Thank you for lending your ears... Google: W. Duch => Papers/presentations/projects