RaJoLink: Creative Knowledge Discovery by Literature Outlier Detection


Ingrid Petrič, University of Nova Gorica
Bojan Cestnik, Temida, Ljubljana and Jozef Stefan Institute, Ljubljana
Nada Lavrač, Jozef Stefan Institute, Ljubljana and University of Nova Gorica
Tanja Urbančič, University of Nova Gorica and Jozef Stefan Institute, Ljubljana

Overview
Motivation: to present a method that supports knowledge discovery by connecting information from different contexts in a new way
Focus domain: biomedical research; autism, a spectrum of pervasive developmental disorders
Knowledge discovery support tools: RaJoLink (Petrič and Cestnik, 2007); OntoGen (Fortuna et al., 2006)
Knowledge sources: MEDLINE (http://www.ncbi.nlm.nih.gov/sites/entrez); MeSH (http://www.nlm.nih.gov/mesh/)

Swanson’s model of literature-based discovery (Swanson, 1986)
[Figure: literature about magnesium (A) and literature about migraine (C), connected through intermediate terms (Bi)]

Closed vs. open discovery process (Weeber et al., 2001)

Combined open and closed discovery in RaJoLink (Petrič et al., 2009)
Open discovery (generation of hypothesis):
  Identifying rare terms r
  Finding joint terms a
Closed discovery (hypothesis testing):
  Searching for linking terms b

The RaJoLink method: open discovery
[Figure: literature C yields rare terms R1, R2, R3; literatures R1, R2 and R3 share joint term A]

The RaJoLink method: closed discovery
[Figure: literature A (joint term A) and literature C, connected by linking terms B1, B2, B3]

The RaJoLink method’s procedures (Petrič et al., 2009)

Steps of the RaJoLink method – step Ra
Input: set of records about the domain of interest (about phenomenon C)
Action:
  1.1 Extraction of texts
  1.2 Data preprocessing
  1.3 Identification of rare terms
  1.4 Terms filtering
Human involvement: indication of interesting rare terms
Output: rare terms C_r1, C_r2, …, C_rp
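Actions 1.2 to 1.4 of step Ra can be illustrated with a minimal Python sketch. It is not the RaJoLink implementation: the function name `rare_terms`, the thresholds `max_doc_freq` and `min_length`, the simple regular-expression tokenizer, and the toy records are all assumptions made for illustration. Here a term counts as rare when it occurs in at most `max_doc_freq` records.

```python
import re
from collections import Counter

def rare_terms(records, max_doc_freq=1, min_length=3):
    """Return terms found in at most `max_doc_freq` records (hypothetical helper)."""
    doc_freq = Counter()
    for text in records:
        # Preprocessing: lowercase and tokenize; count each term once per record.
        terms = set(re.findall(r"[a-z][a-z0-9\-]+", text.lower()))
        doc_freq.update(terms)
    # Filtering: keep low-frequency terms and drop very short tokens.
    return sorted(
        term for term, count in doc_freq.items()
        if count <= max_doc_freq and len(term) >= min_length
    )

# Toy records standing in for literature about phenomenon C (made up for this sketch).
records = [
    "autism and language development",
    "autism and calcineurin signalling",
    "language development in autism",
]
print(rare_terms(records))  # ['calcineurin', 'signalling']
```

In the method itself the expert then indicates which of the candidate rare terms are interesting; the sketch only produces the candidate list.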

Step Ra

Steps of the RaJoLink method – step Jo
Input: sets of records about C_r1, C_r2, …, C_rp
Action:
  2.1 Extraction of texts
  2.2 Data preprocessing
  2.3 Search for joint terms
Human involvement: selection of a significant joint term
Output: joint term a
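The search for joint terms in action 2.3 can be sketched as a set intersection over the literatures of the selected rare terms. This is a simplified illustration under stated assumptions, not the authors' code: the names `term_set` and `joint_terms`, the tokenizer, and the placeholder records are hypothetical.

```python
import re

def term_set(records):
    """Collect the distinct terms occurring in a list of record texts."""
    terms = set()
    for text in records:
        terms.update(re.findall(r"[a-z][a-z0-9\-]+", text.lower()))
    return terms

def joint_terms(literatures):
    """Return terms present in the literature of every rare term.

    `literatures` maps each rare term to the list of records retrieved for it.
    """
    sets = [term_set(recs) for recs in literatures.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

# Placeholder literatures for three rare terms (invented text, not real records).
literatures = {
    "rare-1": ["alpha jointword pathway", "jointword assay"],
    "rare-2": ["jointword inhibition study"],
    "rare-3": ["beta receptor and jointword"],
}
print(joint_terms(literatures))  # ['jointword']
```

A strict intersection is the simplest choice; on real data it would return many common words, which is why the expert's selection of a significant joint term remains part of the step.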

Step Jo

Steps of the RaJoLink method – step Link
Input: joint set of records about a and articles about c
Action:
  3.1 Extraction of texts
  3.2 Data preprocessing
  3.3 Identification of content-related A and C records
  3.4 Search for linking terms b
Human involvement: selection of meaningful linking terms
Output: linking terms B1, B2, …, Br
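Action 3.4 can be approximated by looking for terms that occur in both literature A and literature C. The sketch below ranks shared terms by the smaller of their two document frequencies; that ranking, the names `doc_freq` and `linking_terms`, the `exclude` parameter, and the placeholder records are assumptions for illustration and are not taken from the RaJoLink publications.

```python
import re
from collections import Counter

def doc_freq(records):
    """Count in how many records each term occurs."""
    counts = Counter()
    for text in records:
        counts.update(set(re.findall(r"[a-z][a-z0-9\-]+", text.lower())))
    return counts

def linking_terms(records_a, records_c, exclude=()):
    """Return terms shared by literatures A and C, most frequent first."""
    df_a, df_c = doc_freq(records_a), doc_freq(records_c)
    shared = (set(df_a) & set(df_c)) - set(exclude)
    # Rank by the smaller of the two document frequencies, ties alphabetically.
    return sorted(shared, key=lambda t: (-min(df_a[t], df_c[t]), t))

# Placeholder records for literature A and literature C (invented text).
records_a = ["term-a linkone", "term-a linkone linktwo"]
records_c = ["term-c linkone", "term-c linktwo", "term-c linkone other"]
print(linking_terms(records_a, records_c))  # ['linkone', 'linktwo']
```

The output is only a candidate list; selecting the meaningful linking terms is left to the expert, as the step description states.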

Step Link

Step Link - alternative

Conclusions
Open discovery: RaJoLink represents a more interdisciplinary approach to hypothesis generation that bridges overspecialization in the sciences. We provide connections between biomedical literatures by analysing and explaining rare terms.
Closed discovery: By combining outlier detection with high-frequency analysis, we demonstrated that outlying documents can be used as heuristic guidance to speed up the search for linking terms and to reduce the burden on the expert when hypotheses have to be tested.
Recent experiments: Detection of published evidence of autism findings that coincide with specific calcineurin and NF-kappaB observations (Petrič et al., 2007; Urbančič et al., 2007).
The gold standard evaluation: RaJoLink led to Swanson’s relation of magnesium with migraine and to three other discoveries important for migraine.

Future work
Automated identification of semantic variants such as abbreviations, acronyms and synonyms.
Handling the language specifics for Slovenian and other languages.
Providing visualizations of results.
Implementing the similarity measure between documents in the Link step.

RaJoLink - references
Petrič, I.; Urbančič, T.; Cestnik, B. Discovering hidden knowledge from biomedical literature. Informatica 31(1): 15-20 (2007).
Petrič, I.; Urbančič, T.; Cestnik, B. Literature mining: potential for gaining hidden knowledge from biomedical articles. In: Bohanec, M.; Gams, M.; Rajkovič, V.; Urbančič, T.; Bernik, M.; Mladenić, D. et al., editors. IS-2006. Proceedings of the 9th International multi-conference Information Society; Ljubljana, Slovenia. 52-55 (2006).
Petrič, I.; Urbančič, T.; Cestnik, B.; Macedoni-Lukšič, M. Literature mining method RaJoLink for uncovering relations between biomedical concepts. Journal of Biomedical Informatics 42(2): 219-227 (2009).
Urbančič, T.; Petrič, I.; Cestnik, B.; Macedoni-Lukšič, M. Literature mining: towards better understanding of autism. In: Bellazzi, R.; Abu-Hanna, A.; Hunter, J., editors. AIME 2007. Proceedings of the 11th Conference on Artificial Intelligence in Medicine in Europe; Amsterdam, The Netherlands. 217-226 (2007).