Cross-domain Knowledge Discovery with Literature Mining Tanja Urbančič University of Nova Gorica, Nova Gorica, Slovenia and Jozef Stefan Institute, Ljubljana,

Slides:



Advertisements
Similar presentations
GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Advertisements

Computational discovery of gene modules and regulatory networks Ziv Bar-Joseph et al (2003) Presented By: Dan Baluta.
A view of life Chapter 1. Properties of Life Living organisms: – are composed of cells – are complex and ordered – respond to their environment – can.
Apoptosis By Dr Abiodun Mark .A.
Automating Discovery from Biomedical Texts Marti Hearst & Barbara Rosario UC Berkeley Agyinc Visit August 16, 2000.
1 Interfaces for Intense Information Analysis Marti Hearst UC Berkeley This research funded by ARDA.
Article Review Study Fulltext vs Metadata Searching Brad Hemminger School of Information and Library Science University of North Carolina.
Bioinformatics: a Multidisciplinary Challenge Ron Y. Pinter Dept. of Computer Science Technion March 12, 2003.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Web Mining Research: A Survey
Social Pharmacy and Pharmacoepidemiology Lister Hill National Center for Biomedical Communications Text-based Discovery in Biomedicine The Architecture.
UCB CS Research Fair Search Text Mining Web Site Usability Marti Hearst SIMS.
Building Knowledge-Driven DSS and Mining Data
Triangulation of network metaphors The Royal Netherlands Academy of Arts and Sciences Iina Hellsten & Andrea Scharnhorst Networked Research and Digital.
B IOMEDICAL T EXT M INING AND ITS A PPLICATION IN C ANCER R ESEARCH Henry Ikediego
EXPLORING LIFE. What is SCIENCE? Derived from the Latin verb meaning “to know” Science is… …a process by which we know and understand how the natural.
The Science of Life Biology unifies much of natural science
Temporal Event Map Construction For Event Search Qing Li Department of Computer Science City University of Hong Kong.
Data Mining Chun-Hung Chou
Knowledgebase Creation & Systems Biology: A new prospect in discovery informatics S.Shriram, Siri Technologies (Cytogenomics), Bangalore S.Shriram, Siri.
Epigenome 1. 2 Background: GWAS Genome-Wide Association Studies 3.
NURS 4006 Nursing Informatics
Blaz Fortuna, Marko Grobelnik, Dunja Mladenic Jozef Stefan Institute ONTOGEN SEMI-AUTOMATIC ONTOLOGY EDITOR.
Claudia Marzi Institute for Computational Linguistics, “Antonio Zampolli” – Italian National Research Council University of Pavia – Dept. of Theoretical.
Location of JSI EuropeSlovenia Micro-location of JSI Department of Knowledge Technologies Jožef Stefan Institute Ljubljana.
Bioinformatics and medicine: Are we meeting the challenge?
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
Integrated Biomedical Information for Better Health Workprogramme Call 4 IST Conference- Networking Session.
Group Number: 2 Britney Porter, Sandra Nguyen, Eduardo Vargas and Samender Singh Randhawa.
Finish up array applications Move on to proteomics Protein microarrays.
1 What is Life? – Living organisms: – are composed of cells – are complex and ordered – respond to their environment – can grow and reproduce – obtain.
LOGO A comparison of two web-based document management systems ShaoxinYu Columbia University March 31, 2009.
Bioinformatics MEDC601 Lecture by Brad Windle Ph# Office: Massey Cancer Center, Goodwin Labs Room 319 Web site for lecture:
Generic Tasks by Ihab M. Amer Graduate Student Computer Science Dept. AUC, Cairo, Egypt.
Chapter 4 Decision Support System & Artificial Intelligence.
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
BBN Technologies Copyright 2009 Slide 1 The S*QL Plugin for Cytoscape Visual Analytics on the Web of Linked Data Rusty (Robert J.) Bobrow Jeff Berliner,
1 Semantic Relations for Interpreting DNA Microarray Data and for Novel Hypotheses Generation Dimitar Hristovski, 1 PhD, Andrej Kastrin, 2 Borut Peterlin,
Chapter 2. From Complex Networks to Intelligent Systems in Creating Brain-like Systems, Sendhoff et al. Course: Robots Learning from Humans Baek, Da Som.
L&I SCI 110: Information science and information theory Instructor: Xiangming(Simon) Mu Sept. 9, 2004.
Association of functional polymorphisms of Bax and Bcl2 genes with schizophrenia Kristina Pirumya, PhD, Laboratory of Human Genomics and Immunomics Institute.
Aiding Biomedical Researchers with Tools to Assist Discovery Neil R. Smalheiser May 18, 2006.
Peter Brusilovsky. Index What is adaptive navigation support? History behind adaptive navigation support Adaptation technologies that provide adaptive.
Automatically Identifying Candidate Treatments from Existing Medical Literature Catherine Blake Information & Computer Science University.
Citation-Based Retrieval for Scholarly Publications 指導教授:郭建明 學生:蘇文正 M
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS) LECTURE 13 ANALYSIS OF THE TRANSCRIPTOME.
04/21/2005 CS673 1 Being Bayesian About Network Structure A Bayesian Approach to Structure Discovery in Bayesian Networks Nir Friedman and Daphne Koller.
Principals of Biomedical Research Guri Tzivion, PhD Extension 506 PBMR 611: Winter 2016 Windsor University School of Medicine.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.
The Science of Biology Chapter 1.
RaJoLink: Creative Knowledge Discovery by Literature Outlier Detection
Lindsay & Gordon’s Discovery Support Systems Model
John MacMullen SILS Bioinformatics Journal Club Fall 2002
PubMed Database Interface (Basic Course Module 4 Part A)
The Science of Biology Chapter 1.
Ingenuity Knowledge Base
1 Department of Engineering, 2 Department of Mathematics,
Data Warehousing and Data Mining
1 Department of Engineering, 2 Department of Mathematics,
Interfaces for Intense Information Analysis
1 Department of Engineering, 2 Department of Mathematics,
The Science of Biology Chapter 1.
Class Prediction Based on Gene Expression Data Issues in the Design and Analysis of Microarray Experiments Michael D. Radmacher, Ph.D. Biometric Research.
KNOWLEDGE MANAGEMENT (KM) Session # 37
Volume 70, Issue 5, Pages (June 2011)
The Science of Biology Chapter 1.
PubMed Database Interface (Basic Course: Module 4 Part A)
PubMed Database Interface Part A (Basic Course Module 4)
Presentation transcript:

Cross-domain Knowledge Discovery with Literature Mining Tanja Urbančič University of Nova Gorica, Nova Gorica, Slovenia and Jozef Stefan Institute, Ljubljana, Slovenia Ingrid Petrič University of Nova Gorica, Nova Gorica, Slovenia Nada Lavrač, Bojan Cestnik, Matjaž Juršič, Borut Sluban Jozef Stefan Institute, Ljubljana, Slovenia Marta Macedoni-Lukšič University Children’s Hospital, Ljubljana, Slovenia (previous), Institute for Autism (now)

Motivation Situation: –Specialization of researchers –Potentially useful connections between “islands” of knowledge may remain hidden –Speed of knowledge growth, huge ammounts of literature available on-line Our objective: –To develop methods and SW tools to support researchers in the dicovery of new knowledge from literature

Ideas for new discoveries often come from other contexts Examples: From Jacquard’s loom (1801) to Babbage’s Analitic Engine (1837) –Punched cards representing colorful textil patterns –Punched cards representing data, problems etc. From evolution in nature to evolutionary computing (Lawrence J. Fogel, 1964) –Populations of individuals in nature developing through generations (“survival of the fittest”) –Populations of candidate solutions developing through simulated evolution (“maximization of the fitness function”)

Cross-context link discovery Finding links which lead “out-of-the-plane” Bisociations as a basis of human creativity “ … the perceiving of a situation or idea, L, in two self-consistent but habitually incompatible frames of reference, M1 and M2. The event L... is not merely linked to one associative context but bisociated with two.” (Koestler, 1964) BISON: Bisociation Networks for Creative Information Discovery 7FP project ( ) Two concepts are bisociated if and only if: –There is no direct, obvious evidence linking them –One has to cross contexts to find the link –This new link provides some novel insight

Chains of associations across domains (BISON, M. Berthold, 2008)

Heterogeneous data sources (BISON, M. Berthold, 2008)

Bridging concepts (BISON, M. Berthold, 2008)

BISON approaches Main BISON approach: graph exploration –Fusion of heterogeneous data/knowledge sources into a joint representation format - a large information network named BisoNet (consisting of nodes and relatioships between nodes) –Find previously unknown links between BisoNet nodes belonging to different contexts (domains) Complementary BISON approach: text mining –Find yet unknown terms in the intersection of documents, crossing different contexts (domains/literatures)

Cross-context link discovery from literature Most influential related work: R.D. Swanson: Medical literature as a potential source of new knowledge, 1990 Literature about Literature about magnesium (A) migraine (C) ( articles) (4.600 articles) (B)

Cross-context link discovery from literature Swanson: Medical literature as a potential source of new knowledge, 1990 Domain A: Domain C: Literature about magnesium Literature about migraine ( articles) (4.600 articles) Calcium channel blockers can prevent migraine attacks. Stress and Type A behavior are associated with migraine. Migraine may involve sterile inflammation of the cerebral blood vessels.... Mg is a natural calcium channel blocker. Stress and Type A behavior can lead to body loss of Mg. Magnesium has anti-inflammatory properties....

Medical literature as a potential source of new knowledge: What do we have at disposal today? Biomedical bibliographical database PubMed US National Library of Medicine More than 21M citations More than journals – references added each working day! Do we use such sources effectively?

Generation of a hypothesis A may influence C _________________ For a given C, how do we find A? Swanson: “Search proceeds via some intermediate literature (B) toward an unknown destination A. … Success depends entirely on the knowledge and ingenuity of the searcher.” Can we provide a more systematic support?

Closed vs. open discovery (as defined by Weeber et al., 2001) Closed discovery Open discovery C and A known in advance Only C known in advance

Combined open and closed discovery in RaJoLink ( Petrič, Urbančič, Cestnik, Macedoni-Lukšič, 2009) Open discovery (upper part of the figure): -Identifying Rare terms r -Finding a Joint term a Closed discovery (lower part of the figure): -Searching for Linking terms b

RaJoLink method: Idea (open discovery part) Literature C Literature R3 Literature R1 Literature R2 Rare term R1 Rare term R2 Rare term R3 Joint term A

RaJoLink method: Idea (closed discovery part) Literature C Literature A Joint term A Linking term B1 Linking term B2 Linking term B3

Procedures of the RaJoLink

Case 1: “Re-investigating” migraine ( Urbančič, Petrič, Cestnik 2009)  Magnesium “re-discovered” from articles about migraine up to 1988  Other 3 important connections identified with RaJoLink:  interferon  interleukin  tumor necrosis factor Their relation with migraine appeared in PubMed after 1988!

Case 2: Autistic spectrum disorders Abnormal development of cognitive, communication and social interaction skills Estimated prevalence 5.8 / 1000, increasing Causal explanation in 10-15% of cases Complex connections of genetic and environmental factors Research fragmented (behavioral psychology, genetics, biochemistry, brain anatomy and physiology, …) Early diagnosis and treatment very important

Results obtained on autism domain, and on autism+fragile_X domain

Autism literatureNF-kappaB literature Araghi-Niknam and Fatemi [2] showed reduction of Bcl-2, an important marker of apoptosis, in frontal, parietal and cerebellar cortices of autistic individuals. Mattson [12] reported in his review that activation of NF-kappaB in neurons can promote their survival by inducing the expression of genes encoding antiapoptotic proteins such as Bcl-2 and the antioxidant enzyme Mn-superoxide dismutase. Vargas et al. [21] reported altered cytokine expression profiles in brain tissues and cerebrospinal fluid of patients with autism. Ahn and Aggarwal [1] reported that on activation NF-kappaB regulates the expression of almost 400 different genes, which include enzymes, cytokines (such as TNF, IL-1, IL-6, IL-8, and chemokines), adhesion molecules, cell cycle regulatory molecules, viral proteins, and angiogenic factors. Ming et al. [13] reported about the increased urinary excretion of an oxidative stress biomarker - 8-iso- PGF2alpha in autism. Zou and Crews [24] reported about increase in NF-kappaB DNA binding following oxidative stress neurotoxicity.

 Urbančič T., Petrič I., Cestnik B., Macedoni-Luk š ič M. Literature mining: towards better understanding of autism. In: Bellazzi, R. et al., editors. AIME Proceedings of the 11th Conference on Artificial Intelligence in Medicine in Europe; Amsterdam, The Netherlands (2007). “… according to our analysis one possible point of convergence between oxidative stress and immunological disorder paradigm in autism is NF-kappaB.”  Naik U.S., Gangadharan, Abbagani K., Nagalla B., Dasari N., Manna S. K. A Study of Nuclear Transcription Factor –Kappa B in Childhood Autism. PLoS ONE 6(5): e19488 (2011). “We have noticed significant increase in NF-kB DNA binding activity in peripheral blood samples of children with autism.” “This finding has immense value in understanding many of the known biochemical changes reported in autism.” Results - Towards better understanding of autism

Open discovery - Conclusions It is worthwhile to explore rare terms for generation of hypotheses (RaJoLink method). Expert’s involvement is crucial for speeding up the process (selections) and for evaluations of candidate hypotheses, but it’s well supported by the method. Expert evaluation confirmed the relevance of discovered relations in the autism domain (published in medical journals). RaJoLink: Method for finding seeds of future discoveries in nowadays literature

Closed Discovery – Role of Outlier documents The majority of bridging concepts can be found in outlier documents. (Sluban, Juršič, Cestnik, Lavrač, 2012) Focusing on outliers from literatures A and C resulted in substantial search reduction. (Petrič, Cestnik, Lavrač, Urbančič, 2012)

Improved Closed Discovery Web application CrossBee (Juršič, Cestnik, Urbančič, Lavrač 2013): –Ranking of candidate b-terms according to their bisociation potential –Side-by-side document view, customized for expert inspection of potential cross-domain links –Standard document navigation – filtering, keyword and similarity search –Color-based domain separation scheme –User interface customization (emphasizing features on/off, …) –Hierarchical semi-automated document clustering (Top Circle)

B-TERM DISCOVERY B-term 1 Term 2 B-term 3 Term 4 Term 5 Term 6 Term 7 B-term 8 Term 9 Term 10 Term 11 B-term 12 Term 13.

B-TERM DISCOVERY B-term 1 Term 2 B-term 3 Term 4 Term 5 Term 6 Term 7 B-term 8 Term 9 Term 10 Term 11 B-term 12 Term 13.

B-TERM DISCOVERY B-term 1 B-term 3 B-term 8 B-term 12 Term 2 Term 4 Term 5 Term 6 Term 7 Term 9 Term 10 Term 11 Term 13.

Side-by-side view of documents in CrossBee

CrossBee Topic Circle for top-down document clustering

CrossBee: indicating similar clusters in two different domains, potentially leading to a novel bisociative link

Conclusions Open discovery (suggesting which domains to connect) RaJoLink – a method for finding seeds of future discoveries in nowadays literature Closed discovery (connecting domains via bridging terms) CrossBee – a web application with empowered HCI for exploration of documents for bridging term New discoveries based on cross-domain connections uncovered from literature in biomedical domains Potential for other domains