BeeSpace Informatics Research: From Information Access to Knowledge Discovery ChengXiang Zhai Nov. 14, 2007.

BeeSpace Technology: From V3 to V4
[Diagram: Literature Search & Navigation (Query, Docs); Function Analysis (Genes, Function); Entities, Relations; ER Graph Mining (Question, Answers); Knowledge Base and Inference Engine (Question, Answers); Expert Knowledge]

New Functions in V4
Massive Entity/Relation Extraction
Graph Indexing and Mining
Integration of Expert Knowledge & Reasoning
Personalization & Info/Knowledge Sharing
“Plug and Play” (PnP)

Massive Entity Recognition
Class 1: Small Variation (Dictionary/Ontology)
  – Organism, Anatomy, Biological Process, Pathway, Protein Family
Class 2: Medium Variation
  – Gene, cis-Regulatory Element
Class 3: Large Variation
  – Phenotype, Behavior
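
Class 1 entities have little naming variation, so a dictionary or ontology lookup is a natural fit for them. A minimal sketch of such a lookup, assuming a hypothetical hand-written DICTIONARY and a greedy longest-match strategy; the slide does not say how BeeSpace implements recognition, nor how Class 2 and Class 3 entities are handled:

```python
import re

# Hypothetical mini-dictionary; a real system would load terms from
# organism, anatomy, process, and pathway ontologies.
DICTIONARY = {
    "apis mellifera": "Organism",
    "mushroom body": "Anatomy",
    "antenna": "Anatomy",
    "circadian rhythm": "Biological Process",
    "wnt signaling pathway": "Pathway",
}


def tag_entities(text, dictionary=DICTIONARY):
    """Greedy longest-match dictionary lookup over lowercase word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(term.split()) for term in dictionary)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                found.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return found


print(tag_entities("Genes expressed in the mushroom body of Apis mellifera "
                   "regulate circadian rhythm"))
```

Longest match matters for multi-word terms: "mushroom body" should be tagged as one Anatomy entity rather than two unrelated words.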

Massive Relation Extraction
Expression Location
  – the expression of a gene in some location (tissues, body parts)
Homology/Orthology
  – one gene is homologous to another gene
Biological Process
  – one gene has some role in a biological process
Genetic/Physical/Regulatory Interaction
  – one gene interacts with another gene in a certain fashion (3 types of relations)
  – a simple case: Protein-Protein Interaction (PPI)
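
One simple baseline for the Expression Location relation is sentence-level co-occurrence: propose a (gene, location) pair whenever both appear in a sentence that also contains an expression trigger word. The sketch below assumes hypothetical GENES and LOCATIONS lexicons (standing in for entity recognition output) and an assumed trigger pattern; it is an illustration, not the extraction method used in BeeSpace:

```python
import re
from itertools import product

# Hypothetical lexicons standing in for the output of entity recognition
GENES = {"GeneA", "GeneB"}
LOCATIONS = {"antenna", "brain", "mushroom body"}
# Assumed trigger words signalling an expression statement
TRIGGER = re.compile(r"\bexpress(?:ed|es|ion)?\b", re.IGNORECASE)


def mentions(term, text):
    """True if `term` occurs in `text` as a whole word or phrase."""
    return re.search(rf"\b{re.escape(term)}\b", text) is not None


def extract_expression_location(sentence):
    """Candidate ExpressionLocation(gene, location) pairs from one sentence:
    a gene and a location co-occur with an expression trigger word."""
    if not TRIGGER.search(sentence):
        return []
    genes = sorted(g for g in GENES if mentions(g, sentence))
    locations = sorted(loc for loc in LOCATIONS if mentions(loc, sentence.lower()))
    return list(product(genes, locations))


for s in ["GeneA is strongly expressed in the antenna and the brain.",
          "GeneB binds GeneA in the brain."]:
    print(s, "->", extract_expression_location(s))
```

The second sentence yields nothing because it lacks a trigger word; the same pattern (entity pair plus trigger) could be reused for the interaction relations with different triggers.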

Entity Relation Graph Mining
The extracted entities and relations form a weighted graph
Need to develop techniques to mine the graph for knowledge
  – Store graphs
  – Index graphs
  – Mining algorithms (neighbor finding, path finding, entity comparison, outlier detection, frequent subgraphs, …)
  – Mining language
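
A minimal sketch of storing and indexing such a weighted graph, with two of the listed mining operations (neighbor finding and entity comparison). It assumes that an edge's weight counts how often a relation was extracted and that entity comparison means Jaccard overlap of neighbor sets; both are illustrative choices, and the entity names are hypothetical:

```python
from collections import defaultdict


class ERGraph:
    """Hypothetical weighted entity-relation graph.

    Edge weight is assumed to count how often a relation was extracted.
    Edges are indexed by relation type, then by node, for fast lookups.
    """

    def __init__(self):
        # relation type -> node -> {neighbor: accumulated weight}
        self.index = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))

    def add(self, u, v, rel, weight=1.0):
        # Undirected storage; repeated extractions accumulate weight
        self.index[rel][u][v] += weight
        self.index[rel][v][u] += weight

    def neighbors(self, node, rel):
        """Neighbor finding: {neighbor: weight} for one relation type."""
        return dict(self.index[rel].get(node, {}))

    def top_neighbors(self, node, rel, k=3):
        """The k strongest neighbors, ties broken alphabetically."""
        nbrs = self.neighbors(node, rel)
        return sorted(nbrs, key=lambda n: (-nbrs[n], n))[:k]

    def compare(self, a, b, rel):
        """Entity comparison: Jaccard overlap of the two neighbor sets."""
        na, nb = set(self.neighbors(a, rel)), set(self.neighbors(b, rel))
        union = na | nb
        return len(na & nb) / len(union) if union else 0.0


g = ERGraph()
g.add("Gene1", "Foraging", "co-occur", 3)
g.add("Gene1", "Learning", "co-occur")
g.add("Gene2", "Foraging", "co-occur", 2)
g.add("Gene2", "Aggression", "co-occur")
print(g.top_neighbors("Foraging", "co-occur"))  # strongest first
print(g.compare("Gene1", "Gene2", "co-occur"))
```

Indexing by relation type first keeps queries like "neighbors via co-occurrence only" cheap, which the interactive mining steps in this deck rely on.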

Integration of Expert Knowledge
How can we combine expert knowledge with knowledge extracted from the literature?
Possible strategies:
  – Interactive mining (human knowledge guides the next step of mining)
  – Trainable programs (focused miners targeting a certain kind of knowledge)
  – Inference-based integration

Inference-Based Discovery
Encode all kinds of knowledge in the same knowledge representation language
Perform logic inferences
Example
  – Regulate(GeneA, GeneB, ContextC)  [Literature mining]
  – SeqSimilar(GeneA, GeneA’)  [Sequence mining]
  – Regulate(X, Y, C) ⇐ Regulate(Z, Y, C) & SeqSimilar(X, Z)  [Human knowledge]
  – ⇒ Regulate(GeneA’, GeneB, ContextC)
  – ADD: InPathway(GeneB, P1)
  – InPathway(X, P) ⇐ Regulate(X, Y, C) & InPathway(Y, P)  [Human knowledge]
  – ⇒ InPathway(GeneA’, P1)
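
The example can be reproduced with naive forward chaining over Horn rules: repeatedly match every rule body against the known facts and add the instantiated heads until nothing new appears. A minimal sketch, assuming facts are tuples, variables are strings starting with "?", and that SeqSimilar is symmetric (the derivation of GeneA’ needs SeqSimilar(GeneA’, GeneA), so the sketch adds that symmetry rule explicitly). This is an illustration, not the BeeSpace inference engine:

```python
def is_var(term):
    # Convention for this sketch: variables start with "?"
    return term.startswith("?")


def unify(pattern, fact, binding):
    """Extend `binding` so `pattern` matches `fact`; return None on a mismatch."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match_body(body, facts, binding):
    """Yield every binding that satisfies all body atoms (a conjunction)."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from match_body(body[1:], facts, extended)


def substitute(atom, binding):
    return tuple(binding[t] if is_var(t) else t for t in atom)


def forward_chain(facts, rules):
    """Naive forward chaining: apply every rule until no new fact appears."""
    facts = set(facts)
    while True:
        derived = {substitute(head, b)
                   for head, body in rules
                   for b in match_body(body, facts, {})}
        if derived <= facts:
            return facts
        facts |= derived


facts = {
    ("Regulate", "GeneA", "GeneB", "ContextC"),  # literature mining
    ("SeqSimilar", "GeneA", "GeneA'"),           # sequence mining
    ("InPathway", "GeneB", "P1"),                # added fact
}
rules = [
    # Assumption: sequence similarity is symmetric
    (("SeqSimilar", "?X", "?Z"), [("SeqSimilar", "?Z", "?X")]),
    # Regulate(X,Y,C) <= Regulate(Z,Y,C) & SeqSimilar(X,Z)
    (("Regulate", "?X", "?Y", "?C"),
     [("Regulate", "?Z", "?Y", "?C"), ("SeqSimilar", "?X", "?Z")]),
    # InPathway(X,P) <= Regulate(X,Y,C) & InPathway(Y,P)
    (("InPathway", "?X", "?P"),
     [("Regulate", "?X", "?Y", "?C"), ("InPathway", "?Y", "?P")]),
]
derived = forward_chain(facts, rules)
print(sorted(derived - facts))
```

Besides the two conclusions on the slide, the sketch also derives InPathway(GeneA, P1), since GeneA regulates GeneB as well. Handling uncertainty in extracted facts would need a probabilistic extension that this sketch omits.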

Personalization & Workflow Management
Different users have different tasks → personalization
  – Tracking a user’s history and learning a user’s preferences
  – Exploiting the preferences to customize/optimize the support
  – Allowing a user to define/build special function modules
Workflow management
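
The first two bullets (learn preferences from history, then exploit them) can be illustrated with a term-count profile used to re-rank search results. A minimal sketch, assuming the profile is a bag of words over past queries, cosine similarity as the preference signal, and a hypothetical mixing weight `alpha`; none of these choices come from the slides:

```python
import math
from collections import Counter


def bag_of_words(text):
    return Counter(text.lower().split())


def build_profile(history):
    """Learn a simple preference profile: term counts over a user's past queries."""
    profile = Counter()
    for text in history:
        profile.update(bag_of_words(text))
    return profile


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def personalize(results, profile, alpha=0.5):
    """Re-rank (doc, engine_score) pairs by mixing in similarity to the profile.
    alpha (hypothetical) controls how much the profile matters."""
    def mixed(item):
        doc, base = item
        return (1 - alpha) * base + alpha * cosine(bag_of_words(doc), profile)
    return [doc for doc, _ in sorted(results, key=mixed, reverse=True)]


profile = build_profile(["honey bee foraging genes", "foraging behavior"])
print(personalize([("fly wing development", 0.6), ("bee foraging behavior", 0.5)], profile))
```

With alpha = 0 the engine's ranking is unchanged; raising it lets the user's history promote results closer to their past interests.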

Information/Knowledge Sharing
Different users may perform similar tasks → information/knowledge sharing
  – Capturing user intentions
  – Recommending information/knowledge
  – How do we solve the problem of privacy?
Massive collaborations?
  – Each user contributes a small amount of knowledge
  – All the knowledge can be combined to infer new knowledge
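
One simple way to combine many small contributions is to pool the facts users assert and keep those backed by enough distinct users, which could then be passed to an inference step. A minimal sketch, assuming facts are tuples and a hypothetical `min_support` threshold; it does not address the privacy question raised above:

```python
from collections import defaultdict


def combine_contributions(contributions, min_support=2):
    """Merge facts contributed by many users; keep those asserted by at least
    `min_support` distinct users, mapped to their number of supporters."""
    supporters = defaultdict(set)
    for user, facts in contributions.items():
        for fact in facts:
            supporters[fact].add(user)  # a set, so repeats by one user count once
    return {fact: len(users) for fact, users in supporters.items()
            if len(users) >= min_support}


contributions = {
    "u1": [("Regulate", "GeneA", "GeneB")],
    "u2": [("Regulate", "GeneA", "GeneB"), ("SeqSimilar", "GeneA", "GeneC")],
    "u3": [("Regulate", "GeneA", "GeneB"), ("Regulate", "GeneA", "GeneB")],
}
print(combine_contributions(contributions))
```

Counting distinct users rather than raw assertions keeps one enthusiastic contributor from outweighing independent agreement.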

Plug and Play
Users’ tasks vary significantly
Need flexible combinations of basic modules
Need to move toward a “discovery workbench”
  – How do we design basic modules?
  – How do we support synthesis of information and knowledge?

BeeSpace V4
[Diagram: Literature Search & Navigation; Text Mining; Entities, Relations; ER Graph Mining; Knowledge Base; Inference Engine; Expert Knowledge; Vertical Search Services; PnP Function Analyzers; Customized Knowledge Base; User]

Discussion
  – Task model?
  – PnP modules?
  – Massive collaboration?

BeeSpace V4: System Architecture
[Diagram: Literature Search & Navigation; Entities, Relations; ER Graph Mining; Machine Learning; NLP; Expert Knowledge; Special Search; PnP Function Analyzers; User; Information Extraction; User Modeling & Personalization; Topic Modeling; NCBI; Genome Databases; …; Hypothesis; Knowledge Base; Inference Engine; User Interface/Workflow Manager]

BeeSpace V4: System Architecture
[Same architecture diagram, annotated with team members: Yue, Peixiang, Xin, Xu, Moushumi, Yuanhua]

Modules
Navigation & Search (improve V3) [Yuanhua]
Information Extraction [Yue]
ER Graph Mining [Peixiang]
Specialized Search [Xu]
Function Analyzers [Xin]
User Modeling, Personalization, Workflow [Yuanhua]
Inference Engine [Yue]

Informatics Research Themes
Specialized Search
  – Hypothesis search
Information Extraction
  – Entities, relations
Graph Mining
  – Indexing, query language, mining algorithms
Function Analyzers
  – Gene set annotator
Personalization
  – User model
Inference Engine
  – Knowledge representation language, uncertainty

Example of Interactive Graph Mining
[Figure: ER graph over genes A1–A5, A1’, A4’ and behaviors B1–B4, with edges labeled isa, co-occur-fly, co-occur-mos, co-occur-bee, orth-mos, orth, and reg]
1. X = NeighborOf(B4, Behavior, {co-occur, isa}) → {B1, B2, B3}
2. Y = NeighborOf(X, Gene, {co-occur, orth}) → {A1, A1’, A2, A3}
3. Y = Y + {A5, A6} → {A1, A1’, A2, A3, A5, A6}
4. Z = NeighborOf(Y, Gene, {reg}) → {A4, A4’}
X = PathBetween({A4, A4’}, B4, {co-occur, reg, isa})
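
The session above can be replayed with two small operators over a typed, labeled graph. A minimal sketch, assuming a hypothetical toy graph built only to reproduce the slide's step results (the real edges are not recoverable from the figure), undirected edges, a query relation type such as "co-occur" matching any edge label with that prefix (so it covers co-occur-fly, co-occur-mos, and co-occur-bee), and breadth-first search as one possible PathBetween:

```python
from collections import defaultdict, deque

# Hypothetical toy graph loosely modeled on the slide's figure; not real data.
NODE_TYPE = {
    "A1": "Gene", "A1'": "Gene", "A2": "Gene", "A3": "Gene",
    "A4": "Gene", "A4'": "Gene", "A5": "Gene", "A6": "Gene",
    "B1": "Behavior", "B2": "Behavior", "B3": "Behavior", "B4": "Behavior",
}
EDGES = [
    ("B1", "B4", "isa"), ("B2", "B4", "isa"), ("B3", "B4", "co-occur-fly"),
    ("A1", "B1", "co-occur-fly"), ("A1'", "B1", "co-occur-mos"),
    ("A2", "B2", "co-occur-bee"), ("A3", "B3", "co-occur-fly"),
    ("A1", "A1'", "orth-mos"), ("A4", "A4'", "orth"),
    ("A4", "A1", "reg"), ("A4'", "A2", "reg"), ("A5", "A4", "reg"),
]
ADJ = defaultdict(list)
for u, v, rel in EDGES:  # undirected: index both directions
    ADJ[u].append((v, rel))
    ADJ[v].append((u, rel))


def allowed(rel, rel_types):
    # Assumption: "co-occur" matches "co-occur-fly", "orth" matches "orth-mos", etc.
    return any(rel.startswith(t) for t in rel_types)


def neighbor_of(nodes, node_type, rel_types):
    """Nodes of `node_type` one allowed edge away from any node in `nodes`."""
    nodes = {nodes} if isinstance(nodes, str) else nodes
    return {m for n in nodes for m, rel in ADJ.get(n, [])
            if NODE_TYPE.get(m) == node_type and allowed(rel, rel_types)}


def path_between(sources, target, rel_types):
    """Breadth-first search: a fewest-hop path from each source to `target`."""
    paths = {}
    for s in sources:
        parent, queue = {s: None}, deque([s])
        while queue:
            n = queue.popleft()
            if n == target:
                break
            for m, rel in ADJ.get(n, []):
                if m not in parent and allowed(rel, rel_types):
                    parent[m] = n
                    queue.append(m)
        if target in parent:
            path, n = [], target
            while n is not None:  # walk parents back to the source
                path.append(n)
                n = parent[n]
            paths[s] = path[::-1]
    return paths


X = neighbor_of("B4", "Behavior", {"co-occur", "isa"})
Y = neighbor_of(X, "Gene", {"co-occur", "orth"})
Y = Y | {"A5", "A6"}  # the user adds genes from expert knowledge
Z = neighbor_of(Y, "Gene", {"reg"})
P = path_between(sorted(Z), "B4", {"co-occur", "reg", "isa"})
print(X, Y, Z, P, sep="\n")
```

Step 3 is where human knowledge enters: the operators stay the same, but the user edits the intermediate set before the next query.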

PnP Function Analyzers
Basic objects
  – GeneSet, DocSet, SentSet, TermSet
Basic operators
  – Gene summarizer
  – GeneSet annotator
  – …

[Diagram: objects EntitySet, GeneSet, BehaviorSet, …, Doc/SentSet, ModelOrg, …; operators Splitter, Filter/Attractor, Converter, …]
GeneSearch: GeneSet → Doc/SentSet
DocSplitter: Doc/SentSet → {Set1, …, Setk}
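
The plug-and-play idea rests on typed objects and operators whose input and output types line up, so modules can be chained freely. A minimal sketch of GeneSearch, DocSplitter, and a Filter operator, assuming frozen dataclasses for the set types and a hypothetical CORPUS that records each document's model organism and mentioned genes; the real module interfaces are not specified in the slides:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class GeneSet:
    genes: FrozenSet[str]


@dataclass(frozen=True)
class DocSet:
    docs: FrozenSet[str]  # document IDs


# Hypothetical corpus: doc ID -> (model organism, genes mentioned)
CORPUS = {
    "d1": ("bee", {"GeneA"}),
    "d2": ("fly", {"GeneA", "GeneB"}),
    "d3": ("bee", {"GeneC"}),
    "d4": ("bee", {"GeneB"}),
}


def gene_search(gene_set: GeneSet) -> DocSet:
    """GeneSearch: GeneSet -> Doc/SentSet (documents mentioning any gene in the set)."""
    return DocSet(frozenset(d for d, (_, genes) in CORPUS.items() if genes & gene_set.genes))


def doc_filter(doc_set: DocSet, keep: Callable[[str], bool]) -> DocSet:
    """A Filter operator: keep only documents that pass a predicate."""
    return DocSet(frozenset(d for d in doc_set.docs if keep(d)))


def doc_splitter(doc_set: DocSet, key: Callable[[str], str]) -> Dict[str, DocSet]:
    """DocSplitter: Doc/SentSet -> {Set1, ..., Setk}, one set per key value."""
    groups: Dict[str, set] = {}
    for d in doc_set.docs:
        groups.setdefault(key(d), set()).add(d)
    return {k: DocSet(frozenset(v)) for k, v in groups.items()}


# Operators compose because each one consumes and produces typed sets
docs = gene_search(GeneSet(frozenset({"GeneA", "GeneB"})))
by_organism = doc_splitter(docs, key=lambda d: CORPUS[d][0])
bee_only = doc_filter(docs, keep=lambda d: CORPUS[d][0] == "bee")
print({k: sorted(v.docs) for k, v in sorted(by_organism.items())})
```

Splitting by ModelOrg is just one choice of key; the same splitter could group sentences by topic or behavior without changing its signature.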