Opportunities for Text Mining in Bioinformatics (CS591-CXZ Text Data Mining Seminar) Dec. 8, 2004 ChengXiang Zhai Department of Computer Science University.

Presentation transcript:

Opportunities for Text Mining in Bioinformatics (CS591-CXZ Text Data Mining Seminar) Dec. 8, 2004 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

Why Biology Text Mining?
• Strong motivations from biology side
  – Difficulty for biologists to access literature
    • No theory in biology, so we must keep all literature “alive”
    • Observations about the same biology mechanism may be described in different terms (e.g., due to different perspectives of study)
  – Many unanswered research questions
  – Text mining may help better organize, link biology literature, and answer simple questions… (e.g., what do we know about this gene?)

Why Biology Text Mining? (cont.)
• Potentially high impact from CS side
  – Any “discovery” from biology text could be potentially significant
  – Biology text is relatively “easy” for mining
    • Literature is cleaner (compared with web data)
    • Biology text often has many annotations
    • Many other kinds of biology data can be exploited (e.g., DNA/Protein sequences, gene expression information, metabolic networks)
  – Simple techniques may work

Characteristics of Biology Text
• Large number of entities (e.g., genes, proteins) that have well-defined semantics
• No standard for terminology (inconsistencies)
• Ambiguities (e.g., many acronyms)
• Synonyms
• High complexity in phrases and sentence structures

Research Topics
• General goal: Applying known text mining techniques to help biology research
• Problem 1: Data/Information Integration
  – How can we integrate text information (discovering terminology linkages)?
  – How can we link text with databases (semantic interpretations of text on top of entities/relations in DB, e.g., entity extraction)?
  – How can we integrate biology DBs (many fields are text)?
• Problem 2: Functional annotations
  – How can we annotate a biological entity (e.g., a gene) with functional information extracted from literature?
  – How can we annotate a set of related genes with functional information?
  – How can we exploit the ontologies/thesauri in biology?
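As an illustration of the "link text with databases" question in Problem 1, here is a minimal Python sketch of dictionary-based entity linking. It is not a method from the slides. The synonym table, the identifiers (`DB:G1`, `DB:G2`), and the function names are all invented for the example; a real system would load names from a curated gene database and would need to handle the ambiguity issues listed under Characteristics of Biology Text.

```python
import re

# Hypothetical synonym table (invented for illustration):
# database identifier -> names that may appear in the literature.
TOY_SYNONYMS = {
    "DB:G1": ["abc1", "alpha binding cassette 1"],
    "DB:G2": ["xyz2", "xyz-2"],
}

def build_matcher(synonyms):
    """Return (compiled pattern, lowercase name -> database ID map)."""
    name_to_id = {}
    for db_id, names in synonyms.items():
        for name in names:
            name_to_id[name.lower()] = db_id
    # Try longer names first so multi-word names win over their substrings.
    alternatives = sorted(name_to_id, key=len, reverse=True)
    body = "|".join(re.escape(name) for name in alternatives)
    # Require that the match is not glued to other word characters or hyphens.
    pattern = re.compile(r"(?<![\w-])(" + body + r")(?![\w-])", re.IGNORECASE)
    return pattern, name_to_id

def link_mentions(text, pattern, name_to_id):
    """Return a list of (surface form, database ID, character offset)."""
    return [
        (m.group(1), name_to_id[m.group(1).lower()], m.start(1))
        for m in pattern.finditer(text)
    ]

if __name__ == "__main__":
    pattern, name_to_id = build_matcher(TOY_SYNONYMS)
    sentence = "Abc1 interacts with XYZ-2 in the abc1x mutant."
    for mention in link_mentions(sentence, pattern, name_to_id):
        print(mention)
```

Exact dictionary lookup is the simplest possible baseline. It maps known synonyms to one identifier, but it cannot resolve an acronym shared by several genes or recognize a name missing from the table.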

Research Topics (cont.)
• Problem 3: Data/Information Cleanup & Curation
  – How can we detect suspicious data/information in existing databases?
  – How can we automate many manual tasks of database curation?
• Problem 4: Research question answering
  – How can we answer simple research questions? (e.g., what functional connections are there between these two genes?)
  – How can we support exploratory access and digest of literature information? (e.g., a biology research workbench)
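For the Problem 4 example question about connections between two genes, one very simple starting point is to retrieve sentences in which both gene names co-occur. The Python sketch below assumes this co-occurrence heuristic, which the slides do not prescribe. The gene names, document IDs, and abstracts are invented, and the sentence splitter is deliberately naive.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def cooccurring_sentences(abstracts, gene_a, gene_b):
    """Return (doc_id, sentence) pairs in which both gene names appear.

    abstracts: dict mapping a document ID to its abstract text.
    Matching is case-insensitive and restricted to whole words.
    """
    pat_a = re.compile(r"\b" + re.escape(gene_a) + r"\b", re.IGNORECASE)
    pat_b = re.compile(r"\b" + re.escape(gene_b) + r"\b", re.IGNORECASE)
    hits = []
    for doc_id, text in abstracts.items():
        for sentence in split_sentences(text):
            if pat_a.search(sentence) and pat_b.search(sentence):
                hits.append((doc_id, sentence))
    return hits

if __name__ == "__main__":
    # Invented toy abstracts, for illustration only.
    toy_abstracts = {
        "doc1": "Abc1 regulates xyz2 during development. Abc1 is conserved.",
        "doc2": "Xyz2 is expressed in the brain.",
    }
    for doc_id, sentence in cooccurring_sentences(toy_abstracts, "abc1", "xyz2"):
        print(doc_id, sentence)
```

Co-occurrence only suggests that a connection may exist. Deciding what kind of functional connection a sentence describes would require relation extraction on top of it.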