Published by Angelina Ward; modified over 9 years ago
Opportunities for Text Mining in Bioinformatics
(CS591-CXZ Text Data Mining Seminar)
Dec. 8, 2004
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Why Biology Text Mining?
Strong motivations from the biology side:
– Biologists have difficulty accessing the literature:
   – There is no unifying theory in biology, so all of the literature must be kept "alive".
   – Observations about the same biological mechanism may be described in different terms (e.g., because it was studied from different perspectives).
– Many research questions remain unanswered.
– Text mining may help to better organize and link the biology literature, and to answer simple questions (e.g., "What do we know about this gene?").
Why Biology Text Mining? (cont.)
Potentially high impact from the CS side:
– Any "discovery" from biology text could be significant.
– Biology text is relatively "easy" to mine:
   – The literature is cleaner than web data.
   – Biology text often carries many annotations.
   – Many other kinds of biology data can be exploited (e.g., DNA/protein sequences, gene expression data, metabolic networks).
– Simple techniques may work.
Characteristics of Biology Text
– A large number of entities (e.g., genes, proteins) with well-defined semantics
– No standard terminology (inconsistencies)
– Ambiguity (e.g., many acronyms)
– Synonyms
– High complexity in phrases and sentence structures
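The synonym and acronym problems listed above can be made concrete with a small lookup table. The following is a minimal sketch that assumes a hand-built toy lexicon. The identifiers (such as GENE:TP53) and the helper names build_index, normalize and ambiguous_forms are hypothetical and do not come from any real database or library. Each surface form maps to the set of identifiers it may denote. Synonyms therefore collapse onto one ID, and ambiguous forms appear as sets with more than one member.

```python
from collections import defaultdict

# Toy lexicon (illustrative only, hypothetical IDs): canonical ID -> surface forms.
LEXICON = {
    "GENE:TP53": ["TP53", "p53", "tumor protein p53"],
    "GENE:CAT": ["CAT", "catalase"],
    "ORGANISM:CAT": ["cat"],
}

def build_index(lexicon):
    """Map each lower-cased surface form to the set of IDs it may denote."""
    index = defaultdict(set)
    for entity_id, forms in lexicon.items():
        for form in forms:
            # Case-folding merges spelling variants (synonym handling)
            # but can also merge unrelated terms (acronym ambiguity).
            index[form.lower()].add(entity_id)
    return dict(index)

def normalize(mention, index):
    """Return the candidate IDs for a mention (empty set if unknown)."""
    return index.get(mention.lower(), set())

def ambiguous_forms(index):
    """Surface forms that map to more than one ID and need context to resolve."""
    return sorted(form for form, ids in index.items() if len(ids) > 1)
```

For example, with index = build_index(LEXICON), normalize("P53", index) yields {"GENE:TP53"}. In contrast, ambiguous_forms(index) returns ["cat"], because case-folding merges the gene symbol CAT with the common noun. Resolving such collisions would require sentence context, which this sketch does not attempt.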
Research Topics
General goal: apply known text mining techniques to help biology research.
Problem 1: Data/information integration
– How can we integrate text information (e.g., by discovering terminology linkages)?
– How can we link text with databases (semantic interpretation of text in terms of the entities/relations in a DB, e.g., entity extraction)?
– How can we integrate biology DBs (many of whose fields are text)?
Problem 2: Functional annotation
– How can we annotate a biological entity (e.g., a gene) with functional information extracted from the literature?
– How can we annotate a set of related genes with functional information?
– How can we exploit the ontologies/thesauri in biology?
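Entity extraction (Problem 1) and functional annotation (Problem 2) can be illustrated together with document-level co-occurrence, a deliberately simple technique. This is a hedged sketch and not a method prescribed by the talk. It assumes dictionary-based, case-insensitive whole-word matching. It also treats a gene and a function term that appear in the same document as weak evidence for an annotation. The function names and the example gene names GENE1 and GENE2 are hypothetical.

```python
import re
from collections import Counter, defaultdict

def find_terms(text, terms):
    """Return the subset of `terms` occurring in `text` as whole words (case-insensitive)."""
    found = set()
    for term in terms:
        # \b anchors stop a name like GENE1 from matching inside GENE12.
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE):
            found.add(term)
    return found

def cooccurrence_annotations(documents, genes, functions, min_count=1):
    """For each gene, count documents that also mention each function term.

    Returns {gene: [(function_term, doc_count), ...]} sorted by count,
    keeping only terms seen in at least `min_count` documents.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        doc_genes = find_terms(doc, genes)
        doc_functions = find_terms(doc, functions)
        for gene in doc_genes:
            counts[gene].update(doc_functions)  # +1 per document, per term
    return {
        gene: [(term, n) for term, n in counter.most_common() if n >= min_count]
        for gene, counter in counts.items()
    }
```

Consider three toy documents: "GENE1 is required for DNA repair after irradiation.", "Loss of GENE1 impairs DNA repair and apoptosis." and "GENE2 regulates apoptosis in neurons.". Calling cooccurrence_annotations(docs, ["GENE1", "GENE2"], ["DNA repair", "apoptosis"]) on them returns {"GENE1": [("DNA repair", 2), ("apoptosis", 1)], "GENE2": [("apoptosis", 1)]}. Raw counts favor frequent terms. A more careful version might normalize by term frequency or map terms onto an ontology.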
Research Topics (cont.)
Problem 3: Data/information cleanup and curation
– How can we detect suspicious data/information in existing databases?
– How can we automate the many manual tasks of database curation?
Problem 4: Research question answering
– How can we answer simple research questions (e.g., what functional connections exist between these two genes)?
– How can we support exploratory access to, and digestion of, literature information (e.g., a biology research workbench)?
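For Problem 4, a very simple form of answering "what connects these two genes?" is to retrieve every sentence that mentions both and leave the interpretation to the biologist. The sketch below assumes naive regular-expression sentence splitting and exact, case-insensitive whole-word name matching. The helpers split_sentences, mentions and pair_evidence, and the gene names in the example, are hypothetical.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def mentions(sentence, name):
    """True if `name` occurs in `sentence` as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

def pair_evidence(documents, gene_a, gene_b):
    """Return (document_index, sentence) pairs in which both genes are mentioned."""
    evidence = []
    for i, doc in enumerate(documents):
        for sentence in split_sentences(doc):
            if mentions(sentence, gene_a) and mentions(sentence, gene_b):
                evidence.append((i, sentence))
    return evidence
```

Consider the toy documents "GENE1 binds GENE2 in vitro. GENE1 is expressed in liver.", "GENE2 is a kinase." and "Knockdown of GENE2 reduces GENE1 levels!". For these, pair_evidence(docs, "GENE1", "GENE2") returns [(0, "GENE1 binds GENE2 in vitro."), (2, "Knockdown of GENE2 reduces GENE1 levels!")]. The splitter will wrongly break on abbreviations such as "e.g.". Without synonym normalization, mentions that use a different name for the same gene are also missed.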