Lars Juhl Jensen: Biomedical text mining

exponential growth

~45 seconds per paper

information retrieval

named entity recognition

augmented browsing

text corpora

information extraction

information retrieval

find the relevant papers

ad hoc retrieval

user-specified query

“yeast AND cell cycle”

PubMed

indexing

fast lookup
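
A minimal sketch of how indexing gives fast lookup for an ad hoc query such as "yeast AND cell cycle". It is not PubMed's implementation: the three toy documents and the function names `build_index` and `search_and` are made up for illustration, and the phrase "cell cycle" is simply treated as two separate words.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:()")].add(doc_id)
    return index

def search_and(index, terms):
    """Boolean AND: intersect the posting sets of all query terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

# Toy corpus (hypothetical abstracts, not real Medline records).
docs = {
    1: "Cdc28 controls the cell cycle in yeast.",
    2: "The cell cycle in human cells.",
    3: "Yeast metabolism under stress.",
}
index = build_index(docs)
hits = search_and(index, ["yeast", "cell", "cycle"])
print(hits)
```

Each query term costs one dictionary lookup, and only the matching document sets are intersected, so no document text is scanned at query time.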

stemming

word endings
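
A rough sketch of stemming by stripping word endings, so that different forms of a word hit the same index entry. This is a deliberately naive suffix stripper, not the Porter stemmer or whatever PubMed uses; the suffix list and the name `naive_stem` are assumptions for illustration.

```python
# Hypothetical, very short suffix list; checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["binding", "binds", "Degraded", "degrades"]]
print(stems)
```

A stripper this crude over- and under-stems many words, which is why real systems use more careful rule sets.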

dynamic query expansion

MeSH terms
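
A sketch of query expansion, assuming a small synonym table that stands in for a controlled vocabulary such as MeSH. The table entries and the function `expand_query` are hypothetical and are not actual MeSH records or PubMed behaviour.

```python
# Hypothetical synonym table standing in for a controlled vocabulary.
SYNONYMS = {
    "yeast": ["saccharomyces cerevisiae"],
    "cell cycle": ["cell division cycle"],
}

def expand_query(terms):
    """OR each term with its synonyms, then AND the resulting clauses."""
    clauses = []
    for term in terms:
        variants = [term] + SYNONYMS.get(term.lower(), [])
        quoted = ['"%s"' % v for v in variants]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)

expanded = expand_query(["yeast", "cell cycle"])
print(expanded)
```

The user still types the short query; the expansion happens behind the scenes at search time.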

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

no tool will find that

named entity recognition

computer

as smart as a dog

teach it specific tricks

identify the concepts

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

comprehensive lexicon

proteins

chemicals

compartments

tissues

diseases

organisms

CDC2

cyclin dependent kinase 1

orthographic variation

upper- and lower-case

CDC2

Cdc2

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

prefixes and postfixes

CDC2

hCDC2

“black list”

SDS
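
The lexicon, orthographic variation, and black list ideas can be sketched together as a tiny dictionary-based tagger. Everything here is an assumption for illustration: the three-entry lexicon, the entity labels, the single optional "h" prefix, and the function names. Compiling one regular expression per name is also not a scalable implementation.

```python
import re

# Hypothetical lexicon: surface name -> entity label (labels are made up).
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS",
}
# Names that are too ambiguous to tag, stored lower-cased.
BLACK_LIST = {"sds"}

def name_to_regex(name):
    """Allow any case, space or hyphen between tokens, and an optional 'h' prefix."""
    tokens = re.split(r"[\s-]+", name)
    core = r"[\s-]".join(re.escape(t) for t in tokens)
    return re.compile(
        r"(?<![A-Za-z0-9])h?" + core + r"(?![A-Za-z0-9])", re.IGNORECASE
    )

def tag(text):
    """Return sorted (start, matched text, entity) for every lexicon match."""
    hits = []
    for name, entity in LEXICON.items():
        if name.lower() in BLACK_LIST:
            continue  # skip black-listed names entirely
        for m in name_to_regex(name).finditer(text):
            hits.append((m.start(), m.group(0), entity))
    return sorted(hits)

text = "Cdc2 (hCDC2, cyclin-dependent kinase 1) was resolved by SDS gel."
found = tag(text)
for hit in found:
    print(hit)
```

In this example the three spellings all resolve to the same label, while "SDS" is left untagged because of the black list.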

scalable implementation

text corpora

>10 km <10 hours

most use Medline

~22 million abstracts

few use full-text articles

no access

PDF files

layout-aware extraction

millions of full-text articles

information extraction

formalize the facts

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

two approaches

co-mentioning

counting

within documents

within paragraphs

within sentences

co-mentioning score
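
A sketch of a co-mentioning score built by counting name pairs within documents, paragraphs, and sentences. The weights, the crude sentence splitter, the plain substring matching, and the toy text are all assumptions; this is not the scoring scheme of any particular resource.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}

def units(doc):
    """Yield (level, text) for the document, each paragraph, each sentence."""
    yield "document", doc
    for para in doc.split("\n\n"):
        yield "paragraph", para
        for sent in re.split(r"(?<=[.!?])\s+", para):
            yield "sentence", sent

def comention_scores(docs, names):
    """Sum the level weight for every pair of names found in the same unit."""
    scores = Counter()
    for doc in docs:
        for level, text in units(doc):
            present = sorted(n for n in names if n in text)
            for pair in combinations(present, 2):
                scores[pair] += WEIGHTS[level]
    return scores

docs = ["Cdc28 phosphorylated Swe1. Cdc5 acts later.\n\nClb2 is a mitotic cyclin."]
scores = comention_scores(docs, ["Cdc28", "Swe1", "Cdc5", "Clb2"])
for pair, score in scores.most_common():
    print(pair, score)
```

Pairs mentioned in the same sentence end up with the highest score, and pairs that only share a document get the lowest.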

NLP: Natural Language Processing

grammatical analysis

part-of-speech tagging

multiword detection

semantic tagging

sentence parsing

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

extract stated facts

high precision

poor recall
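
A single hand-written rule illustrates why extracting stated facts gives high precision but poor recall. This sketch uses one regular expression instead of real part-of-speech tagging and parsing; the gene-symbol pattern, the rule, and the function `extract` are hypothetical.

```python
import re

# Hypothetical pattern for yeast-style gene symbols such as CYC1 or HAP1.
GENE = r"[A-Z]{3}\d+"
RULE = re.compile(
    r"expression of (?P<targets>.+?) is controlled by (?P<regulator>" + GENE + r")"
)

def extract(sentence):
    """Return (regulator, relation, target) triples matched by the one rule."""
    m = RULE.search(sentence)
    if not m:
        return []
    targets = re.findall(GENE, m.group("targets"))
    return [(m.group("regulator"), "controls expression of", t) for t in targets]

facts = extract(
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
)
missed = extract("HAP1 activates transcription of CYC1")
print(facts)
print(missed)
```

The rule extracts both facts from the sentence it was written for, but returns nothing for a paraphrase of the same kind of fact.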

Exercise
Go to
Find TYMS disease associations
Inspect the text-mining evidence
Look for examples of synonym usage
Find genes linked to colorectal cancer

thank you!