Open access – making the most of biomedical literature mining Lars Juhl Jensen EMBL Heidelberg.


why open access?

why biomedicine?

why literature mining?

MEDLINE

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

information retrieval

finding the papers

if you can’t find them …

… they don’t exist!

ad hoc retrieval

user-specified query

“yeast AND cell cycle”

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

MEDLINE

abstracts

complete papers

tricks

stemming

yeast / yeasts

synonyms

yeast / S. cerevisiae

dynamic query expansion
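The retrieval tricks above (stemming, synonyms, query expansion) can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the `SYNONYMS` table, the crude plural-stripping `stem` function, and all function names are assumptions made for the example.

```python
import re

# Hypothetical synonym table; a real system would use a curated resource.
SYNONYMS = {
    "yeast": {"s. cerevisiae", "saccharomyces cerevisiae"},
}

def stem(word):
    """Crude plural stripping for illustration only (not a real stemmer)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def expand(term):
    """Expand a query term into its stem plus any known synonyms."""
    base = stem(term.lower())
    return {base} | SYNONYMS.get(base, set())

def term_found(term, lowered, tokens):
    """Single words are matched on stemmed tokens, phrases as substrings."""
    for variant in expand(term):
        is_phrase = not variant.isalnum()
        if (variant in lowered) if is_phrase else (variant in tokens):
            return True
    return False

def matches(query_terms, text):
    """AND query: every term, or one of its expansions, must occur in text."""
    lowered = text.lower()
    tokens = {stem(t) for t in re.findall(r"[a-z0-9]+", lowered)}
    return all(term_found(term, lowered, tokens) for term in query_terms)
```

With this sketch, the query `["yeasts", "cell cycle"]` matches a text that only says "S. cerevisiae", because the plural is stemmed and the stem is expanded with its synonyms.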

next logical step

ontologies

annotation

Cdc28 → yeast gene

Cdc28 → cell cycle

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

“yeast AND cell cycle”
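A minimal sketch of how annotation could let the query "yeast AND cell cycle" retrieve the example sentence, even though the sentence contains neither phrase. The `ANNOTATIONS` table and the function names are hypothetical and only mirror the two annotations shown for Cdc28; a real system would draw them from an ontology.

```python
import re

# Hypothetical annotations mirroring the slides:
# Cdc28 is a yeast gene and is linked to the cell cycle.
ANNOTATIONS = {
    "cdc28": {"yeast", "cell cycle"},
}

SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
    "Swe1 and this modification served as a priming step to promote subsequent "
    "Cdc5-dependent Swe1 hyperphosphorylation and degradation"
)

def concepts_in(text):
    """Collect the concepts annotated to any gene name mentioned in the text."""
    concepts = set()
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        concepts |= ANNOTATIONS.get(token, set())
    return concepts

def matches_and_query(terms, text):
    """AND query over the literal text plus the concepts inferred from it."""
    lowered = text.lower()
    inferred = concepts_in(text)
    return all(t.lower() in lowered or t.lower() in inferred for t in terms)
```

Here the match comes entirely from the annotations on Cdc28: a plain keyword search for the same two terms would miss the sentence.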

entity recognition

identifying the substance(s)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

if you can’t find them …

… they don’t exist!

abstracts

MEDLINE

tricks

good synonyms list

manual curation

orthographic variation

CDC28

Cdc28p

disambiguation

hairy

SDS

Cdc2
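The entity recognition tricks above can be illustrated with a small dictionary matcher. This is a sketch under stated assumptions: the `SYNONYMS` list, the `AMBIGUOUS` set, and the rule that strips a trailing "p" from protein names are all hypothetical simplifications. Ambiguous names are simply skipped here, whereas real disambiguation would use the surrounding context.

```python
import re

# Hypothetical curated synonym list: normalized name -> canonical identifier.
SYNONYMS = {
    "cdc28": "CDC28",
    "swe1": "SWE1",
    "clb2": "CLB2",
    "cdc5": "CDC5",
}

# Names listed on the slides as needing disambiguation; this sketch skips them.
AMBIGUOUS = {"hairy", "sds", "cdc2"}

def normalize(token):
    """Fold orthographic variants (CDC28, Cdc28, Cdc28p) to one key."""
    name = token.lower()
    # Protein names may carry a trailing "p" (Cdc28p); strip it only when
    # the remainder is a known name, so ordinary words are left alone.
    if name.endswith("p") and name[:-1] in SYNONYMS:
        name = name[:-1]
    return name

def recognize(text):
    """Return (surface form, canonical id) pairs in order of appearance."""
    hits = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        name = normalize(token)
        if name in AMBIGUOUS:
            continue  # would need context to resolve
        if name in SYNONYMS:
            hits.append((token, SYNONYMS[name]))
    return hits
```

Applied to the example sentence, this matcher picks out Clb2, Cdc28, Swe1 and Cdc5 regardless of capitalization.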

information extraction

formalizing the facts

co-mentioning

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
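Co-mentioning can be sketched as counting how often two names appear in the same sentence. The `GENES` name list and the function names below are hypothetical. Note that a co-mention count only suggests that two entities are related: it says nothing about the type or direction of the relation.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical name list; a real system would use entity recognition output.
GENES = {"clb2", "cdc28", "swe1", "cdc5"}

def genes_in(sentence):
    """Return the set of known gene names mentioned in one sentence."""
    tokens = {t.lower() for t in re.findall(r"[A-Za-z0-9]+", sentence)}
    return tokens & GENES

def co_mentions(sentences):
    """Count, for each unordered gene pair, the sentences mentioning both."""
    counts = Counter()
    for sentence in sentences:
        for pair in combinations(sorted(genes_in(sentence)), 2):
            counts[pair] += 1
    return counts
```

Pairs that are co-mentioned far more often than their individual frequencies would predict are the candidates worth a closer look.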

NLP (Natural Language Processing)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

new discoveries

text mining

temporal trends

buzzwords

grant applications

global correlations

Regulates / Regulated: P < 9 × 10^-9

transcriptional networks

Phosphorylates / Phosphorylated: P < 2 × 10^-7

signal cascades

Expression / Phosphorylation: P < 5 × 10^-4

integration of text and data

network mining

linking genes to diseases

multifactorial diseases

genotype to phenotype

where are we now?

abstracts

complete papers

restricted access

open access

the tools are there

now we need the text!

Acknowledgments: Jasmin Saric, Rossitza Ouzounova, Michael Kuhn, Isabel Rojas, Miguel Andrade, Peer Bork