Open access – making the most of biomedical literature mining
Lars Juhl Jensen, EMBL Heidelberg
why open access?
why biomedicine?
why literature mining?
MEDLINE
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
information retrieval
finding the papers
if you can’t find them …
… they don’t exist!
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
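A minimal sketch of how a Boolean AND query such as the one above could be answered. The three-document corpus, the document IDs, and the function names `build_index` and `search_and` are hypothetical and only serve as illustration; a real retrieval engine over MEDLINE would be far more elaborate.

```python
from collections import defaultdict

# Hypothetical toy corpus; IDs and texts are made up for illustration.
DOCS = {
    1: "Cdc28 controls the cell cycle in yeast",
    2: "The cell cycle of human fibroblasts",
    3: "Yeast metabolism under glucose limitation",
}


def build_index(docs):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, docs, terms):
    """Return sorted IDs of documents that match every term.

    Single words are looked up in the inverted index; multi-word terms
    are matched as literal phrases by scanning the document text.
    """
    hits = set(docs)
    for term in terms:
        term = term.lower()
        if " " in term:
            matching = {d for d, text in docs.items() if term in text.lower()}
        else:
            matching = index.get(term, set())
        hits &= matching
    return sorted(hits)


INDEX = build_index(DOCS)
RESULT = search_and(INDEX, DOCS, ["yeast", "cell cycle"])
```

Only the document that mentions both "yeast" and "cell cycle" is returned; a paper that is relevant but uses neither string is missed, which is the limitation the following slides address.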
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
MEDLINE
abstracts
complete papers
tricks
stemming
yeast / yeasts
synonyms
yeast / S. cerevisiae
dynamic query expansion
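The tricks above can be sketched as a small query-expansion step. This is an illustration under stated assumptions: `naive_stem` is a crude stand-in for a real stemmer, and the `SYNONYMS` table is a hypothetical hand-made example rather than a curated resource.

```python
# Hypothetical synonym table; a real system would use a curated resource.
SYNONYMS = {
    "yeast": {"s. cerevisiae", "saccharomyces cerevisiae"},
}


def naive_stem(word):
    """Strip a plural 's'; a crude stand-in for a real stemming algorithm."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def expand_query(term):
    """Return every string that should be searched for a single query term."""
    stem = naive_stem(term.lower())
    variants = {stem, stem + "s"}           # stemming: yeast / yeasts
    variants |= SYNONYMS.get(stem, set())   # synonyms: yeast / S. cerevisiae
    return variants
```

With this sketch, `expand_query("yeasts")` and `expand_query("Yeast")` give the same set of search strings, so the user does not have to enumerate the variants by hand.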
next logical step
ontologies
annotation
Cdc28 yeast gene
Cdc28 cell cycle
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
“yeast AND cell cycle”
entity recognition
identifying the substance(s)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
if you can’t find them …
… they don’t exist!
abstracts
MEDLINE
tricks
good synonyms list
manual curation
orthographic variation
CDC28
Cdc28p
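One possible way to handle orthographic variation is to map every spelling to a common key before dictionary lookup. The function `normalize_name` and its rules are assumptions made for illustration; in particular, stripping a trailing "p" after a digit follows the yeast protein naming habit seen in Cdc28p and would not be safe for every organism.

```python
import re


def normalize_name(name):
    """Collapse case, hyphens, spaces, and a trailing protein 'p' suffix."""
    key = re.sub(r"[\s\-]", "", name.lower())
    # Assumption: a final 'p' directly after a digit marks the protein form
    # of a yeast gene name (Cdc28p) and can be dropped for matching.
    key = re.sub(r"(?<=\d)p$", "", key)
    return key
```

Under these rules CDC28, Cdc28p, and Cdc-28 all map to the same key, while Cdc2 stays distinct.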
disambiguation
hairy
SDS
Cdc2
information extraction
formalizing the facts
co-mentioning
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
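Co-mentioning can be sketched as listing every pair of recognised names that occur in the same sentence. The name list `NAMES` is a hypothetical dictionary chosen to fit the example sentence, and `co_mentions` is an illustrative helper, not a published method.

```python
import re
from itertools import combinations

# Hypothetical dictionary of protein names to look for.
NAMES = ["Clb2", "Cdc28", "Cdk1", "Swe1", "Cdc5"]

SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
    "Swe1 and this modification served as a priming step to promote subsequent "
    "Cdc5-dependent Swe1 hyperphosphorylation and degradation"
)


def co_mentions(text, names):
    """Return all unordered pairs of dictionary names found in the text."""
    found = [
        n for n in names
        if re.search(r"\b" + re.escape(n) + r"\b", text)
    ]
    return list(combinations(sorted(found), 2))


PAIRS = co_mentions(SENTENCE, NAMES)
```

All five names are found, so every pairing is reported, including pairs such as Clb2 and Cdc5 that the sentence does not link directly. Co-mentioning says nothing about the type or direction of a relation.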
NLP (Natural Language Processing)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
new discoveries
text mining
temporal trends
buzzwords
grant applications
global correlations
Regulates / Regulated: P < 9 × 10^-9
transcriptional networks
Phosphorylates / Phosphorylated: P < 2 × 10^-7
signal cascades
Expression / Phosphorylation: P < 5 × 10^-4
integration of text and data
network mining
linking genes to diseases
multifactorial diseases
genotype to phenotype
where are we now?
abstracts
complete papers
restricted access
open access
the tools are there
now we need the text!
Acknowledgments
Jasmin Saric, Rossitza Ouzounova, Michael Kuhn, Isabel Rojas, Miguel Andrade, Peer Bork