1
Mining text and data on chemicals
Lars Juhl Jensen
2
three parts
3
text mining
4
data integration
5
medical records
6
Part 1: text mining
7
exponential growth
10
some things are constant
12
~45 seconds per paper
13
information retrieval
14
find the relevant papers
15
still too much to read
16
computer
17
as smart as a dog
18
teach it specific tricks
21
named entity recognition
22
identify the concepts
23
small molecules
24
proteins
25
diseases
26
comprehensive lexicon
27
synonyms
28
orthographic variation
29
“black list”
30
unfortunate names
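The lexicon slides above (synonyms, orthographic variation, black list) can be illustrated with a minimal dictionary-based tagger. This is only a sketch under stated assumptions, not the tagger used in the talk: the lexicon entries, identifiers, block list, and the function names `normalize`, `build_index`, and `tag` are all hypothetical.

```python
import re

# Hypothetical toy lexicon: identifier -> known synonyms (illustration only).
LEXICON = {
    "protein:CYC1": ["CYC1"],
    "protein:WAS": ["WAS"],  # an "unfortunate name" that collides with an English word
    "chemical:aspirin": ["aspirin", "acetylsalicylic acid"],
}

# Hypothetical block list of names that are too ambiguous to tag.
BLOCK_LIST = {"was", "can", "all"}


def normalize(name):
    """Absorb orthographic variation: ignore case, hyphens and spaces."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_index(lexicon, block_list):
    """Map every normalized synonym to its identifier, skipping blocked names."""
    index = {}
    for entity_id, names in lexicon.items():
        for name in names:
            if name.lower() in block_list:
                continue
            index[normalize(name)] = entity_id
    return index


def tag(text, index, max_words=4):
    """Greedy longest-match lookup of word n-grams against the index."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            entity_id = index.get(normalize(candidate))
            if entity_id is not None:
                hits.append((candidate, entity_id))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits


index = build_index(LEXICON, BLOCK_LIST)
hits = tag("Acetylsalicylic-acid was given; cyc-1 expression rose.", index)
```

In this toy run the hyphenated and lower-cased variants are still recognized, while the word "was" is never tagged because the block list keeps it out of the index.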
31
Reflect
32
augmented browsing
33
browser add-on
34
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
35
Firefox
36
Internet Explorer
37
Google Chrome
38
Safari
39
Utopia Documents
40
web services
41
collaboration
45
SciVerse
51
information extraction
52
formalize the facts
53
co-mentioning
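Co-mentioning can be sketched as counting how often two tagged entities appear in the same document. The slides do not say how the counts are weighted or scored, so the plain pair count below is an assumption, and `comention_counts` and the example input are hypothetical.

```python
from collections import Counter
from itertools import combinations


def comention_counts(tagged_docs):
    """Count, for each unordered entity pair, the documents mentioning both."""
    counts = Counter()
    for entities in tagged_docs:
        # Each pair counts once per document, however often it is repeated.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical output of an entity tagger: one list of entities per abstract.
docs = [["HAP1", "CYC1"], ["CYC1", "HAP1", "CYC7", "HAP1"], ["CYC7"]]
counts = comention_counts(docs)
```

Pairs are stored in sorted order, so `counts[("CYC1", "HAP1")]` is the lookup key regardless of the order in which the names were mentioned.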
54
NLP: Natural Language Processing
55
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
56
Part 2: data integration
57
STITCH
58
Kuhn et al., Nucleic Acids Research, 2012
59
~300,000 small molecules
60
~2.6 million proteins
61
1100+ genomes
62
experimental data
63
physical binding
64
chemical–protein
65
protein–protein
67
curated knowledge
68
drug targets
69
complexes
70
pathways
71
Letunic & Bork, Trends in Biochemical Sciences, 2008
72
text mining
73
co-mentioning
75
NLP: Natural Language Processing
77
many data types
78
many databases
79
different formats
80
different identifiers
81
variable quality
82
not comparable
83
spread over many genomes
84
quality scores
85
von Mering et al., Nucleic Acids Research, 2005
86
calibrate vs. gold standard
87
von Mering et al., Nucleic Acids Research, 2005
88
probabilistic scores
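One simple way to turn raw quality scores into probabilistic ones is to bin them and measure, per bin, the fraction of interactions confirmed by the gold standard. The slides do not describe the actual calibration procedure, so the equal-size binning below is an assumption, and `calibrate` and its inputs are hypothetical.

```python
def calibrate(raw_scores, in_gold_standard, n_bins=2):
    """Return (lowest raw score in bin, fraction confirmed by gold standard)."""
    pairs = sorted(zip(raw_scores, in_gold_standard))
    size = max(1, len(pairs) // n_bins)
    table = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        confirmed = sum(1 for _, ok in chunk if ok)
        table.append((chunk[0][0], confirmed / len(chunk)))
    return table


# Hypothetical raw scores and whether each pair is in the gold standard.
table = calibrate([1, 2, 3, 4], [False, False, True, True], n_bins=2)
```

Each row of the table can then be read as "a raw score of at least this value corresponds to roughly this probability", which makes scores from different evidence types comparable.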
89
orthology transfer
90
combine the evidence
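Once every evidence type has a probabilistic score, the scores for one interaction can be combined. A common choice, assumed here and not confirmed by the slides, is to treat the evidence types as independent and compute the probability that at least one of them is right. The function name `combine` is hypothetical.

```python
from math import prod


def combine(scores):
    """Combine independent probabilities: 1 - product of (1 - s_i)."""
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical scores from experiments, curated knowledge and text mining.
combined = combine([0.5, 0.5, 0.2])
```

Under this assumption the combined score is never lower than the strongest single piece of evidence, and several weak sources can add up to a confident call.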
91
Part 3: patient records
92
a hard problem
93
in Danish
94
by busy doctors
95
about psychiatric patients
96
no lexicon
97
acronyms
98
typos
99
delusions
100
domain specific system
101
patient record excerpt
102
F20 F200 Negation Family
103
medication
104
adverse drug events
105
diagnoses
106
pharmacovigilance
107
patient stratification
108
Roque et al., PLoS Computational Biology, 2011
109
disease comorbidity
110
Roque et al., PLoS Computational Biology, 2011
111
DNA sequencing
112
genotype
113
phenotype
114
Acknowledgments
Reflect: Sune Frankild, Heiko Horn, Evangelos Pafilis, Juan-Carlos Silla-Castro, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue
STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Manuel Stark, Christian von Mering, Peer Bork
EPJ-mining: Francisco S Roque, Peter B Jensen, Robert Eriksson, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Thomas Werge, Søren Brunak
115
larsjuhljensen