Published by Clara Quinn. Modified over 9 years ago.
1
Lars Juhl Jensen: Biomedical text mining
2
exponential growth
5
~45 seconds per paper
6
information retrieval
7
named entity recognition
8
augmented browsing
9
text corpora
10
information extraction
11
information retrieval
12
find the relevant papers
13
ad hoc retrieval
14
user-specified query
15
“yeast AND cell cycle”
16
PubMed
18
indexing
19
fast lookup
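A minimal sketch of how indexing gives fast lookup for a Boolean query such as “yeast AND cell cycle”. It assumes a toy in-memory inverted index; the function names and the three example documents are hypothetical, every query word is treated as an AND term (no phrase matching), and this is not how PubMed itself is implemented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search_and(index, query):
    """Return the ids of documents that contain every query token."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Hypothetical toy corpus.
docs = {
    1: "Cell cycle regulation in yeast",
    2: "Yeast metabolism",
    3: "The cell cycle in human cells",
}
index = build_index(docs)
hits = search_and(index, "yeast cell cycle")
```

Each lookup touches only the posting sets of the query terms, so the cost depends on how many documents mention those terms rather than on the size of the whole collection.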
20
stemming
21
word endings
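Stemming by stripping word endings can be sketched as below. This is a deliberately crude suffix stripper written for illustration, not the Porter stemmer and not what any particular search engine uses; the suffix list and the minimum stem length are assumptions.

```python
# Hypothetical suffix list; longer endings are listed first so they win.
SUFFIXES = ("ations", "ation", "ating", "ated", "ates", "ate",
            "ing", "ed", "es", "s")

def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word

stems = [crude_stem(w) for w in
         ("phosphorylated", "phosphorylation", "phosphorylates")]
```

All three word forms reduce to the same stem, so a query for one of them can retrieve documents that use the others.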
22
dynamic query expansion
23
MeSH terms
24
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
25
no tool will find that
26
named entity recognition
27
computer
28
as smart as a dog
29
teach it specific tricks
32
identify the concepts
33
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
34
comprehensive lexicon
35
proteins
36
chemicals
37
compartments
38
tissues
39
diseases
40
organisms
41
CDC2
42
cyclin dependent kinase 1
43
orthographic variation
44
upper- and lower-case
45
CDC2
46
Cdc2
47
spaces and hyphens
48
cyclin dependent kinase 1
49
cyclin-dependent kinase 1
50
prefixes and postfixes
51
CDC2
52
hCDC2
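The three kinds of orthographic variation listed above can be handled at lookup time, as in this sketch. The lexicon entries come from the slides, but the identifier `PROT:0001`, the function names, and the single allowed prefix `h` are hypothetical; nothing here is taken from the author's actual tagger.

```python
import re

# Toy lexicon: both names are synonyms of one protein (identifier is made up).
LEXICON = {
    "CDC2": "PROT:0001",
    "cyclin dependent kinase 1": "PROT:0001",
}
PREFIXES = ("h",)  # assumed species prefix, as in hCDC2

def normalize(name):
    """Ignore case, spaces, and hyphens."""
    return re.sub(r"[\s\-]+", "", name.lower())

INDEX = {normalize(name): ident for name, ident in LEXICON.items()}

def lookup(mention):
    """Return the identifier for a mention, or None if it is unknown."""
    key = normalize(mention)
    if key in INDEX:
        return INDEX[key]
    for prefix in PREFIXES:
        rest = key[len(prefix):]
        if key.startswith(prefix) and rest in INDEX:
            return INDEX[rest]
    return None

results = [lookup(m) for m in
           ("CDC2", "Cdc2", "cyclin-dependent kinase 1", "hCDC2", "SDS")]
```

Normalizing both the lexicon and the text the same way avoids having to enumerate every spelling variant in the lexicon itself.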
53
“black list”
54
SDS
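A black list can be applied as a simple filter on lexicon matches, as sketched here. The toy lexicon and function name are hypothetical, and the reading of the example is an assumption: SDS is presumably black-listed because in most papers it refers to the detergent (as in SDS-PAGE) rather than to a gene.

```python
import re

# Hypothetical toy lexicon of lower-cased names, plus the black list.
LEXICON = {"cdc2", "swe1", "sds"}
BLACK_LIST = {"sds"}

def tag(text):
    """Return (offset, matched text) for lexicon names not on the black list."""
    hits = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        token = match.group().lower()
        if token in LEXICON and token not in BLACK_LIST:
            hits.append((match.start(), match.group()))
    return hits

hits = tag("Cdc2 was resolved by SDS-PAGE")
```

The name stays in the lexicon, so it can still be reported elsewhere if needed, but the tagger never emits it as a match.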
55
scalable implementation
56
text corpora
57
>10 km, <10 hours
58
most use Medline
59
~22 million abstracts
60
few use full-text articles
61
no access
62
PDF files
64
layout-aware extraction
65
millions of full-text articles
66
information extraction
67
formalize the facts
68
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
69
two approaches
70
co-mentioning
71
counting
72
within documents
73
within paragraphs
74
within sentences
75
co-mentioning score
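Counting co-mentions at the three levels named above can be sketched as a weighted sum. The weights, the data layout (a document as a list of paragraphs, each a list of per-sentence entity sets), and the function names are all assumptions made for illustration; the slides do not give the actual scoring formula, and a real score would also need to correct for how often each entity is mentioned on its own.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}

def pairs(entities):
    """All unordered entity pairs, in a canonical (sorted) order."""
    return combinations(sorted(entities), 2)

def comention_scores(corpus):
    """Sum weighted co-mention counts over sentences, paragraphs, documents."""
    scores = Counter()
    for document in corpus:
        doc_entities = set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                for pair in pairs(sentence):
                    scores[pair] += WEIGHTS["sentence"]
                par_entities |= set(sentence)
            for pair in pairs(par_entities):
                scores[pair] += WEIGHTS["paragraph"]
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            scores[pair] += WEIGHTS["document"]
    return scores

# One toy document: two paragraphs, entities already tagged per sentence.
corpus = [
    [
        [{"Cdc28", "Swe1"}, {"Cdc5"}],
        [{"Clb2"}],
    ]
]
scores = comention_scores(corpus)
```

In this toy document the pair mentioned in the same sentence ends up with the highest score, and a pair that only shares the document gets the lowest.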
76
NLP (Natural Language Processing)
77
grammatical analysis
78
part-of-speech tagging
79
multiword detection
80
semantic tagging
81
sentence parsing
82
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
83
extract stated facts
84
high precision
85
poor recall
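The trade-off between high precision and poor recall can be illustrated with a single hand-written extraction rule. This sketch replaces the grammatical analysis on the slides with one regular expression; the pattern, the gene-name shape (three capitals plus digits), and the relation label are assumptions chosen to fit the example sentence only.

```python
import re

GENE = r"[A-Z]{3}\d+"  # assumed shape of a gene name, e.g. CYC1
PATTERN = re.compile(
    rf"expression of .*?((?:{GENE})(?: and {GENE})*) is controlled by ({GENE})"
)

def extract(sentence):
    """Return (regulator, relation, target) triples stated by the sentence."""
    match = PATTERN.search(sentence)
    if not match:
        return []
    targets = re.findall(GENE, match.group(1))
    regulator = match.group(2)
    return [(regulator, "regulates expression of", t) for t in targets]

stated = extract(
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
)
paraphrase = extract("HAP1 controls CYC1 expression")
```

The rule extracts both facts from the sentence it was written for, but returns nothing for the paraphrase, which is why rule-based extraction tends to be precise yet miss many stated facts.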
86
Exercise
Go to http://diseases.jensenlab.org
Find TYMS disease associations
Inspect the text-mining evidence
Look for examples of synonym usage
Find genes linked to colorectal cancer
87
thank you!