Lars Juhl Jensen Biomedical text mining. exponential growth.

Presentation on theme: "Lars Juhl Jensen Biomedical text mining. exponential growth."— Presentation transcript:

1 Lars Juhl Jensen Biomedical text mining

2 exponential growth

3

4

5 ~45 seconds per paper

6 information retrieval

7 named entity recognition

8 augmented browsing

9 text corpora

10 information extraction

11 information retrieval

12 find the relevant papers

13 ad hoc retrieval

14 user-specified query

15 “yeast AND cell cycle”

16 PubMed

17

18 indexing

19 fast lookup
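
A minimal sketch of how indexing gives fast lookup for a query such as "yeast AND cell cycle", assuming a plain inverted index. The toy abstracts and the function names `build_index` and `search_and` are hypothetical, the phrase "cell cycle" is simplified to two separate terms, and nothing here describes how PubMed is actually implemented.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lower-cased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;()")].add(doc_id)
    return index

def search_and(index, terms):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical toy abstracts, not real PubMed records.
docs = {
    1: "Regulation of the cell cycle in yeast.",
    2: "Yeast metabolism under stress.",
    3: "Cell cycle checkpoints in human cells.",
}
index = build_index(docs)
print(sorted(search_and(index, ["yeast", "cell", "cycle"])))
```

The index is built once; each query then only intersects a few precomputed sets instead of scanning every document.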

20 stemming

21 word endings
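
A rough illustration of stemming by stripping word endings. The suffix list below is an assumption chosen for this example; it is a deliberately naive stand-in, not the Porter stemmer or whatever a real search engine uses.

```python
# Illustrative suffix list, checked in order (longest related forms first).
SUFFIXES = ("ations", "ation", "ated", "ates", "ating", "ate", "ing", "ed", "es", "s")

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["phosphorylated", "phosphorylation", "phosphorylates", "cyclins"]:
    print(w, "->", naive_stem(w))
```

Indexing the stems instead of the surface forms lets a query for one inflection retrieve the others, at the cost of occasionally merging unrelated words.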

22 dynamic query expansion

23 MeSH terms
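
Dynamic query expansion can be sketched as replacing each query term with an OR-group of the term and its synonyms. The `THESAURUS` table below is a hypothetical toy mapping used only for illustration; it is not an excerpt of MeSH, and the generated string is not guaranteed to match PubMed's own expansion.

```python
# Hypothetical toy thesaurus; real controlled vocabularies are far larger.
THESAURUS = {
    "yeast": ["yeasts", "saccharomyces cerevisiae"],
    "cell cycle": ["cell division cycle"],
}

def expand_query(terms, thesaurus=THESAURUS):
    """Turn each AND-ed term into an OR-group of the term plus its synonyms."""
    groups = []
    for term in terms:
        variants = [term] + thesaurus.get(term.lower(), [])
        quoted = " OR ".join(f'"{v}"' for v in variants)
        groups.append(f"({quoted})")
    return " AND ".join(groups)

print(expand_query(["yeast", "cell cycle"]))
```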

24 Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

25 no tool will find that

26 named entity recognition

27 computer

28 as smart as a dog

29 teach it specific tricks

30

31

32 identify the concepts

33 Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

34 comprehensive lexicon

35 proteins

36 chemicals

37 compartments

38 tissues

39 diseases

40 organisms

41 CDC2

42 cyclin dependent kinase 1

43 orthographic variation

44 upper- and lower-case

45 CDC2

46 Cdc2

47 spaces and hyphens

48 cyclin dependent kinase 1

49 cyclin-dependent kinase 1

50 prefixes and postfixes

51 CDC2

52 hCDC2

53 “black list”

54 SDS
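
The lexicon, orthographic variation, and black list ideas can be combined in a small dictionary-based tagger. This is a sketch under stated assumptions: the identifiers `P1` and `P2`, the prefix set, and the four-word window are all made up for the example, and a scalable implementation would use a far more efficient lookup structure.

```python
import re

# Hypothetical mini-lexicon: name -> made-up entity identifier.
LEXICON = {
    "CDC2": "P1",
    "cyclin dependent kinase 1": "P1",
    "SDS": "P2",
}
BLACKLIST = {"sds"}      # names that are usually not the entity in running text
PREFIXES = ("h", "m")    # assumed species-style prefixes, as in hCDC2
MAX_WORDS = 4            # longest name, in words, that is tried

def normalize(name):
    """Lower-case and drop spaces and hyphens."""
    return re.sub(r"[\s\-]+", "", name.lower())

LOOKUP = {normalize(n): i for n, i in LEXICON.items()
          if normalize(n) not in BLACKLIST}

def tag(text):
    """Return (matched text, identifier) pairs, longest match first."""
    tokens = [t.strip(".,;:()") for t in text.split()]
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            key = normalize(span)
            if key in BLACKLIST:
                continue
            ident = LOOKUP.get(key)
            if ident is None and key[:1] in PREFIXES:
                ident = LOOKUP.get(key[1:])  # retry without the prefix
            if ident is not None:
                found.append((span, ident))
                i += n - 1  # skip the rest of the matched span
                break
        i += 1
    return found

print(tag("hCDC2 and Cdc2 bind cyclin-dependent kinase 1 in SDS buffer."))
```

All three spellings resolve to the same identifier, while the black-listed "SDS" is skipped even though it is in the lexicon.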

55 scalable implementation

56 text corpora

57 >10 km <10 hours

58 most use Medline

59 ~22 million abstracts

60 few use full-text articles

61 no access

62 PDF files

63

64 layout-aware extraction

65 millions of full-text articles

66 information extraction

67 formalize the facts

68 Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

69 two approaches

70 co-mentioning

71 counting

72 within documents

73 within paragraphs

74 within sentences

75 co-mentioning score
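
One way to picture a co-mentioning score is a weighted count of how often two names occur together within documents, paragraphs, and sentences. The weights below are hypothetical and chosen only so that closer co-occurrence counts for more; they are not the scoring scheme of any real resource. Names are matched by plain substring search, which is a simplification.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical weights: tighter co-occurrence contributes more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}

def pairs_in(text, entities):
    """All unordered pairs of entities mentioned in `text` (substring match)."""
    lowered = text.lower()
    present = sorted(e for e in entities if e.lower() in lowered)
    return combinations(present, 2)

def comention_scores(documents, entities):
    """Sum weighted co-occurrence counts over documents, paragraphs, sentences."""
    scores = Counter()
    for doc in documents:
        paragraphs = [p for p in doc.split("\n\n") if p.strip()]
        sentences = [s for p in paragraphs
                     for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
        levels = (("document", [doc]),
                  ("paragraph", paragraphs),
                  ("sentence", sentences))
        for level, units in levels:
            for unit in units:
                for pair in pairs_in(unit, entities):
                    scores[pair] += WEIGHTS[level]
    return scores

# Hypothetical toy document with two paragraphs.
doc = "Cdc28 phosphorylated Swe1. Cdc5 was also studied.\n\nSwe1 is degraded."
print(comention_scores([doc], ["Cdc28", "Swe1", "Cdc5"]))
```

Here Cdc28 and Swe1 share a sentence and therefore score higher than the pairs that only share a paragraph.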

76 NLP: Natural Language Processing

77 grammatical analysis

78 part-of-speech tagging

79 multiword detection

80 semantic tagging

81 sentence parsing

82 Legend: gene and protein names; cue words for entity recognition; verbs for relation extraction. Example: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

83 extract stated facts

84 high precision

85 poor recall
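
The precision and recall trade-off can be illustrated with a much cruder technique than the grammatical analysis described in the slides: a couple of hand-written regular expressions. The patterns, the `NAME` expression, and the relation labels are assumptions made for this sketch.

```python
import re

# Crude stand-in for a gene/protein name such as HAP1 or Swe1.
NAME = r"[A-Z][A-Za-z]*\d+"

# (pattern, relation label, whether the first group is the agent)
PATTERNS = [
    (re.compile(rf"({NAME}) (?:directly )?phosphorylated ({NAME})"),
     "phosphorylates", True),
    (re.compile(rf"({NAME}) is controlled by ({NAME})"),
     "regulates", False),
]

def extract(sentence):
    """Return (agent, relation, target) triples that match a pattern exactly."""
    facts = []
    for regex, label, agent_first in PATTERNS:
        for a, b in regex.findall(sentence):
            facts.append((a, label, b) if agent_first else (b, label, a))
    return facts

print(extract("Cdc28 directly phosphorylated Swe1."))
print(extract("CYC7 is controlled by HAP1"))
print(extract("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) "
              "directly phosphorylated Swe1"))
```

The two simple sentences yield a fact each, whereas the real sentence from the earlier slides yields nothing because a parenthetical sits between the name and the verb. Rigid patterns rarely assert something the text does not say, but they miss most phrasings.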

86 Exercise: go to http://diseases.jensenlab.org; find TYMS disease associations; inspect the text-mining evidence; look for examples of synonym usage; find genes linked to colorectal cancer

87 thank you!

