Mining text and data on chemicals Lars Juhl Jensen.


1 Mining text and data on chemicals Lars Juhl Jensen

2 three parts

3 text mining

4 data integration

5 medical records

6 Part 1 text mining

7 exponential growth

8

9

10 some things are constant

11

12 ~45 seconds per paper

13 information retrieval

14 find the relevant papers

15 still too much to read

16 computer

17 as smart as a dog

18 teach it specific tricks

19

20

21 named entity recognition

22 identify the concepts

23 small molecules

24 proteins

25 diseases

26 comprehensive lexicon

27 synonyms

28 orthographic variation
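Orthographic variation is often handled by normalizing both lexicon entries and text to a canonical key before lookup. The sketch below assumes three illustrative rules: case folding, spelling out a few Greek letters, and dropping spaces and hyphens. The rule set and the `GREEK` table are hypothetical and are not taken from Reflect or STITCH.

```python
import re

# Hypothetical, deliberately small table; a real lexicon would cover all Greek letters.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "κ": "kappa"}

def normalize(name: str) -> str:
    """Map orthographic variants of a name to one lookup key."""
    key = name.lower()
    for letter, spelled in GREEK.items():
        key = key.replace(letter, spelled)
    # Treat spaces and hyphens as optional: "IL-6", "IL 6" and "IL6" collide.
    return re.sub(r"[\s\-]+", "", key)

# Index each synonym under its normalized key, then look up a variant.
lexicon = {normalize(s): "TNF" for s in ["TNF-alpha", "TNF alpha"]}
print(lexicon.get(normalize("TNF-α")))
```

Normalizing aggressively also makes unrelated names collide, which is one reason a black list of problematic names is needed next to the lexicon.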

29 “black list”

30 unfortunate names
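Putting the lexicon and the black list together gives a basic dictionary-based tagger. This is a minimal sketch assuming greedy longest-match lookup over word tokens; the identifiers (`CHEM:aspirin`, `PROT:COX2`, `PROT:WAS`) are made-up placeholders, and the function is not the Reflect implementation. The black list suppresses a name like "was" that is a valid gene symbol but far more often an ordinary word.

```python
import re

def tag(text: str, lexicon: dict[str, str], blacklist: set[str], max_words: int = 3):
    """Greedy longest-match dictionary tagger.

    lexicon maps lowercased names to identifiers; blacklist holds lowercased
    names that should never be tagged (e.g. names that are common words).
    """
    tokens = list(re.finditer(r"[\w\-]+", text))
    hits = []
    i = 0
    while i < len(tokens):
        step = 1
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            start, end = tokens[i].start(), tokens[i + n - 1].end()
            key = text[start:end].lower()
            if key in lexicon and key not in blacklist:
                hits.append((text[start:end], lexicon[key]))
                step = n
                break
        i += step
    return hits

# Hypothetical lexicon and black list for illustration only.
lexicon = {
    "aspirin": "CHEM:aspirin",
    "cyclooxygenase 2": "PROT:COX2",
    "cox-2": "PROT:COX2",
    "was": "PROT:WAS",
}
blacklist = {"was"}
hits = tag("Aspirin was shown to inhibit cyclooxygenase 2 (COX-2).", lexicon, blacklist)
print(hits)
```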

31 Reflect

32 augmented browsing

33 browser add-on

34 Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009; O’Donoghue et al., Journal of Web Semantics, 2010

35 Firefox

36 Internet Explorer

37 Google Chrome

38 Safari

39 Utopia Documents

40 web services

41 collaboration

42

43

44

45 SciVerse

46

47

48

49

50

51 information extraction

52 formalize the facts

53 co-mentioning
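Co-mentioning counts how often two tagged entities appear together. A minimal sketch, assuming each abstract has already been split into sentences and tagged, and assuming hypothetical weights that count a same-sentence co-mention more than a same-abstract one; the actual weighting used in the work on these slides may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: same-sentence co-mentions count more than same-abstract ones.
SENTENCE_WEIGHT = 1.0
DOCUMENT_WEIGHT = 0.2

def co_mention_scores(documents):
    """documents: list of abstracts; each abstract is a list of sentences,
    each sentence a set of entity identifiers found by the tagger."""
    scores = defaultdict(float)
    for sentences in documents:
        same_sentence = set()
        for entities in sentences:
            same_sentence.update(combinations(sorted(entities), 2))
        mentioned = set().union(*sentences)
        # Each pair is counted once per abstract, at its strongest level.
        for pair in combinations(sorted(mentioned), 2):
            scores[pair] += SENTENCE_WEIGHT if pair in same_sentence else DOCUMENT_WEIGHT
    return dict(scores)

docs = [
    [{"aspirin", "PTGS2"}, {"ALB"}],  # abstract 1: two sentences
    [{"aspirin"}, {"PTGS2"}],         # abstract 2: two sentences
]
scores = co_mention_scores(docs)
print(scores)
```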

54 NLP (Natural Language Processing)

55 Gene and protein names; cue words for entity recognition; verbs for relation extraction:
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
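The bracketed example on this slide is a chunked sentence: `nxpg` marks gene/protein names, `nxgene` and `nxexpr` mark larger phrases, and the verb phrase "is controlled by" carries the relation. The sketch below reads that notation into a tree and applies one hand-written pattern. The helper names and the single rule are hypothetical and far simpler than a real NLP pipeline; they only illustrate how a parse plus a cue verb yields directed relations.

```python
import re

def parse(text):
    """Parse bracketed chunks like "[ nxpg CYC1 and CYC7]" into (label, children) tuples."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_sequence():
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos] != "]":
            if tokens[pos] == "[":
                label = tokens[pos + 1]
                pos += 2
                children = parse_sequence()
                pos += 1  # consume the closing "]"
                items.append((label, children))
            else:
                items.append(tokens[pos])
                pos += 1
        return items

    return parse_sequence()

def names(node):
    """Collect words inside nxpg (gene/protein) chunks, skipping coordinators."""
    if isinstance(node, str):
        return []
    label, children = node
    found = []
    for child in children:
        if isinstance(child, str):
            if label == "nxpg" and child not in {"and", "or"}:
                found.append(child)
        else:
            found.extend(names(child))
    return found

def extract(sentence):
    """Apply one hypothetical rule: <chunk> is controlled by <chunk>."""
    items = parse(sentence)
    for i in range(len(items) - 4):
        target, w1, w2, w3, agent = items[i:i + 5]
        is_pattern = (w1, w2, w3) == ("is", "controlled", "by")
        if is_pattern and not isinstance(target, str) and not isinstance(agent, str):
            return [(a, "controls", t) for a in names(agent) for t in names(target)]
    return []

sentence = ("[ nxexpr The expression of [ nxgene the cytochrome genes "
            "[ nxpg CYC1 and CYC7]]] is controlled by [ nxpg HAP1]")
print(extract(sentence))
```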

56 Part 2 data integration

57 STITCH

58 Kuhn et al., Nucleic Acids Research, 2012

59 ~300,000 small molecules

60 ~2.6 million proteins

61 1100+ genomes

62 experimental data

63 physical binding

64 chemical–protein

65 protein–protein

66

67 curated knowledge

68 drug targets

69 complexes

70 pathways

71 Letunic & Bork, Trends in Biochemical Sciences, 2008

72 text mining

73 co-mentioning

74

75 NLP (Natural Language Processing)

76

77 many data types

78 many databases

79 different formats

80 different identifiers

81 variable quality

82 not comparable

83 spread over many genomes

84 quality scores

85 von Mering et al., Nucleic Acids Research, 2005

86 calibrate vs. gold standard

87 von Mering et al., Nucleic Acids Research, 2005

88 probabilistic scores
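Calibrating against a gold standard turns incomparable raw scores into probabilities. A minimal sketch, assuming simple histogram binning: for each score bin, the calibrated value is the fraction of pairs in that bin that are in the gold standard. The bin edges, toy scores and labels are made up; the cited papers may use a different fitting procedure.

```python
import bisect

def calibrate(scores, labels, edges):
    """Histogram calibration against a gold standard.

    scores: raw scores from one evidence type; labels: 1 if the pair is in the
    gold standard, else 0; edges: sorted bin boundaries. Returns a function that
    maps a raw score to the fraction of gold-standard pairs in its bin.
    """
    n_bins = len(edges) + 1
    counts = [0] * n_bins
    positives = [0] * n_bins
    for score, label in zip(scores, labels):
        b = bisect.bisect_right(edges, score)
        counts[b] += 1
        positives[b] += label
    # Empty bins get 0.0 here; a real pipeline might smooth or merge bins instead.
    precision = [p / c if c else 0.0 for p, c in zip(positives, counts)]
    return lambda score: precision[bisect.bisect_right(edges, score)]

to_probability = calibrate(
    scores=[0.1, 0.2, 0.3, 0.6, 0.7, 0.9],
    labels=[0, 0, 1, 1, 1, 0],
    edges=[0.5],
)
print(to_probability(0.25), to_probability(0.8))
```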

89 orthology transfer
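Orthology transfer reuses evidence from well-studied organisms for the 1100+ genomes that have little data of their own. A minimal sketch, assuming a precomputed ortholog table and a hypothetical `penalty` factor that down-weights transferred evidence; the protein names are placeholders, and STITCH's actual transfer scheme is more involved than this.

```python
def transfer_by_orthology(interactions, orthologs, penalty=0.8):
    """Project scored interactions from a source species onto a target species.

    interactions: {(protein_a, protein_b): score} in the source species.
    orthologs: {source_protein: [target_proteins]}.
    penalty: hypothetical down-weighting for evidence that was transferred.
    """
    transferred = {}
    for (a, b), score in interactions.items():
        for ta in orthologs.get(a, []):
            for tb in orthologs.get(b, []):
                if ta == tb:
                    continue  # skip self-pairs created by shared orthologs
                key = tuple(sorted((ta, tb)))
                # Keep the best-supported transfer when several source pairs map here.
                transferred[key] = max(transferred.get(key, 0.0), score * penalty)
    return transferred

source = {("yA", "yB"): 0.9}                     # hypothetical source-species pair
ortho = {"yA": ["hA"], "yB": ["hB1", "hB2"]}     # hypothetical ortholog table
target = transfer_by_orthology(source, ortho)
print(target)
```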

90 combine the evidence
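Once each evidence type has a probabilistic score, the scores can be combined. A minimal sketch, assuming the channels are independent, so the combined score is the probability that at least one channel is correct: S = 1 − ∏(1 − sᵢ). This is the standard noisy-OR form; the actual STITCH scoring may include further corrections, and the channel values below are made up.

```python
import math

def combine(channel_scores):
    """Probability that at least one channel is right, assuming independent channels."""
    return 1.0 - math.prod(1.0 - s for s in channel_scores)

# Made-up scores for, e.g., experiments, curated databases and text mining.
print(combine([0.5, 0.5, 0.2]))
```

Because each factor (1 − sᵢ) is at most 1, adding a channel can only raise the combined score, which matches the intuition that independent lines of evidence reinforce each other.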

91 Part 3 patient records

92 a hard problem

93 in Danish

94 by busy doctors

95 about psychiatric patients

96 no lexicon

97 acronyms

98 typos

99 delusions

100 domain specific system

101 patient record excerpt

102 F20, F200, Negation, Family
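The annotations on this excerpt distinguish a diagnosis that applies to the patient from one that is negated or that refers to a family member. A minimal sketch of that idea, assuming NegEx-style cue words in a fixed window before the term. The cue lists are hypothetical English placeholders (the records on the slides are in Danish and would need their own cues), and this is not the domain-specific system described here.

```python
import re

# Hypothetical cue lists in English; the records on the slides are Danish.
NEGATION_CUES = {"no", "not", "denies", "without"}
FAMILY_CUES = {"mother", "father", "sister", "brother"}

def classify_mention(sentence, term, window=4):
    """Return flags for the first mention of `term`, or None if it is absent.

    negated: a negation cue occurs within `window` words before the term.
    family: a family-member word occurs anywhere in the sentence.
    """
    words = re.findall(r"\w+", sentence.lower())
    target = term.lower().split()
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            before = set(words[max(0, i - window):i])
            return {
                "negated": bool(before & NEGATION_CUES),
                "family": bool(set(words) & FAMILY_CUES),
            }
    return None

print(classify_mention("Patient denies auditory hallucinations.", "auditory hallucinations"))
print(classify_mention("His mother was diagnosed with schizophrenia.", "schizophrenia"))
```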

103 medication

104 adverse drug events

105 diagnoses

106 pharmacovigilance

107 patient stratification

108 Roque et al., PLoS Computational Biology, 2011

109 disease comorbidity

110 Roque et al., PLoS Computational Biology, 2011
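One common way to quantify comorbidity from coded diagnoses is the relative risk: how often two diagnoses co-occur in the same patient compared with what independence would predict. The sketch below illustrates that measure on a made-up cohort; whether Roque et al. used exactly this statistic is not stated on the slides.

```python
from collections import Counter
from itertools import combinations

def relative_risk(patients):
    """patients: list of sets of diagnosis codes, one set per patient.

    RR(a, b) = C_ab * N / (P_a * P_b): observed co-occurrence divided by the
    count expected if the two diagnoses were independent.
    """
    n = len(patients)
    prevalence = Counter()
    joint = Counter()
    for codes in patients:
        prevalence.update(codes)
        joint.update(combinations(sorted(codes), 2))
    return {
        (a, b): c * n / (prevalence[a] * prevalence[b])
        for (a, b), c in joint.items()
    }

# Toy cohort with made-up diagnosis sets.
cohort = [{"F20", "F10"}, {"F20", "F10"}, {"F20"}, {"F32"}]
rr = relative_risk(cohort)
print(rr)
```

An RR above 1 means the pair co-occurs more often than chance; with small cohorts such ratios are noisy, so a significance test is usually reported alongside.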

111 DNA sequencing

112 genotype

113 phenotype

114 Acknowledgments
Reflect: Sune Frankild, Heiko Horn, Evangelos Pafilis, Juan-Carlos Silla-Castro, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue
STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Manuel Stark, Christian von Mering, Peer Bork
EPJ-mining: Francisco S Roque, Peter B Jensen, Robert Eriksson, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Thomas Werge, Søren Brunak

115 larsjuhljensen

