Published by Marianna Cobb, modified over 9 years ago
1
Biosemantics group
Martijn Schuemie
2
Overview
- The biosemantics group
- Ontology assembly
- Concept tagging
- Homonym disambiguation
- Concept profile creation
- Nucleolus
3
Biosemantics group
Erasmus MC University Medical Center Rotterdam, Department of Medical Informatics
Biosemantics group members:
- Jan Kors
- Barend Mons
- Erik van Mulligen
- Martijn Schuemie
- Rob Jelier
- Kristina Hettne
- Antoinne van Veldhoven
4
Biosemantics group
Molecular biology:
- High-throughput experiment data (genomics and proteomics)
- Gene and protein databases, MEDLINE, Gene Ontology
Biosemantics:
- Concept-based text mining
- Interpretation of experiment data
- Knowledge discovery
5
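Ontology assembly
1. Combine the terms from Entrez Gene, Swiss-Prot, and HUGO.
2. Add spelling variations, e.g. ABC1 -> ABC-1, DEF3 -> DEF-III.
3. Remove highly ambiguous terms, e.g. CO2, membrane-bound, obesity, open reading frame.
Performance (P = precision, R = recall): P=37%, R=76% before removing the ambiguous terms; P=50%, R=75% after.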
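A minimal Python sketch of the three assembly steps, assuming the source thesauri have already been keyed by a shared concept identifier (mapping identifiers between Entrez Gene, Swiss-Prot, and HUGO is not shown). The variant rules and the helper names `spelling_variants` and `assemble_thesaurus` are hypothetical; the slide only gives the two examples above.

```python
import re

# Hypothetical variant rules, generalised from the two slide examples
# (ABC1 -> ABC-1, DEF3 -> DEF-III); the group's actual rules may differ.
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V",
         6: "VI", 7: "VII", 8: "VIII", 9: "IX", 10: "X"}


def spelling_variants(term: str) -> set:
    """Return the term plus hyphenated and Roman-numeral variants."""
    variants = {term}
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", term)
    if match:
        letters, digits = match.groups()
        variants.add(f"{letters}-{digits}")          # ABC1 -> ABC-1
        roman = ROMAN.get(int(digits))
        if roman:
            variants.add(f"{letters}-{roman}")       # DEF3 -> DEF-III
            variants.add(f"{letters}{roman}")        # DEF3 -> DEFIII
    return variants


def assemble_thesaurus(sources, ambiguous):
    """Merge synonym sets per concept, add variants, drop ambiguous terms."""
    merged = {}
    for source in sources:                           # e.g. Entrez Gene, Swiss-Prot, HUGO
        for concept_id, terms in source.items():
            merged.setdefault(concept_id, set()).update(terms)
    blocked = {t.lower() for t in ambiguous}
    thesaurus = {}
    for concept_id, terms in merged.items():
        kept = set()
        for term in terms:
            if term.lower() in blocked:
                continue                             # drop the term and all its variants
            kept |= {v for v in spelling_variants(term) if v.lower() not in blocked}
        thesaurus[concept_id] = kept
    return thesaurus
```

Blocking a term before expanding it means that an ambiguous term such as CO2 cannot re-enter the thesaurus as CO-2 or CO-II.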
6
Concept tagging
MEDLINE text: "Malaria fever is a disease. It is spread by mosquitos."
1. Sentence splitting: [Malaria fever is a disease.] [It is spread by mosquitos.]
2. Tokenization: [Malaria] [fever] [is] [a] [disease]
3. Word normalisation: [malaria] [fever] [be] [a] [disease]
4. Concept mapping: [malaria fever] -> C24530, [disease] -> C12634
5. Homonym disambiguation: PSA -> Prostate Specific Antigen or Poultry Science Association?
Result: concept profile of the text
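Steps 1 to 4 of the pipeline can be sketched in a few lines of Python. Everything here is a toy assumption: a regex sentence splitter, a three-entry lemma table standing in for real word normalisation, and a two-entry thesaurus reusing the concept IDs from the example; homonym disambiguation is left out.

```python
import re

# Toy stand-ins (assumptions): a tiny lemma table for word normalisation and a
# two-entry thesaurus that reuses the concept IDs from the slide example.
LEMMAS = {"is": "be", "are": "be", "was": "be"}
THESAURUS = {("malaria", "fever"): "C24530", ("disease",): "C12634"}
MAX_TERM_LEN = max(len(term) for term in THESAURUS)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    return re.findall(r"[A-Za-z0-9-]+", sentence)


def normalise(tokens):
    """Lower-case each token and map known inflections to their lemma."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def map_concepts(tokens):
    """Greedy longest-match lookup of token n-grams in the thesaurus."""
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            concept = THESAURUS.get(tuple(tokens[i:i + n]))
            if concept:
                found.append((" ".join(tokens[i:i + n]), concept))
                i += n
                break
        else:                      # no thesaurus term starts at position i
            i += 1
    return found


def tag(text):
    return [hit for sentence in split_sentences(text)
            for hit in map_concepts(normalise(tokenize(sentence)))]


print(tag("Malaria fever is a disease. It is spread by mosquitos."))
# [('malaria fever', 'C24530'), ('disease', 'C12634')]
```

Longest match first ensures that "malaria fever" is mapped as one concept rather than stopping at a shorter term.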
7
Homonym disambiguation
Some simple rules:
1. Is it likely that a term has multiple meanings?
   - 3-letter acronyms (e.g. PSA): highly likely
   - long forms (e.g. Prostate Specific Antigen): highly unlikely
   - terms that refer to several concepts by definition
2. Is a synonym found? (e.g. “KLK3 (PSA)”)
3. Is a keyword found? (e.g. “PSA is secreted by the prostate”)
These simple rules change performance from P=50%, R=75% to P=71%, R=71%.
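A sketch of the three rules as a filter on a candidate term-to-concept mapping. Assumptions: "likely ambiguous" is approximated as a three-character all-caps token or membership in a supplied set of multi-concept terms, and synonym and keyword lookup are plain case-insensitive substring checks; the function names are hypothetical.

```python
import re


def likely_ambiguous(term, multi_concept_terms):
    """Rule 1: is this term likely to have several meanings?"""
    if term in multi_concept_terms:          # refers to several concepts by definition
        return True
    # Three-letter acronyms such as PSA are suspect; long forms are not.
    return re.fullmatch(r"[A-Z0-9]{3}", term) is not None


def accept_mapping(term, text, synonyms, keywords, multi_concept_terms=frozenset()):
    """Keep the mapping unless the term is suspect and nothing supports it."""
    if not likely_ambiguous(term, multi_concept_terms):
        return True
    lowered = text.lower()
    # Rule 2: another synonym of the same concept occurs in the text.
    if any(s.lower() in lowered for s in synonyms if s != term):
        return True
    # Rule 3: a keyword associated with the concept occurs in the text.
    return any(k.lower() in lowered for k in keywords)
```

For example, with synonyms {"PSA", "KLK3", "Prostate Specific Antigen"} and the keyword "prostate", the mapping PSA -> Prostate Specific Antigen is kept in "KLK3 (PSA)" and in "PSA is secreted by the prostate", but rejected in a text about a PSA annual meeting.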
8
Homonym disambiguation
Compare the concept profile of a text containing PSA (unknown meaning) with the concept profile of each candidate meaning:
- Prostate Specific Antigen
- Phosphoserine Aminotransferase
Which is most similar?
Previous tests showed an overall accuracy of 93%.
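One way to implement the comparison, assuming profiles are sparse concept-to-weight dictionaries and that similarity is the cosine of the two weight vectors (the slide does not name the measure). The weights in the example are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse profiles {concept: weight}."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(text_profile, candidate_profiles):
    """Return the candidate meaning whose profile best matches the text."""
    return max(candidate_profiles,
               key=lambda meaning: cosine(text_profile, candidate_profiles[meaning]))


# Invented weights, for illustration only.
text = {"prostate": 0.8, "serum": 0.3, "neoplasms": 0.4}
candidates = {
    "Prostate Specific Antigen": {"prostate": 0.9, "neoplasms": 0.5, "serum": 0.2},
    "Phosphoserine Aminotransferase": {"serine": 0.9, "amino acid metabolism": 0.6},
}
print(disambiguate(text, candidates))   # Prostate Specific Antigen
```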
9
Concept profile creation
- Text -> concept profile of the text
- Concept -> associated texts -> concept profile of the concept
Texts are associated with a concept:
- From databases
- By concept mapping
10
Concept profile creation
Concept weighting options:
- Binary
- Log likelihood
- XIDF
- Uncertainty coefficient
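The slide lists the weighting options without formulas. The sketch below assumes the textbook definitions, computed from a 2x2 table of document counts [[n11, n12], [n21, n22]], with rows "document linked to the target concept / not linked" and columns "concept c mentioned / not mentioned". XIDF is omitted because its definition is not given here, and normalising the uncertainty coefficient by the entropy of the column variable is one of two possible directions.

```python
import math

# Table layout (assumption): [[n11, n12], [n21, n22]]
#   rows:    document linked to the target concept / not linked
#   columns: concept c mentioned / not mentioned


def binary_weight(table):
    """1 if concept c occurs in any linked document, else 0."""
    return 1.0 if table[0][0] > 0 else 0.0


def log_likelihood(table):
    """G^2 = 2 * sum(O * ln(O / E)) over the four cells."""
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:                       # 0 * ln(0) is taken as 0
                expected = rows[i] * cols[j] / total
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(table):
    """U = I(X;Y) / H(Y), where Y is the column (concept mentioned) variable."""
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    cells = [c for r in table for c in r]
    mutual_info = entropy(rows) + entropy(cols) - entropy(cells)
    h_cols = entropy(cols)
    return mutual_info / h_cols if h_cols > 0 else 0.0
```

Both the log likelihood and the uncertainty coefficient are 0 when the concept is mentioned equally often inside and outside the linked documents; the uncertainty coefficient reaches 1 when mentioning the concept perfectly predicts the link.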
11
Concept profile creation
Profile of gene ESR1 (concept: weight):
- estrogen receptor: 1
- breast neoplasm: 0.5
- BRCA1: 0.34
- PGR: 0.30
- Estrogen: 0.28
- BRCA2: 0.25
- TP53: 0.15
- gene suppressor tumor: 0.12
- genetics polymorphism: 0.12
- genetic predisposition to disease: 0.10
- female: 0.05
12
Concept profile comparison
13
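| Concept Name | Weight | RAB27B | MYRIP | MLPH | RAB27A |
|---|---|---|---|---|---|
| RAB27A | 52.17 | 0.61 | 0.74 | 0.73 | 1 |
| MLPH | 11.16 | - | 0.44 | 1 | 0.29 |
| Myosin Type V | 7.22 | 0.04 | 0.68 | 0.4 | 0.22 |
| Melanosomes | 6.7 | 0.12 | 0.3 | 0.47 | 0.27 |
| RAB27B | 4.06 | 1 | 0.14 | - | 0.11 |
| MYRIP | 2.98 | 0.07 | 1 | 0.09 | 0.06 |
| Melanocytes | 2.73 | 0.13 | 0.14 | 0.28 | 0.17 |
| Myosins | 2.33 | 0.04 | 0.38 | 0.22 | 0.12 |
| Myosin Heavy Chains | 1.72 | - | 0.46 | 0.18 | 0.09 |
| GTP Phosphohydrolases | 1.31 | 0.17 | 0.23 | 0.04 | 0.08 |
| Actins | 1.17 | 0.05 | 0.32 | 0.12 | 0.06 |
| Exocytosis | 0.87 | 0.08 | 0.12 | 0.08 | 0.12 |
| Secretory Vesicles | 0.68 | 0.07 | 0.16 | 0.06 | 0.09 |
| Carrier Proteins | 0.59 | - | 0.11 | 0.17 | 0.09 |
| Organelles | 0.54 | 0.11 | - | 0.12 | 0.09 |
| rab GTP-Binding Proteins | 0.52 | 0.16 | - | 0.04 | 0.12 |
(- = no value)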
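A minimal sketch of one way to compare two concept profiles and see which shared concepts drive the match: an inner product of the weights, with the per-concept contributions ranked. This is an illustrative assumption, not necessarily how the figures in the table were computed, and the profiles in the example are hypothetical.

```python
def compare_profiles(a, b, top=3):
    """Inner-product match of two sparse profiles plus its largest contributions."""
    contributions = {c: a[c] * b[c] for c in a.keys() & b.keys()}
    score = sum(contributions.values())
    drivers = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return score, drivers[:top]


# Hypothetical profiles with invented weights.
gene_x = {"concept_a": 0.5, "concept_b": 0.2, "concept_c": 0.1}
gene_y = {"concept_a": 0.4, "concept_c": 0.3, "concept_d": 0.9}
score, drivers = compare_profiles(gene_x, gene_y)
```

Listing the drivers next to the score makes a match explainable: here concept_a accounts for most of the overlap between the two profiles.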
14
Nucleolus
- Main function: ribosome biogenesis
- Over 700 proteins identified and classified into 8 main categories
15
Nucleolus – Concept profiles
Protein -> linked MEDLINE articles (links from databases) -> concept profile of each text -> concept profile of the protein
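A sketch of building a protein profile from the MEDLINE articles linked to it. It assumes each article has already been concept-tagged and, as a placeholder for the weighting schemes discussed earlier, scores each concept by the fraction of linked articles that mention it; the PMIDs and concept IDs are hypothetical.

```python
from collections import Counter


def protein_profile(linked_pmids, tagged_articles):
    """Fraction of the protein's linked articles that mention each concept."""
    if not linked_pmids:
        return {}
    mentions = Counter()
    for pmid in linked_pmids:
        mentions.update(set(tagged_articles[pmid]))   # count each concept once per article
    return {concept: n / len(linked_pmids) for concept, n in mentions.items()}


# Hypothetical tagging output: PMID -> concept IDs found in that abstract.
tagged = {"1001": ["C1", "C2", "C1"], "1002": ["C2", "C3"]}
profile = protein_profile(["1001", "1002"], tagged)
```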
16
Nucleolus – Concept profiles
BLAST (Basic Local Alignment Search Tool)
- Query: nucleolar protein
- Results: homologs in human, mouse, fruit fly, and yeast
17
Nucleolus – Concept profiles
18
Nucleolus – fun with protein profiles
- 2D visualization of high-dimensional space
- Automatic functional annotation of proteins
- Finding similar proteins
19
Nucleolus – visualisation
Multi-Dimensional Scaling (MDS) plot of the protein profiles; labelled points include SRP, PARN, Exosome comp. 10, O43390, P98179, and Q8N220.
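For the 2D view, a classical (Torgerson) MDS sketch in Python with NumPy. It assumes a matrix of pairwise distances between protein profiles, for example 1 minus cosine similarity; the slide does not say which MDS variant or distance was used.

```python
import numpy as np


def classical_mds(distances, dims=2):
    """Classical MDS: coordinates whose Euclidean distances approximate `distances`."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering      # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)             # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]              # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))   # negative eigenvalues -> 0
    return eigvecs[:, top] * scale


# Hypothetical similarities between three protein profiles, turned into distances.
similarity = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])
coords = classical_mds(1.0 - similarity)               # one (x, y) row per protein
```

When the distances really come from points in a plane, classical MDS recovers them exactly up to rotation and reflection; for profile distances it gives the best 2D approximation in the least-squares sense on the centred Gram matrix.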
20
Nucleolus – Assigning GO terms
GO term -> linked MEDLINE articles (links from GO) -> concept profile of each text -> concept profile of the GO term
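A sketch of annotation by ranking: compare a protein's concept profile with each GO term profile and keep the best-scoring terms. The cosine measure, the cut-off `k`, and the `min_score` threshold are assumptions, and the profiles in the example are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse profiles {concept: weight}."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_go_terms(protein_profile, go_profiles, k=3, min_score=0.0):
    """Return up to k (GO term, score) pairs scoring above min_score, best first."""
    scored = [(term, cosine(protein_profile, profile))
              for term, profile in go_profiles.items()]
    kept = [pair for pair in scored if pair[1] > min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Invented profiles, for illustration only.
protein = {"rRNA processing": 0.9, "nucleolus": 0.7}
go = {
    "ribosome biogenesis": {"rRNA processing": 0.8, "ribosomes": 0.5},
    "mRNA splicing": {"spliceosomes": 0.9, "RNA splicing": 0.6},
    "nucleolus": {"nucleolus": 1.0, "rRNA processing": 0.3},
}
annotations = assign_go_terms(protein, go)
```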
21
Nucleolus – Assigning GO terms
AUC: area under the curve
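Assuming the AUC here is the area under the ROC curve, it can be computed without drawing the curve: it equals the probability that a randomly chosen correct GO term is scored above a randomly chosen incorrect one, with ties counting one half. A quadratic-time sketch, adequate for small evaluation sets:

```python
def roc_auc(positive_scores, negative_scores):
    """AUC via pairwise comparison (Mann-Whitney formulation); ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

An AUC of 1 means every correct term outranks every incorrect one; 0.5 is what random scores would give on average.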
22
Nucleolus – Assigning GO terms
'Mistakes' in automatic annotation:
1. Manual assignment to one category only, e.g. SFRS protein kinase 1 plays a role in splicing, but is also a kinase
2. Assumptions do not always hold:
   - Sequence homology ≠ function homology
   - Concept co-occurrence ≠ functional relationship
3. Homonyms
23
Nucleolus – Finding new proteins
Concept profile of a nucleolar protein compared with the concept profiles of other human proteins
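A sketch of the candidate search, assuming each human protein is scored by its best cosine match to any known nucleolar protein (comparing against a centroid of the nucleolar profiles would be an alternative). Protein names, concepts, and weights in the example are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse profiles {concept: weight}."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(known_profiles, candidate_profiles, top=10):
    """Rank candidates by their best match to any known nucleolar protein."""
    scores = {
        name: max(cosine(profile, ref) for ref in known_profiles.values())
        for name, profile in candidate_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical profiles, for illustration only.
known = {
    "known_1": {"rRNA processing": 1.0, "nucleolus": 0.8},
    "known_2": {"ribosomes": 1.0},
}
candidates = {
    "cand_a": {"rRNA processing": 0.9, "nucleolus": 0.5},
    "cand_b": {"protein kinases": 1.0},
}
ranking = rank_candidates(known, candidates)
```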
24
Nucleolus – Finding new proteins
Candidate proteins:
- 60S ribosomal protein L3-like
- Probable ATP-dependent RNA helicase DDX4
- ATP-dependent RNA helicase DDX3Y
- Guanine nucleotide binding protein-like 3
- Importin-11 (importin beta family)
- Putative Brix domain containing protein 1P
- Probable ATP-dependent RNA helicase DDX20 (Gemin 3)
- 60S acidic ribosomal protein P0
- Helicase SKI2W
- ATP-dependent RNA helicase DDX39
- 40S ribosomal protein S20
- Probable ATP-dependent RNA helicase DDX6
- Probable ATP-dependent RNA helicase DDX23
- Double-stranded RNA-binding protein Staufen homolog 1
- ATP-dependent RNA helicase DDX25
- Probable nucleolar complex protein 14
- Eukaryotic initiation factor 4A-II
- ATP-dependent RNA helicase DDX19B
- 40S ribosomal protein S3
Annotations shown alongside the list (in slide order): Ribosomal protein, DEAD-box, Found in nucleolus, Associated with nucleolar p., DEAD-box, Found in nucleolus, DEAD-box, Ribosomal protein, DEAD-box, Indirect evidence, DEAD-box, Nucleolar, DEAD-box, Ribosomal protein