
Linking Diseases and Genes through Informatics Knowledge Bases and Ontologies Joyce A. Mitchell, Ph.D. National Library of Medicine University of Missouri.


1 Linking Diseases and Genes through Informatics Knowledge Bases and Ontologies Joyce A. Mitchell, Ph.D. National Library of Medicine University of Missouri

2 Research Collaborators
– Olivier Bodenreider, M.D., Ph.D.
– Alexa T. McCray, Ph.D.
– Allen C. Browne

3 Research Goals
Investigating methods of connecting disease and genomic information. Overall goals are to:
– Overcome difficulties traversing multiple information resources
– Examine coverage of Unified Medical Language System® (UMLS®), Gene Ontology™ (GO), LocusLink-OMIM
– Develop methods to use ontologies more effectively
– Present data in an understandable manner

4 Background – UMLS
Developed and maintained by NLM
Purpose: facilitate retrieval & integration of information from multiple biomedical sources
Interrelates 60 biomedical terminologies
– MeSH, SNOMED, Read Codes, ICD, etc.
– No vocabulary focused on molecular biology
1.5 million English terms; 800,000 concepts; 134 Semantic Types; 54 Semantic Relationships

5 Background – Gene Ontology
Developed and maintained by the GO Consortium
Purpose:
– Promotes cross-species methodologies for functional comparisons
– Allows annotation of molecular information on genes, gene products
– “an essential start to creating a shared language of biology” **
Focused on
– molecular function (5626 terms)
– biological processes (4677 terms)
– cellular components (1077 terms)
Two semantic relations (is-a and part-of)
** Genome Research 2001; 11:1425-33.

6 Background – LocusLink
– Curated, gene-centered resource of the National Center for Biotechnology Information (NLM)
– Gene names, gene product names, gene product functions, and reference sequences (DNA, RNA, protein)
– Associates phenotype (diseases) to the genotype via Online Mendelian Inheritance in Man (OMIM)
– Online links to major bioinformatics knowledge bases and the literature

7 Specific Questions
This study looked at coverage in UMLS of
1. 1244 genes associated with human diseases
2. 1702 diseases associated with the genes
3. 11,380 Gene Ontology terms
4. 38,832 genes/gene products in GO database (141,071 names)
5. Associations of genes and their functions in UMLS
6. Representation of gene function in GO compared to the UMLS

8 Methods
LocusLink query:
– human genes whose sequence is known and associated with disease (1244 loci)
LocusLink data:
– Genes/gene products (official names, synonyms, symbols)
– Phenotypes (diseases) (1702 diseases)
GO data:
– all concepts (ontology terms), excluding obsolete terms (11,380 terms)
– Gene products from all species (134,646 unique names, 38,832 genes)

9 Methods
LocusLink and GO terms mapped to UMLS concepts
– normalization used
– mappings constrained by semantic type
LocusLink loci studied for relationships in UMLS (GP = gene product)
– Gene/GP – phenotype
– Gene/GP – molecular function
– Gene/GP – biological process
– Gene/GP – cellular component
For specific genes, annotations in GO were compared to the representation in UMLS
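The mapping step on this slide (normalize a term, look it up, keep only concepts of an acceptable semantic type) can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: the `normalize` function is a crude stand-in for lexical normalization (lowercasing, punctuation removal, word sorting), and the index contents and the `TOY:` concept identifiers are invented for the example.

```python
import re

def normalize(term: str) -> str:
    """Crude stand-in for lexical normalization: lowercase,
    drop punctuation, and sort the words alphabetically."""
    words = re.sub(r"[^a-z0-9]+", " ", term.lower()).split()
    return " ".join(sorted(words))

# Hypothetical miniature index: normalized string -> [(concept id, semantic type)].
# The TOY: identifiers are made up; they are not real UMLS concept identifiers.
toy_index = {
    normalize("Cystic Fibrosis"): [("TOY:0001", "Disease or Syndrome")],
    normalize("Marfan Syndrome"): [("TOY:0002", "Disease or Syndrome")],
    normalize("dystrophin"): [("TOY:0003", "Amino Acid, Peptide, or Protein")],
}

def map_term(term, allowed_types, index):
    """Return concept ids whose normalized name matches `term`
    and whose semantic type is in `allowed_types`."""
    candidates = index.get(normalize(term), [])
    return [cid for cid, stype in candidates if stype in allowed_types]

# Word order and punctuation no longer matter after normalization.
disease_hits = map_term("Fibrosis, Cystic", {"Disease or Syndrome"}, toy_index)

# The semantic-type constraint rejects a match of the wrong kind.
rejected = map_term("dystrophin", {"Disease or Syndrome"}, toy_index)
```

Constraining by semantic type is what keeps a disease name from being mapped to, say, a protein concept that happens to share the same string.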

10 Results - 1
For 1244 genes from LocusLink – 18% found in the UMLS
Official gene name      20%   244/1244
Official gene symbol    16%   200/1244
Alias symbol            15%   394/2669
Gene product            18%   266/1460
Preferred product       18%   266/1460
Alias protein           24%   339/1425

11 Results - 2
For 1702 phenotypes (diseases) corresponding to 1244 genes – 34% found in the UMLS (575/1702)
Most frequent single gene diseases covered
– Huntington Disease
– Cystic Fibrosis
– Marfan Syndrome
– Phenylketonuria
– Achondroplasia

12 Results - 3
GO terms found in MeSH      2764 terms
GO terms found in SNOMED    1366 terms
GO terms found overall: 27% (3062/11,380)
Molecular function    44%   2435/5626
Biological process     5%   256/4677
Cellular component    35%   370/1077

13 Results - 4
For 134,646 unique gene names in GO database
Full name    11%   4392/38,832
Symbol        2%   1167/60,381
Synonym       6%   1964/35,433

14 Results - 5
LocusLink – UMLS Relationship Categories found overall: 72%
Genes & gene products and:
Phenotype       64%   754/1182
M. Function     85%   1192/1409
B. Process      61%   762/1240
C. Component    76%   841/1107
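The 72% overall figure is consistent with pooling the four category counts on this slide. The short check below assumes that the overall rate is simply total found divided by total examined, which the slide does not state explicitly.

```python
# (found, denominator) pairs as given on the slide
categories = {
    "Phenotype": (754, 1182),
    "M. Function": (1192, 1409),
    "B. Process": (762, 1240),
    "C. Component": (841, 1107),
}

def percent(found: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * found / total)

per_category = {name: percent(f, t) for name, (f, t) in categories.items()}

# Assumption: the overall rate pools all four categories.
found_all = sum(f for f, _ in categories.values())
total_all = sum(t for _, t in categories.values())
overall = percent(found_all, total_all)
```

Under that assumption, 3549 of 4938 pooled pairs have a relationship, which rounds to the reported 72%.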

15 Results - 5
Type of Relationship
Associative      613
Co-occurrence    3353
Hierarchical     1168
G/GP and        Assoc   Co-oc   Hier
Phenotype         275     724      5
M. Function       206    1069    933
B. Process         57     737    147
C. Component       75     823     83
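As a consistency check on this breakdown, the per-category counts should add up to the totals given for each relationship type. The sketch below reads each row as (associative, co-occurrence, hierarchical); the variable names are illustrative only.

```python
# Rows: gene/gene product paired with each category,
# columns: (associative, co-occurrence, hierarchical)
rows = {
    "Phenotype": (275, 724, 5),
    "M. Function": (206, 1069, 933),
    "B. Process": (57, 737, 147),
    "C. Component": (75, 823, 83),
}

# Totals reported on the slide for each relationship type
reported = {"Associative": 613, "Co-occurrence": 3353, "Hierarchical": 1168}

# Sum each column across the four categories
column_sums = tuple(sum(col) for col in zip(*rows.values()))
matches = column_sums == tuple(reported.values())
```

The column sums reproduce the reported totals, and co-occurrence relationships dominate in every category except molecular function, where hierarchical links are nearly as common.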

16 Results - 6
Representation of gene function in GO compared to the UMLS

17 Neurofibromin 2 – merlin in GO


20 Discussion

21 Best & Worst Mappings
Best mapping categories
Molecular function (GO)    44%
Cellular component (GO)    35%
Phenotype (LL)             34%
Worst mapping categories
Gene synonym (GO)           6%
Biological process (GO)     5%
Gene symbol (GO)            2%

22 Only 34% of diseases?
In OMIM-LL, diseases are subdivided by genetic cause; in UMLS they are not.
E.g. Limb Girdle Muscular Dystrophy (LGMD)
LGMD is represented in UMLS
– as a SNOMED term
– in MeSH, as an entry term for muscular dystrophies
MeSH notes for MD: A general term for a group of inherited disorders which are characterized by progressive degeneration of skeletal muscles (ed, 2000)

23 Limb Girdle Muscular Dystrophy – genetic types
LGMD type   Gene Name
1A          Myotilin
1B          Lamin A/C
1C          Caveolin-3
1D          Unknown
2A          Calpain-3
2B          Dysferlin
2C          Sarcoglycan-gamma
2D          Sarcoglycan-alpha
2E          Sarcoglycan-beta
2F          Sarcoglycan-delta
2G          Telethonin
2H          TRIM32
2I          Fukutin-related protein

24 Only 5% of Biological Processes?
Only 256 of the biological processes mapped to terms in UMLS.
In GO, processes are elaborated & organism specific
Example:
UMLS
– Mitotic spindle
GO
– Mitotic spindle assembly
– Mitotic spindle assembly (sensu Saccharomyces)
– Mitotic spindle assembly (sensu Fungi)
– Mitotic spindle checkpoint
– Mitotic spindle elongation
– Mitotic spindle orientation
– Mitotic spindle positioning
– Mitotic spindle positioning and orientation
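The mitotic spindle example shows why whole-term matching finds so few biological processes. The sketch below uses only the terms listed on this slide and compares an exact (case-insensitive) match against a looser prefix match; the prefix match is shown purely for contrast and is not a method the study reports using.

```python
umls_term = "Mitotic spindle"

go_terms = [
    "Mitotic spindle assembly",
    "Mitotic spindle assembly (sensu Saccharomyces)",
    "Mitotic spindle assembly (sensu Fungi)",
    "Mitotic spindle checkpoint",
    "Mitotic spindle elongation",
    "Mitotic spindle orientation",
    "Mitotic spindle positioning",
    "Mitotic spindle positioning and orientation",
]

key = umls_term.lower()

# Whole-term match: no GO process is literally "mitotic spindle".
exact = [t for t in go_terms if t.lower() == key]

# Prefix match: every GO process elaborates on the UMLS term.
elaborations = [t for t in go_terms if t.lower().startswith(key)]
```

None of the eight GO processes matches exactly, yet all eight build on the single UMLS term, so the low mapping rate reflects a difference in granularity more than a lack of related content.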

25 Why so few gene names and synonyms mapped?
Official gene names have metadata and comments.
– dystrophin (muscular dystrophy, Duchenne and Becker types), includes DXS143, DXS164, DXS206, DXS230, DXS239, DXS268, DXS269, DXS270, DXS272
No single source has all names and synonyms
The GO synonym field contains the IPI number for well-known genes, which does not match UMLS (a useful cross reference but not a synonym)
Symbols are short acronyms and match poorly
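To illustrate how embedded metadata blocks a match, the sketch below strips the parenthetical comment and the trailing "includes" clause from the dystrophin example. The `strip_annotations` helper is hypothetical and is not described in the presentation; it only shows that the bare name is recoverable from the annotated one.

```python
import re

OFFICIAL_NAME = (
    "dystrophin (muscular dystrophy, Duchenne and Becker types), "
    "includes DXS143, DXS164, DXS206, DXS230, DXS239, DXS268, "
    "DXS269, DXS270, DXS272"
)

def strip_annotations(name: str) -> str:
    """Hypothetical cleanup: drop a trailing ', includes ...' clause,
    then remove any parenthetical comments."""
    name = re.sub(r",\s*includes\b.*$", "", name)
    name = re.sub(r"\s*\([^)]*\)", "", name)
    return name.strip()

bare_name = strip_annotations(OFFICIAL_NAME)

# The annotated official name does not equal the bare name a terminology would list.
matches_as_is = OFFICIAL_NAME == "dystrophin"
matches_after_cleanup = bare_name == "dystrophin"
```

A cleanup of this kind would not help with the other obstacles listed on the slide, such as IPI numbers stored as synonyms or short, ambiguous symbols.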

26 Summary 1
UMLS needs improvement in the molecular biology domain but has considerable content:
– 27% of GO concepts map
– 34% of single gene diseases
– Existing UMLS terms come primarily from MeSH and SNOMED
Overall, positive mapping for 13,000 terms

27 Summary continued
– If the terms are in UMLS, it is possible to find a relationship between genes and phenotypes and gene function much of the time.
– UMLS does better with human genes (20%+) than with genes from all organisms (11%).
– UMLS and GO representations complement each other.

