Published by Christiana Gordon. Modified over 9 years ago.
1
Sample to Insight BIOBASE Training Human Gene Mutation Database (HGMD ® ) The only comprehensive source of data on human inherited disease-associated mutations
2
A comprehensive source of mutation data
- Focus on peer-reviewed scientific literature
- Experimental results are extracted by highly trained genetic experts
- Content is updated 4x per year
3
More than 170,000 curated mutations
HGMD® Professional Spring 2015.2 Release
Mutation Type | Number of Entries
Micro Lesions:
Missense / Nonsense | 94860
Splicing | 15476
Regulatory | 3242
Small Deletions | 25454
Small Insertions | 10617
Small Indels | 2436
Gross Lesions:
Repeat Variations | 476
Gross Insertions / Duplication | 3086
Complex Rearrangements | 1638
Gross Deletions | 12833
Total | 170118
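As a quick consistency check on the release table, the per-type counts can be summed and compared with the stated total. The sketch below only re-adds the numbers listed on the slide; the variable names are illustrative.

```python
# Entry counts as listed for the HGMD Professional 2015.2 release.
micro_lesions = {
    "Missense / Nonsense": 94860,
    "Splicing": 15476,
    "Regulatory": 3242,
    "Small Deletions": 25454,
    "Small Insertions": 10617,
    "Small Indels": 2436,
}
gross_lesions = {
    "Repeat Variations": 476,
    "Gross Insertions / Duplication": 3086,
    "Complex Rearrangements": 1638,
    "Gross Deletions": 12833,
}

micro_total = sum(micro_lesions.values())
gross_total = sum(gross_lesions.values())
grand_total = micro_total + gross_total

print(f"Micro lesions: {micro_total}")
print(f"Gross lesions: {gross_total}")
print(f"All entries:   {grand_total}")
```

The grand total agrees with the 170118 entries reported on the slide.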
4
HGMD® advantages
HGMD® is the industry standard for:
- Identifying the known genetic causes of a given inherited disease
- Understanding the mutational spectrum of a particular gene
- Verifying novel mutations
- Assessing individual disease risk
- Reducing time for literature review relating to a given inherited disease
5
LRRK2 Mutation report for CM074929
6
Categorization of mutations & polymorphisms
- DM = Disease causing (pathological) mutation
- DM? = Likely disease causing (likely pathological) mutation
- DP = Disease associated polymorphism
- DFP = Disease associated polymorphism with additional supporting functional evidence
- FP = Polymorphism affecting the structure, function or expression of a gene but with no disease association reported yet
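The class tags lend themselves to a simple lookup table when filtering exported records. The following is a minimal sketch, assuming each record is a dictionary with a "class" key; that record layout and the example IDs are hypothetical and not the actual HGMD export format.

```python
# Class tags and meanings as given on the slide.
HGMD_CLASSES = {
    "DM": "Disease causing (pathological) mutation",
    "DM?": "Likely disease causing (likely pathological) mutation",
    "DP": "Disease associated polymorphism",
    "DFP": "Disease associated polymorphism with additional supporting functional evidence",
    "FP": "Polymorphism affecting gene structure, function or expression; no disease association reported yet",
}

def select_by_class(records, wanted):
    """Keep records whose 'class' tag is in `wanted`; reject unknown tags early."""
    unknown = set(wanted) - set(HGMD_CLASSES)
    if unknown:
        raise ValueError(f"Unknown class tag(s): {sorted(unknown)}")
    return [r for r in records if r["class"] in wanted]

# Hypothetical records, for illustration only.
records = [
    {"id": "example-1", "class": "DM"},
    {"id": "example-2", "class": "DP"},
    {"id": "example-3", "class": "DM?"},
]
causal_or_likely = select_by_class(records, {"DM", "DM?"})
print([r["id"] for r in causal_or_likely])
```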
7
Ongoing review of classified mutations & polymorphisms
8
PGMD™
- Comprehensive pharmacogenomic database
- PGx/ADME panels
- FDA and EMA approved drugs containing PGx labels
- Associations from 6500+ publications from 500+ journals studying >1400 drugs
9
PGMD: PharmacoGenomic Mutation Database
- Facilitates mapping of variants onto genome at position or genotype level
- Associations from 6500+ publications from 500+ journals studying >1400 drugs
Genotype/haplotype specific findings:
- Median dose requirement of warfarin in patients with CYP2C9*1/CYP2C9*3 haplotype is 2.6 mg
Statistical significance:
- p-value -.001
- Relative Risk, Hazard Ratio, 95% Confidence Interval when available
Study details (all studies are in vivo):
- 22 cases with A/C genotype, 159 subjects studied, Design - Clinical Trial
- Pop: European Continental Ancestry Group, Age: 24-95
- Treatment: All patients are treated with 0.5 mg to 10 mg/day of warfarin
10
Types of evidence
11
Linkage Disequilibrium
- HapMap D′, LOD, and R² scores
- Computed for all PGMD sites
- Includes between non-PGMD sites
12
Allele frequencies
Major sources including:
- EVS
- 1000 Genomes
- HapMap
13
Delivery models
- Online: PGMD Web Interface; subject-specific annotation via Genome Trax
- Download: MySQL database, TSV, BED, GFF
- Custom Pipeline Integration
14
Genome Trax™
15
NGS analysis pipeline
16
Genome Trax™
Over 190 million annotations total
- Candidate Genes
- Disease causing variants
- Regulatory variants
- Function prediction & frequency
Track | Release 2015.1
HGMD® inherited disease mutations | 146,581
HGMD® imputed mutations | 14,570
Pharmacogenomic Variants | 806,806
GWAS Catalogue | 18,735
COSMIC somatic disease mutations | 2,626,811
ClinVar | 127,638
TRANSFAC® experimentally verified TFBS | 15,330
ChIP-seq Transcription Factor Binding Sites | 9,178,528
Predicted TF@DNase I hypersensitivity sites | 10,732,462
miRNA gene sites | 2,735
PTMs (Post-Translational Modifications) | 35,079
PROTEOME™ disease genes | 14,905
PROTEOME™ Drug target genes | 2,976
PROTEOME™ Pathway genes | 2,057
HGMD® disease genes | 27,257
SIFT & Polyphen predictions, conservation | 88,986,833
EVS allele frequencies | 3,663,071
Allele frequency from 1000 Genomes | 12,330,177
dbSNP common SNPs | 13,604,359
dbSNP | 60,879,061
17
Track Statistics
Genome Trax™ 2015.1 Statistics

TRACK NAME | SOURCE & VERSION | NO. OF FEATURES (HG19)
Mutations and Variants
HGMD® inherited disease mutations | HGMD® professional 2014.4 | 144076
COSMIC somatic disease mutations | v71 | 2626811
GWAS Catalogue | Downloaded on 11/27 2014 | 18166
EVS Exome Variations | ESP6500 | 3668136
HGMD® imputed inherited disease mutations | HGMD® professional 2014.4 | 14144
ClinVar Variants | ClinVar-2014-11 | 116238
PharmacoGenomic Mutation Database (Beta) | 2014.4 (beta) | 588693
Allele frequencies from 1000 Genome (Beta) | dbSNP137 | 12369546
dbSNP | dbSNP141 | 60879061
dbNSFP Nonsynonymous funct. predictions | v2.4 | 89617738
Regulatory Features
Predicted ChIP-Seq TFBS | TRANSFAC® 2014.3 | 9038289
TRANSFAC® experimentally verified TFBS | TRANSFAC® 2014.4 | 14648
CpG Islands | TRANSFAC® 2014.4 | 40815
Microsatellites | TRANSFAC® 2014.3 | 946133
Virtual Transcription Start Sites (TSSs) | TRANSFAC® 2014.4 | 71491
Post translational modifications | PROTEOME™ 2014.3 | 29916
miRNA | miRBase 21 | 2756
Predicted TFBSs in DNAse hypersens. regions | TRANSFAC® 2014.3 | 9272893
Gene Functional Assignments
Disease associations | PROTEOME™ 2014.4 | 13902
Pathway membership | PROTEOME™ 2014.4 | 2057
Drug targets | PROTEOME™ 2014.4 | 2860
HGMD® disease genes | HGMD® professional 2014.4 | 27609
Orphanet (Beta) | Downloaded on 09/11 2014 | 7996

TRACK NAME | SOURCE & VERSION | TOTAL NO. OF FEATURES (HG19)
Mutations and Variants
HGMD® inherited disease mutations | HGMD® professional 2015.1 | 146581
COSMIC somatic disease mutations | v71 | 2626811
GWAS Catalogue | Downloaded on 17th February 2015 | 18735
EVS Exome Variations | ESP6500 | 3663071
HGMD® imputed inherited disease mutations | HGMD® professional 2015.1 | 14570
ClinVar Variants | ClinVar-2015-02 | 127638
PharmacoGenomic Mutation Database (Beta) | 2015.1 (beta) | 806806
Allele frequencies from 1000 Genome (Beta) | dbSNP141 | 12330177
dbSNP | dbSNP141 | 60879061
dbNSFP Nonsynonymous functional predictions | v2.9 | 88986833
Regulatory Features
Predicted ChIP-Seq TFBS | TRANSFAC® 2015.1 | 9178528
TRANSFAC® experimentally verified TFBS | TRANSFAC® 2015.1 | 15330
CpG Islands | TRANSFAC® 2015.1 | 41364
Microsatellites | ENSEMBL 78 | 1378729
Virtual Transcription Start Sites (TSSs) | TRANSFAC® 2015.1 | 72883
Post translational modifications | PROTEOME™ 2015.1 | 35079
miRNA | miRBase 21 | 2735
Predicted TFBSs in DNAse hypersensitivity regions | TRANSFAC® 2015.1 | 10732462
Gene Functional Assignments
Disease associations | PROTEOME™ 2015.1 | 14905
Pathway membership | PROTEOME™ 2015.1 | 2082
Drug targets | PROTEOME™ 2015.1 | 2976
HGMD® disease genes | HGMD® professional 2015.1 | 27257
Orphanet (Beta) | Downloaded on 18th February 2015 | 8188
18
Use it as you like it
- Download: flat files, MySQL dump
- Use with genome browsers, Excel, tools, scripts, ANNOVAR, CLC bio Workbenches, Alamut, Cartagenia…
19
HGMD – inherited mutations
20
HGMD imputed
- HGMD: CAC (Histidine) changing to CAA (Glutamine) is causative for disease X.
- CAC > CAG leads to the same Histidine-to-Glutamine change, but at the nucleotide level it would not be a match for the HGMD mutation.
- The HGMD equivalent (imputed) track covers such cases.
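The distinction between an exact nucleotide match and an amino-acid-equivalent match can be sketched in a few lines. This is only an illustration of the idea using the standard genetic code for the four codons involved; the function names are hypothetical and this is not how the imputed track itself is built.

```python
# Only the four codons needed for this example (standard genetic code).
CODON_TO_AA = {"CAC": "His", "CAT": "His", "CAA": "Gln", "CAG": "Gln"}

def exact_match(known, observed):
    """True when reference and alternate codons are identical."""
    return known == observed

def equivalent_match(known, observed):
    """True when the same reference codon changes to a different
    alternate codon that encodes the same amino acid."""
    (known_ref, known_alt), (obs_ref, obs_alt) = known, observed
    return (
        known_ref == obs_ref
        and known_alt != obs_alt
        and CODON_TO_AA[known_alt] == CODON_TO_AA[obs_alt]
    )

known = ("CAC", "CAA")      # catalogued change: His -> Gln
observed = ("CAC", "CAG")   # patient change: also His -> Gln

print(exact_match(known, observed))       # nucleotide-level comparison
print(equivalent_match(known, observed))  # protein-level comparison
```

The nucleotide-level comparison fails while the protein-level comparison succeeds, which is the gap the slide describes.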
22
ClinVar Variants
Version: ClinVar-2015-02
Track Description: This track contains data from ClinVar, a public archive of reports that list relationships between human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and how the interpretation of variation may change over time. ClinVar collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in the submissions are mapped to reference sequences and reported according to the HGVS standard.
Benefit: This data set contains experimentally observed, clinically significant variants that are reviewed by experts.
Filename: clinvar
Link-out base URL: http://preview.ncbi.nlm.nih.gov/clinvar/$$
Links to: An individual variant report on the ClinVar site at NCBI.
Accession: ClinVar ID.
Feature: HGVS description and the phenotype, e.g. NT_011109.15:g.14128514A>G:Diaphyseal dysplasia;
23
COSMIC somatic disease mutations
Version: v71
Track Description: This track contains data from the Catalogue of Somatic Mutations in Cancer (COSMIC). COSMIC contains somatic mutation information relating to human cancers. The mutation data and associated information are extracted from the primary literature and entered into the COSMIC database. To provide a consistent view of the data, a histology and tissue ontology has been created and all mutations are mapped to a single version of each gene. A central aim of COSMIC is to provide somatic mutation frequencies. This track contains SNPs, insertions and deletions from COSMIC. We include COSMIC mutations for which a chromosomal position can be determined; approximately 75% of mutations have a position.
Benefit: These somatic mutations complement the set of germ-line mutations from HGMD to allow for a more comprehensive assessment of prior knowledge about observed mutations.
Filename: cosmic
Link-out base URL: http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=$$
Links to: An individual mutation report on the COSMIC site at the Wellcome Trust Sanger Institute.
Accession: COSMIC Mutation ID.
Feature: The histology and mutational change, e.g. "carcinoma:c.775G>T".
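The link-out base URL ends in `$$`, which reads as a placeholder for the accession. A minimal sketch of building a link under that assumption follows; the substitution rule is inferred from the slide rather than documented here, and the accession value 12345 is made up.

```python
COSMIC_BASE = (
    "http://www.sanger.ac.uk/perl/genetics/CGP/cosmic"
    "?action=mut_summary&id=$$"
)

def link_out(base_url, accession):
    """Replace the '$$' placeholder with the record's accession.

    Assumption: '$$' marks where the accession goes, as the
    track descriptions suggest.
    """
    if "$$" not in base_url:
        raise ValueError("base URL has no '$$' placeholder")
    return base_url.replace("$$", str(accession))

# 12345 is a hypothetical COSMIC Mutation ID used only for illustration.
url = link_out(COSMIC_BASE, 12345)
print(url)
```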
24
EVS Exome Variations
Version: ESP6500
Track Description: The EVS annotation source contains exome sequencing variants retrieved from the Exome Variant Server (EVS) for the NHLBI Exome Sequencing Project (ESP). The EVS data release (ESP6500) comprises 2203 African-American and 4300 European-American unrelated individuals, totaling 6503 samples (13,006 chromosomes). All data were simultaneously analyzed for exome variants at the University of Michigan (Abecasis Laboratory). The methods used for analysis are explained in detail at http://evs.gs.washington.edu/EVS/
Benefit: EVS provides the population-based genotype, allele counts and MAF scores for the variations observed in exome regions.
Filename: evs
Accession: A unique number identifying the EVS record, e.g. EVS2265387
Feature: rsID and HGNC symbol of the gene, e.g. "rs138751118:C4orf21".
25
Orphanet (Beta)
Version: 02/18/2015
Track Description: Orphanet is the reference portal for information on rare diseases and orphan drugs, for all audiences. Orphanet's aim is to help improve the diagnosis, care and treatment of patients with rare diseases.
Benefit: Allows you to associate known patterns of inheritance (dominant, recessive) with rare diseases and the genes implicated in them. Together with the observed zygosity and the disease-causing mutations in HGMD, this can help you focus only on dominant disease-causing variants, or on recessive disease-causing variants that are homozygous in the patient sample.
Filename: Orpha
Accession: The numerical part of the 'Orpha number', for example 79314 for the 'Orpha number' ORPHA79314
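The filtering rule described in the Benefit text (keep dominant disease-causing variants, or recessive ones that are homozygous in the patient) can be written as a small predicate. This is a sketch under assumed inputs: the field names, gene labels and string values below are hypothetical, not the actual track format.

```python
def keep_variant(inheritance, zygosity):
    """Apply the rule from the text: dominant variants are kept,
    recessive variants only when homozygous."""
    if inheritance == "dominant":
        return True
    if inheritance == "recessive":
        return zygosity == "homozygous"
    return False  # unknown inheritance pattern: not selected by this rule

# Hypothetical annotated variants, for illustration only.
variants = [
    {"gene": "GENE_A", "inheritance": "dominant", "zygosity": "heterozygous"},
    {"gene": "GENE_B", "inheritance": "recessive", "zygosity": "heterozygous"},
    {"gene": "GENE_C", "inheritance": "recessive", "zygosity": "homozygous"},
]
selected = [
    v["gene"] for v in variants
    if keep_variant(v["inheritance"], v["zygosity"])
]
print(selected)
```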
26
GWAS Catalogue
Version: 02/17/2015
Track Description: This track contains data from the GWAS Catalogue. These are literature-derived disease associations for polymorphisms from GWAS studies that assayed at least 100,000 single nucleotide polymorphisms; the associations listed are limited to those with p-values < 1.0 × 10^-5. The dataset provides odds ratios for common variants that can be used to calculate increased or decreased risk for the disease. A detailed description of the methods used to assemble the dataset can be found in Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, and Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. May 27, 2009, available at http://www.genome.gov/pages/about/od/newsandfeatures/pnasgwasonlinecatalog.pdf, and at the GWAS Catalogue at www.genome.gov/gwastudies.
Benefit: These disease association data are manually curated, experimentally determined associations from the scientific literature, mapped to coordinates. They allow you to identify common SNPs that influence the risk for common diseases.
Filename: gwas
Link-out base URL: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=$$
Links to: dbSNP record. As the GWAS catalog does not provide reports for the individual SNPs, we link to dbSNP instead.
Accession: dbSNP rsID
Feature: The disease, risk allele, and odds ratio or beta (denoted by OR or beta), e.g. “Ovarian_cancer; rs2363956-T;1.1OR”
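The Feature string packs disease, risk allele and effect size into one field. The sketch below parses the example from the slide, assuming the layout is always "disease; rsID-allele; value followed by OR or beta"; that layout is inferred from the single example shown and may not cover every record.

```python
import re

def parse_gwas_feature(feature):
    """Split a feature string such as 'Ovarian_cancer; rs2363956-T;1.1OR'
    into its parts. The three-part layout is an assumption."""
    disease, risk, effect = (part.strip() for part in feature.split(";"))
    rsid, allele = risk.rsplit("-", 1)
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*(OR|beta)", effect)
    if match is None:
        raise ValueError(f"unrecognised effect size: {effect!r}")
    return {
        "disease": disease,
        "rsid": rsid,
        "risk_allele": allele,
        "value": float(match.group(1)),
        "kind": match.group(2),
    }

parsed = parse_gwas_feature("Ovarian_cancer; rs2363956-T;1.1OR")
print(parsed)
```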
27
dbNSFP Nonsynonymous functional predictions
Version: v2.9
Track Description: This track contains data from dbNSFP (Database for Non-synonymous SNPs Functional Predictions). dbNSFP is an integrated database of functional predictions from multiple algorithms for the comprehensive collection of human non-synonymous SNPs (NSs). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS SNP in the human genome. More details about the prediction methods are available at http://www.ncbi.nlm.nih.gov/pubmed/21520341
Benefit: This track also provides a calculated consensus prediction based on the results from the different prediction algorithms in dbNSFP. Each NS is classified according to its deleterious tendency ("Probably Deleterious", "Unknown", "Probably Harmless", "Harmless").
Filename: dbnsfp
Accession: Gene ID, e.g. "85440"
Feature: Amino acid reference base > amino acid alternate base: consensus prediction, e.g. > N: Probably Deleterious 50%.
28
TRANSFAC – gene regulation
29
PROTEOME – candidate genes
30
PROTEOME – disease genes & drugs
31
Trio dataset from clinical practice
Bloom Syndrome | Our Patient
Autosomal recessive | Compound heterozygote
Short stature
Facial Anomalies
Skin hypo- and hyperpigmentation
Feeding difficulties
Mild intellectual disability | Severe intellectual disability
Cancer Predisposition | Cancer Predisposition
Frequent childhood infections | No frequent infections
After 20 years, following Genome Trax trio analysis, the patient was finally able to be diagnosed with BLOOM SYNDROME.
32
Stand-alone Application ANNOVAR Introduction
33
Database preparation
ANNOVAR requires the annotation databases to be saved on local disk before it can annotate genetic variants. A simple command downloads a database directly from the internet (from the UCSC browser, the 1000 Genomes Project or the ANNOVAR website):
annotate_variation.pl -downdb [optional arguments]
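For pipelines that drive ANNOVAR from a script, the download command can be assembled as an argument list. The sketch below only builds and prints the command; it does not run it. The `-buildver` and `-webfrom annovar` options are the optional arguments commonly used with `-downdb`, while the build `hg19`, the table `refGene`, the directory `humandb/` and the helper name `downdb_command` are example choices, not values taken from the slide.

```python
import shlex

def downdb_command(table, out_dir, build="hg19", from_annovar=True):
    """Build the argument list for an ANNOVAR database download.

    table        -- database/table name to fetch (e.g. "refGene")
    out_dir      -- local directory that will hold the database files
    build        -- genome build passed to -buildver (example default)
    from_annovar -- add "-webfrom annovar" to fetch from the ANNOVAR site
    """
    cmd = ["annotate_variation.pl", "-downdb", "-buildver", build]
    if from_annovar:
        cmd += ["-webfrom", "annovar"]
    cmd += [table, out_dir]
    return cmd

cmd = downdb_command("refGene", "humandb/")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list form can be handed to `subprocess.run(cmd, check=True)` on a machine where ANNOVAR is installed and on the PATH.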
34
Database preparation
Gene anno databases:
- gene / refgene / refGene
- knowngene / knownGene
- ensgene / ensGene
Region anno databases:
- Cytoband
- tfbsConsSites
- GenomicSuperDups
- omimGene
Filter databases:
- 1000g2012apr
- snp137
- snp135
35
Database download
36
Input files
ANNOVAR takes text-based input files, where each line corresponds to one variant. On each line, the first five space- or tab-delimited columns represent:
- chromosome
- start position
- end position
- ref nucleotides
- obs nucleotides
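The five-column layout is straightforward to read programmatically. A minimal sketch of a parser follows; the `Variant` type and the sample line are illustrative, and any columns beyond the first five are simply ignored here.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    obs: str

def parse_line(line):
    """Read the first five whitespace-delimited columns of one input line."""
    fields = line.split()  # splits on spaces and tabs alike
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, ref, obs = fields[:5]
    return Variant(chrom, int(start), int(end), ref, obs)

# Hypothetical input line: a single-nucleotide change with a trailing note.
variant = parse_line("1\t948921\t948921\tT\tC\texample-note")
print(variant)
```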
37
Profiling Breast Cancer variants – Input file
- Isolate tumor-specific variants by removing the germ-line variants.
- The resulting file of filtered variants is used as input for gene-based annotation, which assigns variants to exonic, intronic, intergenic and other regions.
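Removing germ-line variants from a tumor call set amounts to a set difference on the variant key. The sketch below assumes both call sets have already been reduced to (chromosome, start, end, ref, obs) tuples; the coordinates are invented for illustration and the slide does not say which tool performed this step.

```python
def tumor_specific(tumor, germline):
    """Return tumor variants that are absent from the germ-line set,
    preserving the order of the tumor list."""
    germline_keys = set(germline)
    return [v for v in tumor if v not in germline_keys]

# Hypothetical variants as (chrom, start, end, ref, obs) tuples.
tumor = [
    ("17", 1000, 1000, "C", "T"),
    ("13", 2000, 2000, "G", "A"),
    ("2", 3000, 3000, "A", "G"),
]
germline = [
    ("13", 2000, 2000, "G", "A"),
]
somatic = tumor_specific(tumor, germline)
print(somatic)
```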
38
Profiling Breast Cancer variants
This result file can be searched for specific, high-risk genes such as TP53, BRCA1 and BRCA2.
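Searching the annotated result for the genes named above can be done with a short filter. The sketch assumes a tab-delimited result in which the first column is the region type and the second column holds the gene name(s), as in ANNOVAR's gene-based output; the sample rows are made up.

```python
import re

HIGH_RISK_GENES = {"TP53", "BRCA1", "BRCA2"}

def genes_in_field(field):
    """Extract gene symbols from a gene column, dropping any
    parenthesised details and splitting on ',' or ';'."""
    cleaned = re.sub(r"\([^)]*\)", "", field)
    return {name.strip() for name in re.split(r"[,;]", cleaned) if name.strip()}

def high_risk_rows(lines):
    """Keep rows whose gene column mentions a high-risk gene."""
    hits = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and genes_in_field(cols[1]) & HIGH_RISK_GENES:
            hits.append(line.rstrip("\n"))
    return hits

# Hypothetical annotated rows: region, gene, then the input columns.
rows = [
    "exonic\tTP53\t17\t1000\t1000\tC\tT",
    "intronic\tGENE_X\t2\t3000\t3000\tA\tG",
    "intergenic\tGENE_Y(dist=10),BRCA2(dist=500)\t13\t2000\t2000\tG\tA",
]
hits = high_risk_rows(rows)
print(len(hits))
```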