Published by Valentine McCoy. Modified over 8 years ago.

Preparing published variants with Mutalyzer webservices
Gerard C.P. Schaafsma, Department of Human Genetics

Tuesday 8 March 2011, Work discussion

Why investigate the possibility of loading published data into LOVD databases?
● Pilot project for loading GenomeNL and/or 1000 Genomes data
● Loading data from exome-capture project(s)
● Showcase for (editors of) journals
● Load data into empty databases (e.g. those created for Mendelian genes)

Pilot data source
Bell, C.J. et al., 2011. “Carrier Testing for Severe Recessive Diseases by Next-Generation Sequencing”. Science Translational Medicine 3, 65ra4.

Description of data
● Preconception carrier testing for 448 severe recessive childhood diseases
● Target enrichment and next-generation sequencing of 7717 regions from 437 target genes
● 104 DNA samples
● Subset: disease mutations with > 5% incidence and reported in HGMD

Data from authors: Excel file

Problems encountered
● Making these data "accessible", i.e. available in electronic format, in a database, using correct genomic coordinates
● Missing information
● Incorrect information
● Inconsistent notation

Core information in LOVD: variant data

What we got from the authors:
● a chromosome number: 13
● a genomic position relative to human genome build 18 (hg18): 51413355
● the mutant allele: G
● the gene: ATP7B

What else do we need/want?
● the original allele: A
● a reference sequence: NM_000053.2
● a coding DNA position relative to this reference sequence: c.3419T>C (reversed: the transcript is read from the opposite strand, so genomic A>G becomes T>C)
● the (predicted) protein change: p.(Val1140Ala)
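
The "reversed" substitution can be illustrated with a minimal Python sketch. The helper name `to_coding_strand` and the `strand` argument are assumptions made for this example and do not come from the original script; the sketch only shows why the genomic A>G reported by the authors reads as T>C on a transcript located on the reverse strand.

```python
# Hypothetical helper for illustration; not part of the original script.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_coding_strand(ref, alt, strand):
    """Return (ref, alt) as read on the transcript's strand.

    strand is +1 for forward-strand genes and -1 for reverse-strand genes.
    """
    if strand == 1:
        return ref, alt
    if strand == -1:
        # A single-base substitution only needs complementing;
        # there is nothing to reverse for one nucleotide.
        return COMPLEMENT[ref], COMPLEMENT[alt]
    raise ValueError("strand must be +1 or -1")


# The pilot variant: original allele A, mutant allele G, reverse-strand gene.
ref, alt = to_coding_strand("A", "G", strand=-1)
substitution = f"{ref}>{alt}"
```

In the workflow described here this conversion is delegated to Mutalyzer; the sketch is only a way to sanity-check a returned description against the genomic alleles.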

Tools
● Webservices: programmatic access to a remote program (using functionality located elsewhere from within a local program or script)
  ➢ Ensembl Perl API
  ➢ LOVD RESTful / Atom webservice
  ➢ Mutalyzer 2.0 SOAP webservices
● All used from a Python script, which also uses the Python Database API (DBAPI) to store data in a MySQL table

Which webservice for what?
● Ensembl: original allele
● LOVD: is there a database for a given gene, and if so, which reference sequence is used
● Mutalyzer: (longest) transcript ID, HGVS variant description, protein prediction
● Python Database API: used to store data in MySQL tables

Mutalyzer webservices used in script
● For genes without a reference sequence in an LOVD database:
  - getTranscriptsByGeneName(build, gene): provides one or more transcript IDs, e.g. NM_000053.2
● To choose the longest transcript:
  - transcriptInfo(LOVD_ver, build, accNo): provides the transcript start and stop positions and the CDS stop position in c. numbering: trans_start = -157, trans_stop = 6485, CDS_stop = 4398
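
Choosing the longest transcript from such values could look like the sketch below. It assumes the three numbers are given in c. numbering as shown above; the `TranscriptInfo` container, the function names, and the second candidate transcript are hypothetical and exist only to make the comparison runnable. No webservice call is made.

```python
from typing import NamedTuple


class TranscriptInfo(NamedTuple):
    # Hypothetical container mirroring the three values named on the slide.
    acc_no: str
    trans_start: int  # transcript start in c. numbering
    trans_stop: int   # transcript end in c. numbering
    cds_stop: int     # last coding position in c. numbering


def transcript_length(info):
    """Number of bases from trans_start to trans_stop, inclusive."""
    if info.trans_start < 0 < info.trans_stop:
        # c. numbering has no position 0:
        # -157..-1 is 157 bases and 1..6485 is 6485 bases.
        return info.trans_stop - info.trans_start
    return info.trans_stop - info.trans_start + 1


def longest_transcript(infos):
    """Return the candidate spanning the most bases."""
    return max(infos, key=transcript_length)


candidates = [
    TranscriptInfo("NM_000053.2", -157, 6485, 4398),        # values from the slide
    TranscriptInfo("NM_HYPOTHETICAL.1", -100, 5000, 4000),  # made up for illustration
]
best = longest_transcript(candidates)
```

Whether "longest" should mean the full transcript or only the coding sequence is a design choice; comparing `cds_stop` instead would select by CDS length.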

Mutalyzer webservices used in script (cont.)
● To get a converted position (i.e. from g. to c. positions):
  - numberConversion(build, variant): provides an HGVS variant description: c.3419T>C
● To check the HGVS variant description and predict a protein description:
  - runMutalyzer(variant): provides a predicted protein description: p.(Val1140Ala)
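
Before a position can be converted, the chromosome, position and alleles have to be combined into a single genomic variant description. The sketch below composes such a string for a single nucleotide substitution. The function name is hypothetical, and both the exact input format expected by numberConversion and the chromosome 13 accession used in the example (NC_000013.9) are assumptions that should be checked against the Mutalyzer documentation and the genome build in use.

```python
def genomic_substitution(accession, position, ref, alt):
    """Compose an HGVS-style g. description for a single nucleotide substitution.

    Hypothetical helper for illustration; not part of the original script.
    """
    bases = {"A", "C", "G", "T"}
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return f"{accession}:g.{position}{ref}>{alt}"


# Accession is an assumption for chromosome 13 in the build used by the authors.
description = genomic_substitution("NC_000013.9", 51413355, "A", "G")
```

The validation is deliberately strict: the script described here handles only single nucleotide substitutions, so anything else is rejected rather than silently formatted.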

Script outline
● adapt the tab-delimited input file
● insert values in MySQL table
● for each gene, use LOVD webservice for transcript ID
  - if not found, use Mutalyzer to find transcript IDs
  - use Mutalyzer to determine longest transcript ID
● write chromosome number, start and end positions to intermediate file

Script outline (cont.)
● use this file as input for a Perl script to get original alleles from Ensembl
● use Mutalyzer to get HGVS variant descriptions in c. notation (c.3419T>C)
● use Mutalyzer to check these descriptions and to get protein descriptions, e.g. p.(Val1140Ala)
● add all acquired info to MySQL table with Python database API
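
The first and last storage steps of the outline can be sketched with the Python Database API. To keep the example self-contained it uses the standard-library `sqlite3` module with an in-memory database in place of MySQL; the table name and column names are hypothetical and are not taken from the original script.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database used in the talk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table layout: author-supplied columns first, derived columns after.
cur.execute(
    """CREATE TABLE variants (
           id INTEGER PRIMARY KEY,
           chromosome TEXT, position INTEGER, mutant_allele TEXT, gene TEXT,
           original_allele TEXT, transcript TEXT, cdna TEXT, protein TEXT)"""
)

# Step 1: insert the values supplied by the authors.
cur.execute(
    "INSERT INTO variants (chromosome, position, mutant_allele, gene) "
    "VALUES (?, ?, ?, ?)",
    ("13", 51413355, "G", "ATP7B"),
)

# Final step: add the information acquired from the webservices.
cur.execute(
    "UPDATE variants SET original_allele = ?, transcript = ?, cdna = ?, protein = ? "
    "WHERE chromosome = ? AND position = ?",
    ("A", "NM_000053.2", "c.3419T>C", "p.(Val1140Ala)", "13", 51413355),
)
conn.commit()

cur.execute("SELECT gene, original_allele, transcript, cdna, protein FROM variants")
row = cur.fetchone()
conn.close()
```

With a MySQL driver the connection call differs and the parameter placeholder is typically `%s` rather than `?`; the cursor, `execute` and `commit` pattern stays the same.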

Data flow

To do:
● adapt script to make it suitable for variant types other than single nucleotide substitutions
● extend script with a Mutalyzer webservice providing exon/intron numbers
● replace hard-coded variables (e.g. column names)
● automatically load data into LOVD databases

Acknowledgements
● Martijn Vermaat
● Jeroen Laros
● Ivo Fokkema
● Peter Taschner
● Johan den Dunnen