Published by Cori Nora O'Neal. Modified over 6 years ago.
DeBaser: An online tool for NGS data assembly and fast polymorphism detection.
Rod Eyles¹, John Juma¹, Morag Ferguson¹, Trushar Shah¹
¹ IITA, Nairobi

OVERVIEW

The advent of Next Generation Sequencing (NGS) represented a major leap in the capacity to study the genomic basis of variation within and between species. NGS has allowed large-scale comparison of genetic variation in terms of both expression level and sequence composition. Regions of difference between genomes, or polymorphisms, can explain variation in development, morphology and responses to external biotic or abiotic influences. Knowledge of this variation is important for understanding possible causes of phenotypic diversity, and it is also crucial for the successful design of RNAi-based laboratory tools such as artificial microRNAs or virus-induced gene silencing (VIGS) constructs. Such techniques require knowledge of a variety's exact sequence to ensure efficient knockdown and to predict off-target effects.

To help facilitate the rapid discovery of polymorphisms between plant varieties, we have used the increasing amount of available NGS data to construct an online database, "DeBaser". The database stores assembled transcriptomic and genomic NGS data for a range of varieties within selected species. This enables DeBaser to also function as a polymorphism finder, with output provided via an integrated multi-alignment tool. Polymorphisms between assembled transcriptomes are determined by selecting multiple varieties in the web interface. Users retrieve sequence information for each variety by entering selected gene identifiers or FASTA files. Multi-sequence alignment files showing polymorphisms between varieties are generated via Multalin or MUSCLE.

The backend of DeBaser incorporates an NGS assembly pipeline. This pipeline processes NGS data and is available to users as an assembly tool. To use it, users upload NGS data and, if required, a reference genome or transcriptome.
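As an illustration of what the polymorphism finder reports, the sketch below scans an existing multiple sequence alignment and lists the columns where the varieties differ. It is a minimal example under stated assumptions, not DeBaser's own code: the function name `polymorphic_sites`, the variety labels and the toy sequences are all hypothetical, and the input is assumed to be already aligned (for example by Multalin or MUSCLE), with "-" marking gaps.

```python
from typing import Dict, List, Tuple


def polymorphic_sites(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, str]]]:
    """Return (1-based column, {variety: base}) for every column where varieties differ.

    `aligned` maps a variety name to its aligned sequence; all sequences must
    have the same length, as they would in a multiple sequence alignment.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")

    sites = []
    for col in range(lengths.pop()):
        # Collect the character each variety has in this alignment column.
        column = {name: seq[col].upper() for name, seq in aligned.items()}
        # More than one distinct character (including a gap) marks a polymorphism.
        if len(set(column.values())) > 1:
            sites.append((col + 1, column))
    return sites


# Toy alignment with made-up sequences (hypothetical data, for illustration only).
alignment = {
    "variety_A": "ATGGCT-AAC",
    "variety_B": "ATGGCTTAAC",
    "variety_C": "ATGACT-AAC",
}

for position, bases in polymorphic_sites(alignment):
    print(position, bases)
```

In this toy input, column 4 carries a substitution and column 7 an insertion/deletion, so both are reported. Treating a gap as just another character keeps the sketch short; a real tool would likely report substitutions and indels separately.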
After processing, the assembly is stored permanently on the website. It can be retrieved in full, or the user can request individual transcripts by entering identifiers for genes of interest. The designers have also provided pre-assembled plant transcriptomes, which can be used alongside user-provided data in polymorphism detection.

The DeBaser pipeline combines existing bioinformatic software to produce a mapped assembly and consensus sequences for each gene/transcript in the full data set. First, Bowtie [1] aligns raw NGS reads to the reference set. Samtools [1] then converts the resulting .sam files to .bam files and sorts and indexes them. These are passed to ANGSD [2], which measures the read depth and base variants at each position and produces consensus files for every gene/CDS identifier. These files can then be added to the existing species database for retrieval or for polymorphism detection.

We believe DeBaser offers a number of advantages over existing polymorphism detection tools. Many of these operate on Linux systems, which requires the user to possess the relevant programming skills. Others, such as InSNP, NovoSNP or VarScan, can be installed on Windows or Mac operating systems but require stand-alone installation, and users do not benefit from the computing power of high-performance servers. HaploSNPer is a platform-independent SNP detection tool available online; however, it requires that the sequences being compared are already assembled. Therefore, we anticipate that DeBaser will be used by those seeking to rapidly detect polymorphisms within specific genes of interest. The tool is accessible entirely online and provides a complete pipeline, from raw NGS data through to multiple sequence alignment. Initially, DeBaser will store assemblies for IITA and other CGIAR center mandated crops as well as several model plant species.
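The consensus step of the pipeline described above (read depth and base variants at each position, reduced to one consensus sequence per gene) can be illustrated with a small sketch. This is not ANGSD's algorithm and does not call ANGSD. It is a simplified majority-vote stand-in, and the function name `consensus_from_counts`, the `min_depth` threshold and the toy pileup are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def consensus_from_counts(counts_per_site: List[Counter], min_depth: int = 3) -> str:
    """Call one consensus base per site from per-site base counts.

    Each Counter holds how many reads support each base at that position.
    Sites covered by fewer than `min_depth` reads are reported as 'N'.
    """
    consensus = []
    for counts in counts_per_site:
        depth = sum(counts.values())  # read depth at this position
        if depth < min_depth:
            consensus.append("N")
        else:
            # Most frequent base wins; ties are broken alphabetically so the
            # result is deterministic.
            consensus.append(min(counts, key=lambda base: (-counts[base], base)))
    return "".join(consensus)


# Toy pileup for a four-base region (hypothetical read data).
pileup = [Counter("AAAA"), Counter("CCCT"), Counter("G"), Counter("TTAT")]
print(consensus_from_counts(pileup))  # prints ACNT
```

The third site is covered by a single read, which is below the assumed depth threshold, so it is masked as "N" rather than called from weak evidence.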
DeBaser is in the final stages of development and will be released in the second half of 2017.

Figure 1 Overview of workflow within the DeBaser pipeline. The pipeline functions both as a polymorphism finder using pre-assembled datasets (A) and as an NGS assembly tool (B).
[Panel A: input (select varieties to align; gene identifiers or FASTA files), sequence collection from archived assembly, multi-alignment tool (MUSCLE, Multalin), output (graphic alignment file/s, FASTA text files).]
[Panel B: NGS raw data and reference transcriptome or genome, Bowtie alignment, Samtools sam to bam conversion, ANGSD generation of consensus sequences, full assembly, archived assembly.]

Figure 2 Example of a DeBaser output file. WRKY36 transcription factor sequences from four cassava varieties are assembled from raw NGS data and aligned via Multalin. The result reveals significant polymorphism between varieties.

Figure 3 Image of the DeBaser home page. A simple interface allows users to select multiple varieties within one of a range of species, as well as one or more preferred output styles.

Acknowledgements

DeBaser is a tool initially developed to facilitate candidate gene selection within the IITA Cassava VIGS project. We thank the German Organisation for Technical Cooperation (GTZ) for funding provided for this project.

References

1. Langmead, B., et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol, (3): p. R25.
2. Korneliussen, T.S., A. Albrechtsen, and R. Nielsen, ANGSD: analysis of next generation sequencing data. BMC Bioinformatics, (1): p. 356.