Published by Rudolf Carpenter. Modified over 8 years ago.
1
Detection and analysis of SNP polymorphisms
Alexis Dereeper, CIBA courses – Brasil 2011
2
Objectives
* To know and manipulate the available packages/tools for SNP and INDEL detection from NGS data (assembly of NGS data)
* To think about the difficulties encountered when analysing new-generation sequencing data (differentiating sequencing errors, paralogs and allelic variation)
* To detect SNPs and assign genotypes to every polymorphic position
* To simply exploit polymorphism data via a Web-based application (genetic diversity, LD)
* To obtain an exploitable dataset to send for the design of a high-throughput SNP chip (Illumina VeraCode technology)
Workflow: Short reads (Solexa) -> Mapping (SAM) -> Genotype assignment -> List of SNPs -> Exploitation of polymorphism data, Design of an Illumina SNP chip
Allelic variations:
Ind1 ATTGTGTCGTAACGTATGTCATGTCGT
Ind2 ATTGTGTCGGAACGTATGTCATGTCGT
Ind3 ATTGTGTCGKAACGTATGTCATGTCGT
List of SNPs:
867 A/G
1998 T/C
2341 T/G
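The three example sequences on this slide differ at a single column, where Ind3 carries the IUPAC code K (G or T). As a minimal sketch of how such a column could be reported as a SNP, assuming the sequences are already aligned and of equal length (the function names are illustrative and not part of any tool named in the course):

```python
# IUPAC ambiguity codes for heterozygous (two-allele) genotypes.
IUPAC_PAIRS = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}

def alleles(base):
    """Return the set of nucleotides represented by one (possibly IUPAC) base."""
    return set(IUPAC_PAIRS.get(base, base))

def list_snps(sequences):
    """Yield (1-based position, sorted alleles) for each polymorphic column."""
    length = len(next(iter(sequences.values())))
    for i in range(length):
        observed = set()
        for seq in sequences.values():
            observed |= alleles(seq[i])
        if len(observed) > 1:
            yield i + 1, sorted(observed)

# The three individuals shown on the slide.
individuals = {
    "Ind1": "ATTGTGTCGTAACGTATGTCATGTCGT",
    "Ind2": "ATTGTGTCGGAACGTATGTCATGTCGT",
    "Ind3": "ATTGTGTCGKAACGTATGTCATGTCGT",
}
snps = list(list_snps(individuals))
for position, found in snps:
    print(position, "/".join(found))
```

This ignores sequencing errors and paralogs, which is exactly the difficulty the objectives point out; a real caller also weighs depth and base quality.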
3
Tablet
Graphical viewer for assemblies of NGS data
Accepts different formats: ACE, SAM, BAM
4
Automatic detection of SNP from SAM assembly
Example of pipeline feasible with the Galaxy system: 3 alternatives
1. SAM assembly -> SAM-to-BAM -> Generate Pileup -> Pileup file -> Pileup2snp -> SNP tabular file
2. Fastq -> FastQ Groomer -> Mapping BWA -> AddReadGroupIntoSam -> SAM-to-BAM -> IndelRealigner -> CountCovariates -> TableRecalibration -> UnifiedGenotyper -> VCF file
3. SAM assembly -> SamToFastaAlignments -> FASTA alignments with IUPAC
Packages: SamTools, GATK, PicardTools, VarScan
SNiPlay Utilities: SamToFastaAlignments, AddReadGroupIntoSam, VCFToFastaAlignments
5
Program for SNP detection from a Pileup file: Pileup2snp (VarScan)
Another module, Pileup2indel, exists for indels but is not yet implemented in Galaxy SouthGreen.
Pileup format: text file describing, for each position, the reference base, depth of coverage, variations and quality.
seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
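To make the pileup columns concrete, here is a small sketch that counts the bases observed at one position. It assumes the usual samtools pileup conventions: `.` and `,` are matches to the reference on the forward and reverse strand, `^` plus one character marks a read start, `$` marks a read end, and `+N`/`-N` introduce an indel of N bases. It is not how Pileup2snp itself is implemented, and the function names are illustrative.

```python
import re
from collections import Counter

def count_bases(ref, bases):
    """Count base calls in the read-bases column of one pileup line."""
    # Drop read-start markers ('^' followed by a mapping-quality character)
    # and read-end markers ('$'); they carry no base call.
    bases = re.sub(r"\^.", "", bases).replace("$", "")
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c in "+-":
            # Indel: skip the sign, the length digits and the indel sequence.
            m = re.match(r"\d+", bases[i + 1:])
            if m:
                i += 1 + len(m.group()) + int(m.group())
                continue
        elif c in ".,":
            counts[ref.upper()] += 1   # match to the reference base
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1     # mismatch: the observed base
        i += 1
    return counts

def parse_pileup_line(line):
    """Split one pileup line and return (sequence, position, base counts)."""
    chrom, pos, ref, depth, bases, quals = line.split()
    return chrom, int(pos), count_bases(ref, bases)

chrom, pos, counts = parse_pileup_line(
    "seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<"
)
print(chrom, pos, dict(counts))
```

At position 276 the single `T` among 22 reads is the kind of low-frequency variation that a caller must classify as either a sequencing error or a true allele.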
6
SamToFastaAlignments and AceToFastaAlignments: SNiPlay utilities for management of NGS data
Input:
* Assembly: ACE format
* Mapping: SAM format
* Threshold values per genotype (genotype1, genotype2, genotype3): depth threshold, plus frequency and depth thresholds for heterozygosity estimation
Output, for each contig (e.g. CL1Contig1):
* FASTA alignments including IUPAC codes (CL1Contig1.align.fa, CL1Contig2.align.fa, CL2Contig1.align.fa …)
* List of heterozygous positions
* Stats: estimation of average heterozygosity for each genotype
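The role of the depth and frequency thresholds can be illustrated with a small genotype-calling sketch. This is an assumption about how such thresholds are typically applied, not the actual logic of SamToFastaAlignments; the default values (4 and 0.3) are illustrative.

```python
# IUPAC codes for heterozygous genotypes, keyed by the pair of alleles.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def call_genotype(counts, min_depth=4, min_freq=0.3):
    """Return a base, an IUPAC code, or 'N' for one position of one genotype.

    counts maps each observed base to its number of supporting reads.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return "N"  # not enough reads to assign a genotype
    # Keep only alleles supported by a sufficient fraction of the reads.
    kept = [base for base, n in counts.items() if n / depth >= min_freq]
    if len(kept) == 1:
        return kept[0]                 # homozygous
    if len(kept) == 2:
        return IUPAC[frozenset(kept)]  # heterozygous
    return "N"                         # ambiguous position

print(call_genotype({"A": 6, "T": 4}))  # two well-supported alleles
print(call_genotype({"C": 9, "T": 1}))  # minor allele below the frequency threshold
```

Raising `min_freq` makes heterozygous calls more conservative, at the cost of missing true heterozygotes with unbalanced allele coverage.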
7
GATK (Genome Analysis ToolKit)
Package for the analysis of NGS data, developed for human medical resequencing projects (1000 Genomes, The Cancer Genome Atlas).
Includes tools for depth analysis, quality score recalibration and SNP/InDel discovery.
Complementary to two other packages: SamTools and PicardTools.
PREPROCESS:
* Index human genome (Picard), we used HG18 from UCSC.
* Convert Illumina reads to Fastq format
* Convert Illumina 1.6 read quality scores to standard Sanger scores
FOR EACH SAMPLE:
1. Align samples to genome (BWA), generates SAI files.
2. Convert SAI to SAM (BWA)
3. Convert SAM to BAM binary format (SAM Tools)
4. Sort BAM (SAM Tools)
5. Index BAM (SAM Tools)
6. Identify target regions for realignment (Genome Analysis Toolkit)
7. Realign BAM to get better Indel calling (Genome Analysis Toolkit)
8. Reindex the realigned BAM (SAM Tools)
9. Call Indels (Genome Analysis Toolkit)
10. Call SNPs (Genome Analysis Toolkit)
11. View aligned reads in BAM/BAI (Integrative Genomics Viewer)
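Steps 1 to 5 above can be sketched as a list of command lines. The snippet only builds and prints the commands (a dry run) and does not execute anything. It assumes single-end reads, a reference already indexed with `bwa index`, BWA's `aln`/`samse` subcommands and the option syntax of samtools 1.x; the file names are hypothetical, and the GATK steps are left out because their options vary between GATK versions.

```python
def alignment_commands(sample, reference="reference.fa"):
    """Build the BWA and samtools command lines for one sample (steps 1 to 5)."""
    fastq = f"{sample}.fastq"
    sai = f"{sample}.sai"
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    return [
        ["bwa", "aln", "-f", sai, reference, fastq],         # 1. align, SAI output
        ["bwa", "samse", "-f", sam, reference, sai, fastq],  # 2. SAI to SAM
        ["samtools", "view", "-b", "-o", bam, sam],          # 3. SAM to BAM
        ["samtools", "sort", "-o", sorted_bam, bam],         # 4. sort BAM
        ["samtools", "index", sorted_bam],                   # 5. index BAM
    ]

commands = alignment_commands("RC1")
for command in commands:
    print(" ".join(command))
```

Each inner list can be passed to `subprocess.run(command, check=True)` once the tools are installed and the file names match the actual data.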
8
For each sample (RC1, RC2, RC3, RC4, ….): Fastq -> FastQ Groomer -> Mapping BWA -> AddReadGroupIntoSam -> SAM with read group
mergeSam: all SAM files with read group -> Global SAM with read group
Global SAM with read group -> SAM-to-BAM -> IndelRealigner -> CountCovariates -> TableRecalibration -> UnifiedGenotyper -> VCF file
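Read groups are what let a genotyper tell samples apart after the SAM files are merged. In the SAM format, a read group is declared by an `@RG` header line (with at least an `ID`, and usually a sample name `SM`) and referenced by an `RG:Z:` tag on each alignment. The sketch below shows that tagging on SAM text lines; it is an illustration of the idea, not the code of the AddReadGroupIntoSam utility.

```python
def add_read_group(sam_lines, rg_id, sample):
    """Return SAM lines with an @RG header added and every read tagged RG:Z:<rg_id>."""
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}"
    out = []
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            out.append(line)  # existing header line, kept unchanged
            continue
        if not header_written:
            out.append(rg_header)  # placed after the last existing header line
            header_written = True
        out.append(f"{line}\tRG:Z:{rg_id}")
    if not header_written:
        out.append(rg_header)  # file with headers only
    return out

# Hypothetical single-read SAM file for sample RC1.
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:seq1\tLN:1000",
    "read1\t0\tseq1\t272\t30\t5M\t*\t0\t0\tACGTA\tIIIII",
]
tagged = add_read_group(sam, rg_id="RC1", sample="RC1")
print("\n".join(tagged))
```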
9
Fastq (RC1) + Fastq (RC2) + Fastq (RC3) + Fastq (RC4) -> Fastq global
Fastq global -> FastQ Groomer -> Mapping BWA -> AddReadGroupIntoSam -> Global SAM with read group
Global SAM with read group -> SAM-to-BAM -> IndelRealigner -> CountCovariates -> TableRecalibration -> UnifiedGenotyper -> VCF file
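The preprocessing listed earlier includes converting Illumina quality scores to standard Sanger scores, which is the kind of task a FASTQ grooming step performs before mapping. As a minimal sketch, assuming the input uses the Phred+64 encoding of Illumina 1.3 to 1.7 pipelines and the target is Sanger Phred+33 (the function name is illustrative, not a Galaxy API):

```python
def illumina_to_sanger(quality):
    """Convert a Phred+64 quality string to the Sanger Phred+33 encoding."""
    converted = []
    for char in quality:
        phred = ord(char) - 64  # numeric quality score
        if phred < 0:
            raise ValueError(f"{char!r} is not a valid Phred+64 quality character")
        converted.append(chr(phred + 33))
    return "".join(converted)

# 'h' encodes Q40 in Phred+64; the same score is 'I' in Phred+33.
print(illumina_to_sanger("hhhB"))
```

Skipping this conversion makes downstream tools read the scores as much higher than they are, which biases SNP calling.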
10
VCF format (Variant Call Format)
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=
##FILTER=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3
Advantages: describes the variations for each position, plus the genotype assignment
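To show how the variation and the genotype assignment are read from one VCF record, here is a small parsing sketch. Real VCF files are tab-delimited; the code splits on any whitespace so that it also works on the example as shown on the slide. The function names are illustrative, and a dedicated VCF library is preferable for real data.

```python
def parse_vcf_line(line, sample_names):
    """Parse one VCF data line into a dictionary."""
    fields = line.split()
    chrom, pos, vid, ref, alt, qual, filt, info, fmt = fields[:9]
    info_dict = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True  # flags such as DB have no value
    keys = fmt.split(":")
    samples = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
            "alt": alt.split(","), "filter": filt, "info": info_dict,
            "samples": samples}

def genotype_alleles(record, sample):
    """Translate a GT value such as 1|0 into the corresponding bases."""
    alleles = [record["ref"]] + record["alt"]
    gt = record["samples"][sample]["GT"]
    sep = "|" if "|" in gt else "/"
    return sep.join("." if i == "." else alleles[int(i)] for i in gt.split(sep))

record = parse_vcf_line(
    "20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 "
    "GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51",
    ["NA00001", "NA00002"],
)
print(record["pos"], genotype_alleles(record, "NA00001"), genotype_alleles(record, "NA00002"))
```

In the GT field, 0 refers to the REF allele and 1 to the first ALT allele, so `1|0` at this position is a phased A|G heterozygote.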
11
Other functionalities of GATK
* DepthOfCoverage module: reports the sequencing depth of coverage for each gene, each position and each individual
* ReadBackedPhasing module: determines, where possible, the allele association (phase or haplotype) at heterozygous positions
Figure text: "… And not", with example haplotypes AGG and GGA
12
SNiPlay: Web-based application for polymorphism analysis
http://sniplay.cirad.fr
13
Automatic detection of SNP from SAM assembly
Example of pipeline feasible with the Galaxy system: 3 alternatives
1. SAM assembly -> SAM-to-BAM -> Generate Pileup -> Pileup file -> Pileup2snp -> SNP tabular file
2. Fastq -> FastQ Groomer -> Mapping BWA -> AddReadGroupIntoSam -> SAM-to-BAM -> IndelRealigner -> CountCovariates -> TableRecalibration -> UnifiedGenotyper -> VCF file
3. SAM assembly -> SamToFastaAlignments -> FASTA alignments with IUPAC
Packages: SamTools, GATK, PicardTools, VarScan
SNiPlay Utilities: SamToFastaAlignments, AddReadGroupIntoSam, VCFToFastaAlignments
14
Options of SNiPlay
* Select the VCF format
* Load the VCF file
* Load reference file
* Select the Rice genome as reference
16
Design of Illumina chip
* Genotyping file
* Cartesian coordinates
* Submission file for Illumina
* Analysis with the BeadStudio software
17
Allelic files

DARwin format:
@DARwin 5.0 - ALLELIC - 2
33 20
N° 50 50 122 122 218 218 245 245 261 261 290 290 356
1 1 1 1 1 3 3 3 3 4 4 2 2 2
2 1 1 1 1 3 3 1 3 4 4 2 2 2
3 1 1 1 1 3 3 3 3 4 4 2 2 2
4 1 1 1 1 3 3 3 3 4 4 2 2 2

.inp format for Phase:
33
10
P 49 121 217 244 260 289
SSSSSSSSSS
#cARB
A A G G T C C A T T
#cSYR
A A G A T C C A T C
A A G G T C C A T T

PED format:
cARB 1 0 0 1 0 1 1 1 1 3 3 3 3 4 4 2 2 2 2 1 1 4 4 4 4
cSYR 2 0 0 1 0 1 1 1 1 3 3 1 3 4 4 2 2 2 2 1 1 4 4 2 4
cARA 3 0 0 1 0 1 1 1 1 3 3 3 3 4 4 2 2 2 2 1 1 4 4 4 4

Format for TASSEL (association studies):
33 10:2
50 122 218 245 261 290 356 461 467 560
cARB A:A A:A G:G G:G T:T C:C C:C A:A T:T T:T
cSYR A:A A:A G:G A:G T:T C:C C:C A:A T:T C:T
cARA A:A A:A G:G G:G T:T C:C C:C A:A T:T T:T
cORL A:A A:A G:G G:G T:T C:C C:C A:A T:T T:T
cLAR A:G A:G A:G A:G C:T C:C C:C A:A T:T C:T
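The examples on this slide appear to encode the same genotypes in two ways: as letter pairs (A:G) in the TASSEL-style file and as digits in the DARwin and PED files. The sketch below converts one letter-pair row to digits, assuming the coding A=1, C=2, G=3, T=4. That coding is inferred from the slide's examples and is not stated by the author; the function names are illustrative.

```python
# Assumed numeric coding of alleles, inferred from the slide's examples.
ALLELE_CODES = {"A": "1", "C": "2", "G": "3", "T": "4"}

def parse_genotype_row(row):
    """Split 'name A:A A:G ...' into the name and a list of allele pairs."""
    name, *genotypes = row.split()
    return name, [tuple(genotype.split(":")) for genotype in genotypes]

def to_numeric_row(row):
    """Return the name and the flat list of numeric allele codes."""
    name, genotypes = parse_genotype_row(row)
    codes = [ALLELE_CODES[allele] for pair in genotypes for allele in pair]
    return name, codes

name, codes = to_numeric_row("cSYR A:A A:A G:G A:G T:T C:C C:C A:A T:T C:T")
print(name, " ".join(codes))
```

A full exporter would add the format-specific header lines and leading columns (for PED, the family and pedigree fields), which are omitted here.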
18
Annotation of SNPs
19
Annotation of SNPs
20
Diversity analysis (SeqLib library)
21
Haplotype networks
* High frequency haplotypes
* Low frequency haplotype
* Group distribution within this haplotype
* Distance between 2 haplotypes (number of mutations)
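Two quantities shown in a haplotype network can be computed directly: the frequency of each haplotype and the distance between two haplotypes, counted as the number of differing positions. A minimal sketch, using hypothetical three-SNP haplotypes and counting substitutions only (indels are ignored):

```python
from collections import Counter

def haplotype_distance(h1, h2):
    """Number of positions at which two equal-length haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))

# Hypothetical haplotypes observed in five chromosomes.
observed = ["AGG", "AGG", "GGA", "AGG", "AGA"]
frequencies = Counter(observed)

print(frequencies.most_common())
print(haplotype_distance("AGG", "GGA"))
```

In a network drawing, the frequency typically sets the size of each node and the distance sets the length or labelling of the edge between two nodes.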
22
Allele sharing between groups
External file (optional):
Individual, group
Ind1, Table
Ind2, Table
Ind3, Table
Ind4, East
Ind5, East
Ind6, East
Ind7, East
Ind8, West
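How SNiPlay computes allele sharing is not described on the slide. One simple reading, used here only as an assumption, is the set of alleles observed in both groups at a locus. The sketch reads the group file shown above; the genotypes at the single locus are hypothetical.

```python
import csv
import io

GROUP_FILE = """Ind1, Table
Ind2, Table
Ind3, Table
Ind4, East
Ind5, East
Ind6, East
Ind7, East
Ind8, West
"""

def read_groups(text):
    """Map each individual to its group from 'individual, group' lines."""
    rows = csv.reader(io.StringIO(text.strip()))
    return {row[0].strip(): row[1].strip() for row in rows if row}

def shared_alleles(groups, genotypes, group1, group2):
    """Alleles observed in both groups at one locus."""
    def alleles_in(group):
        return {allele for ind, genotype in genotypes.items()
                if groups[ind] == group for allele in genotype}
    return alleles_in(group1) & alleles_in(group2)

groups = read_groups(GROUP_FILE)
# Hypothetical genotypes at a single SNP, two alleles per individual.
genotypes = {"Ind1": "AA", "Ind2": "AG", "Ind3": "AA", "Ind4": "GG",
             "Ind5": "GG", "Ind6": "GT", "Ind7": "GG", "Ind8": "TT"}
print(shared_alleles(groups, genotypes, "Table", "East"))
```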