Pindel user manual Kai Ye

Preparation of Pindel input
  Alignment BAM file generated by BWA: (1) bam2pindel.pl with Adaptor.pm
  Alignment BAM file generated by other aligners: (2) sam2pindel.cpp
  Result of (1) or (2): Pindel input with sample tag
  (3) FilterPindelReads.cpp: filtered Pindel input with sample tag
  Merge Pindel input files for paired or population sequence data

(1) bam2pindel.pl
Written by Keiran Raine at Sanger Institute
This tool was designed for BWA-based BAM/SAM Illumina data.
You must prepare a name-sorted BAM file.
Set BAM_2_PINDEL_ADAPT: setenv BAM_2_PINDEL_ADAPT
Arguments:
  -i|input: Input BAM file (req)
  -o|output: Output ready for pindel
  -s|sample: Sample or label (sampA,sampB...) (req)
  -pi|insert: Required if BAM file does not have PI tag in header RG record
  -r|restrict: Restrict to chromosome xx
Example: ./bam2pindel_bwa.pl -i NameSorted.bam -o output_prefix -s tumour -om -pi 300

(2) sam2pindel.cpp
Written by Kai Ye at Leiden University Medical Center
This tool was designed for all BAM/SAM Illumina data.
You must first compile the cpp source code: g++ sam2pindel.cpp -o sam2pindel -O3
5 arguments are required by sam2pindel:
  1. Input sam file
  2. Output for pindel
  3. Insert size
  4. Tag
  5. Number of extra lines (when the reads do not start at the beginning of the file)
If you start with a standard sam file (Input.sam with insert size 300):
  ./sam2pindel Input.sam Output4Pindel.txt 300 tumour 0
If you start with a bam file:
  ./samtools view Input.bam | ./sam2pindel - Output4Pindel.txt 300 tumour 0
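The five positional arguments are easy to get out of order when the call is scripted. The following is a minimal sketch of a helper that assembles the argument list in the order given on this slide. The function name and its keyword names are hypothetical and are not part of sam2pindel; only the argument order and the example values come from the manual.

```python
import shlex


def sam2pindel_argv(sam_in, out_path, insert_size, tag,
                    extra_lines=0, exe="./sam2pindel"):
    """Build the sam2pindel argument list in the order the manual lists.

    Hypothetical helper: sam_in may be "-" to read SAM text from stdin,
    as in the `samtools view ... |` example.
    """
    if insert_size <= 0:
        raise ValueError("insert size must be a positive integer")
    if extra_lines < 0:
        raise ValueError("number of extra lines cannot be negative")
    return [exe, sam_in, out_path, str(insert_size), tag, str(extra_lines)]


# Values taken from the manual's own example.
argv = sam2pindel_argv("Input.sam", "Output4Pindel.txt", 300, "tumour")
print(shlex.join(argv))
```

The list form can be passed directly to subprocess.run without going through a shell, which avoids quoting problems in file names.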

Running Pindel
1. Input: the reference genome sequences in fasta format
2. Input: the unmapped reads in a modified fastq format
3. Output folder
4. Which chr/fragment
5. BreakDancer result. Format per line: ChrA LocA stringA ChrB LocB stringB others
If you don't have a BreakDancer result, please provide an empty file here.
Example: ./pindel hg19.fa pindel_input_chr1.txt Output_Folder chr1 empty
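To make the per-line BreakDancer layout concrete, here is a minimal sketch of a reader for it. It assumes the fields are whitespace-separated, that LocA and LocB are integers, and that lines starting with "#" are comments to skip; none of these details are stated on the slide. The sample line and all of its values are invented for illustration only.

```python
from typing import Iterable, List, NamedTuple


class BreakDancerCall(NamedTuple):
    chr_a: str
    loc_a: int
    string_a: str
    chr_b: str
    loc_b: int
    string_b: str
    others: List[str]


def parse_breakdancer_line(line: str) -> BreakDancerCall:
    """Split one line into: ChrA LocA stringA ChrB LocB stringB others."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 fields, got %d" % len(fields))
    return BreakDancerCall(fields[0], int(fields[1]), fields[2],
                           fields[3], int(fields[4]), fields[5], fields[6:])


def read_breakdancer(lines: Iterable[str]) -> List[BreakDancerCall]:
    """Parse all non-blank, non-comment lines; an empty input gives []."""
    return [parse_breakdancer_line(line) for line in lines
            if line.strip() and not line.startswith("#")]


# Invented example line, not real BreakDancer output.
calls = read_breakdancer(["chr1 10000 3+0- chr1 10500 0+3- DEL 500\n"])
print(calls[0].chr_a, calls[0].loc_a, calls[0].loc_b)
```

An empty file, as the manual suggests when no BreakDancer result is available, simply yields an empty list here.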

Input format of Pindel
TGGGGACCGGTGGAATGCTTCCACTGGCTGGGGGGC
+ chr Tumor
Anchor: strand, chr, 3' coordinate and mapping quality of the mapped read; sample tag
(Figure: read and its anchor aligned to ref)

Output format: deletions, 1 base to 1 million bases

Allow mismatches to accommodate sequence errors and SNPs
D 10 ChrID 13 BP
Reference (deleted bases in lower case):
AAATCAACTAGTGACCTTCCAGGGACAACCCGAACGTGATGAAAAGATCAaagaacctacTCTATTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAAAGT
Supporting reads (left and right parts separated by a space at the breakpoint):
GATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAA
CAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGA
CGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGA
TGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAAAG
GTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGAC
TAGTGACCTTCCAGGGACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAA
CCTTCCAGGGACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAA
ACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGG
CGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTT
CCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATC
AACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAA
TGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACA
ACCTTCCAGGGACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAA
GATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAA
AACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAA
GAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTT
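In this example every read carries TCTGTTGGG where the reference has TCTATTGGG, so the reads support the 10-base deletion only if one mismatch is tolerated. The sketch below illustrates that idea with a plain mismatch count on the two halves of a split read. It is not Pindel's actual search algorithm; the function names, the 0-based offsets into the excerpt, and the default threshold are assumptions made for illustration.

```python
def mismatches(a: str, b: str) -> int:
    """Count differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def supports_deletion(ref: str, left: str, right: str,
                      del_start: int, del_len: int,
                      max_mismatches: int = 1) -> bool:
    """Check whether a split read (left, right) fits a deletion of
    del_len bases starting at 0-based offset del_start in ref."""
    if del_start < len(left):
        return False  # left part would run off the start of ref
    left_ref = ref[del_start - len(left):del_start]
    right_start = del_start + del_len
    right_ref = ref[right_start:right_start + len(right)]
    if len(right_ref) != len(right):
        return False  # right part would run off the end of ref
    total = mismatches(left, left_ref) + mismatches(right, right_ref)
    return total <= max_mismatches


# Excerpt of the reference line above; lower case marks the deleted bases.
REF = "GATGAAAAGATCAaagaacctacTCTATTGGGTTTTCATAC"
LEFT = "GATGAAAAGATCA"        # read bases before the breakpoint
RIGHT = "TCTGTTGGGTTTTCATAC"  # read bases after the breakpoint (trimmed)

print(supports_deletion(REF, LEFT, RIGHT, del_start=13, del_len=10))
print(supports_deletion(REF, LEFT, RIGHT, 13, 10, max_mismatches=0))
```

With one mismatch allowed the read is accepted; with none allowed the same read is rejected because of the A/G difference.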

Inversions
(Figure: sample aligned against ref)

Large insertions

Non-template sequence in deletions, inversions and tandem duplications
(Figure: ref aligned against sample)

Non-template sequence: deletion of 4 bases with 2 bases inserted
D 4 I 2 ChrID 3 BP Supports S1 13 SUM_MS 627 NumSupSamples 1 HCC1599a 12
Reference:
CATGGCTGACTTATAAATCCCTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCACGTTGATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTTAAAGACATAGGTTTTATTGTC
Supporting reads (left and right parts separated by a space at the breakpoint):
TTATAAATCCCTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGCCTTGGGCAACTGCCAAA GATGCACT
ATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCAT
CTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCT
AGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCT
TTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCT
TTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTT
TTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCT
CTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTTAAAGACATAGGTTT
CTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTC
AAATCCCTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAG
CTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTTAAAGACATAGGTTT
TTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTT
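The "D 4 I 2" event can be checked by hand against the sequences on this slide: removing CGTT from the reference and putting AA in its place reproduces the bases seen around the breakpoint in the reads. The sketch below does that on a short excerpt. The function name is hypothetical, and the offset 9 is a 0-based position within the excerpt, worked out by comparing the reference line with the reads; it is not a genomic coordinate from the Pindel record.

```python
def apply_deletion_with_insertion(ref: str, del_start: int,
                                  del_len: int, inserted: str) -> str:
    """Delete del_len bases at 0-based del_start and insert new bases there."""
    if del_start < 0 or del_len < 0 or del_start + del_len > len(ref):
        raise ValueError("deletion does not fit inside the reference")
    return ref[:del_start] + inserted + ref[del_start + del_len:]


# Excerpt of the reference line above, around the breakpoint.
REF = "CAACTGCCACGTTGATGCACTGG"
deleted = REF[9:13]  # the 4 reference bases that are removed
variant = apply_deletion_with_insertion(REF, 9, 4, "AA")

print(deleted)
print(variant)
```

The resulting string equals the read bases CAACTGCCAAA followed by GATGCACTGG, with the space at the breakpoint removed.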