Short Read Workshop Day 5: Mapping and Visualization Video 3 Introduction to BWA.

Burrows-Wheeler Alignment Tool (BWA)
BWA consists of 3 algorithms:
    BWA-backtrack (aka BWA aln), for <100bp reads
    BWA-SW (Smith-Waterman)
    BWA-MEM, for 70bp to 1Mb reads
BWA-MEM also has better performance than BWA-backtrack for bp reads.
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Performs gapped, local alignments.
We like BWA for mapping paired-read data.
Can align both nucleotide and color space reads.
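The read-length ranges on this slide can be turned into a small lookup. The sketch below is only an illustration: the function name and the table layout are hypothetical, the ranges are copied from the slide (BWA-backtrack under 100bp, BWA-MEM from 70bp to 1Mb, with 1Mb taken as 1,000,000 bp), and BWA-SW is left out because the slide gives no range for it.

```python
# Read-length ranges as stated on the slide; the table itself is illustrative.
READ_LENGTH_RANGES = [
    ("BWA-backtrack", 1, 99),        # "<100bp reads"
    ("BWA-MEM", 70, 1_000_000),      # "70bp to 1Mb reads"
]


def candidate_algorithms(read_length):
    """Return the algorithms whose stated range covers read_length (in bp)."""
    return [
        name
        for name, low, high in READ_LENGTH_RANGES
        if low <= read_length <= high
    ]


for length in (50, 80, 150):
    print(length, candidate_algorithms(length))
```

Reads between 70bp and 99bp fall inside both stated ranges, so the sketch returns both names rather than picking one.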

Aligning with BWA aln
BWA aln is designed for "short" (<100bp) reads.
2-step process: map with aln, then finalize with either sampe or samse.
1.) Find the SA coordinates of the input reads
2.) Convert the sai file to a SAM file
Step 1: BWA aln -> Out.sai
Step 2: BWA samse or BWA sampe -> Out.sam

Step 1.) BWA aln
    -o  Max number of gap opens (default 1)
    -O  Gap open penalty

Read:   ATGCA-CTAGCTAGCTAGCTAGCT
        ||||||||||||||||||||||||
Genome: ATGCAGCTAGCTAGCTAGCTAGCT

BWA aln -> Out.sai

Step 1.) BWA aln
    -o  Max number of gap opens (default 1)
    -O  Gap open penalty
    -e  Max number of gap extensions
    -E  Gap extension penalty
    -k  Maximum edits (i.e. gaps, mismatches) in the seed
    -l  Seed length

Read:   ATGCA--TAGCTAGCTAGCTAGCT
        ||||||||||||||||||||||||
Genome: ATGCAGCTAGCTAGCTAGCTAGCT

BWA aln -> Out.sai
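To make the difference between a gap open and a gap extension concrete, here is a minimal sketch that counts both in the two example reads from these slides. The function name is hypothetical and this is not how BWA itself tracks gaps; it only illustrates what the -o and -e limits are counting.

```python
def count_gaps(aligned, gap_char="-"):
    """Return (gap_opens, gap_extensions) for one row of an alignment.

    The first gap character of a run counts as an open; every further
    gap character in the same run counts as an extension.
    """
    opens = 0
    extensions = 0
    in_gap = False
    for ch in aligned:
        if ch == gap_char:
            if in_gap:
                extensions += 1
            else:
                opens += 1
                in_gap = True
        else:
            in_gap = False
    return opens, extensions


one_base_gap = "ATGCA-CTAGCTAGCTAGCTAGCT"   # read from the first slide
two_base_gap = "ATGCA--TAGCTAGCTAGCTAGCT"   # read from the second slide

print(count_gaps(one_base_gap))
print(count_gaps(two_base_gap))
```

The first read has one gap open and no extension; the second has the same single gap open plus one extension, which is why the second slide introduces -e and -E alongside -o and -O.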

Step 1.) BWA aln
Parts of the command line: Command, Options, Index Name, Reads.fq

Most basic, default setting run of BWA:
$ bwa aln index reads.fq > out.sai 2> bwa-out.stderr

Most basic, default setting run of BWA on paired reads:
$ bwa aln index fwd_reads.fq > fwd_out.sai 2> bwa_fwd_out.stderr
$ bwa aln index rev_reads.fq > rev_out.sai 2> bwa_rev_out.stderr

Step 2.) BWA samse
BWA samse or BWA sampe: Out.sai -> Out.sam
Parts of the command line: Command, Index Name, Reads.fq, Out.sai, SAM File Name
    -n  adds an XA tag in the SAM file noting how many other hits were found for the read
SAM Header

$ bwa samse -n 3 -f out.sam index out.sai reads.fq

Step 2.) BWA sampe
BWA sampe: Fwd_Out.sai + Rev_Out.sai -> Out.sam
Parts of the command line: Command, Index Name, Reads.fq, Out.sai
Options:
    -n  max hits to output per pair [3]
    -a  max insert length [500]
    -f  specify SAM file name

$ bwa sampe index fwd_out.sai rev_out.sai fwd_reads.fq rev_reads.fq > out.sam
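The two steps for paired reads can be assembled into one ordered list of command lines. This is a sketch only: the helper name is hypothetical, the intermediate .sai file names are the ones used on the slides, and the code builds the command strings without running bwa. It uses the -f option listed above instead of shell redirection for the SAM output.

```python
import shlex


def paired_end_commands(index, fwd_fq, rev_fq, out_sam):
    """Build the shell command lines for a paired-end bwa aln + sampe run."""
    fwd_sai = "fwd_out.sai"   # intermediate file names taken from the slides
    rev_sai = "rev_out.sai"
    q = shlex.quote           # protects paths that contain spaces
    return [
        # Step 1: one bwa aln run per read file, each producing a .sai file
        f"bwa aln {q(index)} {q(fwd_fq)} > {q(fwd_sai)}",
        f"bwa aln {q(index)} {q(rev_fq)} > {q(rev_sai)}",
        # Step 2: combine both .sai files and both read files into one SAM file
        f"bwa sampe -f {q(out_sam)} {q(index)} {q(fwd_sai)} {q(rev_sai)} "
        f"{q(fwd_fq)} {q(rev_fq)}",
    ]


for line in paired_end_commands("index", "fwd_reads.fq", "rev_reads.fq", "out.sam"):
    print(line)
```

The order of arguments to sampe matters: index first, then both .sai files, then both read files, with forward before reverse in each pair.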

BWA-MEM

For each alignment, BWA calculates a mapping quality score, which is the Phred-scaled probability of the alignment being incorrect assuming the true hit can always be found. Simulation reveals that BWA may overestimate mapping quality due to this modification, but the deviation is relatively small.
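"Phred-scaled" means the score is -10 times the base-10 logarithm of the error probability. The sketch below shows that conversion in both directions. The function names and the cap value are assumptions for illustration; it does not reproduce how BWA estimates the probability itself.

```python
import math


def phred_scaled(p_wrong, cap=60):
    """Convert the probability that an alignment is wrong into a Phred-scaled score.

    The cap is an illustrative upper bound so that p_wrong == 0 does not
    produce an infinite score.
    """
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def error_probability(score):
    """Convert a Phred-scaled score back into a probability of error."""
    return 10 ** (-score / 10)


for p in (0.1, 0.01, 0.001):
    print(p, phred_scaled(p))
```

A score of 20 therefore corresponds to a 1 in 100 chance that the alignment is wrong, and a score of 30 to a 1 in 1000 chance.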