Day 5 Mapping and Visualization


1 Day 5 Mapping and Visualization
Jess Vera, Phil Richmond

2 Questions?

3 Day 5 Outline
Index reference genome
Map reads with bowtie
~Break~
Convert SAM to BAM
Visualize on IGV
From /projects/sreadgrp/Day5/, copy the following file to your home directory: template.sh
$ cp /projects/sreadgrp/Day5/template.sh ./

4 Reference genome indexing
Compress the genome into a smaller, more easily searchable format
Make Index/ directory
Load bowtie module
$ module load bowtie_bowtie
Load BWA module
$ module load bwa_0.7.5a
Copy template.sh, give it a new name
$ cp template.sh newname.sh
Open jobscript with editor

5 How to Index Reference Genome
bowtie(2)-build
Command layout: Command, Options, Genome.fa, Index Name
$ bowtie-build /path/Genome.fa Index/SGDv4Bowtie

6 How to Index Reference Genome
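bwa index
Command layout: Command, Color space option, Algorithm type, Genome.fa
-a selects the algorithm used to build the index (is or bwtsw); -p sets the index name
$ bwa index -a is -p Index/SGDv4BWA /path/Genome.fa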

7 Reference genome indexing
Compress the genome into a smaller, more easily searchable format
Submit jobscript
$ qsub <jobscript.sh>
Is it running?
$ qstat -u <username>
What's in Index/? (11 files)
SGDv4Bowtie.1.ebwt
SGDv4Bowtie.2.ebwt
SGDv4Bowtie.3.ebwt
SGDv4Bowtie.4.ebwt
SGDv4Bowtie.rev.1.ebwt
SGDv4Bowtie.rev.2.ebwt
SGDv4BWA.amb
SGDv4BWA.ann
SGDv4BWA.bwt
SGDv4BWA.pac
SGDv4BWA.sa
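One way to confirm that both indexing jobs finished is to test for the 11 files listed above. The following is only a sketch: the function name missing_index_files is made up for illustration, and it assumes the index names SGDv4Bowtie and SGDv4BWA used on the earlier slides.

```shell
#!/bin/sh
# Print every expected index file that is missing from a directory.
# The suffix lists follow the 11 files shown on the slide.
# missing_index_files is a hypothetical helper, not part of bowtie or bwa.
missing_index_files() {
    dir=$1; bowtie_prefix=$2; bwa_prefix=$3
    for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
        [ -e "$dir/$bowtie_prefix.$suffix" ] || echo "$dir/$bowtie_prefix.$suffix"
    done
    for suffix in amb ann bwt pac sa; do
        [ -e "$dir/$bwa_prefix.$suffix" ] || echo "$dir/$bwa_prefix.$suffix"
    done
}
```

Running missing_index_files Index SGDv4Bowtie SGDv4BWA should print nothing once both jobs have completed.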

8 Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. We like Bowtie for single-end read alignment.
Bowtie
Only end-to-end alignment of reads (no local or gapped alignment)
Optimized for short read lengths (25-50bp); max read length 1000bp
Alignments not allowed to overlap ambiguous characters (N, Y, R) in the reference
Colorspace support (as of v0.12.0)
Bowtie2
Local or end-to-end alignment options, both gapped
Optimized for long read lengths (≥50bp); no max read length
Allows alignments to overlap multiple ambiguous characters (N, Y, R) in the reference
No colorspace support

9 Bowtie Alignment Options
-v Read Mismatch
Allow -v <int> mismatches in whole read, where <int> = 0-3
Quality score is ignored
-n Seed Mismatch (default)
Allow -n <int> mismatches in seed region, where <int> = 0-3
Seed length = -l <int>, where <int> ≥ 5
Seed Region GATCGCAGAGCTCGGGCATAGCTAGCGC
Copy template.sh, give it a new name
Open jobscript with editor
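The difference between the two modes can be sketched by counting mismatches over the whole read versus over the seed only. This is an illustration, not how Bowtie is implemented: count_mismatches is a made-up helper, the reference window in the usage note is invented, and real -n mode additionally caps the summed quality values at mismatched positions (-e), which is not modeled here.

```shell
#!/bin/sh
# Count mismatches between a read and an equally long reference window.
# With a third argument, only the first <seed_len> bases are compared,
# which mimics the seed region that -n applies to.
# count_mismatches is a hypothetical helper for illustration only.
count_mismatches() {
    awk -v read="$1" -v ref="$2" -v seed_len="${3:-0}" 'BEGIN {
        n = length(read)
        if (seed_len > 0 && seed_len < n) n = seed_len
        mm = 0
        for (i = 1; i <= n; i++)
            if (substr(read, i, 1) != substr(ref, i, 1)) mm++
        print mm
    }'
}
```

For the read GATCGCAGAGCTCGGGCATAGCTAGCGC against an invented window GACCGCAGAGCTCGGGCATCGCTAGCGC, the whole-read count is 2 but the count in a 10-base seed is 1, so the alignment would fail -v 1 while still satisfying the seed limit of -n 1 -l 10.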

10 Using Bowtie
1.) Index reference genome with bowtie-build (Video 1)
2.) Command line setup
Set options like -n or -v
Command layout: options, reference genome index, read file input, alignment output file
$ bowtie -S -p 2 -v 2 -a index /path/RNAseq.fq bowtie-out.sam 2> bowtie_out.stderr
or
$ bowtie -S -p 2 -v 2 -m 1 index /path/RNAseq.fq bowtie-out.sam 2> bowtie_out.stderr
By default Bowtie assumes your reads are in a FASTQ file; for other file formats you must specify the format of the reads.

11 Submit bowtie jobscript Check that it’s running
15 min break

12 $ bowtie -S -p 2 -v 2 -a index /path/RNAseq.fq bowtie-out.sam 2> bowtie_out.stderr
# reads processed:
# reads with at least one reported alignment: (92.46%)
# reads that failed to align: (7.54%)
$ bowtie -S -p 2 -v 2 -m 1 index /path/RNAseq.fq bowtie-out.sam 2> bowtie_out.stderr
# reads processed:
# reads with at least one reported alignment: (23.41%)
# reads that failed to align: (7.54%)
# reads with alignments suppressed due to -m: (69.05%)
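Because the summary goes to stderr (bowtie_out.stderr above), the alignment rate can be pulled out with a one-line filter when comparing runs. A minimal sketch, assuming the summary line format shown on the slide; aligned_percent is a made-up name.

```shell
#!/bin/sh
# Extract the percentage of reads with at least one reported alignment
# from a Bowtie summary, read from the named file(s) or from stdin.
# aligned_percent is a hypothetical helper, not a bowtie command.
aligned_percent() {
    sed -n 's/^# reads with at least one reported alignment: .*(\([0-9.]*\)%).*$/\1/p' "$@"
}
```

Usage would look like aligned_percent bowtie_out.stderr. In the -m 1 run above, the three percentages (23.41, 7.54, 69.05) add up to 100, and the reads suppressed by -m account for the drop from 92.46% to 23.41%.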

13 Bowtie Alignment Report Options
[Figure: @read1 and @read2 aligned to Chr1 and Chr3, contrasting a non-uniquely mapping read with a uniquely mapping read]
-a reports all valid alignments for each read
-m <int> suppresses all alignments for any read with more than <int> reportable alignments (use -m 1 to report only uniquely mapping reads)
-k <int> reports up to <int> alignments for each read (default -k 1)
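The three reporting options can be compared with a toy model that takes, for each read, the number of valid alignments and prints how many would be reported. This is a simplified sketch: reported_alignments is a made-up helper, readA and readB in the usage note are invented reads, and the m mode models -m combined with the default -k 1 only.

```shell
#!/bin/sh
# Given "read_name number_of_valid_alignments" lines on stdin, print how
# many alignments each reporting mode would report per read.
#   a      : all valid alignments (-a)
#   k <n>  : at most n alignments (-k <n>)
#   m <n>  : none for reads with more than n alignments (-m <n>),
#            otherwise the default single alignment (-k 1)
# reported_alignments is a hypothetical helper for illustration only.
reported_alignments() {
    awk -v mode="$1" -v n="${2:-1}" '{
        valid = $2
        if (mode == "a")      out = valid
        else if (mode == "k") out = (valid < n) ? valid : n
        else if (mode == "m") out = (valid > n) ? 0 : (valid < 1 ? valid : 1)
        else                  exit 2
        print $1, out
    }'
}
```

With a hypothetical readA that has 2 valid alignments and readB that has 1, mode a reports 2 and 1, mode k 1 reports 1 and 1, and mode m 1 reports 0 and 1, so only the uniquely mapping read survives.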

14 What types of data can you view on IGV?
Quantitative Data: read pile-up/coverage (.bam, .bedgraph, .wig)
Qualitative Data: annotations (.bed, .gff3, .vcf)
[Figure: IGV image showing both types: quantitative data (coverage) and qualitative data (gene annotations)]
Go to for more information on file format and options

15 Quantitative Data: Sorted BAM
[Figure: IGV tracks showing the coverage track, read alignments (colored by strand), and genes]
You must have an index of the sorted BAM file for visualization (see Video 1)

16 Quantitative Data: Bedgraph and Wig file visualization
[Figure: IGV tracks showing Bedgraph (plus and minus strand), Wig file (plus strand), Wig file (minus strand), and genes]

17 Qualitative Data: BED and GFF
GFF file example:
chr1 SGD gene ID=Gene1
chr1 SGD snRNA ID=snRNA1
BED6 file example:
chr Gene1 0 +
chr Gene2 0 -
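The two annotation formats carry the same information in different column orders and coordinate systems: GFF3 is 1-based and inclusive, while BED is 0-based and half-open. A conversion can be sketched as below; gff3_to_bed6 is a made-up name, the score column is fixed at 0 as in the BED6 example above, and any coordinates used to try it out are invented.

```shell
#!/bin/sh
# Convert tab-separated GFF3 feature lines to BED6.
# Only the start is shifted (1-based inclusive -> 0-based half-open).
# The BED name is taken from the ID= attribute in column 9.
# gff3_to_bed6 is a hypothetical helper for illustration only.
gff3_to_bed6() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 9 { next }            # skip comments and short lines
         {
             name = "."
             n = split($9, attrs, ";")
             for (i = 1; i <= n; i++)
                 if (attrs[i] ~ /^ID=/) name = substr(attrs[i], 4)
             print $1, $4 - 1, $5, name, 0, $7
         }' "$@"
}
```

For example, a gene on chr1 with invented GFF3 coordinates 101 to 500 on the plus strand and ID=Gene1 would become the BED6 line chr1, 100, 500, Gene1, 0, +.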

18 BEDTools Suite Set of useful scripts designed to conduct various tasks using bed or gff3 files

19 BEDTools Suite
intersectBed: what overlaps between 2 files
mergeBed: merge together any overlapping annotations
genomeCoverageBed: convert bam to bedgraph
coverageBed: get read counts for annotations
closestBed: compare 2 files, find what's closest to an annotation
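To make the idea behind mergeBed concrete, the merge of overlapping annotations can be sketched in a few lines of awk. This is an illustration of the concept rather than a replacement for BEDTools: merge_sorted_bed is a made-up name, only the first three BED columns are kept, and the input is assumed to be sorted by chromosome and start.

```shell
#!/bin/sh
# Merge overlapping or directly adjacent intervals in a BED file that is
# already sorted by chromosome and start (e.g. sort -k1,1 -k2,2n).
# merge_sorted_bed is a hypothetical helper for illustration only.
merge_sorted_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 1 { cur_chrom = $1; cur_start = $2; cur_end = $3; next }
         $1 == cur_chrom && $2 <= cur_end {   # overlaps the open interval
             if ($3 > cur_end) cur_end = $3
             next
         }
         {                                    # gap: emit and start anew
             print cur_chrom, cur_start, cur_end
             cur_chrom = $1; cur_start = $2; cur_end = $3
         }
         END { if (NR > 0) print cur_chrom, cur_start, cur_end }' "$@"
}
```

Given invented intervals chr1 10-20, chr1 15-30 and chr1 40-50, the first two collapse into chr1 10-30 and the third is left unchanged.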

20 Variant Call Format – VCF File
Displays genomic variant data, e.g. SNPs, indels
You must index a VCF file before loading it into IGV, using either:
1.) $ igvtools index file.vcf
2.) IGV → File → Run IGVTools → Select Index Command

21 Working With SAM Files
SAMTools: suite of scripts designed to carry out various tasks using SAM files
SAM → BAM → sorted.BAM → BAM index
$ samtools view -bS file.sam -o file.bam
$ samtools sort file.bam file.sorted (writes file.sorted.bam; this is samtools 0.1.x syntax, samtools 1.3 and later use: samtools sort -o file.sorted.bam file.bam)
$ samtools index file.sorted.bam (writes file.sorted.bam.bai)
$ samtools idxstats file.sorted.bam
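The four steps can be bundled so that the same chain is applied to every SAM file. The sketch below only prints the commands, using the older samtools syntax from the slide, so it can be inspected before anything is run; both function names are made up, and file names are assumed to contain no spaces. The second helper sums the third column of idxstats output, which holds the number of mapped reads per reference sequence.

```shell
#!/bin/sh
# Print the samtools commands for one SAM file (samtools 0.1.x syntax,
# as on the slide). Pipe the output to sh to actually run them.
# Both functions are hypothetical helpers for illustration only.
sam_to_bam_commands() {
    base=${1%.sam}          # file.sam -> file
    echo "samtools view -bS $base.sam -o $base.bam"
    echo "samtools sort $base.bam $base.sorted"
    echo "samtools index $base.sorted.bam"
    echo "samtools idxstats $base.sorted.bam"
}

# Sum mapped reads from idxstats output
# (columns: reference name, length, mapped reads, unmapped reads).
total_mapped_reads() {
    awk -F '\t' '{ mapped += $3 } END { print mapped + 0 }' "$@"
}
```

For example, sam_to_bam_commands bowtie-out.sam | sh would run the chain, and samtools idxstats bowtie-out.sorted.bam | total_mapped_reads would give a single mapped-read total.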

22 Getting Started with IGV
From /projects/sreadgrp/Day5/ copy
SGDv4.fasta
SGDv4.gff3
to /projects/sreadgrp/student/<username>/
Connect to VM, start VNC session
$ /opt/igv/2.3.8/igv.sh
In IGV: Import SGDv4.fa genome with annotations
Save .genome file to VM home directory!!

23 Seed Alignment of Reads
Seed Region
Read1   GATCGCAGAGCTCGGGCATAGCTAGCGC
Genome  AGCTATGATCGCAGAGCTCGGGCATAGCTAGCGCTAGAGCTCGCTCGATCGATCGATC
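The seed idea on this slide can be tried directly on the two sequences shown: take the first bases of the read as the seed and look for an exact match in the genome. This is a conceptual sketch only. Bowtie searches a compressed index rather than scanning the genome string, and seed_position is a made-up helper that checks the forward strand only.

```shell
#!/bin/sh
# Report the 1-based position of the first exact match of a read's seed
# (its first <seed_len> bases) in a genome string, or 0 if there is none.
# seed_position is a hypothetical helper for illustration only.
seed_position() {
    awk -v read="$1" -v genome="$2" -v seed_len="$3" 'BEGIN {
        print index(genome, substr(read, 1, seed_len))
    }'
}
```

With Read1 and the Genome string from the slide and a seed length of 10, the seed GATCGCAGAG is found at position 7 of the genome.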

