Information processing after resequencing

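Sequence Trimming
Q = -10 log10(P), where Q is the Phred quality score of a base and P is the probability that the base call is wrong (Q20 means P = 0.01, Q30 means P = 0.001).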
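
The relation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function names are made up for illustration, and the decoder assumes the Phred+33 encoding (the -phred33 option used with Trimmomatic).

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P to a Phred quality score Q."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


if __name__ == "__main__":
    print(phred_to_error_prob(20))   # error probability for Q20
    print(error_prob_to_phred(0.001))  # quality score for P = 0.001
    print(decode_phred33("I5!"))     # per-base scores of a quality string
```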

Sequence Trimming: Trimmomatic
Paired-end (PE) mode:
    java -jar ~/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 <input_forward.fq.gz> <input_reverse.fq.gz> <output_forward_paired.fq.gz> <output_forward_unpaired.fq.gz> <output_reverse_paired.fq.gz> <output_reverse_unpaired.fq.gz> ILLUMINACLIP:../Trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:36
Single-end (SE) mode:
    java -jar ~/Trimmomatic-0.36/trimmomatic-0.36.jar SE -phred33 BS.cat.fq.trimmed BS.cat.fq.trimmed.trimo.out.1.20 ILLUMINACLIP:../Trimmomatic-0.36/adapters/TruSeq3-SE.fa:2:30:10 SLIDINGWINDOW:1:20 MINLEN:50
PE / SE: paired-end or single-end input
SLIDINGWINDOW:4:20: (window size, quality threshold)
TruSeq3-PE.fa: adapter sequences for HiSeq
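
To illustrate what SLIDINGWINDOW:4:20 means, here is a simplified Python sketch of sliding-window quality trimming. It is an assumption-laden illustration, not Trimmomatic's own code: the function name is hypothetical, the read is cut at the start of the first window whose average quality falls below the threshold, and reads shorter than the window are left untouched. The real tool may choose a slightly different cut position.

```python
def sliding_window_trim(quals, window=4, threshold=20):
    """Return how many leading bases to keep (simplified illustration).

    quals     -- list of per-base Phred scores, 5' to 3'
    window    -- number of bases averaged together
    threshold -- minimum acceptable average quality in a window
    """
    n = len(quals)
    if n < window:
        # Simplification: reads shorter than one window are not trimmed here.
        return n
    for start in range(n - window + 1):
        average = sum(quals[start:start + window]) / window
        if average < threshold:
            # Cut the read where the first failing window begins.
            return start
    return n


if __name__ == "__main__":
    read_quals = [30, 30, 30, 30, 10, 10, 10, 10]
    keep = sliding_window_trim(read_quals, window=4, threshold=20)
    print("bases kept:", keep)
```

A read whose kept length falls below MINLEN would then be discarded.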

Mapping quality control
Common usage for aligning:
    bwa mem <ref.fa> <fastq1> <fastq2> … | samtools view -bS - > <filename.bam>
-q 30 (samtools view option): keep only alignments with mapping quality >= 30
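
The effect of a mapping-quality cutoff can be sketched in Python on plain SAM text. This is only an illustration of the filter (MAPQ is the fifth tab-separated column of a SAM record); the function name is hypothetical, and in practice samtools does this work.

```python
def filter_sam_by_mapq(sam_lines, min_mapq=30):
    """Keep header lines and alignments with MAPQ >= min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ and are passed through.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # column 5 of a SAM record
        if mapq >= min_mapq:
            kept.append(line)
    return kept


if __name__ == "__main__":
    example = [
        "@SQ\tSN:Chr01\tLN:1000",
        "r1\t0\tChr01\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t0\tChr01\t200\t10\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for line in filter_sam_by_mapq(example):
        print(line)
```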

BAM file status check
    samtools flagstat <bamfile>

Tview
    samtools tview <bamfile> <ref.fa>

VCFtools common usage
    vcftools <options> --vcf <vcf_file> --out <outfile> --recode
Use --vcf for a plain VCF file or --gzvcf for a gzip-compressed one.
You can combine many options in one command.
All available options are listed in the manual: https://vcftools.github.io/man_latest.html

Filtering indels
    vcftools --gzvcf variant.vcf.gz --recode --out variant.vcf.gz.SNPs --remove-indels

Filtering by position
Select one chromosome (use --not-chr instead of --chr to exclude it):
    vcftools --vcf variant.vcf.gz.SNPs.recode.vcf --chr Chr01 --out variant.vcf.gz.Chr01 --recode
Select a range on a chromosome:
    vcftools --vcf variant.vcf.gz.SNPs.recode.vcf --chr Chr01 --from-bp 300 --to-bp 1000000 --out variant.vcf.gz.Chr01 --recode
Select sites from a list:
    vcftools --vcf variant.vcf.gz.SNPs.recode.vcf --out overlap.vcf --recode --positions position_list.txt
position_list.txt is a TAB-separated file with one chromosome and position per line.
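
The position filters above can be pictured with a short Python sketch. It is an illustration only, with hypothetical function names; it assumes the bounds are inclusive and that the position list holds a chromosome and a position per line, with lines starting with "#" treated as comments.

```python
def _chrom_pos(vcf_line):
    """Return (chromosome, position) from a VCF data line."""
    fields = vcf_line.split("\t")
    return fields[0], int(fields[1])


def filter_by_region(vcf_lines, chrom, from_bp=None, to_bp=None):
    """Keep header lines and sites on chrom within [from_bp, to_bp]."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        site_chrom, pos = _chrom_pos(line)
        if site_chrom != chrom:
            continue
        if from_bp is not None and pos < from_bp:
            continue
        if to_bp is not None and pos > to_bp:
            continue
        kept.append(line)
    return kept


def load_positions(position_lines):
    """Parse a TAB-separated position list into a set of (chrom, pos)."""
    positions = set()
    for line in position_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        positions.add((chrom, int(pos)))
    return positions


def filter_by_positions(vcf_lines, positions):
    """Keep header lines and sites listed in positions."""
    return [
        line for line in vcf_lines
        if line.startswith("#") or _chrom_pos(line) in positions
    ]
```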

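Other filtering options
--maf 0.05: keep sites with minor allele frequency >= 0.05
--max-maf 0.3: keep sites with minor allele frequency <= 0.3
--minQ 30: keep sites with a quality (QUAL) value of at least 30
--minDP 3: keep genotypes with a read depth of at least 3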

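Suggested command for filtering
    vcftools --vcf variant.vcf --remove-indels --minQ 30 --minDP 5 --out variant.vcf.SNP.q30.d5 --recode
--minDP 5 (or 3): minimum depth filtering
--minQ 30: site quality (QUAL) filtering; this is the variant quality, not the read mapping quality
--remove-indels: remove indels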
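
As a rough picture of what the site-level filters do, the following Python sketch tests one VCF data line against an indel check and a QUAL cutoff. The function names are hypothetical, a site with a missing QUAL (".") is rejected here as an assumption, and the depth filter is not reproduced because --minDP acts on individual genotypes rather than on whole sites.

```python
def is_indel(ref, alt):
    """True if any allele differs in length from the others."""
    alleles = [ref] + alt.split(",")
    return len({len(allele) for allele in alleles}) > 1


def passes_site_filters(vcf_line, min_qual=30.0):
    """Check a VCF data line against the indel and QUAL filters."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual = fields[3], fields[4], fields[5]
    if is_indel(ref, alt):
        return False
    if qual == ".":
        # Assumption for this sketch: missing quality fails the filter.
        return False
    return float(qual) >= min_qual


if __name__ == "__main__":
    site = "Chr01\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:8"
    print(passes_site_filters(site))
```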

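Get statistics (without --recode)
--freq: allele frequency per site (output: *.frq)
--TsTv <window_size>: transition/transversion ratio in windows (output: *.TsTv)
--TsTv-summary: summary of all transitions and transversions (output: *.TsTv.summary)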
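
The transition/transversion ratio reported by the options above rests on a simple classification: A<->G and C<->T changes are transitions, all other single-base changes are transversions. A minimal Python sketch, with hypothetical function names and restricted to biallelic single-base substitutions:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Compute Ts/Tv from a list of (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if classify_snp(ref, alt) == "transition")
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


if __name__ == "__main__":
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```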

Get statistics (without --recode)
--site-pi: nucleotide diversity per site
--window-pi <window_size> --window-pi-step <step_size>: nucleotide diversity in sliding windows
    vcftools --vcf variant.vcf.gz.SNPs.recode.vcf --out test --window-pi 10000 --window-pi-step 1000
--weir-fst-pop <file_name> --fst-window-size <window_size> --fst-window-step <step_size>: Fst between populations; give one --weir-fst-pop file per population, each listing the individuals of that population, one per line
    vcftools --vcf <vcf_file> --weir-fst-pop bam_list_A --weir-fst-pop bam_list_B --fst-window-size 100000 --fst-window-step 10000 --out <out_file>
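
For intuition about the window and step sizes, here is a small Python sketch. It assumes the textbook per-site nucleotide diversity for a biallelic site (the fraction of chromosome pairs that differ) and 1-based windows that advance by the step size; both function names are hypothetical, and the exact binning used by VCFtools may differ in detail.

```python
def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity for a biallelic site.

    alt_count -- number of chromosomes carrying the alternate allele
    n_chrom   -- total number of genotyped chromosomes
    """
    if n_chrom < 2:
        return 0.0
    differing_pairs = alt_count * (n_chrom - alt_count)
    total_pairs = n_chrom * (n_chrom - 1) / 2
    return differing_pairs / total_pairs


def sliding_windows(chrom_length, window_size, step_size):
    """List 1-based, inclusive (start, end) windows along a chromosome."""
    windows = []
    start = 1
    while start <= chrom_length:
        end = min(start + window_size - 1, chrom_length)
        windows.append((start, end))
        start += step_size
    return windows


if __name__ == "__main__":
    print(site_pi(5, 10))
    print(sliding_windows(2500, 1000, 500))
```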

Get statistics (without --recode)
--TajimaD <window_size>: Tajima's D in windows of the given size
--het: heterozygosity per individual

VCF comparison
--diff <second_vcf_file>: compare the input VCF with a second VCF file

SNP typing: snpEff
    java -Xmx4g -jar ~/snpEff/snpEff.jar -c <config file> -v <DB> <vcf_file> > <output file>
    java -Xmx4g -jar ~/snpEff/snpEff.jar -c ~/snpEff/snpEff.config -v Oryza_sativa <vcffile> > <out_file>
You can look up the DB name in ~/snpEff/snpEff.config

Download DB for snpEff
    java -jar snpEff.jar download -v Oryza_sativa

snpEff DB construction
A GFF file is needed to construct a snpEff DB.
Edit the snpEff config file to add the new genome (gm275).
Make a gm275 directory in ~/snpEff/data
Copy the GFF file into the gm275 directory
    java -jar snpEff.jar build -gff3 -v gm275

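SNP typing
    java -Xmx4g -jar ~/snpEff/snpEff.jar -c ~/snpEff/snpEff.config -v gm275 BS.bam.sort.vcf -ud 2000 > BS.bam.sort.vcf.type
-ud 2000: length of the upstream/downstream interval, in bases
Output: annotated out file (BS.bam.sort.vcf.type), gene file, summary file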

VCF to tabular format
Please use this after filtering the VCF.
    python make_tabular.py variant.vcf.type > variant.vcf.type.tabular
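
The make_tabular.py script itself is not shown in the slides. The Python sketch below is therefore a hypothetical stand-in, not the author's script: it assumes the annotated VCF carries snpEff annotations in an ANN= INFO entry (older snpEff versions used EFF= instead) and writes one TAB-separated row per site with chromosome, position, REF, ALT, QUAL and the first annotated effect.

```python
def vcf_to_rows(vcf_lines):
    """Turn annotated VCF lines into TAB-separated rows (illustrative)."""
    rows = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
        effect = "."
        for entry in info.split(";"):
            if entry.startswith("ANN="):
                # First annotation; its second '|' subfield is the effect.
                first = entry[4:].split(",")[0].split("|")
                if len(first) > 1:
                    effect = first[1]
                break
        rows.append("\t".join([chrom, pos, ref, alt, qual, effect]))
    return rows


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "Chr01\t100\t.\tA\tG\t50\tPASS\tDP=8;ANN=G|missense_variant|MODERATE|gene1",
    ]
    for row in vcf_to_rows(example):
        print(row)
```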