First Bite of Variant Calling in NGS/MPS Precourse materials


1 First Bite of Variant Calling in NGS/MPS Precourse materials
Yonglan Zheng

2 Precourse materials
General NGS variant calling workflow [slide#3]: GATK Best Practice as an example
NGS file formats [slides#4-8]
NGS variant calling tools and platform [slide#9]: FastQC, BWA-MEM, Picard (MarkDuplicates), GATK (RealignerTargetCreator, IndelRealigner, UnifiedGenotyper), SnpEff, FreeBayes

3 GATK Best Practice (v3.x)
Multi-sample calling is replaced by a winning combination of single-sample calling in gVCF mode and joint genotyping analysis. A Genome VCF (gVCF) records both variant and non-variant positions.
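The two-stage workflow can be sketched as a pair of command lines, built here as Python argument lists. This is only an illustration under assumptions: the slide names neither tool, so HaplotypeCaller (with `--emitRefConfidence GVCF`) and GenotypeGVCFs are assumed as the GATK 3.x tools for the two stages, and the jar name and all file names are hypothetical placeholders.

```python
from typing import List

# Assumed jar name for a GATK 3.x installation (hypothetical path).
GATK_JAR = "GenomeAnalysisTK.jar"


def single_sample_gvcf_command(reference: str, bam: str, gvcf: str) -> List[str]:
    """Stage 1: call one sample in gVCF mode (assumed tool: HaplotypeCaller)."""
    return [
        "java", "-jar", GATK_JAR,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        "-o", gvcf,
    ]


def joint_genotyping_command(reference: str, gvcfs: List[str], vcf: str) -> List[str]:
    """Stage 2: jointly genotype all per-sample gVCFs (assumed tool: GenotypeGVCFs)."""
    command = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        command += ["--variant", gvcf]
    command += ["-o", vcf]
    return command


# Hypothetical file names, for illustration only.
stage1 = single_sample_gvcf_command("ref.fasta", "sampleA.bam", "sampleA.g.vcf")
stage2 = joint_genotyping_command(
    "ref.fasta", ["sampleA.g.vcf", "sampleB.g.vcf"], "cohort.vcf"
)
```

Each list could be passed to `subprocess.run` once the placeholder paths point at real files. Stage 1 runs once per sample, and stage 2 runs once over all the resulting gVCFs.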

4 FASTQ
A FASTQ file (.fq or .fastq) is a text-based file that stores a biological sequence (usually a nucleotide sequence) together with its corresponding quality scores. Each entry in a FASTQ file consists of four lines:
• Sequence identifier
• Sequence
• Quality score identifier line (consisting of a +)
• Quality score
Quality: a quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect). Phred quality score: Q = -10 log10(p), so Q = 30 corresponds to p = 0.001.
Sequence identifier: @<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>
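The four-line layout and the Phred mapping can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: it assumes exactly four lines per entry (no wrapped sequences) and the Phred+33 ASCII encoding, which the slide does not specify. The read name and quality string in the demo are made up.

```python
import io
import math
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # text after the leading "@"
    sequence: str
    quality: str     # raw ASCII-encoded quality string


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per entry."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ entry near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


def decode_qualities(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(char) - offset for char in quality]


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Apply Q = -10 * log10(p)."""
    return -10 * math.log10(p)


# Hypothetical one-record input, for illustration only.
demo = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(demo))
scores = decode_qualities(records[0].quality)
```

With the Phred+33 assumption, the character `I` decodes to Q = 40 and `!` to Q = 0, the lowest possible score.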

5 FASTQ An example of a valid entry is as follows:
@EAS139:136:FC706VJ:2:5:1000: :Y:18:ATCACG AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +

6 SAM/BAM/CRAM
A SAM (Sequence Alignment/Map) file (.sam) is a tab-delimited text file that contains sequence alignment data. A BAM file (.bam) is the binary version of a SAM file. CRAM (.cram) is a more compact alternative that typically achieves 40-50% space saving over BAM. It uses reference-based compression, meaning that only base calls that differ from a designated reference sequence need to be stored.
A SAM file is organized into:
• Headers
• Alignment section: mandatory fields, including the CIGAR (Compact Idiosyncratic Gapped Alignment Report) string
• Alignment section: optional fields
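To make the CIGAR string concrete, here is a small sketch that splits one into (length, operation) pairs and computes how many reference and read bases the alignment covers. The operation letters and which of them consume the reference or the read follow the SAM specification. The function names and the example string are illustrative only.

```python
import re
from typing import List, Tuple

# One CIGAR token is a run length followed by an operation letter.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference and along the read.
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if cigar == "*":  # "*" means the CIGAR is unavailable
        return []
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings that contain anything other than valid tokens.
    if "".join(length + op for length, op in tokens) != cigar:
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in tokens]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


def query_length(cigar: str) -> int:
    """Number of read bases described by the CIGAR string."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


# Example: 3 soft-clipped bases, 10 aligned, 2 inserted, 5 aligned,
# 1 deleted from the reference, 4 aligned.
example = "3S10M2I5M1D4M"
span = reference_span(example)
read_bases = query_length(example)
```

For the example string, the deletion contributes to the reference span but not to the read length, while the soft clip and the insertion do the opposite.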

7 VCF/BCF
A VCF (Variant Call Format) file (.vcf) is a text file that contains meta-information lines (prefixed with "##"), a header line (prefixed with "#"), and data lines, each containing information about a position in the genome and genotype information on samples for that position (text fields separated by tabs). VCF's binary counterpart is BCF.
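The three kinds of VCF lines can be told apart by their prefixes, as the following sketch shows. It is a simplified reader under assumptions: it handles only plain-text input, keeps most fields as strings, and does not interpret the INFO column. The sample name and the variant in the demo are invented for illustration.

```python
import io
from typing import Any, Dict, List, TextIO, Tuple

# The eight fixed columns that start every VCF data line.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def read_vcf(handle: TextIO) -> Tuple[List[str], List[str], List[Dict[str, Any]]]:
    """Return (meta-information lines, sample names, data records)."""
    meta: List[str] = []
    samples: List[str] = []
    records: List[Dict[str, Any]] = []
    for raw in handle:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            meta.append(line[2:])            # meta-information line
        elif line.startswith("#"):
            columns = line[1:].split("\t")   # header line
            samples = columns[9:]            # names follow the FORMAT column
        else:
            fields = line.split("\t")        # data line
            record: Dict[str, Any] = dict(zip(FIXED_COLUMNS, fields[:8]))
            record["POS"] = int(record["POS"])
            if len(fields) > 9:
                keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP
                record["genotypes"] = {
                    sample: dict(zip(keys, value.split(":")))
                    for sample, value in zip(samples, fields[9:])
                }
            records.append(record)
    return meta, samples, records


# Hypothetical single-variant file, for illustration only.
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\n"
    "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30\tGT:DP\t0/1:30\n"
)
meta, samples, records = read_vcf(demo_vcf)
```

In the demo, `sampleA` carries the heterozygous genotype `0/1` at position 12345.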

8 MAF
A Mutation Annotation Format (MAF) file (.maf) is a tab-delimited text file that lists mutations. The format originates from The Cancer Genome Atlas (TCGA) project. Its columns include: Hugo_Symbol, Entrez_Gene_Id, Center, NCBI_Build, Chromosome, Start_Position, End_Position, Strand, Variant_Classification, Variant_Type, Reference_Allele, Tumor_Seq_Allele1, Tumor_Seq_Allele2, ...
BED
A BED (Browser Extensible Data) file (.bed) is a tab-delimited text file that defines a feature track. It consists of one line per feature, each containing 3-12 columns of data: the first three columns are required and the remaining nine are optional.
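The BED column layout can be sketched as a small line parser. The column names follow the standard BED field names, and start coordinates are treated as 0-based with an exclusive end, as in the BED convention. The function names and the example feature are illustrative.

```python
from typing import Any, Dict

# Standard BED field names: the first three are required, the rest optional.
BED_COLUMNS = [
    "chrom", "chromStart", "chromEnd",           # required
    "name", "score", "strand",                   # optional
    "thickStart", "thickEnd", "itemRgb",
    "blockCount", "blockSizes", "blockStarts",
]


def parse_bed_line(line: str) -> Dict[str, Any]:
    """Parse one tab-delimited BED feature line with 3 to 12 columns."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        raise ValueError(f"expected 3-12 columns, found {len(fields)}")
    feature: Dict[str, Any] = dict(zip(BED_COLUMNS, fields))
    feature["chromStart"] = int(feature["chromStart"])  # 0-based, inclusive
    feature["chromEnd"] = int(feature["chromEnd"])      # exclusive
    return feature


def feature_length(feature: Dict[str, Any]) -> int:
    """Length in bases; the half-open interval needs no +1 correction."""
    return feature["chromEnd"] - feature["chromStart"]


# Hypothetical six-column feature, for illustration only.
feature = parse_bed_line("chr7\t127471196\t127472363\tPos1\t0\t+")
length = feature_length(feature)
```

Only the columns present on the line appear in the returned dictionary, so a three-column line yields just the required fields.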

9 NGS Variant Calling Tools and Platform
• FastQC
• BWA-MEM
• Picard
• Picard MarkDuplicates
• GATK
• GATK RealignerTargetCreator
• GATK IndelRealigner
• SnpEff
• FreeBayes
• Galaxy (main site; test site)

