MES7594-01 Genome Informatics I - Lecture VIII. Interpreting variants Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute,

Presentation transcript:

MES Genome Informatics I - Lecture VIII. Interpreting variants Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine Genome Informatics I (2015 Spring)

Overview
Goal of this lecture
– Learn how to interpret discovered variants, filtering and prioritizing them by their association with a phenotype (e.g. disease), and practice doing so
Predicting the functional impact of variants
– Utilizing sequence features
– Utilizing protein features
Popular methods and practice
– PolyPhen-2
– MutationAssessor
– SeattleSeq

FUNCTIONAL IMPACT OF VARIANTS

We usually have too many variants
Figure source: Saksena et al., “Developing Algorithms to Discover Novel Cancer Genes: A look at the challenges and approaches”
We want to narrow the set of “called” variants down to as few as possible

Simple mutation calling does not give you the final answer
Mutation calling (NGS) yields a lot of candidate variants:
- some from sequencing error
- some from polymorphisms
- some from mapping error
- some are passengers
- a few real pathogenic variants

Gold mining
Bunch of candidate variants → Many variants → A few variants
Strategy I: Do they really exist?
- Any mistakes in sequencing and variant calling?
- Any non-disease-causing polymorphisms?
Strategy II: Are they functional?
- Are they damaging? Pathogenic?
- Are they related to phenotypes?

Five ways to narrow down
Strategy I
1. Include control data: eliminate germline variants
2. Use a stricter variant quality threshold: work only on confident variants
3. Filter out polymorphisms: remove non-damaging polymorphisms
Strategy II
4. Predict functional impacts: estimate how damaging each variant is
5. Use disease-specific knowledge: acquire the final candidates

1. Include control data
Figure labels: germline, somatic, somatic; 100,000~, ~500, ~1000
We should eliminate unwanted germline variants
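The idea of removing germline variants with a control sample can be sketched as a plain set subtraction. This is only an illustration: real somatic callers (such as Virmid, used in the practice below) model read evidence statistically rather than comparing call lists, and the variant keys and the function name `subtract_germline` are assumptions made up for this example.

```python
# Hypothetical variant keys: (chromosome, position, ref allele, alt allele).
def subtract_germline(tumor_variants, normal_variants):
    """Return tumor variants that are absent from the matched normal sample."""
    normal_keys = set(normal_variants)
    return [v for v in tumor_variants if v not in normal_keys]


# Made-up calls for illustration only.
tumor = [
    ("chr22", 100, "A", "T"),
    ("chr22", 250, "G", "C"),
    ("chr22", 400, "C", "T"),
]
normal = [
    ("chr22", 100, "A", "T"),
    ("chr22", 400, "C", "T"),
]

somatic_candidates = subtract_germline(tumor, normal)
print(somatic_candidates)
```

Variants seen in both samples are treated as germline and dropped; only the tumor-specific call remains as a somatic candidate.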

When controls are unavailable
Single nucleotide polymorphism rate = 1/100~1/1000
Whole Genome Sequencing
– Total DNA length = 3 billion bp
– Expected SNP numbers = 3~30 million
Whole Exome Sequencing
– Total DNA length = 50 million bp
– Expected SNP numbers = 50~500 thousand
Targeted Sequencing (Panel)
– Total DNA length = 100~1000 thousand bp
– Expected SNP numbers = 100~10,000
Hotspot Panel (only for very well known variants)
– Controls can be omitted
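The expected SNP counts follow directly from multiplying the target length by the polymorphism rate. A small sketch of that arithmetic, assuming the rate range of one SNP per 1000 bases to one per 100 bases stated on the slide (the function name and the dictionary of designs are illustrative; the panel is evaluated at both ends of its stated size range):

```python
def expected_snp_range(target_length_bp, sparse_every=1000, dense_every=100):
    """Return (low, high) expected SNP counts for a target of the given length.

    sparse_every / dense_every: one SNP per this many bases (1/1000 and 1/100).
    """
    return target_length_bp // sparse_every, target_length_bp // dense_every


designs = {
    "whole genome": 3_000_000_000,
    "whole exome": 50_000_000,
    "panel (100 kb)": 100_000,
    "panel (1 Mb)": 1_000_000,
}

for name, length in designs.items():
    low, high = expected_snp_range(length)
    print(f"{name}: {low:,} to {high:,} expected SNPs")
```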

2. Use a stricter quality threshold
Variant quality
Low variant quality
- This variant (although it has been called) can be false
Causes of low quality
- Low read depth (insufficient observation)
- Bad basecall/mapping quality
- Low allele frequency

2. Use a stricter quality threshold
Possible actions
– Cut out variants based on
  Variant quality (e.g. QUAL<10)
  Total read depth (e.g. <20)
  Alt-allele read depth (e.g. <5)
  Allele frequency (e.g. <0.1)
– Prioritize variants
  Sort by variant quality and inspect from the top
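Both actions, cutting by thresholds and then sorting by quality, can be sketched in a few lines. The cutoffs are the example values from the slide; the `Call` record, its field names, and the sample calls are hypothetical and do not correspond to any particular caller's output format.

```python
from dataclasses import dataclass


@dataclass
class Call:
    name: str
    qual: float      # variant quality (QUAL)
    depth: int       # total read depth at the site
    alt_depth: int   # reads supporting the alternate allele

    @property
    def allele_frequency(self):
        return self.alt_depth / self.depth if self.depth else 0.0


def passes(call, min_qual=10, min_depth=20, min_alt_depth=5, min_af=0.1):
    """Keep a call only if it clears every example threshold."""
    return (
        call.qual >= min_qual
        and call.depth >= min_depth
        and call.alt_depth >= min_alt_depth
        and call.allele_frequency >= min_af
    )


# Made-up calls for illustration only.
calls = [
    Call("v1", qual=50, depth=40, alt_depth=12),
    Call("v2", qual=8, depth=60, alt_depth=20),   # fails QUAL
    Call("v3", qual=30, depth=15, alt_depth=6),   # fails total depth
    Call("v4", qual=90, depth=100, alt_depth=6),  # fails allele frequency
    Call("v5", qual=70, depth=30, alt_depth=9),
]

# Filter, then sort by quality so inspection starts from the best call.
kept = sorted((c for c in calls if passes(c)), key=lambda c: c.qual, reverse=True)
print([c.name for c in kept])
```

Appropriate cutoffs depend on sequencing depth and sample purity, so the defaults here should be treated as starting points rather than recommendations.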

3. Filter out polymorphisms
When you have no control data (panel)
– Check whether the variants have been reported as polymorphisms
When you have control data
– You may not have polymorphisms, because somatic mutation callers remove germline calls
– However, in some cases polymorphisms can still be reported (as somatic mutations), for example when read depth is low in the control sample
Figure labels: low depth, bad region, Variant Undetected, Variant Detected
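Checking calls against reported polymorphisms amounts to a lookup keyed on position and alleles. The sketch below uses a small in-memory dictionary as a stand-in for a polymorphism database such as dbSNP; the coordinates, labels, and the function name are invented for illustration, and no real database API is used.

```python
def split_known_polymorphisms(variants, known_polymorphisms):
    """Separate variants into novel calls and calls already reported as polymorphisms."""
    novel, reported = [], []
    for variant in variants:
        label = known_polymorphisms.get(variant)
        if label is None:
            novel.append(variant)
        else:
            reported.append((variant, label))
    return novel, reported


# Hypothetical lookup table: (chromosome, position, ref, alt) -> label.
known = {("chr7", 1000, "A", "T"): "known_snp_1"}

# Made-up calls for illustration only.
variants = [("chr7", 1000, "A", "T"), ("chr7", 2000, "G", "A")]

novel, reported = split_known_polymorphisms(variants, known)
print("novel:", novel)
print("reported:", reported)
```

Keeping the reported calls in a separate list, instead of discarding them, makes it easy to review cases where a known polymorphism is nonetheless of interest.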

dbSNP (Database of SNP)
chr7: A>T

4. Predict functional impacts
Types of point mutations
– Coding mutations
  Synonymous (silent) – Amino acid unchanged
  Missense – Amino acid changed
  Nonsense – Stop codon gained
  Readthrough – Stop codon lost
– Non-coding mutations
  Intron
  Splice variants
  Variants in regulatory elements
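The four coding categories can be told apart by translating the reference and altered codons. A minimal sketch, assuming the standard genetic code and that both codons are already known (in practice an annotation tool derives them from the transcript model); the function name `classify_coding_snv` is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(ref_codon, alt_codon):
    """Classify a single-codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # stop codon gained
    if ref_aa == "*":
        return "readthrough"   # stop codon lost
    return "missense"


for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TAC", "TAA"), ("TAA", "CAA")]:
    print(ref, "->", alt, classify_coding_snv(ref, alt))
```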

Functional impacts
Types of indels
– Inframe: insertion or deletion of a multiple of 3 base pairs
– Frameshift: insertion or deletion whose length is not a multiple of 3 base pairs
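For an indel inside coding sequence, the distinction comes down to whether the length change is divisible by 3. A short sketch, assuming VCF-style ref/alt allele strings and assuming the indel lies entirely within a coding region (the function name is illustrative):

```python
def classify_indel(ref, alt):
    """Classify a coding indel from its ref and alt allele strings."""
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("ref and alt have the same length: not an indel")
    return "inframe" if length_change % 3 == 0 else "frameshift"


print(classify_indel("A", "ACGT"))  # 3-base insertion
print(classify_indel("ACG", "A"))   # 2-base deletion
```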

General classification (priority)
Figure labels: high-impact, low-incidence, low-confidence, high incidence

Functional impact prediction of missense mutations
How critical is an AA change to its protein function?
– Amino acid conservation
  If the AA is essential, it would be conserved through evolution
– Amino acid in protein conformation
  Substitution of an AA in the active site would be more damaging
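One simple way to put a number on amino acid conservation is the Shannon entropy of an alignment column: a position where every species carries the same residue has zero entropy. The sketch below is only an illustration of that idea, not the scoring used by PolyPhen-2 or MutationAssessor; the alignment columns are made up, and the normalization by log2(20) is an assumption chosen so that scores fall between 0 and 1.

```python
import math
from collections import Counter


def conservation(column):
    """Score one alignment column: 1.0 = fully conserved, lower = more variable.

    column: string of residues at one position across species; "-" marks a gap.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)


# Made-up alignment columns for illustration only.
conserved_site = "RRRRRR"
variable_site = "RKRKRK"

print(round(conservation(conserved_site), 3))
print(round(conservation(variable_site), 3))
```

A missense change at the fully conserved position would be ranked as more likely damaging than one at the variable position.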

Amino acid conservation

Protein Structure

5. Use disease-specific knowledge
Your knowledge about the disease
– e.g. cancer
– “Has it been reported in other previous samples?”
– Search for it in COSMIC; if it turns out to be recurrent, it is likely to be functional
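Recurrence simply means the same variant appears in more than one independent sample. The sketch below counts distinct samples per variant over a small in-memory list that stands in for a catalogue such as COSMIC; the sample IDs, variant keys, and function name are hypothetical, and no COSMIC interface is used.

```python
def recurrent_variants(observations, min_samples=2):
    """Return {variant: number of distinct samples} for variants seen in enough samples.

    observations: iterable of (sample_id, variant_key) pairs.
    """
    samples_per_variant = {}
    for sample_id, variant in observations:
        samples_per_variant.setdefault(variant, set()).add(sample_id)
    return {
        variant: len(samples)
        for variant, samples in samples_per_variant.items()
        if len(samples) >= min_samples
    }


# Made-up observations for illustration only.
observations = [
    ("sample_1", ("chr22", 250, "G", "C")),
    ("sample_2", ("chr22", 250, "G", "C")),
    ("sample_2", ("chr22", 250, "G", "C")),  # repeated report from the same sample
    ("sample_3", ("chr22", 900, "T", "A")),
]

print(recurrent_variants(observations))
```

Counting distinct samples, not raw reports, keeps a variant reported twice from one sample from looking recurrent.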

Five ways to narrow down
1. Include control data: eliminate germline variants
2. Use a stricter variant quality threshold: work only on confident variants
3. Filter out polymorphisms: remove non-damaging polymorphisms
4. Predict functional impacts: estimate how damaging each variant is
5. Use disease-specific knowledge: acquire the final candidates
Many, uncertain variants → A few, reliable variants → Functional study, mechanism study

SUMMARY OF PART I

- Connecting to the Linux cluster; job script writing and submission
- NGS technologies, NGS data
- Short read alignment
- Variant calling, CNV and SV calling
- Interpretation of discovered variants

In the remaining classes
Genomic data to expression data
– Gene → mRNA → Protein → Pathways and Networks → Phenotype
Use high-throughput data for your study
Don’t forget your project

PRACTICE - FUNCTIONAL VARIANT ANNOTATION WITH SEATTLESEQ

Today’s data
Somatic variants in chr22 of an anonymous cancer sample, called with Virmid
Data location
– /scratch/2015_GenomeInformatics/{yourdir}/virmid output
– If you did not complete the somatic calling practice, copy it from /scratch/2015_GenomeInformatics/public

Data download to local PC
① move to your virmid out directory
② check your virmid output
③ click FTP
④ double click

SeattleSeq
Search for SeattleSeq, then click the link indicated on the slide.
① write your
② input your VCF file
③ check!!
④ check!!

① click file > open..
② select ‘all file’
③ select annotated file


Filtering phase
accession (column H) – for filtering curated isoforms
  NM: mRNA
  XM: predicted mRNA model → filter out
functionGVS (column I) – for filtering damaging mutation types
  Keep: missense, missense-near-splice, stop-gain, stop-loss, splice-donor, splice-acceptor
  The others → filter out
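The same two filters can be applied programmatically instead of in a spreadsheet. The sketch below assumes the annotated file is tab-delimited with a header row containing `accession` and `functionGVS` columns; that layout, the `position` column, and the sample rows (including the accession numbers) are assumptions for illustration, so check them against your own SeattleSeq output.

```python
import csv
import io

# Mutation types kept as potentially damaging (from the slide).
DAMAGING = {
    "missense", "missense-near-splice",
    "stop-gain", "stop-loss",
    "splice-donor", "splice-acceptor",
}


def keep_row(row):
    """Keep curated (NM) isoforms carrying a potentially damaging mutation type."""
    return row["accession"].startswith("NM") and row["functionGVS"] in DAMAGING


# Made-up annotation rows for illustration only; with a real file, replace
# io.StringIO(...) with open("your_annotated_file.txt", newline="").
example = io.StringIO(
    "position\taccession\tfunctionGVS\n"
    "100\tNM_000000.0\tmissense\n"
    "200\tXM_000000.0\tmissense\n"
    "300\tNM_000001.0\tintron\n"
    "400\tNM_000002.0\tstop-gain\n"
)

kept_rows = [row for row in csv.DictReader(example, delimiter="\t") if keep_row(row)]
print([row["position"] for row in kept_rows])
```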


IGV download
Search for IGV, then click the link indicated on the slide.
Download, then double click.

IGV view

① input disease BAM file
② input normal BAM file
③ input VCF file

IGV view