
1 MES7594-01 Genome Informatics I - Lecture VIII. Interpreting variants Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine Genome Informatics I (2015 Spring)

2 Overview
Goal of this lecture
– You will learn how to interpret discovered variants, filtering and prioritizing them for an associated phenotype (e.g. disease), and practice doing so
Predicting functional impact of variants
– Utilizing sequence features
– Utilizing protein features
Popular methods and practice
– PolyPhen-2
– MutationAssessor
– SeattleSeq

3 FUNCTIONAL IMPACT OF VARIANTS

4 We usually have too many variants
Saksena et al., “Developing Algorithms to Discover Novel Cancer Genes: A look at the challenges and approaches”
We want to narrow the number of “called” variants down to as few as possible

5 Simple mutation calling does not give you the final answer
mutation calling (NGS): a lot of candidate variants
- some from sequencing error
- some from polymorphisms
- some from mapping error
- some are passengers

6 Simple mutation calling does not give you the final answer
mutation calling (NGS): a lot of candidate variants
- some from sequencing error
- some from polymorphisms
- some from mapping error
- some are passengers
A few real pathogenic variants

7 Gold mining
[Figure labels: Bunch of candidate variants; Many variants; A few variants]
Strategy I: Do they really exist?
- Any mistakes in sequencing and variant calling?
- Any non-disease-causing polymorphisms?
Strategy II: Are they functional?
- Are they damaging? Pathogenic?
- Are they related to phenotypes?

8 Five ways to narrow down
1. Include control data: eliminate germline variants
2. Use a stricter variant quality threshold: work on only confident variants
3. Filter out polymorphisms: remove non-damaging polymorphisms
4. Predict functional impacts: estimate how damaging each variant is
5. Use disease-specific knowledge: acquire final candidates

9 Five ways to narrow down
1. Include control data: eliminate germline variants
2. Use a stricter variant quality threshold: work on only confident variants
3. Filter out polymorphisms: remove non-damaging polymorphisms
4. Predict functional impacts: estimate how damaging each variant is
5. Use disease-specific knowledge: acquire final candidates
[Slide label: Strategy I]

10 Five ways to narrow down
1. Include control data: eliminate germline variants
2. Use a stricter variant quality threshold: work on only confident variants
3. Filter out polymorphisms: remove non-damaging polymorphisms
4. Predict functional impacts: estimate how damaging each variant is
5. Use disease-specific knowledge: acquire final candidates
[Slide labels: Strategy I; Strategy II]

11 1. Include control data
[Figure: germline vs. somatic variant counts; labels: 100,000~, ~500,000, 100~1000]
We should eliminate unwanted germline variants

12 When controls are unavailable
Single nucleotide polymorphism rate = 1/100~1/1000
Whole Genome Sequencing
– Total DNA length = 3 billion bp
– Expected SNP numbers = 3~30 million
Whole Exome Sequencing
– Total DNA length = 50 million bp
– Expected SNP numbers = 50~500 thousand
Targeted Sequencing (Panel)
– Total DNA length = 100~1000 thousand bp
– Expected SNP numbers = 1000~10,000
Hotspot Panel (only for very well known variants)
– Controls can be omitted
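The expected SNP numbers on this slide come from multiplying the sequenced length by the polymorphism rate. A minimal sketch of that arithmetic, assuming the slide's rate range of 1/1000 to 1/100 per base (the function name `expected_snp_range` is made up for illustration):

```python
from fractions import Fraction


def expected_snp_range(length_bp, low_rate=Fraction(1, 1000), high_rate=Fraction(1, 100)):
    """Return (low, high) expected SNP counts for a region of length_bp bases.

    The default rates are the 1/1000 and 1/100 bounds quoted on the slide.
    """
    return int(length_bp * low_rate), int(length_bp * high_rate)


# Whole genome: about 3 billion bases
wgs = expected_snp_range(3_000_000_000)
# Whole exome: about 50 million bases
wes = expected_snp_range(50_000_000)

print(wgs)  # (3000000, 30000000)
print(wes)  # (50000, 500000)
```

Without a matched control, every one of these expected polymorphisms is a potential false "disease" candidate, which is why the later filtering steps matter more as the sequenced region grows.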

13 2. Use a stricter quality threshold
Variant quality
Low variant quality
- This variant (although it has been called) can be false
Cause of low quality
- Low read depth (insufficient observation)
- Bad basecall/mapping quality
- Low allele frequency

14 2. Use a stricter quality threshold
Possible actions
– Cut out variants based on
  Variant quality (e.g. QUAL<10)
  Total read depth (e.g. <20)
  Alt-allele read depth (e.g. <5)
  Allele frequency (e.g. <0.1)
– Prioritize variants
  Sort by variant quality and inspect from the top
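The two actions above (cut, then prioritize) can be sketched in a few lines. This is only an illustration: the `Variant` record and the toy calls are hypothetical, and the default cutoffs are simply the example values from the slide, not recommended settings.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical, simplified variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    depth: int      # total read depth at the site
    alt_depth: int  # reads supporting the alternate allele

    @property
    def allele_freq(self):
        return self.alt_depth / self.depth if self.depth else 0.0


def passes(v, min_qual=10, min_depth=20, min_alt_depth=5, min_af=0.1):
    """Apply the slide's example cutoffs; variants below any cutoff are dropped."""
    return (v.qual >= min_qual and v.depth >= min_depth
            and v.alt_depth >= min_alt_depth and v.allele_freq >= min_af)


def filter_and_rank(variants, **thresholds):
    """Keep confident variants, then sort by quality so inspection starts at the top."""
    kept = [v for v in variants if passes(v, **thresholds)]
    return sorted(kept, key=lambda v: v.qual, reverse=True)


# Toy calls, invented for illustration only
calls = [
    Variant("chr22", 1000, "A", "T", qual=55.0, depth=40, alt_depth=12),
    Variant("chr22", 2000, "G", "C", qual=8.0, depth=35, alt_depth=10),    # QUAL too low
    Variant("chr22", 3000, "C", "T", qual=90.0, depth=12, alt_depth=6),    # depth too low
    Variant("chr22", 4000, "T", "G", qual=70.0, depth=100, alt_depth=6),   # allele frequency 0.06
    Variant("chr22", 5000, "G", "A", qual=99.0, depth=60, alt_depth=30),
]
ranked = filter_and_rank(calls)
print([(v.pos, v.qual) for v in ranked])  # [(5000, 99.0), (1000, 55.0)]
```

Keeping the thresholds as keyword arguments makes it easy to loosen or tighten one criterion at a time and see how many candidates survive.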

15 3. Filter out polymorphisms
When you had no control data (panel)
– Check if the variants have been reported as polymorphisms
When you had control data
– You may not have polymorphisms
  Because somatic mutation callers remove germline calls
– However, there are some cases in which polymorphisms can be reported (as somatic mutations)
  For example, low read depth in the control sample
[Figure labels: low depth; bad region; Variant Undetected; Variant Detected]
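Checking candidates against reported polymorphisms amounts to a lookup on (chromosome, position, ref, alt). A minimal sketch, assuming the known polymorphisms have already been loaded into a Python set; the set contents below are toy data and the helper names are hypothetical, not part of any dbSNP tool:

```python
def variant_key(chrom, pos, ref, alt):
    """Normalize a variant into a hashable lookup key."""
    return (chrom, int(pos), ref.upper(), alt.upper())


def split_by_known(variants, known_polymorphisms):
    """Separate candidates into (novel, known) by membership in the known set."""
    novel, known = [], []
    for v in variants:
        (known if variant_key(*v) in known_polymorphisms else novel).append(v)
    return novel, known


# Toy "database": one entry, treated as a reported polymorphism for illustration only
known_polymorphisms = {variant_key("chr7", 11584142, "A", "T")}

candidates = [
    ("chr7", 11584142, "A", "T"),
    ("chr22", 5000, "G", "A"),
]
novel, known = split_by_known(candidates, known_polymorphisms)
print(novel)  # [('chr22', 5000, 'G', 'A')]
print(known)  # [('chr7', 11584142, 'A', 'T')]
```

Matching on the alternate allele as well as the position avoids discarding a different substitution that happens to fall on a known polymorphic site.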

16 dbSNP Database of SNP chr7:11584142 A>T

18 4. Predict functional impacts
Types of point mutations
– Coding mutations
  Synonymous (silent) – Amino acid unchanged
  Missense – Amino acid changed
  Nonsense – Stop codon gained
  Readthrough – Stop codon lost
– Non-coding mutations
  Intron
  Splice variants
  Variants in regulatory elements
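The four coding categories above follow directly from translating the reference and mutated codons. A minimal sketch using the standard genetic code; the function name is made up, and it assumes the two codons are already in frame and on the coding strand:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a single-codon substitution using the slide's four categories."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # amino acid unchanged
    if alt_aa == "*":
        return "nonsense"     # stop codon gained
    if ref_aa == "*":
        return "readthrough"  # stop codon lost
    return "missense"         # amino acid changed


print(classify_codon_change("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify_codon_change("GAG", "GTG"))  # missense (Glu -> Val)
print(classify_codon_change("TGG", "TGA"))  # nonsense (Trp -> stop)
print(classify_codon_change("TAA", "CAA"))  # readthrough (stop -> Gln)
```

Real annotation tools also have to resolve the transcript, strand, and reading frame before this step; the sketch covers only the final comparison.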

19 Functional impacts
Types of indels
– Inframe
  Insertion or deletion in a multiple of 3 base pairs
– Frameshift
  Insertion or deletion not in a multiple of 3 base pairs
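Whether an indel is inframe or frameshift depends only on whether the length change is a multiple of 3. A short sketch, assuming VCF-style `ref` and `alt` allele strings (the function name is hypothetical):

```python
def classify_indel(ref, alt):
    """Return 'inframe' or 'frameshift' from the length difference of the alleles."""
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have the same length")
    return "inframe" if length_change % 3 == 0 else "frameshift"


print(classify_indel("A", "ATGC"))  # inframe (3 bases inserted)
print(classify_indel("ATG", "A"))   # frameshift (2 bases deleted)
```

This applies to coding sequence only; an indel in an intron or intergenic region does not shift a reading frame regardless of its length.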

20 General classification (priority)

21 General classification (priority)
[Figure labels: high-impact; low-incidence; low-confidence; High incidence]

22 Functional impact prediction of missense mutations
How critical is an AA change to its protein function?
– Amino acid conservation
  If the AA is essential, it would be conserved through evolution
– Amino acid in protein conformation
  Substitution of an AA in the active site would be more damaging
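The conservation idea can be made concrete with a very simple per-column score over a multiple sequence alignment: the fraction of sequences sharing the most common residue. This is only a toy measure on invented sequences, and it is not the scoring used by PolyPhen-2 or MutationAssessor.

```python
from collections import Counter


def column_conservation(alignment, column):
    """Fraction of non-gap residues in a column that match the most common residue."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(residues)


# Hypothetical aligned protein fragments from four species (made up for illustration)
alignment = [
    "MKVLA",
    "MKILA",
    "MRVLA",
    "MKVLG",
]
scores = [column_conservation(alignment, i) for i in range(5)]
print(scores)  # [1.0, 0.75, 0.75, 1.0, 0.75]
```

Under this reading, a substitution at a fully conserved position (score 1.0) would be a stronger candidate for functional impact than one at a variable position.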

23 Amino acid conservation

24 Protein Structure

25 5. Use disease-specific knowledge
Your knowledge about the disease
– e.g. cancer
– “Has it been reported in other previous samples?”
– Search for it in COSMIC; if it is recurrent, it is likely to be functional

26 Five ways to narrow down
1. Include control data: eliminate germline variants
2. Use a stricter variant quality threshold: work on only confident variants
3. Filter out polymorphisms: remove non-damaging polymorphisms
4. Predict functional impacts: estimate how damaging each variant is
5. Use disease-specific knowledge: acquire final candidates
[Slide labels: Many, uncertain variants; A few, reliable variants]

27 Five ways to narrow down
1. Include control data: eliminate germline variants
2. Use a stricter variant quality threshold: work on only confident variants
3. Filter out polymorphisms: remove non-damaging polymorphisms
4. Predict functional impacts: estimate how damaging each variant is
5. Use disease-specific knowledge: acquire final candidates
[Slide labels: Many, uncertain variants; A few, reliable variants; Functional study, Mechanism study]

28 SUMMARY OF PART I

29
- Connect to Linux cluster, job script writing and submission
- NGS technologies, NGS data
- Short read alignment
- Variant calling, CNV, SV calling
- Interpretation of discovered variants

30 In the remaining classes
Genomic data to expression data
– Gene → mRNA → Protein → Pathways and Networks → Phenotype
Use high throughput data for your study
Don’t forget your project

31 PRACTICE - FUNCTIONAL VARIANT ANNOTATION WITH SEATTLESEQ

32 Today’s data
Somatic variants in chr22 of an anonymous cancer sample, called with Virmid
Data location
– /scratch/2015_GenomeInformatics/{yourdir}/virmid output
– If you did not complete the somatic calling practice, copy it from /scratch/2015_GenomeInformatics/public

33 Data download to local PC
① move to your virmid out directory
② check your virmid output
③ click FTP

34 ④ double click

35 seattle-seq search then click here!!!

36 seattle-seq
① write your email
② input your VCF file
③ check!!
④ check!!

37
① click file > open..
② select ‘all file’
③ select annotated file

38

39 ①②

40 Filtering phase
accession (column H) – for filtering curated isoforms
  NM: mRNA
  XM: predicted mRNA model → filter
functionGVS (column I) – for filtering damaging mutation type
  missense, missense-near-splice
  stop-gain, stop-loss
  splice-donor, splice-acceptor
  The others → filter
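The same two filters can be applied in a script instead of a spreadsheet. A minimal sketch, assuming the annotated file is tab-delimited with header columns named `accession` and `functionGVS` as on the slide; the category strings are copied from the slide and may be spelled differently in an actual SeattleSeq output, and the accession IDs below are placeholders.

```python
import csv
import io

# Mutation types kept as potentially damaging (labels as written on the slide)
DAMAGING_TYPES = {
    "missense", "missense-near-splice",
    "stop-gain", "stop-loss",
    "splice-donor", "splice-acceptor",
}


def keep_row(row):
    """Keep curated mRNA isoforms (NM prefix) that carry a damaging mutation type."""
    return row["accession"].startswith("NM") and row["functionGVS"] in DAMAGING_TYPES


def filter_annotations(tsv_text):
    """Parse tab-delimited annotation text and return only the rows that pass."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if keep_row(row)]


# Toy annotation table with placeholder accessions, for illustration only
example = (
    "accession\tfunctionGVS\n"
    "NM_000001.1\tmissense\n"
    "XM_000002.1\tmissense\n"
    "NM_000003.1\tsynonymous\n"
    "NM_000004.1\tstop-gain\n"
)
kept = filter_annotations(example)
print([r["accession"] for r in kept])  # ['NM_000001.1', 'NM_000004.1']
```

For a real file, the text could be read with `open(path).read()` and passed to `filter_annotations`, after checking the header names and category labels against the file itself.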

41 ① ②

42 ① ②

43 IGV download search then click here!!!

44 IGV download download then double click!!

45 IGV view

46

47
① input disease BAM file
② input normal BAM file
③ input VCF file

48 IGV view

