Next-generation sequencing: from basics to future diagnostics PART I: NGS technologies and standard workflow Sangwoo Kim, Ph.D. Assistant Professor, Severance.


1 Next-generation sequencing: from basics to future diagnostics PART I: NGS technologies and standard workflow Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine

2 Overview
PART I: NGS technologies and standard workflow
  Next generation sequencing
  History and technology
  Data and its meaning; process workflow
  Discussion
PART II: NGS analysis to find variants
  Single nucleotide variants (SNVs)
  Copy number variations (CNVs)
  Structural variations (SVs)
PART III: NGS application to diagnostics
  NGS in genomic medicine
  Potential application to forensic science

3 Background

5 6.7% of Japanese patients with NSCLC harbor a fusion of EML4 with the intracellular kinase domain of ALK

6 PF-2341066 (Crizotinib) | cMet/ALK inhibitor

7 57% response rate, 27% stable disease

8 “The FDA approved the Pfizer drug in 2011 based on 250 patients, four years after the ALK-mutation link was discovered. That is lightning speed in an industry accustomed to spending a decade with thousands of test subjects to get drug approval.”

9 Genomic medicine is a reality
McCarthy et al, 2013 Sci Transl Med.

10 The first breakthrough
The Human Genome Project (1990~2003)
Began in 1990
International consortium including the U.S., U.K., Japan, France, Germany, and China
“Rough draft” in 2000
“Complete genome” published in 2003
13 years, $3 billion

11 The second breakthrough
Massively parallel sequencing (a.k.a. next-generation sequencing) via spatially separated, clonally amplified DNA templates or single DNA molecules
Metzker, Nat Rev Genet, 2010
Platforms shown: Illumina HiSeq2500, 5500 SOLiD system, Ion Torrent PGM

14 Launched in 2008
Sequencing of 1,092 individual genomes was announced in 2012
A valuable repository for population genomics

15 Inaugural publication in 2009
Aims to assemble a genomic zoo (10,000 vertebrate species)

16 Project announced in 2013, with completion targeted within 5 years
Aims to identify cancer genes (accounting for heterogeneity) and the genetics of rare diseases

24 Overwhelmed by data
“The challenges turns from data generation into data analysis!”
Alex Sanchez, Introduction to NGS data analysis, 2012

25 Overwhelmed by data
Alex Sanchez, Introduction to NGS data analysis, 2012
Elizabeth Pennisi, Science, 2011

26 Overwhelmed by data
“Within a few years, Ponting predicts, analysis, not sequencing, will be the main expense hurdle to many genome projects. And that’s assuming there’s someone who can do it; bioinformaticists are in short supply everywhere.”
Alex Sanchez, Introduction to NGS data analysis, 2012
Elizabeth Pennisi, Science, 2011

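27 From data-poor to data-rich
“In the past, ‘classical’ bioinformatics consisted mainly of algorithms for sequence homology analysis, alignment, and reconstruction. But highly parallelized, large-scale biological data has begun to demand integration and interpretation that go beyond simple analysis.”
“Today, data is generated everywhere. We now live in an era in which data ‘simply gets generated’.”
Prof. Ju Han Kim, SNU Conference on Biomedical Informatics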

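28 From data-poor to data-rich environment
“In the past, ‘classical’ bioinformatics consisted mainly of algorithms for sequence homology analysis, alignment, and reconstruction. But highly parallelized, large-scale biological data has begun to demand integration and interpretation that go beyond simple analysis.”
“Today, data is generated everywhere. We now live in an era in which data ‘simply gets generated’.”
Prof. Ju Han Kim, SNU Conference on Biomedical Informatics
Prof. Atul Butte, Stanford Univ.
Hypothesis-driven data → Data-driven hypothesis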

29 Next generation sequencing

30 Traditional Sequencing
Genomic DNA is fragmented, then cloned into a plasmid vector and used to transform E. coli
For each sequencing reaction, a single bacterial colony is picked and its plasmid DNA is isolated
Each cycle sequencing reaction takes place within a microliter-scale volume

31 Sanger Sequencing

32 Next Generation Sequencing
No cloning
  The DNA to be sequenced is used to construct a library of fragments that have synthetic DNAs (adapters) covalently added to each fragment end by DNA ligase
Amplification can be done in parallel
  Library fragments are amplified in situ on a solid surface
Sequencing can be done in parallel (in 3 iterative steps)
  a nucleotide addition step
  a detection step
  a wash step

33 Illumina Sequencing

37 Ion Torrent Sequencing
DNA is captured on beads
A single bead is placed in each well
One nucleotide type (A/T/G/C) is supplied at a time
The pH change caused by incorporation is detected
The magnitude of the change is measured to detect homopolymers
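The flow-by-flow logic above can be sketched in a few lines of Python. This is a toy illustration only: the function names, the flow order "TACG", and the idea of reading the new strand directly (ignoring complementarity and signal noise) are assumptions made for this example, not details taken from the slides. Real pH signals are analog and noisy, which is why long homopolymers are hard to size.

```python
from typing import List, Tuple

def simulate_flows(template: str, flow_order: str = "TACG",
                   n_flows: int = 16) -> List[Tuple[str, int]]:
    """Return (nucleotide, number of incorporations) for each flow.

    `template` is treated as the strand being synthesized (a simplification).
    The count stands in for the size of the pH change in that flow.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        if pos >= len(template):
            break  # whole template has been synthesized
        base = flow_order[i % len(flow_order)]
        count = 0
        # A homopolymer run incorporates several bases in a single flow.
        while pos < len(template) and template[pos] == base:
            count += 1
            pos += 1
        signals.append((base, count))
    return signals

def call_bases(signals: List[Tuple[str, int]]) -> str:
    """Rebuild the sequence from idealized, noise-free flow signals."""
    return "".join(base * count for base, count in signals)

signals = simulate_flows("TTAGGGC")
called = call_bases(signals)
```

In this idealized setting the flow with three incorporations corresponds to the "GGG" run; in practice the caller has to decide whether a signal means two, three, or four bases.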

41 PacBio SMRT sequencing
Zero-mode waveguide (ZMW)

42 Nanopore sequencing

43 Comparison

44 NGS data and processing overview

45 FASTA format
A format for DNA (or protein) sequences
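As a rough illustration of the format, a FASTA file is a series of records, each with a header line starting with ">" followed by one or more sequence lines. The sketch below is a minimal reader under that assumption; the function name and the example records are hypothetical.

```python
from typing import Dict, Iterable

def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]          # header without the leading '>'
            records[name] = ""
        elif name is not None:
            records[name] += line    # sequence may be wrapped over lines
    return records

# Hypothetical example input; sequences are wrapped across lines.
example = """>seq1 hypothetical example record
TAACACCTGG
GAAATTCATC
>seq2
ACGT
""".splitlines()

records = parse_fasta(example)
```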

46 FASTQ format (NGS raw data)
A format for NGS reads (FASTA sequence + quality)
One read consists of a sequence and its per-base quality
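A minimal sketch of reading FASTQ records follows, assuming the common four-line layout (header starting with "@", sequence, separator starting with "+", quality string of the same length as the sequence). The names `FastqRecord` and `parse_fastq` and the example read are hypothetical; line-wrapped FASTQ files are not handled.

```python
from typing import Iterable, Iterator, NamedTuple

class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str

def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four lines: @name, sequence, +, quality."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it, None)
        separator = next(it, None)
        quality = next(it, None)
        if quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)

# Hypothetical example read with five bases and five quality characters.
example = ["@read1 hypothetical example", "ACGTN", "+", "II5?+"]
reads = list(parse_fastq(example))
```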

47 Mapping back to genome Where is this sequence in human genome?
TAACACCTGGGAAATTCATCACAAAAAGATCTTAGCCTAGGCACATTGTCATTAGGTTATCCAAAGTTAAGACAAAGGAAAGAATCTTAAGAGCTGTGAGA
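The question on this slide can be made concrete with a toy exact-match search on both strands. This is only a sketch: the reference string and function names are hypothetical, and production aligners such as BWA or Bowtie use indexed data structures and tolerate mismatches rather than scanning the genome like this.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_exact(reference: str, read: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based position, strand) pairs for exact matches."""
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = reference.find(query)
        while start != -1:
            hits.append((start, strand))
            start = reference.find(query, start + 1)
    return sorted(hits)

# Hypothetical toy reference, far smaller than a real genome.
reference = "ACGTTAACACCTGGGATTT"
forward_hits = find_exact(reference, "TAACACC")
reverse_hits = find_exact(reference, "CCCAGG")
```

Positions are reported in forward-strand coordinates, so a "-" hit gives where the read's reverse complement starts in the reference.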

48 Quality
Each base call (a call for a nucleotide: ‘A’, ‘T’, ‘C’, or ‘G’) has its own quality
Quality is the machine's confidence in that call
Genome Informatics I (2015 Spring)

49 Phred scale quality
$Q = -10 \log_{10}(e)$, where Q is the quality score and e is the probability of the base call being wrong
@SRR D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC
+SRR D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
Quality score (Q)                          10     20     30     40 …
Probability of the base call being wrong   10%    1%     0.1%   0.01% …
ASCII character (Q + 33, ASCII code table) +      5      ?      I …
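The relationships on this slide can be checked with a short Python sketch. It assumes the +33 ASCII offset shown above (often called Phred+33); the function names are chosen for this example only.

```python
import math
from typing import List

def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(e) to get the error probability e."""
    return 10 ** (-q / 10)

def error_probability_to_phred(e: float) -> float:
    """Apply Q = -10 * log10(e)."""
    return -10 * math.log10(e)

def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to Phred scores (ASCII code - offset)."""
    return [ord(char) - offset for char in quality]

def encode_quality(scores: List[int], offset: int = 33) -> str:
    """Convert Phred scores back to a FASTQ quality string."""
    return "".join(chr(score + offset) for score in scores)

scores = decode_quality("+5?I")
error_probabilities = [phred_to_error_probability(q) for q in scores]
```

Decoding the four characters from the slide gives the scores 10, 20, 30, and 40, which correspond to error probabilities of 10%, 1%, 0.1%, and 0.01%.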

50 Standard workflow (figure; Kim S and Paik S, in preparation)
Stages: A. Data Generation; B. Variant Finding; C. Variant Analysis; D. Validation and functional assessment (variant confirmation, pathway analysis, functional study)
Figure labels: control sequencing, quality control, raw reads (FASTQ files), short read alignment (BAM files), germ-line mutation, somatic mutation, copy number variation (CNV), structural variation (SV), xenogeneic sequence, recurrence analysis, variant impact prediction, mutation filtration/selection, tumor heterogeneity inference
Box 1. Sequencing types and platforms. Depending on the purpose of sequencing, different platforms can be considered. Whole genome sequencing (WGS) inspects all genomic regions and is applicable to CNV and SV analysis. Whole exome sequencing (WES) interrogates only coding regions (1~2% of the genome), at lower cost and with less sequencing throughput. WGS and WES are frequently used to discover novel causative variants, and sequencing of a control sample is generally mandatory. When only limited regions are to be tested (as in a diagnostic kit), a set of targeted genes is amplified and sequenced (targeted/panel sequencing). In this case, the control is usually omitted when the target sites (hotspots) are clear.

51 Discussion

