VirVarSeq vs ViVaMBC Pictured above: The structure of HIV.


1 Statistical methods for improved variant calling of massively parallel sequencing data.
VirVarSeq vs ViVaMBC Pictured above: The structure of HIV. Bie Verbist | NCS Brugge |

2 OUTLINE Viral dynamics Massive parallel sequencing Variant calling VirVarSeq ViVaMBC Results HCV plasmids HCV clinical sample

3 Viral dynamics
A virus is a small infectious agent that replicates only inside the living cells of other organisms.
High replication rate (10^11 replications a day for HIV)
High mutation rate
The viral population consists of closely related subgroups, the viral quasispecies, which we want to identify and quantify.

4 Viral dynamics
[Figure: number of viruses in the population (y-axis) against time (x-axis), before treatment and on treatment. Labels: heterogeneous viral population, drug-sensitive variants, drug-resistant variant, undetectable.]

5 Sequencing
Sanger sequencing
detection limit: 20-30%
no accurate estimate of frequency
Massively parallel sequencing
ACGGTTTCCGTCTGGG
ACGGTTTCTGTCTGGG
ACGGTTTCCGTCTGGG
ACGGTTTCTGTCTGGG
ACGGTTTCTGTCTGGG
ACGGTTTCTGTCTGGG
ACGGTTTCTGTCTGGG
ACGATTTCTGTCTGGG
detection limit << 20%
more accurate estimate of frequency
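The frequency estimate that massively parallel sequencing makes possible can be illustrated with the eight example reads from this slide. This is a minimal sketch, not part of either tool: the function name `base_frequencies` and the 0-based positions are choices made here, and the reads are assumed to be already aligned to the same region.

```python
from collections import Counter

# The eight example reads shown on the slide, assumed aligned to the same region.
reads = [
    "ACGGTTTCCGTCTGGG",
    "ACGGTTTCTGTCTGGG",
    "ACGGTTTCCGTCTGGG",
    "ACGGTTTCTGTCTGGG",
    "ACGGTTTCTGTCTGGG",
    "ACGGTTTCTGTCTGGG",
    "ACGGTTTCTGTCTGGG",
    "ACGATTTCTGTCTGGG",
]


def base_frequencies(reads, position):
    """Return the relative frequency of each base at a 0-based position."""
    counts = Counter(read[position] for read in reads)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


freqs_pos8 = base_frequencies(reads, 8)  # C in 2 of 8 reads, T in 6 of 8
freqs_pos3 = base_frequencies(reads, 3)  # A in 1 of 8 reads, G in 7 of 8
```

With only eight reads the single A at position 3 cannot be told apart from a sequencing error, which is the problem the variant callers address.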

6 Massively parallel sequencing
[Diagram labels: viral population, DNA, fragmentation, fragments, amplification, sequencing by synthesis.]
Example, one fragment: T G C C A A A G A C G G T T T C T

7 Massively parallel sequencing
Viral population
@HWUSI-EAS1524:17:FC:1:120:19254: :N:0:GATCAG
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAA
+
@HWUSI-EAS1524:17:FC:1:120:9430: :N:0:GATCAG
ATCGGAAGAGCACACGNCTGAACTCCAGTCACGATCAGATCCCGTATGCCGTCTTCTGCTTGAAAAAAAA
@HWUSI-EAS1524:17:FC:1:120:12760: :N:0:GATCAG
ATCATACTGTCTTACTNTGATAAAACCTCCAATTCCCCCTANCATTNTTGGTTNCCATCTTCCTTGCAAA
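Both methods in this talk rely on the per-base quality scores that accompany each read in a FASTQ record. As a minimal sketch, a Phred score Q corresponds to an error probability p = 10^(-Q/10). The offset of 33 below is an assumption (Sanger / Illumina 1.8+ encoding), and the quality string is hypothetical, since no quality lines appear in the records above.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    return [ord(ch) - offset for ch in quality_string]


# Hypothetical quality string, used only for illustration.
example_quality = "II5#"
scores = decode_quality(example_quality)            # one integer score per base
error_probs = [phred_to_error_prob(q) for q in scores]
```

A score of 40 means roughly one miscall in 10,000 bases, while a score of 2 carries almost no information about the base.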

8 Variant calling
Distinguish low-frequency variants from sequencing error.
VirVarSeq: adaptive filtering approach based on quality scores. Verbist et al. 2014, Bioinformatics. doi: / bioinformatics/btu587.
ViVaMBC: model-based clustering approach which models the error probabilities based on quality scores. Verbist et al. 2014, BMC Bioinformatics, under revision.

9 VirVarSeq
Extract reads that cover codon of interest
Filter based on the quality scores.
Build a codon table
[Figure: reads aligned to the reference at position x (CGA, CCA, CGT, GGA, ...) pass through filtering to give the codon table.]
Codon table, pos x:
Codon  Freq
CGA    0.62
CCA    0.25
GGA    0.13
* codon = nucleotide triplet which specifies a single amino acid
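The three steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the VirVarSeq implementation: the function name `codon_table`, the rule that all three bases of a codon must reach the threshold, and the example reads and qualities are all hypothetical.

```python
from collections import Counter


def codon_table(codon_reads, q_threshold):
    """Build a codon frequency table for one codon position.

    codon_reads: list of (codon, [q1, q2, q3]) pairs, one per read covering
    the codon of interest. A read is kept only if all three base qualities
    reach q_threshold (an assumed filtering rule for this sketch).
    """
    kept = [codon for codon, quals in codon_reads if min(quals) >= q_threshold]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.most_common()}


# Hypothetical reads covering one codon position, with per-base Phred scores.
example_reads = [
    ("CGA", [38, 39, 40]),
    ("CCA", [37, 36, 38]),
    ("CGT", [38, 5, 12]),   # low-quality read, removed by the filter
    ("GGA", [35, 36, 30]),
    ("CGA", [40, 40, 40]),
]
table = codon_table(example_reads, q_threshold=20)
```

Here the low-quality CGT read is dropped before counting, so it never appears as a variant in the table.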

10 VirVarSeq
Definition of the Q-threshold (QIT):
Fit mixture distribution on Q-scores with 3 components:
Point probability around Q = 2
Error distribution
Reliable call distribution
Intersection point is threshold.
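The last step, taking the intersection point as threshold, can be sketched once a mixture has been fitted. The slide does not say which distributions are used for the error and reliable-call components, so the normal densities, the parameter values and the function names below are assumptions made purely for illustration; fitting the mixture itself is not shown.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed component shape)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def intersection_threshold(w_err, mean_err, sd_err, w_ok, mean_ok, sd_ok, step=0.01):
    """Scan from the error mean to the reliable mean and return the first Q
    at which the weighted reliable-call density is at least as large as the
    weighted error density."""
    n_steps = int((mean_ok - mean_err) / step)
    for i in range(n_steps + 1):
        q = mean_err + i * step
        if w_ok * normal_pdf(q, mean_ok, sd_ok) >= w_err * normal_pdf(q, mean_err, sd_err):
            return q
    return mean_ok


# Hypothetical fitted components: equal weights and spreads, means at Q=10 and Q=30.
qit = intersection_threshold(0.5, 10.0, 4.0, 0.5, 30.0, 4.0)
```

With symmetric components the crossing lies midway between the two means; unequal weights or spreads shift it toward one side.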

11 ViVaMBC
Extract reads that cover codon of interest
Perform Model Based Clustering
Model the error probability
Clusters unknown, EM algorithm
[Figure: reads covering position x of the reference (CGA, CCA, CGT, GGA, ...) are clustered to give the codon table.]
Codon table, pos x:
Codon  Freq
CGA    0.62
CCA    0.25
GGA    0.13
Cluster medoids = variant
Size of cluster = frequency
N° clusters = N° variants
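The idea of clustering reads with an error model and reading frequencies off the cluster sizes can be sketched with a small EM loop. This is a simplified illustration, not the ViVaMBC algorithm: the candidate codons are fixed in advance, second-best base calls are ignored, a miscall is assumed equally likely to be any of the three other bases, and all names and example values are hypothetical.

```python
def read_likelihood(observed, errors, candidate):
    """P(observed codon | true codon = candidate) under the assumed error model:
    a base is called correctly with probability 1 - e, and miscalled as one
    specific other base with probability e / 3."""
    p = 1.0
    for base, e, true_base in zip(observed, errors, candidate):
        p *= (1.0 - e) if base == true_base else e / 3.0
    return p


def em_codon_frequencies(reads, candidates, n_iter=50):
    """Estimate mixing proportions tau_j over fixed candidate codons.

    reads: list of (codon, [e1, e2, e3]) with per-base error probabilities,
    all assumed strictly between 0 and 1.
    """
    k = len(candidates)
    tau = [1.0 / k] * k
    lik = [[read_likelihood(c, errs, cand) for cand in candidates]
           for c, errs in reads]
    for _ in range(n_iter):
        # E-step: expected cluster membership of each read, summed per cluster.
        resp_sums = [0.0] * k
        for row in lik:
            weighted = [t * l for t, l in zip(tau, row)]
            norm = sum(weighted)
            for j in range(k):
                resp_sums[j] += weighted[j] / norm
        # M-step: cluster size divided by number of reads gives the frequency.
        tau = [s / len(reads) for s in resp_sums]
    return dict(zip(candidates, tau))


# Hypothetical data: confident CGA and CCA reads, one CGT read with a doubtful third base.
good = [0.001, 0.001, 0.001]
example_reads = [("CGA", good)] * 6 + [("CCA", good)] * 3 + [("CGT", [0.001, 0.001, 0.3])]
tau = em_codon_frequencies(example_reads, ["CGA", "CCA", "CGT"])
```

In this example the doubtful CGT read is mostly absorbed into the CGA cluster, so CGT ends up with a frequency well below the 10% that a naive count would give.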

12 Results – HCV plasmids
Two plasmids
Amino acids 1 to 181 of NS3 region
differ at two codon positions (36 and 155)
mixed in 4 different proportions

13 Results – HCV plasmids
Other variants (11481 max) are false positives.
VirVarSeq reports more false positives, with frequencies going up to 0.5%.

14 Results – HCV clinical sample
VirVarSeq reports more variants.
Above 1% the methods are in agreement, even above 0.5%.
Few false positives in GC region for ViVaMBC?
[Figure panels: VirVarSeq, ViVaMBC.]

15 VirVarSeq vs ViVaMBC
When applying reporting limits of 1% or 0.5%, methods are in agreement.
Below this limit, trade-off between sensitivity and specificity, with VirVarSeq less specific.
VirVarSeq: adaptive approach; easy development; runs fast
ViVaMBC: more elegant; longer development time; longer run time

16 Acknowledgements
Promoters: Prof. Dr. O. Thas (1), Prof. Dr. L. Clement (1) and Prof. Dr. L. Bijnens (2)
Yves Wetzels, Tobias Verbeke, Joris Meys (1) for IT support
Scientists within discovery sciences (2)
Non-clinical statistics team (2)

17 Thank you

18 Back-up

19 ViVaMBC
Notation:
r_i: best base calls of read i (i = 1 ... n)
s_i: second best base calls of read i (i = 1 ... n)
z_ij: z_ij = 1 when read i belongs to haplotype j (j = 1 ... k)
τ_j: probability to belong to haplotype j
Complete Data Likelihood:

20 ViVaMBC
Complete Data Likelihood:
Likelihood depends on cluster membership z_ij → EM algorithm
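For orientation, the complete-data likelihood of a finite mixture generally takes the form below. This is the textbook form written with the notation of these back-up slides (reads i = 1 ... n, haplotypes j = 1 ... k), not necessarily the exact ViVaMBC likelihood: the component density f_j and its parameters θ are placeholders introduced here.

```latex
% Generic complete-data likelihood of a mixture (f_j and \theta are placeholders)
L_c(\tau, \theta) = \prod_{i=1}^{n} \prod_{j=1}^{k}
  \left[ \tau_j \, f_j(r_i, s_i \mid \theta) \right]^{z_{ij}},
  \qquad \sum_{j=1}^{k} \tau_j = 1

% E-step: expected membership given the current parameters
\hat{z}_{ij} = \frac{\tau_j \, f_j(r_i, s_i \mid \theta)}
                    {\sum_{l=1}^{k} \tau_l \, f_l(r_i, s_i \mid \theta)}

% M-step for the mixing proportions
\tau_j = \frac{1}{n} \sum_{i=1}^{n} \hat{z}_{ij}
```

Because the memberships are unobserved, the EM algorithm alternates between replacing them by their expectations (E-step) and re-estimating the parameters from those expectations (M-step).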

21 Library preparation Sequencing by synthesis

