EXOME SEQUENCING AND COMPLEX DISEASE: practical aspects of rare variant association studies. Alice Bouchoms, Amaury Vanvinckenroye, Maxime Legrand
WHAT IS EXOME SEQUENCING? Exon: coding sequence of the DNA. Exome sequencing: aims to sequence the coding part of the DNA, i.e. the exons.
INTRODUCTION: GWAS helped discover common coding variants. Exome sequencing also finds rare coding variants; it is getting faster and better, with large samples (> … individuals). Before 2010: only a few publications on PubMed. Now: more than 2000 publications on PubMed.
KEY QUESTIONS TO ASK YOURSELF
STUDY DESIGN: State objectives. Focus on extreme outcomes: unusual phenotypes or traits. But be careful about de novo mutations. Geographical restrictions?
STUDY DESIGN: Sequencing strategy? Quality of the sample: coverage of 20x or greater. Depth of sequencing per person: 60x or greater. Non-coding regions can still be useful: determine ancestry or estimate genotypes (0.2x to 2x).
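The coverage figures on this slide can be checked per sample once per-base depths are available. The following is a minimal sketch, assuming the depths have already been extracted into a plain list; the function names and the example depths are hypothetical and do not come from the source.

```python
def fraction_covered(depths, min_depth=20):
    """Return the fraction of target bases with depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def mean_depth(depths):
    """Return the average sequencing depth over the target bases."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


# Hypothetical per-base depths for a tiny target region of one sample.
sample_depths = [65, 70, 12, 58, 61, 80, 19, 64]

print(fraction_covered(sample_depths))  # share of bases at 20x or more
print(mean_depth(sample_depths))        # compare against the 60x target
```

In this toy sample, two bases fall below 20x and the mean depth stays under 60x, so the sample would be flagged for review under the thresholds quoted above.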
VARIANT CALLING: Goal: obtain high-quality genotypes. Several steps: DNA contamination, DNA fingerprints, good follow-up? Alignment with the reference genome, calibration of base quality scores, removal of duplicate reads.
VARIANT CALLING: After read mapping: sample quality metrics (spotting samples with outlier properties). Variant calling: look for differences from the reference genome where aligned reads overlap.
VARIANT CALLING: Machine-learning-based classifier: separates polymorphic variants from artifacts. Evaluate metrics: true / false positives. Quality metrics on samples. Recommendation: minimum depth of coverage of 20x. Development of standards for storing sequence data and variant calls.
ASSOCIATION ANALYSIS: Goal: find functional effects of variants. Score: indicates the effect on protein function. Separation between highly damaging variants and the others. If a variant has multiple annotations, 3 options: focus on the longest transcript; focus on the most deleterious effect; focus on the canonical transcript.
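The three ways of resolving multiple annotations can be sketched as three small selection functions. This is an illustration only: the `Annotation` record, the transcript names and the `SEVERITY` ranking are assumptions made for the example, not an annotation tool's actual output format.

```python
from dataclasses import dataclass

# Hypothetical severity ranking; a higher value means a more damaging effect.
SEVERITY = {"synonymous": 0, "missense": 1, "splice": 2, "stop_gained": 3}


@dataclass(frozen=True)
class Annotation:
    transcript: str   # transcript identifier (made up for the example)
    length: int       # transcript length in bases
    effect: str       # one of the keys of SEVERITY
    canonical: bool   # True if this is the canonical transcript


def pick_longest(annotations):
    """Option 1: keep the annotation on the longest transcript."""
    return max(annotations, key=lambda a: a.length)


def pick_most_deleterious(annotations):
    """Option 2: keep the annotation with the most damaging effect."""
    return max(annotations, key=lambda a: SEVERITY[a.effect])


def pick_canonical(annotations):
    """Option 3: keep the annotation on the canonical transcript, if any."""
    for a in annotations:
        if a.canonical:
            return a
    return None


# One variant annotated against three hypothetical transcripts.
variant_annotations = [
    Annotation("T1", 2100, "missense", False),
    Annotation("T2", 1500, "stop_gained", False),
    Annotation("T3", 1800, "synonymous", True),
]

print(pick_longest(variant_annotations).transcript)
print(pick_most_deleterious(variant_annotations).transcript)
print(pick_canonical(variant_annotations).transcript)
```

The three options select three different transcripts here, which is why the choice should be fixed before the association analysis rather than after looking at results.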
ASSOCIATION ANALYSIS: Single-variant association test. Check of data quality. Usual way of processing rare variants: gather those acting on the same gene into groups and analyse each group.
ASSOCIATION ANALYSIS: 2 methods for processing groups: comparison of the number of variants between cases and controls; comparison with chance expectations. Recommendation: run at least one test of each category, with different thresholds. If no threshold, use a variety of frequency cut-offs.
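The first category (comparing the number of rare variants between cases and controls) can be illustrated with a small carrier-count comparison repeated over several frequency cut-offs. This is a sketch under stated assumptions: a one-sided Fisher's exact test is used here as a simple stand-in and is not named in the source; the variant names, frequencies and samples are invented.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(case carriers >= a) under the hypergeometric null.

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_cases)


def count_carriers(samples, variant_freqs, max_freq):
    """Count samples carrying at least one variant with frequency <= max_freq."""
    return sum(
        1 for variants in samples
        if any(variant_freqs[v] <= max_freq for v in variants)
    )


# Hypothetical data for one gene: each sample is the set of variants it carries.
variant_freqs = {"v1": 0.001, "v2": 0.004, "v3": 0.03}
cases = [{"v1"}, {"v2"}, {"v1", "v3"}, {"v2"}, set()]
controls = [{"v3"}, set(), {"v1"}, set(), {"v3"}]

for max_freq in (0.01, 0.05):  # a variety of frequency cut-offs
    a = count_carriers(cases, variant_freqs, max_freq)
    c = count_carriers(controls, variant_freqs, max_freq)
    p = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
    print(max_freq, a, c, round(p, 4))
```

The result changes with the cut-off because the more common variant `v3` enters the group at the looser threshold, which is the reason for trying several cut-offs when no threshold is fixed in advance.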
ASSOCIATION ANALYSIS: Packages are available to perform the tests with subsets of the data. Example: 1. missense, splice and stop-altering variants; 2. subset of deleterious variants; 3. splice and stop-altering variants.
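The three example subsets can be expressed as simple filters over annotated variants. This sketch assumes each variant carries a functional class and a boolean "damaging" prediction, and it treats subset 2 as the predicted-damaging part of subset 1; both the field names and that reading are assumptions, not taken from any specific package.

```python
# Functional classes included in each subset (names are illustrative).
NONSYNONYMOUS = {"missense", "splice", "stop"}
SPLICE_STOP = {"splice", "stop"}


def subset(variants, classes, damaging_only=False):
    """Return ids of variants in the given classes, optionally damaging only."""
    return [
        v["id"] for v in variants
        if v["cls"] in classes and (v["damaging"] or not damaging_only)
    ]


# Hypothetical annotated variants for one gene.
variants = [
    {"id": "v1", "cls": "missense", "damaging": True},
    {"id": "v2", "cls": "missense", "damaging": False},
    {"id": "v3", "cls": "splice", "damaging": True},
    {"id": "v4", "cls": "stop", "damaging": True},
    {"id": "v5", "cls": "synonymous", "damaging": False},
]

print(subset(variants, NONSYNONYMOUS))                      # subset 1
print(subset(variants, NONSYNONYMOUS, damaging_only=True))  # subset 2
print(subset(variants, SPLICE_STOP))                        # subset 3
```

Each subset would then be passed to the same group test, so that results can be compared across progressively stricter definitions of a functional variant.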
ASSOCIATION ANALYSIS: No optimal choice for the analysis, because variants and their characteristics vary between genes. Permutation-based approaches. Statistical significance: if no permutation-based threshold, p values ≤ … QQ plots to summarize the results.
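Both ideas on this slide can be sketched briefly: a permutation p-value obtained by shuffling case/control labels, and the expected-versus-observed points that a QQ plot displays. This is a minimal illustration, assuming a simple statistic (number of carriers among cases) and invented data; the function names are hypothetical.

```python
import math
import random


def permutation_p_value(carrier, is_case, n_perm=1000, seed=0):
    """Permutation p-value for the number of carriers among cases.

    carrier: list of 0/1 flags (1 = sample carries a rare variant in the gene)
    is_case: list of booleans, same order as carrier
    """
    rng = random.Random(seed)
    observed = sum(c for c, case in zip(carrier, is_case) if case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between genotype and status
        stat = sum(c for c, case in zip(carrier, labels) if case)
        if stat >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(p) for a QQ plot."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical gene: all four carriers are cases.
carrier = [1, 1, 1, 1, 0, 0, 0, 0]
is_case = [True, True, True, True, False, False, False, False]
print(permutation_p_value(carrier, is_case))

# Hypothetical per-gene p-values summarized as QQ points.
for expected, observed in qq_points([0.5, 0.05, 0.2, 0.8]):
    print(round(expected, 3), round(observed, 3))
```

Points lying well above the diagonal (observed larger than expected) across most genes would suggest a systematic problem such as stratification rather than many true associations.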
APPROACHES FOR FOLLOW-UP: To demonstrate an association found in the analysed samples, additional samples are needed.
APPROACHES FOR FOLLOW-UP: Exome chip experiments examine most of the variants, but are less sensitive in non-European populations.
APPROACHES FOR FOLLOW-UP: Statistical imputation: take the base which has the highest correlation with the missing one, and assume it is the same allele as T (i.e. minor or major). But again, this is often not possible for mixed populations.
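The idea of imputing from the most correlated variant can be sketched with a toy reference panel. This is a deliberate simplification under stated assumptions: alleles are coded 0 (major) and 1 (minor), correlation is measured by r² in the reference panel, and the missing allele is copied from the best-correlated typed variant. The variant names and data are invented, and real imputation software relies on haplotype models rather than a single variant.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two 0/1 allele vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic variant carries no information
    return cov * cov / (var_x * var_y)


def best_tag(reference, missing):
    """Return the typed variant most correlated with the missing one."""
    target = reference[missing]
    typed = [name for name in reference if name != missing]
    return max(typed, key=lambda name: r_squared(reference[name], target))


def impute(sample, reference, missing):
    """Copy the allele type (0 = major, 1 = minor) from the best tag variant."""
    tag = best_tag(reference, missing)
    return tag, sample[tag]


# Hypothetical reference panel: alleles across six reference haplotypes.
reference = {
    "snp_a": [1, 1, 0, 0, 0, 1],
    "snp_b": [1, 0, 1, 0, 0, 0],
    "untyped": [1, 1, 0, 0, 0, 0],
}

# Study haplotype genotyped at snp_a and snp_b only.
study_sample = {"snp_a": 1, "snp_b": 0}
print(impute(study_sample, reference, "untyped"))
```

The sketch also shows why mixed populations are difficult: the correlations are estimated in one reference panel and may not hold in samples of different ancestry.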
ROLE OF FUNCTIONAL ASSAYS: Study the changes in proteins due to coding variants. Study why these changes result in diverse diseases.
FORWARD GENETICS: Another approach to study functional variants. First look at which proteins show changes, then search the DNA sequence for the variant(s).
DISCUSSION: In other articles: more care about sample quality; gain of sensitivity in variant calls if calling is done across several samples; indels in variant calls are the major source of false positives, so an alignment algorithm which allows gapped alignment is needed; check association results in databases.
DISCUSSION: Because of costs, exome sequencing studies focus on the coding part of the genome. Thus they are not suitable for non-exonic sequence (structural variants, chromosomal rearrangements). These problems will be partially solved by the fall in sequencing costs.