PRIORITIZING REGIONS OF CANDIDATE GENES FOR EFFICIENT MUTATION SCREENING
Outline
  Abstract
  Background
  Materials and Methods
  Results
  Discussion
  Conclusion
Abstract
  Complete sequence of the human genome has altered the search process for disease-causing mutations
    Previously, mostly rare diseases were studied, and data took years to analyze
    Now, the rate-limiting step is screening patients and interpreting results
  Tests the hypothesis that disease-causing mutations are not uniformly distributed and can be predicted bioinformatically
  Developed the prioritization of annotated regions (PAR) technique
Abstract
  Tested by analyzing 710 genes with 4,498 previously identified mutations
  Nearly 50% of disease-associated genes were identified after analyzing only 9% of the complete coding sequence
  PAR identified 90% of genes as containing at least one mutation using less than 40% of screening resources
Background
  When screening for mutations, researchers usually focus on the coding sequence
  Not enough to show a relationship between mutation and disease
    Ex. age-related macular degeneration
  Today’s techniques:
    Single-strand conformational polymorphism analysis (SSCP)
    Denaturing high-performance liquid chromatography (DHPLC)
    Automated DNA sequencing
Background
  SSCP
    Compares conformational differences in strands of DNA of the same length (1)
  Denaturing high-performance liquid chromatography
    Compares two or more chromosomes as a mixture of denatured and reannealed PCR amplicons
    Reveals the presence of a mutation by the differential retention of homo- and heteroduplex DNA on reversed-phase chromatography supports under partial denaturation (2)
Background
  Through own work, found that disease-causing variations are not uniformly distributed throughout the sequence
    Ex. Bardet-Biedl syndrome: restrict to patients with retinitis pigmentosa and ulnar polydactyly
  Disease-causing mutations are more likely to lie in structural and functional regions
Materials and Methods
  List of 710 genes obtained via OMIM
  Cross-referenced with transcripts in Ensembl Release NCBI31
  Gene structure and annotated protein domains obtained from Ensembl
  Information on mutation locations obtained from OMIM
  Secondary structure prediction performed by nnPredict
Materials and Methods
  $x$ = nucleotide position
  $W_s$ = PAR window size
  $N_x$ = number of distinct annotation elements
  $W(i)$ = PAR window function
  $A_f(x,j)$ = annotation function for the $j$th annotation at the $x$th position
  $A_s(x,j)$ = annotation score for the $j$th annotation at the $x$th position
  $A_o(x,j)$ = annotation scalar offset
  $A_m(j)$ = annotation multiplier for the $j$th annotation feature
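The PAR equation itself is not reproduced in this text, so the following Python sketch only illustrates how the listed symbols could fit together. Two things are assumptions, not the authors' stated formula: that the annotation function combines the terms as A_f(x,j) = A_m(j) * A_s(x,j) + A_o(x,j), and that the window W(i) is centered on position x. The function name `par_scores` and its argument layout are hypothetical.

```python
from typing import List, Sequence


def par_scores(
    scores: Sequence[Sequence[float]],   # scores[j][x] plays the role of A_s(x, j)
    multipliers: Sequence[float],        # multipliers[j] plays the role of A_m(j)
    offsets: Sequence[Sequence[float]],  # offsets[j][x] plays the role of A_o(x, j)
    window: Sequence[float],             # window[i] plays the role of W(i); len(window) is W_s
) -> List[float]:
    """Windowed sum of per-position annotation values (illustrative only)."""
    n_annotations = len(scores)  # N_x in the slide's notation
    n_positions = len(scores[0]) if scores else 0

    # ASSUMPTION: A_f(x, j) = A_m(j) * A_s(x, j) + A_o(x, j), summed over annotations j.
    per_position = [
        sum(multipliers[j] * scores[j][x] + offsets[j][x] for j in range(n_annotations))
        for x in range(n_positions)
    ]

    # ASSUMPTION: the window is centered on x; positions outside the sequence are skipped.
    half = len(window) // 2
    result = []
    for x in range(n_positions):
        total = 0.0
        for i, weight in enumerate(window):
            pos = x + i - half
            if 0 <= pos < n_positions:
                total += weight * per_position[pos]
        result.append(total)
    return result
```

With two annotation tracks, multipliers of 1 and 2, zero offsets, and a flat three-position window, positions covered by both tracks and their neighbors receive the highest values.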
Materials and Methods
  Impractical to perform manually for every gene in the candidate set
  Figure: graphic representation of the gene structure of the EFEMP1 gene and corresponding PAR values
Materials and Methods
  Regions that maximized the PAR function were identified in each gene
  Primer pair positions were selected consistent with the default parameters of Primer3 until at least one mutation was flanked
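As a rough illustration of the selection step, the Python sketch below picks the fixed-length region with the highest summed PAR value and checks whether a known mutation falls inside it. It does not model Primer3 or real primer placement. The function names, the half-open `(start, end)` convention, and the fixed region length are assumptions made for the example.

```python
from typing import Iterable, Sequence, Tuple


def best_region(par: Sequence[float], region_len: int) -> Tuple[int, int]:
    """Return the half-open (start, end) window with the highest summed score."""
    if region_len <= 0 or region_len > len(par):
        raise ValueError("region_len must be between 1 and len(par)")
    current = sum(par[:region_len])
    best_sum, best_start = current, 0
    # Slide the window one position at a time, updating the sum incrementally.
    for start in range(1, len(par) - region_len + 1):
        current += par[start + region_len - 1] - par[start - 1]
        if current > best_sum:  # ties keep the earliest window
            best_sum, best_start = current, start
    return best_start, best_start + region_len


def flanks_mutation(region: Tuple[int, int], mutation_positions: Iterable[int]) -> bool:
    """True if any known mutation position lies inside the region."""
    start, end = region
    return any(start <= m < end for m in mutation_positions)
```

Repeating the selection on the remaining sequence until `flanks_mutation` returns true would mimic the "until at least one mutation flanked" stopping rule in a simplified way.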
Materials and Methods
  Other methods used for comparison
    Serial
      Generates minimally overlapping primer pair positions for each exon with the same PCR product size requirements
      Models the traditional screening approach
      Examines the complete coding sequence
    Random
      Selects a region from any transcript without replacement
      Continues to select with minimal overlap
  Complete screening with laboratory information management system (LIMS)
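The two comparison strategies can be sketched in Python as follows. This is a simplified model, not the authors' implementation: exons are given as half-open `(start, end)` coordinates, amplicons are clipped to the exon rather than extended into flanking sequence, and the function names and the `overlap` and `seed` parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Region = Tuple[int, int]


def serial_amplicons(exons: Sequence[Region], product_size: int, overlap: int = 0) -> List[Region]:
    """Tile every exon in order with windows of at most product_size."""
    step = product_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than product_size")
    regions = []
    for start, end in exons:
        pos = start
        while pos < end:
            regions.append((pos, min(pos + product_size, end)))
            pos += step
    return regions


def random_order(regions: Sequence[Region], seed: Optional[int] = None) -> List[Region]:
    """Return the same regions in random order, each used once (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(list(regions), k=len(regions))
```

Screening the output of `serial_amplicons` front to back models the traditional approach, while `random_order` gives a baseline that covers the same regions without any prioritization.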
Results - Efficiency
  PAR
    Found 90% of mutations with 60% coverage
  Serial
    Linear: 90% at 90%, 100% at 100%
  Random
    Fell short of identifying 100% of mutations
Results – Figure 2
  PAR
    819 mutations identified in 350 distinct genes using a single best PAR-selected region per gene
    Corresponds to 18% of mutations in approximately half the transcripts
    Of 1,908,911 nucleotides, PAR selected only 168,980
    At least one mutation was identified in 50% of genes with only 9% of the total transcript sequence screened
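The percentages on this slide follow directly from the counts reported in the deck. The short Python check below recomputes them. The variable names are illustrative, and the totals of 4,498 mutations and 710 genes are taken from the Abstract slide.

```python
# Counts quoted in the slides.
selected_nt = 168_980      # nucleotides in PAR-selected regions
total_nt = 1_908_911       # nucleotides across all transcripts
mutations_found = 819      # mutations inside the selected regions
total_mutations = 4_498    # previously identified mutations
genes_hit = 350            # genes with at least one mutation found
total_genes = 710          # genes analyzed

coverage = selected_nt / total_nt
mutation_fraction = mutations_found / total_mutations
gene_fraction = genes_hit / total_genes

print(f"sequence screened: {coverage:.1%}")            # about 9%
print(f"mutations captured: {mutation_fraction:.1%}")  # about 18%
print(f"genes identified: {gene_fraction:.1%}")        # about half
```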
Results – Figure 3
  Serial
    Linear relationship between screening resource utilization and number of genes identified
  PAR
    Identified 90% of genes with a 60% reduction in screening resources
    When only one primer pair in each transcript was evaluated, nearly 40% of transcripts were found to contain at least one mutation
Discussion
  History of genetic screening
    PCR
    Lengthy clinical work
    Therefore, the entire coding sequence was always evaluated in all patients
  Explains current use of serial screening
Discussion
  Changes
    More common diseases being analyzed
      More available patients
    Availability of genomic sequence
      PCR-based assays can be developed in less than a day with algorithms
    More involvement from other professions (engineers, statisticians)
      Supply tools to keep track of experiments
    Realization that many disease-causing mutations do not affect coding sequences
Discussion
  Advantages of PAR
    Effective use of gene annotation
    Prioritizes gene segments for screening
    Conservation of protein structure
  Focus on gene segments vs. entire gene
    Evident that the likelihood of finding a disease-causing variation in a gene falls with each exon screened with no positive result
    Serial approach screens all exons no matter what
    PAR screens a section with an average chance of finding a mutation
Conclusion
  Consideration of parameters resulted in significantly more discoveries per unit of effort
  Algorithm can be easily modified and expanded
  Most useful for a large number of candidate genes in a large number of patients
    Select the best two or four regions in each candidate gene
    Screen all of them as the initial screening strategy
    Base additional screening on findings from the first round and the PAR algorithm
  Clear that the PAR approach is preferable to serial screening
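The recommended first round, in which the best two or four regions per gene are screened, could be approximated with a greedy selection like the Python sketch below. The greedy non-overlap rule, the fixed region length, and the name `top_regions` are assumptions for illustration. The slides do not specify how multiple regions per gene are chosen.

```python
from typing import List, Sequence, Tuple


def top_regions(par: Sequence[float], region_len: int, k: int) -> List[Tuple[int, int]]:
    """Greedily pick up to k non-overlapping windows with the highest summed scores."""
    if k <= 0 or region_len <= 0 or region_len > len(par):
        return []
    # Score every candidate window, then rank by score (earlier start breaks ties).
    ranked = sorted(
        ((sum(par[s:s + region_len]), s) for s in range(len(par) - region_len + 1)),
        key=lambda item: (-item[0], item[1]),
    )
    chosen: List[int] = []
    for _, start in ranked:
        # Keep a window only if it does not overlap any window already chosen.
        if all(start + region_len <= c or c + region_len <= start for c in chosen):
            chosen.append(start)
            if len(chosen) == k:
                break
    return [(s, s + region_len) for s in sorted(chosen)]
```

Calling this once per candidate gene with `k` set to 2 or 4 yields the first-round screening list. Genes with no finding could then be revisited with a larger `k` in a second round.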
References
  (1) "Single Strand Conformation Polymorphism." Wikipedia. 28 May Sept
  (2) "Single Strand Conformation Polymorphism." Wikipedia. 28 May Sept