Computational Tools for Finding and Interpreting Genetic Variations. Gabor T. Marth, Department of Biology, Boston College
Sequence variations (polymorphisms): A reference sequence of the human genome is available, but every individual is unique and differs from others at millions of nucleotide locations (genetic polymorphisms).
Our research interests: 1. How to find genetic polymorphisms? 2. How to use variation data to track our pre-historic past? 3. How to utilize polymorphism data for medical research?
Tools for polymorphism discovery: SNP discovery in clonal sequences
Redevelopment and expansion: Automated detection of heterozygous positions in diploid individual samples (visit Aaron Quinlan’s poster). Figure labels: Homozygous T, Homozygous C, Heterozygous C/T.
Redevelopment and expansion: Discovery of short deletions/insertions (both bi-allelic and micro-satellite repeats)
Redevelopment and expansion: Improve the detection of very rare alleles by taking into account recent results in population genetics (a priori, rare alleles are more numerous than common ones). Develop a rigorous statistical framework for both heterozygous polymorphisms and INDELs. Calculate the probability that a SNP found in one set of samples will also be present in another. Complete software rewrite; graphical user interface (GUI); ease of use for small laboratories without UNIX expertise.
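The remark that rare alleles outnumber common ones a priori can be illustrated with a small sketch. It assumes the standard neutral model with constant population size, under which the expected number of polymorphic sites carrying the derived allele i times in a sample of n chromosomes is proportional to 1/i. This model is an assumption chosen for illustration, and the function name neutral_allele_count_prior is hypothetical; the priors actually used in the software may differ.

```python
def neutral_allele_count_prior(n):
    """Prior over the derived-allele count i = 1..n-1 at a polymorphic site.

    Assumption: standard neutral model with constant population size,
    where the expected number of sites with i derived copies is
    proportional to 1/i. Returns a list whose element [i-1] is P(count = i).
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)  # harmonic number H_{n-1}
    return [w / total for w in weights]
```

Under this assumption, singletons (i = 1) receive the largest prior weight, twice that of doubletons, so a detection method using such a prior is less inclined to dismiss a variant seen in only one sequence as a sequencing error.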
Genetic and epigenetic changes in cancer: changes in DNA methylation and histone modification; copy number changes and chromosomal rearrangements; nucleotide changes and short insertions/deletions. We want to develop tools for detecting inherited polymorphisms and somatic mutations in a variety of new data types, representing both genetic and epigenetic changes.
Human pre-history
Demographic history: European data, bottleneck; African data, modest but uninterrupted expansion
Tools for Medical Genetics: The polymorphism structure of individuals follows strong patterns
The international HapMap project: However, the variation structure observed in the reference DNA samples often does not match the structure in another set of samples, such as those used in a clinical case-control association study aimed at finding disease genes and disease-causing genetic variants.
Tools to test sample-to-sample variability: Instead of genotyping additional sets of (clinical) samples with costly experimentation and comparing the variation structure of these consecutive sets directly, we generate additional samples by computational means, based on our population genetic models of demographic history. We then use these samples to test the efficacy of gene-mapping approaches for clinical research.
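One common way to generate samples computationally is coalescent simulation. The sketch below is a minimal illustration, not the lab's actual tool: it assumes a constant population size (the demographic models mentioned above, such as bottlenecks or expansions, would change the waiting times), an infinite-sites mutation model, and no recombination. The function name simulate_sample and its parameters are hypothetical.

```python
import random


def simulate_sample(n, theta, rng):
    """Simulate n haplotypes under the standard neutral coalescent.

    Assumptions (illustrative only): constant population size,
    infinite-sites mutations, no recombination. Time is measured in
    units of 2N generations, so each lineage mutates at rate theta / 2.
    Returns a list of n haplotypes, each a list of 0/1 alleles
    (1 = derived) over the simulated polymorphic sites.
    """
    # Each lineage is the set of sample indices that descend from it.
    lineages = [frozenset([i]) for i in range(n)]
    sites = []  # one entry per mutation: the samples carrying it
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until the next coalescence among k lineages.
        t = rng.expovariate(k * (k - 1) / 2.0)
        for lineage in lineages:
            # Drop mutations on this branch segment as a Poisson process.
            elapsed = rng.expovariate(theta / 2.0)
            while elapsed < t:
                sites.append(lineage)
                elapsed += rng.expovariate(theta / 2.0)
        # Merge two lineages chosen uniformly at random.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)]
        lineages.append(merged)
    return [[1 if i in site else 0 for site in sites] for i in range(n)]
```

Calling simulate_sample(10, 5.0, random.Random(1)) repeatedly with different seeds yields independent replicate samples from the same model, which can then be compared with one another in the same way that consecutive experimental sample sets would be.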
Tools to test sample-to-sample variability. Figure labels: computational sample, experimental sample (visit Dr. Eric Tsung’s poster)
Tools to connect genotype and clinical outcome: clinical endpoint (adverse drug reaction); computational prediction based on haplotype structure; genetic marker (haplotype) in genome regions of drug metabolizing enzyme (DME) genes; functional allele (known metabolic polymorphism); molecular phenotype (drug concentration measured in blood plasma)
The Computational Genetics Lab