Research for medical discovery at the Computational Genomics Laboratory at Boston College Biology
Gabor T. Marth, Department of Biology, Boston College

1 Research for medical discovery at the Computational Genomics Laboratory at Boston College Biology
Gabor T. Marth Department of Biology, Boston College

2 We study genetic variations because…
… they underlie phenotypic differences … cause heritable diseases and determine responses to drugs … allow tracking ancestral human history

3 We are interested in various aspects of genetic variations…
How to discover inherited genetic polymorphisms and somatic mutations that lead to disease?
How to model human polymorphism structure to inform medical research?
How to select the best genetic markers for clinical case-control association studies?
How to use genetic markers to predict individual responses to drugs, including adverse drug reactions?

4 1. We build computer tools for variation discovery
Inherited (germ-line) polymorphisms predispose to disease. The most common types of human polymorphism are single-nucleotide polymorphisms (SNPs) and short insertion-deletions (INDELs). (Marth et al., Nature Genetics 1999)
1. A general Bayesian polymorphism discovery tool: Genetic variations are landmarks that allow us to track our genetic ancestry, and their genome structure informs us about the molecular and demographic forces that have shaped it. For medical research the most important polymorphisms are disease-causing variants, but non-functional polymorphisms are also useful as markers for linkage and association studies. There is a growing need to find rare, medically important alleles in deep alignments of clonal sequences and diploid sequence traces; to identify large numbers of markers for mapping studies in humans, model organisms, and plants; and to discover informative polymorphisms for pathogen strain identification. With the tremendous sequencing capacity at large sequencing centers and an anticipated jump in sequencing speed, medical re-sequencing projects that map out the genetic changes leading to cancer and occurring during its development are gearing up. This amount of sequence data will require fully automated, versatile, yet highly accurate polymorphism discovery and genotype determination software that does not exist today. Detecting SNPs and INDELs in DNA sequences is challenging because one must align and compare sequences from varied sources and distinguish true polymorphisms from sequencing errors. We have developed a computer package, PolyBayes©, for accurate discovery of DNA polymorphisms in clonal sequences.
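To make "distinguish true polymorphisms from sequencing errors" concrete, here is a minimal sketch of a Bayesian test on one alignment column of clonal (haploid) reads. It assumes Phred base qualities (error rate 10^(-Q/10)), miscalls spread evenly over the other three bases, at most two alleles per site, and a hypothetical polymorphism prior `theta`. The function names are illustrative; this is not the PolyBayes implementation.

```python
from itertools import combinations

BASES = "ACGT"


def base_prob(observed, true_base, phred):
    """P(observed base | true base): error rate e = 10^(-Q/10),
    with errors spread evenly over the other three bases (assumption)."""
    e = 10 ** (-phred / 10)
    return 1 - e if observed == true_base else e / 3


def snp_posterior(column, theta=1e-3):
    """Posterior probability that an alignment column of clonal reads is
    polymorphic. `column` is a list of (base, phred_quality) pairs and
    `theta` is a hypothetical prior probability of a polymorphic site."""
    n = len(column)
    if n < 2:
        return 0.0  # a single read cannot reveal a polymorphism
    # Monomorphic hypotheses: every read carries the same true base x.
    mono = 0.0
    for x in BASES:
        like = 1.0
        for base, q in column:
            like *= base_prob(base, x, q)
        mono += (1 - theta) / 4 * like
    # Biallelic hypotheses {x, y}: each read independently carries x or y.
    # The all-x and all-y assignments are monomorphic, so they are subtracted.
    poly = 0.0
    for x, y in combinations(BASES, 2):
        mixed = all_x = all_y = 1.0
        for base, q in column:
            px, py = base_prob(base, x, q), base_prob(base, y, q)
            mixed *= 0.5 * px + 0.5 * py
            all_x *= 0.5 * px
            all_y *= 0.5 * py
        # Each remaining term carries a factor 0.5^n; rescale so theta/6 is
        # spread evenly over the 2^n - 2 non-monomorphic assignments.
        prior_per_term = theta / 6 / (1 - 2 * 0.5 ** n)
        poly += prior_per_term * max(0.0, mixed - all_x - all_y)
    return poly / (mono + poly)
```

With five high-quality A reads and five high-quality C reads the posterior is close to 1, while a single low-quality C among ten A reads scores as a probable sequencing error. For very deep alignments the products would underflow, so a practical version would work in log space.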

5 We recently received a 5-year research grant from the NIH to expand our SNP detection capabilities…
[Figure panels: Homozygous T; Homozygous C; Heterozygous C/T]
1. for automated detection of somatic mutations in diploid individual samples (medical re-sequencing data)
2. for new data types produced by the latest, super-high-throughput sequencing technologies
3. to address the informatics needs of detecting genetic and epigenetic changes in somatic cells that lead to cancer and that occur during cancer proliferation: copy number changes, chromosomal rearrangements, changes in DNA methylation, histone modifications
Building on our existing software, PolyBayes, first developed at Washington University, we are currently developing a general polymorphism discovery tool that meets these challenges. We organize fragmentary sequences by layering them onto the genome reference sequence; discard paralogous sequences from similar, duplicated genome regions; and use base quality values in a rigorous Bayesian scheme to compare sequences of arbitrary quality. Specifically, we propose methods to align multi-exon genes, and novel methods for paralog filtering based either on complete mapping information or on genome-wide distributions of sequence divergence. We will develop new algorithms for the difficult problem of INDEL detection; integrate accurate detection of heterozygous polymorphisms in diploid traces into our software to enable individual genotyping; enhance sensitivity to detect rare alleles; and include a new measure to estimate the true positive rate of our candidate polymorphism predictions. We will implement a fast, reliable, full-functionality discovery tool that is free for academic research, performs well in large discovery projects, yet runs on desktop computers and is easily accessible to biologists in small or medium-sized laboratories.
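Heterozygote detection for individual genotyping can be illustrated by computing posteriors for the ten possible diploid genotypes at one site (compare the homozygous and heterozygous panels above). This minimal sketch uses per-read base calls with Phred qualities rather than trace peak shapes, which is a swapped-in simplification; it assumes each read samples either chromosome with probability 1/2, and `het_prior` and all function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def base_prob(observed, true_base, phred):
    """P(observed | true) with error rate 10^(-Q/10) spread over the other
    three bases; capped at 0.75 so a Q=0 call is uninformative, not impossible."""
    e = min(10 ** (-phred / 10), 0.75)
    return 1 - e if observed == true_base else e / 3


def genotype_posteriors(reads, het_prior=1e-3):
    """Posterior over the ten diploid genotypes at one site.
    `reads` is a list of (base, phred_quality) pairs."""
    log_post = {}
    for a, b in combinations_with_replacement(BASES, 2):
        prior = (1 - het_prior) / 4 if a == b else het_prior / 6
        ll = math.log(prior)
        for base, q in reads:
            # The read comes from chromosome a or chromosome b with equal chance.
            ll += math.log(0.5 * base_prob(base, a, q) + 0.5 * base_prob(base, b, q))
        log_post[a + b] = ll
    # Normalise in log space to avoid underflow at high coverage.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / z for g, v in log_post.items()}


def call_genotype(reads, het_prior=1e-3):
    """Return the most probable genotype and its posterior probability."""
    post = genotype_posteriors(reads, het_prior)
    best = max(post, key=post.get)
    return best, post[best]
```

Ten C and ten T reads at Q30 are called `"CT"`, while twenty C reads are called `"CC"`; the small heterozygote prior is overcome only when both alleles are seen repeatedly.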

6 2. We quantify the demographic history of human populations using DNA variation data…
[Figure: demographic models (stationary, collapse, expansion, bottleneck) drawn from past history to present; MD (simulation); AFS (direct form)]
2. Computational models of human variation structure. The genome structure of variations is determined by the molecular mutation process, by random genetic drift modulated by recombination and long-term demographic history, and by the various forms of natural selection. Understanding these primary biological processes is clearly of interest in its own right. They are also of great interest for medical genetics because these processes govern allelic association, the non-random assortment between marker and disease alleles that makes genetic mapping possible. With quantitative models of allelic association we can make rational decisions about marker spacing in a case-control study. With knowledge of population-specific linkage patterns we can predict how well those markers will work within each population, a question that has been a focus of my research. The two main determinants of allelic association are the fine-scale structure of recombination rates along human chromosomes and demographic history. It is reasonable to assume that recombination rate is governed by molecular processes common to all humans; hence population differences in the strength of allelic association are mainly the consequences of differing demographic histories. Allele frequency data are ideal for studying the effects of long-term demography because the allele frequency spectrum (AFS) is unaffected by variation in mutation or recombination rates. Using a model-fitting approach we determine the model structures, and quantify the model parameters, that best describe the allele frequency data within each population analyzed.
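The model-fitting idea can be sketched for the simplest case: tabulate the observed AFS from derived-allele counts and score it against a model's expected frequency-class proportions. This sketch covers only the stationary (constant-size, neutral) model, whose expected proportions are proportional to 1/i; expected spectra for the collapse, expansion, and bottleneck models would have to come from simulation or more involved formulas, which are not shown. Function names are hypothetical.

```python
import math
from collections import Counter


def observed_afs(derived_counts, n):
    """Unfolded AFS: entry i-1 counts SNPs whose derived allele is seen
    i times among n sampled chromosomes (i = 1..n-1)."""
    c = Counter(derived_counts)
    return [c.get(i, 0) for i in range(1, n)]


def stationary_afs(n):
    """Expected share of SNPs in each frequency class for a constant-size,
    neutral population (standard coalescent): proportional to 1/i."""
    weights = [1 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def afs_loglik(afs, expected):
    """Multinomial log-likelihood (constant term dropped) of an observed
    AFS under a model's expected frequency-class proportions."""
    return sum(k * math.log(p) for k, p in zip(afs, expected) if k > 0)
```

Fitting a demographic model then amounts to searching for the model structure and parameters whose expected proportions give the highest `afs_loglik` for each population's data.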

7 … and build computational models of human ancestral demographic history that underlies present-day genome polymorphism structure
[Figure: model fits to European data and African data; labels: genetic bottleneck; modest but uninterrupted expansion]
In order to study the same set of polymorphisms in samples from different populations, we must connect the variation structures of these populations. This can be accomplished with coalescent models of geographic subdivision that account for migration between population groups. I will use these models to quantify the differences in SNP allele frequencies among populations and the fraction of private polymorphisms. I will combine the demographic models with simple models of recombination to investigate population differences in haplotype structure: Are regions of reduced haplotype diversity shared among human populations? Are common haplotypes in different populations defined by the same or different sets of SNP markers? Given the allele frequency differences, do haplotypes defined by the same markers have similar frequencies in all human populations?
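One summary that such multi-population models predict is the fraction of private polymorphisms. A minimal sketch of how it might be tabulated from derived-allele counts (observed, or from simulated samples), assuming "private" means segregating in one population's sample but not in the other's; the coalescent migration model itself is not shown, and the function names are hypothetical.

```python
def is_segregating(count, n):
    """True if the derived allele is present but not fixed among n chromosomes."""
    return 0 < count < n


def private_fractions(counts_a, n_a, counts_b, n_b):
    """Classify SNPs as private to population A, private to B, or shared.

    counts_a / counts_b hold derived-allele counts per SNP (same SNP order)
    in samples of n_a and n_b chromosomes. Returns, among SNPs segregating
    in at least one sample, the fraction falling in each class.
    """
    private_a = private_b = shared = 0
    for ca, cb in zip(counts_a, counts_b):
        seg_a, seg_b = is_segregating(ca, n_a), is_segregating(cb, n_b)
        if seg_a and seg_b:
            shared += 1
        elif seg_a:
            private_a += 1
        elif seg_b:
            private_b += 1
    total = private_a + private_b + shared
    if total == 0:
        return {"private_a": 0.0, "private_b": 0.0, "shared": 0.0}
    return {
        "private_a": private_a / total,
        "private_b": private_b / total,
        "shared": shared / total,
    }
```

Comparing these fractions between real and model-generated samples is one way to check whether a fitted migration model reproduces the observed population differences.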

8 3. A large NIH project aims to map out human polymorphism structure to aid gene mapping…
However, the variation structure observed in the reference DNA samples genotyped by the HapMap project… … often does not match the structure in another set of samples, such as the clinical samples used to find disease genes and disease-causing genetic variants
3. A population-genetic computational platform for marker selection for clinical case-control association studies: The primary purpose of the genotype data from the HapMap reference samples is to annotate the strength of allelic association in the human genome at the kilobase scale, so that one can select informative polymorphic markers for clinical case-control studies. The value of these markers for a clinical study depends directly on the degree to which the association patterns in the HapMap reference individuals recapitulate the association patterns of the clinical samples. For example, one method of marker selection is to choose, from all genotyped SNPs in a region, a smaller subset that is sufficient to distinguish among (i.e. “tag”) the common haplotypes found in the reference samples. If, however, these same haplotypes represent a smaller fraction of the observed haplotypes in a collection of clinical samples, the tagging performance of the markers will be degraded and the power to detect association reduced. This effect is well documented across samples of different ethnic backgrounds, and was an important motivation for genotyping reference samples from different world populations. Although linkage disequilibrium (LD) patterns within samples of a single population appear more consistent, there is evidence of significant sample variance even across randomly selected subsets of the HapMap reference individuals.
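The haplotype-tagging approach and its failure mode can be sketched as follows. This uses a simple greedy heuristic, chosen here for illustration and not taken from the HapMap project or this lab: find the common haplotypes in a reference sample, pick SNPs until every pair of common haplotypes differs at some chosen SNP, then measure what share of another sample's haplotypes fall inside that tagged set. The frequency threshold `min_freq` and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_haplotypes(haplotypes, min_freq=0.05):
    """Distinct haplotypes (strings of 0/1 alleles) at frequency >= min_freq."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sorted(h for h, c in counts.items() if c / n >= min_freq)


def greedy_tag_snps(haps):
    """Greedily choose SNP positions until every pair of the given distinct,
    equal-length haplotypes differs at one or more chosen positions."""
    unresolved = set(combinations(range(len(haps)), 2))
    chosen = []
    while unresolved:
        # Pick the SNP that separates the most still-unresolved pairs.
        best = max(
            range(len(haps[0])),
            key=lambda s: sum(haps[i][s] != haps[j][s] for i, j in unresolved),
        )
        split = {(i, j) for i, j in unresolved if haps[i][best] != haps[j][best]}
        if not split:
            raise ValueError("haplotypes must be distinct to be tagged")
        chosen.append(best)
        unresolved -= split
    return sorted(chosen)


def fraction_tagged(sample_haps, tagged):
    """Share of a sample's haplotypes that belong to the tagged reference set."""
    return sum(h in tagged for h in sample_haps) / len(sample_haps)
```

A low `fraction_tagged` for a clinical sample corresponds to the degradation described above: many of its haplotypes were never among the reference haplotypes that the markers were chosen to distinguish.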

9 … we build computational tools to help select optimal genetic markers for clinical studies. Instead of genotyping additional sets of (clinical) samples through costly experiments and comparing the variation structure of these successive sets directly… … we generate additional samples computationally, based on our population-genetic models of demographic history. We then use these samples to test the efficacy of gene-mapping approaches for clinical research.
Sample variance may be assessed directly by genotyping several additional sets of individuals from a single population. This undertaking is expensive and hence feasible only for a small number of regions. It is therefore desirable to develop alternative methods that assess variance across successive sets of samples computationally. We are developing such methods based on a coalescent technique. With such additional, computationally generated samples in hand we will (1) estimate the sample variance of LD within or across populations; (2) extrapolate the performance of markers selected within the HapMap reference samples to a different, possibly larger, collection of clinical samples; and (3) use standard marker selection tools to select a better-performing marker set that takes into account both the experimentally determined HapMap genotypes and the computationally generated samples. This collection of methods will be implemented as a publicly available software tool to aid the experimental design of clinical association studies.
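Once additional samples exist, the sample variance of LD can be summarized by recomputing an LD statistic on each sample set. In this minimal sketch, random subsets drawn from one fixed pool of phased haplotypes stand in for the coalescent-generated samples described above (the coalescent simulation itself is not shown), and r² is used as the LD measure. Function names and parameters are hypothetical.

```python
import math
import random
import statistics


def r_squared(haps, i, j):
    """LD r^2 between biallelic SNPs i and j, from phased haplotypes given
    as sequences of 0/1 alleles; NaN if either SNP is monomorphic."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(h[i] * h[j] for h in haps) / n
    d = p_ij - p_i * p_j
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else float("nan")


def ld_sample_variance(haps, i, j, subset_size, n_reps=1000, seed=0):
    """Mean and variance of r^2 across random subsets of a haplotype pool."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_reps):
        r2 = r_squared(rng.sample(haps, subset_size), i, j)
        if not math.isnan(r2):  # skip subsets where a SNP is monomorphic
            values.append(r2)
    return statistics.mean(values), statistics.variance(values)
```

A large variance of r² across subsets of the same population warns that markers chosen on one sample set may perform differently on another.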

10 4. We develop methods to connect genotype and clinical outcome in pharmaco-genetic systems
[Figure: genetic marker (haplotype) in genome regions of drug-metabolizing enzyme (DME) genes; functional allele (known metabolic polymorphism); molecular phenotype (drug concentration measured in blood plasma); clinical endpoint (adverse drug reaction); computational prediction based on haplotype structure]
4. Genotype vs. phenotype correlations in gene systems with known functional variants: To translate the reagents of the HapMap project into medical discovery we must understand how to use haplotypes as markers for phenotypic variants. Pharmacogenetics is an important area where this can be studied. For example, uncovering the genetic basis of individual drug responses can pinpoint patient groups that benefit most from a given drug, and others for whom it is less effective. It would also be highly desirable to develop tests that predict adverse drug reactions (ADRs) before they happen. We are currently starting a project to investigate how to use haplotypes as markers for important phenotypic effects in simpler gene systems, ones in which genotype can be directly connected with function. One such system is the set of genes that encode drug-metabolizing enzymes, a small number of which are responsible for breaking down the majority of known drugs. Many functional alleles in these genes are known; they are highly predictive of metabolic speed but not necessarily of potential ADRs. Our goal is to examine whether, by using complete haplotype information in addition to the known functional alleles, we can improve our ability to predict ADRs. This is a collaborative project with researchers at the Marshfield Medical and Research Foundation.
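The question "does haplotype information add predictive value beyond the known functional allele?" can be framed as a comparison of two nested groupings of patients. In this minimal sketch each group gets its own ADR rate and the groupings are compared with the Akaike information criterion (AIC), which penalizes the extra groups. AIC is a stand-in criterion chosen here, not the project's stated method, and the record fields `functional_allele`, `haplotype`, and `adr` are hypothetical.

```python
import math
from collections import defaultdict


def grouped_loglik(records, key):
    """Binomial log-likelihood of ADR outcomes when every group defined by
    key(record) gets its own maximum-likelihood ADR rate.
    Returns (log-likelihood, number of groups)."""
    groups = defaultdict(lambda: [0, 0])  # group -> [number with ADR, total]
    for r in records:
        g = groups[key(r)]
        g[0] += r["adr"]
        g[1] += 1
    ll = 0.0
    for n_adr, total in groups.values():
        p = n_adr / total
        if 0 < p < 1:  # groups with p = 0 or 1 contribute log(1) = 0
            ll += n_adr * math.log(p) + (total - n_adr) * math.log(1 - p)
    return ll, len(groups)


def aic(records, key):
    """Akaike information criterion: lower is better; penalizes extra groups."""
    ll, k = grouped_loglik(records, key)
    return 2 * k - 2 * ll


def by_allele(r):
    return r["functional_allele"]


def by_allele_and_haplotype(r):
    return (r["functional_allele"], r["haplotype"])
```

If `aic(records, by_allele_and_haplotype)` is clearly lower than `aic(records, by_allele)`, the haplotype carries ADR information that the functional allele alone misses; with real clinical data a regression model with covariates would be the more usual next step.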

