Genetics in Clinical Research
Jonathan L. Haines, Ph.D.
Center for Human Genetics Research
7/16/04
CLASSES OF HUMAN GENETIC DISEASE
Diseases of Simple Genetic Architecture
–Can tell how trait is passed in a family: follows a recognizable pattern
–One gene per family
–Often called Mendelian disease
–“Causative” gene
Diseases of Complex Genetic Architecture
–No clear pattern of inheritance
–Moderate to strong evidence of being inherited
–May be:
    Common in population: dementia, stroke, tremor, etc.
    Rare in population: adverse drug response, primary lateral sclerosis, etc.
–Involves many genes or genes and environment
–“Susceptibility” genes
–This is your trait!
COMMON COMPLEX DISEASE
[Diagram labels: Complex Disease; Geneticists; Clinicians; Biostatisticians; Epidemiologists; Genotype; Phenotype; Environment; Analysis]
Why Test the Genes?
Basic Science: Better understanding of biology of disease
–Direct probe into functional pathways
–Target for detecting interacting factors
–Better definition of disease
Clinical Science: Making this knowledge useful
–Improved diagnostic testing
–Presymptomatic testing
–Improved prognostic testing
–Improved treatment (e.g. pharmacogenetics)
Why Test the Genes? Practical Reasons
Can help make sense of results
–If there is a lot of variability, it may be due to genetics
–Can clean up the analysis and find significant results!
–Can add a sexy new component to your study
–It can be easy and cheap through the GCRC!
Virtually all GCRC studies have a potential genetic component
Pilot data can lead to larger funded studies
DISEASE GENE DISCOVERY
Broad Search (Genomic screen)
–Examine a large but representative subset of all genomic variations. Not hindered by poor assumptions of biology.
–Use families with more than one affected individual.
–Problem: Lots of genes at the same location!
Targeted Search (Candidate genes)
–Examine a specific and small set of candidate variations based on what we know about the biology of the disease.
–Can use both families with multiple affected individuals and families with only one affected individual.
–Problem: There are 50,000 genes and we know very little about their function!
Genome Project
Genome Toolbox
Physical map: genome sequence
–Multiple different species
Genetic map: recombination
Linkage disequilibrium map: HapMap
Variation maps
–Repeats
–Deletions
–Duplications
–SNPs
Homology maps
Hardware Technology
–Sequencing
–Genotyping
Software Technology
–Public databases
–Analysis programs
Increased productivity
–Experiments now possible that were considered impossible just 2 years ago
Study Designs
[Diagram labels: Large Families; Small Families; Linkage Analysis; Association Studies; Family-Based; Case-Control]
Association Study Designs
Family-based analysis
–Two flavors
    Trio (patient and both parents)
    Discordant sibpairs
–Multiple statistical methods for analysis
–Advantage: inherent control for genetic background
–Disadvantage: family-based
Case-Control
–Standard epidemiological design
–Statistical methods
    Logistic or linear regression
    Statistical genetics methods
Case only
–Outcomes analysis
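One widely used statistic for the trio design is the transmission disequilibrium test (TDT), which asks whether heterozygous parents transmit a given allele to affected children more often than chance predicts. The sketch below is illustrative only: it assumes a biallelic marker, genotypes written as two-character strings such as "Aa", and made-up trio data. The function name `tdt` is hypothetical and not part of any package named in these slides.

```python
def tdt(trios, allele="A"):
    """McNemar-style TDT for a biallelic marker.

    trios: iterable of (father, mother, child) genotypes, each a
    two-character string such as "Aa" (assumed input format).
    Returns (b, c, chi2): b = transmissions of `allele` from
    heterozygous parents, c = non-transmissions.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        het = sum(1 for p in parents if p.count(allele) == 1)
        hom = sum(1 for p in parents if p.count(allele) == 2)
        # Copies of `allele` in the child that must have come from
        # heterozygous parents (homozygous parents always transmit one).
        transmitted = child.count(allele) - hom
        if not 0 <= transmitted <= het:
            raise ValueError("Mendelian inconsistency in trio")
        b += transmitted
        c += het - transmitted
    chi2 = (b - c) ** 2 / (b + c) if (b + c) else 0.0
    return b, c, chi2


# Made-up example data: four trios with an affected child.
example_trios = [
    ("Aa", "aa", "Aa"),
    ("Aa", "Aa", "AA"),
    ("Aa", "aa", "aa"),
    ("AA", "Aa", "AA"),
]
b, c, chi2 = tdt(example_trios)
print(b, c, chi2)
```

The statistic is compared against a chi-square distribution with one degree of freedom. Because only heterozygous parents contribute, the untransmitted parental alleles serve as the matched control, which is the "inherent control for genetic background" noted above.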
Genetic Association Analysis
Can incorporate gene/gene interactions
–Look at two or more genes at a time
    Logistic regression
    MDR (multifactor dimensionality reduction)
Can incorporate gene/environment interactions
–Logistic regression
–MDR
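The core idea of MDR can be sketched in a few lines: pool the genotypes at two loci into cells, label each cell high-risk when its case:control ratio exceeds the ratio in the whole sample, and then score how well that one-dimensional high/low label separates cases from controls. This is a simplified illustration under stated assumptions: it omits the cross-validation and the search over locus combinations that a full MDR analysis performs, the genotype coding (0/1) and the data are invented, and `mdr_two_locus` is a hypothetical name.

```python
from collections import Counter


def mdr_two_locus(samples):
    """Simplified two-locus MDR step (no cross-validation).

    samples: list of (genotype_locus1, genotype_locus2, is_case).
    Assumes at least one case and one control are present.
    Returns (high_risk_cells, training_accuracy).
    """
    cases, controls = Counter(), Counter()
    for g1, g2, is_case in samples:
        (cases if is_case else controls)[(g1, g2)] += 1

    # Overall case:control ratio is the threshold for "high risk".
    threshold = sum(cases.values()) / sum(controls.values())

    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }

    correct = sum(
        1 for g1, g2, is_case in samples
        if ((g1, g2) in high_risk) == bool(is_case)
    )
    return high_risk, correct / len(samples)


# Invented data with an interaction pattern: risk is raised only when
# the two loci carry different genotype codes.
example = (
    [(0, 0, False)] * 2 + [(0, 1, True)] * 2 +
    [(1, 0, True)] * 2 + [(1, 1, False)] * 2 +
    [(0, 1, False), (1, 1, True)]
)
high_risk, accuracy = mdr_two_locus(example)
print(sorted(high_risk), accuracy)
```

Neither locus alone separates cases from controls in this toy data set, which is the situation where looking at two genes at a time pays off.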
Need To Characterize The Gene
Use genome databases to get known information
–Gene location (NCBI, Ensembl, Celera)
–Gene structure (NCBI, Ensembl, Celera)
–Possible gene functions (OMIM, NCBI, KEGG)
–Gene expression (tissue localization) (NCBI)
–Gene variation (HapMap, dbSNP, Celera, OMIM)
    Deletions
    Mutations
    SNPs
    LD relationships
Gene Characterization
Choose what variants to examine
Decide if further polymorphism discovery is needed
Many factors to be considered:
–Frequency of variant
–Genotyping platform and assay development
–LD relationships
–Availability and quality of DNA
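Two of these factors, variant frequency and LD relationships, reduce to simple arithmetic once haplotype counts are available. The sketch below computes allele frequencies, the disequilibrium coefficient D = p(AB) - p(A)p(B), and r² = D² / [p(A)(1 - p(A))p(B)(1 - p(B))] for a pair of biallelic SNPs. It assumes phased haplotype counts, which real genotype data do not provide directly; the function name `ld_stats` and the counts are illustrative.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Allele frequencies, D, and r^2 for two biallelic SNPs.

    Arguments are counts of the four phased haplotypes (assumed known).
    Returns (p_A, p_B, D, r2); r2 is NaN if either SNP is monomorphic.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else float("nan")
    return p_A, p_B, d, r2


# Illustrative counts: complete LD, then no LD.
complete = ld_stats(50, 0, 0, 50)
independent = ld_stats(25, 25, 25, 25)
print(complete, independent)
```

When two variants have r² close to 1, genotyping one of them captures most of the information in the other, which is one reason LD relationships affect which variants are worth assaying.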
Essential Problem in Choosing the Gene(s) to Study
How do we integrate all the available information that we and others generate?
How do we locate the one or few genetic variations involved in our trait in the sea of hundreds or thousands of possible variations?
Most methods identify a set, often a large set, of possible variations.
Genomic Convergence
Genomic convergence identifies the intersection of genes found through multiple methods such as drug metabolism, allelic association analysis, and gene expression studies.
[Venn diagram labels: Metabolism; Association; Expression]
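In its simplest form, the intersection described here is a set operation over the candidate lists each method produces. The sketch below assumes each method yields a plain set of gene symbols; the symbols GENE1 to GENE5 are placeholders, not real findings, and `genomic_convergence` is a hypothetical helper name.

```python
from collections import Counter


def genomic_convergence(evidence):
    """evidence: dict mapping method name -> iterable of gene symbols.

    Returns (converged, support): the genes found by every method, and
    a count of how many methods support each gene.
    """
    sets = {method: set(genes) for method, genes in evidence.items()}
    converged = set.intersection(*sets.values()) if sets else set()
    support = Counter(g for genes in sets.values() for g in genes)
    return converged, support


# Placeholder gene lists for the three lines of evidence.
evidence = {
    "metabolism": {"GENE1", "GENE2", "GENE3"},
    "association": {"GENE2", "GENE3", "GENE4"},
    "expression": {"GENE2", "GENE3", "GENE5"},
}
converged, support = genomic_convergence(evidence)
print(sorted(converged), support.most_common())
```

The per-gene support count is an added convenience: when the strict intersection is empty, genes supported by most of the methods can still be ranked for follow-up.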
Using Your Time and Effort Wisely
Design your study.
Genetics can be added easily and will only benefit, not hinder, the main study.
Do not waste time on the details! We have the expertise to help make it happen.
Core Services (
Family Ascertainment Core
– th Ave S., Suite 100
–Kelly Taylor, MS; Manager
DNA Resources Core
–518 Light Hall
–Cara Sutcliffe, MS; Manager
Genetic Data Analysis Core
– Light Hall
–Chun Li, Ph.D.; Faculty Advisor
Computing/Bioinformatics Core
– Light Hall
–Janey Wang, MS 2; Manager
Family Ascertainment Core
Faculty Advisor: Jeff Canter
Family Ascertainment Core Services
IRB and Protocol Development
Patient/Family Ascertainment
–Identify and recruit participants
    Clinic, local, distant
–Data collection
    Family history, clinical, demographic
–Biological sample collection
    Phlebotomy
    Buccal washes
    Finger sticks
Project and Data Management
–Progeny pedigree and ascertainment database
–PEDIGENE clinical and genetic database
–Template forms
    IRB
    Family History
    Clinical
–Limited access, locked file room
DNA Resources Core
Faculty Advisor: Doug Mortlock
DNA Resources Core Services
DNA extraction
–Blood
–Buccal (wash, brush)
–Cell pellets
Sample tracking and storage
–Web-based Oracle database
–PI-controlled access
–Bar-coded, standardized storage in locked cold room
DNA quantitation
Initiation of lymphoblast cell lines
Microsatellite genotyping
SNP genotyping
Storage of cell lines (LN2 freezers)
DNA Resources Core Resources
4 Staff
Automated large- and small-volume DNA extraction (Autopure)
Locked cold room, liquid nitrogen freezers
Bar-coding, RPIDs, web-based database
Hitachi FMBIO II laser scanner (fluorescent dyes)
ABI 7900HT (high-throughput SNP genotyping)
Genetic Data Analysis Core
Faculty Advisor: Chun Li
Genetic Data Analysis Core Services
Linkage Analysis
–Parametric lod scores
–Non-parametric scores
–Two disease loci
Linkage Disequilibrium Analysis
–Case-control
–Family-based (TDT, S-TDT, PDT)
Gene-Gene Interactions
Gene-Environment Interactions
–Logistic regressions
–MDR analysis
Quantitative Trait Locus Analysis
Marker reference maps
Error detection
–Mendelian checks
–Haplotype checking
–Pedigree relationship checking
Consultation on study design
Training on use of software
Data Management
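A Mendelian check, the first error-detection step listed above, flags any child genotype that cannot be formed from one paternal and one maternal allele. The sketch below is a minimal illustration for a single autosomal marker in one trio; it assumes genotypes are given as pairs of alleles and ignores complications such as missing data and sex-linked markers. The function name is hypothetical and is not the Core's software.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """True if `child` can be formed from one allele of each parent.

    Each genotype is a pair of alleles, e.g. ("A", "a") for a SNP or
    (120, 124) for microsatellite allele sizes (assumed input format).
    """
    child_sorted = sorted(child)
    return any(
        sorted((f, m)) == child_sorted
        for f, m in product(father, mother)
    )


# Illustrative genotypes: the second and fourth trios would be flagged.
checks = [
    is_mendelian_consistent(("A", "a"), ("a", "a"), ("A", "a")),
    is_mendelian_consistent(("A", "A"), ("A", "A"), ("A", "a")),
    is_mendelian_consistent((120, 124), (118, 124), (118, 120)),
    is_mendelian_consistent((120, 124), (118, 124), (122, 124)),
]
print(checks)
```

A failed check does not say which sample is wrong. It can reflect a genotyping error, a sample swap, or a misreported pedigree relationship, which is why it sits alongside haplotype and relationship checking.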
Genetic Data Analysis Core Resources
7 Staff
PEDIGENE database
–Clinical
–Family history
–Genotyping
Latest genetic analysis software
–Testing of new programs and methods
–Experience with strengths and weaknesses
Access to
–7 PCs
–6 Unix
–12 Linux systems
–VAMPIRE
Computing/Bioinformatics Core
Faculty Advisor: Marylyn Ritchie
Computing/Bioinformatics Core Services
Complete database services
–Support of PEDIGENE
–Data collection
    Web-based data entry
    Teleforms scannable forms
–Oracle expertise
–Build custom databases
–Extend current databases
Bioinformatics Support
–Programming/scripting
–Web design
Getting Your GCRC Genetic Study Done
Develop Protocol
–New protocol: Contact Kelly Taylor; she’ll do most of the work
–Existing protocol: In many cases the existing DNA addendum will work
Present to GCRC for approval
–<100 DNAs, <100 genotypes: no special approval needed
–>100 DNAs, >100 genotypes: Genetics subcommittee must approve
Perform study
–DNA collection can be done on GCRC or by FAC
–DNA extraction by DNA Resources Core
–Genotyping by DNA Resources Core
Genetic Analyses
–Can be done by DMAC (additional fee)