Introduction to Genetics
  Debashis Ghosh
  Professor and Chair, Biostatistics and Informatics, ColoradoSPH
Question we tackle today
  What do we mean by a gene?
  Steve Mount (ongenetics.blogspot.com): “A gene is all of the DNA elements required in cis for the properly regulated production of a set of RNAs whose sequences overlap in the genome.”
  Mark Gerstein (2007, Genome Research): “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products”
What is a gene?
  No “one-size-fits-all” definition
  The previous definitions are useful for contextualizing data generated from experiments
  Thinking carefully about evolution and the constraints it has placed on function is also important
From Genotype to Phenotype
  Full genotypes (genomes) are coming… but inheritance is complex
  Genetic markers are characters inherited in a way that is simple enough to track easily
  We want to find genetic markers that explain or predict phenotypes, e.g., disease or disease susceptibility
  Ideally, the marker would be causative, but that is rare
Alleles as Genes
  At each gene locus, we have two alleles: one transmitted to us by our father and one by our mother
  Usual assumption: each parent randomly transmits one of his/her two alleles to the child
  In real datasets, these alleles are typically observed as DNA variants called single-nucleotide polymorphisms (SNPs)
Diploid Inheritance
  [Figure: pairs of chromosomes, one from Mom and one from Dad, shown for a heterozygote and for a homozygote]
Phenotypic Dominance
  [Figure: a heterozygote (one allele from Mom, one from Dad) under three scenarios: light blue dominant and dark blue recessive; mixed dominance; dark blue dominant and light blue recessive]
Diploid Inheritance: Dark Blue Is Dominant
  [Figure: heterozygote and homozygote; the recessive phenotype is visible only in the homozygote]
Mendelian Ratios
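As a minimal sketch of where Mendelian ratios come from, the following enumerates the equally likely offspring of a cross between two heterozygotes, assuming each parent transmits either allele with probability 1/2. The allele labels `A` (dominant) and `a` (recessive) and the function names are illustrative assumptions, not taken from the slides.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count the equally likely offspring genotypes of a single-locus cross."""
    # Each parent contributes one of its two alleles; sorting the pair makes
    # "aA" and "Aa" count as the same genotype.
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))

def phenotype_counts(genotype_counts, dominant="A"):
    """Collapse genotype counts into dominant vs. recessive phenotype counts."""
    counts = Counter()
    for genotype, n in genotype_counts.items():
        label = "dominant" if dominant in genotype else "recessive"
        counts[label] += n
    return counts

genotypes = cross("Aa", "Aa")
phenotypes = phenotype_counts(genotypes)
print(dict(genotypes))   # genotype counts out of 4 equally likely outcomes
print(dict(phenotypes))  # phenotype counts out of 4 equally likely outcomes
```

For the heterozygote cross this gives the familiar 1:2:1 genotype ratio and 3:1 phenotype ratio; calling `cross("Aa", "aa")` gives the 1:1 test-cross ratio.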
Recombination
  [Figure: chromosomal segment in Mom (she’s a diploid, remember), with one copy from Grandma and one from Grandpa; chromosomal segment in you (you’re diploid too), with one copy from Mom and one from Dad]
Crossing Over
  Non-sister chromatids of homologous chromosomes recombine (cross over) during meiosis
  [Figure: chromosomes from Grandma and from Grandpa crossing over; of the products of meiosis, one is inherited by you and the others are lost (except in tetrad analysis)]
Recombination: Basic Points
  Recombination switches which chromosome in the parent (i.e., originating from which grandparent) is passed along to the offspring
  Alleles physically adjacent on a chromosome are more likely to be passed on together than alleles far apart
  Alleles very far apart or on different chromosomes are inherited independently
Finding Disease Genes
  Assemble data set of probands
  Assemble data set of control population
  Might have a pedigree if the disease runs in families
  Might have trios to determine linkage
    Proband plus two parents
  Look for linkage between genetic markers and disease
    In pedigree
    In dataset of less related individuals
Genetic Markers
  Polymorphic in population
    Different variants in different individuals
  Single Nucleotide Polymorphism (SNP)
  Variable Number of Tandem Repeats (VNTR)
    Minisatellites
  Short Tandem Repeats (STR)
    Microsatellites
    Very high mutation rate: strand slippage
  Haplotype
    A set of closely linked SNPs inherited as a unit
Linkage Analysis
  Set of variable markers distributed throughout genome
  Identify linkage regions (haplotypes) that cosegregate (are inherited together) with disease or trait
Pedigree Analysis
  Tabulate the occurrence of a trait in an extended family
  A pedigree is a family’s mating history
Assumptions and Complications
  Single gene with Mendelian inheritance
    Best use of extended families
    Few extended families with trait
  Quantitative traits are multigenic
    Includes most widespread or “common” inherited diseases
  Sib pairs are best for complex traits with incomplete penetrance (see next slide)
Incomplete Penetrance
  Not everyone with the genotype will have the disease
    Delayed or adult onset
    Mild or undetectable symptoms
    Environmental and developmental factors
    Unknown genetic factors
  Disease allele = increased probability of disease (relative risk)
  We don’t always know who in the pedigree has the disease genotype!
Evaluating Linkage
  Remember, an individual is a recombinant with respect to two genes, A and B, if it inherits the allele at A from one of the parent’s homologous chromosomes and the allele at B from the other
  The recombination fraction $\theta$ is the probability that a child is recombinant
  If A and B are tightly linked, then $\theta$ is small
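To make the recombination fraction concrete, here is a minimal sketch of the expected gamete frequencies from a parent who is heterozygous at both genes. The symbol `theta` for the recombination fraction, the assumed phase (alleles `A` and `B` on one chromosome, `a` and `b` on the other), and the function name are illustrative assumptions.

```python
def gamete_frequencies(theta):
    """Expected gamete frequencies for a double heterozygote with phase AB / ab."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return {
        "AB": (1 - theta) / 2,  # parental (non-recombinant) gamete
        "ab": (1 - theta) / 2,  # parental (non-recombinant) gamete
        "Ab": theta / 2,        # recombinant gamete
        "aB": theta / 2,        # recombinant gamete
    }

for theta in (0.0, 0.1, 0.5):
    print(theta, gamete_frequencies(theta))
```

At `theta = 0` only parental gametes appear (complete linkage); at `theta = 0.5` all four gametes are equally likely, which is what independent inheritance looks like.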
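Simple LOD Scores
  Total number of offspring, $P$
  Number of recombinant offspring, $R$
  Likelihood of the data: $L(\theta) = \theta^{R} (1 - \theta)^{P - R}$
  Maximum likelihood estimate: $\hat{\theta} = R / P$
  LOD score for linkage in pedigree is $\mathrm{LOD} = \log_{10} \dfrac{L(\hat{\theta})}{L(1/2)} = \log_{10} \dfrac{\hat{\theta}^{R} (1 - \hat{\theta})^{P - R}}{(1/2)^{P}}$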
Complications
  Need to know phase and genotypes of parents to identify recombinants
  Can estimate informativeness of additional data depending on heterozygosity of markers
  Many disease-versus-marker comparisons are involved
    Multiple comparisons
    But markers are not independent
  Population structure
  Rule of thumb: LOD > 3 (odds of 1000:1) suggests linkage; LOD > 5 is very strong evidence
Population structure
  Genetic markers have different patterns in different populations; this can confound associations between genetic markers and disease phenotypes
Realistic Complications Include
  Penetrance(X|G)
    Likelihood of observing trait X given the genotype G
  Prior(G)
    Likelihood of observing the genotype G in an individual
  Transmit(Gm|Gk,Gl,$\theta$)
    Probability that offspring will have genotype Gm given parental genotypes Gk and Gl and the recombination parameter $\theta$
LOD Graph
  Can look at the LOD score over a range of $\theta$ values, not just the MLE
  Usual convention: LOD > 3 is evidence for linkage; LOD < -2 is evidence for exclusion
  Example: 27 recombinants out of 139 gametes (example from S. Purcell)
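The example counts can be turned into a LOD curve with a short sketch. It assumes phase-known gametes, so that the likelihood is binomial in the recombination fraction and the LOD score is the base-10 log of the likelihood at a given value divided by the likelihood at 1/2 (no linkage). The function and variable names are illustrative assumptions.

```python
import math

def lod_score(theta, recombinants, total):
    """log10 of L(theta) / L(1/2) for `recombinants` out of `total` gametes."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    nonrecombinants = total - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = total * math.log10(0.5)  # likelihood under no linkage
    return log_l_theta - log_l_null

recombinants, gametes = 27, 139
theta_hat = recombinants / gametes  # maximum likelihood estimate

# Tabulate the LOD score over a grid of recombination fractions.
for theta in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"theta = {theta:.2f}   LOD = {lod_score(theta, recombinants, gametes):7.2f}")
print(f"theta_hat = {theta_hat:.3f}   LOD = {lod_score(theta_hat, recombinants, gametes):.2f}")
```

The curve peaks at the estimate (about 0.19 here) and falls to exactly 0 at 0.5, where the two likelihoods coincide. Plotting the tabulated values gives the LOD graph.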
Recombination Probability and Distance along Chromosome
  The recombination fraction does not increase linearly with distance
    Multiple recombination events are possible over greater distances, but there is also interference
  Can estimate genetic distance from recombination rates
    Measured in Morgans (M) or centimorgans (cM)
    Genetic distance, the expected number of crossovers, is additive
Mapping Functions
  Haldane’s mapping function
    Crossovers are assumed random and independent
  Kosambi’s mapping function
    Models interference: crossovers do not occur too close together
    Most popular
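The two mapping functions can be sketched as follows, using their standard textbook forms (the slide itself does not spell out the formulas, so the forms below are supplied here rather than quoted). Distances are in Morgans, and the function names are illustrative assumptions.

```python
import math

def haldane_theta(d):
    """Recombination fraction at map distance d (Morgans), no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def haldane_distance(theta):
    """Map distance (Morgans) for recombination fraction theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def kosambi_theta(d):
    """Recombination fraction at map distance d (Morgans), with interference."""
    return 0.5 * math.tanh(2.0 * d)

def kosambi_distance(theta):
    """Map distance (Morgans) for recombination fraction theta < 0.5."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

for d_cm in (1, 10, 50, 100):
    d = d_cm / 100.0  # convert centimorgans to Morgans
    print(f"{d_cm:4d} cM   Haldane {haldane_theta(d):.4f}   Kosambi {kosambi_theta(d):.4f}")
```

For short distances both functions give a recombination fraction close to the map distance itself; at longer distances both level off below 0.5, which is why recombination fractions are not additive while map distances are.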
Genetic versus Physical
  Mapping is not simple
    Recombination rate varies along chromosomes
    Male versus female
      Men: 28.51 M over whole genome, 1.05 Mb/cM
      Women: 42.96 M (excluding X), 0.88 Mb/cM
  In Drosophila, about 0.4 Mb/cM
Modeling Penetrance
  Single locus, three genotypes
  Cases to model:
    Disease is Mendelian dominant
    Disease is Mendelian recessive
    Spontaneous mutations
    Incomplete penetrance
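One common way to write such a model, offered here as a hedged sketch rather than the slide's own notation, is a triple of penetrances `(f0, f1, f2)`: the probability of disease for a genotype carrying 0, 1, or 2 copies of the disease allele. The numeric values in the last two models and the Hardy-Weinberg assumption used for prevalence are illustrative assumptions.

```python
# Penetrance triples (f0, f1, f2): P(disease | 0, 1, 2 copies of disease allele).
MODELS = {
    "Mendelian dominant": (0.0, 1.0, 1.0),
    "Mendelian recessive": (0.0, 0.0, 1.0),
    # Illustrative values only: affected non-carriers (f0 > 0).
    "dominant with sporadic cases": (0.01, 1.0, 1.0),
    # Illustrative values only: carriers are not always affected (f1, f2 < 1).
    "dominant with incomplete penetrance": (0.0, 0.6, 0.6),
}

def prevalence(q, penetrance):
    """Population disease frequency, assuming Hardy-Weinberg genotype proportions.

    q is the frequency of the disease allele.
    """
    f0, f1, f2 = penetrance
    p = 1.0 - q
    return p * p * f0 + 2.0 * p * q * f1 + q * q * f2

for name, penetrance in MODELS.items():
    print(f"{name:40s} prevalence at q = 0.1: {prevalence(0.1, penetrance):.4f}")
```

Writing the model as a triple makes the special cases easy to compare: the Mendelian models use only 0s and 1s, while sporadic cases and incomplete penetrance move individual entries away from those extremes.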
Extending Analysis
  SNPs scattered throughout genome
  LOD scores for regions, not individual markers
  Multipoint linkage analysis
    Establish order relationship among 3+ markers
  Non-parametric analysis can be better for complex traits and incomplete penetrance
    Work with affected siblings
    Less statistical power than model-based methods
    Identical by descent (IBD) versus chance
Non-Parametric
  Concerns siblings or other relatives
  Need “both affected” and “only one affected” pairs
  Correlate shared IBD alleles with affected state (proportion shared in the two classes)
  High correlation means linkage to disease
  Example to mention: type 1 diabetes (T1D)
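A minimal sketch of the sib-pair idea: at a marker unlinked to disease, a sib pair shares 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, 1/4, so the expected proportion shared is 0.5 with variance 1/8 per pair. The code compares the mean proportion shared in the two classes and computes a simple z statistic for the "both affected" class. The counts are hypothetical, made up purely for illustration, and the function names are assumptions.

```python
import math

def mean_ibd_proportion(ibd_counts):
    """Mean proportion of alleles shared IBD; each entry is 0, 1, or 2."""
    return sum(ibd_counts) / (2 * len(ibd_counts))

def sharing_z(ibd_counts):
    """z statistic for excess sharing relative to the no-linkage mean of 0.5."""
    n = len(ibd_counts)
    # Under no linkage the per-pair proportion shared has variance 1/8.
    return (mean_ibd_proportion(ibd_counts) - 0.5) / math.sqrt(0.125 / n)

# Hypothetical IBD counts at one marker for 100 sib pairs in each class.
both_affected = [2] * 40 + [1] * 45 + [0] * 15
one_affected = [2] * 20 + [1] * 50 + [0] * 30

print("both affected: mean sharing", mean_ibd_proportion(both_affected))
print("one affected:  mean sharing", mean_ibd_proportion(one_affected))
print("z for both-affected pairs:", round(sharing_z(both_affected), 2))
```

Excess sharing among "both affected" pairs, together with reduced sharing among "only one affected" pairs, is the pattern expected near a disease locus; no penetrance model has to be specified.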
(Genomewide) Association Studies
  Correlate markers with disease over a large population
  Marker may itself be the disease variant (rare)
  Large regions of chromosome are in linkage disequilibrium with the disease allele
    Marker is in the disease gene haplotype
    Regions of chromosome tend to be inherited as a unit
    Linkage disequilibrium tapers off over time due to recombination
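The tapering of linkage disequilibrium can be sketched with the standard decay relation: under random mating in a large population, with no selection or new mutation (all assumptions of this sketch), the disequilibrium coefficient D between two loci shrinks by a factor of (1 - theta) each generation, where theta is the recombination fraction between them. The function names and starting value are illustrative.

```python
import math

def ld_after(d0, theta, generations):
    """Disequilibrium coefficient after a number of generations of recombination."""
    return d0 * (1.0 - theta) ** generations

def generations_to_halve(theta):
    """Generations needed for D to fall to half its starting value (0 < theta < 1)."""
    return math.log(0.5) / math.log(1.0 - theta)

d0 = 0.25  # illustrative starting disequilibrium
for theta in (0.001, 0.01, 0.1, 0.5):
    print(f"theta = {theta:<5}  D after 100 generations = {ld_after(d0, theta, 100):.5f}"
          f"  generations to halve = {generations_to_halve(theta):.1f}")
```

Tightly linked markers keep their association with a disease allele for many generations, while loosely linked or unlinked markers lose it quickly, which is why association studies need dense markers.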
Association Studies
  Linkage disequilibrium varies among populations
    Depends on population structure and age (coalescent)
    European populations have extensive linkage disequilibrium; African populations have much less
    The population of human origin is older and more diverse
  Need dense, cheap markers over the genome: Genome Wide Association Studies (GWAS)
QTL and GWAS
  Quantitative traits: polygenic traits that are assumed to have additive effects
    Height, heart disease
  Quixotic Trait Loci?
    Each gene has a small effect
  Huge genotyping efforts are now paying off
    BUT only a small fraction of the genetic component is accounted for, even in huge studies
  Tradeoffs of including broader human population
Common Disease versus Rare Variants
  Common disease, common variants: the most frequently occurring alleles/SNPs should explain most of the etiology of a disease
    Current studies do NOT show this to be the case
  Newer paradigm: rare variants
    Occur less frequently but have larger associations with disease
Sullivan, Daly and O’Donovan, Nature Reviews Genetics, 2012
Different results in different populations
  Heritability
  What makes a gene matter to a disease?
  Take advantage of human phenotyping
  What genes CAN contribute to disease or modification of disease?
  A golden age of personal genomics?
Acknowledgments
  David Pollock, Biochemistry and Molecular Genetics