BI820 – Seminar in Quantitative and Computational Problems in Genomics. Gabor T. Marth, Department of Biology, Boston College, Chestnut Hill, MA 02467
Sequence variations: The Human Genome Project produced a reference genome sequence, 99.9% of which is common to every human being; sequence variations make our genetic makeup unique. Single-nucleotide polymorphisms (SNPs) are the most abundant type, but other types of variation exist and are important.
Why do we care about variations? Phenotypic differences; inherited diseases; demographic history.
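How do we find polymorphisms? Look at multiple sequences from the same genome region. Diverse sequence resources can be used: ESTs (expressed sequence tags), WGS (whole-genome shotgun) reads, and BAC (bacterial artificial chromosome) clones. Diversion: sequencing informatics.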
SNP discovery – Methods: sequence clustering; cluster refinement; multiple alignment; SNP detection.
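The last step, SNP detection, can be sketched as a scan over the columns of a multiple alignment. This is only a toy illustration, not the method used by any particular tool: the function name `candidate_snps` and its `min_allele_count` parameter are hypothetical, and the sketch ignores base quality values, which real discovery pipelines weigh heavily. The two input sequences are the example pair from the mining-projects slide.

```python
from collections import Counter


def candidate_snps(aligned_seqs, min_allele_count=1):
    """Return (position, allele_counts) for every polymorphic alignment column.

    aligned_seqs: equal-length strings taken from one multiple alignment.
    min_allele_count: hypothetical threshold; an allele must be seen at least
    this many times in a column to count toward a candidate SNP.
    Positions are 0-based. Gaps ('-') and ambiguous bases ('N') are ignored.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    snps = []
    for pos in range(length):
        counts = Counter(seq[pos].upper() for seq in aligned_seqs)
        counts.pop("-", None)
        counts.pop("N", None)
        supported = [base for base, n in counts.items() if n >= min_allele_count]
        if len(supported) > 1:
            snps.append((pos, dict(counts)))
    return snps


pair = ["ACCTAGGAGACTGAACTTACTG", "ACCTAGGAGACCGAACTTACTG"]
print(candidate_snps(pair))
```

With only two sequences, every mismatch is reported; raising `min_allele_count` on deeper alignments is one simple way to suppress isolated sequencing errors.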
SNP discovery – Computer tools
SNP discovery – Mining projects: ~30,000 clones; 25,901 clones (7,122 finished, 18,779 draft with base quality values); 21,020 clone overlaps (124,356 fragment overlaps); 507,152 high-quality candidate SNPs (validation rate 83-96%) (Marth et al., Nature Genetics 2001). [Figure: FASTA records >CloneX and >CloneY (ACGTTGCAACGT GTCAATGCTGCA); overlapping sequences ACCTAGGAGACTGAACTTACTG and ACCTAGGAGACCGAACTTACTG, which differ at a single base.]
SNP databases and characteristics: access to variation data; SNP properties; reliability of information; characterizing known polymorphic sites in sample collections (genotyping).
Where do variations come from? Sequence variations are the result of mutation events. Mutations are propagated down through generations. [Figure: genealogy descending from the most recent common ancestor (MRCA), with sampled sequences TAAAAAT and TAACAAT.]
Mutation rate: a higher mutation rate (µ) gives rise to more SNPs. [Figure: genealogies descending from the MRCA, with example sequences accgttatgtaga, accgctatgtaga, actgttatgtaga, accgctatataga.]
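The dependence of SNP count on mutation rate can be made concrete under one common set of assumptions that the slide itself does not state: a constant-size, neutrally evolving diploid population with the infinite-sites mutation model. In that setting the expected number of segregating sites in n sampled chromosomes is E[S] = θ · (1 + 1/2 + ... + 1/(n−1)), with θ = 4Nµ summed over the region. The function and parameter names below are illustrative.

```python
def expected_segregating_sites(n_samples, n_effective, mu_per_site, n_sites):
    """Expected SNP count under the standard neutral model (an assumption here).

    n_samples:   number of sampled chromosomes (n >= 2)
    n_effective: diploid effective population size N
    mu_per_site: mutation rate per site per generation
    n_sites:     length of the region in base pairs
    """
    if n_samples < 2:
        raise ValueError("need at least two sampled chromosomes")
    theta = 4 * n_effective * mu_per_site * n_sites
    harmonic = sum(1.0 / i for i in range(1, n_samples))
    return theta * harmonic


# Doubling the mutation rate doubles the expected number of SNPs.
low = expected_segregating_sites(10, 10_000, 1e-8, 100_000)
high = expected_segregating_sites(10, 10_000, 2e-8, 100_000)
print(low, high)
```

The same formula shows that the effective population size N enters exactly as µ does, so a larger long-term N also yields more SNPs.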
Recombination. [Figure: four copies of the example sequence accgttatgtaga.]
Demographic history: large (effective) population size N versus small (effective) population size N. Different world populations have varying long-term effective population sizes (e.g. African N is larger than European).
Modeling history: stationary; collapse; expansion; bottleneck (time axis: past to present). MD (simulation); AFS (direct form).
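As a rough illustration of the data summary behind the AFS, the sketch below tabulates an allele frequency spectrum from a small 0/1 haplotype matrix and compares it with the shape expected for a stationary (constant-size) population, where the expected number of sites with derived-allele count i is proportional to 1/i under the standard neutral model. The matrix encoding (rows are chromosomes, 1 marks the derived allele) and the function names are assumptions made for this example; fitting collapse, expansion, or bottleneck models is not attempted.

```python
def allele_frequency_spectrum(haplotypes):
    """Count sites by derived-allele count.

    haplotypes: list of equal-length lists of 0/1 (1 = derived allele).
    Returns a list `afs` where afs[i - 1] is the number of sites at which
    exactly i of the n chromosomes carry the derived allele (1 <= i <= n - 1).
    Monomorphic sites are skipped.
    """
    n = len(haplotypes)
    afs = [0] * (n - 1)
    for column in zip(*haplotypes):
        derived = sum(column)
        if 0 < derived < n:
            afs[derived - 1] += 1
    return afs


def stationary_expected_fractions(n_samples):
    """Expected AFS shape for a constant-size neutral population (~ 1/i)."""
    weights = [1.0 / i for i in range(1, n_samples)]
    total = sum(weights)
    return [w / total for w in weights]


sample = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(allele_frequency_spectrum(sample))
print(stationary_expected_fractions(4))
```

Qualitatively, an excess of rare alleles relative to the 1/i shape points toward expansion, while a deficit points toward collapse or a bottleneck.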
Ancestral inference: modest but uninterrupted expansion; bottleneck.
The effects and signatures of selection: mutations under selection influence the genealogy itself; for neutral mutations, the processes of mutation and genealogy are decoupled.
Allelic association and haplotype structure: “linkage disequilibrium”; “haplotype blocks”.
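Linkage disequilibrium between two biallelic sites is commonly summarized by D = p_AB − p_A·p_B and by r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch below computes both from phased two-site haplotypes; the 0/1 encoding and the function name are assumptions made for this example.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: list of (a, b) pairs, one per chromosome, where a and b are
    0/1 alleles at the first and second site (1 = the allele being tracked).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d, d * d / denom


# Alleles always travel together: complete association.
print(linkage_disequilibrium([(0, 0), (0, 0), (1, 1), (1, 1)]))
# All four haplotypes equally common: no association.
print(linkage_disequilibrium([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Runs of adjacent sites with high pairwise r² are one informal way to think about haplotype blocks.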
Computer simulations: the Coalescent
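A minimal sketch of a coalescent simulation, assuming the simplest setting (constant population size, no recombination, no selection): going backward in time, while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k−1)/2 when time is measured in units of 2N generations. The function names are illustrative; the sketch tracks only the time to the MRCA and the total branch length, not the tree topology or mutations.

```python
import random


def simulate_coalescent(n_samples, rng):
    """Simulate one genealogy; return (tmrca, total_branch_length).

    Times are in units of 2N generations (constant-size model assumed).
    """
    k = n_samples
    tmrca = 0.0
    total_length = 0.0
    while k > 1:
        rate = k * (k - 1) / 2          # number of lineage pairs
        wait = rng.expovariate(rate)    # time until the next coalescence
        tmrca += wait
        total_length += k * wait        # k branches each grow by `wait`
        k -= 1
    return tmrca, total_length


def mean_tree_stats(n_samples, replicates, seed=1):
    """Average TMRCA and total branch length over independent genealogies."""
    rng = random.Random(seed)
    sum_tmrca = 0.0
    sum_length = 0.0
    for _ in range(replicates):
        tmrca, length = simulate_coalescent(n_samples, rng)
        sum_tmrca += tmrca
        sum_length += length
    return sum_tmrca / replicates, sum_length / replicates


print(mean_tree_stats(n_samples=10, replicates=20000))
```

For comparison, theory gives an expected TMRCA of 2(1 − 1/n) and an expected total branch length of 2(1 + 1/2 + ... + 1/(n−1)); mutations could then be placed on the branches in proportion to their length.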
Medical utility? Clinical phenotype; molecular markers; functional understanding.
Mapping disease-causing loci: genetic linkage; association between allele and phenotype.
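One simple way to test for association between an allele and a phenotype is a chi-square test on a 2×2 table of allele counts in cases and controls. The sketch below is an illustration with made-up counts, not an analysis from the lecture; the function name is hypothetical, and the p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal.

```python
import math


def allelic_association(case_a, case_other, control_a, control_other):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    case_a / case_other:       counts of allele A and of all other alleles
                               among case chromosomes
    control_a / control_other: the same counts among control chromosomes
    Returns (chi_square, p_value). No continuity correction is applied.
    """
    a, b, c, d = case_a, case_other, control_a, control_other
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi_square = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Made-up counts: allele A is seen on 30 of 100 case chromosomes
# and on 10 of 100 control chromosomes.
print(allelic_association(30, 70, 10, 90))
```

A small p-value indicates that the allele frequency differs between cases and controls; it does not show that the allele itself is causal, since a marker may simply be in linkage disequilibrium with the causal site.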
Forensic applications