Eran Halperin November 10, 2009


COMPUTATIONAL HUMAN GENETICS - SEARCHING FOR RELATIONS BETWEEN GENES, DISEASES, AND POPULATIONS

Genetic Factors, Complex disease, Environmental Factors. A complex disease is influenced by both genetic and environmental factors. Multiple genes may affect the disease; therefore, the effect of every single gene may be negligible.

The Human Chromosomes April 05’

Each chromosome 'is' a sequence over the alphabet {A,G,C,T} (base pairs). For each chromosome we carry a copy from the mother (………ACCAGGACGA……) and a copy from the father (………ACCAGGACGA……).

Facts about our genome: 23 pairs of chromosomes. X and Y are the sex chromosomes (XX for women, XY for men). 3,300,000,000 base pairs in the human genome.

The Human Genome Project: "What we are announcing today is that we have reached a milestone…that is, covering the genome in…a working draft of the human sequence." "But our work previously has shown… that having one genetic code is important, but it's not all that useful." "I would be willing to make a prediction that within 10 years, we will have the potential of offering any of you the opportunity to find out what particular genetic conditions you may be at increased risk for…" Washington, DC, June 26, 2000

The Vision of Personalized Medicine: genetic and epigenetic variants, together with measurable environmental and behavioral factors, would be used for personalized diagnosis and treatment.

Paradigm shifts in medicine

Example: Warfarin. An anticoagulant drug, useful in the prevention of thrombosis.

Example: Warfarin (cont.). Warfarin was originally used as a rat poison. The optimal dose varies across the population, and genetic variants (in VKORC1 and CYP2C9) affect the personalized optimal dose.

Association Studies: genetic variants such as Single Nucleotide Polymorphisms (SNPs) are tested for association with the trait.

Where should we look first? SNP = Single Nucleotide Polymorphism.
person 1: ….AAGCTAAATTTG….
person 2: ….AAGCTAAGTTTG….
person 3: ….AAGCTAAGTTTG….
person 4: ….AAGCTAAATTTG….
person 5: ….AAGCTAAGTTTG….
The five sequences differ at a single position (A or G). Each common SNP has only two observed letters (alleles).
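A minimal Python sketch of "looking" for SNPs, assuming the sequences are already aligned and error-free (real SNP discovery must handle sequencing errors, which this ignores). The helper `polymorphic_sites` is hypothetical: it reports every column in which more than one letter is observed.

```python
def polymorphic_sites(sequences):
    """Return {position: set of letters} for every column where aligned sequences differ."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for i in range(length):
        letters = {s[i] for s in sequences}
        if len(letters) > 1:  # more than one letter observed: a candidate SNP
            sites[i] = letters
    return sites


# The five fragments from the slide (dots dropped); positions are 0-based.
people = ["AAGCTAAATTTG", "AAGCTAAGTTTG", "AAGCTAAGTTTG", "AAGCTAAATTTG", "AAGCTAAGTTTG"]
print(polymorphic_sites(people))
```

On these fragments it reports a single site, position 7 (0-based), with alleles A and G.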

Disease Association Studies. SNP = Single Nucleotide Polymorphism.
Cases (associated SNP: high relative risk):
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGTC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCAACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGTC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
Controls (associated SNP: lower relative risk):
AGAGCAGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCAGTCGACATGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACATGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGTC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGTC
At the associated SNP (…GATC[G/T]GTAG…), G appears in 7 of the 8 case sequences but in only 1 of the 8 control sequences.

Preliminary Definitions
SNP – single nucleotide polymorphism: a genetic variant which may carry a different 'value' for different individuals.
Allele – the variant's value: A, G, C, or T. Most SNPs are bi-allelic: only two alleles are observed in the population.
Risk allele (denoted R) – the allele that is more common in cases than in controls.
Nonrisk allele (denoted N) – the allele that is more common in controls than in cases.

Relative Risk. Risk allele = G, nonrisk allele = A. With the risk allele G, the chance of developing type II diabetes is 30%; with the nonrisk allele A, it is 20%. Relative risk: Pr(D|R)/Pr(D|N) = 0.30/0.20 = 1.5.

Other Structural Variants: inversions, copy number variants, deletions.

Published Genome-Wide Associations through 6/2009: 439 published GWA at p < 5 x 10^-8 (NHGRI GWA Catalog, www.genome.gov/GWAStudies).

Public Genotype Data Growth:
2001, Daly et al. (Nature Genetics): 103 SNPs, 40,000 genotypes
2002, Gabriel et al. (Science): 3,000 SNPs, 400,000 genotypes
2003, TSC data (Nucleic Acids Research): 35,000 SNPs, 4,500,000 genotypes
2004, Perlegen data (Science): 1,570,000 SNPs, 100,000,000 genotypes
2005, NCBI dbSNP (Genome Research): 3,000,000 SNPs, 286,000,000 genotypes
2006, HapMap Phase 2: 5,000,000+ SNPs, 600,000,000+ genotypes

Chance or Real Association? (The case and control sequences are the same as on the Disease Association Studies slide.)

How does it work? For every SNP we can construct a contingency table of allele counts:
              R (risk)   N (nonrisk)   Total
Cases            a           b          a+b
Controls         c           d          c+d
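The Python sketch below applies this to the 16 sequences of the Disease Association Studies slide, treating each sequence as one sampled chromosome. For every bi-allelic column it builds the table (a, b, c, d) and scores it with the Pearson chi-square statistic; that statistic is an assumed, standard choice (for a 2x2 table it equals the square of the pooled z-score of the hypothesis-testing slides). The helper names are hypothetical.

```python
CASES = [
    "AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC",
    "AGAGCCGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGTC",
    "AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC",
    "AGAGCAGTCGACAGGTATAGCCTACATGAGATCAACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGCC",
    "AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCAACATGATAGCC",
    "AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC",
    "AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGTC",
    "AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC",
]
CONTROLS = [
    "AGAGCAGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC",
    "AGAGCAGTCGACATGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC",
    "AGAGCAGTCGACATGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC",
    "AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGTC",
    "AGAGCCGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC",
    "AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGCC",
    "AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC",
    "AGAGCCGTCGACAGGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGTC",
]


def contingency_table(cases, controls, pos):
    """Allele counts (a, b, c, d) at one column, or None if it is not bi-allelic.

    The first allele in sorted order plays the role of R; swapping R and N
    does not change the chi-square statistic.
    """
    alleles = sorted({s[pos] for s in cases + controls})
    if len(alleles) != 2:
        return None
    r = alleles[0]
    a = sum(s[pos] == r for s in cases)
    c = sum(s[pos] == r for s in controls)
    return alleles, (a, len(cases) - a, c, len(controls) - c)


def chi_square(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def scan(cases, controls):
    """Return (statistic, position, alleles, table) for the highest-scoring column."""
    best = None
    for pos in range(len(cases[0])):
        result = contingency_table(cases, controls, pos)
        if result is None:
            continue  # monomorphic column
        alleles, table = result
        stat = chi_square(*table)
        if best is None or stat > best[0]:
            best = (stat, pos, alleles, table)
    return best


stat, pos, alleles, table = scan(CASES, CONTROLS)
print(pos, alleles, table, stat)
```

The top-scoring column is position 44 (0-based), the G/T site, with table (7, 1, 1, 7) and chi-square 9.0; the other polymorphic columns score at most about 2.3.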

Hypothesis testing. Null hypothesis: Pr(R|case) = Pr(R|control). Alternative hypothesis: Pr(R|case) ≠ Pr(R|control). The model assumes that all individuals are independent (unrelated), and therefore the risk-allele counts are random samples from Binomial distributions: in cases, X ~ B(n, Pr(R|case)); in controls, Y ~ B(n, Pr(R|control)).

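Hypothesis testing, cont. When n is large, $B(n,p) \approx N(np,\ np(1-p))$. Under the null hypothesis both groups share the same risk-allele frequency p, so $X - Y \approx N(0,\ 2np(1-p))$ and
$Z = \dfrac{X - Y}{\sqrt{2np(1-p)}} \sim N(0,1)$,
where p is estimated from the pooled sample as $\hat{p} = (X+Y)/(2n)$.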

P-value. Z is called a test statistic (a z-score in this case). We can calculate Z* for our data and then, using the normal approximation, calculate p-value = Pr(|Z| > |Z*|). Often we take …, which is …
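A minimal Python sketch of this test, assuming equal numbers n of sampled chromosomes in cases and controls and the pooled estimate of p. The function name is hypothetical. As a usage example it plugs in the counts from the Disease Association Studies slide (7 of 8 case chromosomes vs 1 of 8 control chromosomes carry G); n = 8 is far too small for the normal approximation to be trustworthy, so the numbers only illustrate the arithmetic.

```python
import math


def allele_z_test(x, y, n):
    """Two-sided test of Pr(R|case) = Pr(R|control) via the normal approximation.

    x, y: risk-allele counts among n sampled chromosomes in cases and controls.
    Returns (Z*, p-value).
    """
    p_hat = (x + y) / (2 * n)  # pooled frequency, shared under the null
    if p_hat in (0.0, 1.0):
        return 0.0, 1.0  # no variation at all: nothing to test
    z = (x - y) / math.sqrt(2 * n * p_hat * (1 - p_hat))
    # For Z ~ N(0, 1): Pr(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = allele_z_test(7, 1, 8)
print(f"Z* = {z:.2f}, p-value = {p:.4f}")
```

Here $\hat{p} = 0.5$, so Z* = 6/2 = 3 and the two-sided p-value is about 0.0027.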

Results: Manhattan Plots

The curse of dimensionality – corrections for multiple testing. In a typical Genome-Wide Association Study (GWAS), we test millions of SNPs. If we set the p-value threshold for each test to 0.05, then even if no SNP is truly associated, we will "find" about 5% of the SNPs to be associated with the disease by chance. This needs to be corrected.

Bonferroni Correction. If the number of tests is n, we set the threshold to 0.05/n. This is a very conservative test. If the tests are independent, it is reasonable to use it. If the tests are correlated, this could be bad. Example: if all SNPs are identical, we are effectively performing a single test but still divide the threshold by n; the false positive rate drops, but we also lose a lot of power.
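A small Python simulation of the two slides above, assuming every SNP is null and the test statistic is continuous, so each p-value is Uniform(0, 1) and can be drawn directly instead of simulating genotypes. The function names are hypothetical.

```python
import random


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the chance of any false positive at most alpha."""
    return alpha / n_tests


def count_false_positives(n_tests, alpha=0.05, seed=0):
    """Count null SNPs passing the naive and the Bonferroni thresholds."""
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_tests)]  # Uniform(0, 1) under the null
    threshold = bonferroni_threshold(alpha, n_tests)
    naive = sum(p < alpha for p in p_values)
    corrected = sum(p < threshold for p in p_values)
    return naive, corrected


naive, corrected = count_false_positives(100_000)
print(f"threshold 0.05: {naive} false hits; Bonferroni: {corrected} false hits")
print(bonferroni_threshold(0.05, 1_000_000))
```

With 100,000 null SNPs the naive threshold flags roughly 5,000 of them, while the Bonferroni threshold flags essentially none. With one million tests the Bonferroni threshold is 0.05/10^6 = 5 x 10^-8, the genome-wide significance level used in the NHGRI catalog count.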

Challenge 1 Population Substructure

Population Substructure. Imagine that all the cases are collected from Africa, and all the controls are from Europe. Many association signals are going to be found, and the vast majority of them are false. Why? Different evolutionary forces (drift, selection, mutation, migration, population bottlenecks) have produced different allele frequencies in the two populations, regardless of disease status.

Evolution Theory. Mutations add to genetic variation. Natural selection controls the frequency of certain traits and alleles. Genetic drift changes allele frequencies at random.

Mutations. AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGA becomes AGAGCAGTCCACAGGTATAGCCTACATGAGATCGACATGAGA (a single G to C change). The estimated probability of a mutation at a given position in a single generation is about 10^-8.

Other 'mutations': recombination. The child chromosome is assembled from segments of copy 1 and copy 2 of the parent's chromosome; the probability of recombination at position i is $r_i$ (about 10^-8).
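A toy Python sketch of this process, under a simple hypothetical model: position i has its own crossover probability r_i (`rates[i]`), the child starts on a randomly chosen parental copy, and crossovers are independent (real crossover models also account for interference, which this ignores). The function name is hypothetical.

```python
import random


def recombine(copy1, copy2, rates, seed=None):
    """Build a child chromosome by walking along two parental copies.

    rates[i] is the probability of a crossover between positions i-1 and i;
    rates[0] is ignored. The starting copy is chosen uniformly at random.
    """
    if not (len(copy1) == len(copy2) == len(rates)):
        raise ValueError("copies and rates must have the same length")
    rng = random.Random(seed)
    sources = (copy1, copy2)
    current = rng.randrange(2)
    child = [sources[current][0]]
    for i in range(1, len(copy1)):
        if rng.random() < rates[i]:
            current = 1 - current  # switch copies: a recombination at position i
        child.append(sources[current][i])
    return "".join(child)


# Human-like rates (~1e-8 per position) almost never switch over a short
# stretch, so this toy example uses an exaggerated rate to make crossovers visible.
print(recombine("AAAAAAAAAAAA", "CCCCCCCCCCCC", [0.2] * 12, seed=1))
```

Every letter of the child comes from one of the two parental copies at the same position; only the switch points are random.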

Natural Selection. Example: being lactose tolerant is advantageous in northern Europe; hence there has been positive selection at the LCT gene, leading to different allele frequencies at LCT across populations.

Genetic Drift. Even without selection, the allele frequencies in the population are not fixed across time. Consider the case where we assume Hardy-Weinberg Equilibrium (HWE), that is, individuals mate randomly in the population. Suppose that at the first generation the allele frequencies are $p_0$ (of a) and $q_0 = 1 - p_0$ (of A). Under HWE, $E[p_{k+1}] = p_k$, but $Var[p_{k+1}] > 0$, so in general the next generation will have $p_1 \neq p_0$, and the frequency keeps changing from generation to generation.

The rate of the drift. N is the effective population size (if all individuals are entirely unrelated, then N is the total population size). Under an assumption of constant population size, if $X_k$ counts the number of occurrences of a at generation k, then $X_{k+1} \sim B(N, p_k)$, so $E[p_{k+1}] = E[X_{k+1}]/N = p_k$ and $Var[p_{k+1}] = p_k(1-p_k)/N$. The effect of genetic drift depends on the time and on the effective population size: a small population increases the effect.
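A quick Python check of these formulas, assuming the constant-size model above with N chromosome copies. The binomial draw is implemented by summing N Bernoulli trials to stay dependency-free; the function names are hypothetical.

```python
import random


def next_frequency(p, n_copies, rng):
    """One generation of drift: X_{k+1} ~ B(N, p_k), returned as p_{k+1} = X_{k+1} / N."""
    x = sum(rng.random() < p for _ in range(n_copies))  # N Bernoulli(p) draws
    return x / n_copies


def one_generation_moments(p, n_copies, reps=4000, seed=0):
    """Empirical mean and variance of p_{k+1} given p_k = p."""
    rng = random.Random(seed)
    draws = [next_frequency(p, n_copies, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


for n_copies in (50, 500):
    mean, var = one_generation_moments(0.5, n_copies)
    print(f"N={n_copies}: mean {mean:.3f}, variance {var:.5f}, theory {0.25 / n_copies:.5f}")
```

The mean stays near $p_k = 0.5$ in both cases, while the variance is about ten times larger for N = 50 than for N = 500, which is why drift is stronger in small populations.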

Bottleneck effect [figure: effective population size over time]. While the effective population size is small, genetic drift's rate is higher.

The Wright-Fisher Model Generation 1 Allele frequency 1/9

The Wright-Fisher Model Generation 2 Allele frequency 1/9

The Wright-Fisher Model Generation 3 Allele frequency 1/9

The Wright-Fisher Model Generation 4 Allele frequency 1/3

The Wright-Fisher Model

The Wright-Fisher Model
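A Python sketch of the Wright-Fisher model in these slides, assuming neutral evolution (no selection) and a constant population of N chromosome copies: each generation draws $X_{k+1} \sim B(N, X_k/N)$ until the allele is lost or fixed. A standard population-genetics result is that a neutral allele eventually fixes with probability equal to its starting frequency, so starting from 1 copy out of 9 the estimate should be close to 1/9. The function names are hypothetical.

```python
import random


def run_until_absorbed(n_copies, count, rng):
    """Iterate X_{k+1} ~ B(N, X_k / N) until the allele is lost (0) or fixed (N)."""
    generations = 0
    while 0 < count < n_copies:
        p = count / n_copies
        count = sum(rng.random() < p for _ in range(n_copies))
        generations += 1
    return count == n_copies, generations


def fixation_probability(n_copies, initial_count, reps=20000, seed=0):
    """Fraction of independent runs in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(run_until_absorbed(n_copies, initial_count, rng)[0] for _ in range(reps))
    return fixed / reps


# Nine copies with the allele starting at frequency 1/9, as on the slides.
print(fixation_probability(9, 1))
```

Most runs lose the allele within a few generations; roughly one run in nine ends with the allele fixed.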

Ancestral population

Ancestral population migration

Ancestral population → genetic drift → different allele frequencies

Population Substructure. Imagine that all the cases are collected from Africa, and all the controls are from Europe. Many association signals are going to be found, and the vast majority of them are false. What can we do about it?

Jakobsson et al., Nature 451: 998-1003

Principal Component Analysis. Dimensionality reduction based on linear algebra (Singular Value Decomposition). Intuition: find the 'most important' features of the data by projecting the data on the axis with the largest variance.

Principal Component Analysis. Plotting the data on a one-dimensional line for which the spread is maximized.

Principal Component Analysis In our case, we want to look at two dimensions at a time. The original data has many dimensions – each SNP corresponds to one dimension.
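A dependency-free Python sketch of the idea on hypothetical data: two populations whose allele frequencies are 0.2 vs 0.8 at every SNP (which population carries the higher frequency alternates between SNPs), genotypes coded 0/1/2. It finds the first principal component by power iteration on the centered genotype matrix instead of a full SVD (with NumPy one would typically call `numpy.linalg.svd` on the centered matrix). All names and parameters are assumptions for illustration.

```python
import random


def simulate_genotypes(n_per_pop, n_snps, seed=1):
    """Hypothetical genotypes (0/1/2 allele copies) for two diverged populations."""
    rng = random.Random(seed)
    rows, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            row = []
            for j in range(n_snps):
                freq = 0.2 if (pop + j) % 2 == 0 else 0.8
                row.append(sum(rng.random() < freq for _ in range(2)))  # two chromosomes
            rows.append(row)
            labels.append(pop)
    return rows, labels


def first_pc_scores(rows, n_iter=100, seed=2):
    """Project individuals on the first principal component via power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # center every SNP
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(n_iter):
        xv = [sum(xi[j] * v[j] for j in range(m)) for xi in x]  # X v
        v = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return [sum(xi[j] * v[j] for j in range(m)) for xi in x]


rows, labels = simulate_genotypes(n_per_pop=20, n_snps=50)
scores = first_pc_scores(rows)
for pop in (0, 1):
    s = [sc for sc, lab in zip(scores, labels) if lab == pop]
    print(pop, round(min(s), 2), round(max(s), 2))
```

With differentiation this strong, the two populations fall on opposite sides of zero along the first component; real population structure is usually much subtler and needs many more SNPs to show up.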

Ancestry Inference To what extent can population structure be detected from SNP data? What can we learn from these inferences? Can we build the tree of life? How do we analyze complex populations (mixed)? Novembre et al., Nature, 2008

Challenge 2 Modeling Correlation

A typical associated region

Linkage Disequilibrium

Haplotype Data in a Block (Daly et al., 2001) Block 6 from Chromosome 5q31

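Phasing - haplotype inference. Haplotypes: ATCCGA (mother chromosome) and AGACGC (father chromosome). Genotype: the unordered pair of letters at each position, {A,A}, {T,G}, {C,A}, {C,C}, {G,G}, {A,C}. Cost-effective genotyping technology gives genotypes and not haplotypes, so other phases are consistent with the same genotype. Possible phases: ATACGA / AGCCGC, AGACGA / ATCCGC, ….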
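A short Python sketch that enumerates all phases of a genotype, assuming the genotype is given as a list of per-site allele pairs (here read off the two haplotypes above). With k heterozygous sites there are $2^{k-1}$ unordered pairs; the first heterozygous site is held fixed so that each pair is listed once. The function name is hypothetical.

```python
from itertools import product


def possible_phases(genotype):
    """Enumerate the unordered haplotype pairs compatible with a genotype.

    `genotype` is a list of per-site allele pairs such as ("T", "G"); the order
    inside a pair carries no phase information.
    """
    het = [i for i, (x, y) in enumerate(genotype) if x != y]
    phases = []
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        h1 = [x for x, _ in genotype]
        h2 = [y for _, y in genotype]
        for i, flip in zip(het[1:], flips):  # first heterozygous site stays fixed
            if flip:
                h1[i], h2[i] = h2[i], h1[i]
        phases.append(("".join(h1), "".join(h2)))
    return phases


genotype = [("A", "A"), ("T", "G"), ("C", "A"), ("C", "C"), ("G", "G"), ("A", "C")]
for pair in possible_phases(genotype):
    print(pair)
```

The three heterozygous sites give four candidate pairs: the true ATCCGA / AGACGC plus ATCCGC / AGACGA, ATACGA / AGCCGC and ATACGC / AGCCGA. Genotype data alone cannot tell them apart.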

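Inferring Haplotypes From Trios. Genotypes are coded per site as 0 or 1 (homozygous) and 2 (heterozygous): Parent 1 = 122112, Parent 2 = 210022, Child = 120222. Assumption: no recombination. Applying Mendelian inheritance site by site resolves the haplotypes ('?' marks the last site, where all three individuals are heterozygous and the phase cannot be determined):
Parent 1: 10011? and 11111?
Parent 2: 11000? and 01001?
Child: 10011? (from Parent 1) and 11000? (from Parent 2)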
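A Python sketch of site-by-site trio phasing under the slide's assumption of no recombination, using the 0/1/2 coding above. At each site it enumerates which allele each parent could have transmitted; if exactly one combination fits the child, the site is phased, otherwise it is marked '?'. The function names are hypothetical.

```python
# Genotype codes: 0 and 1 are homozygous sites, 2 is heterozygous.
ALLELES = {0: (0,), 1: (1,), 2: (0, 1)}


def phase_trio(parent1, parent2, child):
    """Return ((transmitted, untransmitted) for parent 1, the same for parent 2).

    The child's haplotypes are the two transmitted ones. '?' marks sites that
    cannot be resolved (all three heterozygous) or are Mendelian-inconsistent.
    """
    t1, u1, t2, u2 = [], [], [], []
    for g1, g2, gc in zip(parent1, parent2, child):
        options = {
            (a, b)
            for a in ALLELES[g1]
            for b in ALLELES[g2]
            if (gc == 2 and a != b) or (gc != 2 and a == b == gc)
        }
        if len(options) != 1:
            for hap in (t1, u1, t2, u2):
                hap.append("?")
            continue
        a, b = next(iter(options))
        # The untransmitted allele is the parent's other allele at this site.
        t1.append(str(a))
        u1.append(str(a if g1 != 2 else 1 - a))
        t2.append(str(b))
        u2.append(str(b if g2 != 2 else 1 - b))
    return ("".join(t1), "".join(u1)), ("".join(t2), "".join(u2))


def parse(code):
    return [int(c) for c in code]


p1_haps, p2_haps = phase_trio(parse("122112"), parse("210022"), parse("120222"))
print("Parent 1:", p1_haps)
print("Parent 2:", p2_haps)
print("Child:", (p1_haps[0], p2_haps[0]))
```

On the slide's trio this reproduces the resolved haplotypes listed above, with only the last site left as '?'.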

Maximum Likelihood. Until now we discussed the case of two hypotheses (null and alternative). In some cases we are interested in many hypotheses and we search for the best one. Normally a hypothesis is defined by a set of parameters θ. The likelihood of θ is $L(\theta) = \Pr(D \mid \theta)$, the probability of the observed data D under θ. We are interested in the hypothesis that maximizes the likelihood.
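A minimal Python illustration, assuming the simplest case of a single parameter θ: the risk-allele frequency, with data D = x risk alleles among n independent draws. It searches a grid of hypotheses for the largest log-likelihood; for this model the maximum is known in closed form, θ = x/n, which the grid search should recover. The function names are hypothetical.

```python
import math


def log_likelihood(theta, x, n):
    """log Pr(D | theta) for x risk alleles among n independent draws (constant dropped)."""
    return x * math.log(theta) + (n - x) * math.log(1 - theta)


def grid_mle(x, n, steps=1000):
    """Pick the theta on a grid over (0, 1) with the largest likelihood."""
    grid = [i / steps for i in range(1, steps)]  # excludes 0 and 1, where log is undefined
    return max(grid, key=lambda theta: log_likelihood(theta, x, n))


print(grid_mle(7, 10), 7 / 10)
```

Grid search is only practical for one or two parameters; with many parameters (such as all haplotype frequencies) one needs iterative methods like the EM algorithm that follows.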

Soft assignment. Compute probabilities $P = \{p_h\}$ for all possible haplotypes h. For each genotype g, we do not assign one pair of haplotypes, but a distribution over possible pairs. The set of pairs of haplotypes compatible with g is denoted C(g). In soft assignment, a pair $(h_1, h_2) \in C(g)$ explains g with probability
$\Pr(h_1, h_2 \mid g) = \dfrac{p_{h_1} p_{h_2}}{\sum_{(h, h') \in C(g)} p_h p_{h'}}$.

Phasing via Maximum Likelihood Soft decision: Hard decision:

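An iterative algorithm. Data: three genotypes over five sites (h = heterozygous): 1 0 h h 1, h 0 0 1 h, and 1 h h 1 1. Each genotype has two compatible haplotype pairs:
1 0 h h 1: {10001, 10111} or {10011, 10101}
h 0 0 1 h: {00010, 10011} or {00011, 10010}
1 h h 1 1: {10011, 11111} or {10111, 11011}
Giving each haplotype of each genotype weight ¼ yields the initial haplotype frequencies: 00010 1/12, 00011 1/12, 10001 1/12, 10010 1/12, 10011 3/12, 10101 1/12, 10111 2/12, 11011 1/12, 11111 1/12. With these frequencies, the two pairs of 1 0 h h 1 get probabilities 0.4 and 0.6, and the two pairs of h 0 0 1 h get 0.75 and 0.25.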

An iterative algorithm (cont.). Starting from the initial frequencies (1/12 for each haplotype, except 3/12 for 10011 and 2/12 for 10111), the pair probabilities are 0.4 and 0.6 for 1 0 h h 1, 0.75 and 0.25 for h 0 0 1 h, and 0.6 and 0.4 for 1 h h 1 1. Re-estimating the haplotype frequencies from these soft assignments gives: 00010 .125, 00011 .042, 10001 .067, 10010 .042, 10011 .325, 10101 .1, 10111 .133, 11011 .067, 11111 .1.

An iterative algorithm (cont.). After further iterations, each genotype is explained by a single pair with probability 1: 1 0 h h 1 by {10011, 10101}, h 0 0 1 h by {00010, 10011}, and 1 h h 1 1 by {10011, 11111}. The haplotype frequencies become 00010 1/6, 10011 1/2, 10101 1/6, 11111 1/6, and 0 for all other haplotypes.
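A compact Python sketch of this iterative (EM) procedure. It assumes pairs are weighted by $p_{h_1} p_{h_2}$ (random mating), genotypes are strings over {0, 1, h}, and the start is uniform over all haplotypes appearing in some compatible pair; from that start, one iteration reproduces the 1/12, 3/12 and 2/12 frequencies above. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(g):
    """All unordered haplotype pairs C(g) for a genotype string over {0, 1, h}."""
    het = [i for i, c in enumerate(g) if c == "h"]
    if not het:
        return [(g, g)]
    pairs = []
    # Fix the first heterozygous site to '0' on h1 so each pair appears once.
    for bits in product("01", repeat=len(het) - 1):
        h1, h2 = list(g), list(g)
        for site, b in zip(het, ("0",) + bits):
            h1[site] = b
            h2[site] = "1" if b == "0" else "0"
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


def em_phasing(genotypes, n_iter=50):
    """Estimate haplotype frequencies by expectation maximization."""
    candidates = {g: compatible_pairs(g) for g in genotypes}
    haps = {h for pairs in candidates.values() for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        for g in genotypes:
            pairs = candidates[g]
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]  # E-step
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}  # M-step
    return freq


data = ["10hh1", "h001h", "1hh11"]
for h, p in sorted(em_phasing(data).items()):
    print(h, round(p, 3))
```

After two iterations, 10011 has frequency 0.325 and 10111 has 2/15 ≈ 0.133; after a few dozen, the frequencies settle at 1/2 for 10011 and 1/6 each for 00010, 10101 and 11111.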

Expectation Maximization (EM):
D – given data
Θ – parameters that need to be estimated
Z – latent (missing) variables

EM rationale Lemma:. Proof: First, note that

QED

MLE from Incomplete Data. Finding MLE parameters is a nonlinear optimization problem: maximize $\log P(x \mid \theta)$. Expectation Maximization (EM): use the "current point" $\theta'$ to construct an alternative function $E_{\theta'}[\log P(x, y \mid \theta)]$ (which is "nice") and optimize it instead.

MLE from Incomplete Data (cont.): $\log P(x \mid \theta)$ and $E_{\theta'}[\log P(x, y \mid \theta)]$.

EM for phasing

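This is maximized for:
$p_h = \dfrac{1}{2n} \sum_{i=1}^{n} \sum_{(h_1, h_2) \in C(g_i)} \Pr(h_1, h_2 \mid g_i)\,\big(\mathbb{1}[h_1 = h] + \mathbb{1}[h_2 = h]\big)$,
where n is the number of genotypes $g_1, \dots, g_n$: the new frequency of h is its expected number of copies divided by the 2n haplotypes in the sample.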

Phasing summary. Expectation maximization is easy to implement and works reasonably well in practice. We can use other models (tree models) to improve the accuracy of the phasing prediction.

Human Genetics – where to? We can typically explain only 5%-15% of the heritability of common diseases. Where is the missing heritability? Candidates include rare variants, gene-gene interactions, and gene-environment interactions. Creative computational methods are key to the discovery of the missing heritability.

Course: Computational Human Genetics (Semester B). More background in human genetics, statistics, and machine learning. Topics: studying the genetics of human disease; privacy and forensics; analysis of new technologies (sequencing); population genetics (detecting selection, mutation rates, recombination rates, etc.); reconstructing human history.