Genetic Variations Lakshmi K Matukumalli. Human – Mouse Comparison.

Ploidy (e.g., Down syndrome, caused by trisomy 21, an aneuploidy)
Structural Variations: inversions, translocations, segmental duplications

Molecular Variations
- Single nucleotide polymorphisms (SNPs)
- Short indels
- Simple sequence repeats: microsatellites (2-9 bp core repeat) and minisatellites (10-60 bp core repeat)
- Copy number variants
- Loss of heterozygosity

Types of polymorphisms
(Figure: a gene running from the 5’ flanking region, promoter, and 5’ untranslated region, through the ATG start codon, coding sequence, intron, and coding end, to the 3’ untranslated region and 3’ flanking region, annotated with example polymorphisms.)
- Single-nucleotide polymorphism (SNP)
- Nonsynonymous polymorphism (coding): GAG (Glu) → GUG (Val)
- Synonymous polymorphism (coding): GAU (Asp) → GAC (Asp)
- Insertion/deletion polymorphism (indel): TAACGG vs. TA--GG
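To make the synonymous/nonsynonymous distinction concrete, here is a minimal Python sketch that translates codons with the standard genetic code and compares the encoded amino acids. The names `CODON_TABLE`, `translate_codon`, and `classify_substitution` are illustrative, not from the slides; the sketch assumes a single-codon change and treats stop codons (`*`) like any other residue.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: the 64 codons are enumerated in UCAG order at each of
# the three positions; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """One-letter amino acid for a codon; DNA (T) or RNA (U) input is accepted."""
    return CODON_TABLE[codon.upper().replace("T", "U")]


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or nonsynonymous."""
    same = translate_codon(ref_codon) == translate_codon(alt_codon)
    return "synonymous" if same else "nonsynonymous"
```

`classify_substitution("GAG", "GUG")` returns `"nonsynonymous"` (Glu to Val), while `classify_substitution("GAU", "GAC")` returns `"synonymous"` (both Asp).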

Choosing the Technology

Extent of Variation (Human Genome)
- > 5 million SNPs in dbSNP
- A recent analysis of the genome of a diploid individual found 4.1 million DNA variants, encompassing 12.3 Mb:
  - 3,213,401 single nucleotide polymorphisms (SNPs)
  - 53,823 block substitutions (2–206 bp)
  - 292,102 heterozygous insertion/deletion events (indels) (1–571 bp)
  - 559,473 homozygous indels (1–82,711 bp)
  - 90 inversions
  - Plus segmental duplications and copy number variations
- Non-SNP DNA variation accounts for 22% of all events, but involves 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the structure of the diploid genome.
- 44% of genes were heterozygous for one or more variants.
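The 22% figure can be checked against the event counts listed above. This quick Python calculation assumes the five itemized categories make up the roughly 4.1 million events (segmental duplications and CNVs are not itemized, so they are left out); the 74% share of variant bases cannot be recomputed from these counts alone.

```python
# Event counts from the list above (segmental duplications and CNVs not itemized)
variant_counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

total_events = sum(variant_counts.values())
non_snp_events = total_events - variant_counts["SNPs"]
non_snp_fraction = non_snp_events / total_events

print(f"total events: {total_events:,}")
print(f"non-SNP share of events: {non_snp_fraction:.1%}")
```

This prints a total of 4,118,889 events (about 4.1 million) and a non-SNP share of 22.0%, consistent with the slide.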

Importance of SNPs and other variants
Study genetic variation in diverse populations of any species to:
- Understand evolutionary origins and history
- Estimate population size, breeding structure, or life-history characters
- Study migration within and between sub-populations
- Understand the evolutionary basis for the maintenance of genetic variation and for speciation
Applications:
- Genetic association of traits
- Effects on gene expression (e.g., synonymous vs. nonsynonymous changes, TF binding sites)
- DNA fingerprinting or sample tracking

Fine Mapping with SNP Markers
Advantages of SNPs as genetic markers compared with microsatellites:
- High abundance
- Distribution throughout the genome
- Ease of genotyping
- Improved accuracy
- Availability of high-throughput multiplex genotyping platforms

SNP Discovery - Sanger sequencing (EST)

SNP Discovery - Diploids (heterozygous loci)

SNP-PHAGE (SNP Pipeline for Haplotype Analysis and GEnbank (dbSNP) submissions), a software package
Important steps are:
- Primer development
- Primer testing
- Sequencing
- Base calling
- Sequence assembly
- Polymorphism analysis
- Haplotype analysis
- GenBank submission of confirmed polymorphisms
(Figure: primers, 5’ and 3’ amplicons, and sequence variation.)

Application of Machine Learning in SNP Discovery
Objective: Reduce human intervention by training a machine learning (ML) program on an expert-annotated dataset and using it to differentiate good from bad polymorphisms.
(Figure: in training mode, inputs feed a machine learning program (planning and reasoning) that outputs a model (tree/rules); in testing/prediction mode, the model maps new inputs to outputs.)
Steps: parameter selection, parameter optimization, testing, implementation.
Results: Achieved a substantial improvement in accuracy compared with using only PolyBayes or PolyPhred.
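The slides say only that the learned model is a tree or rule set, not which algorithm was used. As a minimal Python sketch under that assumption, the code below learns a decision stump (a one-level decision tree) from expert-labelled examples and applies it to new candidate polymorphisms. The feature names `quality` and `minor_reads` and the toy training data are hypothetical.

```python
from typing import Dict, List, Tuple

# (features, label): label is True for an expert-confirmed "good" polymorphism
Example = Tuple[Dict[str, float], bool]
Rule = Tuple[str, float]  # predict "good" when features[name] >= threshold


def train_stump(examples: List[Example]) -> Rule:
    """Learn a decision stump by exhaustive search over features and thresholds.

    Every feature and every observed value of it is tried as a threshold;
    the rule with the highest training accuracy (first one on ties) is kept.
    """
    best_rule: Rule = ("", 0.0)
    best_accuracy = -1.0
    for name in examples[0][0]:
        for threshold in sorted({features[name] for features, _ in examples}):
            correct = sum(
                (features[name] >= threshold) == label for features, label in examples
            )
            accuracy = correct / len(examples)
            if accuracy > best_accuracy:
                best_rule, best_accuracy = (name, threshold), accuracy
    return best_rule


def predict(rule: Rule, features: Dict[str, float]) -> bool:
    """Apply a learned stump to a new candidate polymorphism."""
    name, threshold = rule
    return features[name] >= threshold


# Hypothetical toy training set: feature names and values are illustrative only
training = [
    ({"quality": 45, "minor_reads": 5}, True),
    ({"quality": 38, "minor_reads": 3}, True),
    ({"quality": 20, "minor_reads": 4}, False),
    ({"quality": 42, "minor_reads": 1}, False),
    ({"quality": 15, "minor_reads": 1}, False),
]
rule = train_stump(training)
```

On this toy set the search picks `("quality", 38)`, i.e. "call the polymorphism good when quality ≥ 38", which classifies four of the five training examples correctly. A realistic version would grow deeper trees or rule sets and evaluate them on held-out data, in line with the parameter selection, optimization, and testing steps listed on the slide.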

SNP Discovery using next-generation sequencers
- Short sequence reads at a fraction of the cost
- Reduced representation sequencing:
  - Digest genomic DNA with a restriction enzyme, chosen by screening based on in silico digestion
  - Size select based on repetitive DNA content, number of fragments, and sequencing platform
  - Allows “targeted” deep sequencing of pools of DNA
  - Restriction fragments are randomly distributed across the genome
- Cost / Mb: ABI $ $160, Solexa $5
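The reduced representation step can be illustrated with an in silico digest. This Python sketch scans one strand for a restriction site, cuts at a fixed offset within it, and then size-selects the fragments. MspI (recognition site CCGG, cut C^CGG) is used only as an example 4-bp cutter; the slides do not name the enzyme, and the function names are hypothetical.

```python
import re
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """In silico restriction digest of one strand of `sequence`.

    `cut_offset` is where the enzyme cuts within its recognition site,
    e.g. 1 for MspI, which recognizes CCGG and cuts C^CGG.
    """
    seq = sequence.upper()
    # A zero-width lookahead finds every site, including overlapping ones
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    # Skip empty pieces (e.g. a site at the very start with cut_offset 0)
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]
```

`digest("AACCGGTTTTCCGGAA", "CCGG", 1)` yields `["AAC", "CGGTTTTC", "CGGAA"]`, and `size_select` with a 5-8 bp window keeps the last two. Scanning a single strand is enough for palindromic sites such as CCGG, which read the same on both strands.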

SNP Discovery - Bioinformatics
Strategies to maximize performance:
- High quality-score stringencies, both for each read and at the base of the putative SNP
- Require a single map location for the 23-bp “tag” (plus the 4-bp restriction site)
- Allow only a single base-pair difference for a putative SNP; this reduces repeat content and gene family/paralog false positives
- Require 2 copies of each allele (the assembly can count as 1)
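The filters above can be expressed as a single predicate over the reads aligned to one tag. In this Python sketch the `TagRead` record, its fields, and the Phred 20 quality thresholds are assumptions made for illustration; the slide gives the criteria but not the cutoffs or the data format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class TagRead:
    """Hypothetical record for one read aligned to a 23-bp tag."""
    map_locations: int   # number of genomic locations the tag maps to
    mismatches: int      # base differences between read and reference tag
    read_quality: float  # e.g. mean Phred score of the read
    base: str            # base observed at the putative SNP position
    base_quality: int    # Phred score of that base


def is_putative_snp(reads: List[TagRead], reference_base: str,
                    min_read_q: float = 20.0, min_base_q: int = 20,
                    min_copies: int = 2) -> bool:
    """Apply the slide's filters; the default thresholds are illustrative."""
    kept = [
        r for r in reads
        if r.map_locations == 1            # tag maps to a single location
        and r.mismatches <= 1              # at most one base difference
        and r.read_quality >= min_read_q   # read-level quality stringency
        and r.base_quality >= min_base_q   # base-level quality at the SNP
    ]
    counts = Counter(r.base for r in kept)
    counts[reference_base] += 1            # the assembly counts as one copy
    alleles = [b for b, n in counts.items() if n >= min_copies]
    return len(alleles) == 2               # two well-supported alleles
```

Because the reference assembly counts as one copy, a site passes with `min_copies=2` when it has one high-quality reference-allele read plus two alternative-allele reads.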

Predicted & Observed Minor Allele Frequency

Population Genetics
Population genetics is the study of allele frequency distributions and their change under the influence of the four evolutionary forces: natural selection, genetic drift, mutation, and gene flow. It attempts to explain phenomena such as adaptation and speciation.

Population Genetics
Neutral theory: at equilibrium, the rate at which new genetic variants arise by mutation equals the rate at which genetic diversity is lost through drift.
Example: for a C/T SNP the possible genotypes are CC, CT, and TT, and the alleles are C and T. Genotyping a population of 1000 individuals gave 100 CC, 500 CT, and 400 TT genotypes.
- Genotype frequencies: CC (0.1), CT (0.5), TT (0.4)
- Allele frequencies: C: $p = (2 \times 100 + 500)/2000 = 0.35$ (minor allele; this is the MAF); T: $q = (2 \times 400 + 500)/2000 = 0.65$ (major allele)
- Hardy-Weinberg equilibrium: expected genotype frequencies are $p^2$, $2pq$, and $q^2$ (0.1225, 0.455, and 0.4225), i.e. about 122.5 CC, 455 CT, and 422.5 TT individuals
- Deviations from HWE can be caused by drift, selection, admixture, etc.
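The worked example can be reproduced in Python, together with a chi-square goodness-of-fit test for deviation from HWE. The test is a standard addition not shown on the slide, and `hardy_weinberg` is a hypothetical helper name; it assumes a biallelic SNP and uses the 1-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)) so that no statistics library is needed.

```python
import math
from typing import Tuple


def hardy_weinberg(n_cc: int, n_ct: int, n_tt: int
                   ) -> Tuple[float, Tuple[float, float, float], float, float]:
    """Allele frequency p, HWE-expected counts, chi-square statistic, p-value."""
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)   # frequency of allele C
    q = 1.0 - p                       # frequency of allele T
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_cc, n_ct, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 1 degree of freedom (3 classes - 1 - 1 estimated allele frequency);
    # for 1 df the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, expected, chi2, p_value


p, expected, chi2, p_value = hardy_weinberg(100, 500, 400)
```

For the 100/500/400 counts it returns p = 0.35, expected counts of about 122.5, 455, and 422.5, χ² ≈ 9.78, and a p-value of about 0.002, so this sample departs from HWE (here through an excess of heterozygotes).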

Fst
It is useful to partition genetic variation into components within populations, between populations, and among populations. Sewall Wright's fixation index (Fst) is a useful index of genetic differentiation and of the overall effect of population substructure. It measures the reduction in heterozygosity (H) expected with non-random mating at one level of the population hierarchy relative to another, more inclusive level:
$$F_{ST} = \frac{H_{\text{Total}} - H_{\text{subpop}}}{H_{\text{Total}}}$$
Fst ranges from a minimum of 0 to a maximum of 1:
- Fst = 0: no genetic differentiation
- Fst << 0.5: little genetic differentiation
- Fst >> 0.5: moderate to great genetic differentiation
- Fst = 1: populations fixed for different alleles
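Below is a minimal Python sketch of Fst for one biallelic locus, computing heterozygosity as the expected value 2pq. It assumes equally sized subpopulations, so H_subpop is the simple mean across subpopulations and H_Total is evaluated at the mean allele frequency; this is the basic heterozygosity-based form of the formula, not a sample-size-corrected estimator such as Weir and Cockerham's.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity (2pq) at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst(subpop_freqs: Sequence[float]) -> float:
    """Fst = (H_Total - H_subpop) / H_Total for equally sized subpopulations."""
    k = len(subpop_freqs)
    h_subpop = sum(expected_heterozygosity(p) for p in subpop_freqs) / k
    p_total = sum(subpop_freqs) / k
    h_total = expected_heterozygosity(p_total)
    if h_total == 0:
        return 0.0  # monomorphic overall: nothing to differentiate
    return (h_total - h_subpop) / h_total
```

Two subpopulations with allele frequencies 0.2 and 0.8 give Fst = 0.36; frequencies 0 and 1 (fixed for different alleles) give 1; identical frequencies give 0.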

Genotype – Phenotype Association (Significance of Haplotypes)

Haplotype inference
Solving the haplotype phasing problem is not straightforward because of resolution ambiguity: an individual who is heterozygous at two or more loci can be resolved into more than one pair of haplotypes. Computational and statistical algorithms for addressing this ambiguity in haplotype phasing include:
1) parsimony
2) phylogeny
3) maximum likelihood
4) Bayesian inference
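The ambiguity can be seen by brute force. This Python sketch enumerates every haplotype pair consistent with an unphased genotype; it is not one of the four methods listed, just an illustration of the search space they must choose from. The 0/1/2 genotype coding and the `resolve` name are assumptions.

```python
from itertools import product
from typing import List, Tuple


def resolve(genotype: List[int]) -> List[Tuple[str, str]]:
    """All distinct (unordered) haplotype pairs consistent with an unphased genotype.

    Assumed coding per SNP: 0 = homozygous for allele 0, 2 = homozygous for
    allele 1, 1 = heterozygous (phase unknown).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = ["1" if g == 2 else "0" for g in genotype]  # homozygous sites are shared
    pairs = set()
    for assignment in product("01", repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, assignment):
            h1[site] = allele                          # one haplotype gets this allele
            h2[site] = "1" if allele == "0" else "0"   # the other gets the alternative
        a, b = sorted(("".join(h1), "".join(h2)))
        pairs.add((a, b))
    return sorted(pairs)
```

A genotype heterozygous at k sites has 2^(k−1) distinct resolutions (for k ≥ 1): `resolve([1, 1])` returns `[("00", "11"), ("01", "10")]`, and a genotype with three heterozygous sites has four candidate pairs.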

Linkage disequilibrium (LD)
Non-random association of alleles at two or more loci, not necessarily on the same chromosome. LD is generally caused by interactions between genes; genetic linkage and the rate of recombination; random drift or non-random mating; and population structure.
Let A and B be two loci segregating two alleles each: $A_1$ and $A_2$ with frequencies $p_1$ and $p_2$ at A, and $B_1$ and $B_2$ with frequencies $q_1$ and $q_2$ at B. The haplotype frequencies are:
$$
\begin{array}{c|cc|c}
 & B_1 & B_2 & \text{Total} \\ \hline
A_1 & p_{11} = p_1 q_1 + D & p_{12} = p_1 q_2 - D & p_1 \\
A_2 & p_{21} = p_2 q_1 - D & p_{22} = p_2 q_2 + D & p_2 \\ \hline
\text{Total} & q_1 & q_2 & 1
\end{array}
$$

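Linkage disequilibrium (cont.)
$D = p_{11} - p_1 q_1$. D depends on the allele frequencies at A and B, so a scaled version of D, $D'$, is used:
$$
D' = \begin{cases}
\dfrac{D}{\min(p_1 q_2,\; p_2 q_1)} & \text{if } D > 0 \\[6pt]
\dfrac{D}{\min(p_1 q_1,\; p_2 q_2)} & \text{if } D < 0
\end{cases}
$$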
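Both measures can be computed directly from the four haplotype frequencies $p_{11}, p_{12}, p_{21}, p_{22}$. In this Python sketch (the function name is illustrative), allele frequencies are obtained as row and column sums of the haplotype table, and D′ follows the sign convention of the formula above, so it ranges from −1 to 1. Both loci are assumed to be polymorphic.

```python
from typing import Tuple


def ld_d_and_dprime(p11: float, p12: float, p21: float, p22: float
                    ) -> Tuple[float, float]:
    """D and D' for two biallelic, polymorphic loci from haplotype frequencies."""
    p1, p2 = p11 + p12, p21 + p22   # allele frequencies at locus A (row sums)
    q1, q2 = p11 + p21, p12 + p22   # allele frequencies at locus B (column sums)
    d = p11 - p1 * q1
    if d > 0:
        d_max = min(p1 * q2, p2 * q1)
    elif d < 0:
        d_max = min(p1 * q1, p2 * q2)
    else:
        return 0.0, 0.0             # linkage equilibrium
    return d, d / d_max
```

Haplotype frequencies (0.4, 0.1, 0.1, 0.4) give D = 0.15 and D′ = 0.6; (0.5, 0, 0, 0.5) give D′ = 1, and (0, 0.5, 0.5, 0) give D′ = −1.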

Linkage disequilibrium (cont.)
Squared correlation coefficient:
$$r^2 = \frac{D^2}{p_1 p_2 q_1 q_2}$$
* The measure preferred by population geneticists
* Depends on allele frequencies: $r^2$ can reach 1 only when the two loci have equal minor allele frequencies
* Ranges between 0 and 1
* $r^2 = 1$ implies the markers provide exactly the same information
* $r^2 = 0$ when the loci are in linkage equilibrium ($D = 0$)
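A matching Python sketch for $r^2$, again computed from the four haplotype frequencies (the function name is illustrative). It raises an error when a locus is monomorphic, because the denominator is then zero.

```python
def r_squared(p11: float, p12: float, p21: float, p22: float) -> float:
    """Squared correlation r^2 = D^2 / (p1 p2 q1 q2) between two biallelic loci."""
    p1, p2 = p11 + p12, p21 + p22   # allele frequencies at locus A
    q1, q2 = p11 + p21, p12 + p22   # allele frequencies at locus B
    d = p11 - p1 * q1
    denominator = p1 * p2 * q1 * q2
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denominator
```

Frequencies (0.5, 0, 0, 0.5) give $r^2 = 1$, and (0.4, 0.1, 0.1, 0.4) give 0.36. With (0.2, 0, 0.3, 0.5) one haplotype is absent, so D′ = 1 by the formula on the previous slide, yet $r^2$ is only 0.25 because the allele frequencies differ between the loci (0.2 vs. 0.5).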

Linkage disequilibrium (cont.): Visualizing LD

Visualizing LD