Polymorphism. Haixu Tang, School of Informatics. Genome variations underlie phenotypic differences and cause inherited diseases.

Manish Anand Nihar Sheth Jim Costello Univ. of Indiana

Polymorphism Haixu Tang School of Informatics

Genome variations underlie phenotypic differences and cause inherited diseases

Restriction fragment length polymorphism (RFLP)

RFLP Haplotype

Microsatellite (short tandem repeat) polymorphism
The repeat region is variable between samples, while the flanking regions where PCR primers bind are constant.
(Figure labels: 7 repeats, 8 repeats; repeat unit AATG)
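The idea that only the repeat count differs between samples can be illustrated with a minimal sketch. It counts the longest tandem run of a motif in a sequence. The function name `max_tandem_repeats` and the flanking sequences are hypothetical, chosen only for illustration; the AATG motif and the 7 and 8 repeat alleles come from the slide.

```python
import re

def max_tandem_repeats(sequence, motif):
    """Return the length (in repeat units) of the longest tandem run of motif."""
    # A non-capturing group makes findall return each whole run of repeats.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical constant flanking regions (where PCR primers would bind).
LEFT_FLANK = "GCTAGC"
RIGHT_FLANK = "TTCGGA"

allele_a = LEFT_FLANK + "AATG" * 7 + RIGHT_FLANK
allele_b = LEFT_FLANK + "AATG" * 8 + RIGHT_FLANK

repeats_a = max_tandem_repeats(allele_a, "AATG")
repeats_b = max_tandem_repeats(allele_b, "AATG")
```

In a real assay the repeat count is inferred from PCR fragment length rather than read directly from sequence; the sketch only shows what the length difference encodes.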

Which Suspect, A or B, cannot be excluded from potential perpetrators of this assault?

Single nucleotide polymorphism (SNP)
The densest possible type of polymorphism.
A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 percent) of a large population.

Some Facts
In human beings, 99.9 percent of bases are the same. The remaining 0.1 percent makes a person unique.
– Different attributes / characteristics / traits: how a person looks, diseases he or she develops.
These variations can be:
– Harmless (change in phenotype)
– Harmful (diabetes, cancer, heart disease, Huntington's disease, and hemophilia)
– Latent (variations found in coding and regulatory regions that are not harmful on their own; the change in each gene only becomes apparent under certain conditions, e.g. susceptibility to lung cancer)

SNP facts
SNPs are found in
– coding and (mostly) noncoding regions.
They occur with a very high frequency
– about 1 in 1000 bases to 1 in 100 to 300 bases.
The abundance of SNPs and the ease with which they can be measured make these genetic variations significant.
SNPs close to a particular gene can act as a marker for that gene.

SNP maps
Sequence the genomes of a large number of people.
Compare the base sequences to discover SNPs.
Generate a single map of the human genome containing all possible SNPs => SNP maps

How do we find sequence variations?
Look at multiple sequences from the same genome region.
Use base quality values to decide if mismatches are true polymorphisms or sequencing errors.
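A minimal sketch of how base quality values can separate likely polymorphisms from likely sequencing errors at one aligned position. It uses a simple threshold rule. This is an illustrative assumption, not the statistical method of any cited paper. The names `phred_to_error` and `candidate_snp` and the default thresholds are hypothetical. The only fixed fact used is the Phred convention, where quality Q corresponds to an error probability of 10^(-Q/10).

```python
def phred_to_error(quality):
    """Convert a Phred quality value to the probability that the base call is wrong."""
    return 10 ** (-quality / 10)

def candidate_snp(column, min_quality=20, min_support=2):
    """Return the sorted alleles at one aligned position if it looks polymorphic.

    column is a list of (base, phred_quality) pairs, one per aligned sequence.
    Bases below min_quality are ignored as likely sequencing errors; a site is
    reported only if at least two alleles each have min_support confident calls.
    """
    support = {}
    for base, quality in column:
        if quality >= min_quality:
            support[base] = support.get(base, 0) + 1
    alleles = sorted(base for base, count in support.items() if count >= min_support)
    return alleles if len(alleles) >= 2 else []

# Two alleles, each backed by high-quality calls: reported as a candidate SNP.
polymorphic_column = [("A", 40), ("A", 35), ("G", 38), ("G", 30), ("A", 12)]
# A single low-quality mismatch: treated as a sequencing error.
error_column = [("A", 40), ("A", 35), ("A", 30), ("G", 8)]

polymorphic_call = candidate_snp(polymorphic_column)
error_call = candidate_snp(error_column)
```

A probabilistic caller would weight each base by its error probability instead of applying a hard cutoff; the threshold version is kept here because it is easy to follow.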

Automated polymorphism discovery Marth et al. Nature Genetics 1999

Large SNP mining projects
Sachidanandam et al. Nature 2001
~ 8 million
(Figure labels: EST, WGS, BAC, genome reference)

How to use markers to find disease?
Genotyping: using millions of markers simultaneously for an association study.
A genome-wide, dense SNP marker map depends on the patterns of allelic association in the human genome.
Question: how to select from all available markers a subset that captures most mapping information (marker selection).

Allelic association
Allelic association is the non-random assortment between alleles, i.e. it measures how well knowledge of the allele state at one site permits prediction at another.
(Figure labels: marker site, functional site)
By necessity, the strength of allelic association is measured between markers.
Significant allelic association between a marker and a functional site permits localization (mapping) even without having the functional site in our collection.

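Linkage disequilibrium
LD measures the deviation from random assortment of the alleles at a pair of polymorphic sites:
$D = f(AB) - f(A) \times f(B)$
where $f(AB)$ is the frequency of chromosomes carrying allele A at the first site together with allele B at the second, and $f(A)$, $f(B)$ are the individual allele frequencies.
Other measures of LD are derived from D, e.g. by normalizing according to allele frequencies ($r^2$).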
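A small worked sketch of the LD calculation. It computes D as the haplotype frequency minus the product of the allele frequencies. It then applies the standard normalization r^2 = D^2 / (f(A)(1-f(A)) f(B)(1-f(B))). The function name `ld_stats` and the example frequencies are hypothetical.

```python
def ld_stats(f_ab, f_a, f_b):
    """Return (D, r_squared) for two biallelic sites.

    f_ab: frequency of chromosomes carrying allele A at site 1 and B at site 2
    f_a:  frequency of allele A at site 1
    f_b:  frequency of allele B at site 2
    """
    d = f_ab - f_a * f_b
    denominator = f_a * (1 - f_a) * f_b * (1 - f_b)
    # r^2 is undefined when either site is monomorphic; report 0.0 in that case.
    r_squared = (d * d) / denominator if denominator > 0 else 0.0
    return d, r_squared

# Complete association: allele A always travels with allele B.
d_full, r2_full = ld_stats(f_ab=0.5, f_a=0.5, f_b=0.5)

# Random assortment: the haplotype frequency equals the product of allele frequencies.
d_none, r2_none = ld_stats(f_ab=0.25, f_a=0.5, f_b=0.5)
```

With complete association r^2 reaches 1, and under random assortment both D and r^2 are 0.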

Haplotype diversity
The most useful multi-marker measures of association are related to haplotype diversity.
Random assortment of alleles at different sites: n markers give $2^n$ possible haplotypes.
Strong association: most chromosomes carry one of a few common haplotypes (reduced haplotype diversity).
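To make the contrast concrete, the sketch below compares the number of haplotypes possible for n biallelic markers (2^n) with the number actually observed in a sample. The sample of ten chromosomes and the function name `haplotype_diversity` are hypothetical; alleles are coded 0/1 purely for illustration.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Return (possible, observed) haplotype counts for biallelic markers."""
    n_markers = len(haplotypes[0])
    counts = Counter(haplotypes)
    return 2 ** n_markers, len(counts)

# Hypothetical sample: 10 chromosomes typed at 4 biallelic markers.
sample = ["0000"] * 6 + ["1111"] * 3 + ["0101"]

possible, observed = haplotype_diversity(sample)
most_common_haplotype, most_common_count = Counter(sample).most_common(1)[0]
```

Here 4 markers allow 16 haplotypes, but only 3 are seen and one of them accounts for 6 of the 10 chromosomes, which is the pattern described as reduced haplotype diversity.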

Haplotype blocks
Daly et al. Nature Genetics 2001
Experimental evidence for reduced haplotype diversity (mainly in European samples).

The promise for medical genetics
Example haplotypes: CACTACCGA, CACGACTAT, TTGGCGTAT
Within blocks, a small number of SNPs are sufficient to distinguish the few common haplotypes => significant marker reduction is possible.
If the block structure is a general feature of human variation structure, whole-genome association studies will be possible at a reduced genotyping cost.
This motivated the HapMap project (Gibbs et al. Nature 2003).
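The claim that a few SNPs suffice can be checked on the three haplotypes shown on this slide. The sketch below searches by brute force for the smallest set of positions whose alleles tell all haplotypes apart. The function name `minimal_distinguishing_sites` is hypothetical. Exhaustive search is assumed to be acceptable only because the example is tiny.

```python
from itertools import combinations

# The three common haplotypes shown on the slide.
HAPLOTYPES = ["CACTACCGA", "CACGACTAT", "TTGGCGTAT"]

def minimal_distinguishing_sites(haplotypes):
    """Return the first smallest tuple of 0-based positions that separates all haplotypes."""
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # Project each haplotype onto the chosen positions.
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == n_distinct:
                return sites
    return tuple(range(n_sites))

tag_sites = minimal_distinguishing_sites(HAPLOTYPES)
```

For these haplotypes no single position separates all three, but two positions do (the search returns positions 0 and 3), so 2 of the 9 sites carry the full haplotype information.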

The HapMap initiative
Goal: to map out human allele and association structure at the kilobase scale.
Deliverables: a set of physical and informational reagents.

Haplotyping
The problem: the substrate for genotyping is diploid, genomic DNA; phasing of alleles at multiple loci is in general not possible with certainty.
Experimental methods of haplotype determination (single-chromosome isolation followed by whole-genome PCR amplification, radiation hybrids, somatic cell hybrids) are expensive and laborious.
(Figure labels: A T C T G C C A)

An example of haplotyping
Mother:  GG AT CA TT
Father:  CC AA AC CT
Child 1: GC AA CC CT
Child 2: GC AT AA TT
Child 3: GC AA AC CT

Haplotypes
               a          b
Mother  I    G A C T    G T A T
Mother  II   G T C T    G A A T
Father  I    C A A C    C A C T
Father  II   C A A T    C A C C

An example of haplotyping
Mother:  GG AT CA TT
Father:  CC AA AC CT
Child 1: GC AA CC CT (M-Ia & F-IIb)
Child 2: GC AT AA TT (M-Ib & F-IIa)
Child 3: GC AA AC CT (M-Ia & F-Ia or M-IIb & F-IIb) ?
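The family example can be reproduced with a short sketch. It enumerates every way each parent's unphased genotype could split into two haplotypes, then lists the transmitted haplotype pairs that explain each child. The function names `phasings` and `explanations` are hypothetical, and the sketch assumes no recombination and no genotyping error.

```python
from itertools import product

MOTHER = ["GG", "AT", "CA", "TT"]
FATHER = ["CC", "AA", "AC", "CT"]
CHILDREN = [
    ["GC", "AA", "CC", "CT"],
    ["GC", "AT", "AA", "TT"],
    ["GC", "AA", "AC", "CT"],
]

def phasings(genotype):
    """Return all distinct unordered haplotype pairs consistent with a genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(site[c] for site, c in zip(genotype, choice))
        second = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((first, second))))
    return sorted(pairs)

def explanations(child, mother, father):
    """Return (maternal haplotype, paternal haplotype) pairs that yield the child."""
    found = set()
    for mother_pair in phasings(mother):
        for father_pair in phasings(father):
            for m_hap, f_hap in product(mother_pair, father_pair):
                if all(sorted(g) == sorted(m + f)
                       for g, m, f in zip(child, m_hap, f_hap)):
                    found.add((m_hap, f_hap))
    return sorted(found)

results = [explanations(child, MOTHER, FATHER) for child in CHILDREN]
```

The first two children each have exactly one explanation (GACT with CACC, and GTAT with CAAT), matching the M-Ia & F-IIb and M-Ib & F-IIa labels. The third child has two candidate explanations, which corresponds to the question mark on the slide.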

HapMap Project
High-density SNP genotyping across the genome provides information about
– SNP validation, frequency, assay conditions
– correlation structure of alleles in the genome
A freely available public resource to increase the power and efficiency of genetic association studies of medical traits.
All data are freely available on the web for application in study design and analyses as researchers see fit.

HapMap Samples
– 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)
– 90 individuals (30 trios) of European descent from Utah (CEU)
– 45 Han Chinese individuals from Beijing (CHB)
– 45 Japanese individuals from Tokyo (JPT)

HapMap progress
PHASE I – completed, described in Nature paper
* 1,000,000 SNPs successfully typed in all 270 HapMap samples
PHASE II – data generation complete, data released
* >3,500,000 SNPs typed in total !!!

ENCODE-HAPMAP variation project
– Ten “typical” 500kb regions
– 48 samples sequenced
– All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples
– Current data set: 1 SNP every 279 bp
A much more complete variation resource by which the genome-wide map can be evaluated.

Tagging from HapMap
Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies.

Pairwise tagging
SNPs: 1 (A/T), 2 (G/A), 3 (G/C), 4 (T/C), 5 (G/C), 6 (A/C) (figure label: high $r^2$)
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
(Figure: haplotype table across the six SNPs; the allele columns are not recoverable from this transcript.)
After Carlson et al. (2004) AJHG 74:106
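A minimal sketch of greedy pairwise tagging in the spirit of the slide. It repeatedly picks the SNP that is in high r^2 with the most not-yet-tagged SNPs and removes that whole group. The toy haplotypes, the 0.8 threshold, the grouping (SNPs 1-2, SNPs 3-5, SNP 6 alone), and the function names are all assumptions made for illustration. The slide's own haplotype table is not recoverable, so this is not a reproduction of the published procedure.

```python
def r_squared(x, y):
    """Pairwise r^2 between two biallelic sites coded 0/1 across haplotypes."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(a * b for a, b in zip(x, y)) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    return (d * d) / denominator if denominator > 0 else 0.0

def greedy_tags(haplotypes, threshold=0.8):
    """Return sorted 0-based indices of tag SNPs chosen greedily by bin size."""
    n_sites = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]
    untagged = set(range(n_sites))
    tags = []
    while untagged:
        best, best_bin = None, set()
        for i in sorted(untagged):
            # A SNP always tags itself, even if it is monomorphic.
            bin_i = {i} | {j for j in untagged
                           if r_squared(columns[i], columns[j]) >= threshold}
            if len(bin_i) > len(best_bin):
                best, best_bin = i, bin_i
        tags.append(best)
        untagged -= best_bin
    return sorted(tags)

# Hypothetical haplotypes over six SNPs (columns), alleles coded 0/1.
# SNPs 1-2 share one pattern, SNPs 3-5 share another, SNP 6 stands alone.
TOY_HAPLOTYPES = [
    (0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 0, 1),
    (0, 0, 1, 1, 1, 0),
    (0, 0, 1, 1, 1, 1),
    (1, 1, 0, 0, 0, 0),
    (1, 1, 0, 0, 0, 1),
    (1, 1, 1, 1, 1, 0),
    (1, 1, 1, 1, 1, 1),
]

tag_indices = greedy_tags(TOY_HAPLOTYPES)
tag_snps = [i + 1 for i in tag_indices]  # 1-based SNP numbers as on the slide
```

On this toy data the selection yields SNP 1, SNP 3, and SNP 6, so three genotyped SNPs stand in for all six.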