Bioinformatics SNPs and haplotypes Kristel Van Steen, PhD, ScD (kristel.vansteen@ulg.ac.be) Université de Liege - Institut Montefiore 2008-2009.


Acknowledgements Parts of these slides have been adapted or taken over from existing course notes and online material: Practical: Heather Cordell Slides: Stuart M Brown

Outline Practical in R on genetic association analysis SNPs and Haplotypes A tour in FBAT

Genetic Association Analysis in R

Computer Practical Exercise Heather Cordell http://www.staff.ncl.ac.uk/heather.cordell/WTACcasecon2007.html Using R for Case-control association Gene-gene interactions (future class)
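The practical itself is run in R. As a rough Python sketch of what a basic case-control association test computes, the following compares allele counts between cases and controls with a 1-df Pearson chi-square. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from the practical.

```python
import math

def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are the two alleles of one SNP.
    Returns (chi-square statistic, p-value, odds ratio).
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts: 200 case chromosomes, 200 control chromosomes
chi2, p_value, odds_ratio = allelic_chi_square(120, 80, 90, 110)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
```

Only the standard library is used here so the sketch runs anywhere; in practice a statistics package would normally do this step.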

SNPs and Haplotypes A gentle introduction of relevant issues

Mutations create Alleles Mutations occur randomly throughout the DNA Most have no phenotypic effect (non-coding regions, equivalent codons, similar AAs) Some damage the function of a protein or regulatory element A very few provide an evolutionary advantage

Human Alleles The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks all human mutations with known phenotypes. It contains a total of about 2,000 genetic diseases [and another ~11,000 genetic loci with known phenotypes - but not necessarily known gene sequences]. It is designed for use by physicians: it can be searched by disease name and contains summaries from clinical studies.

Population Genetics Chromosome pairs segregate and recombine in every generation. Every allele of every gene has its own independent evolutionary history (and future!) Frequencies of various alleles differ in different sub-populations of people.

SNPs Single nucleotide polymorphisms (SNPs) are DNA sequence variations occurring when a single nucleotide (A, C, T, G) in the genome is altered. The inherited allelic variation must have >1% population frequency. SNPs can occur in both coding and non-coding regions, making up 90% of all human genetic variation. Frequency: roughly one every 100 to 300 bases along the approximately 3-billion-base human genome. Remark: Some definitions include methylated and deaminated dinucleotides.
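The >1% frequency criterion can be checked directly from genotype data. The sketch below assumes a biallelic site and genotypes written as two-letter strings; the function names and the sample are hypothetical.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Minor allele frequency at one biallelic site.

    genotypes: list of two-letter strings such as 'AG', one per individual.
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    total = sum(counts.values())
    # most_common() sorts by count; index 1 is the less common allele
    return counts.most_common()[1][1] / total

def is_snp(genotypes, threshold=0.01):
    """Apply the >1% population-frequency criterion from the slide."""
    return minor_allele_frequency(genotypes) > threshold

# Hypothetical sample of 100 individuals
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
maf = minor_allele_frequency(sample)
print(maf, is_snp(sample))
```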

Distribution of SNPs and Power

SNPs are Very Common SNPs are very common in the human population. Between any two people, there is an average of one SNP every 1000 bases. Most of these have no phenotypic effect: only <1% of all human SNPs impact protein function, because most lie in non-coding regions and there is selection against mis-sense mutations (think about what would happen to dominant lethal mutations). Some are alleles of genes.
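The densities quoted on these slides translate into absolute numbers with simple arithmetic. This small sketch uses only the figures given above (a genome of about 3 billion bases, one SNP per 100 to 300 bases in the population, one difference per 1000 bases between two people); the helper name is made up.

```python
GENOME_SIZE = 3_000_000_000  # approximate human genome length in bases

def expected_sites(genome_size, bases_per_site):
    """Number of sites if one occurs every `bases_per_site` bases."""
    return genome_size // bases_per_site

snps_dense = expected_sites(GENOME_SIZE, 100)             # one per 100 bases
snps_sparse = expected_sites(GENOME_SIZE, 300)            # one per 300 bases
pairwise_differences = expected_sites(GENOME_SIZE, 1000)  # between two people

print(f"SNPs in the population: {snps_sparse:,} to {snps_dense:,}")
print(f"Differences between two people: about {pairwise_differences:,}")
```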

Why are SNPs Important? Alleles of health related genes Genetic Markers that are linked to every gene (and to non-transcribed loci that may also affect health) Fast, cheap, accurate genotypes Population diversity & history Genetic Association studies in populations Pharmacogenomics

Genome Sequencing finds SNPs The Human Genome Project involves sequencing DNA cloned from a number of different people. Even in a library made from one person's DNA, the homologous chromosomes have SNPs. This inevitably leads to the discovery of SNPs - any single base sequence difference.

We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exon (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.

GenBank has a dbSNP "As of Mar. 2007, dbSNP has submissions for 31,035,607 human SNPs." It is possible to search dbSNP by BLAST comparisons to a target sequence.

>gnl|dbSNP|rs1042574_allelePos=51 total len = 101 |taxid = 9606|snpClass = 1
Length = 101
Score = 149 bits (75), Expect = 3e-33
Identities = 79/81 (97%)
Strand = Plus / Plus
Query: 1489 ccctcttccctgacctcccaactctaaagccaagcactttatatttttctcttagatatt 1548
            ||||||||||||||||||||||||||||||||||||||||||||||| || |||||||||
Sbjct: 1    ccctcttccctgacctcccaactctaaagccaagcactttatattttcctyttagatatt 60
Query: 1549 cactaaggacttaaaataaaa 1569
            |||||||||||||||||||||
Sbjct: 61   cactaaggacttaaaataaaa 81
If a matching SNP is found, then it can be directly located on the Genome map.
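To see where the two non-identical positions in this alignment fall, the aligned sequences can be compared base by base. The sketch below uses the query and subject strings from the BLAST output; the small IUPAC table and the function name are additions for illustration. A subject base such as `y` is an ambiguity code (C or T) marking the polymorphic site.

```python
# IUPAC nucleotide codes (lowercase, as in the BLAST output); two-base codes only
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
}

query = ("ccctcttccctgacctcccaactctaaagccaagcactttatatttttctcttagatatt"
         "cactaaggacttaaaataaaa")
sbjct = ("ccctcttccctgacctcccaactctaaagccaagcactttatattttcctyttagatatt"
         "cactaaggacttaaaataaaa")

def compare(query, sbjct):
    """Return (identities, differences) for two aligned, ungapped sequences.

    Each difference is (subject position, query base, subject base,
    whether the query base is one of the alleles the subject code allows).
    """
    identities = 0
    diffs = []
    for pos, (q, s) in enumerate(zip(query, sbjct), start=1):
        if q == s:
            identities += 1
        else:
            diffs.append((pos, q, s, q in IUPAC.get(s, "")))
    return identities, diffs

identities, diffs = compare(query, sbjct)
print(f"Identities = {identities}/{len(query)}")
for pos, q, s, compatible in diffs:
    print(pos, q, s, "allele of the SNP" if compatible else "plain mismatch")
```

Subject position 51 carries the ambiguity code, which agrees with `allelePos=51` in the hit's header line; the other difference is an ordinary mismatch.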

Linkage Meiosis (sexual cell division) involves a process of crossing over, which gives new combinations of alleles Genes that are located close to each other on the chromosome rarely show recombination of alleles

HapMap Project The HapMap Project tests linkage between SNPs in various sub-populations. For a group of linked SNPs recombination may be rare over tens of thousands of bases A few "tag SNPs" can be used to identify genotypes for groups of linked SNPs Makes it possible to survey the whole genome with fewer markers (1/3-1/10th)

Haplotype Linkage is common in the human population, particularly in genetically isolated sub-populations. A group of alleles for neighboring genes on a segment of a chromosome are very often inherited together. Such a combination of linked alleles is known as a haplotype. When linked alleles occur together in a population more often than expected by chance, this is called linkage disequilibrium (LD).
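Linkage disequilibrium between two biallelic loci is usually summarised by D, D' and r². The sketch below computes these standard measures from one haplotype frequency and the two allele frequencies; the function name and the example frequencies are hypothetical, and both loci are assumed to be polymorphic.

```python
def ld_stats(p_ab, p_a, p_b):
    """D, D' and r^2 for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of alleles A and B (both strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies
print(ld_stats(0.50, 0.5, 0.5))  # complete LD: only AB and ab haplotypes exist
print(ld_stats(0.25, 0.5, 0.5))  # linkage equilibrium: alleles are independent
print(ld_stats(0.40, 0.5, 0.5))  # partial LD
```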

Haplotype Map of the Human Genome Goals: Define patterns of genetic variation across human genome Guide selection of SNPs efficiently to “tag” common variants Public release of all data (assays, genotypes) Phase I: 1.3 M markers in 269 people Phase II: +2.8 M markers in 270 people

HapMap Samples 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI) 90 individuals (30 trios) of European descent from Utah (CEU) 45 Han Chinese individuals from Beijing (CHB) 45 Japanese individuals from Tokyo (JPT)

Recombination hotspots: widespread - LD structure 7q21

Common Haplotypes For a single locus in a population, 55 percent of people may have one version of a haplotype, 30 percent may have another, 8 percent may have a third, and the rest may have a variety of less common haplotypes. These haplotype blocks may contain 5-20 SNPs

Common Haplotypes All of these haplotypes can be identified by genotyping 1-3 "tag SNPs". The number of tag SNPs needed to capture most of the information about the patterns of human genetic variation is estimated at about 300,000 to 600,000, far fewer than the 10 million common SNPs.
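The idea of tag SNPs can be illustrated with a small brute-force search: given the common haplotypes of one block, find the smallest set of SNP positions whose alleles tell them all apart. This is a sketch under simple assumptions (distinct haplotypes of equal length, small blocks), not the selection procedure used by the HapMap Project; the haplotype block is invented.

```python
from itertools import combinations

def find_tag_snps(haplotypes):
    """Smallest set of 0-based SNP positions that distinguishes all haplotypes.

    haplotypes: list of distinct, equal-length allele strings for one block.
    Brute force over subsets, so only suitable for blocks of a few dozen SNPs.
    """
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    return tuple(range(n_snps))  # fallback if haplotypes are not distinct

# Hypothetical block of 5 SNPs with 4 common haplotypes
block = ["ACGTA", "ATGCA", "GCGTC", "GTACC"]
tags = find_tag_snps(block)
print(tags)
```

Here two positions suffice to identify all four haplotypes, so the remaining three SNPs in the block would not need to be genotyped.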

Applications of HapMap Pick better SNPs for genotyping study Choose SNPs with high heterozygosity in target population Whole genome coverage with reduced set of "tag SNPs" (capture all "common variants") Interpret genotyping results What genes are in LD with this SNP? What coding variants and putative functional variants are in LD with this SNP?

Example: Complement Factor H - AMD rs380390

SNP Testing Genotyping SNPs are permanent features of genomic DNA May be homozygous or heterozygous Many different technologies are available

Genotyping Technologies Sequencing (whole genome or targeted) PCR (allele specific primers) Oligonucleotide ligation Primer extension (incorporate labeled nucleotides) Hybridization (microarray)

TaqMan - rtPCR 4 oligos must be designed and tested for each SNP Fast & cheap for lots of samples

Primer Extension

Oligonucleotide Ligation (ABI) can multiplex 48 SNPs

Preliminary data from Affy 10K SNP

Microarrays Screening large numbers of SNP markers on a sample of genomic DNA is one highly promising application for microarray technology. Many other “high-throughput” SNP genotyping technologies are under development. Affymetrix 1 million SNP product on sale now!

Comparison of Methods? Array-based methods can cover the whole genome. PCR (& variants) are cheaper for defined numbers of SNPs on lots of samples. Whole genome: may be too much data, false positives, privacy concerns. Whole genome may work for discovery research, but clinical applications favor targeted assays.

Pharmacogenomics The use of DNA sequence information to measure and predict the reaction of individuals to drugs. Personalized drugs. Faster clinical trials. Fewer drug side effects.

Some Gene Products Interact with Drugs There are proteins that chemically activate or inactivate drugs. Other proteins can directly enhance or block a drug's activity. There are also genes that control side effects

Example: 10% of African Americans have polymorphic alleles of glucose-6-phosphate dehydrogenase that lead to haemolytic anemia when they are given the anti-malarial drug primaquine.

Collect Drug Response Data These drug response phenotypes are associated with a set of specific gene alleles. Identify populations of people who show specific responses to a drug. In early clinical trials, it is possible to identify people who react well and react poorly.

Make Genetic Profiles Scan these populations with a large number of SNP markers. Find markers linked to drug response phenotypes. It is interesting, but not necessary, to identify the exact genes involved. Can work with “associated populations”; does not require detailed information on disease in family history (pedigree).

Huge Database Problem Physicians collect tons of data: patient age, sex, weight, blood pressure, family disease history, date of symptom onset. Cancer data: tumor size, location, stage, etc. Data specific to each type of disease. Now integrate thousands (or 100K’s) of SNPs that are correlated with some of these clinical factors in complex relationships.

Use the Profiles Genetic profiles of new patients can then be used to prescribe drugs more effectively & avoid adverse reactions. Can also speed clinical trials by testing on those who are likely to respond well. Can "rescue" drugs that don't work well on everybody, or that have bad side effects on a few.

Real World Applications Most of the major pharmaceutical companies are currently collecting pharmacogenomic data in their clinical trials. Data is yet to be published. Genetic indications for drug use are becoming available. Plan to sell the drug with the gene test

Multi-locus SNP Profiles There will be a few hundred to a few thousand SNPs linked to medically important alleles in the next ~10 years. Haplotypes will reduce the number that need to be screened (one SNP gives information about a group of linked genes) Some genes will turn out to be involved in many important pathways

Will People Want This Information?? Genetic determinism and possible discrimination. Even a simple test to see what drug you should take could reveal information about your risk of cancer or heart disease.

A tour in FBAT testing
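The FBAT software implements a broad class of family-based association tests, of which the classical transmission disequilibrium test (TDT) is the simplest special case. As a sketch of that underlying idea only (this is not FBAT's interface, and the counts are hypothetical), the TDT compares how often heterozygous parents transmit one allele versus the other to affected offspring.

```python
import math

def tdt(transmitted, untransmitted):
    """Classical TDT (McNemar-type) statistic for one biallelic marker.

    transmitted:   times heterozygous parents passed allele A to an affected child
    untransmitted: times they passed the other allele instead
    Returns (chi-square statistic with 1 df, p-value).
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts from 100 informative transmissions
tdt_chi2, tdt_p = tdt(60, 40)
print(f"TDT chi2 = {tdt_chi2:.2f}, p = {tdt_p:.4f}")
```

Under the null hypothesis of no linkage or association, each allele is transmitted half of the time, so a large imbalance between the two counts is evidence against the null.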

A tour in Python

Homework Assignment 4 (R) check website for exercise and supplementary info: due 28 Oct

Homework Assignment 6 (FBAT) check website: due 4 Nov