Online Resources for Genetic Variation Study – Part One Yi-Bu Chen, Ph.D. Bioinformatics Specialist Norris Medical Library University of Southern California.


Online Resources for Genetic Variation Study – Part One Yi-Bu Chen, Ph.D. Bioinformatics Specialist Norris Medical Library University of Southern California Sept. 6, 2007

Workshop Outline
- Overview of Bioinformatics Support Program at NML
- Human Genetic Variation Overview
  - Main types of genetic variations
  - Basics of the single nucleotide polymorphisms (SNPs)
- NCBI Genetic Variation Resources: dbSNP and OMIM
  - dbSNP overview
  - dbSNP search examples
  - OMIM overview
- International HapMap Project
  - The HapMap project: overview and major findings
  - HapMap search examples
- The Perlegen Genetic Variation Database
- Genome Variation Server (SeattleSNPs)
- Ensembl SNPs
- Hands-on Search Question

Polymorphisms: How different are we?
- Human vs. Chimp: ~96% similar including other forms of variation (~99% similar in terms of SNPs)
- Human vs. Human: ~99.9% similar, or around 3 million single nucleotide differences
Adapted from a lecture slide by Jonathan Wren, NYU

Why do we care about genetic variations?
1. Genetic variations underlie phenotypic differences among different individuals
2. Genetic variations determine our predisposition to complex diseases and responses to drugs and environmental factors
3. Genetic variations reveal clues of ancestral human migration history

Main Types of Genetic Variations
A. Single nucleotide mutation
- Results in single nucleotide polymorphisms (SNPs)
- Accounts for 90% of human genetic variation
- The majority of SNPs do NOT directly and significantly contribute to any phenotype
B. Insertion or deletion of one or more nucleotide(s)
1. Tandem repeat polymorphisms
- Tandem repeats are genomic regions consisting of sequence motifs of variable length repeated in tandem with variable copy number.
- Used as genetic markers for DNA fingerprinting (forensics, parentage testing)
- Many cause genetic diseases
- Microsatellites (Short Tandem Repeats): repeat unit 1-6 bases long
- Minisatellites: repeat units longer than those of microsatellites
2. Insertion/Deletion (INDEL or DIPs) polymorphisms
- Often result from localized rearrangements between homologous tandem repeats.
C. Gross chromosomal aberration
- Deletions, inversions, or translocations of large DNA fragments
- Rare but often cause serious genetic diseases

How many variations are present in the human genome?
- SNPs appear on average about once per 300 bp. Considering the size of the entire human genome (3.2 × 10^9 bp), the total number of SNPs is well above 10 million.
- In silico estimates put the number of potentially polymorphic variable number tandem repeats (VNTRs) at over 100,000 across the human genome.
- Short insertions/deletions are very difficult to quantify; their number is likely to fall between those of SNPs and VNTRs.
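
The estimate on this slide is simple arithmetic, and it can be checked with a short sketch. The function name is an illustrative assumption; the genome size and the one-per-300-bp spacing are the figures quoted on the slide.

```python
def estimated_snp_count(genome_size_bp, avg_spacing_bp):
    """Estimate the number of SNPs from genome size and average SNP spacing."""
    return genome_size_bp / avg_spacing_bp


# Figures from the slide: 3.2 x 10^9 bp genome, about 1 SNP per 300 bp.
count = estimated_snp_count(3.2e9, 300)
print(f"about {count / 1e6:.1f} million SNPs")
```

The result is a little under 11 million, which agrees with the slide's "well above 10 million".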

Types of Single Base Substitutions
- Transitions: change of one purine (A, G) for another purine, or of one pyrimidine (C, T) for another pyrimidine
- Transversions: change of a purine (A, G) for a pyrimidine (C, T), or vice versa
- The cytosine to thymine (C>T) transition accounts for approximately 2 out of every 3 SNPs in the human genome.
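
The two substitution classes can be told apart mechanically. The following is a minimal sketch; the function name is an assumption made for illustration, and only the four standard DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    # Same chemical class (purine to purine, or pyrimidine to pyrimidine)
    # is a transition; crossing classes is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(substitution_type("C", "T"))  # the common C>T change
print(substitution_type("A", "T"))
```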

SNP or Mutation?
- Call it a SNP if the single base change occurs in a population at a frequency of 1% or higher.
- Call it a mutation if the single base change occurs in less than 1% of a population.
- A SNP is a polymorphic position where the point mutation has become established in the population.
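
The 1% rule of thumb above translates directly into a threshold check. This is a sketch only; the function name and the default parameter are illustrative assumptions, and the frequency is expected as a fraction between 0 and 1.

```python
def classify_single_base_change(allele_frequency, threshold=0.01):
    """Label a single base change as 'SNP' or 'mutation' by population frequency."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must be between 0 and 1")
    # At or above the threshold (1% by default) the change counts as a SNP.
    return "SNP" if allele_frequency >= threshold else "mutation"


print(classify_single_base_change(0.25))
print(classify_single_base_change(0.002))
```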

From a Mutation to a SNP

SNPs Classification
SNPs can occur anywhere in a genome and are classified by location.
- Intergenic region
- Gene region, which can be further classified as promoter region, UTR, intronic region, or exonic (coding) region

Coding Region SNPs
- Synonymous
- Non-synonymous
  - Missense: changes an amino acid
  - Nonsense: changes an amino acid codon to a stop codon
Geospiza Green Arrow™ tutorial by Sandra Porter, Ph.D.
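
The classification above can be sketched by translating the reference and variant codons with the standard genetic code and comparing the results. The function name is an illustrative assumption. The sketch works on DNA codons that differ at exactly one base and ignores strand, reading frame, and alternative transcripts.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base codon change as synonymous, missense, or nonsense."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must be 3 bases long and differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    # Any other amino acid change, including loss of a stop codon,
    # is reported as missense in this simplified sketch.
    return "missense"


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop
```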

Consequences of SNPs The phenotypic consequence of a SNP is significantly affected by the location where it occurs, as well as the nature of the mutation.  No consequence  Affect gene transcription quantitatively or qualitatively.  Affect gene translation quantitatively or qualitatively.  Change protein structure and functions.  Change gene regulation at different steps.

Simple/Complex Genetic Diseases and SNPs
- Simple genetic diseases (Mendelian diseases) are often caused by mutations of a single gene, e.g. Huntington's, cystic fibrosis, PKU.
- Many complex diseases result from mutations in multiple genes, the interactions among those genes, and their interactions with environmental factors, e.g. cancers, heart disease, Alzheimer's, diabetes, asthma.
- The majority of SNPs may not directly cause any disease.
- SNPs are ideal genomic markers (dense and easy to assay) for locating disease loci in association studies.

Main Genetic Variation Resources
- NCBI dbSNP
- NCBI Online Mendelian Inheritance in Man (OMIM)
- International HapMap Project
- Perlegen
- Genome Variation Server (SeattleSNPs)

Where to Find Bioinformatics Resources for Genetic Variation Studies?  OBRC: Online Bioinformatics Resources Collection (Univ. of Pittsburgh) The most comprehensive annotated bioinformatics databases and software tools collection on the Web, with over 200 resources relevant to genetic variation studies.  HUGO Mutation Database Initiative

NCBI dbSNP Database: Overview
- URL:
- The NCBI's Single Nucleotide Polymorphism database (dbSNP) is the largest and primary public-domain archive for simple genetic variation data.
- The polymorphism data in dbSNP includes:
  - Single-base nucleotide substitutions (SNPs)
  - Small-scale multi-base deletion or insertion variations (also called deletion insertion polymorphisms or DIPs)
  - Microsatellite tandem repeat variations (also called short tandem repeats or STRs)

dbSNP Data Stats (March, 2007)

dbSNP Data Types
dbSNP contains two classes of records:
- Submitted record: the original observation of sequence variation. Submitted SNP (SS) record identifiers start with ss.
- Computationally annotated record: generated during the dbSNP "build" cycle by computation based on the original submitted data. Reference SNP cluster (refSNP) identifiers start with rs.

dbSNP Submitted Record
- Provides information on the SNP and the conditions under which it was collected.
- Provides links to collection methods (assay technique), submitter information (contact data, individual submitter), and variation data (frequencies, genotypes).

From Submitted Record to Reference SNP Cluster
1. SNP records are submitted by researchers.
2. The SNP position is mapped to the reference genomic contigs.
3. If the SNP position is unique, a new RS# is assigned.
4. If the SNP position is not unique, the record is assigned to the existing RefSNP cluster.
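
The clustering rule above can be illustrated with a toy sketch that groups submitted records by mapped position. This is not dbSNP's actual build pipeline. The function name, the record layout, and every identifier and coordinate in the example are hypothetical.

```python
def assign_refsnp_clusters(submissions, first_rs_number=1):
    """Map each submitted (ss) record to a reference (rs) cluster by position.

    `submissions` is an iterable of (ss_id, chromosome, position) tuples,
    where the position is assumed to be already mapped to the reference.
    """
    cluster_by_position = {}  # (chromosome, position) -> rs identifier
    assignment = {}           # ss identifier -> rs identifier
    next_rs = first_rs_number
    for ss_id, chromosome, position in submissions:
        key = (chromosome, position)
        if key not in cluster_by_position:
            # Unique position: open a new reference SNP cluster.
            cluster_by_position[key] = f"rs{next_rs}"
            next_rs += 1
        # Known position: join the existing cluster.
        assignment[ss_id] = cluster_by_position[key]
    return assignment


# Hypothetical submissions; the IDs and coordinates are made up for illustration.
submitted = [
    ("ss-a", "17", 1000),
    ("ss-b", "17", 2000),
    ("ss-c", "17", 1000),  # same position as ss-a, so it joins that cluster
]
print(assign_refsnp_clusters(submitted))
```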

Search dbSNP: Example 1 Mutations on human BRCA1 gene have been reported to be involved in the early onset of breast cancer. Retrieve all validated non-synonymous coding reference SNPs for BRCA1 from dbSNP.

Different Ways to Search SNPs in dbSNP
- dbSNP Web site: direct search of SS records; batch search; allows SNP record submission; NO search limits
- Entrez SNP: search limit options allow precise retrieval
- Entrez Gene record's SNP Links Out feature: direct links to corresponding SNP records; access to genotype and linkage disequilibrium data
- NCBI's MapViewer: visualize SNPs in the genomic context along with other types of genetic data

Search SNPs from dbSNP Web Page  dbSNP Web site

Search SNPs from Entrez SNP Web Page
- Entrez SNP: dbSNP is part of the Entrez integrated information retrieval system and may be searched using either qualifiers (aliases) or a combination of search limits from 14 different categories.

Entrez SNP Search Limits
- Organisms
- Chromosome (including W and Z for non-mammals)
- Chromosome Ranges
- Map Weight (how many times in genome)
- Function Class (coding non-synonymous; intron; etc.)
- SNP Class (types of variations)
- Method Class (methods for determining the variations)
- Validation Status (if and how the data is validated)
- Variation Alleles (using IUPAC codes)
- Annotation (records with links to other NCBI databases)
- Heterozygosity (% of heterozygous genotype)
- Success Rate (likelihood that the SNP is real)
- Created Build ID
- Updated Build ID
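
Conceptually, search limits act as a conjunction of filters over SNP records. The sketch below shows that idea on a small in-memory list. It does not call Entrez, and the field names, record layout, and record IDs are hypothetical, chosen only to mirror a few of the limit categories listed above.

```python
def apply_search_limits(records, **limits):
    """Keep only the records whose fields match every requested limit."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in limits.items())
    ]


# Hypothetical records; the IDs and field names are placeholders, not dbSNP data.
records = [
    {"id": "example-1", "gene": "BRCA1", "function_class": "coding non-synonymous", "validated": True},
    {"id": "example-2", "gene": "BRCA1", "function_class": "intron", "validated": True},
    {"id": "example-3", "gene": "BRCA1", "function_class": "coding non-synonymous", "validated": False},
]

hits = apply_search_limits(
    records,
    gene="BRCA1",
    function_class="coding non-synonymous",
    validated=True,
)
print([record["id"] for record in hits])
```

Combining a gene name with the Function Class and Validation Status limits is the same narrowing that Search Example 1 asks for.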

Entrez SNP Search Results Example 1

dbSNP Ref SNP Record Example 1: Summary This Ref SNP cluster contains multiple submitted SNP records from different groups

dbSNP Ref SNP Record Example 1: SNP position and the flank region

dbSNP Ref SNP Record Example 1: GeneView of an individual SNP Because of alternative splicing, the very same SNP can be located in different regions of the transcripts.

dbSNP Ref SNP Record Example 1: TableView of an individual SNP Notice that the individual SNP is mapped to the same position on the reference genomic contig, but different positions on mRNAs and proteins due to alternative splicing.

dbSNP Ref SNP Record Example 1: Links to Various Annotated NCBI Databases Link to the OMIM record where documented clinical and genetic data of this SNP can be found.

dbSNP Ref SNP Record Example 1: Population Allele Frequency, Genotype and Heterozygosity Data Link to the detailed population genotype data.

dbSNP Ref SNP Record Example 1: GeneView and SequenceView of ALL SNPs

dbSNP Ref SNP Record Example 1: Links to View SNP on 3D Structure, Conserved Domains, and Multiple Sequence Alignment

Search dbSNP: Example 2 Mutations in Dopamine Receptor 5 (DRD5) gene have been observed in patients with various neurological disorders. Find how many refSNP records have been reported for DRD5. Show all refSNPs in the context of a chromosome.

Search dbSNP: SNP Links from Entrez Gene Record

Search dbSNP: SNP Display Using NCBI Map Viewer

Search dbSNP: Configure Map Viewer to Display other Relevant Data

SNPs Display in Map Viewer: Legend Click on any column headings to see the refSNPs legend.

SNPs Display in Map Viewer: Legend

Online Mendelian Inheritance in Man (OMIM): A Brief Overview
- URL:
- OMIM is a database of human genetic disorders, built and curated using results from published studies.
- Each OMIM record summarizes the current state of knowledge of the genetic basis of a disorder and contains the following information: description and clinical features of a disorder or of a gene involved in genetic disorders; biochemical and other features; cytogenetics and mapping; molecular and population genetics; diagnosis and clinical management; animal models for the disorder; allelic variants.
- OMIM is searchable via NCBI Entrez, and its records are cross-linked to other NCBI resources.

Online Mendelian Inheritance in Man Stats

OMIM: Allelic Variants
- The OMIM database includes genetic disorders caused by various types of mutation and variation, from SNPs to large-scale chromosomal abnormalities.
- The listed allelic variants are searchable through the "Allelic Variants" field. They include single nucleotide substitutions (SNPs), small insertions and deletions (INDEL/DIPs), and frame shifts caused by these INDELs.
- Allelic variants are represented by a 10-digit OMIM number and can be searched in two ways:
  - Search for a gene or a disease and, once the record is retrieved, view its allelic variants.
  - Use the Limits to narrow your search: retrieve only records that contain allelic variant information, or search for particular terms within the allelic variants field.

Notes on OMIM Allelic Variants
- For most genes, only selected mutations are included as specific subentries. Criteria for inclusion include: the first mutation to be discovered, high population frequency, distinctive phenotype, historic significance, unusual mechanism of mutation, unusual pathogenetic mechanism, and distinctive inheritance.
- Most of the allelic variants represent disease-producing mutations, NOT polymorphisms.
- A few polymorphisms are included, many of which show a positive statistical correlation with particular common disorders.
- The majority of neutral polymorphisms are NOT included in OMIM.

Assessing Polymorphisms: Genotypes and Genotyping  Genotype: Each person has two copies of all chromosomes except the sex chromosomes. The set of alleles that a person has is called a genotype.  Genotyping: A method that discovers what genotype a person has.  Whole-genome genotyping of all SNPs in a human genome? (11.8 million and counting)  Technologically daunting  Prohibitively expensive and time consuming

Assessing Polymorphisms: the Origin of Haplotype  Two ancestral chromosomes scrambled through recombination over many generations to yield different descendant chromosomes.  If a genetic variant marked by the X on the ancestral chromosome increases the risk of a particular disease, the two descendants who inherit that part of the ancestral chromosome will be at increased risk.  Adjacent to the variant marked by the X are many SNPs that can be used to identify the location of the variant.  Haplotype: A particular combination of alleles along a chromosome that tends to be inherited as a unit.

Assessing Polymorphisms: Linkage Disequilibrium, Haplotype Block, and Tag SNPs
- Linkage Disequilibrium (LD): if two alleles tend to be inherited together more often than would be expected by chance, the alleles are in linkage disequilibrium.
- If the majority of SNPs are highly correlated with one or more of their neighbors, these correlations can be used to generate haplotypes, which are excellent proxies for individual SNPs.
- Because haplotypes may be identified by a much smaller number of SNPs (tag SNPs), assessing polymorphisms via haplotypes dramatically reduces genotyping work.
Adapted from Nature 426, 6968: (2003)
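
To make "inherited together more often than expected" concrete, the sketch below computes the standard pairwise LD measures D, |D'|, and r² from phased two-SNP haplotypes. The function name and the input format (two-character strings, one character per SNP) are assumptions made for illustration. The formulas are the textbook definitions, not a method taken from these slides.

```python
def ld_stats(haplotypes, allele1, allele2):
    """Return (D, |D'|, r^2) for two biallelic SNPs from phased haplotypes.

    `haplotypes` is a list of two-character strings; character 0 is the allele
    at the first SNP and character 1 the allele at the second SNP.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele1 for h in haplotypes) / n   # frequency of allele1 at SNP 1
    p_b = sum(h[1] == allele2 for h in haplotypes) / n   # frequency of allele2 at SNP 2
    p_ab = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    d = p_ab - p_a * p_b                                 # observed minus expected
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    r_squared = d * d / denominator
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r_squared


# Two SNPs whose alleles always travel together: complete LD.
print(ld_stats(["AG"] * 5 + ["CT"] * 5, "A", "G"))
# All four allele combinations equally frequent: no LD.
print(ld_stats(["AG", "AT", "CG", "CT"], "A", "G"))
```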

Assessing Polymorphisms: Tag SNPs
- Tag SNP: a representative SNP that allows other SNPs in its "neighborhood" (in terms of both distance and genealogy) to be inferred or predicted.
- An r² of 0.8 or greater is sufficient for tag SNP mapping to obtain good coverage of untyped SNPs.
- Tag SNPs allow genotyping of a smaller number of marker SNPs with very small losses in power.
- If LD between SNPs is low, almost every SNP might have to be genotyped to capture all variation information.
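
One simple way to pick tag SNPs at the r² ≥ 0.8 threshold mentioned above is a greedy cover: repeatedly choose the SNP that captures the most SNPs not yet captured. The sketch below is an illustrative assumption, not the algorithm used by the HapMap tag SNP picker or any other tool named in this workshop. The SNP names and r² values in the example are made up.

```python
def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedily choose tag SNPs so every SNP is a tag or has r^2 >= threshold with one.

    `r2` maps unordered SNP pairs, given as (a, b) tuples, to their r^2 value;
    missing pairs are treated as r^2 = 0.
    """
    def captures(a, b):
        return a == b or r2.get((a, b), r2.get((b, a), 0.0)) >= threshold

    uncovered = set(snps)
    tags = []
    while uncovered:
        # Pick the SNP that captures the most still-uncovered SNPs
        # (ties go to the SNP listed first).
        best = max(snps, key=lambda s: sum(captures(s, u) for u in uncovered))
        tags.append(best)
        uncovered -= {u for u in uncovered if captures(best, u)}
    return tags


# Hypothetical SNPs and pairwise r^2 values.
snps = ["snp1", "snp2", "snp3", "snp4"]
pairwise_r2 = {("snp1", "snp2"): 0.95, ("snp2", "snp3"): 0.85, ("snp3", "snp4"): 0.10}
print(pick_tag_snps(snps, pairwise_r2))
```

In this example snp2 captures snp1 and snp3, while snp4 is in low LD with everything and has to be genotyped itself, which illustrates the last bullet above.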

The International HapMap Project
Goals
- Create a public genome-wide database of common human genetic variation in the context of geographic distribution
- Provide such information to guide genetic studies of clinical phenotypes
Phase I (Oct. 2002)
- One million common SNPs (one every 5 kb across the genome) were genotyped in 269 DNA samples from four populations.
- Common SNPs: Minor Allele Frequency ≥ 0.05
- YRI: Yoruba in Nigeria (30 trios); CEU: Utah residents with European ancestry (30 trios); CHB: 45 Han Chinese; JPT: 44 Japanese
Phase II
- An additional 4.6 million SNPs are genotyped.
ENCODE (Encyclopedia of DNA Elements)
- Collection of ten regions, each 500 kb in length.
- Each 500 kb region was re-sequenced and all SNPs were genotyped.
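
The "common SNP" criterion (minor allele frequency ≥ 0.05) can be checked from a list of genotypes. The sketch below assumes a biallelic SNP and genotypes written as two-character strings such as "AT". The function names and the example genotype counts are illustrative, not HapMap data.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for a biallelic SNP.

    `genotypes` is a list of two-character strings, one per individual.
    """
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(allele_counts) < 2:
        return 0.0  # only one allele observed: monomorphic in this sample
    return min(allele_counts.values()) / sum(allele_counts.values())


def is_common_snp(genotypes, maf_threshold=0.05):
    """Apply the common-SNP criterion quoted on the slide (MAF >= 0.05)."""
    return minor_allele_frequency(genotypes) >= maf_threshold


# Hypothetical sample: 8 AA and 2 AT individuals, so T is 2 of 20 alleles.
sample = ["AA"] * 8 + ["AT"] * 2
print(minor_allele_frequency(sample), is_common_snp(sample))
```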

HapMap Progress PHASE I – completed  1,000,000 SNPs successfully typed in all 270 HapMap samples  At least one common SNP every 5 kb across the genome  ENCODE variation reference resource available PHASE II – data generation complete, about 4.6 million SNPs typed in total. ENCODE-HAPMAP – A much more detailed variation resource  48 samples sequenced  All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples  Current data set – average 1 SNP every 279 bp

HapMap Data Overview
- Basic data: genotypes of the 270 individual samples (frequencies of SNP alleles and genotypes in each population)
- Recent data release (Full Data Set): January 11, 2007, NCBI B35 (includes both Phase I & II data, genotypes from Illumina 100k and 300k genotyping arrays, and the Affymetrix nsSNPs)
- Phase I: 600,000 common SNPs in 270 individuals
- Phase II: 4-5 million SNPs in the same individuals
Available for bulk download:
- All genotype data, haplotype phasing data (from PHASE)
- Pedigree trio files
- Raw LD data (D′, r²), recombination rates and hotspots
- Allele and genotype frequencies
- SNP assays and protocols
- Allocated SNPs (dbSNP reference clusters chosen for genotyping)
Adapted from Alanna Morrison, Human Genetics Center, Feb lecture

Major Findings of the HapMap Project
- Extensive redundancy of SNPs: over 90% of all SNPs on the map have a highly statistically significant correlation to one or more neighbors.
- Confirmed the generality of recombination hotspots and long segments of strong LD (haplotype blocks), with the average block length ranging from 7.3 kb (YRI) to 16.3 kb (CEU), and 65-85% of the human genome contained in such blocks.
- Revealed limited haplotype diversity: while each haplotype block contains many SNPs, on average only a few common haplotypes exist per block, and these can be identified by a smaller number of SNPs (tag SNPs).
- The density of common SNPs can be reduced by 75-90% with essentially no loss of information. That is, the genotyping burden can be reduced from one common SNP every 500 bp to one SNP every 2 kb (YRI) to 5 kb (CEU and CHB/JPT).
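
The 75-90% figure in the last bullet follows directly from the SNP spacings quoted there, as this short check shows. The function name is an illustrative assumption.

```python
def density_reduction(original_spacing_bp, reduced_spacing_bp):
    """Fraction of SNPs saved when spacing widens from the original to the reduced value."""
    return 1 - original_spacing_bp / reduced_spacing_bp


# Spacings from the slide: one SNP per 500 bp, reduced to one per 2 kb or 5 kb.
print(f"YRI: {density_reduction(500, 2000):.0%}")
print(f"CEU and CHB/JPT: {density_reduction(500, 5000):.0%}")
```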

What can you do from the HapMap Web Site?  Search for SNPs in a gene or any region of interest (ROI).  View patterns of LD in the ROI.  Select tagSNPs in the ROI.  Download information on the SNPs in ROI for genotype/haplotype data analysis and visualization in Haploview or other software.  Generate and retrieve customized subset data.  Download the entire data set in bulk.

Search HapMap: Example 1 SNPs in human BRCA1 gene have been reported to be involved in the early onset of breast cancer. Find all available genotype and LD data for SNPs documented for BRCA1 in HapMap database.

HapMap Search Example 1 Step 1: Open the Genome Browser with the Latest Full Data Set Click “HapMap Genome Browser (B35 full data set)”

HapMap Search Example 1 Step 2: Specify the landmark/region of interest Enter gene name “brca1” to specify the region of your interest When there are multiple transcripts, click one of your choice

HapMap Search Example 1 Step 3: Examine and determine the desired region for display
- Examine the region for display using different scales
- Genotyped SNPs in the region; the pie chart shows allele frequencies (ref vs. other)
- Genotype frequency
- The mRNA

HapMap Search Example 1 Step 4: Select the desired tracks for display Select the desired analysis results for display Click “Update Image” once the configuration is done

HapMap Search Example 1 Step 5: Configure the tag SNP Picker Select the desired population Select the desired tagging methods Select r2 value to set desired stringency Set MAF for the lowest threshold of alleles to be captured by the tagged SNPs Specify SNPs to be included/excluded as tagged SNPs

HapMap Search Example 1 Step 6: Configure the LD Plot Configure LD plot display Select LD measurement and range Customize the color display for LD value Select desired populations

HapMap Search Example 1 Step 7: Tag SNPs and LD Plot Genotyped SNPs in the region LD plot shows LD between different pairs of SNPs Tagged SNPs based on your criteria

HapMap Search Example 1 Step 8: Download various data and files Select desired data or file for download Click “Go” The genotype data can be used for in-depth LD and haplotype analysis with the free Haploview program.

Haploview

Haploview Screenshots

HapMap Data Extraction using HapMart Select desired population

HapMap Data Extraction using HapMart: Data filter and export

Perlegen Sciences
- Founded in 2000 with the mission of identifying clinically relevant patterns of genetic variation.
- Over 1.6 million common SNPs genotyped in 71 individuals from 3 American populations of European, African and Asian ancestry (about 1 SNP/1871 bp)
- GWA studies on over 100,000 different human individuals.
- Re-sequenced the nuclear DNA genomes of 15 inbred laboratory mouse strains and generated genotype data.
- Specialized Mouse Genome Browser allows users to visualize the SNPs and LR-PCR primer pairs and access the SNP genotypes for the 15 strains

Perlegen Human Genotype Browser

Perlegen Human Genotype Browser

Genome Variation Server (SeattleSNPs)
- Hosts raw genotyping data for 4.5 million human SNPs from HapMap, Perlegen, and other projects.
- Generated SNP data on candidate genes involved in cardiovascular diseases and inflammatory processes.
- Tools for searching, visualization and analysis of genotype data for association studies.
- Merging SNP data sets from different populations.

Using Genome Variation Server Detailed online tutorial Select the search type to start the search upload your genotype data for analysis

GVS Search Example: rs (FTO gene)  Step 1: Select query type

GVS Search Example: rs (FTO gene)  Step 2: Select population(s)

GVS Search Example: rs (FTO gene)  Step 3: Configure parameters

GVS Search Example: rs (FTO gene)  Step 4: Display Results—Genotype data

GVS Search Example: rs (FTO gene)  Step 4: Display Results—Genotype data SNP ID Sample rs

GVS Search Example: rs (FTO gene)  Step 5: Display results—TagSNPs TagSNPs Table Display

GVS Search Example: rs (FTO gene)  Step 5: Display results—TagSNPs Bin TagSNPs Graphic Display

GVS Search Example: rs (FTO gene)  Step 6: Display results—LD

GVS Search Example: rs (FTO gene)  Step 7: Display results—Summary

SNPs in Ensembl
- Most SNPs are imported from dbSNP (rs…):
  - Imported data: alleles, flanking sequences, frequencies, …
  - Calculated data: position, synonymous status, peptide shift, …
- For human also:
  - HGVbase
  - TSC
  - Affy GeneChip 100K and 500K Mapping Array
  - Ensembl-called SNPs (from Celera reads)
- For mouse and rat also: Sanger- and Ensembl-called SNPs

SNPs in Ensembl MapView: SNP density on chromosome

SNPs in Ensembl ContigView: SNPs in genomic context

SNPs in Ensembl GeneSeqView: SNPs in genomic sequence

SNPs in Ensembl TransView & ProtView: SNPs in transcript/ protein

SNPs in Ensembl What SNPs does my gene contain? > GeneSNPView

SNPs in Ensembl Info about one specific SNP? > SNPView: SNP Report Genotype and allele frequencies per population Located in transcripts SNP Context Individual genotypes

User Question A recent report (Frayling et al. Science 2007) found that a common variant (rs , A>T) in the FTO gene (fat mass and obesity associated) is associated with body mass index and predisposes to obesity and diabetes. The 16% of adults who were homozygous for the risk allele A weighed about 3 kg more and had 1.67-fold increased odds of obesity compared to those without the risk allele. Use HapMap and dbSNP to find the genotype data of this SNP in different populations.

Answer 1: Searching HapMap Use the refSNP# (must start with rs) as the landmark for the search Click on the pie chart for detailed population genotype data

Answer 1: Searching HapMap Population genotype data of the homozygous risk allele A Retrieve detailed genotyping data

Answer 2: Searching NCBI’s dbSNP Click on the rs record for detailed SNP data report

Answer 2: Searching NCBI’s dbSNP Genotype data from Perlegen’s project with different population samples

Acknowledgement
In addition to those already stated, some slides of this workshop were adapted from the sources below:
1. Chattopadhyay A. and M.R. Tennant. “Genetic Variation Resources”. Lecture slides for 2007 NCBI Advanced Workshop for Bioinformatics Information Specialists.
2. Stein L. “Using HapMap.org: A tutorial”. Presentation slides as part of the Official HapMap Tutorial.
3. Overduin B. “Sequence Variation in Ensembl”. Lecture slides for “Ensembl Courses and Workshops”.

Recommended Topics for the Second Part of “Online Resources for Genetic Variation Study”
- Functional analysis of SNPs
- Tools for SNP discovery and genotyping
- Tools for TagSNPs selection
- Tools for genome wide association study
- Genetic association databases
- Others?

Please evaluate this workshop to help me improve future presentations. Have questions or comments about this workshop? Please contact: Yi-Bu Chen, Ph.D. Bioinformatics Specialist Norris Medical Library University of Southern California