Polymorphism discovery informatics. Gabor T. Marth, Department of Biology, Boston College, Chestnut Hill, MA 02467.



Types of sequence variations
Substitution-type single-nucleotide polymorphisms (SNPs) are the most abundant form of sequence variation. Various insertion-deletion polymorphisms (INDELs) are also very common.

Are all substitutions SNPs?
A substitution is called a SNP when it shows a systematic pattern of bi-allelism within the population examined.

What is SNP discovery?
SNP discovery is the comparative analysis of multiple sequences from the same region of the genome (redundant sequence coverage). It includes organizing the sequences relative to each other and determining whether sequence differences are sequencing artifacts or true polymorphisms.

Steps of SNP discovery
1. Sequence clustering
2. Paralog identification (cluster refinement)
3. Multiple alignment
4. SNP detection

SNP discovery in diverse sequences
Many different types of sequences are available for polymorphism discovery: EST, WGS, BAC, BAC-end, and genome restriction fragments.
Sequence types differ radically in accuracy. Genome sequence: 99.9-99.99%. Single-pass sequence: 98-99%.
Early methods of SNP discovery focused on specific sequence types.
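To put the accuracy figures above on the scale that base-calling quality values use, the sketch below converts per-base accuracy to a Phred-scaled quality (Q = -10 log10 of the error probability) and estimates the expected number of errors per read. The function names and the 500-base read length are illustrative assumptions, not values from the slides.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.999) to a Phred-scaled quality."""
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy: float, read_length: int) -> float:
    """Expected number of miscalled bases in a read of the given length."""
    return (1.0 - accuracy) * read_length


# Accuracy ranges quoted on the slide; the 500-base read length is assumed.
for label, accuracy in [
    ("genome sequence (upper)", 0.9999),
    ("genome sequence (lower)", 0.999),
    ("single-pass (upper)", 0.99),
    ("single-pass (lower)", 0.98),
]:
    q = accuracy_to_phred(accuracy)
    errs = expected_errors(accuracy, 500)
    print(f"{label}: Q{q:.0f}, about {errs:.2f} errors per 500 bases")
```

At 98-99% accuracy a single-pass read carries several expected errors per few hundred bases, which is why a raw mismatch alone cannot be read as a polymorphism.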

General SNP mining – PolyBayes
Sequence clustering simplifies to a database search against the genome reference.
Paralog filtering: count mismatches weighted by quality values.
Multiple alignment: anchor fragments to the genome reference.
SNP detection: differentiate true polymorphism from sequencing error using quality values.
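The idea of separating true polymorphism from sequencing error with quality values can be illustrated with a small Bayesian calculation on one alignment column. This is a deliberately simplified sketch, not the PolyBayes algorithm itself: it assumes independent reads, a uniform choice among the four bases, a bi-allelic site with both alleles equally likely in each read, and a prior polymorphism rate of 0.001. All names and constants are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"
PRIOR_POLYMORPHIC = 0.001  # assumed prior probability that a site is polymorphic


def phred_to_error(quality: float) -> float:
    """Phred quality -> probability that the base call is wrong."""
    return 10 ** (-quality / 10)


def base_likelihood(observed: str, true_base: str, error: float) -> float:
    """P(observed call | true base), spreading errors evenly over 3 bases."""
    return 1.0 - error if observed == true_base else error / 3.0


def snp_posterior(bases, quals, prior=PRIOR_POLYMORPHIC):
    """Posterior probability that an alignment column is bi-allelic."""
    errors = [phred_to_error(q) for q in quals]

    # Monomorphic hypothesis: every read comes from the same true base.
    mono = 0.0
    for true_base in BASES:
        lik = 1.0
        for b, e in zip(bases, errors):
            lik *= base_likelihood(b, true_base, e)
        mono += lik / len(BASES)

    # Polymorphic hypothesis: each read carries one of two alleles (50/50).
    pairs = list(combinations(BASES, 2))
    poly = 0.0
    for first, second in pairs:
        lik = 1.0
        for b, e in zip(bases, errors):
            lik *= 0.5 * base_likelihood(b, first, e) + 0.5 * base_likelihood(b, second, e)
        poly += lik / len(pairs)

    numerator = prior * poly
    return numerator / (numerator + (1.0 - prior) * mono)


# One discordant base: a low-quality call is explained as error,
# a high-quality call raises the SNP probability.
print(snp_posterior("AAAAAC", [30, 30, 30, 30, 30, 5]))
print(snp_posterior("AAAAAC", [30, 30, 30, 30, 30, 40]))
```

The same mismatch therefore yields very different scores depending on its quality value, which is the behaviour the slide describes.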

SNP validation
Validation methods: direct re-sequencing and pooled sequencing (sample labels on the slide: African, Asian, Caucasian, Hispanic, CHM 1).
Validation experiments show that the SNP probability, or SNP score, is accurate.
The SNP score allows one to choose cutoff values that balance the false positive rate against the recovery of rare SNPs.
(Figure labels: discard, keep.)
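If the SNP score is a well-calibrated probability, as the validation results suggest, the effect of a cutoff can be estimated directly from the scores. The sketch below is an illustration under that assumption; the candidate scores are made-up numbers and the function name is hypothetical.

```python
def summarize_cutoff(scores, cutoff):
    """Summarize the keep/discard trade-off for one score cutoff.

    Assumes each score is a calibrated probability that the candidate
    is a true SNP.
    """
    kept = [s for s in scores if s >= cutoff]
    discarded = [s for s in scores if s < cutoff]
    return {
        "kept": len(kept),
        "discarded": len(discarded),
        # Each kept candidate is a false positive with probability 1 - score.
        "expected_false_positives": sum(1.0 - s for s in kept),
        # Each discarded candidate is a true SNP with probability score.
        "expected_missed_snps": sum(discarded),
    }


# Hypothetical SNP scores for six candidate sites.
candidate_scores = [0.99, 0.95, 0.80, 0.60, 0.40, 0.10]

for cutoff in (0.9, 0.5, 0.2):
    print(cutoff, summarize_cutoff(candidate_scores, cutoff))
```

Lowering the cutoff recovers more low-scoring candidates at the cost of more expected false positives.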

Genome-scale SNP mining projects
Two data sources: random shotgun reads from whole-genome libraries aligned to the genome reference sequence, and overlaps of large-insert clone sequences.

SNP genotyping
SNP discovery: which nucleotides in the genome are polymorphic?
SNP genotyping: which alleles does an individual carry at a nucleotide locus that is known to be polymorphic?
Polymorphic sites: aacgtttatgtgatt[a/c]ccagtaaa[g/t]tacggca
Person 1: aacgtttatgtgattaccagtaaattacggca
          aacgtttatgtgattcccagtaaattacggca
Person 2: aacgtttatgtgattaccagtaaattacggca
          aacgtttatgtgattcccagtaaagtacggca
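Using the two individuals shown on the slide, genotyping reduces to reading the alleles at the positions already known to be polymorphic. In this minimal sketch, the 0-based positions 15 and 24 are inferred from the slide's sequences, and the function name is hypothetical.

```python
# 0-based positions of the two known polymorphic sites (inferred from the slide).
SNP_POSITIONS = [15, 24]

# The two chromosome copies shown for each individual.
people = {
    "person 1": (
        "aacgtttatgtgattaccagtaaattacggca",
        "aacgtttatgtgattcccagtaaattacggca",
    ),
    "person 2": (
        "aacgtttatgtgattaccagtaaattacggca",
        "aacgtttatgtgattcccagtaaagtacggca",
    ),
}


def genotype(copy_a, copy_b, positions):
    """Return the unordered allele pair carried at each known SNP position."""
    return ["/".join(sorted((copy_a[p], copy_b[p]))) for p in positions]


for name, (copy_a, copy_b) in people.items():
    print(name, genotype(copy_a, copy_b, SNP_POSITIONS))
```

Both individuals are heterozygous (a/c) at the first site; at the second site person 1 is homozygous (t/t) and person 2 is heterozygous (g/t).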

Genotyping by sequence
(Figure labels: heterozygous peak, homozygous peak.)

Genome variation landscape
Nucleotide diversity on human chromosomes. Marker density ranges from "sparse" to "dense"; allele frequency ranges from "rare" to "common".
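Nucleotide diversity is commonly computed as the average number of pairwise differences per site among a set of aligned sequences. The sketch below shows that calculation on four short, made-up sequences; the data and function name are illustrative assumptions, not material from the slides.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned sequences."""
    length = len(sequences[0])
    if len(sequences) < 2 or any(len(s) != length for s in sequences):
        raise ValueError("need at least two sequences of equal length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical aligned sequences of 8 bases each.
sample = ["acgtacgt", "acgtacgt", "acgaacgt", "ccgaacgt"]
print(f"{nucleotide_diversity(sample):.4f}")
```

Regions with higher values of this statistic correspond to the "dense" end of the marker-density scale.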