Databases BI420 – Introduction to Bioinformatics Gabor T. Marth


Databases
BI420 – Introduction to Bioinformatics
Gabor T. Marth, Department of Biology, Boston College
marth@bc.edu

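SNP Mining in BAC Overlaps

Human chromosome: tiling path of BACs (finished or 5x shotgun)

Clone overlap:
    GATCGGATCTACTCTTCAAAGAGT
    GATCGGATCTGCTCTTCAAAGAGT

A mismatch between the two overlapping clone sequences is a candidate SNP.
Candidate SNPs are placed on the SNP marker map (scale bar: 100 kb).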
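The candidate SNP on this slide is the single position where the two overlapping clone sequences disagree. The comparison can be sketched as follows, assuming the two sequences are already aligned and of equal length. The function name `find_mismatches` is hypothetical, and this is a plain base-by-base comparison for illustration, not the PolyBayes method.

```python
def find_mismatches(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes the two sequences are already aligned (same length, no gaps).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# The two overlapping clone sequences shown on the slide.
clone_1 = "GATCGGATCTACTCTTCAAAGAGT"
clone_2 = "GATCGGATCTGCTCTTCAAAGAGT"

candidates = find_mismatches(clone_1, clone_2)
print(candidates)  # [(11, 'A', 'G')]
```

A real pipeline would also have to weigh base quality and rule out paralogous sequence before calling a mismatch a SNP; the sketch only locates the disagreement.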

BAC overlap mining

Pipeline: overlap detection, then SNP analysis, then candidate SNP predictions

Complicating factors:
    inter- & intra-chromosomal duplications
    known human repeats
    fragmentary nature of draft data

Clones NH0260K08 and NH0407F02

[Figure: section of the base-wise alignment with a marked-up candidate SNP, displayed with the CONSED sequence viewer. The SNP mark-up tag was produced by PolyBayes.]

http://genome.wustl.edu/gsc/polybayes

BAC overlap mining results

Input: ~30,000 clones, as sequence records such as
    >CloneX
    ACGTTGCAACGT
    GTCAATGCTGCA
    >CloneY
    ACGTTGCAACGT
    GTCAATGCTGCA

25,901 clones (7,122 finished; 18,779 draft with base quality values)
21,020 clone overlaps (124,356 fragment overlaps)
507,152 high-quality candidate SNPs (validation rate 83-96%), for example
    ACCTAGGAGACTGAACTTACTG
    ACCTAGGAGACCGAACTTACTG

Marth et al., Nature Genetics 2001
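The counts on this slide can be checked against each other with a little arithmetic. The sketch below uses only the figures reported above. The per-overlap averages are derived here for orientation and are not numbers stated on the slide.

```python
# Figures as reported on the slide.
total_clones = 25_901
finished_clones = 7_122
draft_clones = 18_779
clone_overlaps = 21_020
fragment_overlaps = 124_356
candidate_snps = 507_152

# Finished and draft clones should add up to the total.
clones_consistent = finished_clones + draft_clones == total_clones

# Derived averages (not stated on the slide).
fragments_per_overlap = fragment_overlaps / clone_overlaps
snps_per_overlap = candidate_snps / clone_overlaps

print(clones_consistent)
print(f"{fragments_per_overlap:.1f} fragment overlaps per clone overlap")
print(f"{snps_per_overlap:.1f} candidate SNPs per clone overlap")
```

The draft clones are fragmentary, which is why a single clone overlap breaks into several fragment overlaps on average.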

Database schema

Example: SNP (1) has two alleles. Allele (1), C, lies in Hit (1) on Clone (1) NH0260K08. Allele (2), T, lies in Hit (2) on Clone (2) NH0407F02. Both hits belong to HSP (1).

CLONE
    id  name       received  masked
    1   NH0260K08  12-25-99  12-26-99
    2   NH0407F02  12-28-99  01-03-00

HSP
    id  sense
    1   1

HIT
    id  cloneID  hspID  start  end
    1   1        1      1      17957
    2   2        1      96912  114891

SNP
    id  submitted
    1   01-04-00

ALLELE
    id  hitID  nucleotide
    1   1      C
    2   2      T

Database tables:
    CLONE table: clone attributes and sequence file location
    HSP table: significant pair-wise BLAST similarity
    HIT table: region of a clone that is part of an HSP
    SNP table: candidate SNP attributes
    ALLELE table: attributes of an allele within a SNP
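The five tables above can be tried out in a small in-memory database. The following is a minimal sketch using Python's built-in `sqlite3` module. The table and column names and the example rows come from the slide; the choice of SQLite, the column types, and the foreign-key declarations are assumptions made for illustration. The ALLELE table is reproduced with only the columns the slide shows, so no column linking an allele to its SNP is added here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names follow the slide; types and REFERENCES clauses are assumed.
# "start" and "end" are quoted because END is an SQL keyword.
conn.executescript('''
CREATE TABLE CLONE  (id INTEGER PRIMARY KEY, name TEXT,
                     received TEXT, masked TEXT);
CREATE TABLE HSP    (id INTEGER PRIMARY KEY, sense INTEGER);
CREATE TABLE HIT    (id INTEGER PRIMARY KEY,
                     cloneID INTEGER REFERENCES CLONE(id),
                     hspID   INTEGER REFERENCES HSP(id),
                     "start" INTEGER, "end" INTEGER);
CREATE TABLE SNP    (id INTEGER PRIMARY KEY, submitted TEXT);
CREATE TABLE ALLELE (id INTEGER PRIMARY KEY,
                     hitID INTEGER REFERENCES HIT(id),
                     nucleotide TEXT);
''')

# Example rows exactly as shown on the slide (dates kept as text).
conn.executemany("INSERT INTO CLONE VALUES (?, ?, ?, ?)", [
    (1, "NH0260K08", "12-25-99", "12-26-99"),
    (2, "NH0407F02", "12-28-99", "01-03-00"),
])
conn.execute("INSERT INTO HSP VALUES (1, 1)")
conn.executemany("INSERT INTO HIT VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 17957),
    (2, 2, 1, 96912, 114891),
])
conn.execute("INSERT INTO SNP VALUES (1, '01-04-00')")
conn.executemany("INSERT INTO ALLELE VALUES (?, ?, ?)", [
    (1, 1, "C"),
    (2, 2, "T"),
])

# Which clone carries which allele? Follow ALLELE -> HIT -> CLONE.
rows = conn.execute('''
    SELECT CLONE.name, ALLELE.nucleotide
    FROM ALLELE
    JOIN HIT   ON ALLELE.hitID = HIT.id
    JOIN CLONE ON HIT.cloneID  = CLONE.id
    ORDER BY ALLELE.id
''').fetchall()
print(rows)  # [('NH0260K08', 'C'), ('NH0407F02', 'T')]
```

Splitting the data this way means each clone, alignment, and allele is stored once and linked by id, so a clone that takes part in many overlaps does not have its attributes repeated.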