Databases
BI420 – Introduction to Bioinformatics
Gabor T. Marth, Department of Biology, Boston College
marth@bc.edu

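SNP Mining in BAC Overlaps
[Figure: a human chromosome covered by a tiling path of BACs (finished or 5x shotgun); scale bar 100 kb. Within a clone overlap, a candidate SNP is found and placed on the SNP marker map.]
Clone overlap, two aligned sequences differing at the candidate SNP:
GATCGGATCTACTCTTCAAAGAGT
GATCGGATCTGCTCTTCAAAGAGT
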
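The idea of spotting a candidate SNP in a clone overlap can be illustrated with a naive sketch: walk two already-aligned, equal-length sequences and report every position where the bases differ. This is not the PolyBayes method (which also weighs base quality values). The function name find_mismatches is hypothetical and chosen for illustration only.

```python
def find_mismatches(seq_a, seq_b):
    """Return (position, base_a, base_b) for each differing column.

    Positions are 1-based. Both sequences are assumed to be already
    aligned and gap-free, which is a simplification for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # The two overlap sequences shown on the slide.
    clone_1 = "GATCGGATCTACTCTTCAAAGAGT"
    clone_2 = "GATCGGATCTGCTCTTCAAAGAGT"
    print(find_mismatches(clone_1, clone_2))  # [(11, 'A', 'G')]
```

For the slide's example the only difference is at position 11 (A in one clone, G in the other), which is the candidate SNP.
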
BAC overlap mining
Pipeline: overlap detection, SNP analysis, candidate SNP predictions
Complicating factors noted on the slide: inter- & intra-chromosomal duplications, known human repeats, fragmentary nature of draft data

Clones NH0260K08 and NH0407F02
[Figure: section of base-wise alignment with marked-up candidate SNP (alignment displayed with the CONSED sequence viewer). The SNP mark-up tag is produced by PolyBayes.]
http://genome.wustl.edu/gsc/polybayes

BAC overlap mining results
~30,000 clones
>CloneX
ACGTTGCAACGT
GTCAATGCTGCA
>CloneY
ACGTTGCAACGT
GTCAATGCTGCA
25,901 clones (7,122 finished, 18,779 draft with base quality values)
21,020 clone overlaps (124,356 fragment overlaps)
507,152 high-quality candidate SNPs (validation rate 83-96%)
Example candidate SNP:
ACCTAGGAGACTGAACTTACTG
ACCTAGGAGACCGAACTTACTG
Marth et al., Nature Genetics 2001

Database schema
[Diagram: Clone (1) NH0260K08 and Clone (2) NH0407F02 overlap. Hit (1) and Hit (2) are the regions of the two clones joined by HSP (1). SNP (1) has Allele (1) = C on Hit (1) and Allele (2) = T on Hit (2).]

CLONE
id  name       received  masked
1   NH0260K08  12-25-99  12-26-99
2   NH0407F02  12-28-99  01-03-00

HSP
id  sense
1   1

HIT
id  cloneID  hspID  start  end
1   1        1      1      17957
2   2        1      96912  114891

SNP
id  submitted
1   01-04-00

ALLELE
id  hitID  nucleotide
1   1      C
2   2      T

Database tables:
CLONE table: Clone attributes and sequence file location
HSP table: Significant pair-wise BLAST similarity
HIT table: Region of a clone that is part of an HSP
SNP table: Candidate SNP attributes
ALLELE table: Attributes of an allele within a SNP
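
To make the relationships between the tables concrete, here is a minimal sketch that loads the example rows from the slide into an in-memory SQLite database and joins ALLELE to HIT to CLONE. The table and column names come from the slide. The column types, the key constraints, the choice of SQLite, and the function names build_db and alleles_with_clones are assumptions made for illustration; the slide does not say which database system the course uses.

```python
import sqlite3


def build_db():
    """Create the five slide tables in memory and load the example rows."""
    conn = sqlite3.connect(":memory:")
    # Column types and REFERENCES clauses are assumed, not from the slide.
    # "start" and "end" are quoted because END is an SQL keyword.
    conn.executescript(
        """
        CREATE TABLE CLONE (id INTEGER PRIMARY KEY, name TEXT,
                            received TEXT, masked TEXT);
        CREATE TABLE HSP (id INTEGER PRIMARY KEY, sense INTEGER);
        CREATE TABLE HIT (id INTEGER PRIMARY KEY,
                          cloneID INTEGER REFERENCES CLONE(id),
                          hspID INTEGER REFERENCES HSP(id),
                          "start" INTEGER, "end" INTEGER);
        CREATE TABLE SNP (id INTEGER PRIMARY KEY, submitted TEXT);
        CREATE TABLE ALLELE (id INTEGER PRIMARY KEY,
                             hitID INTEGER REFERENCES HIT(id),
                             nucleotide TEXT);
        """
    )
    # Rows exactly as shown on the slide (dates kept as MM-DD-YY text).
    conn.executemany(
        "INSERT INTO CLONE VALUES (?, ?, ?, ?)",
        [(1, "NH0260K08", "12-25-99", "12-26-99"),
         (2, "NH0407F02", "12-28-99", "01-03-00")],
    )
    conn.execute("INSERT INTO HSP VALUES (1, 1)")
    conn.executemany(
        "INSERT INTO HIT VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, 1, 17957), (2, 2, 1, 96912, 114891)],
    )
    conn.execute("INSERT INTO SNP VALUES (1, '01-04-00')")
    conn.executemany(
        "INSERT INTO ALLELE VALUES (?, ?, ?)",
        [(1, 1, "C"), (2, 2, "T")],
    )
    conn.commit()
    return conn


def alleles_with_clones(conn):
    """List each allele with the name of the clone its hit lies on."""
    query = """
        SELECT ALLELE.id, ALLELE.nucleotide, CLONE.name
        FROM ALLELE
        JOIN HIT ON ALLELE.hitID = HIT.id
        JOIN CLONE ON HIT.cloneID = CLONE.id
        ORDER BY ALLELE.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in alleles_with_clones(build_db()):
        print(row)
```

With the slide's rows, the join reports allele C on clone NH0260K08 and allele T on clone NH0407F02, matching the diagram.
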