NCBI FieldGuide
NCBI Molecular Biology Resources: A Field Guide, Part 2 (post-intermission)
September 30, 2004, ICGEB
PSI-BLAST: Position-Specific Iterated BLAST
- Mining for protein domains
- Confirming relationships among related proteins
Position-Specific Substitution Rates
[Figure: substitution rates along an alignment, highlighting an active-site serine and a weakly conserved serine.]
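Position-Specific Score Matrix (PSSM)
[Figure: PSSM excerpt with one column per amino acid (A R N D C Q E G H I L K M F P S T W Y V) and one row per query position, beginning at position 206 (query residues D G V I D S C N G D S G G P L N C Q A); one serine is labeled as the active-site nucleophile.]
The same residue, serine, receives different scores at these two positions.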
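The point of a PSSM is that each query position has its own score column, so a conserved serine and a variable serine are scored differently. A minimal sketch of building such a matrix, assuming a toy ungapped alignment, a uniform background distribution, and a pseudocount of 1 per residue. PSI-BLAST's actual procedure (sequence weighting, data-dependent pseudocounts, substitution-matrix priors) is more elaborate; the function names here are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def build_pssm(alignment, background=None, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, ungapped aligned protein sequences.

    Returns one dict per alignment column mapping amino acid -> log2(observed / background).
    """
    if background is None:
        # Assumption: uniform background; real tools use measured amino acid frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for pos in range(len(alignment[0])):
        # Count residues in this column (anything outside the 20 amino acids is ignored).
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in background)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            frequency = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(frequency / background[aa])
        pssm.append(column)
    return pssm

def score(pssm, sequence):
    """Score an ungapped sequence against the PSSM, summing one column per position."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))
```

With the alignment `["SG", "SS", "SG", "SA"]`, serine is fully conserved in the first column and appears once in the second, so `S` scores higher in column 0 than in column 1.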
>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF
VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD
EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY
RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA
VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK
PSI-BLAST setting: E-value cutoff for including sequences in the PSSM
Results: Initial BLASTP
The first iteration gives the same results as an ordinary protein-protein BLAST search.
Results of the First PSSM Search
Other purine nucleotide-metabolizing enzymes, not found by ordinary BLAST, now appear.
Third PSSM Search: Convergence
Just below the inclusion threshold is another nucleotide-metabolism enzyme; check its box to add it to the PSSM.
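The sequence of PSI-BLAST iterations (initial search, then repeated PSSM searches until convergence) can be summarized as a loop. A minimal sketch, assuming a hypothetical `search` callable that, given the sequences currently in the profile, returns an E-value for each database sequence. It does not rebuild a real PSSM, it only ever adds sequences, and the 0.005 inclusion threshold is a placeholder rather than a statement about any BLAST version's default.

```python
def iterate_until_converged(query_id, search, inclusion_evalue=0.005, max_iterations=10):
    """Repeat a profile search until no new sequences pass the inclusion threshold.

    search: hypothetical callable, frozenset of profile IDs -> {hit_id: e_value}.
    Returns (included IDs, iterations run, converged flag).
    """
    included = frozenset([query_id])
    for iteration in range(1, max_iterations + 1):
        hits = search(included)
        # Hits at or below the threshold join the profile for the next round.
        passing = {hit for hit, evalue in hits.items() if evalue <= inclusion_evalue}
        updated = included | passing
        if updated == included:
            # Convergence: this round added nothing new.
            return included, iteration, True
        included = updated
    return included, max_iterations, False
```

Passing the search in as a function keeps the convergence logic separate from how the profile is scored, so the loop can be tested with a toy search in which a sequence becomes significant only after another one has joined the profile.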
MegaBLAST
Query file C:\seq\hs.4.fsa, containing several human EST sequences in FASTA format:
>gnl|UG|Hs#S qd43b11.x1 Homo sapiens cDNA, 3' end
CATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGTTTG
GTGAGAAGTGCTCGATTAGTTCAGACAACATCTGGCACTTGATGTCTGTCCTTCCCTCCT
TTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAAGGT
GACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACACCG
TCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAAAAC
CACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGC
>gnl|UG|Hs#S qv37f11.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
GTCGTATCGATGT
>gnl|UG|Hs#S qv33c06.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
GTCGTATCGATGT
>gnl|UG|Hs#S e65f04.x1 Homo sapiens cDNA, 3' end
TTTCATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGT
TTGGTGAGAAGTGCTCGATTAGTTCAAACAACATCTGGCACTTGATGTCTGTCCTTCCCT
CCTTTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAA
GGTGACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACA
CCGTCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAA
AACCACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGCCTCCCAACCGCATTC
CTGCCTGTGTAGCAGGCGGTGAGCACCCAGAAGGGGCACATACCTCTCCAAGCCTTGAAA
GCAAAGCATGGAGATCTACAAAAATAGGATTTCCACTTGGAGAAATGTCGCTGGGACAGT
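The MegaBLAST query on this slide is a batch of ESTs in a single FASTA file, and two of them (qv37f11.x1 and qv33c06.x1) carry identical sequences. A small sketch for checking such a batch before submitting it: split the file into records and report identical sequences. The function names are illustrative and are not part of any NCBI tool.

```python
from collections import defaultdict

def parse_fasta(text):
    """Split multi-sequence FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: flush the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def duplicate_groups(records):
    """Return lists of headers whose sequences are identical (case-insensitive)."""
    by_sequence = defaultdict(list)
    for header, sequence in records:
        by_sequence[sequence.upper()].append(header)
    return [headers for headers in by_sequence.values() if len(headers) > 1]
```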
What Is Discontiguous (Cross-Species) MegaBLAST?
Template options: word size W = 11 or 12; template length t = 16, 18, or 21; template type coding or non-coding (12 combinations in all). W counts the template positions that must match; t is the total template length, including the positions that are ignored.
Ma, B., Tromp, J., Li, M. "PatternHunter: faster and more sensitive homology search." Bioinformatics 2002 Mar;18(3):440-5.
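Discontiguous MegaBLAST seeds alignments with a template in which only some positions must match, following the spaced-seed idea of PatternHunter (Ma et al., cited above). A minimal sketch of spaced-seed hit finding, assuming the made-up template below (W = 11 required positions, t = 16 total, skipping every third position in the spirit of a coding template). It is not one of NCBI's actual templates, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical coding-style template: '1' = must match, '0' = ignored.
# Weight (W) = 11, length (t) = 16. NOT an actual NCBI discontiguous MegaBLAST template.
TEMPLATE = "1101101101101101"

def seed_key(seq, start, care):
    """Letters of seq at the template's '1' positions, for the window at start."""
    return "".join(seq[start + i] for i in care)

def spaced_seed_hits(query, subject, template=TEMPLATE):
    """Return (query_pos, subject_pos) pairs whose windows agree at every '1' position."""
    care = [i for i, c in enumerate(template) if c == "1"]
    span = len(template)
    # Index every subject window by its spaced-seed key.
    index = defaultdict(list)
    for j in range(len(subject) - span + 1):
        index[seed_key(subject, j, care)].append(j)
    # Look up each query window; .get avoids creating empty entries.
    hits = []
    for q in range(len(query) - span + 1):
        for j in index.get(seed_key(query, q, care), []):
            hits.append((q, j))
    return hits
```

A mismatch at a '0' position, such as a third codon position, does not prevent a seed hit; a mismatch at a '1' position does.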
Neighbors: Precomputed BLAST
For nucleotide and protein records, the Entrez "Related Sequences" link produces a list of sequences sorted by BLAST score, but with no alignment details.
BLink: Protein BLAST Alignments
Lists at most 200 hits; the list is nonredundant.
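BLink's list is capped at 200 hits and is nonredundant. A sketch of that kind of post-processing, assuming "redundant" means an identical sequence and that hits are ranked by score; how BLink actually groups redundant entries is not described on the slide, and the function name is hypothetical.

```python
def nonredundant_top_hits(hits, limit=200):
    """Collapse hits with identical sequences and return at most `limit`, best first.

    hits: iterable of (accession, sequence, score) tuples. For each distinct
    sequence, only the highest-scoring accession is kept.
    """
    best = {}
    for accession, sequence, score in hits:
        current = best.get(sequence)
        if current is None or score > current[2]:
            best[sequence] = (accession, sequence, score)
    ranked = sorted(best.values(), key=lambda hit: hit[2], reverse=True)
    return ranked[:limit]
```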
BLink: Linking Sequence to Structure (Cn3D)
BLAST: Related Structures
BLAST Databases: Non-redundant Protein
nr (non-redundant protein sequences):
- GenBank CDS translations
- NP_ RefSeqs
- Outside protein databases: PIR, Swiss-Prot, PRF
- PDB (sequences from structures)
BLAST Databases: Nucleic Acid
- nr (nt): traditional GenBank divisions; NM_ and XM_ RefSeqs
- dbest: EST division
- htgs: HTG division
- gss: GSS division
- chromosome: NC_ RefSeqs
- wgs: whole-genome shotgun
Genomic BLAST
These pages provide customized nucleotide and protein databases for each genome. If a Map Viewer is available for the genome, the BLAST hits can be viewed on its maps.
What If Your Favorite Gene Is Not Found in the Latest Genome Build?
Possible explanations:
- The gene does not exist.
- It exists, but there is a problem with the assembly.
- It exists, but there is a problem with the annotation.
An Example: Finding Prestin in the Human Genome
We start with rat prestin, BLAST it against the human genome, and look for evidence that human prestin exists as well.
Searching the Human Genome
>gi| |emb|AJ |RNO Rattus norvegicus
ATGGATCATGCTGAAGAAAATGAAATTCCTGCAGAGATCAGAAGTACCTCGTGGAA
GTCATCCGGTCCTCCAGGAGAGGCTGCACGTCAAGGACAAAGTCACAGACTCCATC
GCAGGCATTCACGTGCACTCCTAAAAAAGTAAGAAACATCATCTACATGTTCTTGC
TTGCCAGCATATAAATTCAAGGAGTATGTGCTGGGTGACTTGGTCTCGGGCATAAG
AGCTCCCCCAAGGCTTAGCCTTCGCGATGCTGGCAGCTGTGCCTCCGGTGTTCGGC
On for same-species comparisons.
BLAST Results
16 hits to one contig
Database: human genome, 953 contigs, 2.9 billion letters
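A summary such as "16 hits to one contig" can be tallied from saved results. A sketch assuming the results were saved in the 12-column tab-separated format that command-line BLAST+ writes with `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Each line is one HSP, so the counts are HSPs per subject; the optional E-value filter and the function name are illustrative.

```python
from collections import Counter

# Column order of the default 12-column tabular output (BLAST+ -outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def hits_per_subject(tabular_text, max_evalue=None):
    """Count HSP lines per subject sequence (for example, per genomic contig)."""
    counts = Counter()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        if max_evalue is None or float(row["evalue"]) <= max_evalue:
            counts[row["sseqid"]] += 1
    return counts
```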
Map Viewer: Genomic Context of BLAST Hits
[Figure: Map Viewer tracks for genes, Genome Scan models, human EST hits, contig, GenBank, and mouse EST hits.]
Human prestin: now appears in Build 34
Now we can compare genes
Three prestin genes: finally together!
Same prestin, different assemblies
Does Homology Imply a Common Biological Function?
Not always: a common ancestor does not guarantee that a function will not be lost or acquired after divergence. For example, zeta-crystallin is a component of the transparent lens matrix of the vertebrate eye, while its homolog in E. coli is the metabolic enzyme quinone oxidoreductase.
[Diagram: Entrez links text, sequence, and structure records; BLAST relates sequences and VAST relates structures.]
Structure Similarity: No More BLASTing!
- Three-dimensional structure is more conserved during evolution than sequence.
- A common ancestor can still be detected from structural similarity.
- Spatial similarity is not calculated the way sequence similarity is.
VAST: Structure Neighbors
VAST (Vector Alignment Search Tool): for each protein chain, locate the SSEs (secondary structure elements) and represent each one as a vector.
[Figure: human IL-4.]
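To make the vector representation concrete, a simplified sketch, assuming each SSE is given as a list of C-alpha coordinates: each SSE becomes one vector, and an SSE pair in one structure can be compared with an SSE pair in another by the angle between their vectors. VAST's actual scoring of SSE pairs is more involved; the function names and the first-to-last-atom vector are assumptions made for illustration.

```python
import math

def sse_vector(ca_coords):
    """Represent an SSE by the vector from its first to its last C-alpha position."""
    (x1, y1, z1), (x2, y2, z2) = ca_coords[0], ca_coords[-1]
    return (x2 - x1, y2 - y1, z2 - z1)

def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

def pair_angle_difference(pair_a, pair_b):
    """Difference between the inter-SSE angle of a pair in structure A and one in structure B."""
    return abs(angle_between(*pair_a) - angle_between(*pair_b))
```

Because only the angle between vectors is compared, two pairs can match even when the structures are translated or differently sized, which is the kind of spatial comparison that has no counterpart in sequence alignment.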
VAST: Structure Neighbors
Structure Neighbors in Cn3D
[Figure: SH3 and SH2 domains of c-Src kinase, human vs. chicken.]
3D Domain Neighbors
[Figure: human c-Src kinase (Tyr) vs. Chk1 kinase (Ser/Thr).]
NCBI Is Changing
From a sequence data storage facility to a one-stop shop with integrated databases of various kinds. You can be part of the future: work with us! Your expertise and data are indispensable.
GenBank
RefSeq
Entrez Gene
HomoloGene database
New generation of databases: an example
Protein interaction database: a seed for future precomputed resources
New databases: GenSAT
PubChem
Headache? Take Aspirin
Aspirin has 432 neighbors
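PubChem computes compound neighbors from 2D structural similarity, commonly measured with the Tanimoto coefficient on substructure fingerprints. The slide does not state the fingerprint or cutoff behind aspirin's 432 neighbors, so this sketch assumes fingerprints given as sets of "on" bit positions and a 0.9 threshold; the compound IDs and bits in any example are made up, and the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(fp_a & fp_b) / union

def similar_compounds(query_fp, library, threshold=0.9):
    """IDs of library compounds whose Tanimoto similarity to the query meets the threshold."""
    return sorted(cid for cid, fp in library.items() if tanimoto(query_fp, fp) >= threshold)
```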
Link to 3D protein structures
PubCrawler: Update Alerting Service for PubMed and GenBank
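The core of an update alert is reporting only records not seen in earlier runs of the same query. A minimal sketch of that idea, assuming record IDs (for example, PubMed IDs) arrive as strings and the set of already-reported IDs is kept by the caller between runs; PubCrawler's own implementation is not described here, and the function name is hypothetical.

```python
def new_since_last_run(current_ids, seen):
    """Return IDs not reported before, in their original order, and record them in `seen`.

    seen: a set of previously reported IDs; it is updated in place so the
    caller can persist it (for example, to a file) between runs.
    """
    new_ids = []
    for record_id in current_ids:
        if record_id not in seen:
            new_ids.append(record_id)
            seen.add(record_id)
    return new_ids
```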
MedBlast: searching for articles related to a sequence.
For More Information
- General addresses
- The (free!) NCBI Newsletter
- The NCBI Handbook
- The NCBI Education Page
Follow the link from the NCBI Home Page.