NCBI FieldGuide NCBI Molecular Biology Resources A Field Guide part 2 August 2-3, 2005.

Web Access
Entrez (text), BLAST (sequence), VAST (structure)

Searching with Sequences
Why do we need similarity searching?
- To identify and annotate sequences with incomplete (or no) annotations, or with incorrect annotations (GenBank)
- To assemble genomes
- To explore evolutionary relationships by finding homologous molecules and developing phylogenetic trees
NOTE: Similar sequences may NOT have similar function!

Basic Local Alignment Search Tool (BLAST)
- Widely used similarity search tool
- Heuristic approach based on the Smith-Waterman algorithm
- Finds the best local alignments
- Provides statistical significance
- All combinations of (DNA/protein) query and database:
  – DNA vs DNA (blastn)
  – DNA translation vs protein (blastx)
  – Protein vs protein (blastp)
  – Protein vs DNA translation (tblastn)
  – DNA translation vs DNA translation (tblastx)
- Web, standalone, and network clients

Global vs. Local Alignment
- Global alignment: Seq 1 and Seq 2 are aligned over their entire lengths.
- Local alignment: only the most similar regions of Seq 1 and Seq 2 are aligned.
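The contrast between the two alignment types can be sketched with the textbook dynamic-programming recurrences: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. This minimal Python sketch returns scores only and assumes a toy scoring scheme (match +1, mismatch -1, gap -2 per position) chosen for illustration, not BLAST's defaults; the function names are hypothetical.

```python
def _pair(x: str, y: str, match: int, mismatch: int) -> int:
    """Score for aligning residue x against residue y."""
    return match if x == y else mismatch


def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch: best score for aligning a and b end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefix of b against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)  # column 0: prefix of a against gaps
        for j in range(1, len(b) + 1):
            cur[j] = max(
                prev[j - 1] + _pair(a[i - 1], b[j - 1], match, mismatch),  # align a[i-1], b[j-1]
                prev[j] + gap,  # gap in b
                cur[j - 1] + gap,  # gap in a
            )
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman: best score of any pair of substrings (cells never drop below 0)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,  # start a fresh local alignment here
                prev[j - 1] + _pair(a[i - 1], b[j - 1], match, mismatch),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            best = max(best, cur[j])
        prev = cur
    return best


print(global_score("GGGACGTCCC", "TTACGTAA"))  # -4
print(local_score("GGGACGTCCC", "TTACGTAA"))  # 4
```

With unrelated flanks around a shared ACGT core, the local score counts only the core, while the global score is pulled down by the flanks.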

Global vs. Local Alignment
Local alignment (BLASTp):
Human: 15  IAKYNFHGTAEQDLPFCKGDVLTIVAVTKDPNWYKAKNKVGREGIIPANYVQKREGVKAGTKLSLMPWFH 84
           +A DL F K D+L I+ T+ W+ GR G IP+NYV PW+
Worm:  63  VALFQYDARTDDDLSFKKDDILEILNDTQGDWWFARHKATGRTGYIPSNYVAREKSIES------QPWYF 125

H: 85  GKITREQAERLLYPP--ETGLFLVRESTNYPGDYTLCVSCDGKVEHYRI-MYHASKLSIDEEVYFENLMQ 151
           GK+ R AE+ L E G FLVR+S + D +L V + V+HYRI + H I F L
Worm:  126 GKMRRIDAEKCLLHTLNEHGAFLVRDSESRQHDLSLSVRENDSVKHYRIQLDHGGYF-IARRRPFATLHD 194

H: 152 LVEHYTSDADGLCTRLIKPKVMEGTVAAQDEFYRSGWALNMKELKLLQTIGKGEFGDVMLGDYRGN-KVA 220
           L+ HY +ADGLC L P Y W L++ IG G+FG+V G + N VA
Worm:  195 LIAHYQREADGLCVNLGAPCAKSEAPQTTTFTYDDQWEVDRRSVRLIRQIGAGQFGEVWEGRWNVNVPVA 264

H: 221 VKCIK-NDATAQAFLAEASVMTQLRHSNLVQLLGVIVEEKGGLYIVTEYMAKGSLVDYLRSRGRSVLGGD 289
           VK +K A FLAEA +M +LRH L+ L V ++ + IVTE M + +L+ +L+ RGR
Worm:  265 VKKLKAGTADPTDFLAEAQIMKKLRHPKLLSLYAVCTRDE-PILIVTELMQE-NLLTFLQRRGRQCQMPQ 332

H: 290 CLLKFSLDVCEAMEYLEGNNFVHRDLAARNVLVSEDNVAKVSDFGLT----KEASSTQDTG-KLPVKWTA 353
           L++ S V M YLE NF+HRDLAARN+L++ K++DFGL KE TG + P+KWTA
Worm:  333 -LVEISAQVAAGMAYLEEMNFIHRDLAARNILINNSLSVKIADFGLARILMKENEYEARTGARFPIKWTA 401

H: 354 PEALREKKFSTKSDVWSFGILLWEIYSFGRVPYPRIPLKDVVPRVEKGYKMDAPDGCPPAVYEVMKNCWH 423
           PEA +F+TKSDVWSFGILL EI +FGR+PYP + +V+ +V+ GY+M P GCP +Y++M+ CW
Worm:  402 PEAANYNRFTTKSDVWSFGILLTEIVTFGRLPYPGMTNAEVLQQVDAGYRMPCPAGCPVTLYDIMQQCWR 471

H: 424 LDAAMRPSFLQLREQLEHI 443
           D RP+F L+ +LE +
Worm:  472 SDPDKRPTFETLQWKLEDL 492

Global alignment (Align program, Lipman and Pearson), start and end:
human M SAIQ AAWPSGT ECIAKYNFHG M S.. AA SG...A....
worm  MGSCIGKEDPPPGATSPVHTSSTLGRESLPSHPRIPSIGPIAASSSGNTIDKNQNISQSANFVALFQYDA
human REQLEHI KTHELHL..::. :...
worm  QWKLEDLFNLDSSEYKEASINF 500

Nucleotide Words
Query: GTACTGGACATGGACCCTACAGGAA
Word size = 11: GTACTGGACAT, TACTGGACATG, ACTGGACATGG, CTGGACATGGA, TGGACATGGAC, GGACATGGACC, GACATGGACCC, ACATGGACCCT, ...
Make a lookup table of words.
- Minimum word size = 7
- blastn default = 11
- megablast default = 28
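A lookup table of query words can be sketched as a dictionary from each word to its start positions; scanning a subject sequence against it yields the exact word hits that nucleotide BLAST seeds from. This is a simplified illustration with hypothetical function names; the real implementation is far more optimized.

```python
from collections import defaultdict


def build_word_table(seq: str, word_size: int = 11) -> dict:
    """Map every word of length word_size in seq to its list of start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - word_size + 1):
        table[seq[i:i + word_size]].append(i)
    return dict(table)


def find_word_hits(query: str, subject: str, word_size: int = 11) -> list:
    """Return (query_pos, subject_pos) pairs where the same exact word occurs in both."""
    table = build_word_table(query, word_size)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


query = "GTACTGGACATGGACCCTACAGGAA"
words = build_word_table(query)
print(list(words)[:3])  # ['GTACTGGACAT', 'TACTGGACATG', 'ACTGGACATGG']
```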

Protein Words
Query: GTQITVEDLFYNIATRRKALKN
Word size = 3: GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF, ...
Neighborhood words (e.g., of ITV): LTV, MTV, ISV, LSV, etc.
Make a lookup table of words. Word size can be 2 or 3 (default = 3).
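Neighborhood words are all words whose score against a query word, under the substitution matrix, reaches a threshold T. The sketch below enumerates them by brute force with BLOSUM62, which is typed in by hand here, so treat the table as an assumption to check against the NCBI matrix file. The threshold T = 11 is also an assumption for illustration.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62; rows and columns follow the order of AMINO_ACIDS.
_BLOSUM62_ROWS = """
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""


def _parse(rows: str) -> dict:
    """Turn the text table into a {(row_aa, col_aa): score} dictionary."""
    matrix = {}
    for line in rows.strip().splitlines():
        fields = line.split()
        for col, value in zip(AMINO_ACIDS, fields[1:]):
            matrix[(fields[0], col)] = int(value)
    return matrix


BLOSUM62 = _parse(_BLOSUM62_ROWS)


def word_score(w1, w2) -> int:
    """Ungapped score of two equal-length words: sum of the pair scores."""
    return sum(BLOSUM62[(a, b)] for a, b in zip(w1, w2))


def neighborhood(word: str, threshold: int = 11) -> set:
    """All words of the same length scoring at least `threshold` against `word`."""
    return {
        "".join(candidate)
        for candidate in product(AMINO_ACIDS, repeat=len(word))
        if word_score(word, candidate) >= threshold
    }


print(sorted(neighborhood("ITV"))[:5])
```

Lowering the threshold enlarges the neighborhood, which makes the search more sensitive but slower.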

Initial Matches and Extensions
- Protein BLAST requires two neighboring word matches (neighborhood words) on the same diagonal, within 40 aa of each other, before it extends an alignment.
- Nucleotide BLAST requires one exact word match.
Examples: protein query GTQITVEDLFYNI (one match, two matches); nucleotide query ATCGCCATGCTTAATTGGGCTT (exact word match).
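The two-hit requirement can be sketched as follows: word hits are grouped by diagonal (subject position minus query position), and an extension is triggered only when two non-overlapping hits on the same diagonal lie within the 40-residue window. This is a simplified sketch; the function name and the (query_pos, subject_pos) hit format are assumptions, and real BLAST tracks hits incrementally while it scans.

```python
from collections import defaultdict


def two_hit_seeds(hits, word_size=3, window=40):
    """Return the second hit of every qualifying pair as a (query_pos, subject_pos) seed.

    hits: iterable of (query_pos, subject_pos) word matches.
    """
    by_diagonal = defaultdict(list)
    for q, s in hits:
        by_diagonal[s - q].append(s)  # same diagonal means same offset s - q

    seeds = []
    for diagonal, starts in by_diagonal.items():
        starts.sort()
        for prev, cur in zip(starts, starts[1:]):
            distance = cur - prev
            # Non-overlapping (at least one word apart) and within the window.
            if word_size <= distance <= window:
                seeds.append((cur - diagonal, cur))
    return sorted(seeds)


print(two_hit_seeds([(0, 10), (20, 30), (5, 100)]))  # [(20, 30)]
```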

An Alignment That BLAST Can't Find
1   GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
    || | || || || |  || || ||   ||  |  ||| |||||| | | || | ||| |
1   GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

61  GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
    | || ||     ||  ||| ||  | |||||| || | ||||||  |||||  |     |
61  GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
    |||| || ||||| ||  ||    |  | ||||  || |||
121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC
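Why word-based seeding fails here can be checked directly: the two sequences above are aligned without gaps, so comparing them position by position gives the match line and the longest run of identical bases. A minimal sketch, assuming the three rows are simply concatenated: no identical run reaches 7 nt, the minimum nucleotide word size.

```python
TOP = (
    "GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG"
    "GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT"
    "GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC"
)
BOTTOM = (
    "GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG"
    "GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT"
    "GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC"
)


def match_line(a: str, b: str) -> str:
    """'|' where the aligned bases are identical, ' ' elsewhere (gapless alignment assumed)."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))


def longest_identical_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive identical aligned positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


print(match_line(TOP, BOTTOM)[:60])
print(longest_identical_run(TOP, BOTTOM))
```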

An Alignment BLAST Can Make
Solution: compare the protein sequences (BLASTX).
BLAST 2 Sequences (blastx) output:
Score = 290 bits (741), Expect = 7e-77
Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%)
Frame = +3

Scoring Systems – Nucleotides
Identity matrix:
     A    G    C    T
A   +1   -3   -3   -3
G   -3   +1   -3   -3
C   -3   -3   +1   -3
T   -3   -3   -3   +1

CAGGTAGCAAGCTTGCATGTCA
|| ||||||||||||  |||||
CACGTAGCAAGCTTG-GTGTCA
raw score = 19 - 9 = 10 (19 matches at +1; two mismatches at -3 and one gap at -3 give 9 in penalties)
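The raw score above can be reproduced with a short sketch. The cost of -3 per gap position is an assumption inferred from the slide's arithmetic (19 matches, two mismatches at -3, and one gap give 19 - 9 = 10); it is not a BLAST default, and the function name is hypothetical.

```python
def raw_score(top: str, bottom: str, match: int = 1, mismatch: int = -3, gap: int = -3) -> int:
    """Sum column scores of a pairwise alignment; '-' marks a gap position (linear cost)."""
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


print(raw_score("CAGGTAGCAAGCTTGCATGTCA", "CACGTAGCAAGCTTG-GTGTCA"))  # 10
```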

Scoring Systems – Proteins
Position-independent matrices
- PAM matrices (Percent Accepted Mutation)
  – Derived from observation; small dataset of alignments
  – Implicit model of evolution
  – All calculated from PAM1
  – PAM250 widely used
- BLOSUM matrices (BLOck SUbstitution Matrices)
  – Derived from observation; large dataset of highly conserved blocks
  – Each matrix derived separately from blocks with a defined percent identity cutoff
  – BLOSUM62: default matrix for BLAST
Position-Specific Score Matrices (PSSMs)
- PSI-BLAST and RPS-BLAST
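"All calculated from PAM1" means that a PAM-n mutation probability matrix is PAM1 multiplied by itself n times. The sketch below shows that extrapolation on a hypothetical three-letter alphabet in which each letter changes with probability 1% per PAM1 step; the numbers are invented for illustration and are not Dayhoff's data. Real PAM scoring matrices then convert these probabilities into log-odds scores.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(m, power):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = m
    while power > 0:
        if power & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        power >>= 1
    return result


# Hypothetical "PAM1": each row gives P(letter i becomes letter j) after 1 PAM unit.
PAM1_TOY = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]
PAM250_TOY = mat_power(PAM1_TOY, 250)
print([round(x, 3) for x in PAM250_TOY[0]])
```

After 250 steps the diagonal has dropped from 0.99 to roughly 0.35, which is why high-numbered PAM matrices suit distantly related sequences.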

BLOSUM62
[Figure: the BLOSUM62 matrix (lower triangle), rows and columns A R N D C Q E G H I L K M F P S T W Y V X; the first entries are A/A = 4, R/A = -1, R/R = 5.]
- Common amino acids have low identity scores (A/A = 4)
- Rare amino acids have high identity scores (W/W = 11, C/C = 9)
- Negative scores for less likely substitutions
- Positive scores for more likely substitutions
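The sign pattern and the high scores for rare residues follow from how such matrices are built: each entry is a log-odds score comparing how often a pair is observed in aligned blocks with how often it would occur by chance. BLOSUM matrices report these in half-bit units. The frequencies in this sketch are hypothetical, picked only to show the effect.

```python
import math


def half_bit_score(q_ab: float, p_a: float, p_b: float) -> int:
    """Log-odds score in half-bit units: round(2 * log2(observed / expected by chance))."""
    return round(2 * math.log2(q_ab / (p_a * p_b)))


# Hypothetical frequencies (not the real BLOSUM62 data):
common_identity = half_bit_score(0.02, 0.08, 0.08)  # frequent residue paired with itself
rare_identity = half_bit_score(0.01, 0.013, 0.013)  # rare residue paired with itself
unlikely_pair = half_bit_score(0.001, 0.08, 0.05)  # pair seen less often than chance
print(common_identity, rare_identity, unlikely_pair)  # 3 12 -4
```

A rare residue matched with itself gets a large score because chance pairings of it are so unlikely; a pair observed less often than chance gets a negative score.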

Gapped Alignments
- Gapping provides more biologically realistic alignments.
- Statistical behavior is not completely understood for gapped alignments.
- Gapped BLAST parameters must be found by simulation for each matrix.
- Affine gap costs = -(a + bk), where a = gap open penalty, b = gap extension penalty, and k = gap length.
- A gap of length 1 receives the score -(a + b).

Scores
VDS-CY
VETLCF
Scored with BLOSUM and with PAM: simply add the scores for each pair of aligned residues. Different matrices produce different scores!
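Adding up pair scores, plus an affine gap cost, can be sketched for the example above. The BLOSUM62 values are an excerpt covering only these pairs, and the gap costs a = 11 and b = 1 are an assumption (the usual BLASTP settings with BLOSUM62); with -(a + bk), the single gap costs -12. The function names are hypothetical.

```python
# Excerpt of BLOSUM62: only the pairs needed for this example.
BLOSUM62_EXCERPT = {("V", "V"): 4, ("D", "E"): 2, ("S", "T"): 1, ("C", "C"): 9, ("Y", "F"): 3}


def pair_score(x: str, y: str) -> int:
    """Look up a pair in either order; the matrix is symmetric."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    if (y, x) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(y, x)]
    raise KeyError(f"pair {x}/{y} is not in this excerpt")


def score_alignment(top: str, bottom: str, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Sum pair scores; a run of k gap columns costs -(gap_open + gap_extend * k)."""
    total, in_gap = 0, False
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total -= gap_extend
            if not in_gap:
                total -= gap_open  # opening cost is charged once per gap
            in_gap = True
        else:
            total += pair_score(x, y)
            in_gap = False
    return total


print(score_alignment("VDS-CY", "VETLCF"))  # 4 + 2 + 1 - 12 + 9 + 3 = 7
```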

Local Alignment Statistics
High scores of local alignments between two random sequences follow the Extreme Value Distribution (this applies to ungapped alignments).
Expect value:
$$E = K\,m\,n\,e^{-\lambda S}$$
$$E = m\,n\,2^{-S'}, \qquad S' = \frac{\lambda S - \ln K}{\ln 2}$$
- E = number of database hits you expect to find by chance (expected number of random hits)
- S = your score; S' = bit score
- m, n = query length and size of database
- K = scale for the search space; λ = scale for the scoring system
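The two forms of the E-value, and the bit-score conversion, can be written as small functions. λ = 0.267 and K = 0.041 are assumed values, the ones usually quoted for gapped BLOSUM62 with gap costs 11/1; with them, a raw score of 741 converts to about 290 bits, matching the blastx output shown earlier in this guide.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue_from_raw(raw: float, lam: float, k: float, m: float, n: float) -> float:
    """E = K m n exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw)


def evalue_from_bits(bits: float, m: float, n: float) -> float:
    """E = m n 2^(-S')."""
    return m * n * 2.0 ** (-bits)


LAMBDA, K = 0.267, 0.041  # assumed gapped BLOSUM62 (11/1) parameters
print(round(bit_score(741, LAMBDA, K)))  # 290
```

Both E-value functions give the same answer, because substituting S' into m n 2^(-S') recovers K m n e^(-λS).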

Advanced BLAST Options: Nucleotide
Example Entrez queries
- nucleotide all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- biomol mrna[Properties]
- gbdiv est[Properties] AND rat[organism]
Other advanced options
- -e: expect value
- -v 2000: descriptions
- -b 2000: alignments
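Options like these can also be passed to NCBI's web BLAST service as URL parameters. This sketch only builds the request URL and does not send it; the parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY, EXPECT) are assumed from the NCBI BLAST URL API and should be checked against the current documentation before use. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query, program="blastn", database="nt", entrez_query=None, expect=None):
    """Build (but do not send) a BLAST submission URL."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query  # limit the search with an Entrez query
    if expect is not None:
        params["EXPECT"] = str(expect)  # expect value cutoff
    return BLAST_URL + "?" + urlencode(params)


url = blast_put_url(
    "GTACTGGACATGGACCCTACAGGAA",
    entrez_query="gbdiv est[Properties] AND rat[organism]",
    expect=10,
)
params = parse_qs(urlparse(url).query)  # decode again to check the encoding round-trips
print(params["ENTREZ_QUERY"])
```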

Advanced BLAST Options: Protein
Matrix selection
- PAM30: most stringent
- BLOSUM45: least stringent
Example Entrez queries
- proteins all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- srcdb refseq[Properties]
Other advanced options
- -e: expect value
- -v 2000: descriptions
- -b 2000: alignments
Limit by taxon
- Mus musculus[Organism]
- Mammalia[Organism]
- Viridiplantae[Organism]

Low Complexity Filtering (filtered vs. unfiltered search)
sp|P27476|NSR1_YEAST NUCLEAR LOCALIZATION SEQUENCE BINDING PROTEIN (P67)
Length = 414
Score = 40.2 bits (92), Expect =
Identities = 35/131 (26%), Positives = 56/131 (42%), Gaps = 4/131 (3%)
Query: 362 STTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLS---SQPQAIVTEDKTD 418
            S++S SSS+S SS S + + S S S+ + E K
Sbjct: 29  SSSSSESSSSSSSSSESESESESESESSSSSSSSDSESSSSSSSDSESEAETKKEESKDS 88

Low Complexity Filter (low-complexity query sequence masked with X)
>gi| |sp|Q96RF0|SNXI_HUMAN Sorting nexin 18
Length = 628
Score = 1048 bits (2710), Expect = 0.0
Identities = 528/628 (84%), Positives = 528/628 (84%)
Query: 1  MALRARALYDFRSENPGEISLREHEVLSLCSEQDIEGWLEGVNSRGDRGLFPASYVQVIR 60
           MALRARALYDFRSENPGEISLREHEVLSLCSEQDIEGWLEGVNSRGDRGLFPASYVQVIR
Sbjct: 1  MALRARALYDFRSENPGEISLREHEVLSLCSEQDIEGWLEGVNSRGDRGLFPASYVQVIR 60
Query: 61 XXXXXXXXXXXXXXXXXXXNVPPGGFEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSTFQ 120
                             NVPPGGFE                             STFQ
Sbjct: 61 APEPGPAGDGGPGAPARYANVPPGGFEPLPVAPPASFKPPPDAFQALLQPQQAPPPSTFQ 120
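Filtering replaces low-complexity stretches with X, as in the query above. NCBI uses the SEG program for proteins; the sketch below is a simpler stand-in, not SEG: it masks every window whose Shannon entropy falls below a threshold. The window size of 12 and the threshold of 2.2 bits are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of `window`."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2, mask_char: str = "X") -> str:
    """Replace every residue covered by a low-entropy window with mask_char."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


print(mask_low_complexity("S" * 12 + "ACDEFGHIKLMN"))
```

Because whole windows are masked, the mask spills a few residues past the end of the repeat; SEG also extends its masked segments, but by different rules.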

Neighbors: Precomputed BLAST
In Entrez Nucleotide and Protein, Related Sequences produces a list of sequences sorted by BLAST score, but with no alignment details.

BLink – Protein BLAST Alignments
- Lists only 200 hits
- List is nonredundant

BLink – Best Hits

Megablast: NCBI's Genome Annotator
- Long alignments of similar DNA sequences
- Greedy algorithm
- Concatenation of query sequences
- Faster than blastn; less sensitive

MegaBLAST
Query file C:\seq\hs.4.fsa (accessions AI…, AI…, AI…, BE…):
>gnl|UG|Hs#S qd43b11.x1 Homo sapiens cDNA, 3' end
CATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGTTTG
GTGAGAAGTGCTCGATTAGTTCAGACAACATCTGGCACTTGATGTCTGTCCTTCCCTCCT
TTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAAGGT
GACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACACCG
TCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAAAAC
CACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGC
>gnl|UG|Hs#S qv37f11.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
GTCGTATCGATGT
>gnl|UG|Hs#S qv33c06.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
GTCGTATCGATGT
>gnl|UG|Hs#S e65f04.x1 Homo sapiens cDNA, 3' end
TTTCATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGT
TTGGTGAGAAGTGCTCGATTAGTTCAAACAACATCTGGCACTTGATGTCTGTCCTTCCCT
CCTTTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAA
GGTGACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACA
CCGTCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAA
AACCACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGCCTCCCAACCGCATTC
CTGCCTGTGTAGCAGGCGGTGAGCACCCAGAAGGGGCACATACCTCTCCAAGCCTTGAAA
GCAAAGCATGGAGATCTACAAAAATAGGATTTCCACTTGGAGAAATGTCGCTGGGACAGT
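MegaBLAST's concatenation of query sequences can be sketched as one lookup table built from all queries at once, with each word remembering which query it came from, so the database is scanned only once for the whole batch. This is a simplified sketch with hypothetical functions; the default word size of 28 comes from the Nucleotide Words slide.

```python
from collections import defaultdict


def parse_fasta(text: str) -> list:
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batch_word_table(sequences, word_size: int = 28) -> dict:
    """Map each word to the (query_index, position) pairs where it occurs in any query."""
    table = defaultdict(list)
    for qi, seq in enumerate(sequences):
        for i in range(len(seq) - word_size + 1):
            table[seq[i:i + word_size]].append((qi, i))
    return dict(table)


records = parse_fasta(">query_a\nACGTACGT\n>query_b\nTTACGT\n")
table = batch_word_table([seq for _, seq in records], word_size=4)
print(table["ACGT"])  # [(0, 0), (0, 4), (1, 2)]
```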

Discontiguous Megablast
- Uses discontiguous word matches
- Better for cross-species comparisons

Templates for Discontiguous MegaBLAST
W = word size (number of positions that must match); t = template length. A coding and a non-coding template are available for each combination:
- W = 11, t = 16
- W = 12, t = 16
- W = 11, t = 18
- W = 12, t = 18
- W = 11, t = 21
- W = 12, t = 21
Ma, B., Tromp, J., Li, M., "PatternHunter: faster and more sensitive homology search", Bioinformatics 2002 Mar;18(3):440-5
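A template marks which positions of a t-long window must match ('1') and which are ignored ('0'); the W marked positions form the discontiguous word. The template string below is hypothetical, written in the style of a coding template that ignores every third (wobble) position; it is not taken from the slide.

```python
def spaced_word(seq: str, start: int, template: str) -> str:
    """Keep only the bases of seq[start:start + len(template)] where the template has '1'."""
    window = seq[start:start + len(template)]
    if len(window) < len(template):
        raise ValueError("template runs past the end of the sequence")
    return "".join(base for base, flag in zip(window, template) if flag == "1")


def spaced_words(seq: str, template: str) -> list:
    """(start, word) for every template placement along seq."""
    return [(i, spaced_word(seq, i, template)) for i in range(len(seq) - len(template) + 1)]


TEMPLATE = "1101101101101101"  # hypothetical W = 11, t = 16 template

print(spaced_word("ACGTACGTACGTACGT", 0, TEMPLATE))  # ACTAGTCGACT
# A change at an ignored ('0') position leaves the word, and hence the hit, unchanged.
print(spaced_word("ACATACGTACGTACGT", 0, TEMPLATE))  # ACTAGTCGACT
```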

Nucleotide vs. Protein BLAST
Comparing ADSS from H. sapiens and A. thaliana:
Human DNA:  aaccgggtgacggtggtgctcggtgcgcagtggggcgacgaaggc
Human:      N R V T V V L G A Q W G D E G
            + + V +   V L G   Q W G D E G
A.th.:      S Q V S G V L G C Q W G D E G
A.th. DNA:  agtcaagtatctggtgtactcggttgccaatggggagatgaaggt
- BLASTp finds three matching words.
- BLASTn finds no match, because there are no identical 7 bp words.
- Protein searches are generally more sensitive than nucleotide searches.

Translated BLAST
Query        Database     Program
Nucleotide   Protein      blastx
Protein      Nucleotide   tblastn
Nucleotide   Nucleotide   tblastx
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA.
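Translated searches start from a conceptual translation of the nucleotide sequence in all six reading frames. This minimal sketch uses the standard genetic code; the function names are hypothetical. Translating the ADSS fragments from the Nucleotide vs. Protein BLAST slide reproduces the protein sequences shown there.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> dict:
    """Frames +1..+3 start at offsets 0..2 of the given strand; -1..-3 on its reverse complement."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


human = "aaccgggtgacggtggtgctcggtgcgcagtggggcgacgaaggc"
print(six_frames(human)["+1"])  # NRVTVVLGAQWGDEG
```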

Genomic BLAST
- These pages provide customized nucleotide and protein databases for each genome.
- If a Map Viewer is available, the BLAST hits can be viewed on the maps.

BLAST the Chicken Genome
Select the program and enter the accession for human TPO mRNA.

BLAST Hit on the Genome

BLASTn Hit on the Map Viewer

TBLASTN Results Using NP_000538

Linking Protein Sequence, Structure, and Function
Tools: BLASTp, PSI-BLAST, RPS-BLAST, VAST
- sequence → function (Pfam, SMART)
- sequence → structure
- structure → structure
- sequence → structure + function (CD)

Position-Specific Substitution Rates
Active-site serine vs. weakly conserved serine

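Position-Specific Score Matrix (PSSM)
[Figure: PSSM excerpt with one column per amino acid (A R N D C Q E G H I L K M F P S T W Y V) and one row per query position, labeled from 206; query residues D G V I S S C N G D S G G P L N C Q A.]
Serine is scored differently in these two positions; one is the active-site nucleophile.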
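A PSSM gives each query position its own column of scores, computed from the residues seen at that position in an alignment. This sketch uses simple pseudocounts and a uniform background, both assumptions; PSI-BLAST itself uses sequence weighting and matrix-based pseudocounts. The four aligned sequences are hypothetical, loosely modeled on the GDSGGP stretch of the query above.

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}  # assumed background frequencies


def pssm_column(residues, background=UNIFORM, pseudocount=1.0) -> dict:
    """Log2-odds score of every amino acid at one alignment position."""
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(background)
    return {
        aa: math.log2((counts[aa] + pseudocount) / total / background[aa])
        for aa in background
    }


def build_pssm(alignment) -> list:
    """alignment: equal-length sequences; returns one score column per position."""
    return [pssm_column(column) for column in zip(*alignment)]


ALIGNMENT = ["GDSGGP", "GESGGA", "GDSGNS", "GKSGGT"]  # hypothetical
pssm = build_pssm(ALIGNMENT)
# Serine at the conserved position 2 versus the variable position 5:
print(round(pssm[2]["S"], 2), round(pssm[5]["S"], 2))
```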

PSI-BLAST
Create your own PSSM: confirming relationships of purine nucleotide metabolism proteins.
Cycle: query searched with BLOSUM62 → alignment → PSSM → search with the PSSM (repeated)

PSI-BLAST
>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE AMINOH
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGFLAKFDYY
VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVDLVNQGLQ
EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAYEGAVKNG
RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGAWDPKTTH
VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKKELLERLY
Option: E-value cutoff for the PSSM

PSI Results: Initial BLAST Run

First PSSM Search
Other purine nucleotide-metabolizing enzymes not found by ordinary BLAST

Third PSSM Search: Convergence
Just below the threshold: another nucleotide metabolism enzyme
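The iteration and convergence shown in these results can be sketched as a loop: search, collect hits below the inclusion E-value, rebuild the PSSM, and stop when a round adds no new sequences. The `search` and `build_pssm` callables, and the 0.005 inclusion threshold, are hypothetical stand-ins rather than NCBI's code.

```python
def psi_blast_iterate(query, search, build_pssm, inclusion_e=0.005, max_iterations=5):
    """Return (included_ids, iterations_run, converged).

    search(query, model) -> list of (sequence_id, e_value); model is None in round 1.
    build_pssm(query, included_ids) -> a model for the next round.
    """
    model = None  # round 1: plain query scored with BLOSUM62
    included = set()
    for iteration in range(1, max_iterations + 1):
        hits = search(query, model)
        new_included = {seq_id for seq_id, e_value in hits if e_value <= inclusion_e}
        if new_included == included:
            return included, iteration, True  # no new sequences: converged
        included = new_included
        model = build_pssm(query, included)
    return included, max_iterations, False


def toy_search(query, model):
    """Hypothetical search whose results improve as the model learns from hits."""
    if model is None:
        return [("A", 1e-10), ("B", 0.1)]
    if model == frozenset({"A"}):
        return [("A", 1e-20), ("B", 1e-4), ("C", 0.5)]
    return [("A", 1e-30), ("B", 1e-8), ("C", 0.01)]  # C stays just below threshold


result = psi_blast_iterate("query", toy_search, lambda q, ids: frozenset(ids))
print(result)
```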

Entrez Domains (CDD)
Single domains
- Pfam (pfam01234; Sanger): Pfam-A seeds, HMM-based models representing a wide variety of functional domains, derived from SWISS-PROT
- SMART (smart00123; EMBL): HMM-based models originally concentrating on eukaryotic signaling domains, now expanding
- CD (cd01234; NCBI): NCBI-curated domains based on sequence and structural alignments
Protein families
- COG (COG0123; NCBI): BLAST-based alignments derived from complete proteomes of prokaryotes

Protein Links: Domains

Results of a CD-Search
Hits from CD, SMART, and Pfam are shown as colored bars. Click on a colored bar to align your sequence to the CD.

CDD Record – heme peroxidases
- Aligned query
- Red = high conservation; blue = low conservation

Curated CD Record
Curated CDs (cd12345) are based on sequence and structure alignments.
- Annotated features
- Structural evidence
- Aligned query

BLink: Sequence to Structure
Related structures

Related Structures (viewed in Cn3D)

Entrez Structure
MMDB: Molecular Modeling Database
- Derived from experimentally determined PDB records
- Adds value to PDB records by:
  – Adding explicit chemical bonding information
  – Validating and indexing the sequences
  – Annotating 3D domains and secondary structure
  – Adding links to CDD, Taxonomy, PubMed
  – Converting PDB data to ASN.1
- Structure neighbors determined by the Vector Alignment Search Tool (VAST)

Structure Summary Page
- Conserved Domains
- VAST neighbors for chain C (domain 0)
- Cn3D
- VAST neighbors for domain 2

VAST: Structure Neighbors
Vector Alignment Search Tool
- For each 3D domain, locate the SSEs (secondary structure elements) and represent them as individual vectors (example: human IL-4).
- VAST uses 3D domains only! Whole polypeptides are assigned 3D domain 0 (zero).
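Representing each SSE as a vector reduces a domain to a few line segments that can be compared geometrically. This sketch computes two such features, the angle between two SSE vectors and the distance between their midpoints; the coordinates are hypothetical, and VAST's actual scoring of SSE pairs is more involved.

```python
import math


def sse_vector(sse):
    """sse is a (start_xyz, end_xyz) pair; return the vector from start to end."""
    start, end = sse
    return tuple(e - s for s, e in zip(start, end))


def angle_degrees(sse1, sse2) -> float:
    """Angle between the direction vectors of two SSEs."""
    u, v = sse_vector(sse1), sse_vector(sse2)
    dot = sum(a * b for a, b in zip(u, v))
    cosine = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))  # clamp rounding error


def midpoint_distance(sse1, sse2) -> float:
    """Distance between the midpoints of two SSE vectors."""
    m1 = [(s + e) / 2 for s, e in zip(*sse1)]
    m2 = [(s + e) / 2 for s, e in zip(*sse2)]
    return math.dist(m1, m2)


helix = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))  # hypothetical coordinates in Angstroms
strand = ((0.0, 0.0, 5.0), (0.0, 8.0, 5.0))
print(angle_degrees(helix, strand), round(midpoint_distance(helix, strand), 2))
```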

VAST Neighbors
Examples: 1D2V, 1Q4G. Neighbors are computed for 3D domains!

Viewing a VAST Alignment
- RMSD in Angstroms
- Sequence percent identity
- VAST P value
- Cn3D
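The RMSD reported for a VAST alignment is the root-mean-square distance between aligned atoms after superposition. The sketch below assumes the two coordinate sets are already superimposed and paired in order (finding the superposition is not shown); coordinates are in Angstroms.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between paired, already superimposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


print(rmsd([(0, 0, 0), (1, 1, 1)], [(1, 0, 0), (2, 1, 1)]))  # 1.0
```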

Submitting a PDB File to VAST
Redesigned interface! (New!)
This is the best way to convert PDB into MMDB format!

Entrez PubChem (New!)
- PC Substance: primary database of chemical samples
- PC Compound: derived database of known chemicals from PC Substance records
- PC BioAssay: primary database of bioactivity screens of samples in PC Substance

Links from Structure
Bound molecules linked from the structure record: N-acetylglucosamine, heme, mannose, fucose

Search for thyroxine
Source      Count
ChemID      24
KEGG        4
DTP-NCI     3
NIST        3
Biocyc      2
BIND        2
Chembank    2
NIAID       1
TOTAL       41

Sequence Polymorphisms
SNP (general polymorphisms)
- Primary database of submitted SNPs
- Curated database of reference SNPs
- Contains more than just SNPs: true SNPs, MNPs (multiple nucleotide polymorphisms), insertions, deletions, microsatellites, mixed, and no variation (constant)
OMIM (human phenotypes)
- Clinical literature database
- Curated at Johns Hopkins University
- Links human genes and genetic disorders to human disease
- Lists allelic variants that have clinical consequences
Variations in SNP are not necessarily in OMIM, and vice versa!

Linking to SNP
Links to SNP are also available from Nucleotide and Protein.
Entrez Gene – TPO

Entrez SNP
- Primary data: ss#
- SNP UID: rs#

Find Non-synonymous SNPs
Query: #7 AND coding nonsynon[Function Class]
(uses the Function Class field)

Non-synonymous TPO SNPs
- Link to Map Viewer
- View all SNPs in locus
- Link to related 3D structures

GeneView in dbSNP

Links to OMIM
Links to SNP are also available from Nucleotide and Protein.
Entrez Gene – TPO

OMIM Record

Explore a Disease SNP: residue 799

Curated CD Record: E799 (viewed in Cn3D)

For More Information…

General addresses
- The (free!) NCBI Newsletter
- The NCBI Handbook
- The NCBI Education Page
Follow the link from the NCBI Home Page.