NCBI FieldGuide NCBI Molecular Biology Resources A Field Guide part 2 (post intermission) September 30, 2004 ICGEB.

1 NCBI FieldGuide NCBI Molecular Biology Resources A Field Guide part 2 (post intermission) September 30, 2004 ICGEB

2 NCBI FieldGuide PSI-BLAST Position-Specific Iterated BLAST Mining for protein domains Confirming relationships among related proteins

3 NCBI FieldGuide Position Specific Substitution Rates Active site serine Weakly conserved serine

4 NCBI FieldGuide Position Specific Score Matrix (PSSM)
        A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
206 D   0  -2   0   2  -4   2   4  -4  -3  -5  -4   0  -2  -6   1   0  -1  -6  -4  -1
207 G  -2  -1   0  -2  -4  -3  -3   6  -4  -5  -5   0  -2  -3  -2  -2  -1   0  -6  -5
208 V  -1   1  -3  -3  -5  -1  -2   6  -1  -4  -5   1  -5  -6  -4   0  -2  -6  -4  -2
209 I  -3   3  -3  -4  -6   0  -1  -4  -1   2  -4   6  -2  -5  -5  -3   0  -1  -4   0
210 D  -2  -5   0   8  -5  -3  -2  -1  -4  -7  -6  -4  -6  -7  -5   1  -3  -7  -5  -6
211 S   4  -4  -4  -4  -4  -1  -4  -2  -3  -3  -5  -4  -4  -5  -1   4   3  -6  -5  -3
212 C  -4  -7  -6  -7  12  -7  -7  -5  -6  -5  -5  -7  -5   0  -7  -4  -4  -5   0  -4
213 N  -2   0   2  -1  -6   7   0  -2   0  -6  -4   2   0  -2  -5  -1  -3  -3  -4  -3
214 G  -2  -3  -3  -4  -4  -4  -5   7  -4  -7  -7  -5  -4  -4  -6  -3  -5  -6  -6  -6
215 D  -5  -5  -2   9  -7  -4  -1  -5  -5  -7  -7  -4  -7  -7  -5  -4  -4  -8  -7  -7
216 S  -2  -4  -2  -4  -4  -3  -3  -3  -4  -6  -6  -3  -5  -6  -4   7  -2  -6  -5  -5
217 G  -3  -6  -4  -5  -6  -5  -6   8  -6  -8  -7  -5  -6  -7  -6  -4  -5  -6  -7  -7
218 G  -3  -6  -4  -5  -6  -5  -6   8  -6  -7  -7  -5  -6  -7  -6  -2  -4  -6  -7  -7
219 P  -2  -6  -6  -5  -6  -5  -5  -6  -6  -6  -7  -4  -6  -7   9  -4  -4  -7  -7  -6
220 L  -4  -6  -7  -7  -5  -5  -6  -7   0  -1   6  -6   1   0  -6  -6  -5  -5  -4   0
221 N  -1  -6   0  -6  -4  -4  -6  -6  -1   3   0  -5   4  -3  -6  -2  -1  -6  -1   6
222 C   0  -4  -5  -5  10  -2  -5  -5   1  -1  -1  -5   0  -1  -4  -1   0  -5   0   0
223 Q   0   1   4   2  -5   2   0   0   0  -4  -2   1   0   0   0  -1  -1  -3  -3  -4
224 A  -1  -1   1   3  -4  -1   1   4  -3  -4  -3  -1  -2  -2  -3   0  -2  -2  -2  -3
Active site nucleophile
Serine scored differently in these two positions (rows 211 and 216)
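To make the "serine scored differently" point concrete, here is a minimal Python sketch that looks up scores in a few rows copied from the matrix on this slide. The function names `score_residue` and `score_segment` are made up for illustration and are not part of any NCBI tool; the segment `DSGGP` is simply the query residues at positions 215 to 219.

```python
# Column order as shown on the slide.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Rows copied from the slide's PSSM: position -> (query residue, scores).
PSSM_ROWS = {
    211: ("S", [4, -4, -4, -4, -4, -1, -4, -2, -3, -3, -5, -4, -4, -5, -1, 4, 3, -6, -5, -3]),
    215: ("D", [-5, -5, -2, 9, -7, -4, -1, -5, -5, -7, -7, -4, -7, -7, -5, -4, -4, -8, -7, -7]),
    216: ("S", [-2, -4, -2, -4, -4, -3, -3, -3, -4, -6, -6, -3, -5, -6, -4, 7, -2, -6, -5, -5]),
    217: ("G", [-3, -6, -4, -5, -6, -5, -6, 8, -6, -8, -7, -5, -6, -7, -6, -4, -5, -6, -7, -7]),
    218: ("G", [-3, -6, -4, -5, -6, -5, -6, 8, -6, -7, -7, -5, -6, -7, -6, -2, -4, -6, -7, -7]),
    219: ("P", [-2, -6, -6, -5, -6, -5, -5, -6, -6, -6, -7, -4, -6, -7, 9, -4, -4, -7, -7, -6]),
}


def score_residue(position, residue):
    """Score for aligning `residue` at the given query position."""
    _, scores = PSSM_ROWS[position]
    return scores[AMINO_ACIDS.index(residue)]


def score_segment(segment, start):
    """Sum of per-position scores for an ungapped segment starting at `start`."""
    return sum(score_residue(start + i, aa) for i, aa in enumerate(segment))


# The same residue (S) earns different scores at different positions.
print(score_residue(211, "S"), score_residue(216, "S"))
# Replacing serine with alanine is tolerated at 211 but penalized at 216.
print(score_residue(211, "A"), score_residue(216, "A"))
print(score_segment("DSGGP", 215), score_segment("DAGGP", 215))
```

A fixed substitution matrix would give a serine-to-serine match the same score everywhere; the position-specific rows are what let the search reward the strongly conserved position more than the weakly conserved one.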

5 NCBI FieldGuide
>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF
VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD
EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY
RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA
VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK
PSI-BLAST E-value cutoff for PSSM

6 NCBI FieldGuide RESULTS: Initial BLASTP Same results as protein-protein BLAST

7 NCBI FieldGuide Results of First PSSM Search Other purine nucleotide metabolizing enzymes not found by ordinary BLAST

8 NCBI FieldGuide Third PSSM Search: Convergence Just below threshold, another nucleotide metabolism enzyme Check to add to PSSM
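The iteration walked through on slides 5 to 8 (initial BLASTP, build a PSSM from hits under the E-value cutoff, search again, stop when nothing new qualifies) can be sketched as a loop. This is only an outline under stated assumptions: `search` is a hypothetical callable standing in for one search round, the default cutoff of 0.005 is an assumed value, and the toy hit names and E-values are invented for the demonstration. It does not call any real NCBI interface.

```python
def iterate_to_convergence(search, inclusion_evalue=0.005, max_rounds=10):
    """Repeat a profile search until no new hit passes the inclusion cutoff.

    `search(included)` is a hypothetical callable: given the set of hit ids
    already in the profile (empty set = plain protein-protein search), it
    returns a dict mapping hit id -> E-value.
    """
    included = set()
    for round_number in range(1, max_rounds + 1):
        hits = search(frozenset(included))
        passing = {hit for hit, evalue in hits.items() if evalue <= inclusion_evalue}
        new_hits = passing - included
        if not new_hits:
            return included, round_number  # converged: nothing new to add
        included |= new_hits
    return included, max_rounds


def toy_search(included):
    """Invented stand-in for a search round; ids and E-values are made up."""
    hits = {"hit_A": 1e-150}
    if "hit_A" in included:
        hits["hit_B"] = 1e-4   # only found once the profile includes hit_A
    if "hit_B" in included:
        hits["hit_C"] = 0.02   # reported, but above the inclusion cutoff
    return hits


members, rounds = iterate_to_convergence(toy_search)
print(sorted(members), rounds)
```

In the toy run, `hit_C` plays the role of the sequence on slide 8: it is reported but does not meet the cutoff, so it enters the PSSM only if the user checks it by hand.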

9 NCBI FieldGuide MegaBLAST
AI217550 AI251192 AI254381 BE645079
C:\seq\hs.4.fsa
> 1133045 gnl|UG|Hs#S1133045 qd43b11.x1 Homo sapiens cDNA, 3' end
CATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGTTTG
GTGAGAAGTGCTCGATTAGTTCAGACAACATCTGGCACTTGATGTCTGTCCTTCCCTCCT
TTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAAGGT
GACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACACCG
TCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAAAAC
CACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGC
> 1141828 gnl|UG|Hs#S1141828 qv37f11.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
GTCGTATCGATGT
> 1145899 gnl|UG|Hs#S1145899 qv33c06.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
GTCGTATCGATGT
> 2291670 gnl|UG|Hs#S2291670 7e65f04.x1 Homo sapiens cDNA, 3' end
TTTCATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGT
TTGGTGAGAAGTGCTCGATTAGTTCAAACAACATCTGGCACTTGATGTCTGTCCTTCCCT
CCTTTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAA
GGTGACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACA
CCGTCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAA
AACCACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGCCTCCCAACCGCATTC
CTGCCTGTGTAGCAGGCGGTGAGCACCCAGAAGGGGCACATACCTCTCCAAGCCTTGAAA
GCAAAGCATGGAGATCTACAAAAATAGGATTTCCACTTGGAGAAATGTCGCTGGGACAGT

10 NCBI FieldGuide What is Discontiguous (Cross-species) MegaBLAST?
W = 11, t = 16, coding:     1101101101101101
W = 11, t = 16, non-coding: 1110010110110111
W = 12, t = 16, coding:     1111101101101101
W = 12, t = 16, non-coding: 1110110110110111
W = 11, t = 18, coding:     101101100101101101
W = 11, t = 18, non-coding: 111010010110010111
W = 12, t = 18, coding:     101101101101101101
W = 12, t = 18, non-coding: 111010110010110111
W = 11, t = 21, coding:     100101100101100101101
W = 11, t = 21, non-coding: 111010010100010010111
W = 12, t = 21, coding:     100101101101100101101
W = 12, t = 21, non-coding: 111010010110010010111
Ma, B., Tromp, J., Li, M., "PatternHunter: faster and more sensitive homology search", Bioinformatics 2002 Mar;18(3):440-5
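A short Python sketch may help in reading the templates: W is the number of 1s (positions that must match) and t is the template length, with 0s marking positions that are ignored. The helper names `template_weight` and `seed_matches` and the two example sequences are invented for illustration; this is a toy check of one window, not the MegaBLAST implementation. The 110 repeat in the coding templates looks like it skips every third base, which would fit codon wobble positions, but that reading is an interpretation and is not stated on the slide.

```python
# Templates copied from the slide: (W, t, type) -> pattern.
TEMPLATES = {
    (11, 16, "coding"): "1101101101101101",
    (11, 16, "non-coding"): "1110010110110111",
    (12, 16, "coding"): "1111101101101101",
    (12, 16, "non-coding"): "1110110110110111",
    (11, 18, "coding"): "101101100101101101",
    (11, 18, "non-coding"): "111010010110010111",
    (12, 18, "coding"): "101101101101101101",
    (12, 18, "non-coding"): "111010110010110111",
    (11, 21, "coding"): "100101100101100101101",
    (11, 21, "non-coding"): "111010010100010010111",
    (12, 21, "coding"): "100101101101100101101",
    (12, 21, "non-coding"): "111010010110010010111",
}


def template_weight(template):
    """Number of positions that are required to match."""
    return template.count("1")


def seed_matches(a, b, template):
    """True if windows a and b agree at every position marked 1."""
    if len(a) != len(template) or len(b) != len(template):
        raise ValueError("windows must have the same length as the template")
    return all(x == y for x, y, flag in zip(a, b, template) if flag == "1")


# Every template has exactly W ones and length t.
for (w, t, kind), pattern in TEMPLATES.items():
    assert template_weight(pattern) == w and len(pattern) == t

coding = TEMPLATES[(11, 16, "coding")]
window_a = "ATGGCTAAAGGTCTGA"
window_b = "ATAGCCAAGGGCCTAA"  # differs from window_a only at the 0 positions
print(window_a == window_b)                     # contiguous exact match fails
print(seed_matches(window_a, window_b, coding))  # discontiguous seed still hits
```

The example shows why a discontiguous seed can be more sensitive for cross-species comparisons: the two windows differ at five of sixteen positions, so no contiguous 11-mer matches, yet all eleven required positions agree.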

11 NCBI FieldGuide Neighbors: Precomputed BLAST Nucleotide Protein Entrez Related Sequences produces a list of sequences sorted by BLAST score, but with no alignment details.

12 NCBI FieldGuide Blink – Protein BLAST Alignments Lists only 200 hits List is nonredundant

13 NCBI FieldGuide Blink – Linking Sequence to Structure Cn3D

14 NCBI FieldGuide BLAST: Related Structures

15 NCBI FieldGuide BLAST Databases: Non-redundant protein
nr (non-redundant protein sequences)
– GenBank CDS translations
– NP_ RefSeqs
– Outside Protein: PIR, Swiss-Prot, PRF
– PDB (sequences from structures)

16 NCBI FieldGuide BLAST Databases: Nucleic Acid
nr (nt)
– Traditional GenBank Divisions
– NM_ and XM_ RefSeqs
dbest
– EST Division
htgs
– HTG division
gss
– GSS division
chromosome
– NC_ RefSeqs
wgs
– whole genome shotgun
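As a quick reference, the RefSeq prefixes named on slides 15 and 16 can be turned into a lookup from accession to the BLAST database that the slides say contains it. This is a sketch of the slide content only: the function name `blast_database_for` is hypothetical, the accession strings in the example are illustrative placeholders, and prefixes not mentioned on the slides are deliberately left unmapped.

```python
# RefSeq accession prefix -> BLAST database, as listed on slides 15 and 16.
REFSEQ_PREFIX_TO_DB = {
    "NP_": "nr (protein)",
    "NM_": "nr (nt)",
    "XM_": "nr (nt)",
    "NC_": "chromosome",
}


def blast_database_for(accession):
    """Return the database named on the slides, or None if the prefix is not listed."""
    return REFSEQ_PREFIX_TO_DB.get(accession[:3].upper())


# Placeholder accessions, used only to exercise the lookup.
for acc in ["NP_000000", "NM_000000", "XM_000000", "NC_000000", "AI217550"]:
    print(acc, "->", blast_database_for(acc))
```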

17 NCBI FieldGuide Genomic BLAST These pages provide customized nucleotide and protein databases for each genome If a Map Viewer is available, the BLAST hits can be viewed on the maps

18 NCBI FieldGuide What if Your Favorite Gene is not found in the latest genome build? POSSIBLE VARIANTS: The gene does not exist; It exists, but there is a problem with assembly; It exists, but there is a problem with annotation

19 NCBI FieldGuide An example: finding prestin in the human genome. We start with rat prestin, BLAST it against the human genome, and look for evidence that human prestin exists as well.

20 NCBI FieldGuide Searching the Human Genome
>gi|12188917|emb|AJ303372.1|RNO303372 Rattus norvegicus
ATGGATCATGCTGAAGAAAATGAAATTCCTGCAGAGATCAGAAGTACCTCGTGGAA
GTCATCCGGTCCTCCAGGAGAGGCTGCACGTCAAGGACAAAGTCACAGACTCCATC
GCAGGCATTCACGTGCACTCCTAAAAAAGTAAGAAACATCATCTACATGTTCTTGC
TTGCCAGCATATAAATTCAAGGAGTATGTGCTGGGTGACTTGGTCTCGGGCATAAG
AGCTCCCCCAAGGCTTAGCCTTCGCGATGCTGGCAGCTGTGCCTCCGGTGTTCGGC
On for same species comparisons

21 NCBI FieldGuide BLAST Results: 16 hits to one contig. Human Genome Database: 953 contigs, 2.9 billion letters

22 NCBI FieldGuide Map Viewer: Genomic Context of BLAST Hits. Figure labels: Genes, Genome Scan Models, Human EST hits, Contig, GenBank, Mouse EST hits

23 NCBI FieldGuide Human prestin: now appears in Build 34

24 NCBI FieldGuide Now we can compare genes

25 NCBI FieldGuide Three prestin genes: finally together!

26 NCBI FieldGuide Same prestin, different assemblies

27 NCBI FieldGuide Does homology imply a common biological function? Not always; descent from a common ancestor does not guarantee that a function was not lost or acquired after the divergence. An example: zeta-crystallin is a component of the transparent lens matrix of the vertebrate eye. Its homolog in E. coli is the metabolic enzyme quinone oxidoreductase.

28 NCBI FieldGuide BLAST VAST Entrez Text Sequence Structure

29 NCBI FieldGuide Structure similarity: No More BLASTing! Three-dimensional structure is the most conserved feature during evolution; a common ancestor can still be detected from structural similarity; spatial similarity is not calculated the same way as sequence similarity

30 NCBI FieldGuide VAST: Structure Neighbors. Vector Alignment Search Tool. For each protein chain, locate SSEs (secondary structure elements) and represent them as individual vectors. [Figure: Human IL-4, elements labeled 1 to 6]
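The idea of reducing each secondary structure element to a vector can be illustrated with a few lines of Python. This is a geometric sketch only, not the VAST algorithm: the function names `sse_vector` and `angle_between` are invented, the coordinates are made-up endpoints rather than data from a real structure, and comparing angles is just one simple thing that becomes possible once SSEs are vectors.

```python
import math


def sse_vector(start, end):
    """Vector from the first to the last point of a secondary structure element."""
    return tuple(e - s for s, e in zip(start, end))


def angle_between(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


# Made-up endpoints (in angstroms) for three elements of a toy chain.
helix_1 = sse_vector((0.0, 0.0, 0.0), (0.0, 0.0, 15.0))
helix_2 = sse_vector((5.0, 0.0, 15.0), (5.0, 0.0, 0.0))   # runs back down
strand_1 = sse_vector((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))

print(angle_between(helix_1, helix_2))   # antiparallel pair
print(angle_between(helix_1, strand_1))  # perpendicular pair
```

Working with a handful of vectors per chain instead of every atom is what makes it feasible to compare one structure against all others in a precomputed fashion.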

31 NCBI FieldGuide VAST: Structure Neighbors

32 NCBI FieldGuide Structure Neighbors in Cn3D: SH3, SH2. C-Src kinase, Human vs. Chicken

33 NCBI FieldGuide 3D Domain Neighbors Human C-Src Kinase (Tyr) vs. Chk1 kinase (Ser/Thr)

34 NCBI FieldGuide NCBI is changing: from a sequence data storage facility to a one-stop shop with integrated databases of various kinds. You can be part of the future – work with us! Your expertise and data are indispensable.

35 NCBI FieldGuide GenBank

36 NCBI FieldGuide Refseq

37 NCBI FieldGuide Entrez Gene

38 NCBI FieldGuide Homologene database

39 NCBI FieldGuide New generation of databases: an example

40 NCBI FieldGuide Protein interaction database: a seed for future precomputed resources

41 NCBI FieldGuide New databases: GenSAT

42 NCBI FieldGuide PubChem

43 NCBI FieldGuide Headache? Take Aspirin

44 NCBI FieldGuide Aspirin has 432 neighbors

45 NCBI FieldGuide Link to 3D protein structures

46 NCBI FieldGuide PubCrawler – Update Alerting Service for PubMed and GenBank

47 NCBI FieldGuide MedBlast: searching for articles related to a sequence.

48 NCBI FieldGuide For More Information…
E-mail addresses
– General Help: info@ncbi.nlm.nih.gov
– BLAST: blast-help@ncbi.nlm.nih.gov
The NCBI Education Page: http://www.ncbi.nih.gov/Education/index.html
The (free!) NCBI Newsletter: http://www.ncbi.nih.gov/About/newsletter.html
The NCBI Handbook: Follow the link from the NCBI Home Page

