©CMBI 2008
Databases
Data must be stored in a defined format for software to recognize it. Every database can have its own format, but some data elements are essential for every database:
1. Unique identifier, or accession code
2. Name of depositor
3. Literature references
4. Deposition date
5. The real data
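The five essential elements above can be pictured as a minimal record type. This is a hypothetical Python sketch: the class `DatabaseEntry`, its field names and the helper `index_by_accession` are illustrative assumptions, not the schema of SwissProt, EMBL or PDB. The helper enforces the property that makes an accession code useful, namely that it is unique.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatabaseEntry:
    """Hypothetical minimal entry holding the five essential data elements."""

    accession: str                                        # 1. unique identifier / accession code
    depositor: str                                        # 2. name of depositor
    references: list[str] = field(default_factory=list)  # 3. literature references
    deposition_date: date | None = None                   # 4. deposition date
    data: str = ""                                        # 5. the real data (e.g. a sequence)


def index_by_accession(entries: list[DatabaseEntry]) -> dict[str, DatabaseEntry]:
    """Build an accession -> entry lookup, rejecting duplicate accession codes."""
    index: dict[str, DatabaseEntry] = {}
    for entry in entries:
        if entry.accession in index:
            raise ValueError(f"duplicate accession code: {entry.accession}")
        index[entry.accession] = entry
    return index


# Usage with a placeholder depositor name (not real deposition data):
entry = DatabaseEntry("P69905", "A. Depositor")
print(index_by_accession([entry])["P69905"].depositor)  # A. Depositor
```

Rejecting duplicates at indexing time mirrors why every database needs an accession code: it is the one key other databases can safely point at.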
Quality of Data
SwissProt: data are entered only by annotation experts.
EMBL, PDB: “everybody” can submit data. There is no human intervention at submission, only some automatic checks.
SwissProt database
Database of protein sequences: … entries (Oct 2008)
Ca. 200 annotation experts worldwide
Keyword-organised flat file
Obligatory deposit of sequences in SwissProt before publication
Presently, SwissProt and other protein sequence databases are being merged into UniProt.
Important records in SwissProt (1)
ID = entry name, AC = accession number(s), DT = dates, DE = description (protein names):
ID   HBA_HUMAN               Reviewed;         142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
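Because every SwissProt line starts with a two-letter line code followed by three spaces, a program can read an entry by grouping lines on that code. A minimal sketch, assuming only this layout (code in columns 1-2, data from column 6); `parse_records` and `accessions` are hypothetical helper names, not part of any SwissProt tool.

```python
from collections import defaultdict

# The record lines shown above for HBA_HUMAN.
ENTRY = """\
ID   HBA_HUMAN               Reviewed;         142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
"""


def parse_records(text: str) -> dict[str, list[str]]:
    """Group lines by their two-letter line code; data starts at column 6."""
    records: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2:
            continue                      # skip empty lines
        code, content = line[:2], line[5:].rstrip()
        records[code].append(content)
    return dict(records)


def accessions(records: dict[str, list[str]]) -> list[str]:
    """Collect accession numbers from all AC lines; the first is the primary one."""
    result: list[str] = []
    for content in records.get("AC", []):
        result.extend(a.strip() for a in content.split(";") if a.strip())
    return result


recs = parse_records(ENTRY)
print(recs["ID"][0].split()[0])  # HBA_HUMAN
print(accessions(recs)[0])       # P69905
```

The extra accession numbers on the AC line (P01922 and so on) are secondary ones, kept so that old references to this entry still resolve.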
Important records in SwissProt (2)
Cross-references section (DR lines): hyperlinks to the entries in other databases that are relevant to the protein sequence, here HBA_HUMAN.
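Each cross-reference sits on its own DR line, with the target database and the identifier in that database as the first two semicolon-separated fields. A sketch assuming that layout (`DR   DB; ID; extra...`, ending in a full stop); the helper names and the identifiers in the sample lines (`1ABC`, `2XYZ`, `SomeDB`) are hypothetical placeholders, not data from the real HBA_HUMAN entry.

```python
def parse_dr_line(line: str) -> tuple[str, str, list[str]]:
    """Split a DR line into (database, identifier, optional extra fields)."""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]                      # drop the terminating full stop
    fields = [part.strip() for part in body.split(";")]
    return fields[0], fields[1], fields[2:]


def cross_references(lines: list[str]) -> dict[str, list[str]]:
    """Group identifiers by target database, ignoring non-DR lines."""
    refs: dict[str, list[str]] = {}
    for line in lines:
        if line.startswith("DR   "):
            db, ident, _ = parse_dr_line(line)
            refs.setdefault(db, []).append(ident)
    return refs


# Hypothetical DR lines in the assumed layout:
print(parse_dr_line("DR   PDB; 1ABC; X-ray; 2.00 A; A=1-142."))
```

A web interface turns each (database, identifier) pair into a hyperlink; the grouping step is what lets it list, for example, all PDB structures of one protein together.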
Important records in SwissProt (3)
Features section (FT lines): post-translational modifications, signal peptides, binding sites, enzyme active sites, domains, disulfide bridges, local secondary structure, sequence conflicts between references, etc.
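Each feature names a key (such as DISULFID) and a start and end position in the sequence. The sketch below assumes the older whitespace-separated FT layout (`FT   KEY  FROM  TO  description`); current UniProt releases write features differently, so treat this as illustrative only. Lines with a blank key column are taken to be description continuations and are skipped. Helper names and sample lines are hypothetical.

```python
from __future__ import annotations


def _position(token: str) -> int | None:
    """Endpoint as int; '<' or '>' mark an uncertain end, '?' means unknown."""
    token = token.lstrip("<>")
    return int(token) if token.isdigit() else None


def parse_ft_line(line: str) -> dict:
    """Parse one feature line: key, start, end and an optional description."""
    parts = line[5:].split(None, 3)
    if len(parts) < 3:
        raise ValueError(f"unexpected FT line: {line!r}")
    key, start, end = parts[0], parts[1], parts[2]
    description = parts[3].strip() if len(parts) > 3 else ""
    return {"key": key, "start": _position(start), "end": _position(end),
            "description": description}


def features_of(lines: list[str], key: str) -> list[dict]:
    """All features of one type, skipping continuation lines (blank key column)."""
    parsed = (parse_ft_line(l) for l in lines
              if l.startswith("FT   ") and l[5:13].strip())
    return [f for f in parsed if f["key"] == key]


# Hypothetical feature lines:
lines = [
    "FT   DISULFID     30     40",
    "FT   MOD_RES       3      3       Phosphoserine.",
    "FT                                (By similarity).",
]
print(features_of(lines, "DISULFID"))
```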
And finally, the amino acid sequence (SQ section)!
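The sequence follows an SQ header line that states its length; the residues are written in spaced blocks of ten, and the entry ends with `//`. A minimal sketch assuming that layout; `parse_sequence` is a hypothetical helper and the 12-residue block used in the usage line is made up.

```python
def parse_sequence(lines: list[str]) -> tuple[str, int]:
    """Return (sequence, declared length) from the SQ block of one entry."""
    parts: list[str] = []
    declared = -1
    in_sequence = False
    for line in lines:
        if line.startswith("SQ   "):
            # e.g. 'SQ   SEQUENCE   142 AA; ...': the third token is the length
            declared = int(line.split()[2])
            in_sequence = True
        elif line.startswith("//"):
            break                                  # end of the entry
        elif in_sequence:
            parts.append("".join(line.split()))    # drop the block spacing
    return "".join(parts), declared


seq, n = parse_sequence(["SQ   SEQUENCE   12 AA;", "     ACDEFGHIKL MN", "//"])
print(seq, n)  # ACDEFGHIKLMN 12
```

Comparing the parsed length with the declared one is a cheap consistency check when reading many entries.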
Protein Data Bank (PDB)
Databank for macromolecular structure data (3-dimensional coordinates).
Started ca. 30 years ago (on punched cards!)
Obligatory deposit of coordinates in the PDB before publication
~… entries (April 2008) (~2500 “unique” structures)
The PDB file is a keyword-organised flat file (80 columns):
1) human readable
2) every line starts with a keyword (3-6 characters)
3) platform independent
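Because the record keyword always occupies the first six columns, a program can identify every line without parsing the rest of it. A minimal sketch assuming the 80-column layout described above; `record_name`, `summarize` and `overlong_lines` are hypothetical helpers, and the sample lines are shortened.

```python
from collections import Counter


def record_name(line: str) -> str:
    """The record keyword occupies columns 1-6, padded with spaces."""
    return line[:6].strip()


def summarize(lines: list[str]) -> Counter:
    """Count how often each record keyword occurs in a PDB-format file."""
    return Counter(record_name(line) for line in lines if line.strip())


def overlong_lines(lines: list[str], width: int = 80) -> list[int]:
    """1-based numbers of lines that exceed the fixed record width."""
    return [i for i, line in enumerate(lines, start=1)
            if len(line.rstrip("\n")) > width]


sample = ["HEADER    PLANT SEED PROTEIN", "COMPND    CRAMBIN",
          "ATOM      1  N   THR A   1", "ATOM      2  CA  THR A   1", "END"]
print(summarize(sample)["ATOM"])  # 2
```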
PDB important records (1)
PDB nomenclature: filename = accession number = PDB code
The code has 4 positions (often 1 digit and 3 letters, e.g. 1CRN)
HEADER: describes the molecule and gives the deposition date
HEADER PLANT SEED PROTEIN 30-APR-81 1CRN
COMPND: name of the molecule
COMPND CRAMBIN
SOURCE: organism
SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
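In the file itself the HEADER fields sit in fixed columns rather than being separated by single spaces. The sketch below assumes the column positions of the wwPDB format description version 3 (classification in columns 11-50, deposition date in 51-59, ID code in 63-66). It also checks the classic 4-character PDB code pattern: a digit 1-9 followed by three letters or digits. Helper names are hypothetical.

```python
import re

# Classic 4-character PDB ID: first a digit 1-9, then three letters or digits.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def parse_header(line: str) -> dict[str, str]:
    """Split a HEADER record using its fixed columns."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        "dep_date": line[50:59].strip(),        # columns 51-59
        "id_code": line[62:66].strip(),         # columns 63-66
    }


# Rebuild the 1CRN HEADER line with its column padding:
header = f"{'HEADER':<10}{'PLANT SEED PROTEIN':<40}{'30-APR-81':<9}   1CRN"
print(parse_header(header))
```

Slicing by column, rather than splitting on spaces, is what keeps multi-word fields such as "PLANT SEED PROTEIN" intact.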
PDB important records (2)
SEQRES: sequence of the protein. Be aware: 3D coordinates are not always present for all the amino acids listed in SEQRES!
SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51
SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52
SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53
SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54
SSBOND: disulfide bridges
SSBOND 1 CYS 3 CYS 40
SSBOND 2 CYS 4 CYS 32
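The SEQRES records can be turned into a one-letter sequence by reading the residue names from their fixed columns. A sketch assuming the wwPDB v3 layout, where the chain identifier is in column 12, the residue count in columns 14-17 and the residue names in columns 20-70; the crambin lines are rewritten here with chain A in that layout, which is an assumption about the current file (the slide shows the older layout with a blank chain and "1CRN 51" at the end). Non-standard residues map to X.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

SEQRES_1CRN = [
    "SEQRES   1 A   46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE",
    "SEQRES   2 A   46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS",
    "SEQRES   3 A   46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR",
    "SEQRES   4 A   46  CYS PRO GLY ASP TYR ALA ASN",
]


def seqres_sequence(lines: list[str], chain: str = "A") -> tuple[str, int]:
    """Return (one-letter sequence, declared residue count) for one chain."""
    residues: list[str] = []
    declared = 0
    for line in lines:
        if line.startswith("SEQRES") and line[11] == chain:
            declared = int(line[13:17])            # numRes, columns 14-17
            residues.extend(line[19:70].split())   # residue names, columns 20-70
    return "".join(THREE_TO_ONE.get(r, "X") for r in residues), declared


seq, n = seqres_sequence(SEQRES_1CRN)
print(n, seq)
```

Crambin's six cysteines form the disulfide bridges listed in the SSBOND records. Comparing the residues that actually occur in the ATOM records with this SEQRES sequence is one way to spot the missing coordinates the slide warns about.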
PDB important records (3)
At the end of the PDB file come the “real” data:
ATOM: one line per atom, with a serial number, the atom name, the residue it belongs to, and its x, y, z coordinates (in Å)
ATOM 1 N THR … 1CRN 70
ATOM 2 CA THR … 1CRN 71
ATOM 3 C THR … 1CRN 72
ATOM 4 O THR … 1CRN 73
ATOM 5 CB THR … 1CRN 74
ATOM 6 OG1 THR … 1CRN 75
ATOM 7 CG2 THR … 1CRN 76
ATOM 8 N THR … 1CRN 77
ATOM 9 CA THR … 1CRN 78
ATOM 10 C THR … 1CRN 79
ATOM 11 O THR … 1CRN 80
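The coordinates are read from fixed columns, after which geometric quantities such as interatomic distances follow directly. A sketch assuming the wwPDB v3 ATOM columns (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-38, 39-46, 47-54). The two lines below use made-up coordinates, not values from 1CRN; the 1.458 Å separation was chosen to be roughly a typical N-CA bond length.

```python
import math


def parse_atom(line: str) -> dict:
    """Read the main fields of an ATOM record from their fixed columns."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two atoms, in the file's units (Å)."""
    return math.dist(a["xyz"], b["xyz"])


# Hypothetical ATOM lines (coordinates invented for illustration):
n_atom = parse_atom("ATOM      1  N   THR A   1       0.000   0.000   0.000")
ca_atom = parse_atom("ATOM      2  CA  THR A   1       1.458   0.000   0.000")
print(n_atom["name"], ca_atom["name"], round(distance(n_atom, ca_atom), 3))
```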
MRS home page [screenshot]
MRS Search Steps
1. Select the database(s) of your choice.
2. Formulate your query.
3. Hit “Search”.
4. The result is a “query set” or “hit list”.
5. Analyze the results.
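The select, query and hit-list cycle can be illustrated with a toy keyword index. This is not how MRS is implemented: it is a minimal inverted-index sketch that assumes all keywords must match (AND). The entries and their descriptions are a small hand-made sample, and `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict

# Step 1: the "selected database", a toy set of entry name -> description.
TOY_ENTRIES = {
    "HBA_HUMAN": "Hemoglobin subunit alpha",
    "HBB_HUMAN": "Hemoglobin subunit beta",
    "CRAM_CRAAB": "Crambin",
}


def build_index(entries: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-case word to the set of entry names whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in entries.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Steps 2-4: return the hit list of entries containing every keyword."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(TOY_ENTRIES)
print(sorted(search(index, "hemoglobin")))  # ['HBA_HUMAN', 'HBB_HUMAN']
```

Step 5, analysing the results, then works on the returned set, for example by opening each entry or narrowing the query.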
MRS Search options
Simply type your keywords in the keyword field and choose SEARCH. If you know the fields of the database you are searching, you can make your query more specific by restricting keywords to those fields. But think about your query first!
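Restricting a keyword to one field narrows the hit list. The sketch below uses an invented `field:term` query syntax; the syntax, the field names (modelled on the SwissProt DE and OS line codes) and the toy entries are assumptions for illustration, not MRS's actual query language. Bare terms are searched in all fields, and all terms must match.

```python
TOY_FIELDED = {
    "HBA_HUMAN": {"de": "Hemoglobin subunit alpha", "os": "Homo sapiens"},
    "HBA_MOUSE": {"de": "Hemoglobin subunit alpha", "os": "Mus musculus"},
}


def matches(fields: dict[str, str], term: str) -> bool:
    """True if the term occurs as a word in the named field (or in any field)."""
    if ":" in term:
        name, word = term.split(":", 1)
        return word.lower() in fields.get(name, "").lower().split()
    return any(term.lower() in text.lower().split() for text in fields.values())


def field_search(entries: dict[str, dict[str, str]], query: str) -> set[str]:
    """Return entries matching every term of the query."""
    terms = query.split()
    return {name for name, fields in entries.items()
            if terms and all(matches(fields, t) for t in terms)}


print(sorted(field_search(TOY_FIELDED, "alpha")))             # ['HBA_HUMAN', 'HBA_MOUSE']
print(sorted(field_search(TOY_FIELDED, "alpha os:sapiens")))  # ['HBA_HUMAN']
```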