Motif searching and protein structure prediction May 26, 2005 Hand in written assignments today! Learning objectives-Learn how to read structure information.


1 Motif searching and protein structure prediction. May 26, 2005. Hand in written assignments today! Learning objectives: Learn how to read structure information from a PDB record. Learn how the BLOCKS database is set up. Learn how to obtain information about a protein from a motif search. Learn how to display and manipulate protein structures with Deep View. Workshop: Get information about PTEN from the BLIMPS algorithm. View hen lysozyme protein structure with Deep View.

2 Recognizing motifs in proteins. PROSITE is a database of protein families and domains. Most proteins can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.

3 PROSITE Database. Contains 1087 different proteins and more than 1400 different patterns/motifs or signatures. A "signature" of a protein allows one to assign the protein to a specific family based on structure and/or function. An example of an entry in PROSITE is: http://ca.expasy.org/cgi-bin/nicedoc.pl?PDOC50020

4 How are the profiles constructed in the first place?
ALRDFATHDDVCGK..
SMTAEATHDSVACY..
ECDQAATHEAVTHR..
Sequences are aligned manually by an expert in the field. Then a profile is created: A-T-H-[DE]-X-V-X(4)-{ED}
This pattern is translated as: Ala, Thr, His, [Asp or Glu], any, Val, any, any, any, any, any but Glu or Asp.
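The pattern notation on this slide maps closely onto regular expressions. The sketch below is one possible way to do that translation in Python; the function names `prosite_to_regex` and `find_motif` are made up for illustration, the converter ignores PROSITE anchors such as `<` and `>`, and the test sequence is a hypothetical extension of the first aligned sequence (the slide truncates it with `..`).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles single residues, x/X (any residue), [..] (allowed residues),
    {..} (forbidden residues) and repeat counts such as x(4) or x(2,4).
    Anchors (< and >) are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "X(4)" into its core ("X") and repeat ("4").
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core in ("x", "X"):
            piece = "."                        # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            piece = core                       # single residue or [..] class
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        parts.append(piece)
    return "".join(parts)


def find_motif(pattern: str, sequence: str):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    pattern = "A-T-H-[DE]-X-V-X(4)-{ED}"
    # Hypothetical sequence: first aligned sequence plus three invented residues.
    sequence = "ALRDFATHDDVCGKLMA"
    print(prosite_to_regex(pattern))
    print(find_motif(pattern, sequence))
```

The same converter should also accept the zinc finger pattern from the PROSITE record, since that pattern only uses residues, `x`, bracketed classes and repeat counts.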

5 Example of a PROSITE record
ID   ZINC_FINGER_C3HC4; PATTERN.
PA   C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]

6 PROSITE Database Cont. 1 Families of proteins have a similar function: enzyme activity; post-translational modification; domains (Ca2+ binding domain); DNA/RNA-associated proteins (Zn finger); transport proteins (albumin, transferrin); structural proteins (fibronectin, collagen); receptors; peptide hormones.

7 PROSITE Database Cont. 2 FindProfile is a program that searches the PROSITE database. It uses dynamic programming to determine optimal alignments. If the alignment produces a high score, the match is reported. If a "hit" is obtained, the program gives an output that shows the region of the query that contains the pattern and a reference to the 3-D structure database, if available.

8 Example of output from FindProfile

9 Other algorithms that search for protein patterns. BLIMPS: a program that uses a query sequence to search the BLOCKS database (written by Bill Alford). BLOCKS: a database of multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. The blocks that make up the BLOCKS database are made automatically by searching for the most highly conserved regions in groups of proteins documented in the PROSITE database. These blocks are then calibrated against the SWISS-PROT database to determine how likely it is that such a sequence would occur by chance.

10 Example of entry in BLOCKS database
ID   p99.1.2414; BLOCK
AC   BP02414A; distance from previous block=(29,215)
DE   PROTEIN ZINC-FINGER NUCLEAR FIN
BL   LCC; width=27; seqs=8; 99.5%=1080; strength=1292
RPT1_MOUSE|P15533 ( 101) EKLRLFCRKDMMVICWLCERSQEHRGH  62
Y129_HUMAN|Q14142 (  30) RVAELFCRRCRRCVCALCPVLGAHRGH 100
RFP_HUMAN|P14373  ( 101) EPLKLYCEEDQMPICVVCDRSREHRGH  49
RFP_MOUSE|Q62158  ( 110) EPLKLYCEQDQMPICVVCDRSREHRDH  51
RO52_HUMAN|P19474 (  97) ERLHLFCEKDGKALCWVCAQSRKHRDH  54
RO52_MOUSE|Q62191 ( 101) EKLHLFCEEDGQALCWVCAQSGKHRDH  52
TF1B_HUMAN|Q13263 ( 215) EPLVLFCESCDTLTCRDCQLNAHKDHQ  65
TF1B_MOUSE|Q62318 ( 216) EPLVLFCESCDTLTCRDCQLNAHKDHQ  65
Annotations: strength is the median of standardized scores for true positives. The pair on the AC line gives the minimum and maximum distance from the previous block. DE is the family description. The number in parentheses is the start position of the sequence segment. The final number on each sequence line is the sequence weight (a higher number is more distant).

11 How does BLIMPS search the BLOCKS database? It transforms each block into a position specific scoring matrix (PSSM). Each PSSM column corresponds to a block position and contains values based on frequency of occurrence at that position. A comparison is made between the query sequence and the BLOCK by sliding the PSSM over the query. For every alignment each sequence position receives a score. This sliding window procedure is repeated for all BLOCKS in the database.
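The sliding-window comparison described on this slide can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the BLIMPS implementation: it uses unweighted residue counts with a pseudocount and log-frequency scores, it does not apply sequence weights or the calibrated 1000-point scale, and the names `build_pssm`, `scan` and `best_hit` are invented here. The block is the zinc finger block from the BLOCKS entry in slide 10; the query flanks are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Segments of the example block from slide 10 (width 27, 8 sequences).
BLOCK = [
    "EKLRLFCRKDMMVICWLCERSQEHRGH",
    "RVAELFCRRCRRCVCALCPVLGAHRGH",
    "EPLKLYCEEDQMPICVVCDRSREHRGH",
    "EPLKLYCEQDQMPICVVCDRSREHRDH",
    "ERLHLFCEKDGKALCWVCAQSRKHRDH",
    "EKLHLFCEEDGQALCWVCAQSGKHRDH",
    "EPLVLFCESCDTLTCRDCQLNAHKDHQ",
    "EPLVLFCESCDTLTCRDCQLNAHKDHQ",
]


def build_pssm(block, pseudocount=1.0):
    """One dict per block column: residue -> log of its smoothed frequency."""
    width = len(block[0])
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in block)
        pssm.append({aa: math.log((counts[aa] + pseudocount) / total)
                     for aa in AMINO_ACIDS})
    return pssm


def scan(pssm, query):
    """Slide the PSSM over the query; return (score, 0-based start) per window."""
    width = len(pssm)
    hits = []
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Residues outside the 20-letter alphabet get the column's lowest score.
        score = sum(col.get(aa, min(col.values()))
                    for col, aa in zip(pssm, window))
        hits.append((score, start))
    return hits


def best_hit(pssm, query):
    """Highest-scoring window as (score, start)."""
    return max(scan(pssm, query))


if __name__ == "__main__":
    pssm = build_pssm(BLOCK)
    # Hypothetical query: a block member embedded in invented flanking residues.
    query = "GSGSGS" + BLOCK[0] + "GSGSGS"
    print(best_hit(pssm, query))
```

To search a whole database in this style, the same scan would be repeated for the PSSM of every block and the hits ranked by score.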

12 Example of a pattern search using BLIMPS. Note that any score less than 1000 may be due to chance. A score above 1000 is better than 99.5% of the scores of true negatives.

13 Do workshop 17B

14 3D structure data. The largest 3D structure database is the Protein Data Bank (PDB). It contains over 15,000 records. Each record contains 3D coordinates for macromolecules. 80% of the records were obtained from X-ray diffraction studies, 16% from NMR, and the rest from other methods and theoretical calculations.

15 Part of a record from the PDB
ATOM      1  N   ARG A  14      22.451  98.825  31.990  1.00 88.84           N
ATOM      2  CA  ARG A  14      21.713 100.102  31.828  1.00 90.39           C
ATOM      3  C   ARG A  14      22.583 101.018  30.979  1.00 89.86           C
ATOM      4  O   ARG A  14      22.105 101.989  30.391  1.00 89.82           O
ATOM      5  CB  ARG A  14      21.424 100.704  33.208  1.00 93.23           C
ATOM      6  CG  ARG A  14      20.465 101.880  33.215  1.00 95.72           C
ATOM      7  CD  ARG A  14      20.008 102.147  34.637  1.00 98.10           C
ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N
ATOM      9  CZ  ARG A  14      18.344 103.507  35.833  1.00100.29           C
ATOM     10  NH1 ARG A  14      18.580 102.835  36.952  1.00 99.51           N
ATOM     11  NH2 ARG A  14      17.441 104.479  35.827  1.00100.79           N
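ATOM records use fixed column positions, which is why occupancy and temperature factor can run together (as in `1.00100.30`) and why splitting on whitespace is unreliable. The following is a minimal sketch of reading one ATOM line by column slices in Python; the `Atom` type and `parse_atom_line` function are illustrative names, and only the fields visible in the record above are extracted.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int       # atom serial number
    name: str         # atom name, e.g. "CA"
    res_name: str     # residue name, e.g. "ARG"
    chain: str        # chain identifier
    res_seq: int      # residue sequence number
    x: float          # orthogonal coordinates in Angstroms
    y: float
    z: float
    occupancy: float
    b_factor: float   # temperature factor
    element: str      # element symbol


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width PDB ATOM record using column slices."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


if __name__ == "__main__":
    # Atom 8 of the record above, where occupancy and B-factor are fused.
    line = ("ATOM      8  NE  ARG A  14      18.999 103.196  34.718"
            "  1.00100.30           N")
    print(parse_atom_line(line))
```

Slicing by column keeps the parser correct for lines where adjacent numeric fields touch, which a whitespace split would merge into one token.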

16 Protein structure viewers: RasMol, Deep View, Cn3D, WebLabViewer

17 Do workshop 18

