Motif searching and protein structure prediction May 26, 2005 Hand in written assignments today! Learning objectives-Learn how to read structure information.


Motif searching and protein structure prediction
May 26, 2005
Hand in written assignments today!
Learning objectives: Learn how to read structure information from a PDB record. Learn how the BLOCKS database is set up. Learn how to obtain information about a protein from a motif search. Learn how to display and manipulate protein structures with Deep View.
Workshop: Get information about PTEN from the BLIMPS algorithm. View hen lysozyme protein structure with Deep View.

Recognizing motifs in proteins. PROSITE is a database of protein families and domains. Most proteins can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.

PROSITE Database
Contains 1087 different proteins and more than 1400 different patterns/motifs or signatures. A “signature” of a protein allows one to place the protein within a specific family based on structure and/or function. An example of an entry in PROSITE is:

How are the profiles constructed in the first place?
ALRDFATHDDVCGK..
SMTAEATHDSVACY..
ECDQAATHEAVTHR..
Sequences are aligned manually by an expert in the field. Then a profile is created:
A-T-H-[DE]-X-V-X(4)-{ED}
This pattern is translated as: Ala, Thr, His, [Asp or Glu], any, Val, any, any, any, any, any but Glu or Asp
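The pattern notation above maps closely onto regular expressions, so a query sequence can be tested against a pattern with a few lines of code. The following is a minimal sketch, not part of PROSITE or FindProfile: the function name `prosite_to_regex` is hypothetical, it handles only the elements used on these slides (single residues, `[..]` sets, `{..}` exclusions, `x`, and repeat counts), and the last three residues of the test sequence (`LMQ`) are invented because the slide truncates the sequences with "..".

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split an element such as "x(4)" or "x(2,4)" into its core and repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core in ("x", "X"):
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # one residue, or a [..] set of alternatives
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Hypothetical query: the first slide sequence extended with made-up residues "LMQ".
regex = prosite_to_regex("A-T-H-[DE]-x-V-x(4)-{ED}")
hit = re.search(regex, "ALRDFATHDDVCGKLMQ")
print(regex, hit.start(), hit.group())
```

A regular expression only reports exact pattern matches; it does not score near misses the way a profile or PSSM search does.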

Example of a PROSITE record
ID ZINC_FINGER_C3HC4; PATTERN.
PA C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]

PROSITE Database Cont. 1
Families of proteins have a similar function:
Enzyme activity
Post-translational modification
Domains: Ca2+ binding domain
DNA/RNA associated protein: Zn finger
Transport proteins: albumin, transferrin
Structural proteins: fibronectin, collagen
Receptors
Peptide hormones

PROSITE Database Cont. 2
FindProfile is a program that searches the PROSITE database. It uses dynamic programming to determine optimal alignments. If the alignment produces a high score, the match is reported. If a “hit” is obtained, the program gives an output that shows the region of the query that contains the pattern and a reference to the 3-D structure database, if available.

Example of output from FindProfile

Other algorithms that search for protein patterns
BLIMPS: a program that uses a query sequence to search the BLOCKS database (written by Bill Alford).
BLOCKS: a database of multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. The blocks that comprise the BLOCKS database are made automatically by searching for the most highly conserved regions in groups of proteins documented in the PROSITE database. These blocks are then calibrated against the SWISS-PROT database to determine how likely such a sequence is to occur by chance.

Example of an entry in the BLOCKS database
ID p ; BLOCK
AC BP02414A; distance from previous block=(29,215)
DE PROTEIN ZINC-FINGER NUCLEAR FIN
BL LCC; width=27; seqs=8; 99.5%=1080; strength=1292
RPT1_MOUSE|P15533 ( 101) EKLRLFCRKDMMVICWLCERSQEHRGH 62
Y129_HUMAN|Q14142 ( 30) RVAELFCRRCRRCVCALCPVLGAHRGH 100
RFP_HUMAN|P14373 ( 101) EPLKLYCEEDQMPICVVCDRSREHRGH 49
RFP_MOUSE|Q62158 ( 110) EPLKLYCEQDQMPICVVCDRSREHRDH 51
RO52_HUMAN|P19474 ( 97) ERLHLFCEKDGKALCWVCAQSRKHRDH 54
RO52_MOUSE|Q62191 ( 101) EKLHLFCEEDGQALCWVCAQSGKHRDH 52
TF1B_HUMAN|Q13263 ( 215) EPLVLFCESCDTLTCRDCQLNAHKDHQ 65
TF1B_MOUSE|Q62318 ( 216) EPLVLFCESCDTLTCRDCQLNAHKDHQ 65
Annotations on the entry:
AC line, distance=(29,215): minimum and maximum distance from the previous block
DE line: family description
BL line, strength=1292: median of standardized scores for true positives
Number in parentheses on each sequence line: start position of the sequence segment
Final number on each sequence line: sequence weight (higher number is more distant)
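The sequence lines of the entry above follow a regular layout (name, start position in parentheses, segment, weight), so they are easy to read programmatically. This is a small sketch under that assumption only; the names `BlockSegment` and `parse_segment_line` are hypothetical helpers, not part of the BLOCKS or BLIMPS software, and the header lines (ID, AC, DE, BL) are not handled.

```python
import re
from typing import NamedTuple


class BlockSegment(NamedTuple):
    name: str      # sequence identifier, e.g. "RPT1_MOUSE|P15533"
    start: int     # start position of the segment in the full sequence
    segment: str   # the aligned, ungapped residues
    weight: int    # sequence weight (higher number is more distant)


# name, "(start)", segment, weight, separated by whitespace
SEGMENT_LINE = re.compile(r"(\S+)\s*\(\s*(\d+)\)\s+([A-Z]+)\s+(\d+)")


def parse_segment_line(line: str) -> BlockSegment:
    """Parse one sequence line of a block entry into its four fields."""
    match = SEGMENT_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a block sequence line: {line!r}")
    name, start, segment, weight = match.groups()
    return BlockSegment(name, int(start), segment, int(weight))


record = parse_segment_line("RPT1_MOUSE|P15533 ( 101) EKLRLFCRKDMMVICWLCERSQEHRGH 62")
print(record.name, record.start, len(record.segment), record.weight)
```

Checking that every parsed segment has the length given by `width=` on the BL line is a cheap sanity test when reading a whole entry.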

How does BLIMPS search the BLOCKS database? It transforms each block into a position specific scoring matrix (PSSM). Each PSSM column corresponds to a block position and contains values based on frequency of occurrence at that position. A comparison is made between the query sequence and the BLOCK by sliding the PSSM over the query. For every alignment each sequence position receives a score. This sliding window procedure is repeated for all BLOCKS in the database.
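The sliding-window search described above can be sketched in a few lines. This is an illustration of the general idea only, not the BLIMPS implementation: the log-odds scoring, the uniform background frequency, the pseudocount of 1, and the function names `build_pssm` and `scan` are all assumptions made for the example, and sequence weights are ignored. The four segments are taken from the BLOCKS entry shown earlier; the flanking residues in the query are made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(segments, pseudocount=1.0):
    """Build one column of log-odds scores per block position."""
    width = len(segments[0])
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        counts = Counter(seg[pos] for seg in segments)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(pssm, query):
    """Slide the PSSM over the query; return (best start, best score)."""
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Sum the per-position scores for this alignment of PSSM and query.
        score = sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


block = [
    "EKLRLFCRKDMMVICWLCERSQEHRGH",
    "EPLKLYCEEDQMPICVVCDRSREHRGH",
    "ERLHLFCEKDGKALCWVCAQSRKHRDH",
    "EPLVLFCESCDTLTCRDCQLNAHKDHQ",
]
pssm = build_pssm(block)
# Hypothetical query: one block segment with invented flanking residues.
start, score = scan(pssm, "MMMM" + block[0] + "GGGG")
print(start, round(score, 2))
```

Repeating `scan` for the PSSM of every block in the database, and keeping the highest-scoring blocks, gives the overall search loop.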

Example of a pattern search using BLIMPS
Note that any score less than 1000 may be due to chance. A score above 1000 is better than 99.5% of the true negatives.

Do workshop 17B

3D structure data
The largest 3D structure database is the Protein Data Bank (PDB).
It contains over 15,000 records.
Each record contains 3D coordinates for macromolecules.
80% of the records were obtained from X-ray diffraction studies, 16% from NMR, and the rest from other methods and theoretical calculations.

Part of a record from the PDB
ATOM 1 N ARG A N
ATOM 2 CA ARG A C
ATOM 3 C ARG A C
ATOM 4 O ARG A O
ATOM 5 CB ARG A C
ATOM 6 CG ARG A C
ATOM 7 CD ARG A C
ATOM 8 NE ARG A N
ATOM 9 CZ ARG A C
ATOM 10 NH1 ARG A N
ATOM 11 NH2 ARG A N
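In a PDB flat file each ATOM line is a fixed-width record, so fields are read by column position rather than by splitting on spaces. The sketch below assumes the standard PDB column layout for ATOM records; the function name `parse_atom_line` is hypothetical, and because the rows in this transcript carry no numeric columns, the residue number, coordinates, occupancy and temperature factor in the sample line are made-up values.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Read the main fields of a PDB ATOM record by fixed column positions."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        element=line[76:78].strip(),   # columns 77-78
    )


# Sample line with invented numeric values, laid out in PDB columns.
SAMPLE = "ATOM      1  N   ARG A   1      26.465  27.452  -2.490  1.00 25.18           N"
atom = parse_atom_line(SAMPLE)
print(atom)
```

Structure viewers read the same columns to place each atom in 3D space.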

Protein structure viewers
RasMol
Deep View
Cn3D
WebLabViewer

Do workshop 18