©CMBI 2009 Alignment & Secondary Structure You have learned about: Data & databases Tools Amino Acids Protein Structure Today we will discuss: Aligning.

Presentation transcript:

©CMBI 2009 Alignment & Secondary Structure. You have learned about: data & databases, tools, amino acids, protein structure. Today we will discuss: aligning sequences. After this: you know how to perform structural alignments, and you are ready to apply this knowledge in your bioinformatics research project!

©CMBI 2009 Why align sequences? The problem: there are lots of sequences with unknown structure and/or function, and only a few sequences with known structure and/or function. Alignment can help: if sequences align well, they are likely to be similar. If they are similar, then they very likely share structural and/or functional aspects. If one of them has a known structure or function, then the alignment gives us insight into the structural and/or functional aspects of the aligned sequence(s). TRANSFER OF INFORMATION!

©CMBI 2009 Sequence Alignment (1) A sequence alignment is a representation of a whole series of evolutionary events, which left traces in the sequences. Things that are more likely to happen during evolution should be most prominently observed in your alignment. The purpose of a sequence alignment is to line up all residues in the sequence that were derived from the same residue position in the ancestral gene or protein.

©CMBI 2009 Sequence Alignment (2) gap = insertion or deletion (indel). [Figure: sequences A and B]

©CMBI 2009 Structural alignment To carry over information from a well-studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures as they are today: a structural alignment. Placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment implicitly means that they are at equivalent positions in the 3D structures of the corresponding proteins!

©CMBI 2009 Examples
1) The 3 active site residues H, D, S of the serine protease we saw earlier.
2) Cysteine bridges (disulfide bridges):
STCTKGALKLPVCRK
TSCTEG--RLPGCKR

©CMBI 2009 Transfer of information Such information can be: phosphorylation sites, glycosylation sites, stabilizing mutations, membrane anchors, ion binding sites, ligand binding residues, cellular localization. Typically what one finds in the feature (FT) records of Swiss-Prot!

©CMBI 2009 Significance of alignment One can only transfer information if the similarity between the two sequences is significantly high. Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence: if the sequences are > 80 aa long, then > 25% sequence identity is enough to reliably transfer structural information. If the sequences are shorter, a higher percentage of identity is needed. Structure is much more conserved than sequence!
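The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not the published threshold curve: the function names are hypothetical, identity is computed over gap-free columns only (one of several possible conventions), and the "> 80 aa" condition is applied to the number of gap-free aligned columns, which is an assumption.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def may_transfer_structure(a: str, b: str,
                           min_length: int = 80,
                           min_identity: float = 25.0) -> bool:
    """Apply the '> 80 aa and > 25% identity' rule of thumb.

    Shorter alignments need a higher identity; that part of the curve is
    not modelled here, so short alignments simply return False.
    """
    aligned = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    return aligned > min_length and percent_identity(a, b) > min_identity


if __name__ == "__main__":
    # The short cysteine-bridge example: identical in 6 of 13 gap-free columns.
    print(percent_identity("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
```

Because the two example sequences are far shorter than 80 residues, `may_transfer_structure` returns False for them even though their identity is well above 25%.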

©CMBI 2009 Significance of alignment (2)

©CMBI 2009 Aligning sequences by hand Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids. Examples: which is the better alignment (left or right)?
1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW
2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
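One way to make the comparison concrete is to score each column by identity or by shared physico-chemical class. The sketch below is a toy scheme: the residue grouping and the scores (2 for identity, 1 for same class, 0 otherwise) are assumptions chosen for illustration, not a real substitution matrix such as BLOSUM.

```python
# Hypothetical grouping of residues by physico-chemical character (an assumption).
GROUPS = [
    set("AVLIMC"),  # aliphatic / hydrophobic
    set("FWY"),     # aromatic
    set("KRH"),     # positively charged
    set("DE"),      # negatively charged
    set("STNQ"),    # polar, uncharged
    set("GP"),      # special backbone behaviour
]


def column_score(x: str, y: str) -> int:
    """2 for identical residues, 1 for residues of the same class, else 0."""
    if x == "-" or y == "-":
        return 0
    if x == y:
        return 2
    if any(x in group and y in group for group in GROUPS):
        return 1
    return 0


def alignment_score(a: str, b: str) -> int:
    """Sum the column scores of two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(column_score(x, y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(alignment_score("CPISRTWASIFRCW", "CPISRT---LFRCW"))  # example 1, left
    print(alignment_score("CPISRTWASIFRCW", "CPISRTL---FRCW"))  # example 1, right
```

Under this toy scheme the left-hand alignment scores higher in both examples, because it pairs L with I (both hydrophobic) and K with R (both positively charged), whereas the right-hand alignments pair L with W and K with E.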

©CMBI 2009 Aligning sequences by hand (2) The procedure of aligning depends on the information available:
1) Use “only” the identity of the amino acid and its physico-chemical properties. This is more or less what alignment programs do.
2) Also explicitly use the secondary structure preference of the amino acids. Example: aligning 2 helices when sequence identity is low.
3) Use 3D information if one or more of the structures in the alignment are known.
In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.

©CMBI 2009 Helix

©CMBI Positional preferences in helices (1) Dataset of good helices from PDB files. Count all Asp residues in and before helices. Identify preferential positions for Asp residues. [Table: counts of ASP at positions before and within the helix (H), with a total column; position 1 in the helix is marked]

©CMBI 2009 Positional preferences in helices (2) Fill this table for all 20 amino acids. Use this information when aligning helices that have a low percentage of sequence identity. [Table: rows ALA, CYS, ASP, GLU, (…), TRP, TYR; columns for positions before and within the helix (H), with a total column; position 1 in the helix is marked]
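The counting procedure described on these two slides can be sketched as follows. Everything concrete here is an assumption: the function name, the input format (a sequence plus the 0-based index of its first helical residue), the window size, and the three sequence fragments, which are invented purely to show the mechanics and are not taken from PDB files.

```python
from collections import Counter, defaultdict


def positional_counts(helices, before=2, inside=4):
    """Count residues by position relative to the helix start.

    helices: iterable of (sequence, helix_start), where helix_start is the
    0-based index of the first helical residue. Offset 0 is the first
    helical residue; negative offsets are residues before the helix.
    """
    counts = defaultdict(Counter)
    for seq, start in helices:
        for offset in range(-before, inside):
            i = start + offset
            if 0 <= i < len(seq):
                counts[seq[i]][offset] += 1
    return counts


if __name__ == "__main__":
    # Invented fragments, for illustration only.
    toy_dataset = [("GSPDQLAALK", 2), ("TSLETALLMQ", 1), ("ADPEELIKKA", 1)]
    table = positional_counts(toy_dataset)
    for residue in sorted(table):
        print(residue, dict(sorted(table[residue].items())))
```

With a real dataset of well-resolved helices, the row for each amino acid shows where it occurs more often than expected, which is the information the slides suggest using when sequence identity is low.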

©CMBI 2009 Aligning 2 sequences when sequence identity is low
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G

©CMBI 2009 Aligning 2 helices when sequence identity is low
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G
Final alignment:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G

©CMBI 2009 Protein threading The word threading implies that one drags the sequence (ACDEFG...) step by step through each location on the template.

Use of 3D structure info (1) If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside: where does the Arg in structure 2 go? (And what will CLUSTAL choose?) [Figure: structures 1 and 2]

©CMBI 2009 Use of 3D structure info (2)
A:  ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1: VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2: VAL CYS ARG --- --- --- THR PRO GLU ALA ILE

©CMBI 2009 An even more real example
A:  ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1: VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2: VAL CYS ARG --- --- --- THR PRO GLU ALA ILE
In one-letter code:
A:  ICRLPGSAEAV
B1: VCRTP---EAI
B2: VCR---TPEAI

©CMBI We have seen that alignments:
1) Are crucial for being able to transfer information
2) Can be optimized by using secondary structure preferences (e.g. helix positioning)
3) Can be optimized by using 3D structure info

©CMBI 2009 Multiple sequence alignments If we have more than two related sequences aligned, the alignment is called a multiple sequence alignment (MSA). MSAs can:
1) reveal structural information (e.g. cys-bridges, calcium binding sites)
2) validate PROSITE search results
3) confirm or improve pair-wise sequence alignments (Course Day 6)

©CMBI 2009 MSA and cysteine bridges Multiple sequence alignments can reveal structural information:
ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY
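A simple way to look for such structural signals is to search for pairs of columns in which cysteines appear and disappear together. The sketch below assumes that an identical presence/absence pattern in at least two sequences is a reasonable hint of a bridge; the function names and that criterion are illustrative choices, not an established method.

```python
from itertools import combinations

MSA = [
    "ASCTRGCIKLPTCKKMGRCTGY",
    "STCTKGALKLPVCRKMGKSSAY",
    "ATSTHGCMKLPCSRRFGKCSSY",
    "TSCTEGCLRLPGCKRFGRCTSY",
    "TTCTKGLLKLPGCKRFGKSSAY",
    "ASSTKGCMKLPVSRRFGRCTAY",
]


def cys_pattern(msa, col):
    """For one column, record which sequences have a cysteine there."""
    return tuple(seq[col] == "C" for seq in msa)


def correlated_cys_columns(msa, min_sequences=2):
    """Return 1-based column pairs whose cysteine patterns are identical."""
    length = len(msa[0])
    patterns = {col: cys_pattern(msa, col) for col in range(length)}
    pairs = []
    for i, j in combinations(range(length), 2):
        pattern = patterns[i]
        if pattern == patterns[j] and sum(pattern) >= min_sequences:
            pairs.append((i + 1, j + 1))
    return pairs


if __name__ == "__main__":
    print(correlated_cys_columns(MSA))
```

For this alignment the sketch reports two column pairs, 3 with 13 and 7 with 19: in each pair the cysteines are either both present or both absent in every sequence, which is what one would expect for the two partners of a disulfide bridge.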

©CMBI 2009 MSA to validate PROSITE results (1) You have seen the PROSITE database of protein patterns before. PROSITE glycosylation pattern: N-{P}-[ST]-{P}, where N is the glycosylation site.
PROSITE syntax: A-[BC]-X-D(2,5)-{EFG}-H
Means: A, then B or C, then anything, then 2-5 D’s, then not E, F or G, then H.
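The syntax above maps almost one-to-one onto regular expressions. The following sketch converts the subset of PROSITE syntax shown on this slide; the function name is hypothetical, and other parts of the real syntax (such as the `<` and `>` anchors) are deliberately not handled.

```python
import re

# One pattern element: a residue, [allowed], or {forbidden}, plus an optional
# repeat such as (2) or (2,5).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = match.groups()
        if core in ("x", "X"):
            regex = "."                         # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"     # anything except these
        else:
            regex = core                        # a residue or an [allowed] set
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(regex)
    return "".join(parts)


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(prosite_to_regex("A-[BC]-X-D(2,5)-{EFG}-H"))
```

The resulting string can be passed to `re.search` or `re.finditer` to scan a protein sequence for the pattern.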

©CMBI 2009 MSA to validate PROSITE results (2) The chance of finding N-{P}-[ST]-{P} is rather high. So how can you be sure? Look at the MSA of related sequences!
ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW
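The check suggested here can be sketched by scanning every sequence for the pattern and keeping only the columns where all sequences match. The pattern N-{P}-[ST]-{P} is written by hand as the regular expression `N[^P][ST][^P]`; the function names are hypothetical, and the code assumes a gap-free alignment so that string positions equal alignment columns.

```python
import re

# Zero-width lookahead, so overlapping matches are all reported.
MOTIF = re.compile(r"(?=N[^P][ST][^P])")

MSA = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]


def motif_columns(seq: str) -> set:
    """1-based positions of the N in every match of the pattern."""
    return {m.start() + 1 for m in MOTIF.finditer(seq)}


def conserved_columns(msa) -> list:
    """Columns where the pattern matches in every sequence of the MSA."""
    return sorted(set.intersection(*(motif_columns(seq) for seq in msa)))


if __name__ == "__main__":
    for seq in MSA:
        print(seq, sorted(motif_columns(seq)))
    print("conserved:", conserved_columns(MSA))
```

In this alignment each sequence contains two matches, but only the one at column 5 is present in all three sequences, which makes it the most credible candidate for a real glycosylation site.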

©CMBI 2009 What you have learned today A good sequence alignment is necessary to carry over information between proteins. Putting amino acids below each other in a sequence alignment implies that you predict that they are at equivalent positions in both proteins. If the aligned sequences are > 80 aa long, then > 25% sequence identity is enough to reliably transfer structural information.

©CMBI 2009 You are now capable of: applying these lessons to the practical exercises, and performing your own bioinformatics research project! Take-home lesson: please remember to always use all structural information available to you to optimize a sequence alignment. This can be real 3D data, but can also be “just” your own knowledge about the properties and preferences of the amino acids.

©CMBI 2009 MSA for improvement of pair-wise alignments
CWPVAASYGR
CWPT---YGR
CWPTA-SYGR
CWPTLGLFGR