Alignment & Secondary Structure

You have learned about: data & databases, tools, amino acids, and protein structure.
Today we will discuss: aligning sequences.
After this, you will know how to perform structural alignments, and you will be ready to apply this knowledge in your bioinformatics research project!

©CMBI 2011
Why align sequences?
The problem: there are lots of sequences with unknown structure and/or function, and only a few sequences with known structure and/or function.
Alignment can help: if one of the aligned sequences has a known structure or function, the alignment gives us insight into the structural and/or functional aspects of the other aligned sequence(s). Transfer of information!

Sequence Alignment (1)
A sequence alignment is a representation of a whole series of evolutionary events that left traces in the sequences. The purpose of a sequence alignment is to line up all residues in the sequences that were derived from the same residue position in the ancestral gene or protein.

Sequence Alignment (2)
A gap represents an insertion or deletion (indel).
[Figure: gapped alignment of sequences A and B]
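
Below is a minimal sketch of how an alignment program can place gaps automatically. It assumes a global Needleman-Wunsch dynamic-programming alignment with a toy scoring scheme: match +1, mismatch -1, and gap -2 per position. The function name and the scores are illustrative assumptions. Real tools such as CLUSTAL use substitution matrices and affine gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (row_a, row_b, score)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


print(needleman_wunsch("ACDEFG", "ACEFG"))  # ('ACDEFG', 'AC-EFG', 3)
```

The gap lands opposite the D, because that is the only placement that keeps all five remaining pairs identical.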

Structural alignment
To carry over structural information, we need a structural alignment. Placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment implicitly means that they occupy equivalent positions in the 3D structures of the corresponding proteins!

Examples
1) The three active-site residues H, D and S of the serine protease we saw earlier.
2) Cysteine bridges (disulfide bridges):
STCTKGALKLPVCRK
TSCTEG--RLPGCKR
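
The sketch below makes "equivalent positions" concrete. It maps residue numbers of the first sequence onto the second through the gapped rows of the disulfide example. The helper `column_map` is hypothetical. It only shows that the two cysteines of each sequence fall in the same columns, so an annotation on one sequence can be carried over to the other.

```python
def column_map(row_a, row_b):
    """Map 1-based residue numbers of row_a to residue numbers of row_b (None = gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    pos_a = pos_b = 0
    for x, y in zip(row_a, row_b):
        if y != "-":
            pos_b += 1  # count residues of row_b seen so far
        if x != "-":
            pos_a += 1
            mapping[pos_a] = pos_b if y != "-" else None
    return mapping


row1 = "STCTKGALKLPVCRK"
row2 = "TSCTEG--RLPGCKR"
m = column_map(row1, row2)
seq1, seq2 = row1.replace("-", ""), row2.replace("-", "")
for pos in (i + 1 for i, aa in enumerate(seq1) if aa == "C"):
    print(pos, "->", m[pos], seq2[m[pos] - 1])
# 3 -> 3 C
# 13 -> 11 C
```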

Transfer of information
Such information can be: phosphorylation sites, glycosylation sites, stabilizing mutations, membrane anchors, ion-binding sites, ligand-binding residues, and cellular localization. This is typically what one finds in the feature (FT) records of Swiss-Prot!

Significance of alignment
One can only transfer information if the similarity between the two sequences is significantly high. A "threshold curve" describes when structural information can be transferred from a known protein structure to another protein sequence: if the sequences are more than 80 amino acids (aa) long, more than 25% sequence identity is enough to reliably transfer structural information. Structure is much more conserved than sequence!
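
Here is a minimal sketch of the rule of thumb above, applied to two aligned rows. Two choices are assumptions, because definitions vary between tools. First, percent identity is computed over the columns where both rows have a residue. Second, "length" is taken as the number of such columns. The function names are hypothetical.

```python
def percent_identity(row_a, row_b):
    """Identity over columns where both rows have a residue; returns (percent, n_pairs)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs), len(pairs)


def structure_transfer_ok(row_a, row_b, min_length=80, min_identity=25.0):
    """Slide's rule of thumb: > 80 aligned residues and > 25% identity."""
    identity, length = percent_identity(row_a, row_b)
    return length > min_length and identity > min_identity


print(structure_transfer_ok("A" * 30 + "C" * 70, "A" * 30 + "D" * 70))  # True (30% over 100)
print(structure_transfer_ok("A" * 20 + "C" * 80, "A" * 20 + "D" * 80))  # False (20%)
print(structure_transfer_ok("ACDE", "ACDE"))                            # False (too short)
```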

Significance of alignment (2)

Aligning sequences by hand
Examples: which is the better alignment, left or right?
1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW
2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
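
The sketch below scores the left and right alignments of both examples. It uses a handful of BLOSUM62 values (only the pairs these examples need) and an assumed linear gap penalty of -4 per position. Plain identity counting gives a tie in both examples, because every alternative has ten identical pairs, one mismatch and three gaps. A substitution matrix breaks the tie by rewarding conservative substitutions such as I/L and R/K.

```python
# Excerpt of BLOSUM62: only the pairs occurring in these two examples
BLOSUM62_EXCERPT = {
    ("C", "C"): 9, ("P", "P"): 7, ("I", "I"): 4, ("S", "S"): 4,
    ("R", "R"): 5, ("T", "T"): 5, ("F", "F"): 6, ("W", "W"): 11,
    ("I", "L"): 2, ("W", "L"): -2, ("R", "K"): 2, ("E", "K"): 1,
}
GAP = -4  # assumed linear penalty per gap position


def pair_score(x, y):
    """Look up a pair in either order (the matrix is symmetric)."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    return BLOSUM62_EXCERPT[(y, x)]


def alignment_score(top, bottom):
    return sum(GAP if "-" in (x, y) else pair_score(x, y) for x, y in zip(top, bottom))


ex1 = "CPISRTWASIFRCW"
print(alignment_score(ex1, "CPISRT---LFRCW"), alignment_score(ex1, "CPISRTL---FRCW"))  # 55 51
ex2 = "CPISRTRASEFRCW"
print(alignment_score(ex2, "CPISRTK---FRCW"), alignment_score(ex2, "CPISRT---KFRCW"))  # 55 54
```

Under these assumptions the left alignment scores higher in both examples.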

Aligning sequences by hand (2)
The procedure for aligning depends on the information available:
1) In most cases you will start with an alignment program (e.g. CLUSTAL).
2) Then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.
3) Also make explicit use of the secondary structure preferences of the amino acids, especially for the N-termini of helices and for beta-turns.
4) Use 3D information if the structure of one or more of the sequences in the alignment is known.

Helix

Positional preferences in helices (1)
Dataset of good helices from PDB files: count all Asp residues in and before helices, and identify the preferential positions for Asp residues.
[Table: Asp (ASP) counts per position around the helix start (H H H H H), with a total column; position 1 in helix marked]

Aligning 2 sequences when sequence identity is low
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G

Positional preferences in helices (2)
Fill this table for all 20 amino acids (ALA, CYS, ASP, GLU, ..., TRP, TYR), with a total column and one column per position around the helix start (position 1 in helix marked). Use this information when aligning helices that have a low percentage of sequence identity.
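
Here is a minimal sketch of filling the table automatically. It assumes each entry is a sequence paired with a DSSP-style secondary-structure string in which 'H' marks helix residues. Position 1 is the first helix residue, and -1, -2, ... are the residues preceding it. The function name, the window sizes and the toy entry are assumptions for illustration, not data from the PDB dataset.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_table(entries, before=3, inside=5):
    """Count each amino acid at positions around every helix start.

    entries: iterable of (sequence, ss) pairs; ss marks helix residues with 'H'.
    Positions -before..-1 precede the helix; 1..inside lie inside it.
    """
    positions = list(range(-before, 0)) + list(range(1, inside + 1))
    table = {aa: dict.fromkeys(positions, 0) for aa in AMINO_ACIDS}
    for seq, ss in entries:
        for i, state in enumerate(ss):
            if state != "H" or (i > 0 and ss[i - 1] == "H"):
                continue  # only handle the first residue of each helix
            for pos in positions:
                j = i + pos if pos < 0 else i + pos - 1
                if not 0 <= j < len(seq) or seq[j] not in table:
                    continue
                if pos > 0 and ss[j] != "H":
                    continue  # helix is shorter than `inside`
                table[seq[j]][pos] += 1
    return positions, table


# Hypothetical toy entry with two helices (not real PDB data)
entries = [("AKDAEELLKRAGSDPEEL", "---HHHHHHH---HHHHH")]
positions, table = positional_table(entries)
print("AA  total " + " ".join(f"{p:>3}" for p in positions))
for aa in ("A", "D", "E"):
    row = table[aa]
    print(f"{aa}   {sum(row.values()):>5} " + " ".join(f"{row[p]:>3}" for p in positions))
```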

Protein threading
The word threading implies that one drags the sequence (ACDEFG...) step by step through each location on the template.
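
Here is a minimal, ungapped sketch of the threading idea: the query is dragged one position at a time along the template, and each placement is scored. The template is reduced to a hypothetical string of buried ('b') and exposed ('e') positions. The score simply rewards hydrophobic residues in buried positions and other residues in exposed ones. The hydrophobic set and the scoring are assumptions; real threading methods allow gaps and use much richer energy functions.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumption: a crude hydrophobic set


def env_score(residue, env):
    """+1 if the residue type fits the environment ('b' buried, 'e' exposed), else -1."""
    fits = (residue in HYDROPHOBIC) == (env == "b")
    return 1 if fits else -1


def thread(query, template_env):
    """Slide `query` along the template without gaps; return (best_offset, scores)."""
    if len(query) > len(template_env):
        raise ValueError("query longer than template")
    scores = []
    for offset in range(len(template_env) - len(query) + 1):
        window = template_env[offset:offset + len(query)]
        scores.append(sum(env_score(r, e) for r, e in zip(query, window)))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


print(thread("KLIVK", "eeebbbeeee"))  # (2, [-1, 1, 5, 1, -1, -3])
```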

Aligning 2 helices when sequence identity is low
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G
Final alignment:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G
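
One way to see why the final alignment works despite the low identity is to compare hydrophobicity patterns instead of identities. The sketch slides helix 2 along helix 1 without gaps and counts columns in which both residues are hydrophobic. The hydrophobic set (A, V, I, L, M, F, W, C) and this simple count are assumptions. An offset of 1 means that the first residue of helix 2 sits under the second residue of helix 1, which is the one-residue shift in the final alignment.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set


def hydrophobic_matches(h1, h2, offset):
    """Count columns where h1[j + offset] and h2[j] are both hydrophobic."""
    count = 0
    for j, y in enumerate(h2):
        i = j + offset
        if 0 <= i < len(h1) and h1[i] in HYDROPHOBIC and y in HYDROPHOBIC:
            count += 1
    return count


helix1 = "SGVSPDQLAALKLILELALK"
helix2 = "GTSLETALLMQIAQKLIAG"
scores = {d: hydrophobic_matches(helix1, helix2, d)
          for d in range(-(len(helix2) - 1), len(helix1))}
best = max(scores, key=scores.get)
print(best, scores[best])  # 1 9
```

With these assumptions the best offset is 1 (nine hydrophobic pairs), the same shift as in the final alignment; the next-best offset scores 7.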

Use of 3D structure info (1)
If you know that in structure 1 the Ala points outward and the Ser points inward, where does the Arg in structure 2 go? (And what will CLUSTAL choose?)
[Figure: structures A and B]

Use of 3D structure info (2)
Sequence A:  FDICRLPGSAEAV
Sequence B1: FNVCRMP---EAI
Sequence B2: FNVCR---MPEAI
[Figure: 3D comparison of the B1 and B2 placements of the CRMPE segment against sequence A, labelled "Correct alignment"]
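
Sequence identity alone barely separates the two candidates, which is why 3D information is needed. The sketch uses a hypothetical helper, `compare_candidate`. It counts identical pairs and lists which residues of sequence A each candidate leaves opposite a gap. Those residues are what you would then inspect in the known structure, for example to check whether they lie in a surface loop.

```python
def compare_candidate(row_a, row_b):
    """Return (identical_pairs, [(residue_number_in_A, residue), ...] opposite gaps)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    unmatched = []
    pos_a = 0
    for x, y in zip(row_a, row_b):
        if x != "-":
            pos_a += 1
            if y == "-":
                unmatched.append((pos_a, x))
    return identical, unmatched


seq_a = "FDICRLPGSAEAV"
for name, row in (("B1", "FNVCRMP---EAI"), ("B2", "FNVCR---MPEAI")):
    print(name, compare_candidate(seq_a, row))
# B1 (6, [(8, 'G'), (9, 'S'), (10, 'A')])
# B2 (5, [(6, 'L'), (7, 'P'), (8, 'G')])
```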

What you have learned today
1) A good sequence alignment is necessary for carrying over information between proteins.
2) Putting amino acids below each other in a sequence alignment implies that you predict that they are at equivalent positions in both proteins.
3) Alignments can be optimized by using secondary structure preferences (especially for helix positioning and prediction of beta-turns) and 3D structure information.
4) If the aligned sequences are more than 80 aa long, more than 25% sequence identity is enough to reliably transfer structural information.

Alignment videos
Swift.cmbi.ru.nl/teach/B1M => Seminars => Link to Aligning video page

You are ready to...
1) Apply these lessons to the practical exercises.
2) Perform your own bioinformatics research project!
Take-home lesson: always use all the structural information available to you to optimize a sequence alignment. This can be real 3D data, but it can also be "just" your own knowledge about the properties and preferences of the amino acids.