

©CMBI 2008 Aligning Sequences
The most powerful weapon in the bioinformaticist’s armory is sequence alignment. Why? Let’s think about an alignment. It is a representation of a whole series of evolutionary events that left traces in the sequences. Events that are more likely to happen during evolution should be the ones most prominently observed in your alignment.

Why align sequences?
Lots of sequences with unknown structure and function.
A few sequences with known structure and function.
If they align, they are likely to be similar.
If they are similar, then they very likely have the same structure and/or function.
If one of them has a known structure or function, then aligning it to the other yields insight into how the structure or function of the other works.

Sequence Alignment
The purpose of a sequence alignment is to line up all residues in the sequences that were derived from the same residue position in the ancestral gene or protein.
gap = insertion or deletion
(Figure: sequences A and B.)

Alignment
To carry over information from a well-studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures as they are today: a structural alignment.

Alignment
The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at equivalent positions in the 3D structures of the corresponding proteins!
Two very simple examples:
1) The 3 active-site residues H, D, S of the serine protease we saw earlier
2) Cys-bridges:
STCTKGALKLPVCRK
TSCTEG--RLPGCKR
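The Cys-bridge example can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `shared_cysteine_columns` is hypothetical, and it simply reports the alignment columns in which both rows carry a cysteine, which are the columns one would expect to be structurally equivalent.

```python
def shared_cysteine_columns(row_a: str, row_b: str) -> list[int]:
    """Return the 1-based alignment columns where both rows contain Cys (C)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return [
        col
        for col, (a, b) in enumerate(zip(row_a, row_b), start=1)
        if a == "C" and b == "C"
    ]


# The two aligned rows from the slide; '-' marks a gap.
print(shared_cysteine_columns("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
```

For the two rows on the slide, both cysteines line up (columns 3 and 13), which is what a bridge-preserving alignment should show.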

Things one can do with a good alignment
Carry information from a well-studied protein to a less well-studied one. Such information can be:
Phosphorylation sites
Glycosylation sites
Stabilizing mutations
Membrane anchors
Ion binding sites
Ligand binding residues
Cellular localization
This is typically what one finds in the FT records of Swiss-Prot!

Significance of alignment
One can only transfer information if the similarity between the two sequences is significantly high.
Schneider (group of Sander) determined the “threshold curve” for transferring structural information from a known protein structure to another protein sequence:
If the sequences are > 80 aa long, then > 25% sequence identity is enough to reliably transfer structural information.
If the sequences are shorter, a higher percentage of identity is needed.
Structure is much more conserved than sequence!
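The rule of thumb above can be written down as a small check. This is only a sketch under stated assumptions: the function names are hypothetical, identity is computed over columns where neither row has a gap (one of several common conventions), and the length is taken as the number of such columns. The slide gives no numbers for shorter sequences, so the sketch returns `None` there instead of guessing the threshold curve.

```python
from typing import Optional


def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of identical residues over the gap-free columns (assumed convention)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    identical = sum(a == b for a, b in aligned)
    return 100.0 * identical / len(aligned)


def transfer_is_reliable(row_a: str, row_b: str) -> Optional[bool]:
    """Apply the '> 80 aa and > 25% identity' rule; None when the rule does not cover the case."""
    aligned_length = sum(a != "-" and b != "-" for a, b in zip(row_a, row_b))
    if aligned_length <= 80:
        return None  # shorter sequences need a higher threshold that the slide does not quantify
    return percent_identity(row_a, row_b) > 25.0


print(percent_identity("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
print(transfer_is_reliable("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
```

The short Cys-bridge alignment has 6 identical residues in 13 gap-free columns, but at this length the simple rule gives no verdict.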

Significance of alignment (2)

Aligning sequences by hand
Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids.
Examples: which is the better alignment (left or right)?
1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW
2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
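One way to make the left-or-right question concrete is to score each candidate by how well the paired residues match in physico-chemical character. The sketch below is an illustration, not the method of any particular alignment program: the grouping in `PROPERTY_GROUPS`, the scores (2 for identity, 1 for same group, 0 otherwise) and the function name are all assumptions, and gaps are ignored because both candidates in each example contain the same gap.

```python
# Hypothetical, coarse grouping of the 20 amino acids by physico-chemical character.
PROPERTY_GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQ",
    "special": "CGP",
}
GROUP_OF = {aa: name for name, members in PROPERTY_GROUPS.items() for aa in members}


def property_score(row_a: str, row_b: str) -> int:
    """Score an aligned pair: 2 per identical column, 1 per same-group column, gaps skipped."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row_a, row_b):
        if "-" in (a, b):
            continue
        if a == b:
            score += 2
        elif GROUP_OF[a] == GROUP_OF[b]:
            score += 1
    return score


# Example 1: left pairs L with I, right pairs L with W.
left_1 = property_score("CPISRTWASIFRCW", "CPISRT---LFRCW")
right_1 = property_score("CPISRTWASIFRCW", "CPISRTL---FRCW")
# Example 2: left pairs K with R, right pairs K with E.
left_2 = property_score("CPISRTRASEFRCW", "CPISRTK---FRCW")
right_2 = property_score("CPISRTRASEFRCW", "CPISRT---KFRCW")
print(left_1, right_1, left_2, right_2)
```

Under these assumed groups the left-hand alignment scores higher in both examples, because it pairs Leu with Ile (both aliphatic) and Lys with Arg (both positively charged).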

Aligning sequences by hand (2)
The procedure of aligning depends on the information available:
1) Use “only” the identity of each amino acid and its physico-chemical properties. This is more or less what alignment programs do.
2) Also use explicitly the secondary structure preferences of the amino acids.
3) Use 3D information if one or more of the structures in the alignment are known.
In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.

Helix

Helix preferences
(Table: helix (H) preference for each amino acid: ALA, CYS, ASP, GLU, PHE, GLY, HIS, ILE, LYS, LEU, MET, ASN, PRO, GLN, ARG, SER, THR, VAL, TRP, TYR.)

Helix preferences and alignment
1) S G V S P D Q L A A L K L I L E L A L K
2) G T S L E T A L L M Q I A Q K L I A G


Helix preferences and alignment
S G V S P D Q L A A L K L I L E L A L K
G T S L E T A L L M Q I A Q K L I A G
Final alignment:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G

A ‘real’ example of threading
If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside: where does the Arg in structure 2 go? (And what will CLUSTAL choose?)
(Figure: structures 1 and 2.)

An even more real example
A   ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1  VAL CYS ARG THR PRO GLU ALA ILE
B2  VAL CYS ARG THR PRO GLU ALA ILE

An even more real example
The same sequence B aligned against A in two ways (B1 and B2):
A   I C R L P G S A E A V
B1  V C R T P - - - E A I
B2  V C R - - - T P E A I

Multiple sequence alignment
Multiple sequence alignments can confirm or improve pair-wise sequence alignments:
CWPVAASYGR
CWPT---YGR
CWPTA-SYGR
CWPTLGLFGR

Multiple sequence alignment
Multiple sequence alignments can reveal structural information:
ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY
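One kind of structural information hidden in this alignment is which cysteines are likely to be paired: cysteines that form a bridge tend to be present together or absent together. The sketch below looks for columns with exactly the same set of cysteine-containing rows. It is an illustration under assumptions: the function names and the `min_rows` cut-off are hypothetical, and a shared presence pattern is only a hint of a bridge, not proof.

```python
from itertools import combinations

MSA = [
    "ASCTRGCIKLPTCKKMGRCTGY",
    "STCTKGALKLPVCRKMGKSSAY",
    "ATSTHGCMKLPCSRRFGKCSSY",
    "TSCTEGCLRLPGCKRFGRCTSY",
    "TTCTKGLLKLPGCKRFGKSSAY",
    "ASSTKGCMKLPVSRRFGRCTAY",
]


def cysteine_pattern(msa: list[str], col: int) -> frozenset[int]:
    """Indices of the rows that have a Cys in the given 0-based column."""
    return frozenset(i for i, row in enumerate(msa) if row[col] == "C")


def correlated_cysteine_columns(msa: list[str], min_rows: int = 2) -> list[tuple[int, int]]:
    """Pairs of 1-based columns whose Cys occurs in exactly the same rows."""
    width = len(msa[0])
    patterns = {col: cysteine_pattern(msa, col) for col in range(width)}
    # Ignore columns where Cys is absent or occurs in fewer than min_rows rows.
    cys_cols = [col for col, rows in patterns.items() if len(rows) >= min_rows]
    return [
        (a + 1, b + 1)
        for a, b in combinations(cys_cols, 2)
        if patterns[a] == patterns[b]
    ]


print(correlated_cysteine_columns(MSA))
```

For the six sequences on the slide this pairs column 3 with column 13 and column 7 with column 19, which suggests two separate Cys-bridges.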

Multiple sequence alignment
Multiple sequence alignments can validate PROSITE search results.
In N-{P}-[ST]-{P} the N is the glycosylation site.
The chance of finding N-{P}-[ST]-{P} purely by coincidence is rather high. So how can you be sure? Look at the multiple sequence alignment:
ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW
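The check described above can be sketched in a few lines. The PROSITE pattern N-{P}-[ST]-{P} translates to the regular expression `N[^P][ST][^P]`, where {P} means "anything but proline". The names `GLYCO_MOTIF` and `motif_columns` are hypothetical, and the sketch assumes the rows contain no gaps, as is the case here, so that a position in the string is also the alignment column.

```python
import re

# Lookahead so that overlapping matches are also reported.
GLYCO_MOTIF = re.compile(r"(?=N[^P][ST][^P])")

ROWS = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]


def motif_columns(sequence: str) -> set[int]:
    """1-based positions of the N in every N-{P}-[ST]-{P} match."""
    return {match.start() + 1 for match in GLYCO_MOTIF.finditer(sequence)}


per_row = [motif_columns(row) for row in ROWS]
conserved = set.intersection(*per_row)
print(per_row)
print(conserved)
```

Each sequence matches the pattern twice, but only the match at column 5 is present in all three rows; the other matches are lost in at least one sequence, so the conserved one is the more believable glycosylation site.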

Summary
Bioinformatics is all about obtaining information.
Everything you can find in a database saves you doing experiments.
Sequence alignment is important for carrying over information between ‘similar proteins’.
To align sequences, you need to understand the amino acids.