Proteins dictate function in an organism:

Presentation transcript:

Proteins dictate function in an organism: What happens as proteins evolve? Budding yeast: Saccharomyces cerevisiae (sugar fungus). Fission yeast: Schizosaccharomyces pombe. In our project, we'll be determining if functional homologs of S. cerevisiae Met proteins are present in S. pombe.

This semester: Five genes from S. pombe will be transferred to S. cerevisiae. What organism should the class study after we finish S. pombe genes? A look at the molecular phylogeny should help.

Are there any correlations between the kinds of amino acid substitutions observed over evolution and their chemistry? How are bioinformatics tools used to analyze the conservation of protein sequences? How can I identify regions of proteins that are most strongly conserved and most likely to be important for function?

To maintain their function, proteins cannot tolerate drastic changes to their shapes. Amino acid substitutions that significantly perturb the structure of a protein or alter its chemistry can cause the protein to lose function. Figure: Met16p from S. cerevisiae complexed with PAP (PDB entry 2OQ2).

Recall that the final folded form of a protein is determined by its primary sequence. R (“reactive”) groups form a variety of bonds important for structure and function.

Cysteine is one of the most evolutionarily constrained amino acids. Cys-254 is in close proximity to the end-product, PAP, suggesting that it plays a role in catalysis. Figure: custom view of Met16p highlighting Cys (protein: backbone view; PAP: ball-and-stick; cysteine: space-fill).

Amino acids can be grouped according to the chemistry and size of their R groups:
Charged, acidic: Glu (E), Asp (D)
Charged, basic: Arg (R), Lys (K), His (H)
Polar: Asn (N), Gln (Q), Thr (T)
Small neutral: Gly (G), Cys (C), Ser (S), Ala (A)
Aromatic: Tyr (Y), Trp (W), Phe (F)
Hydrophobic: Val (V), Ile (I), Leu (L), Met (M), Pro (P), Trp (W), Phe (F)

Most amino acids are abbreviated by their first letter (abundant, hydrophobic ones get preference):
A Ala alanine, C Cys cysteine, G Gly glycine, H His histidine, I Ile isoleucine, L Leu leucine, M Met methionine, P Pro proline, S Ser serine, T Thr threonine, V Val valine
Phonetic abbreviations: F Phe phenylalanine, R Arg arginine
Oddballs (charged, aromatic, some polar): D Asp aspartic acid, E Glu glutamic acid, K Lys lysine, N Asn asparagine, Q Gln glutamine, W Trp tryptophan, Y Tyr tyrosine
The one-letter code needs to be part of a 21st century biologist’s vocabulary.

Studying the evolutionary conservation of amino acids in sequences provides a sense of the importance of each amino acid to protein function. BLOSUM62 (BLOcks SUbstitution Matrix) is based on substitution statistics from alignments of proteins that are at least 62% identical.
The matrix assigns scores for substitutions:
The maximum score in each row goes to the same amino acid (completely conserved, possibly essential).
Positive scores are awarded for common amino acid substitutions, in decreasing order, based on their occurrence in proteins.
Negative scores indicate unlikely substitutions.
Note the high score for Cys!
The biochemical connection: higher scores are frequently correlated with conservative amino acid substitutions, based on amino acid chemistry and size.

BLAST is an acronym for Basic Local Alignment Search Tool, a computer algorithm for finding homologous sequences in databases. BLASTN compares nucleic acid sequences. BLASTP compares protein sequences. BLOSUM62 is the default scoring matrix for BLASTP.

BLOSUM62 scores relate the frequency of a particular substitution to the probability that it occurs by chance in proteins that are at least 62% identical throughout their length:
$$\text{Score}_{ij} = k \log_{10}\left(\frac{P_{ij}}{Q_i \, Q_j}\right)$$
Here $k$ is a scaling factor used to produce integral values, $P_{ij}$ is the observed frequency of two amino acids (i and j) replacing each other in homologous sequences, and $Q_i$ and $Q_j$ are the probabilities of finding i and j randomly in a sequence.
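The log-odds score can be tried out numerically. The sketch below is illustrative only: the function name `log_odds_score`, the scaling factor `k = 10`, and all frequencies are made-up values chosen for the example, not entries from the real BLOSUM62 tables.

```python
import math

def log_odds_score(p_ij, q_i, q_j, k=10.0):
    """Return k * log10(p_ij / (q_i * q_j)), rounded to the nearest integer.

    p_ij: observed frequency of the i/j pair in aligned homologous sequences
    q_i, q_j: background frequencies of amino acids i and j
    k: scaling factor (hypothetical value chosen for this example)
    """
    return round(k * math.log10(p_ij / (q_i * q_j)))

# Hypothetical frequencies, for illustration only.
more_than_chance = log_odds_score(p_ij=0.02, q_i=0.05, q_j=0.05)    # pair seen 8x more often than chance
exactly_chance = log_odds_score(p_ij=0.0025, q_i=0.05, q_j=0.05)    # pair seen as often as chance
less_than_chance = log_odds_score(p_ij=0.0005, q_i=0.05, q_j=0.05)  # pair seen 5x less often than chance

print(more_than_chance, exactly_chance, less_than_chance)
```

A pair observed more often than chance gets a positive score, a pair observed at the chance rate scores zero, and a pair observed less often than chance gets a negative score.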

Positive and negative scores suggest amino acid changes have been selected for (positive) or against (negative) during evolution. The magnitude of the score suggests the strength of the selection. A score of zero suggests that a particular substitution can be explained by chance alone.

BLASTP begins with a query sequence (e.g. your MET sequence). The query sequence is broken into "words" that will act as seeds in alignments. BLAST searches for matches (or synonyms) in target entries in the database. If a target entry has two or more matches to "words" from the query, the alignment is extended in both directions, looking for additional similarity.

"Words" are integral to the BLASTP search. BLASTP uses a sliding window to identify words. Consider the sequence E A G L E S. BLASTP would break this down into a series of four 3-letter words:
E A G
A G L
G L E
L E S
Tip! Use a monospaced (non-proportional) font such as Courier when working with database entries. The fonts are uglier, but the letters have a constant spacing that generates nice columns! Next: words are given a numerical score.
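The sliding window can be written in a few lines. This is a minimal sketch, not BLASTP's actual implementation; the function name `split_into_words` is made up for the example.

```python
def split_into_words(sequence, word_size=3):
    """Slide a window of word_size along the sequence, one residue at a time."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]

words = split_into_words("EAGLES")
print(words)  # ['EAG', 'AGL', 'GLE', 'LES']
```

A sequence of length n yields n - 2 overlapping 3-letter words, so the six residues of E A G L E S give four words.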

BLASTP uses the BLOSUM62 matrix as its default for assigning values to words:
E A G: 5 + 4 + 6 = 15
A G L: 4 + 6 + 4 = 14
G L E: 6 + 4 + 5 = 15
L E S: 4 + 5 + 4 = 13
BLASTP next checks for word synonyms (1-letter replacements) with a score greater than a default threshold of 10:
E A G: K A G (11), E S G (12), E C G (11), E T G (11), E V G (11)
A G L: S G L (11), A G I (12)
G L E: G I E (13), G L D (12), G L Q (12)
L E S: I E S (11)
BLASTP will search for all of these words and synonyms in the protein database. Of the 57 possible single-letter synonyms for each word (3 positions × 19 alternative amino acids), only a small handful are statistically likely to appear in homologous proteins.
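The word scores can be reproduced by summing BLOSUM62 values position by position. The sketch below hard-codes only the handful of BLOSUM62 entries this example needs (a full matrix has 20 × 20 entries); the names `BLOSUM62_SUBSET`, `pair_score` and `word_score` are made up for illustration, and only some of the listed synonyms are checked.

```python
# Small subset of BLOSUM62 (the matrix is symmetric, so keys are sorted pairs).
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("E", "E"): 5, ("G", "G"): 6, ("L", "L"): 4, ("S", "S"): 4,
    ("E", "K"): 1, ("A", "S"): 1, ("A", "C"): 0, ("A", "T"): 0, ("A", "V"): 0,
    ("I", "L"): 2, ("D", "E"): 2, ("E", "Q"): 2,
}

THRESHOLD = 10  # synonyms must score above this value

def pair_score(a, b):
    """Look up the score for aligning residue a with residue b."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]

def word_score(query_word, candidate):
    """Sum the pair scores over the aligned positions of two words."""
    return sum(pair_score(q, c) for q, c in zip(query_word, candidate))

# Each query word scored against itself.
self_scores = {w: word_score(w, w) for w in ["EAG", "AGL", "GLE", "LES"]}

# Single-letter replacements taken from the list above.
candidates = {
    "EAG": ["KAG", "ESG", "ECG", "ETG", "EVG"],
    "AGL": ["SGL", "AGI"],
    "GLE": ["GIE", "GLD", "GLQ"],
}
synonyms = {
    word: {c: word_score(word, c) for c in cands
           if word_score(word, c) > THRESHOLD}
    for word, cands in candidates.items()
}

print(self_scores)
print(synonyms)
```

Storing each pair once under a sorted key keeps the table short while still letting `pair_score("K", "E")` and `pair_score("E", "K")` return the same value.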

Target sequences must contain at least two word matches for further consideration. BLASTP uses word matches as a nucleus and extends them in both directions, looking for additional similarity.
Query: Q A S T L Y E - A G L E S E A T T N - - R R E I
Summary: + A + T + + + G L E S E A + + R + E +
Target: N A A T Y W D A S G L E S - - - S Q I I R K E L
As BLASTP extends the alignment out from the match, it calculates a running score; extension stops when the score drops below a threshold value. Penalties are assigned for gaps and mismatches. Plus signs in the summary line indicate a positive BLOSUM62 value.
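The idea of extending a seed while tracking a running score can be sketched as follows. This is a deliberately simplified illustration, not BLASTP's real algorithm: it extends only to the right, allows no gaps, uses made-up match/mismatch scores (+5/-4) instead of BLOSUM62, and stops when the running score falls more than a hypothetical `max_drop` below the best score seen so far. The function name and all parameter values are assumptions.

```python
def extend_right(query, target, q_start, t_start, seed_len,
                 match=5, mismatch=-4, max_drop=8):
    """Extend a seed match to the right without gaps.

    Returns (best_score_gain, residues_added). Extension stops at the end of
    either sequence, or when the running score falls more than max_drop
    below the best score seen so far.
    """
    running = best = 0
    best_len = 0
    steps = 0
    q = q_start + seed_len  # first query position after the seed
    t = t_start + seed_len  # first target position after the seed
    while q < len(query) and t < len(target):
        running += match if query[q] == target[t] else mismatch
        steps += 1
        if running > best:
            best, best_len = running, steps
        elif best - running > max_drop:
            break
        q += 1
        t += 1
    return best, best_len

# Toy sequences (hypothetical): the seed "GLE" starts at position 0 in both.
gain, added = extend_right("GLESEATTNRR", "GLESEAWWWWW", 0, 0, seed_len=3)
print(gain, added)
```

In this toy case the three residues after the seed (S, E, A) match and raise the score, and the run of mismatches that follows eventually triggers the stop; the function reports the extension that gave the best score rather than the point where it gave up.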

Highly conserved protein sequences are often essential for function. You will compare sequences of homologous proteins from model organisms: Caenorhabditis elegans, Escherichia coli K-12 (gram negative), Arabidopsis thaliana, Mus musculus, and Bacillus subtilis str. 168 (gram positive).

Phylogeny.fr provides tools for preparing multiple sequence alignments and phylogenetic trees

Multiple sequence alignments show regions of conservation. Identical amino acids are shown in blue; conservative changes are shown in grey.

TreeDyn generates a phylogenetic tree. Length of branches reflects time since divergence from a node. Bootstrap values predict reliability of nodes in the tree (max = 1.0). Length corresponds to 600 million years.

The WebLogo program provides a graphical depiction of multiple sequence alignments. The sizes of the different amino acids reflect the frequency with which a particular amino acid is found at the position. Note the positions of amino acids with high BLOSUM scores.
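The per-position frequencies behind such a logo can be computed directly from an alignment. The sketch below uses a made-up four-sequence toy alignment and a hypothetical helper `column_frequencies`; it computes raw frequencies only and does not reproduce any additional scaling a logo program may apply to letter heights.

```python
from collections import Counter

def column_frequencies(aligned_seqs):
    """Return, for each alignment column, a dict of residue -> frequency."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for column in zip(*aligned_seqs):  # iterate column by column
        counts = Counter(column)
        freqs.append({res: n / len(column) for res, n in counts.items()})
    return freqs

# Toy alignment (hypothetical sequences, no gaps).
toy_alignment = ["ACGLE", "ACGIE", "SCGLD", "ACGLE"]
freqs = column_frequencies(toy_alignment)

# Columns where a single residue accounts for every sequence.
fully_conserved = [i for i, col in enumerate(freqs) if max(col.values()) == 1.0]

print(freqs)
print(fully_conserved)
```

Columns with a single residue at frequency 1.0 are the fully conserved positions, which are the natural candidates to check against residues with high BLOSUM62 self-scores such as Cys.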