Sequence analysis June 20, 2006 Learning objectives-Understand sliding window programs. Understand difference between identity, similarity and homology.


Sequence analysis, June 20, 2006. Learning objectives: Understand sliding window programs. Understand the difference between identity, similarity and homology. Understand the difference between global alignment and local alignment. Review amino acid structures. Workshop: Perform a sliding window analysis to compute %G+C as a function of position in a sequence.

Sliding window (1) The window size is the number of characters examined at one time. [Figure: the sequence GCATATGCGCATATCCCGTCAATACCA shown three times, with window lengths of 4, 5 and 6.]

Sliding window (2) A "window" can be defined as a span of a certain number of residues (nucleotides or amino acids). One calculates some value for the residues in that window. Once the calculation is completed, the program shifts the window and the process repeats itself until the end of the sequence is reached. A simple example is to calculate the %G+C content within a window. Then move the window one nucleotide and repeat the calculation.
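The %G+C calculation described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name gc_percent_windows and its step parameter are assumptions made for the example, not part of the workshop software.

```python
def gc_percent_windows(seq, window, step=1):
    """Return a list of (start, %G+C) pairs, one per full window along seq."""
    seq = seq.upper()
    results = []
    # Stop at the last position where a full window still fits.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, 100.0 * gc / window))
    return results


# Example sequence taken from the first sliding window slide.
profile = gc_percent_windows("GCATATGCGCATATCCCGTCAATACCA", window=4)
for start, gc in profile[:3]:
    print(start, gc)
```

Plotting the second value of each pair against the first gives the %G+C profile along the sequence. Changing window shows the trade-off between a noisy profile and an over-smoothed one.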

Sliding window (3) If the window is too small, it is difficult to detect the trend of the measurement. If it is too large, you could miss meaningful data. [Figure: %G+C plotted against position in the sequence, comparing a large window size with a small window size.]

Sliding window (4) From: Segurado et al., 2003

Dot Plots [Dot plot: ATGCCTAG compared against itself with window = 1; 16 cells are marked with *.] Note that about 25% of the table will be filled due to random chance, because there is a 1 in 4 chance of a match at each position.

Dot Plots with window = 2 [Dot plot: ATGCCTAG compared against itself with window = 2; 7 cells are marked with *.] The larger the window, the more noise can be filtered. What is the percent chance that you will receive a match randomly? One in (four)^2: 1/16 * 100 = 6.25%.
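The two dot plots can be reproduced with a short Python sketch. The function names dot_plot and render are hypothetical, and a dot is placed only when the two windows match exactly, which is the simplest possible rule.

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return the set of (i, j) where the words of length `window`
    starting at seq_x[i] and seq_y[j] are identical."""
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            if seq_x[i:i + window] == seq_y[j:j + window]:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the plot as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for j, residue in enumerate(seq_y):
        row = " ".join("*" if (i, j) in dots else "." for i in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq = "ATGCCTAG"
dots_w1 = dot_plot(seq, seq, window=1)
dots_w2 = dot_plot(seq, seq, window=2)
print(render(seq, seq, dots_w1))
print(len(dots_w1), len(dots_w2))

# Chance of a random match for DNA with equal base frequencies: (1/4) ** window
print(0.25 ** 1 * 100, 0.25 ** 2 * 100)
```

With window = 1 the off-diagonal dots are chance matches between single bases. With window = 2 only the main diagonal survives for this sequence.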

Chou-Fasman: one of the first sliding window programs to predict protein secondary structure. First widely used procedure. If the propensity in a window of six residues (for a helix) is above a certain threshold, helix is chosen as the secondary structure. If the propensity in a window of five residues (for a beta strand) is above a certain threshold, then beta strand is chosen. Each classification is extended until the average propensity in a 4-residue window falls below a value. Output: helix, strand or turn.
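The nucleate-then-extend idea on this slide can be sketched as follows, for the helix case only. This is a simplified illustration, not the published Chou-Fasman procedure: the function predict_helix, both cutoffs, and the two-letter propensity table are hypothetical values chosen so the example is easy to follow.

```python
def mean(values):
    return sum(values) / len(values)


def predict_helix(seq, propensity, nucleus=6, extend=4,
                  nucleus_cutoff=1.1, extend_cutoff=1.0):
    """Return a string with 'H' for predicted helix and '-' elsewhere.

    propensity maps each residue letter to a helix propensity.
    The cutoffs are illustrative assumptions, not published constants.
    """
    scores = [propensity[res] for res in seq]
    state = ["-"] * len(seq)
    for start in range(len(seq) - nucleus + 1):
        # Nucleation: the average over a six-residue window must exceed the cutoff.
        if mean(scores[start:start + nucleus]) <= nucleus_cutoff:
            continue
        left, right = start, start + nucleus  # half-open interval [left, right)
        # Extend right while the 4-residue window ending at the new residue holds up.
        while right < len(seq) and mean(scores[right - extend + 1:right + 1]) >= extend_cutoff:
            right += 1
        # Extend left while the 4-residue window starting at the new residue holds up.
        while left > 0 and mean(scores[left - 1:left - 1 + extend]) >= extend_cutoff:
            left -= 1
        for k in range(left, right):
            state[k] = "H"
    return "".join(state)


# Hypothetical two-letter table: 'A' acts as a helix former, 'G' as a helix breaker.
toy_propensity = {"A": 1.5, "G": 0.5}
prediction = predict_helix("GGGGGGAAAAAAAAGGGGGG", toy_propensity)
print(prediction)
```

A real implementation would use the full propensity table for all 20 amino acids and would repeat the scan for beta strands with a five-residue nucleation window, then resolve overlapping assignments.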

Chou-Fasman Rules (Mathews, Van Holde, Ahern) [Table: columns are Amino Acid, α-Helix, β-Sheet, Turn. Rows, in order: Ala, Cys, Leu, Met, Glu, Gln, His, Lys, Val, Ile, Phe, Tyr, Trp, Thr, Gly, Ser, Asp, Asn, Pro, Arg. Row groups are labeled Favors α-Helix, Favors β-Sheet, Favors Turns.]

Chou & Fasman structure prediction. Chou & Fasman [Biochemistry 13(2): (1974)]. By studying a number of proteins whose structures were known, they were able to determine stretches of amino acids that could serve to form an α-helix or a β-sheet. These amino acids are called helix formers or sheet formers and can have different strengths for forming their structures. Once these nucleation sites are determined, adjacent amino acids are examined to see if the structure can be extended in either or both directions. Values for some amino acids allow extension; other amino acids do not. Some amino acids are categorized as helix breakers or sheet breakers. A string of these will terminate the current structure. This method is about 60-65% accurate.

Kyte-Doolittle Hydropathy: another sliding window routine [J. Mol. Biol. 157: (1982)]. Kyte and Doolittle determined a "hydropathy scale" that assigns a value to each amino acid based on empirical observations.
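A hydropathy profile is the same sliding window calculation as %G+C, with the per-residue value taken from the hydropathy scale. The sketch below uses the Kyte-Doolittle values as commonly tabulated. The function name hydropathy_profile and the choice to report a plain window average are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window):
    """Return the average hydropathy of each full window along a protein sequence."""
    values = [KYTE_DOOLITTLE[res] for res in seq.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Five isoleucines give the most hydrophobic possible window.
print(hydropathy_profile("IIIII", window=5))
print(hydropathy_profile("AILKR", window=5))
```

Stretches where the window average stays high are candidate hydrophobic segments. As with %G+C, the window length controls how much local noise is smoothed away.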

Workshop 3

Evolutionary Basis of Sequence Alignment 1. Identity: Quantity that describes how much two sequences are alike in the strictest terms. 2. Similarity: Quantity that relates how much two amino acid sequences are alike. 3. Homology: a conclusion drawn from data suggesting that two genes share a common evolutionary history.
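Identity, being the strictest of the three terms, is also the easiest to compute. The sketch below counts identical columns in two already aligned sequences. The function name percent_identity is hypothetical, and dividing by the full alignment length (gap columns included) is one convention among several.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences have the same residue.

    Assumes the two strings are already aligned and of equal length.
    Columns containing a gap never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACDEF", "ACDQF"))
print(percent_identity("AC-EF", "ACDEF"))
```

Homology cannot be computed this way. It is a conclusion about shared ancestry that a value such as percent identity may support.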

Purpose of finding differences and similarities of amino acids in two proteins: Infer structural information. Infer functional information. Infer evolutionary relationships.

Modular nature of proteins (cont. 1) [Diagram: Gene A (Exon 1a, Exon 2a) undergoes duplication of Exon 2a, followed by an exchange with Gene B (Exon 1b, Exon 2b). Afterwards Gene A consists of Exon 1a, Exon 2a and Exon 3 (Exon 2b from Gene B); Gene B consists of Exon 1b, Exon 2b and Exon 3 (Exon 2a from Gene A).]

Evolutionary Basis of Sequence Alignment (Cont. 1) Why are there regions of identity? 1) Conserved function: residues participate in the reaction. 2) Structural: for example, conserved cysteine residues that form a disulfide linkage. 3) Historical: residues that are conserved solely due to a common ancestor gene.

One is mouse trypsin and the other is crayfish trypsin. They are homologous proteins. The sequences share 41% identity.

Evolutionary Basis of Sequence Alignment (Cont. 2) Note: it is possible for two proteins to share a high degree of similarity but have different functions. For example, human gamma-crystallin is a lens protein that has no known enzymatic activity. It shares a high percentage of identity with E. coli quinone oxidoreductase. These proteins likely had a common ancestor, but their functions diverged. This is analogous to a railroad car that now functions as a diner.

Modular nature of proteins. The previous alignment was global. However, many proteins do not display global patterns of similarity. Instead, they possess local regions of similarity. Proteins can be thought of as assemblies of modular domains. THINK OF MR. POTATOHEAD. It is thought that this may, in some cases, be due to a process known as exon shuffling.

Identity Matrix: the simplest type of scoring matrix.
    L  I  C  A
L   1  0  0  0
I      1  0  0
C         1  0
A            1
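An identity matrix like the one on this slide can be built and applied with a few lines of Python. The helper names identity_matrix and score_aligned are hypothetical. The matrix is stored as a dictionary keyed by residue pairs, so both halves of the symmetric table are filled in.

```python
def identity_matrix(alphabet):
    """Score 1 for identical residues and 0 for every other pair."""
    return {(a, b): 1 if a == b else 0 for a in alphabet for b in alphabet}


def score_aligned(seq_a, seq_b, matrix):
    """Sum the matrix score over each column of two equal-length sequences."""
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


matrix = identity_matrix("LICA")
print(matrix[("L", "L")], matrix[("L", "I")])
print(score_aligned("LICA", "LLCA", matrix))
```

Because the scores live in a lookup table, swapping in a matrix that gives partial credit to similar residues requires no change to score_aligned.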

Similarity. It is easy to score whether an amino acid is identical to another (the score is 1 if identical and 0 if not). However, it is not easy to give a score for amino acids that are somewhat similar. [Figure: structures of leucine and isoleucine.] Should they get a 0 (non-identical), a 1 (identical), or something in between?

Two proteins that are similar in certain regions: tissue plasminogen activator (PLAT) and coagulation factor 12 (F12).

The Dotter Program. The program consists of three components: 1) A sliding window. 2) A table that gives a score for each amino acid match. 3) A graph that converts the score to a dot of a certain density. The higher the score, the higher the density.
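The three components can be combined in a small sketch: a window slides along every diagonal, a scoring function supplies the per-pair score, and the summed score is mapped to a character of increasing density. This is not Dotter's actual implementation. The names windowed_scores and density_char, the identity scoring function, and the shade characters are all assumptions for illustration.

```python
def windowed_scores(seq_x, seq_y, score, window):
    """Return {(i, j): summed score of the windows starting at seq_x[i], seq_y[j]}."""
    out = {}
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            out[(i, j)] = sum(
                score(seq_x[i + k], seq_y[j + k]) for k in range(window)
            )
    return out


def density_char(value, max_value, shades=" .:*#"):
    """Map a score in [0, max_value] to a character; higher scores look denser."""
    if value <= 0:
        return shades[0]
    index = int(value * (len(shades) - 1) / max_value)
    return shades[min(index, len(shades) - 1)]


def identity_score(a, b):
    """Placeholder scoring table: 1 for a match, 0 otherwise."""
    return 1 if a == b else 0


window = 3
scores = windowed_scores("LICA", "LICA", identity_score, window)
for j in range(2):
    print("".join(density_char(scores[(i, j)], window) for i in range(2)))
```

Replacing identity_score with a lookup into a substitution matrix would let similar but non-identical residues contribute partial density.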

Region of similarity: a single region on F12 is similar to two regions on PLAT.