Fa 05CSE182 CSE182-L6 Protein structure basics Protein sequencing.

Presentation transcript:

Fa 05CSE182 CSE182-L6 Protein structure basics Protein sequencing

Fa 05CSE182 Announcements Midterm 1: Nov 1, in class. Assignment 2: Online, due October 20.

Fa 05CSE182 Distinguishing between families

Fa 05CSE182 Distinguishing between families Assignment 2

Fa 05CSE182 Profiles Start with an alignment of strings of length m over an alphabet A. Build an |A| × m matrix F = (f_ki). Each entry f_ki represents the frequency of symbol k in position i.
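The profile construction described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `build_profile`, the dict-of-lists layout, and the toy DNA alignment (chosen only because its alphabet is small) are all assumptions.

```python
from collections import Counter

def build_profile(alignment, alphabet):
    """Return F as a dict where F[k][i] is the frequency of symbol k in column i.

    `alignment` is a list of equal-length strings (hypothetical input format).
    """
    m = len(alignment[0])
    if any(len(row) != m for row in alignment):
        raise ValueError("all aligned strings must have the same length m")
    n = len(alignment)
    profile = {k: [0.0] * m for k in alphabet}
    for i in range(m):
        # Count how often each symbol occurs in column i.
        counts = Counter(row[i] for row in alignment)
        for k in alphabet:
            profile[k][i] = counts[k] / n
    return profile

# Toy alignment of four strings of length m = 4 (illustrative data only).
alignment = ["ACGT", "ACGA", "AGGT", "ACCT"]
F = build_profile(alignment, "ACGT")
print(F["C"])
```

Each column of F sums to 1 when every aligned symbol is in the alphabet. Real profiles usually add pseudocounts so that unseen symbols do not get frequency zero; that step is left out here.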

Fa 05CSE182 Scoring Profiles [Figure: a string s scored against profile entries f_ki (symbol k, position i) using a scoring matrix]
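The slide text does not preserve the scoring formula, so the sketch below assumes one common definition: the score of a string s against a profile is the sum over positions i and symbols k of f_ki times M[k, s_i], where M is a substitution scoring matrix. The function name, the toy profile, and the +1/-1 matrix are hypothetical.

```python
def score_against_profile(s, profile, matrix):
    """Score string s against a profile, assuming
    score = sum_i sum_k f_ki * M[k, s_i] (an assumed definition)."""
    m = len(next(iter(profile.values())))
    if len(s) != m:
        raise ValueError("string length must equal profile length")
    total = 0.0
    for i, residue in enumerate(s):
        for k, freqs in profile.items():
            total += freqs[i] * matrix[(k, residue)]
    return total

# Toy scoring matrix: +1 for a match, -1 for a mismatch (illustrative only).
alphabet = "ACGT"
toy_matrix = {(a, b): (1 if a == b else -1) for a in alphabet for b in alphabet}

# Toy profile of length 2: column 0 is always A, column 1 is half C, half G.
toy_profile = {"A": [1.0, 0.0], "C": [0.0, 0.5], "G": [0.0, 0.5], "T": [0.0, 0.0]}

print(score_against_profile("AC", toy_profile, toy_matrix))  # 1.0
print(score_against_profile("AT", toy_profile, toy_matrix))  # 0.0
```

With a single-sequence profile (every column has one symbol at frequency 1), this reduces to ordinary ungapped scoring with M.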

Fa 05CSE182 Psi-BLAST idea
Multiple alignments are important for capturing remote homology. Profile-based scores are a natural way to handle this.
Q: What if the query is a single sequence?
A: Iterate:
–Find homologs using BLAST on the query
–Discard very similar homologs
–Align, make a profile, search with the profile

Fa 05CSE182 Psi-BLAST speed
Two time-consuming steps:
1. Multiple alignment of homologs
2. Searching with profiles
Does the keyword search idea work?
Multiple alignment:
–Use ungapped multiple alignments only
Pigeonhole principle again:
–If a profile of length m must score >= T
–Then some sub-profile of length l must score >= lT/m
–Generate all l-mers that score at least lT/m
–Search using an automaton
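The l-mer generation step can be sketched as a small branch-and-bound enumeration. This is an illustration under assumptions, not PSI-BLAST's actual implementation: the sub-profile is represented as a list of per-position score tables, and the function name and toy scores are hypothetical. The automaton search over the generated l-mers is not shown.

```python
def high_scoring_lmers(sub_profile, threshold):
    """Enumerate all l-mers whose score against `sub_profile` is >= threshold.

    `sub_profile` is a list of dicts, one per position, mapping symbol -> score
    (an assumed representation of a length-l window of the profile).
    """
    l = len(sub_profile)
    # best_rest[i] = best score still achievable from positions i..l-1.
    best_rest = [0.0] * (l + 1)
    for i in range(l - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(sub_profile[i].values())

    results = []

    def extend(i, prefix, score):
        # Prune: even the best completion cannot reach the threshold.
        if score + best_rest[i] < threshold:
            return
        if i == l:
            results.append(("".join(prefix), score))
            return
        for symbol, s in sub_profile[i].items():
            prefix.append(symbol)
            extend(i + 1, prefix, score + s)
            prefix.pop()

    extend(0, [], 0.0)
    return results

# Toy numbers: profile length m = 6, required score T = 12, window length l = 2,
# so the pigeonhole threshold is l*T/m = 4.
m, T, l = 6, 12, 2
window = [
    {"A": 3, "C": 1, "G": 0, "T": -1},
    {"A": 0, "C": 2, "G": 1, "T": -2},
]
print(high_scoring_lmers(window, l * T / m))  # [('AC', 5.0), ('AG', 4.0)]
```

The pruning keeps the enumeration from visiting all |A|^l strings when the threshold is high, which is the regime where keyword filtering pays off.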

Fa 05CSE182 Protein Domains An important realization (in the last decade) is that proteins have a modular architecture of domains/folds. Example: The zinc finger domain is a DNA-binding domain. What is a domain? –Part of a sequence that can fold independently, and is present in other sequences as well

Fa 05CSE182 Domain review
What is a domain?
How are domains expressed?
–Motifs (regular expressions and others)
–Multiple alignments
–Profiles
–Profile HMMs
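As an illustration of a motif expressed as a regular expression, the sketch below searches a sequence for a C2H2 zinc-finger-style pattern. The pattern is written in the style of PROSITE's C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H; treat the exact pattern as an assumption and check it against the database entry before relying on it. The test sequence is constructed by hand for the example.

```python
import re

# PROSITE-style pattern translated to Python regex syntax (assumed pattern):
#   x(2,4) -> .{2,4}     [LIVMFYWC] -> character class
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")

def find_motif(sequence, pattern=C2H2_PATTERN):
    """Return (start, matched_text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]

# Hand-built toy sequence: 4 flanking residues, one motif instance, 4 more residues.
toy = "MAAK" + "CPECGKSFSRSDELTRHIRIH" + "TGEK"
hits = find_motif(toy)
print(hits)
```

A regular expression either matches or it does not, which is why the later items in the list (profiles and profile HMMs) are preferred when family members vary a lot.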

Fa 05CSE182 Domain databases Can you speed up HMM search?

Fa 05CSE182 A structural view of proteins

Fa 05CSE182 CS view of a protein
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQ
RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGG
CRAKRNNFKSAEDCMRTCGGAIGPWENL
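The record on this slide is in FASTA format: a header line starting with ">" followed by sequence lines. A minimal parser might look like the sketch below; the function name `parse_fasta` is hypothetical, and the header is split on "|" only because this particular record uses that layout.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

bpti_fasta = """\
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQ
RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGG
CRAKRNNFKSAEDCMRTCGGAIGPWENL
"""

header, sequence = parse_fasta(bpti_fasta)[0]
accession = header.split("|")[1]
print(accession, len(sequence))
```

From the computer-science point of view the protein is just this string over a 20-letter alphabet; everything structural has to be inferred from it.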

Fa 05CSE182 Protein structure basics

Fa 05CSE182 Side chains determine amino-acid type The residues may have different properties. For example, aspartic acid (D) and glutamic acid (E) are acidic residues.

Fa 05CSE182 Bond angles form structural constraints

Fa 05CSE182 Various constraints determine 3D structure
Constraints:
–Structural constraints due to physicochemical properties
–Constraints due to bond angles
–H-bond formation
Surprisingly, a few conformations are seen over and over again.

Fa 05CSE182 Alpha-helix 3.6 residues per turn. H-bonds between residue i and residue i+4 stabilize the structure. First discovered by Linus Pauling.
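The helix numbers can be turned into a short calculation. The sketch assumes the standard backbone hydrogen-bond pattern of the alpha-helix (carbonyl oxygen of residue i to the amide hydrogen of residue i+4) and a textbook rise of 1.5 Å per residue; the rise value and the function names are not from the slide.

```python
RESIDUES_PER_TURN = 3.6      # from the slide
RISE_PER_RESIDUE_A = 1.5     # angstroms; textbook value, an assumption here

def rotation_per_residue_deg():
    """Degrees of rotation about the helix axis contributed by each residue."""
    return 360.0 / RESIDUES_PER_TURN

def helix_hbond_pairs(n_residues):
    """Backbone H-bond pairs (i, i+4), using 1-based residue numbering."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]

def helix_length_angstrom(n_residues):
    """Approximate length of an ideal helix along its axis."""
    return n_residues * RISE_PER_RESIDUE_A

print(helix_hbond_pairs(8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
print(rotation_per_residue_deg(), helix_length_angstrom(8))
```

Because every bonded pair is only four positions apart in the sequence, the helix is stabilized by local interactions.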

Fa 05CSE182 Beta-sheet Each strand by itself has 2 residues per turn, and is not stable. Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel. Beta sheets have long range interactions that stabilize the structure, while alpha-helices have local interactions.

Fa 05CSE182 Domains The basic structures (helix, strand, loop) combine to form complex 3D structures. Certain combinations are popular. Many sequences, but only a few folds

Fa 05CSE182 3D structure Predicting tertiary structure is an important problem in bioinformatics. Premise: clues to structure can be found in the sequence. While de novo tertiary structure prediction is hard, there are many intermediate, tractable goals. The PDB (Protein Data Bank) is a compendium of structures.

Fa 05CSE182 Searching structure databases Threading and other 3D alignment methods can be used to align structures. Database filtering is possible through geometric hashing.

Fa 05CSE182 Trivia Quiz What research won the Nobel prize in Chemistry in 2004? In 2002?

Fa 05CSE182 How are Proteins Sequenced? Mass Spec 101:

Fa 05CSE182 Nobel Citation 2002

Fa 05CSE182 Nobel Citation, 2002

Fa 05CSE182 Mass Spectrometry

Fa 05CSE182 Sample Preparation Enzymatic Digestion (Trypsin) + Fractionation
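Trypsin digestion can be simulated in silico. The sketch below uses the commonly quoted rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). The proline exception and the function name are assumptions, not taken from the slide, and missed cleavages are ignored.

```python
import re

def trypsin_digest(protein):
    """Split a protein string into tryptic peptides.

    Cleaves after K or R unless the next residue is P (assumed rule).
    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

print(trypsin_digest("AKRPGRAK"))  # ['AK', 'RPGR', 'AK']

# Start of the BPTI precursor sequence shown earlier in the lecture.
print(trypsin_digest("MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCK"))
```

Digestion yields peptides short enough for the mass spectrometer to handle, and fractionation then spreads them out before they reach the instrument.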

Fa 05CSE182 Single Stage MS Mass Spectrometry LC-MS: 1 MS spectrum / second

Fa 05CSE182 Tandem MS [Figure: secondary fragmentation of the ionized parent peptide]
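To make secondary fragmentation concrete, the sketch below computes the singly charged prefix (b) and suffix (y) fragment masses of a peptide, the ion series most often used to read a tandem spectrum. The slide does not name the ion types, so the choice of b and y ions is an assumption, as are the function name and the example peptide. Residue masses are standard monoisotopic values in daltons.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[j] is the prefix of length j+1; y_ions[j] is the complementary
    suffix produced by the same backbone break.
    """
    masses = [MONO[a] for a in peptide]
    total = sum(masses)
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:
        running += mass
        b_ions.append(running + PROTON)
        y_ions.append(total - running + WATER + PROTON)
    return b_ions, y_ions

b_ions, y_ions = fragment_ions("MSR")  # hypothetical example peptide
for b, y in zip(b_ions, y_ions):
    print(round(b, 4), round(y, 4))
```

Each complementary pair sums to the neutral peptide mass plus two protons, a relationship that is useful for checking candidate peaks.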