#19 - Protein Structure Basics & Classification


1 #19 - Protein Structure Basics & Classification
BCB 444/544 10/5/07 Lecture 19 A bit of: Protein Structure - Basics Protein Structure Visualization, Classification & Comparison #19_Oct05 BCB 444/544 F07 ISU Dobbs #19- Protein Structure Basics & Classification BCB 444/544 Fall 07 Dobbs

2 Required Reading (before lecture)
√ Mon Oct 1 - Lecture 17
  Protein Motifs & Domain Prediction
  Chp 7 - pp 85-96
√ Wed Oct 3 - Lecture 18
  Protein Structure: Basics (Note chg in Lecture Schedule online)
  Chp 12 - pp
√ Thurs Oct 4 & Fri Oct 5 - Lab 6 & Lecture 19
  Protein Structure: Basics, Databases, Visualization, Classification & Comparison
  Chp 13 - pp

3 BCB 544 - Extra Required Reading
Assigned Mon Sept 24
BCB 544 Extra Required Reading Assignment (for 544 Extra HW#1, Task 2):
  Pollard KS, …., Haussler D. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443: doi: /nature05113
  PDF available on class website, under Required Reading Link

4 BCB 544 Projects (Optional for BCB 444)
For a better idea about what's involved in the Team Projects, please look over last year's expectations for projects:
Criteria for evaluation of projects (oral presentations) are summarized here:
Please note: wrong URL (instead of that shown above) was included in originally posted 544ExtraHW#1; corrected version is posted now

5 Assignments & Announcements - #1
Students registered for BCB 444: Two Grading Options
  1) Take Final Exam per original Grading Policies
  2) Instead of taking Final Exam, you may participate in a Team Research Project
If you choose #2, please do 3 things:
  Contact Drena (in person)
  Send to Michael Terribilini
  Complete 544 Extra HW#1 - Task 1.1 by noon on Mon Oct 1

6 Assignments & Announcements - #2
BCB 444s (Standard):
  200 pts  Midterm Exams = 100 points each
  200      Homework & Laboratory assignments = 200 points
  100      Final Exam
  500 pts  Total for BCB 444
BCB 444p (Project):
  190      Team Research Project
  590 pts  Total for BCB 444p
BCB 544:
  200 pts  Midterm Exams = 100 points each
  200      Homework & Laboratory assignments
  200      Discussion Questions & Team Research Projects
  700 pts  Total for BCB 544

7 Assignments & Announcements #3
ALL: HomeWork #3
  Due: Mon Oct 8 by 5 PM
HW544: HW544Extra #1
  √ Due: Task Mon Oct 1 by noon
  Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM (not Monday)
444 "Project-instead-of-Final" students should also submit: HW544Extra #1
  Due: Task Mon Oct 8 by noon
  Due: Task Fri Oct 12 by 5 PM (not Monday)
  Task 2 NOT required!

8 QUESTIONS re: HW#3? Due Mon

9 HMM example from Eddy HMM paper: Toy HMM for Splice Site Prediction
This is a new slide

10 An HMM for Occasionally Dishonest Casino
Transition probabilities:
  Prob(Fair → Loaded) = 0.01
  Prob(Loaded → Fair) = 0.2
But, where do you start? "Begin" state not shown

11 Occasionally Dishonest Casino - HW#3
"Begin" state? 50:50 chance of starting with F vs L die

12 Calculating Different Paths to an Observed Sequence
This slide has been changed
Calculations such as those shown below are used to fill a matrix with probability values for every state at every position
[Figure: path calculations, with factors labeled "transition probability" and "emission probability"]
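As a rough illustration of this kind of calculation, the sketch below multiplies transition and emission probabilities along one chosen path through the casino HMM. The transition values and the 50:50 start come from the preceding slides. The emission values (fair die 1/6 per face; loaded die 1/2 for a six and 1/10 otherwise) and the roll sequence 6, 2, 6 are assumed from the worked numbers later in the lecture, and the names `START`, `TRANS`, `EMIT`, and `path_probability` are illustrative only.

```python
# Occasionally dishonest casino HMM; states: F = fair die, L = loaded die.
START = {"F": 0.5, "L": 0.5}                   # "Begin" state: 50:50
TRANS = {"F": {"F": 0.99, "L": 0.01},          # Prob(Fair -> Loaded) = 0.01
         "L": {"F": 0.20, "L": 0.80}}          # Prob(Loaded -> Fair) = 0.2
EMIT = {"F": {r: 1 / 6 for r in "123456"},     # assumed: fair die is uniform
        "L": {**{r: 1 / 10 for r in "12345"}, "6": 1 / 2}}  # assumed: loaded die favors 6


def path_probability(rolls, path):
    """Joint probability of observing `rolls` while following state `path`."""
    prob = START[path[0]] * EMIT[path[0]][rolls[0]]
    for i in range(1, len(rolls)):
        # One transition probability times one emission probability per step.
        prob *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][rolls[i]]
    return prob


for path in ("FFF", "LLL", "FLL"):
    print(path, path_probability("626", path))
```

Each candidate path gets its own product, so different paths can be compared directly for the same observed rolls.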

13 Calculate optimal path?
Construct a matrix of probability values for every state at every residue
How: one way = Viterbi Algorithm
  Initialization (i = 0)
  Recursion (i = 1, ..., L): For each state k
  Termination:
To find π*, use trace-back, as in dynamic programming

14 Viterbi for Calculating Most Probable Path*
Observed sequence x = 6 2 6 (B = Begin, F = Fair, L = Loaded)
B: 1
x1 = 6:
  F: (1/6)(1/2) = 1/12
  L: (1/2)(1/2) = 1/4
x2 = 2:
  F: (1/6) max{(1/12)(0.99), (1/4)(0.2)} = 0.01375
  L: (1/10) max{(1/12)(0.01), (1/4)(0.8)} = 0.02
x3 = 6:
  F: (1/6) max{(0.01375)(0.99), (0.02)(0.2)} = 0.00226875
  L: (1/2) max{(0.01375)(0.01), (0.02)(0.8)} = 0.008
* Path within HMM that matches query sequence with highest probability
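The matrix fill and trace-back can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's own code: the emission values (fair die 1/6 per face; loaded die 1/2 for a six, 1/10 otherwise) are read off the worked numbers on this slide, no explicit end state is modeled, and the names `START`, `TRANS`, `EMIT`, and `viterbi` are hypothetical.

```python
# Occasionally dishonest casino HMM; states: F = fair die, L = loaded die.
START = {"F": 0.5, "L": 0.5}                   # "Begin" state: 50:50
TRANS = {"F": {"F": 0.99, "L": 0.01},
         "L": {"F": 0.20, "L": 0.80}}
EMIT = {"F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 1 / 10 for r in "12345"}, "6": 1 / 2}}


def viterbi(rolls):
    """Return (most probable state path, joint probability of path and rolls)."""
    states = list(START)
    # Initialization: leave the begin state and emit the first roll.
    v = [{s: START[s] * EMIT[s][rolls[0]] for s in states}]
    back = []
    # Recursion: for each state, keep only the best-scoring predecessor.
    for x in rolls[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] * TRANS[p][s])
            ptr[s] = best
            col[s] = EMIT[s][x] * v[-1][best] * TRANS[best][s]
        v.append(col)
        back.append(ptr)
    # Termination: pick the best final state, then trace back the pointers.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[-1][last]


print(viterbi("626"))
```

For long sequences, implementations usually work with log probabilities to avoid numerical underflow; plain products are kept here so the numbers can be compared with the hand calculation.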

15 Total Probability
Several different paths can result in observation x
Probability that our model will emit x is the sum over all paths π: P(x) = Σ_π P(x, π)

16 Calculating the Total Probability:
This slide has been changed
Note: This is not the same as the matrix on the previous slide! Here, last column contains sums for each row
Observed sequence x = 6 2 6 (B = Begin, F = Fair, L = Loaded)
B: 1
x1 = 6:
  F: (1/6)(1/2) = 1/12
  L: (1/2)(1/2) = 1/4
x2 = 2:
  F: (1/6) sum{(1/12)(0.99), (1/4)(0.2)} ≈ 0.022083
  L: (1/10) sum{(1/12)(0.01), (1/4)(0.8)} ≈ 0.020083
x3 = 6:
  F: (1/6) sum{(0.022083)(0.99), (0.020083)(0.2)} ≈ 0.004313
  L: (1/2) sum{(0.022083)(0.01), (0.020083)(0.8)} ≈ 0.008144
Total probability ≈ 0.004313 + 0.008144 = 0.012457 ≈ 0.012
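Replacing max with sum in the same matrix fill gives the total probability (this is commonly called the forward algorithm). The sketch below is illustrative only: the emission values (fair die 1/6 per face; loaded die 1/2 for a six, 1/10 otherwise) are inferred from the worked numbers, no explicit end state is modeled, and the names `START`, `TRANS`, `EMIT`, and `total_probability` are hypothetical.

```python
# Occasionally dishonest casino HMM; states: F = fair die, L = loaded die.
START = {"F": 0.5, "L": 0.5}                   # "Begin" state: 50:50
TRANS = {"F": {"F": 0.99, "L": 0.01},
         "L": {"F": 0.20, "L": 0.80}}
EMIT = {"F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 1 / 10 for r in "12345"}, "6": 1 / 2}}


def total_probability(rolls):
    """Probability that the model emits `rolls`, summed over all state paths."""
    states = list(START)
    # First column: leave the begin state and emit the first roll.
    f = {s: START[s] * EMIT[s][rolls[0]] for s in states}
    # Later columns: sum over predecessors instead of taking the maximum.
    for x in rolls[1:]:
        f = {s: EMIT[s][x] * sum(f[p] * TRANS[p][s] for p in states)
             for s in states}
    # Total: add the entries of the last column.
    return sum(f.values())


print(round(total_probability("626"), 6))
```

The result is larger than the Viterbi score for the same rolls because it includes every path, not only the most probable one.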

17 A few more Details re: Profiles & HMMs
Smoothing or "Regularization" - method used to avoid "over-fitting"
  Common problem in machine learning (data-driven) approaches
  Limited training sample size causes over-representation of observed characters while "ignoring" unobserved characters
  Result? Miss members of family not yet sampled (too many false negative hits)
Pseudocounts - adding artificial values for 'extra' amino acid(s) not observed in the training set
  Treated as 'real' values in calculating probabilities
  Improve predictive power of profiles & HMMs
Dirichlet mixture - commonly used mathematical model to simulate the aa distribution in a sequence alignment
  To "correct" problems in an observed alignment based on limited number of sequences
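To make the pseudocount idea concrete, here is a small sketch that turns one alignment column into residue probabilities. It assumes the simplest scheme, a uniform pseudocount added to every one of the 20 standard amino acids; Dirichlet mixtures replace this uniform value with position-dependent ones. The function name `column_probabilities` and the example column are made up for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(column, pseudocount=1.0):
    """Residue probabilities for one alignment column, with a uniform pseudocount."""
    counts = Counter(column)
    # Only the 20 standard residues are counted; gaps and other symbols are ignored.
    observed = sum(counts[aa] for aa in AMINO_ACIDS)
    total = observed + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


probs = column_probabilities("LLLLI")
print(probs["L"], probs["I"], probs["V"])
```

Without the pseudocount, valine would get probability 0 in this column, so any family member with V at this position would score as impossible; with it, V keeps a small nonzero probability.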

18 Chp 7 - Protein Motifs & Domain Prediction
SECTION II SEQUENCE ALIGNMENT
Xiong: Chp 7 Protein Motifs and Domain Prediction
  √ Identification of Motifs & Domains in MSAs
  √ Motif & Domain Databases Using Regular Expressions
  √ Motif & Domain Databases Using Statistical Models
  Protein Family Databases
  Motif Discovery in Unaligned Sequences
  √ Sequence Logos

19 Motifs & Domains
Motif - short conserved sequence pattern
  Associated with distinct function in protein or DNA
  Avg = 10 residues (usually 6-20 residues)
  e.g., zinc finger motif - in protein
  e.g., TATA box - in DNA
Domain - "longer" conserved sequence pattern, defined as an independent functional and/or structural unit
  Avg = 100 residues (range from in proteins)
  e.g., kinase domain or transmembrane domain - in protein
Domains may (or may not) include motifs

20 2 Approaches for Representing "Consensus" Information in Motifs & Domains
Regular expression - symbolic representation of information from MSA
  e.g., protein phosphorylation site motif: [S,T]- X- [R,K]
  Symbols represent specific or unspecified residues, spaces, etc.
  2 mechanisms for matching:
    Exact
    "Fuzzy" (inexact, approximate) - flexible, more permissive to detect "near matches"
Statistical model - includes probability information derived from MSA
  e.g., PSSM, Profile, or HMM
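The phosphorylation-site pattern [S,T]- X- [R,K] translates directly into a standard regular expression. The sketch below shows exact matching only, using Python's `re` module; the test sequences and the function name `find_sites` are invented for illustration, and positions are reported 1-based as an assumption about the preferred convention.

```python
import re

# [S,T]-X-[R,K]: Ser or Thr, then any residue, then Arg or Lys.
# Wrapping the pattern in a lookahead lets overlapping sites be reported.
PHOSPHO_SITE = re.compile(r"(?=([ST][A-Z][RK]))")


def find_sites(sequence):
    """Return (1-based start position, matched tripeptide) for each exact match."""
    return [(m.start() + 1, m.group(1)) for m in PHOSPHO_SITE.finditer(sequence)]


print(find_sites("AASARKTTKG"))  # [(3, 'SAR'), (7, 'TTK')]
print(find_sites("STRK"))        # [(1, 'STR'), (2, 'TRK')]
```

A regular expression only answers yes or no at each position; it carries none of the probability information that a PSSM, profile, or HMM derives from the alignment.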

21 Motif & Domain Databases
Based on regular expressions:
  Prosite (Interpro includes Prosite, PRINTS, etc)
  Emotif
  Limitation: these don't take probability info into account
Based on statistical models:
  PRINTS
  BLOCKS
  ProDom
  Pfam
  SMART
  CDART
  Reverse PsiBLAST
READ your textbook & try some of these at home; there are distinct advantages/disadvantages associated with each
TAKE HOME LESSON: Always try several methods! (not just one!)

22 Protein Family Databases
In addition to databases of "related" protein sequences, based on shared motifs or domains (Pfam, BLOCKS, CDART), some databases "cluster" sequences into families based on near full-length sequence comparisons
COGs - Clusters of Orthologous Groups (at NCBI)
  Mostly Prokaryotic sequences
  KOG = newer Eukaryotic version
  COGnitor - software to search database
ProtoNet - also clusters of homologous protein sequences
  Advantages: tree-like hierarchical structure
  Provides GO (gene ontology) annotations
  Provides InterPro keywords

23 Motif Discovery in Unaligned Sequences
Expectation Maximization - generate "random" alignment of all sequences, derive PSSM, iteratively match individual sequences to PSSM to edit & improve it
  Problems? Can hit a local optimum (premature convergence)
  Sensitive to initial alignment
MEME - Multiple EM for Motif Elicitation - modified EM, avoids local optimum issues; two-step procedure
Gibbs Sampling - generate "trial" PSSM from random alignment first, as in EM, but leave one sequence out of initial alignment, then iteratively match PSSM to left-out sequences
  Gibbs Sampler - web-based motif search via Gibbs sampling
Not mentioned in textbook:
  Stochastic context-free grammars
  Other "state of the art" approaches in recent literature, but not available in web-based servers (yet)

24 Chp 12 - Protein Structure Basics
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 12 Protein Structure Basics
LAB 6
  Introduction to Protein DataBank - PDB
  PyMol
  Cn3D?

25 Chp 12 - Protein Structure Basics
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 12 Protein Structure Basics
  Amino Acids
  Peptide Bond Formation
  Dihedral Angles
  Hierarchy
  Secondary Structures
  Tertiary Structures
  Determination of Protein 3-Dimensional Structure
  Protein Structure DataBank (PDB)

26 Protein Structure & Function
Protein structure - primarily determined by sequence
Protein function - primarily determined by structure
Globular proteins: compact hydrophobic core & hydrophilic surface
Membrane proteins: special hydrophobic surfaces
Folded proteins are only marginally stable
Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
Predicting protein structure and function can be very hard -- & fun!

27 4 Basic Levels of Protein Structure

28 Primary & Secondary Structure
Primary
  Linear sequence of amino acids
  Description of covalent bonds linking aa's
Secondary
  Local spatial arrangement of amino acids
  Description of short-range non-covalent interactions
  Periodic structural patterns: α-helix, β-sheet

29 Tertiary & Quaternary Structure
Tertiary
  Overall 3-D "fold" of a single polypeptide chain
  Spatial arrangement of 2' structural elements; packing of these into compact "domains"
  Description of long-range non-covalent interactions (plus disulfide bonds)
Quaternary
  In proteins with > 1 polypeptide chain, spatial arrangement of subunits

30 "Additional" Structural Levels
Super-secondary elements
Motifs
Domains
Foldons

31 Amino Acids
Each of 20 different amino acids has different "R-Group" or side chain attached to Cα

32 Peptide Bond is Rigid and Planar

33 Hydrophobic Amino Acids

34 Charged Amino Acids

35 Polar Amino Acids

36 Certain Side-chain Configurations are Energetically Favored (Rotamers)
Ramachandran plot: "Allowable" psi & phi angles

37 Glycine is Smallest Amino Acid R group = H atom
Glycine residues increase backbone flexibility because their R group is only an H atom (no bulky side chain)

38 Proline is Cyclic
Proline residues reduce flexibility of polypeptide chain
Proline cis-trans isomerization is often a rate-limiting step in protein folding
Recent work suggests it may also regulate ligand binding in native proteins
  Andreotti (BBMB)

39 Cysteines can Form Disulfide (S-S) Bonds
Disulfide bonds (covalent) stabilize 3-D structures
In eukaryotes, disulfide bonds are often found in secreted proteins or extracellular domains

40 Globular Proteins Have a Compact Hydrophobic Core
Packing of hydrophobic side chains into interior is main driving force for folding
Problem? Polypeptide backbone is highly polar (hydrophilic) due to polar -NH and C=O in each peptide unit (which carry partial charges at neutral pH = 7, found in biological systems); these polar groups must be neutralized
Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds

41 Exterior Surface of Globular Proteins is Generally Hydrophilic
Hydrophobic core formed by packed secondary structural elements provides compact, stable core
"Functional groups" of protein are attached to this framework; exterior has more flexible regions (loops) and polar/charged residues
Hydrophobic "patches" on protein surface are often involved in protein-protein interactions

42 Protein Secondary Structures
Helices
Sheets
Loops
Coils

43 Helix: Stabilized by H-bonds between every ~ 4th residue in Backbone
C = black, O = red, N = blue, H = white
Look! - Charges on backbone are "neutralized" by hydrogen bonds (H-bonds) - red fuzzy vertical bonds

44 Certain Amino Acids are "Preferred" & Others are Rare in Helices
Ala, Glu, Leu, Met = good helix formers
Pro, Gly, Tyr, Ser = very poor
Amino acid composition & distribution varies, depending on location of helix in 3-D structure

45 β-Sheets - also Stabilized by H-bonds Between Backbone Atoms
Anti-parallel
Parallel

46 Loops
Connect helices and sheets
Vary in length and 3-D configurations
Are located on surface of structure
Are more "tolerant" of mutations
Are more flexible and can adopt multiple conformations
Tend to have charged and polar amino acids
Are frequently components of active sites
Some fall into distinct structural families (e.g., hairpin loops, reverse turns)

47 Coils
Regions of 2' structure that are not helices, sheets, or recognizable turns
Intrinsically disordered regions appear to play important functional roles

48 Chp 13 - Protein Structure Basics
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
  Protein Structural Visualization
  Protein Structure Comparison
  Protein Structure Classification

