Discovering critical residues in glutathione reductase bioinformatics_GR_presentation.ppt Donnie Berkholz.

What and How
● Role
  – Reduced thiols
  – Oxidative stress
  – DNA precursors
  – H+ transport
● Mechanism
  – Flavoprotein
  – NADPH
  – Disulfide

Goals
● Figure out the best programs and methods for this analysis
● Search for unknown critical residues
● Verify whether residues already thought critical are actually conserved
● Check for potential differences in function and specificity among subfamilies (Podar et al.)

Multiple sequence alignments

ClustalW, Dialign-T, Muscle, ProbCons

ProbCons
● “Probabilistic consistency”
● Pair-HMM based
● Three-way alignment consistency
● Parameters derived from training
● Maximized accuracy
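The "three-way alignment consistency" idea can be sketched in a few lines. The block below is a minimal illustration, not ProbCons code: it assumes the pairwise match posteriors are already available as plain matrices, and it re-estimates each pair (x, y) by averaging the evidence routed through every sequence z, i.e. P'(x,y) = (1/|S|) Σ_z P(x,z)·P(z,y), with P(x,x) taken as the identity. The function name, the data layout and the toy numbers are all hypothetical.

```python
def consistency_transform(posteriors, lengths):
    """One round of a ProbCons-style consistency transformation (sketch).

    posteriors[(x, y)] is a len(x)-by-len(y) list-of-lists of match
    posterior probabilities; lengths maps sequence name -> length.
    """
    names = sorted(lengths)

    def get(x, y):
        # Matrix for the ordered pair (x, y): identity, stored, or transposed.
        if x == y:
            size = lengths[x]
            return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
        if (x, y) in posteriors:
            return posteriors[(x, y)]
        return [list(col) for col in zip(*posteriors[(y, x)])]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]

    updated = {}
    for x, y in posteriors:
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:
            product = matmul(get(x, z), get(z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    total[i][j] += product[i][j]
        # Average over all intermediate sequences z (including x and y).
        updated[(x, y)] = [[v / len(names) for v in row] for row in total]
    return updated


# Toy input (made up): A and B are only weakly matched to each other,
# but both match C perfectly, so the evidence through C raises A-B.
identity = [[1.0, 0.0], [0.0, 1.0]]
toy = {
    ("A", "B"): [[0.5, 0.0], [0.0, 0.5]],
    ("A", "C"): identity,
    ("B", "C"): identity,
}
relaxed = consistency_transform(toy, {"A": 2, "B": 2, "C": 2})
print(relaxed[("A", "B")])
```

In the toy case the A-B diagonal rises from 0.5 to 2/3 because the third sequence agrees with both, which is the intuition behind using consistency before building the final alignment.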

How to find important residues?
● Principal component analysis (PCA)
  – Each sequence becomes a vector
  – Successive dimensions grow less significant
● Evolutionary trace and friends
  – Divide tree into groups, then check them
  – So, first we need trees
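A rough sketch of the PCA idea, assuming a one-hot encoding of the aligned sequences (one possible way to turn each sequence into a vector; the slides do not say which encoding was used). The four-sequence alignment is invented for illustration, and the helper names are hypothetical. NumPy's SVD does the decomposition.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(alignment):
    """Encode equal-length aligned sequences as rows of a 0/1 matrix.

    Gaps and unknown characters are left as all-zero blocks.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n_cols = len(alignment[0])
    X = np.zeros((len(alignment), n_cols * len(AMINO_ACIDS)))
    for row, seq in enumerate(alignment):
        for col, aa in enumerate(seq):
            if aa in index:
                X[row, col * len(AMINO_ACIDS) + index[aa]] = 1.0
    return X


def pca(X, n_components=2):
    """Return sequence scores, explained-variance ratios and loadings."""
    centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components], Vt[:n_components]


# Invented alignment: two "subfamilies" that differ at columns 1 and 3.
alignment = ["MKCA", "MKCA", "MRCV", "MRCV"]
scores, explained, loadings = pca(one_hot(alignment))

# Sum squared loadings per alignment column to see which positions
# drive the first component.
position_weight = (loadings[0] ** 2).reshape(len(alignment[0]), -1).sum(axis=1)
print(scores[:, 0], explained, position_weight)
```

Sequences from the same group get the same score on the first component, and the per-column weights point at the positions that separate the groups (columns 1 and 3 here, counting from 0).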

Trees
● Maximum likelihood
  – ProML (PHYLIP)
  – Gamma distribution + invariant sites
  – Approximate with 5 rate categories
● Bayesian
  – MrBayes
  – Gamma distribution + invariant sites
  – MCMC: Markov chain Monte Carlo
  – Mixed: sample with probability -> WAG
  – Try variable-rate models
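To make the MCMC bullet concrete, here is a toy Metropolis sampler. It is far simpler than what MrBayes does: it samples a single distance between two sequences rather than a tree, and it assumes an equal-rates 20-state model instead of WAG with gamma-distributed rates. The exponential prior (rate 10), the step size, and the counts (100 sites, 20 differences) are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate=10.0):
    """Unnormalised log posterior of distance d (substitutions per site)."""
    if d <= 0:
        return -math.inf
    # Probability that a site differs under an equal-rates 20-state model.
    p = -(19.0 / 20.0) * math.expm1(-20.0 * d / 19.0)
    log_lik = n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)
    return log_lik - prior_rate * d  # exponential prior, constant dropped


def metropolis(n_sites, n_diff, steps=20000, burn_in=2000, step_size=0.05, seed=0):
    """Sample distances with a symmetric sliding-window proposal."""
    rng = random.Random(seed)
    d = 0.1
    current = log_posterior(d, n_sites, n_diff)
    samples = []
    for i in range(steps):
        proposal = d + rng.uniform(-step_size, step_size)
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        if i >= burn_in:
            samples.append(d)
    return samples


samples = sorted(metropolis(n_sites=100, n_diff=20))
posterior_mean = sum(samples) / len(samples)
low = samples[int(0.025 * len(samples))]
high = samples[int(0.975 * len(samples))]
print(f"mean {posterior_mean:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

The interval read off the sorted samples is the kind of confidence value a Bayesian analysis yields directly from the chain, without a separate bootstrap step.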

ConSurf
● Calculates evolutionary conservation (Bayesian)
● Maps onto protein structure
● Input flexibility
  – PDB -> seq. -> PSI-BLAST -> MSA -> NJ -> CS
● Can't yet analyze subfamilies
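For a feel of what a per-column conservation score looks like, the sketch below uses normalised Shannon entropy. This is a deliberately simpler stand-in and not ConSurf's method, which estimates evolutionary rates on a tree. The alignment and the function name are made up, and gaps are simply ignored.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column from 0 (variable) to 1 (invariant).

    Score = 1 - H / log2(20), where H is the Shannon entropy of the
    residues in the column; gap characters are skipped.
    """
    scores = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: no evidence of conservation
            continue
        total = len(residues)
        entropy = 0.0
        for count in Counter(residues).values():
            freq = count / total
            entropy -= freq * math.log2(freq)
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Invented alignment: column 0 is invariant, columns 1 and 2 vary.
toy_alignment = ["HCD", "HSD", "HCE", "H-E"]
conservation = column_conservation(toy_alignment)
print([round(s, 3) for s in conservation])
```

Scores like these could then be written into a per-residue field of a structure file for colouring in a viewer, which is the mapping step the slide describes.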

NADPH environment

Disulfide environment

Catalytic: H467+D472

Structure without function?

Surface: F354+D22

Surface: D316+T321

FAD binding

Stabilizing the phosphate

Structural stability

What next?
● Check for validity of tree model
● Tree-determinant residues
● Experimental functional determination
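One simple reading of "tree-determinant residues" is: positions that are invariant within every subfamily but differ between subfamilies. The sketch below implements only that reading, on the assumption that the subfamilies have already been cut from the tree. The group names and sequences are invented, and positions are counted from 0.

```python
def tree_determinants(groups):
    """Return columns invariant within each group but not across groups.

    groups maps a subfamily name to a list of equal-length aligned
    sequences; columns containing a gap in any group are skipped.
    """
    n_cols = len(next(iter(groups.values()))[0])
    positions = []
    for col in range(n_cols):
        group_residues = set()
        invariant = True
        for seqs in groups.values():
            residues = {seq[col] for seq in seqs}
            if len(residues) != 1 or "-" in residues:
                invariant = False
                break
            group_residues.add(residues.pop())
        # Keep the column only if at least two groups disagree.
        if invariant and len(group_residues) > 1:
            positions.append(col)
    return positions


# Invented subfamilies: column 1 is K in one group and R in the other;
# column 3 varies inside each group, so it does not qualify.
subfamilies = {
    "group_a": ["MKCAH", "MKCGH"],
    "group_b": ["MRCVH", "MRCIH"],
}
determinants = tree_determinants(subfamilies)
print(determinants)
```

Positions flagged this way are candidates for the specificity differences among subfamilies mentioned in the goals, pending experimental confirmation.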

Summary
● ProbCons is great for MSAs
● Bayesian trees take forever, but they provide confidence values (no bootstrap!)
● ConSurf maps sequence conservation onto protein structures
● Conservation supports the catalytic hypothesis (H467+D472)
● New putative functional roles:
  – Interactions? F354+D22, D316+T321
  – Binding: I26, R218
  – Structure: H434 etc.

References
ClustalW: Chenna et al. NAR 31: 3497 (2003).
Muscle: Edgar. NAR 32: 1792 (2004).
Dialign-T: Morgenstern. NAR 32: W33 (2004).
ProbCons: Do et al. Genome Res. 15: 330 (2005).
Jalview: Clamp et al. Bioinform. 20: 426 (2004).
PHYLIP: Felsenstein. Distributed by author (2005).
MrBayes: Ronquist and Huelsenbeck. Bioinform. 19: 1572 (2003).
ConSurf: Landau et al. NAR 33: W299 (2005).
PyMol: DeLano. (2005).