Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship.

Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship (Phylogeny) 3-D fold model Protein sorting and sub-cellular localization Anchoring into the membrane Signal sequence (tags) Protein modifications  Some nascent proteins contain a specific signal, or targeting sequence that directs them to the correct organelle. ( ER, mitochondrial, chloroplast, lysosome, vacuoles, Golgi, or cytosol )

 Can we train the computers:  To detect signal sequences and predict protein destination?  T o identify conserved domains (or a pattern) in proteins?  To predict the membrane-anchoring type of a protein? ( Transmembrane domain, GPI anchor… )  T o predict the 3D structure of a protein?  Learning algorithms are good for solving problems in pattern recognition because they can be trained on a sample data set.  Classes of learning algorithms: -Artificial neural networks (ANNs) -Hidden Markov Models (HMM) Questions

Artificial neural networks (ANN)  Machine learning algorithms that mimic the brain. Real brains, however, are orders of magnitude more complex than any ANN.  ANNs, like people, learn by example. ANNs cannot be programmed to perform a specific task.  ANN is composed of a large number of highly interconnected processing elements (neurons) working simultaneously to solve specific problems.  The first artificial neuron was developed in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pits.

Hidden Markov Models (HMM)  HMM is a probabilistic process over a set of states, in which the states are “hidden”. It is only the outcome that visible to the observer. Hence, the name Hidden Markov Model.  HMM has many uses in genomics:  Gene prediction (GENSCAN)  SignalP  Finding periodic patterns  Used to answer questions like:  What is the probability of obtaining a particular outcome?  What is the best model from many combinations?

 Expasy server (http://au.expasy.org) is dedicated to the analysis of protein sequences and structures. The ExPASy (Expert Protein Analysis System)  Sequence analysis tools include:  DNA -> Protein [ Translate ]  Pattern and profile searches  Post-translational modification and topology prediction  Primary structure analysis  Structure prediction (2D and 3D)  Alignment

 PredictProtein: A service for sequence analysis, and structure prediction http://www.predictprotein.org/newwebsite/submit.html  TMpred: http://www.ch.embnet.org/software/TMPRED_form.html http://www.ch.embnet.org/software/TMPRED_form.html  TMHMM: Predicts transmembrane helices in proteins (CBS; Denmark) http://www.cbs.dtu.dk/services/TMHMM-2.0/  big-PI : Predicts GPI-anchor site : http://mendel.imp.univie.ac.at/sat/gpi/gpi_server.html http://mendel.imp.univie.ac.at/sat/gpi/gpi_server.html  DGPI: Predicts GPI-anchor site : http://129.194.185.165/dgpi/index_en.html http://129.194.185.165/dgpi/index_en.html  SignalP : Predicts signal peptide : http://www.cbs.dtu.dk/services/SignalP/ http://www.cbs.dtu.dk/services/SignalP/  PSORT: Predicts sub-cellular localization: http://www.psort.org/ http://www.psort.org/  TargetP: Predicts sub-cellular localization: http://www.cbs.dtu.dk/services/TargetP/ http://www.cbs.dtu.dk/services/TargetP/  NetNGlyc: Predicts N-glycosylation sites : http://www.cbs.dtu.dk/services/NetNGlyc/ http://www.cbs.dtu.dk/services/NetNGlyc/  PTS1: Predicts peroxisomal targeting sequences http://mendel.imp.univie.ac.at/mendeljsp/sat/pts1/PTS1predictor.jsp  MITOPROT: Predicts of mitochondrial targeting sequences http://ihg.gsf.de/ihg/mitoprot.html  Hydrophobicity : http://www.vivo.colostate.edu/molkit/hydropathy/index.htmlhttp://www.vivo.colostate.edu/molkit/hydropathy/index.html

http://www.cbs.dtu.dk/services/http://www.cbs.dtu.dk/services/: prediction server  NetNGlyc: Predicts N-glycosylation sites: http://www.cbs.dtu.dk/services/NetNGlyc/http://www.cbs.dtu.dk/services/NetNGlyc/  NetPhos: Predicts phosphorylation of residues: http://www.cbs.dtu.dk/services/NetPhos/http://www.cbs.dtu.dk/services/NetPhos/  NetPhosK: Predicts recognition sites for specific kinases: http://www.cbs.dtu.dk/services/NetPhosK/ http://www.cbs.dtu.dk/services/NetPhosK/  NetAcet: N-terminal acetylation in eukaryotic proteins: http://www.cbs.dtu.dk/services/NetAcet/ NetAcet http://www.cbs.dtu.dk/services/NetAcet/  NetCGlyc: C-mannosylation sites in mammalian proteins NetCGlyc

Multiple alignment  Used to do phylogenetic analysis:  Same protein from different species  Evolutionary relationship: history  Used to find conserved regions  Local multiple alignment reveals conserved regions  Conserved regions usually are key functional regions  These regions are prime targets for drug developments  Protein domains are often conserved across many species  Algorithm for search of conserved regions:  Block maker : http://blocks.fhcrc.org/blocks/make_blocks.html http://blocks.fhcrc.org/blocks/make_blocks.html

Multiple alignment tools  Free programs:  Phylip and PAUP : http://evolution.genetics.washington.edu/phylip.html http://evolution.genetics.washington.edu/phylip.html  Phyml : http://atgc.lirmm.fr/phyml/ http://atgc.lirmm.fr/phyml/  The most used websites :  http://align.genome.jp/ http://align.genome.jp/  http://prodes.toulouse.inra.fr/multalin/multalin.html http://prodes.toulouse.inra.fr/multalin/multalin.html  http://www.ch.embnet.org/index.html (T-COFFEE and ClustalW) http://www.ch.embnet.org/index.html  ClustalW:  Standard popular software  It aligns 2 and keep on adding a new sequence to the alignment  Problem: It is simply a heuristics.  Motif discovery: use your own motif to search databases :  PatternFind: http://myhits.isb-sib.ch/cgi-bin/pattern_searchhttp://myhits.isb-sib.ch/cgi-bin/pattern_search http://meme.nbcr.net/meme4_6_0/intro.html

Phylogenetic analysis  Phylogenetic trees  Describe evolutionary relationships between sequences  Major modes that drive the evolution:  Point mutations modify existing sequences  Duplications (re-use existing sequence)  Rearrangement  Two most common methods  Maximum parsimony  Maximum likelihood http://www.megasoftware.net/mega4/m_con_select.html  The most useful software:

Definitions  Homologous: Have a common ancestor. Homology cannot be measured.  Orthologous: The same gene in different species. It is the result of speciation (common ancestral)  Paralogous : Related genes (already diverged) in the same species. It is the result of genomic rearrangements or duplication

Determining protein Structure-Function  Direct measurement of structure  X-ray crystallography  NMR spectroscopy  Site-directed mutagenesis  Computer modeling  Prediction of structure  Comparative protein-structure modeling

Comparative protein-structure modeling  Goal: Construct 3-D model of a protein of unknown structure (target), based on similarity of sequence to proteins of known structure (templates) Blue : predicted model by PROSPECT Red : NMR structure  Procedure:  Template selection  Template–target alignment  Model building  Model evaluation

The Protein 3-D Database  The Protein DataBase (PDB) contains 3-D structural data for proteins  Founded in 1971 with a dozen structures  As of June 2004, there were 25,760 structures in the database. All structures are reviewed for accuracy and data uniformity.  Structural data from the PDB can be freely accessed at http://www.rcsb.org/pdb/  80% come from X-ray crystallography  16% come from NMR  2% come from theoretical modeling

High-throughput methods

Most used websites for 3-D structure prediction  Protein Homology/analogY Recognition Engine (Phyre) at http://www.sbg.bio.ic.ac.uk/phyre/html/index.html  PredictProtein at http://www.predictprotein.org/newwebsite/submit.html  UCLA Fold Recognition at http://www.doe-mbi.ucla.edu/Services/FOLD/

Commercial bioinformatics softwares CLC Genomics Workbench  Genomics: 454, Illumina Genome Analyzer and SOLiD sequencing data; De novo assembly of genomes of any size; Advanced visualization, scrolling, and zooming tools; SNP detection using advanced quality filtering;  Transcriptomics: RNA-seq including paired data and transcript-level expression; Small RNA analysis; Expression profiling by tags;  Epigenetics: Chromatin immunoprecipitation sequencing (ChIP-seq) analysis; Peak finding and peak refinement; Graph and table of background distribution; false discovery rate; Peak table and annotations;  VectorNTI: Sequence analysis and illustration; restriction mapping; recombinant molecule design and cloning; in silico gel electrophoresis; synthetic biology workflows  AlignX:  BioAnnotator:  ContigExpress :  GenomBench

The bioinformatics not covered in this class  Comparative genomics and Genome browser: http://genome.lbl.gov/vista/index.shtml http://www.sanger.ac.uk/resources/software/artemis/  Genome annotation: http://linux1.softberry.com/berry.phtml http:// http:// rast.nmpdr.org/  Metagenomics: http://metagenomics.anl.gov/  System biology tools.

Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship.

Similar presentations

Presentation on theme: "Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship.

Similar presentations

Presentation on theme: "Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship."— Presentation transcript:

Similar presentations

About project

Feedback