PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Levels of Protein Structure

Thanks to: Frank Lloyd Wright for graphics. Just as wood, brick, etc. are the material building blocks of a building, AMINO ACIDS are the building blocks of proteins.
Number of amino acids found in proteins = 20 (encoded by the universal genetic code) + 2 (selenocysteine and pyrrolysine, which are incorporated by recoding stop codons and are therefore not included in this discussion).
Possible number of protein sequences of length 300 = 20^300 (about 2 × 10^390). This number is greater than the total number of atoms in the universe.
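The size of this sequence space is easy to check directly. The short sketch below computes 20^300 exactly with Python's arbitrary-precision integers and compares it with 10^80, a commonly quoted rough estimate of the number of atoms in the observable universe (the estimate is an assumption, not a figure from these slides).

```python
import math

NUM_AMINO_ACIDS = 20   # standard residues in the universal genetic code
LENGTH = 300           # protein length used on the slide

# Exact count of distinct sequences (Python ints have arbitrary precision).
n_sequences = NUM_AMINO_ACIDS ** LENGTH

# Order of magnitude via logarithms, without building the digits.
exponent = LENGTH * math.log10(NUM_AMINO_ACIDS)

# Rough, commonly quoted estimate of atoms in the observable universe.
ATOMS_IN_UNIVERSE_ESTIMATE = 10 ** 80

print(f"20^300 has {len(str(n_sequences))} digits (about 10^{exponent:.1f})")
print(n_sequences > ATOMS_IN_UNIVERSE_ESTIMATE)
```

The digit count (391) confirms the "about 2 × 10^390" figure: even this modest protein length dwarfs the atom estimate by more than 300 orders of magnitude.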

Evolution
Evolution has selected a very small subset of those possible protein sequences (fewer than 30,000 in humans) and an even smaller number of protein structures (1,000 to 5,000), a structure-to-sequence ratio of roughly 1:6 at the upper estimate. Conserved structures are expected to reflect functional similarities (for example, interactions with other molecules).

Sequence → Structure → Function
Why Compare Protein Structures?
-- Low sequence similarity may still yield very similar structures.
-- Sometimes high sequence similarity yields different structures.

FOR THE CURRENT PROJECT: Know your dataset

PDB: Protein Data Bank
The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, other animals, and humans. Understanding the shape of a molecule helps to understand how it works. This knowledge can be used to help deduce a structure's role in human health and disease, and in drug development. The structures in the archive range from tiny proteins and bits of DNA to complex molecular machines like the ribosome. Web address:
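Getting to know the dataset usually starts with reading coordinates out of PDB-format files. The sketch below pulls out the C-alpha atom of each residue using the fixed column layout of ATOM records. It assumes the structures are in classic PDB text format (not mmCIF), keeps only the first model and the first alternate location, and the function name `read_ca_atoms` and file name in the usage comment are hypothetical. HETATM records (which include calcium ions also named "CA") are skipped.

```python
from typing import Iterable, List, Tuple

# (chain ID, residue number, residue name, (x, y, z))
Residue = Tuple[str, int, str, Tuple[float, float, float]]


def read_ca_atoms(lines: Iterable[str]) -> List[Residue]:
    """Collect C-alpha atoms from PDB-format ATOM records (fixed columns)."""
    residues: List[Residue] = []
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # ignore any further models (e.g. NMR ensembles)
        if not line.startswith("ATOM"):
            continue  # skips HETATM, HEADER, REMARK, ...
        if line[12:16].strip() != "CA":
            continue  # atom name occupies columns 13-16
        alt_loc = line[16:17]  # column 17: alternate location indicator
        if alt_loc not in ("", " ", "A"):
            continue  # keep only the first alternate conformation
        x = float(line[30:38])  # columns 31-38
        y = float(line[38:46])  # columns 39-46
        z = float(line[46:54])  # columns 47-54
        chain = line[21:22]  # column 22
        res_num = int(line[22:26])  # columns 23-26
        res_name = line[17:20].strip()  # columns 18-20
        residues.append((chain, res_num, res_name, (x, y, z)))
    return residues


# Usage (hypothetical file name):
# with open("1abc.pdb") as handle:
#     ca_atoms = read_ca_atoms(handle)
```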

SCOP Dataset
“Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.”

Starting at the bottom, the hierarchy of SCOP domains comprises the following levels:
-- Species: representing a distinct protein sequence.
-- Protein: grouping together similar sequences of essentially the same functions.
-- Family: containing proteins with similar sequences but typically distinct functions.
-- Superfamily: bridging together protein families with common functional and structural features inferred to be from a common evolutionary ancestor.
-- Levels above Superfamily are classified based on structural features and similarity, and do not imply homology: Folds group structurally similar superfamilies, and Classes group folds by secondary-structure content (e.g., all-α, all-β, α/β, α+β).
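SCOP encodes the upper levels of this hierarchy in a concise classification string (sccs) of the form class.fold.superfamily.family, e.g. "b.1.1.1". Since the project classifies into superfamilies, the superfamily label is simply the first three components. A minimal sketch, assuming the class labels you receive are sccs strings; the helper names and the domain IDs in the test are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LEVELS = ("class", "fold", "superfamily", "family")


def parse_sccs(sccs: str) -> Dict[str, str]:
    """Split a SCOP concise classification string such as 'b.1.1.1'."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    # Each level is identified by the prefix of the string down to that level.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


def group_by_superfamily(domains: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (domain_id, sccs) pairs by their superfamily prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for domain_id, sccs in domains:
        groups[parse_sccs(sccs)["superfamily"]].append(domain_id)
    return dict(groups)
```

Using the prefix (rather than the bare superfamily number) as the label matters: superfamily "1" under fold "b.1" and superfamily "1" under fold "a.4" are different superfamilies.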

Structural Fingerprints/Features
Optimal structure comparison (structural alignment) is NP-hard under many commonly used similarity measures, such as contact map overlap. No known fast structural alignment algorithm can guarantee an optimal alignment under an arbitrary similarity measure, so existing structure comparison methods rely on heuristics. There are different approaches for extracting structural features; we use the Triangular Spatial Relationship (TSR) method to generate keys.
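The core idea behind triangle-based keys is that every triple of residues forms a triangle whose side lengths and angles do not change when the protein is rotated or translated. The sketch below is a deliberately simplified, hypothetical TSR-style key (sorted residue labels, binned longest side, binned angle opposite it); it is not the published TSR key formula, and the bin widths are assumed values chosen only for illustration.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Key = Tuple[Tuple[str, ...], int, int]


def tsr_style_keys(
    residues: Sequence[Tuple[str, Point]],
    length_bin: float = 1.0,   # angstroms per length bin (assumed)
    angle_bin: float = 10.0,   # degrees per angle bin (assumed)
) -> List[Key]:
    """Simplified TSR-style keys: one key per triangle of residues.

    Each key combines the sorted residue labels, the binned length of the
    longest side, and the binned angle opposite that side, all of which are
    invariant to rotation and translation of the structure.
    """
    keys: List[Key] = []
    for (la, pa), (lb, pb), (lc, pc) in itertools.combinations(residues, 3):
        s0, s1, s2 = sorted((math.dist(pa, pb), math.dist(pb, pc), math.dist(pc, pa)))
        if s0 == 0.0:
            continue  # coincident points: no proper triangle
        # Law of cosines for the angle opposite the longest side s2.
        cos_theta = (s0 ** 2 + s1 ** 2 - s2 ** 2) / (2 * s0 * s1)
        theta = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
        labels = tuple(sorted((la, lb, lc)))
        keys.append((labels, int(s2 // length_bin), int(theta // angle_bin)))
    return keys
```

For n C-alpha atoms this produces n(n-1)(n-2)/6 keys, which is why compact signatures of the keys, rather than the raw key lists, are attractive for fast classification.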

FOR THE CURRENT PROJECT: HUH! LOOKS LIKE I HAVE DONE EVERYTHING... SO WHY AM I HERE?

What do you get from me? One file of keys for each protein structure. Each key file has already been correctly classified into its respective superfamily, which gives you the class label for every file. The working hypothesis is that files belonging to the same class have similar keys. You must be able to test this hypothesis.
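One simple way to test the hypothesis is to compare how similar key sets are within a superfamily versus across superfamilies. The sketch below uses Jaccard similarity (shared distinct keys over all distinct keys); the choice of Jaccard, the helper names, and the input layout (protein ID mapped to a superfamily label and a set of integer keys) are all assumptions, and the data needs at least one same-class pair and one different-class pair.

```python
import itertools
import statistics
from typing import Dict, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Fraction of distinct keys shared by two proteins."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def within_vs_between(proteins: Dict[str, Tuple[str, Set[int]]]) -> Tuple[float, float]:
    """Mean Jaccard similarity of same-class pairs vs. different-class pairs.

    `proteins` maps a protein ID to (superfamily label, set of keys).
    """
    within, between = [], []
    pairs = itertools.combinations(proteins.items(), 2)
    for (_, (label_a, keys_a)), (_, (label_b, keys_b)) in pairs:
        score = jaccard(keys_a, keys_b)
        (within if label_a == label_b else between).append(score)
    return statistics.mean(within), statistics.mean(between)
```

If the hypothesis holds, the first number should be clearly larger than the second; shuffling the labels and recomputing the gap many times turns this into a permutation test of whether the difference could arise by chance.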

Biggest Challenge

For the current project: develop SIGNATUREs for the PROTEIN KEYS. These SIGNATUREs must be used to CLASSIFY the proteins correctly into their respective SUPERFAMILIES. Both classification performance (accuracy) and speed are important.

Signature for Keys
A signature must represent the keys accurately and concisely. Signatures can be simple statistics of the keys (mean, median, etc.) or a more complex combination of features. Whatever signature(s) you choose, they must support extremely fast and accurate classification of the proteins.
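A minimal sketch of one possible signature, assuming each key is stored as an integer (an assumption about the key-file format). It combines the simple statistics mentioned above with a small fixed-length histogram, so proteins with different numbers of keys all map to vectors of the same length. The function name and the bucket count are hypothetical choices.

```python
import statistics
from typing import List, Sequence


def simple_signature(keys: Sequence[int], n_buckets: int = 16) -> List[float]:
    """Fixed-length signature for a protein's integer keys.

    The first five entries are summary statistics (mean, median, population
    standard deviation, min, max); the remaining n_buckets entries are a
    normalised histogram of key % n_buckets (a hashing trick that maps any
    number of keys to a fixed-size vector).
    """
    if not keys:
        raise ValueError("a protein needs at least one key")
    stats = [
        float(statistics.mean(keys)),
        float(statistics.median(keys)),
        float(statistics.pstdev(keys)),
        float(min(keys)),
        float(max(keys)),
    ]
    histogram = [0.0] * n_buckets
    for key in keys:
        histogram[key % n_buckets] += 1.0 / len(keys)
    return stats + histogram
```

Summary statistics alone discard most of the key distribution; the histogram part keeps a coarse view of which keys occur, at the cost of collisions between keys that share a bucket.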

Choice of Classifier/Tool
Criteria:
1. ACCURACY
2. SPEED
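Both criteria can be measured together on held-out proteins. As a baseline, the sketch below uses a nearest-centroid classifier over fixed-length signature vectors: it is swapped in here only as a simple, fast example, not as the recommended tool, and all function names are hypothetical. It reports accuracy and average time per prediction.

```python
import math
import time
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_centroids(samples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the signatures of each superfamily into one centroid."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, v in enumerate(vec):
            sums[label][i] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Vector) -> str:
    """Assign the superfamily whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def evaluate(centroids: Dict[str, List[float]], test: List[Tuple[Vector, str]]) -> Tuple[float, float]:
    """Return (accuracy, seconds per prediction) on held-out samples."""
    start = time.perf_counter()
    correct = sum(predict(centroids, vec) == label for vec, label in test)
    elapsed = time.perf_counter() - start
    return correct / len(test), elapsed / len(test)
```

Prediction cost grows with the number of superfamilies times the signature length, which keeps it fast; more expressive classifiers can then be judged by how much accuracy they add for the extra time.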

Final Product
Software that takes protein “keys” as input and classifies the protein correctly. It must also check whether the keys of a “new” protein already exist in the system.
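For the "already exists" check, one option is to store an order-independent fingerprint of every classified key set and look it up before running the classifier. A minimal sketch, assuming integer keys; the class and method names are hypothetical, and this catches only exact duplicates (near-duplicates would need a similarity search instead).

```python
import hashlib
from typing import Dict, Iterable, Optional


def fingerprint(keys: Iterable[int]) -> str:
    """Order-independent SHA-256 digest of a protein's keys (repeats counted)."""
    canonical = ",".join(str(k) for k in sorted(keys))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


class KeyRegistry:
    """Remembers every key set already classified by the system."""

    def __init__(self) -> None:
        self._by_fingerprint: Dict[str, str] = {}

    def lookup(self, keys: Iterable[int]) -> Optional[str]:
        """Return the stored superfamily if these exact keys were seen before."""
        return self._by_fingerprint.get(fingerprint(keys))

    def add(self, keys: Iterable[int], superfamily: str) -> None:
        """Record a classified key set under its superfamily label."""
        self._by_fingerprint[fingerprint(keys)] = superfamily
```

A dictionary lookup on the digest is constant time on average, so the duplicate check adds almost nothing to the cost of classifying a new protein.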