T. Hamp & L. Richter Protein Prediction II Exercise
Exercise – Project Layout
General remarks (recap): report 60 pts, exam 40 pts; each group gives weekly presentations, and one bad presentation is allowed; groups of 3-4 students.
Contact & Questions:
The exercise is taken from the CAFA (Critical Assessment of Functional Annotation) competition: prediction of HPO terms.
HPO: Human Phenotype Ontology.
Terms – Definitions and Explanations
Amino acids (aa): the building blocks of proteins; 20 different amino acids are found in proteins.
Protein sequence: a string of characters representing a sequence of amino acids (a string over a 20-letter alphabet).
The protein sequence determines, within some limits, the protein's structure and function.
Protein sequences are stored in large, publicly available repositories.
One of the best-known repositories is UniProt, together with its section Swiss-Prot.
Besides the sequence, these databases also hold additional information about each protein.
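To make "a string over a 20-letter alphabet" concrete, here is a minimal sketch that reads sequences in FASTA format (one of the formats UniProt offers for download) and reports any letters outside the 20 standard one-letter amino acid codes. The function names and the example records are hypothetical; real database entries can contain extra letters such as X for an unknown residue.

```python
# Minimal sketch: parse FASTA text and check sequences against the
# 20 standard amino acid one-letter codes. Names and data are hypothetical.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def parse_fasta(text: str) -> dict[str, str]:
    """Return a mapping from header line (without '>') to its sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records


def nonstandard_residues(seq: str) -> set[str]:
    """Return the characters in seq that are not standard amino acid codes."""
    return set(seq.upper()) - STANDARD_AA


example = """>seq1 hypothetical example
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical example with an ambiguous residue
MKXLV
"""
seqs = parse_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq), sorted(nonstandard_residues(seq)))
```

Running it prints a length of 20 and no unusual letters for `seq1`, and flags `X` in `seq2`.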
Ontology (in information science)
Ontology: an ontology represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts.
Human Phenotype Ontology (HPO): a set of concepts describing human phenotypic appearance (shape, health, and so on).
HPO concepts are ordered hierarchically through an "is-a" relationship, i.e. they are arranged in a tree-like fashion (strictly speaking a directed acyclic graph, since a term can have more than one parent).
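The "is-a" hierarchy has a practical consequence: under the usual ontology convention (often called the true-path rule), a protein annotated with a specific term is implicitly annotated with all of that term's ancestors. A minimal sketch, assuming the hierarchy is stored as a child-to-parents mapping; the term names are hypothetical placeholders, not real HPO identifiers.

```python
# Minimal sketch: propagate annotations upward along "is-a" edges.
# The toy hierarchy below is hypothetical, not real HPO content.
from collections import deque

# child -> list of parents; specific_C has two parents, so this is a DAG
IS_A = {
    "root": [],
    "abnormality_A": ["root"],
    "abnormality_B": ["root"],
    "specific_C": ["abnormality_A", "abnormality_B"],
}


def ancestors(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from term via is-a edges (term excluded)."""
    seen: set[str] = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))  # continue breadth-first upward
    return seen


def propagate(terms: set[str], is_a: dict[str, list[str]]) -> set[str]:
    """Close a set of annotated terms upward under the is-a relation."""
    closed = set(terms)
    for t in terms:
        closed |= ancestors(t, is_a)
    return closed


print(sorted(propagate({"specific_C"}, IS_A)))
```

Annotating only `specific_C` yields all four terms, because both of its parents and the shared root are implied.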
Our competition
Proteins are annotated (described) with experimentally determined information.
Over time, proteins become associated with information about experimentally confirmed effects on the human phenotype.
The associated terms are taken from the Human Phenotype Ontology.
Experimental determination is slow and expensive, so we try to predict the associated HPO terms for proteins that are not yet annotated.
More formal steps
Find a function that assigns a set of HPO terms T to a sequence s such that the number of false assignments is minimal and the number of true assignments is maximal.
Remember: the real evaluation happens after submission, once sequences that are not yet annotated receive experimentally determined annotations.
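The goal of "few false and many true assignments" is usually quantified with precision, recall and their harmonic mean, the F-measure. The sketch below scores one protein's predicted term set against its true set. It is an illustration only: CAFA's official metrics (for example a threshold-maximised F-measure averaged over proteins) differ in detail, so check the competition rules. The function name and term sets are hypothetical.

```python
# Minimal sketch: per-protein precision, recall and F1 for predicted HPO terms.
# Term sets are hypothetical placeholders.
def score(predicted: set[str], true: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a single protein."""
    tp = len(predicted & true)  # true assignments
    fp = len(predicted - true)  # false assignments
    fn = len(true - predicted)  # true terms that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = score({"t1", "t2", "t3"}, {"t2", "t3", "t4", "t5"})
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Here two of three predictions are correct and two of four true terms are found, giving precision 0.67, recall 0.50 and F1 0.57. Predicting more terms tends to raise recall at the cost of precision, which is exactly the trade-off the formal goal describes.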
Tasks
Download the files from
Get familiar with the provided files, especially the column names (look them up at UniProt and HPO).
Read: dja.pdf
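A first look at a provided file can be scripted. The sketch below assumes, without confirmation from the slides, that a file is tab-separated with a header row; it returns the column names, the number of data rows and how often each value occurs in one column. The function `inspect_table` and the file name in the usage line are hypothetical; adjust the delimiter and column index to the actual files.

```python
# Minimal sketch, ASSUMING a tab-separated file with a header row.
# inspect_table is a hypothetical helper, not part of the provided material.
import csv
from collections import Counter


def inspect_table(path: str, key_column: int = 0) -> tuple[list[str], int, Counter]:
    """Return (column names, number of data rows, value counts in key_column)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # first row holds the column names
        counts: Counter = Counter()
        n_rows = 0
        for row in reader:
            if not row:
                continue  # skip blank lines
            n_rows += 1
            counts[row[key_column]] += 1
    return header, n_rows, counts
```

Usage, with a hypothetical file name: `header, n, counts = inspect_table("provided_file.tsv")`, then print `header` and `counts.most_common(5)` to see the columns and the most frequent entries.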