T. Hamp & L. Richter Protein Prediction II Exercise.


1 T. Hamp & L. Richter Protein Prediction II Exercise

2 Exercise – Project Layout
• General remarks (recap): report 60 pts, exam 40 pts, weekly presentations by each group, one bad presentation allowed, groups of 3-4 students
• Contact & questions: pp2ex@rostlab.org only!
• The exercise is taken from the CAFA competition
• Prediction of HPO terms
• HPO: Human Phenotype Ontology

3 Terms – Definitions and Explanations
• Amino acids (aa): building blocks of proteins; 20 different aa are found in proteins
• Protein sequence: string of characters representing a sequence of amino acids (a string over a 20-letter alphabet)
• The protein sequence defines the protein structure and the protein function (within some limits)
• Protein sequences are stored in large, publicly available repositories
• One of the best-known repositories is UniProt (http://www.uniprot.org/) and its section Swiss-Prot
• Besides the sequence, these databases also hold additional information about the protein
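To make the "string over a 20-letter alphabet" definition concrete, here is a minimal Python sketch. It assumes the usual one-letter amino acid codes and treats ambiguity codes (such as X or B) as invalid. The function names and the example sequence are made up for illustration and are not part of the provided exercise files.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
# Assumption: ambiguity codes (X, B, Z) and rare residues (U, O) are rejected.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def is_standard_protein(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only the 20 standard letters."""
    seq = sequence.strip().upper()
    return bool(seq) and set(seq) <= STANDARD_AA


def aa_counts(sequence: str) -> Counter:
    """Count how often each amino acid letter occurs in the sequence."""
    return Counter(sequence.strip().upper())


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical sequence, not taken from UniProt
    print(is_standard_protein(example))
    print(aa_counts(example).most_common(3))
```

A check like this is a cheap sanity test to run before any prediction method, since database sequences occasionally contain non-standard letters.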

4 Ontology (in information science)
• Ontology: an ontology represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts
• Human Phenotype Ontology (HPO): set of concepts describing human appearance (shape, health, and so forth)
• HPO concepts are hierarchically ordered, i.e. there is an “is-a” relationship
• They are arranged in a tree-like fashion
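One practical consequence of the "is-a" hierarchy is that a protein annotated with a specific term is implicitly annotated with all of that term's ancestors. The sketch below illustrates this on a toy hierarchy. The term names are invented and are not real HPO identifiers. It also allows a term to have several parents, which is slightly more general than a strict tree.

```python
# Toy "is-a" hierarchy: each term maps to the list of its parent terms.
# All term names are invented for illustration; they are not real HPO terms.
PARENTS = {
    "root": [],
    "abnormal_heart": ["root"],
    "abnormal_eye": ["root"],
    "abnormal_heart_valve": ["abnormal_heart"],
}


def with_ancestors(terms, parents):
    """Return the given terms together with all of their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            # Follow every "is-a" edge upwards towards the root.
            stack.extend(parents.get(term, []))
    return closed


if __name__ == "__main__":
    print(sorted(with_ancestors({"abnormal_heart_valve"}, PARENTS)))
```

Propagating predictions upwards in this way keeps a predicted term set consistent with the ontology: a prediction never contains a specific term without its more general parents.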

5 Our competition
• Proteins are annotated (described) with experimentally determined information
• As time goes by, proteins are associated with information about experimentally confirmed effects on the human phenotype
• The associated terms are taken from the Human Phenotype Ontology
• Experimental determination is slow and expensive
• => We try to predict the associated HPO terms for the not-yet-annotated proteins

6 More formal steps
• Find a function that assigns a set of HPO terms T to a sequence s so that the number of false assignments is minimal and the number of true assignments is maximal
• Remember: the true evaluation is done after submission, when sequences that are not yet annotated receive experimentally determined annotations
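The goal of "few false assignments, many true assignments" can be quantified in several ways. The slide does not name a metric, so the following Python sketch is only one common choice: precision, recall and their harmonic mean (F1), computed on the predicted and experimentally confirmed term sets of a single protein. The function name and the term labels are hypothetical.

```python
def evaluate(predicted: set, true: set):
    """Return (precision, recall, f1) for one protein's predicted vs. true term set."""
    tp = len(predicted & true)   # correctly assigned terms
    fp = len(predicted - true)   # false assignments
    fn = len(true - predicted)   # missed true terms
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical term labels, not real HPO identifiers.
    predicted_terms = {"term_a", "term_b", "term_c"}
    true_terms = {"term_b", "term_c", "term_d", "term_e"}
    print(evaluate(predicted_terms, true_terms))
```

Precision penalizes false assignments and recall rewards true ones, so a single combined score such as F1 reflects both halves of the objective stated above.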

7 Tasks
• Download the files from www.rostlab.org/~richter/pp2_files.tgz
• Get familiar with the provided files
• Pay special attention to the column names (look them up at UniProt and HPO)
• Read: http://biofunctionprediction.org/sites/default/files/IntroductionCAFA_pedja.pdf

