T. Hamp & L. Richter Protein Prediction II Exercise.


Exercise – Project Layout
• General remarks (recap): report 60 pts, exam 40 pts; weekly presentations by each group; one bad presentation allowed; groups of 3-4 students
• Contact & Questions:
• The exercise is taken from the CAFA competition
• Prediction of HPO terms
• HPO: Human Phenotype Ontology

Terms – Definitions and Explanations
• Amino acids (aa): building blocks of proteins; 20 different aa are found in proteins
• Protein sequence: string of characters representing a sequence of amino acids (a string over a 20-letter alphabet)
• The protein sequence defines the protein structure and the protein function (within some limits)
• Protein sequences are stored in large, publicly available repositories
• One of the best-known repositories is UniProt and its section Swiss-Prot
• Besides the sequence, these databases also hold additional information about the protein
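The idea of a protein sequence as a string over a 20-letter alphabet can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and real database entries may occasionally contain additional letters (for unknown or rare residues) that this simple check would flag.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence):
    """Return the set of characters that are not one of the 20 standard letters."""
    return set(sequence.upper()) - AMINO_ACIDS


def is_valid_protein(sequence):
    """True if the sequence is non-empty and uses only the 20 standard letters."""
    return bool(sequence) and not invalid_residues(sequence)


if __name__ == "__main__":
    # Hypothetical example sequences, not taken from any database.
    for seq in ("MKTAYIAKQR", "MKXZ"):
        print(seq, is_valid_protein(seq), sorted(invalid_residues(seq)))
```

A check like this is a reasonable first step when getting familiar with sequence files, since it shows quickly whether any entries need special handling.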

Ontology (in information science)
• Ontology: an ontology represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts
• Human Phenotype Ontology (HPO): set of concepts describing human appearance (shape, health, and so forth)
• HPO concepts are hierarchically ordered, i.e. there is an “is-a” relationship between them
• They are arranged in a tree-like fashion
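To make the “is-a” hierarchy concrete, here is a minimal sketch that stores each term's parent and walks upward to collect all more general terms. It assumes a strict tree (exactly one parent per term), following the "tree-like" description above, and the term names are invented placeholders, not real HPO entries.

```python
# Hypothetical toy hierarchy: child term -> parent term ("child is-a parent").
IS_A = {
    "abnormal shape": "phenotype",
    "abnormal health": "phenotype",
    "abnormal hand shape": "abnormal shape",
}


def ancestors(term, is_a=IS_A):
    """Follow the is-a links upward; return all more general terms, nearest first."""
    path = []
    while term in is_a:
        term = is_a[term]
        path.append(term)
    return path


if __name__ == "__main__":
    print(ancestors("abnormal hand shape"))
```

One consequence of such a hierarchy is that annotating a protein with a specific term implicitly annotates it with every ancestor of that term, which matters when comparing predicted and true term sets.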

Our competition
• Proteins are annotated (described) with experimentally determined information
• As time goes by, proteins are associated with information about experimentally confirmed effects on the human phenotype
• The associated terms are taken from the Human Phenotype Ontology
• Experimental determination is slow and expensive
• => We try to predict the associated HPO terms for proteins that are not yet annotated

More formal steps
• Find a function that assigns a set of HPO terms T to a sequence s so that the number of false assignments is minimal and the number of true assignments is maximal
• Remember: the true evaluation is done after submission, when sequences that are not annotated so far get experimentally determined annotations
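Counting true and false assignments for one sequence can be sketched as a comparison of two term sets. The code below is an illustrative assumption, not the official scoring used in the competition: it counts correct, false and missed terms and derives precision and recall from them. The term labels are placeholders.

```python
def count_assignments(predicted, true):
    """Return (true positives, false positives, false negatives) for two term sets."""
    predicted, true = set(predicted), set(true)
    tp = len(predicted & true)   # predicted and experimentally confirmed
    fp = len(predicted - true)   # predicted but not confirmed (false assignments)
    fn = len(true - predicted)   # confirmed but not predicted (missed terms)
    return tp, fp, fn


def precision_recall(predicted, true):
    """Precision and recall of a predicted term set; 0.0 where undefined."""
    tp, fp, fn = count_assignments(predicted, true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Placeholder term labels for a single hypothetical sequence.
    predicted_terms = {"t1", "t2", "t3"}
    true_terms = {"t2", "t3", "t4", "t5"}
    print(count_assignments(predicted_terms, true_terms))
    print(precision_recall(predicted_terms, true_terms))
```

Minimizing false assignments corresponds to high precision, and maximizing true assignments corresponds to high recall; a predictor usually has to trade one against the other.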

Tasks
• Download files from
• Get familiar with the provided files
• Especially the column names (look them up at UniProt and HPO)
• Read: dja.pdf