EECS 730 Introduction to Bioinformatics Function Luke Huan Electrical Engineering and Computer Science

1 EECS 730 Introduction to Bioinformatics Function Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/

2 2015-9-4 EECS 730 Overview Gene ontology: challenges; what is gene ontology; constructing gene ontology. Text mining, natural language processing and information extraction: an introduction. Summary.

3 Ontology (1) A systematic account of existence (from philosophy). (2) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them. (3) The hierarchical structuring of knowledge about things by subcategorising them according to their essential (or at least relevant and/or cognitive) qualities. This is an extension of the previous senses of "ontology" which has become common in discussions about the difficulty of maintaining subject indices. The philosophy of indexing everything in existence?

4 Aristotle’s (384-322 BC) Ontology Substance (plants, animals, ...), Quality, Quantity, Relation, Where, When, Position, Having, Action, Passion

5 Ontology and -informatics In information sciences, ontology is better defined as: “a domain of knowledge, represented by facts and their logical connections, that can be understood by a computer” (J. Bard, BioEssays, 2003). “Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Gruber, 1993).

6 Information Exchange in Biosciences Basic challenges: definition, definition, definition. What is a name? What is a function?

7 Cell

8 Cell

9 Cell

10 Cell

11 Cell Image from http://microscopy.fsu.edu

12 What’s in a name? The same name can be used to describe different concepts.

13 What’s in a name? Glucose synthesis, glucose biosynthesis, glucose formation, glucose anabolism, gluconeogenesis: all refer to the process of making glucose from simpler components.

14 What’s in a name? The same name can be used to describe different concepts. A concept can be described using different names. Comparison is difficult, in particular across species or across databases.

15 What is Function? The Hammer Example
Function (what) | Process (why)
Drive nail (into wood) | Carpentry
Drive stake (into soil) | Gardening
Smash roach | Pest Control
Clown’s juggling object | Entertainment

16 Information Explosion

17 Entering the Genome Sequencing Era Eukaryotic Genome Sequences
Organism | Year | Genome Size (Mb) | # Genes
Yeast (S. cerevisiae) | 1996 | 12 | 6,000
Worm (C. elegans) | 1998 | 97 | 19,100
Fly (D. melanogaster) | 2000 | 120 | 13,600
Plant (A. thaliana) | 2001 | 125 | 25,500
Human (H. sapiens, 1st Draft) | 2001 | ~3000 | ~35,000

18 What is the Gene Ontology? A common language for annotation of genes from yeast, flies and mice … and plants and worms … and humans … and anything else!

19 http://www.geneontology.org/

20 What is the Gene Ontology? A gene annotation system. A controlled vocabulary that can be applied to all organisms. Organism independent. Used to describe gene products (proteins and RNA) in any organism.

21 The 3 Gene Ontologies Molecular Function = elemental activity/task: the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity. Biological Process = biological goal or objective: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions. Cellular Component = location or complex: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme.

22 Cellular Component: where a gene product acts

23 Cellular Component

24 Cellular Component

25 Cellular Component Enzyme complexes in the component ontology refer to places, not activities.

26 Molecular Function Examples: insulin binding, insulin receptor activity

27 Molecular Function Activities or “jobs” of a gene product. Example: glucose-6-phosphate isomerase activity

28 Molecular Function A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product. Sets of functions make up a biological process.

29 Biological Process A commonly recognized series of events. Example: cell division

30 Biological Process Example: transcription

31 Biological Process Metabolism: degradation or synthesis of biomolecules

32 Biological Process Development: how a group of cells becomes a tissue

33 Biological Process Example: social behavior

34 Ontology applications Can be used to: formalise the representation of biological knowledge; standardise database submissions; provide unified access to information through ontology-based querying of databases, both human and computational; improve management and integration of data within databases; facilitate data mining.

35 Gene Ontology Structure Ontologies can be represented as directed acyclic graphs (DAGs), where the nodes are connected by edges. Nodes = terms in biology. Edges = relationships between the terms (is-a, part-of).
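The DAG structure on slide 35 can be sketched in a few lines of Python. This is only an illustration: the `EDGES` table and the `ancestors` function are hypothetical names, and the edges themselves are assumptions chosen to resemble the parent-child slides, not data taken from the Gene Ontology.

```python
from collections import deque

# Illustrative edges only (assumed for this sketch):
# child term -> list of (relationship, parent term).
# A term may have several parents, which makes this a DAG rather than a tree.
EDGES = {
    "chloroplast membrane": [("is-a", "membrane"), ("part-of", "chloroplast")],
    "mitochondrial membrane": [("is-a", "membrane")],
    "membrane": [("part-of", "cell")],
    "chloroplast": [("part-of", "cell")],
    "cell": [],
}


def ancestors(term):
    """Return every term reachable from `term` by following edges upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in EDGES.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("chloroplast membrane")))
```

Because a child can reach the same ancestor along more than one path, the `seen` set is what keeps each term from being visited twice.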

36 Parent-Child Relationships Parent: Chromosome. Children: Cytoplasmic chromosome, Mitochondrial chromosome, Plastid chromosome, Nuclear chromosome. A child is a subset or instance of a parent’s elements.

37 Parent-Child Relationships [Diagram with nodes: cell, membrane, chloroplast, mitochondrial membrane, chloroplast membrane; edges labelled is-a and part-of]

38 Annotation in GO A gene product is usually a protein but can be a functional RNA. An annotation is a piece of information associated with a gene product. A GO annotation is a Gene Ontology term associated with a gene product.

39 Terms, Definitions, IDs Term: MAPKKK cascade (mating sensu Saccharomyces). GO ID: GO:0007244. Definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces. Definition reference: PMID:9561267. Evidence code: how annotation is done.

40 Annotation Example Gene product: nek2. GO term: centrosome (GO:0005813). Reference: PMID:11956323. Evidence code: IDA (Inferred from Direct Assay).
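One way to picture the annotation on slide 40 is as a small record with one field per column. The sketch below is a hypothetical representation for teaching purposes, not the official GO annotation file format; the class name `GOAnnotation` and the helper `is_valid_go_id` are made up here. The only format rule it checks is that a GO identifier is the prefix `GO:` followed by seven digits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """Hypothetical record linking a gene product to one GO term."""
    gene_product: str
    go_id: str
    go_term: str
    evidence_code: str
    reference: str


def is_valid_go_id(go_id):
    """Check for the 'GO:' prefix followed by exactly seven digits."""
    prefix, _, digits = go_id.partition(":")
    return prefix == "GO" and len(digits) == 7 and digits.isdigit()


# Values copied from the annotation example on the slide.
nek2_annotation = GOAnnotation(
    gene_product="nek2",
    go_id="GO:0005813",
    go_term="centrosome",
    evidence_code="IDA",
    reference="PMID:11956323",
)

print(nek2_annotation)
print(is_valid_go_id(nek2_annotation.go_id))
```

Keeping the evidence code and reference on every record is the point of the design: an annotation without its supporting source cannot be weighed by a downstream user.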

41 GO Annotation

42 GO Annotation

43 GO Annotation

44 Evidence Code Indicates the type of evidence in the cited source that supports the association between the gene product and the GO term. http://www.geneontology.org/GO.evidence.html

45 Types of evidence codes Experimental codes: IDA, IMP, IGI, IPI, IEP. Computational codes: ISS, IEA, RCA, IGC. Author statement codes: TAS, NAS. Other codes: IC, ND. Two types of annotation: manual annotation and electronic annotation.
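The grouping on slide 45 maps naturally onto a lookup table. The following sketch uses hypothetical names (`EVIDENCE_CATEGORIES`, `category_of`, `is_electronic`); the grouping follows the slide, and treating IEA (Inferred from Electronic Annotation) as the one code that marks an electronic rather than manual annotation is an assumption of this sketch.

```python
# Grouping copied from the slide; the dictionary layout is an illustration.
EVIDENCE_CATEGORIES = {
    "experimental": ["IDA", "IMP", "IGI", "IPI", "IEP"],
    "computational": ["ISS", "IEA", "RCA", "IGC"],
    "author statement": ["TAS", "NAS"],
    "other": ["IC", "ND"],
}


def category_of(code):
    """Return the slide's category name for an evidence code."""
    for category, codes in EVIDENCE_CATEGORIES.items():
        if code in codes:
            return category
    raise KeyError(f"unknown evidence code: {code}")


def is_electronic(code):
    """Assumption of this sketch: only IEA marks an electronic annotation."""
    return code == "IEA"


for code in ("IDA", "IEA", "TAS"):
    kind = "electronic" if is_electronic(code) else "manual"
    print(code, category_of(code), kind)
```

A filter like `is_electronic` is a typical first step when an analysis should rely only on curated annotations.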

46 Beyond GO: Open Biomedical Ontologies Orthogonal to existing ontologies to facilitate combinatorial approaches. Share unique identifier space. Include definitions.

47 Gene Ontology and Text Mining Derive ontology from text data. More general goal: understand text data automatically.

48 Finding GO terms “In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…” (PubMed ID: 12374299) GO terms for B. napus PERK1 protein (Q9ARH1): Process: response to wounding (GO:0009611). Function: protein serine/threonine kinase activity (GO:0004674). Component: integral to plasma membrane (GO:0005887).
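To make the task on slide 48 concrete, here is a deliberately naive sketch that looks for hand-picked trigger phrases in an abstract and reports the GO terms from the slide. The trigger phrases, the `GO_TRIGGERS` table and `find_go_terms` are all assumptions invented for this example; human curators and real text-mining systems do far more than substring matching.

```python
# Each entry: phrases that must ALL appear -> (GO ID, GO term name).
# The trigger phrases are hand-picked assumptions for this one abstract.
GO_TRIGGERS = {
    ("serine/threonine kinase activity",): (
        "GO:0004674", "protein serine/threonine kinase activity"),
    ("wound response",): (
        "GO:0009611", "response to wounding"),
    ("plasma membrane", "integral membrane protein"): (
        "GO:0005887", "integral to plasma membrane"),
}


def find_go_terms(text):
    """Return sorted (GO ID, term) pairs whose trigger phrases all occur."""
    lowered = text.lower()
    hits = [
        term
        for phrases, term in GO_TRIGGERS.items()
        if all(phrase in lowered for phrase in phrases)
    ]
    return sorted(hits)


# Shortened excerpt of the abstract quoted on the slide.
abstract = (
    "The kinase domain of PERK1 has serine/threonine kinase activity. "
    "The location of a fusion protein to the plasma membrane supports the "
    "prediction that PERK1 is an integral membrane protein. These kinases "
    "have been implicated in early stages of wound response."
)

for go_id, name in find_go_terms(abstract):
    print(go_id, name)
```

Note that none of the three GO term names appears word for word in the abstract, which is exactly why the mapping needs either a curator or something smarter than exact matching.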

49 Mining Text Data Data mining / knowledge discovery applies to structured data, multimedia, free text and hypertext. Structured data: HomeLoan ( Loanee: Frank Rizzo Lender: MWF Agency: Lake View Amount: $200,000 Term: 15 years ). Free text: “Frank Rizzo bought his home from Lake View Real Estate in 1992. He paid $200,000 under a 15-year loan from MW Financial.” Hypertext: “Frank Rizzo Bought this home from Lake View Real Estate In 1992....” Multimedia: Loans($200K,[map],...) (Taken from ChengXiang Zhai, CS 397cxz, UIUC, CS – Fall 2003)

50 Bag-of-Tokens Approaches Documents: “Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or …” Feature extraction turns documents into token sets: nation 5, civil 1, war 2, men 2, died 4, people 5, Liberty 1, God 1, … Loses all order-specific information! Severely limits context!
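The bag-of-tokens idea on slide 50 can be sketched with Python's standard library. The tokenizer below (lowercase, letters only) is a simplifying assumption, and `bag_of_tokens` is a name chosen for this example. The counts here cover only the short excerpt quoted on the slide, so they are smaller than the slide's counts for the full speech.

```python
import re
from collections import Counter


def bag_of_tokens(text):
    """Count lowercase alphabetic tokens, discarding order and punctuation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


excerpt = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal. Now we are engaged in a "
    "great civil war, testing whether that nation"
)

counts = bag_of_tokens(excerpt)
print(counts["nation"], counts["civil"], counts["liberty"])

# Word order is lost: these two sentences get identical bags.
print(bag_of_tokens("dog chases boy") == bag_of_tokens("boy chases dog"))
```

The final comparison shows the limitation named on the slide: two sentences with opposite meanings are indistinguishable once only token counts remain.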

51 Natural Language Processing Example sentence: “A dog is chasing a boy on the playground.” Lexical analysis (part-of-speech tagging): Det Noun Aux Verb Det Noun Prep Det Noun. Syntactic analysis (parsing): Noun Phrase, Complex Verb, Noun Phrase, Prep Phrase, Verb Phrase, Sentence. Semantic analysis: Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1). Inference: Scared(x) if Chasing(_,x,_), which gives Scared(b1). Pragmatic analysis (speech act): a person saying this may be reminding another person to get the dog back…
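The inference step on slide 51 can be illustrated with a tiny rule application over the facts produced by semantic analysis. The tuple encoding of facts and the function `infer_scared` are assumptions of this sketch; it hard-codes the single rule from the slide rather than implementing a general logic engine.

```python
# Facts from the semantic analysis step, encoded as (predicate, arguments).
facts = {
    ("Dog", ("d1",)),
    ("Boy", ("b1",)),
    ("Playground", ("p1",)),
    ("Chasing", ("d1", "b1", "p1")),
}


def infer_scared(known_facts):
    """Apply the rule Scared(x) if Chasing(_, x, _) once to a set of facts."""
    derived = set()
    for predicate, args in known_facts:
        if predicate == "Chasing" and len(args) == 3:
            # The second argument is the one being chased.
            derived.add(("Scared", (args[1],)))
    return derived


print(infer_scared(facts))
```

The underscores in the rule are wildcards: who is chasing and where it happens do not matter, only who is being chased.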

52 General NLP: Too Difficult! Word-level ambiguity: “design” can be a noun or a verb (ambiguous POS); “root” has multiple meanings (ambiguous sense). Syntactic ambiguity: “natural language processing” (modification); “A man saw a boy with a telescope.” (PP attachment). Anaphora resolution: “John persuaded Bill to buy a TV for himself.” (himself = John or Bill?). Presupposition: “He has quit smoking.” implies that he smoked before. Humans rely on context to interpret (when possible). This context may extend beyond a given document!

53 Reference for GO Gene ontology teaching resources: http://www.geneontology.org/GO.teaching.resources.shtml

54 References for Text Mining
1. C. D. Manning and H. Schütze, “Foundations of Statistical Natural Language Processing”, MIT Press, 1999.
2. S. Russell and P. Norvig, “Artificial Intelligence: A Modern Approach”, Prentice Hall, 1995.
3. S. Chakrabarti, “Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data”, Morgan Kaufmann, 2002.
4. G. Miller, R. Beckwith, C. Fellbaum, D. Gross, K. Miller, and R. Tengi. Five papers on WordNet. Princeton University, August 1993.
5. C. Zhai, Introduction to NLP, Lecture Notes for CS 397cxz, UIUC, Fall 2003.
6. M. Hearst, Untangling Text Data Mining, ACL’99, invited paper. http://www.sims.berkeley.edu/~hearst/papers/acl99/acl99-tdm.html
7. R. Sproat, Introduction to Computational Linguistics, LING 306, UIUC, Fall 2003.
8. A Road Map to Text Mining and Web Mining, University of Texas resource page. http://www.cs.utexas.edu/users/pebronia/text-mining/
9. Computational Linguistics and Text Mining Group, IBM Research, http://www.research.ibm.com/dssgrp/

55 Acknowledgement Some slides are taken from http://www.tulane.edu/~wiser/cells/

