Analysis Environments For Functional Genomics Bruce R. Schatz Institute for Genomic Biology University of Illinois at Urbana-Champaign

Presentation transcript:

Analysis Environments For Functional Genomics Bruce R. Schatz Institute for Genomic Biology University of Illinois at Urbana-Champaign Informatics Research First Annual BeeSpace Workshop June 6, 2005

What are Analysis Environments?
Functional Analysis
- Find the underlying Mechanisms
- Of Genes, Behaviors, Diseases
Comparative Analysis
- Top-down data mining (vs Bottom-up)
- Multiple Sources especially literature

Building Analysis Environments
Manual by Humans
- Interaction: user navigation
- Classification: collection indexing
Automatic by Computers
- Federation: search bridges
- Integration: results links

Needles and Haystacks
Genes
- Honey Bees have 13K genes
- Perhaps 100 have known functions
Paths
- Perhaps 30K protein families exist
- KEGG has 200 known pathways
Statistical Clustering for Interactive Discovery Across Two Orders of Magnitude!
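The "two orders of magnitude" claim can be checked directly from the rounded figures on the slide. The sketch below is only that arithmetic; the variable and function names are illustrative, and the inputs are the slide's approximate counts, not measured values.

```python
import math

# Approximate counts quoted on the slide ("13K", "perhaps 100", "perhaps 30K", "200").
genes_total = 13_000
genes_known = 100
families_total = 30_000
pathways_known = 200

def orders_of_magnitude(total, known):
    """Return log10 of the ratio between what exists and what is characterized."""
    return math.log10(total / known)

gene_gap = orders_of_magnitude(genes_total, genes_known)
path_gap = orders_of_magnitude(families_total, pathways_known)

print(f"genes: {genes_total // genes_known}x, about {gene_gap:.1f} orders of magnitude")
print(f"paths: {families_total // pathways_known}x, about {path_gap:.1f} orders of magnitude")
```

Both ratios (130x and 150x) come out slightly above two orders of magnitude, which matches the slide's summary line.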

Trends in Analysis Environments
Central versus Distributed Viewpoints
The 90s Pre-Genome
- Entrez (NIH NCBI) versus WCS (NSF Arizona)
The 00s Post-Genome
- GO (NIH curators) versus BeeSpace (NSF Illinois)

Pre-Genome Environments
Focused on Syntax pre-Web
WCS (Worm Community System)
- Search words across sources
- Follow links across sources
- Words automatic, Links manual
Towards Uniform Searching

Post-Genome Environments
Focused on Semantics post-Web
BeeSpace (Honey Bee Inter Space)
- Navigate concepts across sources
- Integrate data across sources
- Concepts automatic, Links automatic
Towards Question Answering

Worm Community System
WCS Information:
- Literature: BIOSIS, MEDLINE, newsletters, meetings
- Data: Genes, Maps, Sequences, strains, cells
WCS Functionality
- Browsing: search, navigation
- Filtering: selection, analysis
- Sharing: linking, publishing
WCS: 250 users at 50 labs across Internet (1991)

WCS Molecular

WCS Cellular

WCS invokes gm

WCS vis-à-vis acedb

Towards the Interspace
from Objects to Concepts
from Syntax to Semantics
Infrastructure is Interaction with Abstraction
- Internet is packet transmission across computers
- Interspace is concept navigation across repositories

THE THIRD WAVE OF NET EVOLUTION: PACKETS, OBJECTS, CONCEPTS

LEVELS OF INDEXES
[Figure: index levels ranging from FORMAL (manual) to INFORMAL (automatic); labels: Technology, Engineering, Electrical, IEEE, communities, groups, individuals]

Navigation in MEDSPACE
For a patient with Rheumatoid Arthritis:
Find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding
Choose Domain

Concept Search

Concept Navigation

Retrieve Document

Navigate Document

Post-Genome Informatics I
Comparative Analysis within the Dry Lab of Biological Knowledge
- Classical Organisms have Genetic Descriptions. There will be NO more classical organisms beyond Mice and Men, Worms and Flies, Yeasts and Weeds.
- Must use comparative genomics on classical organisms, via sequence homologies and literature analysis.

Post-Genome Informatics II
Functional Analysis within the Dry Lab of Biological Knowledge
- Automatic annotation of genes to standard classifications, e.g. Gene Ontology via homology on computed protein sequences.
- Automatic analysis of functions to scientific literature, e.g. concept spaces via text extractions.
- Thus must use functions in literature descriptions.

Conceptual Navigation in BeeSpace

BeeSpace Analysis Environment
Build Concept Space of Biomedical Literature for Functional Analysis of Bee Genes
- Partition Literature into Community Collections
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Links from Documents into Databases
Locate Candidate Genes in Related Literatures then follow links into Genome Databases
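The index-then-navigate steps above can be sketched with a toy co-occurrence index. This is a minimal illustration, assuming each document has already been reduced to a set of concepts; the document ids, the concept sets, and the function names are hypothetical, and plain co-occurrence counting stands in for whatever statistics the BeeSpace system actually uses.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy collection: each document is reduced to its set of concepts.
# In a real system the concepts would come from phrase extraction over abstracts.
collection = {
    "doc1": {"foraging", "protein kinase", "cGMP"},
    "doc2": {"foraging", "division of labor", "protein kinase"},
    "doc3": {"division of labor", "manganese transporter"},
}

def build_concept_space(docs):
    """Count how often each pair of concepts appears in the same document."""
    space = defaultdict(Counter)
    for concepts in docs.values():
        for a, b in combinations(sorted(concepts), 2):
            space[a][b] += 1
            space[b][a] += 1
    return space

def related(space, concept, k=3):
    """Return up to k concepts ranked by co-occurrence, ties broken alphabetically."""
    ranked = sorted(space[concept].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]

def documents_for(docs, concept):
    """Follow a concept back to the documents that mention it."""
    return sorted(doc_id for doc_id, concepts in docs.items() if concept in concepts)

space = build_concept_space(collection)
print(related(space, "foraging"))
print(documents_for(collection, "protein kinase"))
```

Navigation here is just two lookups: from a concept to its neighbours, and from a concept back to documents, from which links into gene databases could then be followed.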

Question Answering
Behaviour | Organism | Gene | Molecular Function | Reference
Foraging
Rover vs sitter phenotype | Drosophila melanogaster | for | Protein kinase G | 8
Roamer vs dweller phenotype | C. elegans | egl-4 | Protein kinase G | 16
Division of labour: age at onset of foraging | Apis mellifera | for | Protein kinase G | 9
Division of labour: age at onset of foraging | Apis mellifera | mvl | Mn transporter | 19
Division of labour: foraging-related? | Apis mellifera | per | Transcription cofactor | 68
Division of labour: foraging-related? | Apis mellifera | ache | Acetylcholine esterase | 69
Division of labour: foraging-related? | Apis mellifera | IP(3)K | Inositol signaling | 70
Foraging specialization: nectar vs. pollen | Apis mellifera | pkc | Protein kinase C | 71
Social feeding | Drosophila melanogaster | dpnf | Neuropeptide Y (NPY) homolog | 21
Social feeding (aggregation) | C. elegans | npr-1 | Receptor for NPY | 22, 23

Functional Phrases
encodes
- Sokolowski and colleagues demonstrated in Drosophila melanogaster that the foraging gene (for) encodes a cGMP dependent protein kinase (PKG).
- The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).
affects/causes
- Thus, PKG levels affected food-search behavior.
- cGMP treatment elevated PKG activity and caused foraging behavior.
regulates
- Amfor, an ortholog of the Drosophila for gene, is involved in the regulation of age at onset of foraging in honey bees.
- This idea is supported by results for malvolio (mvl), which encodes a manganese transporter and is involved in regulating Drosophila feeding and age at onset of foraging in honey bees.
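A rough way to see how sentences like these could be grouped under the three relation labels is a keyword matcher. The sketch below is an assumption for illustration only: the regular expressions, the verb forms they cover, and the function name are not from the slides, and a real system would likely rely on parsing and entity recognition instead of bare trigger words.

```python
import re

# Trigger verbs grouped under the three relation labels named on the slide.
# The specific verb forms in each pattern are an assumption for illustration.
RELATION_PATTERNS = {
    "encodes": re.compile(r"\bencod(?:e|es|ed|ing)\b", re.IGNORECASE),
    "affects/causes": re.compile(
        r"\b(?:affect(?:s|ed|ing)?|caus(?:e|es|ed|ing))\b", re.IGNORECASE
    ),
    "regulates": re.compile(r"\bregulat(?:e|es|ed|ing|ion)\b", re.IGNORECASE),
}

def tag_functional_phrases(sentence):
    """Return the relation labels whose trigger words occur in the sentence."""
    return [label for label, pattern in RELATION_PATTERNS.items() if pattern.search(sentence)]

# Example sentences taken from the slide text.
sentences = [
    "The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).",
    "cGMP treatment elevated PKG activity and caused foraging behavior.",
    "Amfor, an ortholog of the Drosophila for gene, is involved in the regulation "
    "of age at onset of foraging in honey bees.",
]

for sentence in sentences:
    print(tag_functional_phrases(sentence), "<-", sentence[:50])
```

A sentence can carry more than one label; the malvolio sentence on the slide, for instance, contains both "encodes" and "regulating".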

Data Integration (FlyBase Gene)
D. melanogaster gene foraging, abbreviated as for, is reported here. It has also been known in FlyBase as BcDNA:GM08338, CG10033 and l(2) It encodes a product with cGMP-dependent protein kinase activity (EC: ) involved in protein amino acid phosphorylation which is a component of the cellular_component unknown. It has been sequenced and its amino acid sequence contains a eukaryotic protein kinase, a protein kinase C-terminal domain, a tyrosine kinase catalytic domain, a serine/threonine protein kinase family active site, a cAMP-dependent protein kinase and a cGMP-dependent protein kinase. It has been mapped by recombination to 2-10 and cytologically to 24A2--4. It interacts genetically with Csr. There are 27 recorded alleles: 1 in vitro construct (not available from the public stock centers), 25 classical mutants (3 available from the public stock centers) and 1 wild-type. Mutations have been isolated which affect the larval nerve terminal and are behavioral, pupal recessive lethal, hyperactive, larval neurophysiology defective and larval neuroanatomy defective. for is discussed in 80 references (excluding sequence accessions), dated between 1988 and These include at least 6 studies of mutant phenotypes, 2 studies of wild-type function, 3 studies of natural polymorphisms and 7 molecular studies. Among findings on for function, for activity levels influence adult olfactory trap response to a food medium attractant.
Among findings on for polymorphisms, the frequency of for^R and for^s strains in three natural populations are studied to determine the contribution of the local parasitoid community to the differences in for^R and for^s frequencies.

BeeSpace Information Sources
Biomedical Literature
- Medline (medicine)
- Biosis (biology)
- Agricola, CAB Abstracts, Agris (agriculture)
Model Organisms (heredity)
- Gene Descriptions (FlyBase, WormBase)
Natural Histories (environment)
- BeeKeeping Books (Cornell, Harvard)

Medical Concept Spaces (1998)
Medical Literature (Medline, 10M abstracts)
- Partition with Medical Subject Headings (MeSH)
- Community is all abstracts classified by core term
- 40M abstracts containing 280M concepts
- computation is 2 days on NCSA Origin 2000
Simulating World of Medical Communities
- 10K repositories with > 1K abstracts (1K with > 10K)

Biological Concept Spaces (2006)
Compute concept spaces for All of Biology
- BioSpace across entire biomedical literature
- 50M abstracts across 50K repositories
Use Gene Ontology to partition literature into biological communities for functional analysis
- GO same scale as MeSH but adequate coverage?
- GO light on social behavior (biological process)

Concept Switching
In the Interspace…
- each Community maintains its own repository
- Switching is navigating Across repositories
- use your specialty vocabulary to search another specialty

CONCEPT SWITCHING
“Concept” versus “Term”
- set of “semantically” equivalent terms
Concept switching
- region to region (set to set) match
[Figure labels: term, Semantic region, Concept Space]
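The set-to-set idea can be sketched by treating each concept as a set of equivalent terms and matching a source concept to target concepts whose term sets overlap. The two vocabularies, their labels, and the overlap rule below are assumptions for illustration; the slides do not specify how regions are matched.

```python
# Hypothetical concept spaces: concept label -> set of terms treated as equivalent.
medicine_space = {
    "pain relief": {"analgesic", "painkiller", "pain relief"},
    "gi bleeding": {"gastrointestinal bleeding", "stomach bleeding", "gi hemorrhage"},
}
pharmacology_space = {
    "analgesia": {"analgesic", "antinociceptive", "nsaid"},
    "hemorrhage": {"gi hemorrhage", "bleeding"},
}

def concepts_for_term(space, term):
    """Find the concepts (semantic regions) that contain the given term."""
    return [label for label, terms in space.items() if term in terms]

def switch(source, target, term):
    """Map a source-vocabulary term to target concepts whose term sets overlap."""
    matches = {}
    for label in concepts_for_term(source, term):
        source_terms = source[label]
        for target_label, target_terms in target.items():
            shared = source_terms & target_terms
            if shared:
                matches[target_label] = sorted(shared)
    return matches

print(switch(medicine_space, pharmacology_space, "painkiller"))
print(switch(medicine_space, pharmacology_space, "stomach bleeding"))
```

The user's term ("painkiller") never appears in the target vocabulary, yet the switch still succeeds because the match is made between regions, not between individual terms.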

Biomedical Session

Categories and Concepts

Concept Switching

Document Retrieval

Interactive Functional Analysis
BeeSpace will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing, with a software system beyond a searchable database, using literature analyses to discover functional relationships between genes and behavior.
- Genes to Behaviors
- Behaviors to Genes
- Concepts to Concepts
- Clusters to Clusters
- Navigation across Sources

BeeSpace Information Sources
General for All Spaces:
Scientific Literature
- Medline, Biosis, Agricola, Agris, CAB Abstracts
- partitioned by organisms and by functions
Model Organisms
- Gene Descriptions (FlyBase, WormBase, MGI, OMIM, SGD, TAIR)
Special Sources for BeeSpace:
- Natural History Books (Cornell Library, Harvard Press)

XSpace Information Sources
- Organize Genome Databases (XBase)
- Compute Gene Descriptions from Model Organisms
- Partition Scientific Literature for Organism X
- Compute XSpace using Semantic Indexing
Boost the Functional Analysis from Special Sources
- Collecting Useful Data about Natural Histories
- e.g. CowSpace Leverage in AIPL Databases

Towards the Interspace
The Analysis Environment technology is GENERAL!
BirdSpace? BeeSpace? PigSpace? CowSpace? BehaviorSpace? BrainSpace?
BioSpace … Interspace

Prototype System
- Overall Architecture and Interface – Todd Littell
- Language Parsing and Entity Recognition – Jing Jiang
- Normalization and Theme Clustering – Qiaozhu Mei
- Concept Navigation and Switching – Azadeh Shakery
- Gene Summarization and Linking – Xu Ling
- Collection Development and Navigation – Xin He
Specialty Systems
- Question Answering – Eugene Grois
- Annotation Pipeline – Pouya Kheradpour