Labeling and Enhancing Life Science Links
S. Heymann*, F. Naumann*, L. Raschid+, P. Rieger*
* Humboldt Universität zu Berlin, + University of Maryland

Existing Life Science Links

An abundance of Web-accessible life science data sources contain data about scientific entities such as genes, sequences, proteins and citations. The sources are diverse in content and computational capability, they are richly interconnected, and they overlap to varying degrees. The scientific exploration of relationships between objects involves the traversal of links and paths (concatenations of links).

Mapping from logical classes/categories to physical Web-accessible collections:
Primary repository of sequences >> GenBank, EMBL, DDBJ
Annotated genome data >> ENSEMBL
Hand-curated protein sequences >> UniProt (= SwissProt + PIR)
Hand-curated hereditary diseases >> OMIM
Frames of reference >> GO, Taxonomy, HSAGENES
... >> ...

[Figure: Four NCBI data sources (red arrows) are nodes in a steadily growing tangle of coarsely cross-referenced primary and secondary life science data compilations (1).]

Interactive navigation aid: GeneViator tool, available upon request (2).

Why Enrich Links?

Existing links are poor with respect to both syntax and semantics. They are syntactically poor because the origin and the target of a link are specified only at the level of the database entry or object. They are semantically poor because they carry no explicit meaning.

Links are added for various reasons:
- A link may represent the result of an experimental protocol to test a hypothesis.
- Data curators may add links following domain-specific conventions.
- A link may have been predicted by some software.

Biologists can usually infer the meaning of a link, but search engines and mediators cannot. Current links cannot capture or differentiate these desirable properties.

[Figure: the sources LocusLink, UniProt, OMIM, BRENDA and LCOMPOUND, drawn twice. Panel "Existing Links: No Link Labels" shows the unlabeled links; panel "Links Enhanced with Link Labels" shows the same links carrying the labels "Is translated to", "Has disease-related mutations", "Is an enzyme" and "Is a substrate".]

How to Enrich Links

We propose to enrich the current link implementation so as to support more meaningful queries over enh-links. Enrichment should include semantic label descriptors (matching an appropriate ontology) and a more precise identification of the link's source and target elements (within a data entry). One can then traverse paths and compare them in a way that is meaningful to the biologist.

enh-links: links enhanced with a label, the origin of the link and the target of the link.

Model and query language for Labeled Life Science Links

- A data model to capture enriched link semantics will include:
    LT: a set of link types
    LS: a set of published links implemented in the sources
    LL: pairs of link types that represent a meaningful link concatenation
    LE: link and path equivalences
- Tools that support semi-automatic annotation and enrichment of existing links.
- A query language for a scientist to exploit LT, LS and LL in expressing navigational queries.
- A scientist-friendly interface
    to specify the properties LS, LL and LE;
    to rank the paths that satisfy some query.

Acknowledgements: This research is partially supported by NSF Grants IIS and EIA (LR), and by DFG Grant FR1142/1-3 (SH).

References:
(1) T. Etzold, A. Ulyanov, P. Argos: SRS: Information Retrieval System for Molecular Biology Data Banks. Methods in Enzymology 266.
(2) S. Heymann, K. Tham, A. Kilian, G. Wegner, P. Rieger, D. Merkel, J.C. Freytag: Viator - A Tool Family for Graphical Networking and Data View Creation. 28th International Conference on Very Large Data Bases 2002, Hong Kong, Proceedings.
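Illustrative sketch (not part of the original poster): the LT, LS and LL components of the proposed data model could be represented roughly as follows. Everything concrete here is an assumption made for illustration: the class name EnhLink, the function names, the entry identifiers such as "gene-1", the element names, and in particular which pair of sources each label connects, since the poster text does not state that. LE (link and path equivalences) is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Iterator, List, Set, Tuple

# An endpoint is (source, entry id, element within the entry).
Endpoint = Tuple[str, str, str]


@dataclass(frozen=True)
class EnhLink:
    """Hypothetical enh-link: a label plus a precise origin and target."""
    label: str
    origin: Endpoint
    target: Endpoint


# LT: link types. The labels come from the poster figure.
LT: Set[str] = {
    "is translated to",
    "has disease-related mutations",
    "is an enzyme",
    "is a substrate",
}

# LS: published links. Endpoints and identifiers are made up (assumption).
LS: List[EnhLink] = [
    EnhLink("is translated to",
            ("LocusLink", "gene-1", "locus"),
            ("UniProt", "prot-1", "sequence")),
    EnhLink("has disease-related mutations",
            ("UniProt", "prot-1", "sequence"),
            ("OMIM", "dis-1", "allelic variants")),
    EnhLink("is an enzyme",
            ("UniProt", "prot-1", "sequence"),
            ("BRENDA", "enz-1", "enzyme")),
    EnhLink("is a substrate",
            ("LCOMPOUND", "cpd-1", "compound"),
            ("BRENDA", "enz-1", "substrate")),
]

# LL: pairs of link types whose concatenation is meaningful (assumed pairs).
LL: Set[Tuple[str, str]] = {
    ("is translated to", "has disease-related mutations"),
    ("is translated to", "is an enzyme"),
}


def check_links(links: List[EnhLink], link_types: Set[str]) -> None:
    """Raise ValueError if a link carries a label that is not in LT."""
    for link in links:
        if link.label not in link_types:
            raise ValueError(f"unknown link type: {link.label!r}")


def meaningful_paths(start_source: str, links: List[EnhLink],
                     ll: Set[Tuple[str, str]],
                     max_len: int = 3) -> Iterator[List[EnhLink]]:
    """Yield paths that start in start_source and whose consecutive
    labels are all listed in LL. Links are joined at the entry level."""
    def extend(path: List[EnhLink]) -> Iterator[List[EnhLink]]:
        yield path
        if len(path) == max_len:
            return
        last = path[-1]
        for nxt in links:
            same_entry = nxt.origin[:2] == last.target[:2]
            if same_entry and (last.label, nxt.label) in ll and nxt not in path:
                yield from extend(path + [nxt])

    for link in links:
        if link.origin[0] == start_source:
            yield from extend([link])


def labels(path: List[EnhLink]) -> Tuple[str, ...]:
    """Return the sequence of link labels along a path."""
    return tuple(link.label for link in path)


if __name__ == "__main__":
    check_links(LS, LT)
    for path in meaningful_paths("LocusLink", LS, LL):
        print(" -> ".join(labels(path)))
```

A navigational query such as "which labeled paths lead out of LocusLink?" then reduces to a call to meaningful_paths. Ranking the resulting paths and handling LE would require further design decisions that the poster leaves open.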