Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.

Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip Bourne, SDSC/UCSD, USA

Resources in Bioinformatics: Databases (UniProt, LocusLink); Ontologies (The Gene Ontology); Applications and Mining (text mining, knowledge mining)

Resources in Bioinformatics: Databases (UniProt, LocusLink)

What perspective do I bring?

Preface: A review of the state and needs of the field from the perspective of a user of biological databases. “… the p53 core domain structure consists of a β sandwich that serves as a scaffold for two large loops and a loop-sheet-helix motif” (Science Vol. 265, p. 346). 1TSR: corresponding structure from the PDB? Oops! β sandwich? Where? Large loop? Which one? Loop-sheet-helix?

Preface: A review of the state and needs of the field from the perspective of a developer of biological databases.

What are the current biological databases and what does this tell us?

Large Growth in the Number of Biological Databases

Resources are Becoming More Diverse: NAR 2004 – Division by Resource Type

NAR 2004 – A Closer Look: Genome scale databases have proliferated; traditional sequence databases are now a small part; databases around new specific data types are emerging; pathway and disease orientated databases are emerging.

The Future - ISMB04 Poster Distribution

What Does ISMB04 Tell Us About New Biological Databases? Microarray data resources are hot; genotypic-phenotypic resources are emerging; surprisingly, pathway resources are not growing fast; disease and species based resources are increasing, notably plants; human genome related resources are increasing.

What About Data in These Databases?

Data are Becoming More Plentiful and More Complex

Data are Becoming More Redundant (Note: redundancy at 30% sequence identity)
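
To make the "redundancy at 30% sequence identity" note concrete, here is a minimal sketch of a greedy redundancy filter. It is an illustration only: the slide does not say how redundancy was computed, the sequences are invented, the function names `identity` and `nonredundant` are hypothetical, and identity is measured naively over sequences assumed to be already aligned to equal length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    # Gap characters ("-") never count as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def nonredundant(seqs: dict[str, str], threshold: float = 0.30) -> list[str]:
    """Greedy filter: keep a sequence only if it falls below the identity
    threshold against every representative kept so far."""
    kept: list[str] = []
    for name, seq in seqs.items():
        if all(identity(seq, seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Invented toy sequences (assumption, not real database entries).
toy = {
    "a": "MKTAYIAKQR",
    "b": "MKTAYIAKQL",  # differs from "a" at one position
    "c": "GGSPLWDNHC",  # shares no positions with "a"
}
representatives = nonredundant(toy)
print(representatives)
```

In this toy case "b" is absorbed by "a", so only two representatives remain. Production tools use proper alignments and much faster heuristics; the point here is only what a 30% cutoff means.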

So the amount and complexity of data are increasing across biological scales – what are the challenges?

A Major Challenge: We suffer from the “high noon syndrome”. Those who can gain and contribute most to biological databases are frequently NOT the users. We need to lower the cost:benefit ratio.

How Do We Lower this Barrier? Better support of complex data types (e.g., networks, images, graphs); associated optimized query languages; associated ontologies; better handling of uncertainty and inconsistency; more and automated data curation; large scale data integration.

How Do We Lower this Barrier? Support of data provenance; support for rapid data and associated schema evolution; support for temporal data; better integration of data and methods; usability engineering. We need more work in these other areas.

A Note on Data Provenance

Further Reading: Jagadish and Olken (2003), Data Management for Life Sciences Research, Omics 7(1). Maojo and Kulikowski (2003), Bioinformatics and Medical Informatics – Collaborations on the Road to Genomic Medicine?, J. of AMIA.

GeneXPress: A Visualization and Statistical Analysis Tool for Gene Expression and Sequence Data. Segal, Kaushal, Yelensky, Pham, Regev, Koller, Friedman. Assign biological meaning to gene expression data through post-processing and visualization. (Slide diagram, repeated on each paper slide: Data Query & Analysis, Biological Results, Curation, Usability, Integration.)

Filtering Erroneous Protein Annotation. Wieser, Kretschmann and Apweiler. Automated detection of annotation errors using a decision tree approach based upon the C4.5 data mining algorithm.
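
As a rough illustration of the decision tree idea mentioned on this slide, the sketch below computes the gain ratio, the split criterion that C4.5 uses to choose an attribute. It is not the authors' pipeline: the attributes (`kingdom`, `reviewed`), the labels and the rows are all invented for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, normalised by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical annotation records: is a given annotation correct or an error?
rows = [
    {"kingdom": "bacteria", "reviewed": "yes", "label": "ok"},
    {"kingdom": "bacteria", "reviewed": "no", "label": "ok"},
    {"kingdom": "eukaryota", "reviewed": "yes", "label": "error"},
    {"kingdom": "eukaryota", "reviewed": "no", "label": "error"},
]
best = max(["kingdom", "reviewed"], key=lambda a: gain_ratio(rows, a, "label"))
print(best)
```

In this toy table `kingdom` separates the labels perfectly while `reviewed` carries no information, so a C4.5-style learner would split on `kingdom` first. A full learner would recurse on each subset and prune the resulting tree.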

Selecting Biomedical Data Sources According to User Preferences. Cohen-Boulakia, Lair, Stransky, Graziani, Radvanyi, Barillot and Froidevaux. Understand the characteristics of biological data; present a selection of resources relevant to a user query; framework for the multiple parametric analysis of cancer.

Integration of Biological Data from Web Resources: Management of Multiple Answers through Metadata Retrieval. Devignes, Smail. Same question, different answers from different resources: how can this be understood? Semantic integration based on domain ontologies.
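
The "same question, different answers" problem can be sketched as follows: keep each answer together with metadata about where it came from, then group by value so that disagreements become visible and explainable. This is an assumption-laden illustration, not the authors' system; the `Answer` type, the resource names, the values and the release dates are all invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    source: str   # hypothetical resource name
    value: str    # the answer that resource returned
    release: str  # ISO date of the resource release (metadata)


def group_answers(answers):
    """Group answers by returned value, preserving the metadata of each source."""
    grouped = defaultdict(list)
    for answer in answers:
        grouped[answer.value].append(answer)
    return dict(grouped)


# Invented answers to one query, e.g. "where does this gene map?"
answers = [
    Answer("resource_A", "17p13.1", "2004-03-01"),
    Answer("resource_B", "17p13.1", "2004-05-15"),
    Answer("resource_C", "17p13", "2002-11-30"),
]
grouped = group_answers(answers)
conflicting = len(grouped) > 1
for value, supporters in grouped.items():
    print(value, [(a.source, a.release) for a in supporters])
```

Here the metadata suggests one possible explanation for the disagreement: the outlying answer comes from an older release. Ontology-based semantic integration, as named on the slide, would go further by recognising when two differently worded answers denote the same or a more general concept.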

Criticality-based Task Composition in Distributed Bioinformatics Systems. Karasavvas, Baldock, Burger. Task composition in workflow systems requires decision support; supplying data provenance information provides that support.

ENJOY !!