An Introduction to the Biomedical Informatics Research Network (BIRN) Gully APC Burns Information Sciences Institute University of Southern California.

Presentation transcript:


This Presentation
The BIRN Vision
Project History
Communities
Specific Capabilities
– Data Management: BIRN Pathology
– Information Integration: BIRN Mediator
– Knowledge Engineering: ‘KEfED’

The Purpose of BIRN
“to advance biomedical research through data sharing and online collaboration”

Challenges … Opportunities
What is the main challenge we face?
– Not just size of datasets …
– Not just communication bandwidth …
– The challenge is heterogeneity.
In 2010, NIH funded 31,232 R01 grants.
How do we provide practical data sharing and collaboration solutions at that scale?

Use Cases and Capabilities
Focus on the things our users want.
Develop Capabilities, i.e., our ability to:
– Publish data / Deploy tools
– Rapidly develop new lightweight systems tailored for specialized use cases
– Reconcile data across disparate sources
– Provide security (credentials, encryption)
– Develop and enable scientific reasoning systems

Background and History
Funded by National Center for Research Resources in …
Three neuroimaging testbeds initially provided use cases for developed technology.
Reorganized in 2008 to provide software-based solutions for data sharing in the biosciences.
– No longer focused on neuroimaging (or even imaging).
– One can no longer ‘log into BIRN’.
– New model of providing modular capabilities that users can fit into their existing toolset.

Sites and Collaborators

Technical Approach
Bottom up, not top down
Focus on user requirements: what they want to do
Create solutions: factor out common requirements
– Capability model includes software and process
– Avoid “Big Design up Front” (BDUF)

Capability Model
Software and services are only useful in bioinformatics if they do things that scientists need.
– What is the problem?
– How do the tools address the problem?
BIRN’s capabilities are defined in terms of problems and solutions, not in terms of software and services.
– This is true from beginning (definition and documentation) to end (quality assurance and assessment).

Dissemination Models
Different problems require different kinds of solutions.
– BIRN operates services for all users; e.g., user registration service
– BIRN provides kits for project teams to deploy services for their members; e.g., data sharing
– BIRN provides downloadable tools for individuals to use on their own; e.g., pathology databases, integration systems, reasoning systems, workflows
Understanding deployment needs is part of defining the problem to be solved.

Working Groups
New capabilities are defined, designed, and disseminated by BIRN Working Groups
– Operations
– Security
– Data Management
– Information Integration
– Knowledge Engineering
– Workflow Tools
– Genomics

Working with BIRN
Projects are vetted by the BIRN Steering Committee for appropriateness and resource planning.
Collaborators are expected to be actively involved in development and deployment.
Collaborators are not expected to use the complete BIRN capability kit but should pick and choose to suit their own needs.

Some of our Communities …
– FBIRN: Schizophrenia neuroimaging, fMRI data
– Nonhuman Primates Research Consortium: Colony management, pathology, genetics
– Cardiovascular Research Grid
– Kinetics Foundation: Parkinson’s disease therapeutics
– … and others

Data Management
BIRN supports data intensive activities including:
– Imaging, Microscopy, Genomics, Time Series, Analytics and more…
BIRN utilities scale:
– TB data sets
– 100 MB to 2 GB files
– Millions of files

Data Service Deployment

BIRN PATHOLOGY Project Overview

Project History
The BIRN Pathology project began in early 2010, with the objective of creating a new shared resource for the research community: an online collection of pathology data, including very high resolution microscopy images.
The initial user group consists of pathologists, with ISI developing the infrastructure.
Early phases of the project involved rapid prototyping and iteration with the user representatives to build out the core data model, user workflows and user experience, and investigation of microscopy tools.
In January 2011, version 1.0 of the system was packaged (RPM), documented (wiki), and released (YUM repo).
Enhancements to the data model and major work on microscopy image support are still underway, as is refactoring of the system into reusable BIRN components.

BIRN Pathology Technology ‘Stack’ (diagram)
Components shown: LAMP, Database Development, Virtual Microscopy, Search, Application (e.g., Pathology), Data Integration, Security, File Transfer.
Key: Data Management “Capabilities”, Other BIRN “Capabilities”, 3rd Party Tools, Application Specific.

Database Development
In a nutshell: Django + Extensions
Features
– Django Web Framework: MVC, ORM, HTML Templates
– Auto-generated data forms!
– RESTful interface routing
– Spreadsheet batch importer
– Crowd integration (users/groups/SSO)
– Data object access control (ACLs)
– Javascript UI widgets
– Scheduled database backup
System Requirements
– Linux, Apache, MySQL or Postgres, Python, Django
Use Cases
– Scientists need to create custom web-accessible databases to share data within and between research groups.
– Scientists store data on spreadsheets, need to manage this data (DBMS), and expose it programmatically (WS REST).
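The "spreadsheet batch importer" feature can be pictured with a small standalone sketch. It is not the BIRN importer and does not use Django: it assumes the spreadsheet has been exported as CSV and loads it into SQLite using only the Python standard library. The helper `import_rows`, the table name `specimen`, and the sample columns are hypothetical.

```python
import csv
import io
import sqlite3


def quote(name):
    """Quote an SQL identifier so headers containing spaces still work."""
    return '"' + name.replace('"', '""') + '"'


def import_rows(conn, table, csv_text):
    """Load CSV rows (header row first) into `table`, creating it if needed.

    Hypothetical helper for illustration; every column is declared as TEXT.
    Returns the number of rows inserted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if not columns:
        return 0
    column_defs = ", ".join(f"{quote(c)} TEXT" for c in columns)
    column_list = ", ".join(quote(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote(table)} ({column_defs})")
    rows = [[row[c] for c in columns] for row in reader]
    conn.executemany(
        f"INSERT INTO {quote(table)} ({column_list}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


# Made-up sample sheet, exported as CSV.
SHEET = "specimen_id,species,tissue\nS1,Macaca mulatta,liver\nS2,Mus musculus,brain\n"
conn = sqlite3.connect(":memory:")
count = import_rows(conn, "specimen", SHEET)
species = [r[0] for r in conn.execute("SELECT species FROM specimen ORDER BY specimen_id")]
```

Once the rows are in a DBMS they can be queried, access-controlled, and exposed through a web interface, which is the path from spreadsheet to shared database that the use cases describe.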

Virtual Microscopy
Image Processing Tools
– Accepts images in numerous proprietary microscopy formats (Bio-Formats)
– Creates pyramidal, tiled image sets in open format (JPEG) used by Zoomify image viewer
– Extracts proprietary graphical annotation formats and converts them to Zoomify annotation format
Image Server + Database (built on our Database Development tools)
– Monitors an image upload directory, automatically processes images, and updates the image database
– Stores and serves annotations according to associated ACLs
Use Cases
– Scientists need to share high resolution (100X) images over relatively low bandwidth networks.
– Scientists need to annotate and share annotations on images.
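A pyramidal, tiled image set helps on low bandwidth networks because a viewer only has to fetch the few tiles that cover the visible region at the current zoom level. The sketch below computes the levels of such a pyramid. The 256-pixel tile size and the halve-until-one-tile rule are assumptions for illustration and are not taken from the BIRN tools; `pyramid_levels` is a hypothetical name.

```python
import math


def pyramid_levels(width, height, tile=256):
    """Return (width, height, cols, rows) per level, smallest level first.

    Each level halves the previous one until the image fits in one tile.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("dimensions and tile size must be positive")
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        # Halve each dimension, rounding up so no pixels are dropped.
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    levels.reverse()
    return levels


levels = pyramid_levels(1000, 600)
total_tiles = sum(cols * rows for _, _, cols, rows in levels)
```

For a 1000 x 600 image this gives three levels (1, 4, and 12 tiles), so the lower-resolution levels add only a small fraction to the full-resolution tile count.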

Search
Features
– Single-site, full-text indexing of databases and text files.
– Search results returned as Python objects based on Django ORM mappings.
– ‘Advanced search’; i.e., multiple fields and logical expressions.
– Easy to synchronize index with changes to Django-based object model.
Technology
– Based on Xapian and Djapian
Use Case
– Scientists need to search their databases using full-text queries and field-based “advanced” search queries.
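The difference between a full-text query and a field-based "advanced" query can be shown with a toy inverted index. This is a plain-Python illustration only: it does not use Xapian or Djapian, and `TinyIndex` and the sample records are hypothetical.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: token -> record ids, plus (field, token) -> record ids."""

    def __init__(self):
        self.any_field = defaultdict(set)
        self.by_field = defaultdict(set)

    @staticmethod
    def tokens(text):
        """Lower-case the text and split it into alphanumeric tokens."""
        return re.findall(r"[a-z0-9]+", str(text).lower())

    def add(self, record_id, record):
        """Index every field of a record for both kinds of search."""
        for field, value in record.items():
            for tok in self.tokens(value):
                self.any_field[tok].add(record_id)
                self.by_field[(field, tok)].add(record_id)

    def search(self, text="", **fields):
        """AND together full-text terms and field=value terms."""
        postings = [self.any_field.get(t, set()) for t in self.tokens(text)]
        for field, value in fields.items():
            postings += [self.by_field.get((field, t), set()) for t in self.tokens(value)]
        if not postings:
            return set()
        return set.intersection(*postings)


index = TinyIndex()
index.add(1, {"species": "Macaca mulatta", "disease": "colitis",
              "notes": "chronic inflammation of colon"})
index.add(2, {"species": "Mus musculus", "disease": "lymphoma",
              "notes": "colon unaffected"})

full_text = index.search("colon")                     # matches any field
advanced = index.search("colon", species="mulatta")   # adds a field constraint
by_disease = index.search(disease="lymphoma")         # field-only query
```

A production engine adds ranking, stemming, and persistent storage; the sketch only shows why field-qualified postings make multi-field queries cheap to combine.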

Pathology Application
Features
– Pathology Data Model: Subject, Case, Specimen, Tissue, Image entities; Species, Disease, Etiology domain tables
– Pathology data entry workflows: UI forms tailored to the data entry workflow of pathologists
– Microscopy support for proprietary formats: Aperio SVS (supported), Hamamatsu NDPI (partially supported), Olympus VSI (under development)
– And, of course, all of the features from the underlying software and services: Database Development + Virtual Microscopy + Search
– Data Integration using BIRN Mediator: capability to run federated queries across sites
Use Cases
– Pathologists create images in proprietary microscopy formats, annotate the images, and need to upload them to a web-accessible database for sharing among collaborating sites.
– Pathologists associate metadata and supplementary files with their microscopy images.

Screenshots

Information Integration
BIRN supports integration across complex data sources
– Can process a wide variety of structured & semi-structured sources (DBMS, XML, HTML, Excel, SOAP)
Use Schema & Data
– Source Modeling & Record Linkage
Infrastructure capabilities
– Security, Efficient Query Execution, SQL-like syntax across multiple sources
Diagram labels: Decision Support, Application Programs, Mediator, Knowledge Bases, Databases, Computer Programs, The Web.

Information Mediator
Virtual Integration Architecture:
– Virtual organization: community of data providers and consumers that want to share data for a specific purpose
– Autonomous sources: data and control remain at sources; no change to access methods or schemas; data accessed in real time in response to user queries
– Mediator: integrator defines domain schema and describes source contents
  Domain schema: agreed upon view of the domain preferred by the virtual organization
  Source descriptions: logical formulas relating source and domain schemas

BIRN MEDIATOR Project Overview

Project History The BIRN Mediator project began in … The initial user group consists of … Early phases of the project involved … In : Ongoing enhancements to …

Information Mediator
Query Answering
– User writes query in domain schema
– Mediator:
  Determines sources relevant to user query
  Rewrites query in the sources’ schemas
  Breaks query into sub-queries for sources
  Optimizes query evaluation plan
  Combines answers from sources
– Efficient query evaluation: streaming dataflow
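The query-answering steps can be traced in a deliberately small sketch. It is not the BIRN Mediator: source descriptions are reduced to attribute-name mappings rather than logical formulas, there is no plan optimization, and the sources, attribute names, and rows are all made up for illustration.

```python
# Each autonomous source keeps its own schema; rows are plain dicts here.
SOURCES = {
    "site_a": [{"subj": "A1", "age_years": 34, "dx": "control"}],
    "site_b": [{"participant": "B7", "age": 41},
               {"participant": "B9", "age": 29}],
}

# Simplified source descriptions: domain attribute -> source attribute.
DESCRIPTIONS = {
    "site_a": {"subject_id": "subj", "age": "age_years", "diagnosis": "dx"},
    "site_b": {"subject_id": "participant", "age": "age"},
}


def relevant_sources(attributes):
    """Sources whose description covers every requested domain attribute."""
    return [s for s, d in DESCRIPTIONS.items() if all(a in d for a in attributes)]


def rewrite(source, select, where):
    """Translate a domain-schema query into the source's own attribute names."""
    d = DESCRIPTIONS[source]
    return [d[a] for a in select], {d[a]: pred for a, pred in where.items()}


def run_subquery(source, select, where):
    """Evaluate the rewritten sub-query against one source."""
    for row in SOURCES[source]:
        if all(pred(row[col]) for col, pred in where.items()):
            yield tuple(row[col] for col in select)


def answer(select, where):
    """Yield combined answers in the domain schema, one row at a time."""
    needed = list(select) + [a for a in where if a not in select]
    for source in relevant_sources(needed):
        src_select, src_where = rewrite(source, select, where)
        for values in run_subquery(source, src_select, src_where):
            yield dict(zip(select, values), source=source)


over_30 = list(answer(["subject_id", "age"], {"age": lambda a: a > 30}))
controls = list(answer(["subject_id"], {"diagnosis": lambda d: d == "control"}))
```

The user only ever names `subject_id`, `age`, and `diagnosis`; the second query reaches `site_a` alone because `site_b` has no description for `diagnosis`. Using generators mirrors, in a very loose way, the streaming dataflow idea: rows are passed on as soon as a source produces them.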

BIRN Grid-based Virtual Data Integration Architecture (diagram)
Components shown: Client Program, Web Portal, BIRN Gateway, ISI-Mediator (Logical Source Descriptions, Reconcile Semantics / Query Rewriting, Query Optimization / Execution), OGSA-DQP/DAI, OGSA-DAI, Source Wrappers.
Example sources, each internal to its organization and reached over the Internet: Relational DB (Ex: HID), XML DB (Ex: eXist-db), Web Service (Ex: XNAT).
Security: Encryption, Authentication, Authorization; Grid Security Infrastructure (TLS + PKI).

The Information Mediator (diagram)
Components shown: User Queries / Web Portal / Services, Security, Execution Engine, Optimizer, Reformulation, Wrapper, Logical Source Descriptions, Data Sources.
Key: Information Integration ‘Capabilities’, Other BIRN ‘Capabilities’, Application Specific.

Database Consolidation
Construct a ‘virtual organization’
– A community of data providers and consumers sharing data for a common, specific purpose
All sources are autonomous
– Data and control remain at sources
– No change to access methods or schemas; data accessed in real time in response to user queries
Work consists of modeling domain schema and source contents
– Domain schema = agreed upon view of the domain preferred by the virtual organization
– Source descriptions = logical formulas relating source and domain schemas
Implemented in multiple domains:
– fMRI: Ashish, et al. (2010) “Neuroscience Data Integration through Mediation: An (F)BIRN Case Study.” Front. Neuroinf. 4:118
– Cardiovascular Research Grid
– Non-Human Primate Research Consortium
– Child Neurodevelopmental Disorders
Use Cases
– Scientists from different groups want to query across two databases with different schemas.
– Databases may be completely different (e.g., one group uses Excel spreadsheets, another uses Filemaker Pro and a third uses Oracle).

Database Extension
Reapply the mediator technology to sources from different subjects, e.g., link genetics data to imaging data.
Not dependent on a universal, global ontology, but on a locally-defined model specific to the application.
Develop models using Datalog that are then processed by the BIRN Mediator and OGSA-DAI / OGSA-DQP technology.
Use Cases
– Scientists want to bring together data from different sources into a single, common domain model.
– Sources will require linkage at the level of schema and data.
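Linkage "at the level of schema and data" can be illustrated with a small sketch: schema-level linkage maps differently named columns onto one domain attribute, and data-level linkage reconciles differently written identifiers. This is plain Python, not Datalog and not the BIRN Mediator; the column names, identifier formats, and records are invented for the example.

```python
import re


def normalize_id(raw):
    """Reduce an identifier such as 'nhp-007' or 'NHP7' to one canonical form."""
    m = re.fullmatch(r"\s*([A-Za-z]*)[-_ ]*0*(\d+)\s*", raw)
    if m is None:
        return raw.strip().upper()
    prefix, number = m.groups()
    return f"{prefix.upper()}{number}"


def link(genetics, imaging):
    """Join genetics and imaging records on the normalized subject identifier."""
    by_id = {}
    for g in genetics:
        # Schema-level linkage: 'animal' plays the role of subject_id here.
        by_id.setdefault(normalize_id(g["animal"]), []).append(g)
    linked = []
    for img in imaging:
        # ...and 'subject' plays the same role in the imaging source.
        subject_id = normalize_id(img["subject"])
        for g in by_id.get(subject_id, []):
            linked.append({"subject_id": subject_id,
                           "genotype": g["genotype"],
                           "scan": img["scan_file"]})
    return linked


genetics = [{"animal": "nhp-007", "genotype": "AA"},
            {"animal": "nhp-012", "genotype": "AG"}]
imaging = [{"subject": "NHP7", "scan_file": "s7.nii"},
           {"subject": "NHP31", "scan_file": "s31.nii"}]
linked = link(genetics, imaging)
```

Only the subject present in both sources appears in the result; real record linkage usually needs fuzzier matching than this exact-after-normalization rule.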

EZ-Mediator
This is not a particularly difficult task to manage by hand, but automating it is even simpler within our framework.
Small improvements can lead to large gains!
Based on the same mediator technology as previously described.
Use Cases
– Can we, with the absolute minimum of effort, run queries across separate databases that have the same schema?
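The same-schema case is simple enough to sketch directly: send one query to every site and tag each answer with its origin. The example below uses in-memory SQLite databases as stand-ins for remote sites; it is not the EZ-Mediator implementation, and the `scan` table, site names, and rows are hypothetical.

```python
import sqlite3


def make_site(rows):
    """Create an in-memory database standing in for one site's copy of the shared schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scan (subject_id TEXT, modality TEXT)")
    conn.executemany("INSERT INTO scan VALUES (?, ?)", rows)
    return conn


def query_all(sites, sql, params=()):
    """Run the same SQL at every site and tag each row with its site name."""
    results = []
    for name, conn in sites.items():
        for row in conn.execute(sql, params):
            results.append((name,) + tuple(row))
    return results


sites = {
    "site_a": make_site([("A1", "fMRI"), ("A2", "MRI")]),
    "site_b": make_site([("B1", "fMRI")]),
}
fmri = query_all(
    sites,
    "SELECT subject_id FROM scan WHERE modality = ? ORDER BY subject_id",
    ("fMRI",),
)
```

Because the schemas are identical, no query rewriting is needed; the only mediation work is fan-out and combining the answers.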

Automatic Interface Generation
How does this work?
Use Cases
– The mediator technology automatically generates a user interface for a specific task.

Screenshots

Knowledge Engineering Start with the question: “What is an ‘atom’ of scientific knowledge?”

Scientific assertions as ‘Computable, citable elements’
There is a very large number of statements like ‘mice like cheese’, and semantics at this level are complicated! For example:
– “Novel neurotrophic factor CDNF protects midbrain dopamine neurons in vivo” [Lindholm et al 2007]
– “Hippocampo-hypothalamic connections: origin in subicular cortex, not ammon's horn.” [Swanson & Cowan 1975]
– “Intravenous 2-deoxy-D-glucose injection rapidly elevates levels of the phosphorylated forms of p44/42 mitogen-activated protein kinases (extracellularly regulated kinases 1/2) in rat hypothalamic parvicellular paraventricular neurons.” [Khan & Watts 2004]
Assertions vary in their levels of reliability and specificity.
Can we introduce a generalized formalism that could support automated reasoning?

Cycles of Scientific Investigation (‘CoSI’)

e.g., ‘CDNF protects nigral dopaminergic neurons in-vivo’
This statistically significant effect is the experimental basis for the findings of this study.
Our ontology engineering approach is based on experimental variables.
From Lindholm, P. et al. (2007), Nature, 448(7149): p. 73-7

Knowledge Engineering from Experimental Design (‘KEfED’) Khan et al. (2007), J. Neurosci. 27: [expt 2]

KNOWLEDGE ENGINEERING FROM EXPERIMENTAL DESIGN ‘KEfED’ Project Overview

Project History
The KEfED formalism has been under formulation since 2006 and received its first active funding in …
It was initially developed in a demonstration project based on neural connectivity and has since been developed for the Michael J Fox and Kinetics Foundations for Parkinson’s research.
The initial user group consists of laboratory-based neuroanatomists and neuroendocrinologists.
Early phases of the project involved development of initial prototypes to capture a well-understood experimental design and to generate a knowledge base for experimental data from that design.
We have developed numerous prototypes and deployed a working system from the website in March …
Ongoing enhancements to the system include (a) ontology support, (b) the representation of statistical relations and correlations, (c) coordination with the data management and information integration working groups.

BioScholar Application
Develop a knowledge base framework for observations and interpretations from experiments.
– Scientists manually curate data from publications into a generic database driven by the KEfED model
– Can reuse designs for multiple experiments
– Design process is intuitive; can build a database without informatics training
– Ideal for non-computational biologists
– Java / Flex Web application, one click install
Use Cases
– Scientists want to develop a generic knowledge base driven from a corpus of PDF files stored locally within a specific laboratory.
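The idea of a generic database "driven by" an experimental design, with designs reused across experiments, can be sketched as follows. This is a loose illustration and not the KEfED model or BioScholar code: it assumes a design is just a set of named parameters and measurements, and the class names, variable names, citations, and numbers are all made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Design:
    """Hypothetical experiment design: named parameters and measurements."""
    name: str
    parameters: tuple
    measurements: tuple


@dataclass
class Experiment:
    """One curated experiment that follows a (possibly shared) design."""
    design: Design
    citation: str
    observations: list = field(default_factory=list)

    def record(self, **values):
        """Store one observation; it must supply exactly the design's variables."""
        expected = set(self.design.parameters) | set(self.design.measurements)
        if set(values) != expected:
            raise ValueError(f"expected variables {sorted(expected)}")
        self.observations.append(values)

    def lookup(self, measurement, **params):
        """Return values of one measurement for observations matching the parameters."""
        return [o[measurement] for o in self.observations
                if all(o[k] == v for k, v in params.items())]


design = Design("treatment study", ("treatment",), ("cell_count",))

# Two experiments reuse the same design; citations and values are placeholders.
exp1 = Experiment(design, "hypothetical citation A")
exp1.record(treatment="vehicle", cell_count=40)
exp1.record(treatment="drug", cell_count=75)
exp2 = Experiment(design, "hypothetical citation B")

drug_counts = exp1.lookup("cell_count", treatment="drug")
```

Because the design fixes which variables every observation must carry, a curator only fills in values, which is one way to read the claim that a database can be built without informatics training.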

Crux Application
Scientists within a disease foundation must plot a whole research program.
How to keep track of hypotheses, experimental results and outcomes to plan the next phase of the project?
System is just about to start year 2 of funding and is geared towards curation of raw data (not from publications).
System can permit scientists with no computational expertise to develop scientific data repositories.
Project driven by the Kinetics Foundation (with initial funding from M.J. Fox Foundation).
Use Cases
– Decision makers at a disease foundation want to store raw data in a generic knowledge base.

Screenshots

Summary
BIRN provides capability kits and consulting to facilitate internal and external data sharing and other aspects of collaboration.
BIRN takes a bottom-up, modular, capability-based approach that is driven by end user needs.
End users are viewed as collaborators and are actively involved in the development process.

BIRN Program Announcements
Provides funds to work with BIRN on projects relating to data sharing (PAR ) and the associated ontology (PAR ) for the data.
Mechanism to fund the leveraging of BIRN capabilities.
BIRN provides assistance in framing proposals and developing projects.

Contact Information
General:
Project Manager: Joe Ames
Outreach: Karl Helmer
Web:

Acknowledgements
Executive Team
– Carl Kesselman
– Joe Ames
– Karl Helmer
Data Management
– Ann Chervenak
– Rob Schuler
– David Smith
– Laura Pearlman
Information Integration
– Jose Luis Ambite
– Gowri Gumaraguruparan
– Maria Muslea
– Naveen Ashish
Knowledge Engineering
– Jessica Turner
– Tom Russ
Security
– Rachana Ananthakrishnan
Communities
– NonHuman Primate Research Consortium
– FBIRN
– CVRG
– Mouse BIRN
– Mouse Genome Informatics
– Brain Architecture USC
And Many Many More…