An Introduction to the Biomedical Informatics Research Network (BIRN) Gully APC Burns Information Sciences Institute University of Southern California.

This Presentation
– The BIRN Vision
– Project History
– Communities
– Specific Capabilities
  – Data Management: BIRN Pathology
  – Information Integration: BIRN Mediator
  – Knowledge Engineering: ‘KEfED’
http://www.birncommunity.org

The Purpose of BIRN “to advance biomedical research through data sharing and online collaboration”

Challenges … Opportunities
What is the main challenge we face?
– Not just size of datasets …
– Not just communication bandwidth …
– The challenge is heterogeneity.
In 2010, NIH funded 31,232 R01 grants. How do we provide practical data sharing and collaboration solutions at that scale?

Use Cases and Capabilities
Focus on the things our users want.
Develop Capabilities, i.e., our ability to:
– Publish data / Deploy tools
– Rapidly develop new lightweight systems tailored for specialized use cases
– Reconcile data across disparate sources
– Provide security (credentials, encryption)
– Develop and enable scientific reasoning systems

Background and History
– Funded by the National Center for Research Resources in 2001.
– Three neuroimaging testbeds initially provided use cases for the developed technology.
– Reorganized in 2008 to provide software-based solutions for data sharing in the biosciences.
– No longer focused on neuroimaging (or even imaging).
– One can no longer ‘log into BIRN’.
– New model of providing modular capabilities that users can fit into their existing toolsets.

Sites and Collaborators

Technical Approach
Bottom up, not top down
– Focus on user requirements: what they want to do
– Create solutions: factor out common requirements
  – Capability model includes software and process
  – Avoid “Big Design up Front” (BDUF)

Capability Model
Software and services are only useful in bioinformatics if they do things that scientists need.
– What is the problem?
– How do the tools address the problem?
BIRN’s capabilities are defined in terms of problems and solutions, not in terms of software and services.
– This is true from beginning (definition and documentation) to end (quality assurance and assessment).

Dissemination Models
Different problems require different kinds of solutions.
– BIRN operates services for all users; e.g., user registration service
– BIRN provides kits for project teams to deploy services for their members; e.g., data sharing
– BIRN provides downloadable tools for individuals to use on their own; e.g., pathology databases, integration systems, reasoning systems, workflows
Understanding deployment needs is part of defining the problem to be solved.

Working Groups
New capabilities are defined, designed, and disseminated by BIRN Working Groups:
– Operations
– Security
– Data Management
– Information Integration
– Knowledge Engineering
– Workflow Tools
– Genomics

Working with BIRN
– Projects are vetted by the BIRN Steering Committee for appropriateness and resource planning.
– Collaborators are expected to be actively involved in development and deployment.
– Collaborators are not expected to use the complete BIRN capability kit but should pick and choose to suit their own needs.

Some of our Communities …
– FBIRN: schizophrenia neuroimaging, fMRI data
– Nonhuman Primates Research Consortium: colony management, pathology, genetics
– Cardiovascular Research Grid
– Kinetics Foundation: Parkinson’s disease therapeutics
– … and others

Data Management
BIRN supports data-intensive activities including:
– Imaging, Microscopy, Genomics, Time Series, Analytics and more…
BIRN utilities scale to:
– Terabyte-sized data sets
– Files of 100 MB to 2 GB
– Millions of files

Data Service Deployment

BIRN PATHOLOGY Project Overview

Project History
The BIRN Pathology project began in early 2010, with the objective of creating a new shared resource for the research community: an online collection of pathology data, including very high resolution microscopy images. The initial user group consists of pathologists, with ISI developing the infrastructure. Early phases of the project involved rapid prototyping and iteration with the user representatives to build out the core data model, user workflows, and user experience, along with investigation of microscopy tools. In January 2011, version 1.0 of the system was packaged (RPM), documented (wiki), and released (YUM repository). Enhancements to the data model and major work on microscopy image support are still underway, as is refactoring of the system into reusable BIRN components.

BIRN Pathology Technology ‘Stack’
[Figure: technology stack diagram. Component labels: LAMP; Database Development; Virtual Microscopy; Search; Application (e.g., Pathology); Data Integration; Security; File Transfer. Key: Data Management “Capabilities”; Other BIRN “Capabilities”; 3rd Party Tools; Application Specific.]

Database Development
In a nutshell: Django + Extensions
Features
– Django Web Framework: MVC, ORM, HTML templates
– Auto-generated data forms!
– RESTful interface routing
– Spreadsheet batch importer
– Crowd integration (users/groups/SSO)
– Data object access control (ACLs)
– JavaScript UI widgets
– Scheduled database backup
System Requirements
– Linux, Apache, MySQL or Postgres, Python, Django
Use Cases
– Scientists need to create custom web-accessible databases to share data within and between research groups.
– Scientists store data in spreadsheets, need to manage this data (DBMS), and need to expose it programmatically (REST web services).
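
The spreadsheet-to-database use case can be illustrated with a minimal sketch. It is not the BIRN importer: it assumes the spreadsheet has been exported as CSV text, and it uses Python's standard `csv` and `sqlite3` modules as stand-ins for Django and MySQL/Postgres. The function names `import_spreadsheet` and `list_records` are hypothetical.

```python
import csv
import io
import sqlite3


def quote(name):
    """Quote an SQL identifier so header names with spaces are accepted."""
    return '"' + name.replace('"', '""') + '"'


def import_spreadsheet(conn, table, csv_text):
    """Load CSV rows into a table whose columns come from the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    if not columns:
        raise ValueError("spreadsheet has no header row")
    col_sql = ", ".join(quote(c) for c in columns)
    # SQLite allows untyped columns, which suits free-form spreadsheet data.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote(table)} ({col_sql})")
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(
        f"INSERT INTO {quote(table)} ({col_sql}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


def list_records(conn, table):
    """Return every row as a dictionary, ready to serialize (e.g., as JSON)."""
    cursor = conn.execute(f"SELECT * FROM {quote(table)}")
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor]
```

A REST endpoint would then return the output of `list_records` for a given table; in the system described above, that role is played by the Django-based RESTful interface routing.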

Virtual Microscopy
Image Processing Tools
– Accepts images in numerous proprietary microscopy formats (Bio-Formats)
– Creates pyramidal, tiled image sets in an open format (JPEG) used by the Zoomify image viewer
– Extracts proprietary graphical annotation formats and converts them to the Zoomify annotation format
Image Server + Database (built on our Database Development tools)
– Monitors an image upload directory, automatically processes images, and updates the image database
– Stores and serves annotations according to associated ACLs
Use Cases
– Scientists need to share high resolution (100X) images over relatively low bandwidth networks.
– Scientists need to annotate and share annotations on images.
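
To see why a pyramidal, tiled layout helps on low bandwidth networks, the sketch below computes the levels of a tile pyramid and the tiles a viewer would need for one viewport. It rests on assumptions not taken from the slides: a 256-pixel tile size, halving of each dimension (rounded up) per level, and the hypothetical function names `pyramid_levels` and `tiles_in_view`. The exact rules used by Zoomify may differ.

```python
import math


def pyramid_levels(width, height, tile_size=256):
    """Return (width, height, columns, rows) per level, coarsest level first."""
    if width < 1 or height < 1 or tile_size < 1:
        raise ValueError("dimensions and tile size must be positive")
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break  # the whole image now fits in a single tile
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    levels.reverse()
    return levels


def tiles_in_view(x0, y0, x1, y1, tile_size=256):
    """List (column, row) of tiles overlapping the pixel box [x0, x1) x [y0, y1)."""
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

A viewer only fetches the tiles returned by `tiles_in_view` at the level matching the current zoom, so the amount of data transferred depends on the viewport size rather than on the size of the full image.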

Search
Features
– Single-site, full-text indexing of databases and text files.
– Search results returned as Python objects based on Django ORM mappings.
– ‘Advanced search’; i.e., multiple fields and logical expressions.
– Easy to synchronize the index with changes to the Django-based object model.
Technology
– Based on Xapian and Djapian
Use Case
– Scientists need to search their databases using full-text queries and field-based “advanced” search queries.
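
The difference between full-text and field-based queries can be shown with a toy inverted index. This is only an illustration in plain Python; it does not use or imitate the Xapian or Djapian APIs, and the class name `TinyIndex` is hypothetical.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index supporting any-field and per-field term lookups."""

    def __init__(self):
        self._postings = defaultdict(set)  # (field or None, term) -> record ids
        self._records = {}

    @staticmethod
    def _terms(text):
        return re.findall(r"[a-z0-9]+", str(text).lower())

    def add(self, record_id, record):
        """Index every field of a record (a dict of field name -> value)."""
        self._records[record_id] = record
        for field, value in record.items():
            for term in self._terms(value):
                self._postings[(field, term)].add(record_id)
                self._postings[(None, term)].add(record_id)  # any-field entry

    def search(self, text="", **fields):
        """AND together full-text terms and field=value constraints."""
        wanted = [(None, term) for term in self._terms(text)]
        for field, value in fields.items():
            wanted += [(field, term) for term in self._terms(value)]
        if not wanted:
            return []
        ids = set.intersection(
            *(self._postings.get(key, set()) for key in wanted)
        )
        return [self._records[i] for i in sorted(ids)]
```

For example, `index.search("lymphoma", species="rat")` combines a full-text term with a field constraint, which is the simplest form of the ‘advanced search’ described above.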

Pathology Application
Features
– Pathology data model
  – Subject, Case, Specimen, Tissue, Image entities
  – Species, Disease, Etiology domain tables
– Pathology data entry workflows
  – UI forms tailored to the data entry workflow of pathologists
– Microscopy support for proprietary formats
  – Aperio SVS (supported)
  – Hamamatsu NDPI (partially supported)
  – Olympus VSI (under development)
– And, of course, all of the features from the underlying software and services: Database Development + Virtual Microscopy + Search
– Data integration using the BIRN Mediator
  – Capability to run federated queries across sites
Use Cases
– Pathologists create images in proprietary microscopy formats, annotate the images, and need to upload them to a web-accessible database for sharing among collaborating sites.
– Pathologists associate metadata and supplementary files with their microscopy images.

Screenshots

Information Integration
BIRN supports integration across complex data sources
– Can process a wide variety of structured and semi-structured sources (DBMS, XML, HTML, Excel, SOAP)
– Uses schema and data: source modeling and record linkage
– Infrastructure capabilities: security, efficient query execution, SQL-like syntax across multiple sources
[Figure labels: Decision Support; Application Programs; Mediator; Knowledge Bases; Databases; Computer Programs; The Web]

Information Mediator
Virtual Integration Architecture:
– Virtual organization: a community of data providers and consumers that want to share data for a specific purpose
– Autonomous sources: data and control remain at the sources; no change to access methods or schemas; data are accessed in real time in response to user queries
– Mediator: the integrator defines the domain schema and describes source contents
  – Domain schema: agreed-upon view of the domain preferred by the virtual organization
  – Source descriptions: logical formulas relating source and domain schemas

BIRN MEDIATOR Project Overview

Project History The BIRN Mediator project began in … The initial user group consists of … Early phases of the project involved … In : Ongoing enhancements to …

Information Mediator
Query Answering
– User writes a query in the domain schema
– Mediator:
  – Determines the sources relevant to the user query
  – Rewrites the query in the sources’ schemas
  – Breaks the query into sub-queries for the sources
  – Optimizes the query evaluation plan
  – Combines answers from the sources
– Efficient query evaluation
  – Streaming dataflow
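
The query-answering steps can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the BIRN Mediator: source descriptions are reduced to attribute renamings (the real ones are logical formulas), sources are in-memory lists, and there is no optimizer. The source names, field names, and functions (`relevant_sources`, `rewrite`, `answer`) are all hypothetical.

```python
# Each source keeps its own field names; "mapping" plays the role of a
# (much simplified) source description: domain attribute -> source field.
SOURCES = {
    "site_a": {
        "mapping": {"subject_id": "SubjID", "age": "AgeYears", "diagnosis": "Dx"},
        "rows": [
            {"SubjID": "A1", "AgeYears": 34, "Dx": "control"},
            {"SubjID": "A2", "AgeYears": 41, "Dx": "schizophrenia"},
        ],
    },
    "site_b": {
        "mapping": {"subject_id": "id", "age": "age"},  # no diagnosis field
        "rows": [{"id": "B1", "age": 29}],
    },
}


def relevant_sources(sources, attributes):
    """Step 1: keep only sources that can supply every requested attribute."""
    return [
        name
        for name, src in sources.items()
        if all(attr in src["mapping"] for attr in attributes)
    ]


def rewrite(mapping, select, where):
    """Step 2: translate domain attribute names into one source's fields."""
    fields = [mapping[attr] for attr in select]
    conditions = {mapping[attr]: value for attr, value in where.items()}
    return fields, conditions


def answer(sources, select, where=None):
    """Steps 3-5: run a sub-query per source and yield the combined answers."""
    where = where or {}
    needed = list(select) + [attr for attr in where if attr not in select]
    for name in relevant_sources(sources, needed):
        src = sources[name]
        fields, conditions = rewrite(src["mapping"], select, where)
        for row in src["rows"]:  # the sub-query, evaluated at the source
            if all(row[f] == v for f, v in conditions.items()):
                record = {attr: row[f] for attr, f in zip(select, fields)}
                record["source"] = name
                yield record
```

Because `answer` is a generator, rows are handed to the caller as each source produces them, a loose nod to the streaming dataflow mentioned above. A query on `diagnosis` is routed only to `site_a`, since `site_b` has no description for that attribute.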

BIRN Grid-based Virtual Data Integration Architecture
[Figure: architecture diagram. Labels: Client Program; Web Portal; BIRN Gateway; ISI-Mediator; Logical Source Descriptions; Reconcile Semantics / Query Rewriting; Query Optimization / Execution; Source Wrappers; OGSA-DQP/DAI; OGSA-DAI; Grid Security Infrastructure (TLS + PKI); Internet; Internal to Organization; Relational DB (Ex: HID); XML DB (Ex: eXist-db); Web Service (Ex: XNAT); Security: Encryption, Authentication, Authorization.]

The Information Mediator
[Figure: component diagram. Labels: User Queries / Web Portal / Services; Security; Reformulation; Optimizer; Execution Engine; Wrapper; Logical Source Descriptions; Data Sources. Key: Information Integration ‘Capabilities’; Other BIRN ‘Capabilities’; Application Specific.]

Database Consolidation
Construct a ‘virtual organization’
– A community of data providers and consumers sharing data for a common, specific purpose
– All sources are autonomous
  – Data and control remain at the sources
  – No change to access methods or schemas; data are accessed in real time in response to user queries
– Work consists of modeling the domain schema and source contents
  – Domain schema = agreed-upon view of the domain preferred by the virtual organization
  – Source descriptions = logical formulas relating source and domain schemas
Implemented in multiple domains
– fMRI: Ashish, et al. (2010) “Neuroscience Data Integration through Mediation: An (F)BIRN Case Study.” Front. Neuroinf. 4:118
– Cardiovascular Research Grid
– Non-Human Primate Research Consortium
– Child Neurodevelopmental Disorders
Use Cases
– Scientists from different groups want to query across two databases with different schemas.
– Databases may be completely different (e.g., one group uses Excel spreadsheets, another uses FileMaker Pro, and a third uses Oracle).

Database Extension
– Reapply the mediator technology to sources from different subjects; e.g., link genetics data to imaging data.
– Not dependent on a universal, global ontology, but on a locally defined model specific to the application.
– Develop models using Datalog that are then processed by the BIRN Mediator and OGSA-DAI / OGSA-DQP technology.
Use Cases
– Scientists want to bring together data from different sources into a single, common domain model.
– Sources will require linkage at the level of schema and data.

EZ-Mediator
– This is not a particularly difficult task to manage by hand, but automating it is even simpler within our framework.
– Small improvements can lead to large gains!
– Based on the same mediator technology as previously described.
Use Cases
– Can we, with the absolute minimum of effort, run queries across separate databases that have the same schema?
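
The same-schema case is simple enough to sketch directly: send one query to every database and tag each row with the site it came from. The example below uses in-memory SQLite databases as stand-ins for the separate sites; it does not use the BIRN Mediator, and the table, site names, and the `federated_query` helper are hypothetical.

```python
import sqlite3


def federated_query(connections, sql, params=()):
    """Run one query on every database and concatenate the site-tagged rows."""
    combined = []
    for site, conn in connections.items():
        for row in conn.execute(sql, params):
            combined.append((site,) + tuple(row))
    return combined


def make_site(rows):
    """Create an in-memory database with the shared (assumed) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimen (id TEXT, species TEXT)")
    conn.executemany("INSERT INTO specimen VALUES (?, ?)", rows)
    return conn


sites = {
    "lab_one": make_site([("s1", "mouse"), ("s2", "rat")]),
    "lab_two": make_site([("s9", "mouse")]),
}
mice = federated_query(
    sites, "SELECT id FROM specimen WHERE species = ?", ("mouse",)
)
```

Because every site shares one schema, no query rewriting is needed; the only mediation work left is fan-out and combination, which is why this case needs so little effort.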

Automatic Interface Generation
How does this work?
Use Cases
– The mediator technology automatically generates a user interface for a specific task.

Screenshots

Knowledge Engineering Start with the question: “What is an ‘atom’ of scientific knowledge?”

Scientific assertions as ‘Computable, citable elements’
There is a very large number of statements like ‘mice like cheese’, and semantics at this level are complicated! For example:
– “Novel neurotrophic factor CDNF protects midbrain dopamine neurons in vivo” [Lindholm et al 2007]
– “Hippocampo-hypothalamic connections: origin in subicular cortex, not ammon's horn.” [Swanson & Cowan 1975]
– “Intravenous 2-deoxy-D-glucose injection rapidly elevates levels of the phosphorylated forms of p44/42 mitogen-activated protein kinases (extracellularly regulated kinases 1/2) in rat hypothalamic parvicellular paraventricular neurons.” [Khan & Watts 2004]
Assertions vary in their levels of reliability and specificity. Can we introduce a generalized formalism that could support automated reasoning?

Cycles of Scientific Investigation (‘CoSI’)

e.g., ‘CDNF protects nigral dopaminergic neurons in vivo’
– This statistically significant effect is the experimental basis for the findings of this study.
– Our ontology engineering approach is based on experimental variables.
From Lindholm, P. et al. (2007), Nature, 448(7149): p. 73-7

Knowledge Engineering from Experimental Design (‘KEfED’) Khan et al. (2007), J. Neurosci. 27:7344-60 [expt 2]

KNOWLEDGE ENGINEERING FROM EXPERIMENTAL DESIGN ‘KEfED’ Project Overview

Project History
The KEfED formalism has been under formulation since 2006 and received its first active funding in 2007. It was initially developed in a demonstration project based on neural connectivity and has since been developed for the Michael J. Fox and Kinetics Foundations for Parkinson’s research. The initial user group consists of laboratory-based neuroanatomists and neuroendocrinologists. Early phases of the project involved development of initial prototypes to capture a well-understood experimental design and to generate a knowledge base for experimental data from that design. We developed numerous prototypes and deployed a working system from the http://www.bioscholar.org website in March 2011. Ongoing enhancements to the system include (a) ontology support, (b) the representation of statistical relations and correlations, and (c) coordination with the data management and information integration working groups.

BioScholar Application
– Develop a knowledge base framework for observations and interpretations from experiments.
– Scientists manually curate data from publications into a generic database driven by the KEfED model.
– Designs can be reused for multiple experiments.
– The design process is intuitive: a database can be built without informatics training.
– Ideal for non-computational biologists.
– Java / Flex Web application, one-click install
Use Cases
– Scientists want to develop a generic knowledge base driven from a corpus of PDF files stored locally within a specific laboratory.

Crux Application
– Scientists within a disease foundation must plot a whole research program.
– How can they keep track of hypotheses, experimental results, and outcomes to plan the next phase of the project?
– The system is just about to start year 2 of funding, geared towards curation of raw data (not from publications).
– The system can permit scientists with no computational expertise to develop scientific data repositories.
– Project driven by the Kinetics Foundation (with initial funding from the M.J. Fox Foundation).
Use Cases
– Decision makers at a disease foundation want to store raw data generated a generic knowledge base driven from a corpus of PDF files stored locally within a specific laboratory

Screenshots

Summary
– BIRN provides capability kits and consulting to facilitate internal and external data sharing and other aspects of collaboration.
– BIRN takes a bottom-up, modular, capability-based approach that is driven by end user needs.
– End users are viewed as collaborators and are actively involved in the development process.

BIRN Program Announcements
– Provide funds to work with BIRN on projects relating to data sharing (PAR-07-426) and the associated ontology (PAR-07-425) for the data.
– Mechanism to fund the leveraging of BIRN capabilities.
– BIRN provides assistance in framing proposals and developing projects.

Contact Information
– General: info@birncommunity.org
– Project Manager: Joe Ames, jdames@uci.edu
– Outreach: Karl Helmer, helmer@nmr.mgh.harvard.edu
– Web: http://www.birncommunity.org and https://wiki.birncommunity.org:8443/

Acknowledgements
– Executive Team: Carl Kesselman, Joe Ames, Karl Helmer
– Data Management: Ann Chervenak, Rob Schuler, David Smith, Laura Pearlman
– Information Integration: Jose Luis Ambite, Gowri Gumaraguruparan, Maria Muslea, Naveen Ashish
– Knowledge Engineering: Jessica Turner, Tom Russ
– Security: Rachana Ananthakrishnan
– Communities: NonHuman Primate Research Consortium, FBIRN, CVRG, Mouse BIRN, Mouse Genome Informatics, Brain Architecture Group @ USC
And Many Many More…
