All Hands Meeting 2004 BIRN Coordinating Center Status Report Mark Ellisman Philip Papadopoulos
What is BIRN?
Biomedical Informatics & Research
Biocomplexity Discovery and Systems research approaches complement hypothesis-based research
Integrative, multidisciplinary team approach adapted for complex queries versus focused approach for hypothesis-driven research
Team approach more dependent on advanced technologies and instrumentation which generate large data sets
Information management at core of biomedical research for 21st century and beyond
Overview of the BIRN-CC Roadmap
Deliver and maintain a robust and scalable PRODUCTION Grid for the collaborative sharing, analysis and interrogation of biomedical data
Provide system integration to bring user applications into BIRN
Provide a consistent and scalable software delivery mechanism
Facilitate the use of advancing information technologies by biomedical scientists - “Cyberinfrastructure” and the “Grid”
Be the biomedical applications driver framing requirements for the rapidly evolving GRID infrastructure
“Enforce the AEIOU’s: Accessibility, Extensibility, Interoperability, Openness, Usability, Scalability”
Integrated Cyberinfrastructure System meeting the needs of multiple communities
Source: Dr. Deborah Crawford, Chair, NSF CyberInfrastructure Working Group
[Diagram labels: Hardware; Distributed Resources (computation, communication, storage, etc.); Grid Services & Middleware; Development Tools & Libraries; Shared Cybertools (software); Domain-specific Cybertools (software); Applications (Environmental Science, High Energy Physics, Biomedical Informatics, Geoscience); Education and Training; Discovery & Innovation]
BIRN Core Software Infrastructure
BIRN builds on evolving community standards for middleware
Adds new capabilities required by projects
Does system integration of domain-specific tools, building a distributed infrastructure
Utilizes commodity hardware and stable networks for baseline connectivity
[Diagram label: Distributed Resources]
BIRN Core Software Infrastructure: diagram layers
Grid Services & Middleware
Development Tools & Libraries
Shared Tools for Multiple Science Domains
Distributed Computing, Instruments and Data Resources
Your Specific Tools & User Apps.
Friendly Work-Facilitating Portals
Authentication - Authorization - Auditing - Workflows - Visualization - Analysis
BIRN Core Software Infrastructure: science domains sharing the tools
Biomedical Informatics “BIRN”
High Energy Physics “GriPhyN”
Geosciences “GEON”
Bays and Rivers (Moore Found.)
Earthquake “NEES”
Ocean Observing “Looking”
BIRN is Pioneering: We are Making Unique and Fundamental Contributions to Establish Working GRIDs
BIRN is setting an example for other Grid project deployments [i.e. use of Rocks and automated distribution mechanisms]
  GEON, etc…
BIRN is a driver application for other major GRID initiatives
  Common security APIs being used within BIRN, Telescience, GEON
  OptIPuter - research into next generation networking - BIRN is the Bioscience Driver
  Drives requirements to the Global Grid Forum and Internet2 development efforts
Grid Infrastructure in Action
The Grid is already having an impact…
  Many projects in many subjects: Life sciences, Medicine, Environment, Engineering, Materials, Chemistry, Physics
  BIRN embodies the most innovative use of data, metadata & portals
BIRN cited as successful model of grid computing.
The Grid is becoming the backbone for collaborative science and data sharing
BIRN Infrastructure Provides…
high performance connectivity between distributed resources (computation and data storage)
  JHU utilizing TeraGrid resources pulling data from SRB
secure access to large volumes of distributed data
distributed high performance computing resources
  BIRN just received an NSF Large Resource Allocation Committee award (450,000 service units, i.e. processor hours)
frameworks (standards, APIs, services) for the integration and interoperation of tools, users, data and computing resources
  Improved high level “wrapper” tools
  common authentication protocol
Access to Grid Resources
Intuitive user interfaces to access grid-based computational analyses
Transparent access to distributed data found within the BIRN Data Grid
Case Study: JHU - LDDMM grid computing launched from BIRN Portal
  Semi-automatic shape analysis study utilizing compute-intensive analyses (i.e. Large Deformation Diffeomorphic Metric Mapping)
BIRN has the Advantage of having Developed an “End-to-End” Infrastructure in the context of distributed biomedical research projects.
Consists of all the components required to effectively share and collaboratively explore data
  The BIRN Rack (BIRN site infrastructure)
  The BIRN Virtual Data Grid
  The BIRN Mediation Infrastructure
  The BIRN Portal
The system integration, development, deployment and management of this infrastructure is the main focus of activities within the BIRN Coordinating Center
Improving the BIRN Environment
Continually improve the BIRN software infrastructure (i.e. performance, robustness, end-to-end integration, and interoperability)
Standardize the software delivery process by providing twice-yearly scheduled software releases: April & October
  Develop internal processes for alpha, beta and production releases
  Instantiate robust development, staging and production environments
  Improved documentation and tutorials for all components
Provide automated deployment mechanisms
BIRN Portal
Updated BIRN Portal with new and improved features currently in production
Worked with test beds to improve the usability and performance of the BIRN Portal
  Improved performance
  Updated Portal API for more robust operation
  Implemented guest pages and accounts
  Enhanced security and integration with the BIRN Authentication infrastructure
  Updated look and feel for improved usability
Providing online documentation & tutorials
The problem with portals … is that you rely on the integrity of the gatekeeper
Benefits of a Data Grid
Uniform interface for connecting to heterogeneous distributed data resources
  Allows for any “grid enabled” tool to interact with data no matter where it is located or what it is located on
Allows for the seamless creation and management of distributed data sets
  Distributed data appear as a single managed collection both to users and tools
Access is managed using GRID Authentication through BIRN Portal
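The "single managed collection" idea on this slide can be illustrated with a minimal Python sketch. This is not SRB or any BIRN interface: the `LogicalCatalog` and `Replica` names, the site names, and all paths are hypothetical, chosen only to show how a logical namespace can hide where the bytes actually live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Where one copy of a data object physically lives (hypothetical fields)."""
    site: str
    physical_path: str


class LogicalCatalog:
    """Toy catalog mapping logical paths to physical replicas."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_path, replica):
        # A logical path may have several replicas at different sites.
        self._entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path):
        # Return the first registered replica; raise if the path is unknown.
        replicas = self._entries.get(logical_path)
        if not replicas:
            raise KeyError(logical_path)
        return replicas[0]

    def listing(self, prefix):
        # Users and tools see one collection, whatever the storage site.
        return sorted(p for p in self._entries if p.startswith(prefix))


catalog = LogicalCatalog()
catalog.register("/birn/mouse/scan001.img", Replica("site-a", "/export/data/0001.img"))
catalog.register("/birn/mouse/scan002.img", Replica("site-b", "/vol2/raw/abc.img"))

print(catalog.listing("/birn/mouse"))
print(catalog.resolve("/birn/mouse/scan002.img"))
```

A tool written against `listing` and `resolve` never needs to know which site holds a file, which is the property the slide attributes to a data grid.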
Security: Access and Audit Intuitive interfaces to core infrastructure (e.g. the BIRN Virtual Data Grid) and services (e.g. full auditing on BIRN data or image viewing)
Google is not a portal………
Carrot juice cures piles
A result?
From Ken Peach, Rutherford Labs UK
If you dig deep enough you may get what you want (but perhaps not exactly what you need)
Carrot juice cures piles
1,680
Drink a juice of turnip leaves, spinach, water cress and carrots (equal quantity)
Example of Data Mediation within BIRN Find all joint projects between UCSD and Duke w/ relevance to Lewy Body Disease
Benefits of Data Mediation
Provide means to locate, access and interrogate data contained in distributed databases
Can add new resources without modifying existing data resources
Promote flexible views on top of the data
Semantically and spatially integrate multi-scale and multi-modal data
BIRN Data Mediation
Version 2.0 of the BIRN mediator is currently in alpha testing (i.e. as a core component of the BIRN 2.0 release)
Improvements to the new release
  Enhanced query performance
  Updated registration, query and view building tools
  Support for PostgreSQL databases
  Integrated with BIRN authentication infrastructure
BIRN-CC is exploring additional data mediation approaches with collaborators
  Yale - Query Integrator System (QIS)
  GEON - IBM Information Integrator
From Vision to Reality
“It’s all in the software”
“It’s not a bug, it’s a feature”
“That will be in the next version”
“When is the next version?”
“I just want to open a file”
“I need to monitor and control who accesses my data”
“How do I locate data of interest to me?”
“I need a boatload of computing, how do I find it?”
“Why the heck isn’t this easier?”
BIRN Grid Testbed Sites
New sites and collaborative projects are being added
We Began with Standard Hardware
This jumpstarted BIRN for functionality
Software footprint is managed from the BIRN Coordinating Center
  Integration of domain tools, middleware, OS, updates, and more
BIRN expansion/upgrade of existing sites must have a more generic (and less expensive) hardware footprint
BIRN CC Software Concerns & Operations
Deploy/Manage/Update Common Services
  Portal/Website
  Security Infrastructure
  Metadata Catalog (SRB MCAT)
  Mediator Registry
  Source code repository
  Java Application Servers
Deploy/Manage/Update Site Racks
  Enterprise Linux
  Databases and Data Grid Clients
  Mediated Data Resources
  BIRN applications (e.g. LONI, 3D-Slicer, FreeSurfer, …)
“It’s all in the Software”
Critical Issues
  What is the BIRN Software Stack?
  When is it updated?
  What services are supported?
Integrated releases of all BIRN software
  Defining components: input/software from all of BIRN
  Candidate software: 3 months prior to release
  Alpha phase (functionality freeze): 2 months prior to release
Defined schedule
  April/October releases, with a 1 month beta cycle
Pre-alpha is defined now: part of this meeting should be to prioritize components for April ’05.
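The release cadence on this slide amounts to simple date arithmetic, sketched below in Python. Two things are assumptions rather than facts from the slide: that a release lands on the first day of April or October, and that the one-month beta cycle starts one month before the release. The function names are illustrative only.

```python
from datetime import date


def months_before(d, n):
    """Return the first day of the month that is n months before d."""
    index = d.year * 12 + (d.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def release_milestones(release):
    """Milestones counted back from a release date (assumed day-of-month 1)."""
    return {
        "candidate software": months_before(release, 3),
        "alpha (functionality freeze)": months_before(release, 2),
        "beta (assumed start)": months_before(release, 1),
        "release": release,
    }


april_05 = release_milestones(date(2005, 4, 1))
for name, when in april_05.items():
    print(f"{name}: {when.isoformat()}")
```

Under these assumptions, candidate software for the April ’05 release would be due in January 2005, which is consistent with the slide's point that prioritization needs to start at this meeting.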
More on Software Releases
Defined release cycles are intended to
  Provide software stability for users and developers
  Allow everyone to plan on when system changes will occur
As a whole, BIRN will need to prioritize what goes into a release
  There are limited people and testing resources
Transferring software is not a trivial task
  Packaging uncovers system assumptions
  We use Rocks to define “appliances”
  100% automated configuration of endpoints and services
  BIRN tools need to be transferable to other NIH projects
I Just Want to Open a File …
BIRN has been built upon data collections
  Data was copied in/out of data grid
  Metadata allows transparent location/querying
  Requires scripts/changes to code
Distributed File System Layer
  Experimenting with AFS
  Feasibility/performance of developing SRBFS not clear
[Diagram labels: BIRN Application Workflow; Mediated Data; Data Collection DB; Distributed File System; Local/NFS File System; Mediator - under development (v 2.0); Oracle/Postgres - SRB; Standard OS]
I Need to Monitor Access to My Files
Authentication (identification)
  GSI certificates
  Managed transparently by the Portal (username/password)
  Have developed a Java class to encapsulate GSI functionality to ease the development of GSI-aware software
Access control (authorization) is already built in to Data Collection Management
As we introduce other data modalities, we need to develop a vocabulary that is useful
  Translate to specific software systems
  e.g. SRB, Oracle/Postgres table security, AFS, GridFTP, …
There is a dearth of community tools to build upon here
  BIRN can help drive the community
I need to locate data of interest to me
Two ways now:
  Metadata attached to collection-managed data
    “Retrieve all DAT-KO MRI images”
  Data Mediator
    Gives the illusion of a single database
    New relationships among separate databases
What about distributed file systems?
  You get pathnames only
  You can vi (or emacs) a file; that is, read/write/open/close works as expected
  It is reasonable to look at a DFS as a stepping stone
  Very useful as community working directories where metadata is less important, but access control is critical
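The contrast between metadata-driven lookup and pathname-only access can be sketched in a few lines of Python. This is an illustration under assumptions, not the SRB query interface: the `DataObject` type, the `strain` and `modality` attribute names, and the file paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A collection-managed object with attached metadata (hypothetical schema)."""
    logical_path: str
    metadata: dict = field(default_factory=dict)


def find(collection, **criteria):
    """Return objects whose metadata matches every key/value in criteria."""
    return [
        obj for obj in collection
        if all(obj.metadata.get(key) == value for key, value in criteria.items())
    ]


collection = [
    DataObject("/birn/mouse/a.img", {"strain": "DAT-KO", "modality": "MRI"}),
    DataObject("/birn/mouse/b.img", {"strain": "DAT-KO", "modality": "EM"}),
    DataObject("/birn/mouse/c.img", {"strain": "wild-type", "modality": "MRI"}),
]

# "Retrieve all DAT-KO MRI images" expressed as a metadata query.
hits = find(collection, strain="DAT-KO", modality="MRI")
print([obj.logical_path for obj in hits])
```

With a plain distributed file system only `logical_path` would be available, so the same question would have to be answered from naming conventions in the pathnames.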
I need a boatload of computing …
JHU has been experimenting with using TeraGrid and loading data into the BIRN Data Grid
  Their storage resources are at 90+%
Condor is deployed on Racks, but we need to look at use cases and utility
Automated data management (“move my data to the computing”) is still clumsy at best
Pathfinder applications help to more crisply define the software stack
Why the heck isn’t this easier?
It really hasn’t been done before
A significant number of dimensions
  Application usage
  Security requirements
  Scale of data and of distributed systems
Software is evolving to be more robust
Cyberinfrastructure architecture has converged to a services-based implementation
  Grid Services -> Web Services (within the year)
We’ve needed a software rallying point
  Regular release schedule should help provide the pacing that we need
~2000 years old and still readable without technology The Forest of Stones, Xi’an
Evolution of the Computational Infrastructure
A timeline from the Computational Infrastructure Division of the US National Science Foundation
Source: Dr. Deborah Crawford, Chair, NSF CyberInfrastructure Working Group (CIWG)
[Timeline labels: Prior Computing Investments; NSF Networking; Supercomputer Centers (SDSC, NCSA, PSC, CTC); PACI (NPACI and Alliance); Terascale (TCS, DTF, ETF); Cyberinfrastructure; Mosaic - Web Browser; GRID Term Coined ~ Metacomputing; Telescience: Access to Remote Resources]