Domain Specific Software Architectures for Science
Lecture for Software Architectures, USC 578
Dan Crichton, April 2010

Presentation transcript:

Dan Crichton April 2010

Topics
  Introduction – who am I?
  Architecture – what it means to me
  Challenges in Developing Architectures
  Reference Architecture vs Domain Specific Software Architectures
  Experience in Science
  Lessons Learned
  Q&A

Who am I?
  Employed by Jet Propulsion Laboratory since 1995; prior software engineering positions at Hughes Aircraft Company and in private industry
  MS in Computer Science, USC; 20+ years of experience
  Program Manager & Principal Computer Scientist for
    Planetary Data System Engineering in Solar System Exploration Directorate
    Data Systems and Technology in Earth and Technology Directorate
  Principal Investigator for
    Informatics Center, Early Detection Research Network, National Cancer Institute
    Facilitating Integration of NASA and Earth System Grid, NASA
    Object Oriented Data Technology
  Several co-Investigator Tasks

Architecture: why do I care?
  Architecture is a game changer in our business
  Enable scientific discovery, novel engineering, etc.
  Coordination across multiple enterprises
  Data system costs per mission, project, investigation, etc. are high
  Technology infusion is limited
  Experience and knowledge reuse

But, there are challenges
  Lack of true architects
  Most think of point solutions or confuse architecture and implementation
  Abstracting is difficult
  Governance is often at a project level; little view at an enterprise level
  Limited planning and understanding of the reference requirements

Architects: what are they?
  Effective Architects have…
    Years of experience
    Holistic view of domain
      – Look at both aesthetics and practical details
      – Variable technical depth
    Lifecycle roles
      – Strong involvement up-front
      – May oversee development
      – Chooses stable steps in development
  Effective Architects are not…
    Lone inventors or scientists
      – The architect is a good communicator and politician: architectures must be sold and explained and their integrity maintained
      – Architecting is not a science, but depends on science
    Purely technologists
      – Architecture is a strategy
    “Top level only” designers
      – Details are often critical
    Collaborators
      – A coherent vision is critical; they drive it

Architecture: what is it?
  The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. (ANSI/IEEE Std )

Communicating an architecture
  A good architecture is one that can be communicated to the stakeholders
  A good architecture presents viewpoints of the system that address stakeholder concerns
  A good architecture uses models and descriptions that are relevant to the stakeholders
  Different models may be used to present different viewpoints (e.g., a UML model of the system may be appropriate for some but not all stakeholders)

Viewpoints and views
  A viewpoint is a template for constructing a view
    Enterprise, Functional, Informational, etc.
  A view is a description of the entire system from the perspective of a set of related concerns. A view is composed of one or more models.
  A model is an abstraction or representation of some aspect of a thing
  Examples: RM-ODP, FEAF, TOGAF, etc.
  The viewpoint is where you look from
  The view is what you see
  (Project Managers, Engineers, Scientists, Business Analysts, …)
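
The relationship between viewpoints, views and models on this slide can be sketched as a small data structure. This is only an illustration: the class names, field names and example values below are assumptions made for the sketch and do not come from RM-ODP, FEAF, TOGAF or the lecture.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """An abstraction of some aspect of the system (hypothetical fields)."""
    name: str
    notation: str


@dataclass(frozen=True)
class Viewpoint:
    """A template for a view: who looks, and which concerns must be addressed."""
    name: str
    stakeholders: tuple[str, ...]
    concerns: tuple[str, ...]


@dataclass
class View:
    """What the stakeholders see: the system described through one viewpoint."""
    viewpoint: Viewpoint
    models: list[Model] = field(default_factory=list)

    def addresses(self, concern: str) -> bool:
        # A view only claims the concerns its viewpoint was defined for.
        return concern in self.viewpoint.concerns


# Example values are illustrative only.
informational = Viewpoint(
    name="Informational",
    stakeholders=("Scientists", "Engineers"),
    concerns=("data semantics", "metadata"),
)
view = View(informational, [Model("science product model", "UML")])
```

Keeping the viewpoint separate from the view mirrors the slide: the same viewpoint can be reused across systems, while each view carries the models specific to one system.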

Reference Architectures
  Show components, functions, and interfaces at a high level of abstraction
  Likewise, we consider information models to also be part of a reference architecture (at a sufficiently abstract level)
  In observing systems, the information model patterns are highly compatible as a reference information model
  Implementation neutral; architectural frameworks can be useful in defining a structure for a reference architecture
  We use Reference Architectures to give us a strategic advantage as well as improve enterprise scale software

Domain Specific Software Architectures*
  Domain model
    Leverage experts who have the “holistic” view and can drive the need for product lines
    An unambiguous view is critical (in fact, this has been a problem in science arenas)
  Reference requirements
    Drive the reference architecture
    However, it is critical to map domain models to reference requirements in order to understand the solution space
  Reference architecture
    Satisfies an abstracted set of functions from the reference requirements
    It’s engineered for the “ilities”: reusability, extensibility and configurability
    It demonstrates the separation of functional elements of the architecture
  * Tracz, Will, Domain-Specific Software Architecture, ACM SIGSOFT, 1995

RAs vs DSSAs in Science
  In science data systems, construction of multiple architecture viewpoints of a system is critical
    Process/Enterprise
    Information/Data
    Technology
  We find the “viewpoints” are similar, but models can be domain specific
  This is the opportunity to develop a reusable reference architecture if the “patterns” can be extracted

Scientific data systems
  Covers a wide variety of disciplines
    Solar system exploration
    Astrophysics
    Earth science
    Biomedicine
    etc.
  Each has its own communities, standards and systems
  But, there is an underlying reference architecture and discipline software architectures in each!

The “e-science” trend
  Highly distributed, multi-organizational systems
  Systems are moving towards loosely coupled systems or federations in order to solve science problems which span center and institutional environments
  Sharing of data and services which allow for the discovery, access, and transformation of data
  Systems are moving towards publishing of services and data in order to address data and computationally-intensive problems
  Infrastructures which are being built to handle future demand
  Address complex modeling, inter-disciplinary science and decision support needs
  Need a dynamic environment where data and services can be used quickly as the building blocks for constructing predictive models and answering critical science questions
  Changing the way in which data analysis is performed
  Moving towards analysis of distributed data to increase the study power
  Enabling greater collaboration across centers

Context: Space data systems
  [Diagram. Elements: External Science Community; Science Team; Data Acquisition and Command; Mission Operations; Instrument/Sensor Operations; Science Data Processing; Science Data Archive; Data Analysis and Modeling; Relay Satellite; Spacecraft / lander; Spacecraft and Scientific Instruments. Information exchanged: Primitive Information Object; Simple Information Object; Telemetry Information Package; Science Information Package; Instrument Planning Information Object; Planning Information Object; Science Products - Information Objects. Foundation: Common Meta Models for Describing Space Information Objects; Common Data Dictionary end-to-end]

Earth Science Data Systems
  [Diagram labels: DS Mission #1; DS Mission #2; SMAP, Desdyni; Science Processing Center 1; Science Processing Center 2; Archive & Distribution (DAAC 1); Archive & Distribution (DAAC 2); PO.DAAC; Distributed Data Analysis (Subsetting, Gridding, Transformation, Modeling); Other Data Sources (e.g. NOAA); Users; Infrastructure to support Analysis of Distributed Data]

Cancer research

Patterns in scientific data systems
  Instrument and Spacecraft Commands
  Instruments that capture observations
  Generation of Engineering and Science Data Products
  Data Processing
  Data Management
  Data Distribution
  Distributed Facilities
  Data Movement

Finding the reference architecture
  Simple SOA-style pattern
  Data/Information Architecture
  Components, middleware, and communication
  NOTE: Process is implicit here

“Ilities” in science data systems
  Usability
  Diversity within the domain
  Scalability
  Reliability
  Portability
  NOTE: Our reference architecture must address these ilities long term

Specialization within domains
  Domain information models
    Planetary Science Ontology
    Cancer Biomarker Ontology
    Etc.
  Specific services and domain implementations are derived from the reference architecture
    Reference Architecture -> Domain Specific Software Architecture -> Domain Implementations
  In these science domains, the architectures need to be long-lived (20+ years)
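
One way to picture the chain from reference architecture to domain-specific architecture is an abstract service that fixes the common behaviour, with each discipline supplying only its own information model. The sketch below is a loose illustration under that assumption; the class names and metadata keys are hypothetical and are not taken from the Planetary Data System or EDRN.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ArchiveService(ABC):
    """Reference-architecture level: an implementation-neutral archive function."""

    @abstractmethod
    def required_metadata(self) -> set[str]:
        """Each domain supplies the metadata its information model demands."""

    def ingest(self, product: dict) -> bool:
        # Shared rule for every discipline: reject products with missing metadata.
        return self.required_metadata().issubset(product)


class PlanetaryArchiveService(ArchiveService):
    """Domain-specific level, bound to a (hypothetical) planetary model."""

    def required_metadata(self) -> set[str]:
        return {"target", "instrument", "mission"}


class BiomarkerArchiveService(ArchiveService):
    """Domain-specific level, bound to a (hypothetical) biomarker model."""

    def required_metadata(self) -> set[str]:
        return {"biomarker", "organ", "study"}


planetary = PlanetaryArchiveService()
accepted = planetary.ingest(
    {"target": "Mars", "instrument": "camera", "mission": "example"}
)
rejected = planetary.ingest({"target": "Mars"})
```

The ingest rule lives once at the reference level, so a new discipline only adds its own metadata requirements.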

Derived Planetary Data System Architecture

Software product lines
  This is about strategy more than technology
  Goal is a software product line that
    Implements our reference architecture
    Allows for construction of core software components that can be reused across projects and science disciplines
    Can demonstrate sufficient cost and schedule benefits without sacrificing flexibility in meeting requirements and adapting to technology change
  Extensions can be applied at the discipline level

Object Oriented Data Technology
  Represents both a reference architecture AND a software product line for science data systems
  Exploits common patterns
  Delivers reusable software components as building blocks for construction of higher order data systems
  Applied to multiple science disciplines
  Funded originally back in 1998; runner up for NASA Software of the Year in 2003
  Heavily used by NASA and NIH projects

Architectural principles*
  Separate the technology and the information architecture
  Encapsulate the messaging layer to support different messaging implementations
  Encapsulate individual data systems to hide uniqueness
  Provide data system location independence
  Require that communication between distributed systems use metadata
  Define a model for describing systems and their resources
  Provide scalability in linking both number of nodes and size of data sets
  Allow systems using different data dictionaries and metadata implementations to be integrated
  Leverage existing software, where possible (e.g., open source, etc.)
  * Crichton, D, Hughes, J. S, Hyon, J, Kelly, S. “Science Search and Retrieval using XML”, Proceedings of the 2nd National Conference on Scientific and Technical Data, National Academy of Science, Washington DC, 2000.
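
Two of these principles, encapsulating the messaging layer and providing location independence, can be sketched together. The code below is a minimal illustration and not OODT code: `Transport`, `InProcessTransport`, `NameRegistry`, `Client` and the endpoint strings are all hypothetical names chosen for the example.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Transport(Protocol):
    """Anything that can deliver a message to an endpoint and return the reply."""

    def send(self, endpoint: str, message: dict) -> dict: ...


class InProcessTransport:
    """Hypothetical stand-in transport that calls handlers directly.

    A different class with the same send() method (for example one backed
    by HTTP) could replace it without changing the client below.
    """

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[dict], dict]] = {}

    def bind(self, endpoint: str, handler: Callable[[dict], dict]) -> None:
        self._handlers[endpoint] = handler

    def send(self, endpoint: str, message: dict) -> dict:
        return self._handlers[endpoint](message)


class NameRegistry:
    """Maps logical system names to endpoints so callers never hard-code locations."""

    def __init__(self) -> None:
        self._endpoints: dict[str, str] = {}

    def register(self, system: str, endpoint: str) -> None:
        self._endpoints[system] = endpoint

    def resolve(self, system: str) -> str:
        return self._endpoints[system]


class Client:
    """Addresses remote systems by name and exchanges metadata only."""

    def __init__(self, registry: NameRegistry, transport: Transport) -> None:
        self._registry = registry
        self._transport = transport

    def query(self, system: str, metadata: dict) -> dict:
        endpoint = self._registry.resolve(system)
        return self._transport.send(endpoint, {"metadata": metadata})


transport = InProcessTransport()
transport.bind("node-a:9000", lambda msg: {"matches": 1, "echo": msg["metadata"]})
registry = NameRegistry()
registry.register("planetary.archive", "node-a:9000")
client = Client(registry, transport)
reply = client.query("planetary.archive", {"target": "Mars"})
```

Moving the archive to another node would only require a new `register` call; the client code and the messaging implementation stay untouched.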

Architectural focus
  Consistent distributed capabilities
    Resource discovery (data, metadata, services, etc.), “grid-ing” loosely coupled science system, workflow management
    On-demand, shared services (e.g. processing, translation, etc.)
      Processing
      Translation
    Deploy high throughput data movement mechanisms
  End-to-end capabilities across the science environment
  Reduce local software solutions that do not scale
  Increasing importance in developing an “enterprise” approach with common services
  Build value-added services and capabilities on top of the infrastructure

Exploiting common patterns
  How data is managed (registry/repository, information objects themselves)…
  How data is generated, captured, etc. (e.g., workflow and data processing)…
  How data is accessed (metadata, data)…
  How information is discovered…
  How data is distributed (e.g., transformed)…
  How data is visualized…

What does OODT do?
  Tie together loosely coupled distributed heterogeneous data systems into a virtual data grid
  Support critical functions
    Data Production and workflow
    Data Distribution
    Data Discovery (including query optimization across highly distributed systems)
    Data Access
  An architectural approach first, an implementation second
  Adapt to different distributed computing deployments
  Promotes a REST-style architectural pattern for search and retrieval
  Scalability in linking together large, distributed data sets

OODT data architecture focus
  On types of and relationships among a software system’s data
  Decomposition of data within a software system to its logical components and interactions
    Components: Data Elements, Data Dictionary, Data Models of individual data sources
    Interactions: Mappings from Data Dictionary to Data Models, Data Element structural comparison
  Some standards currently exist for data architecture
    ISO: ISO Standardization and Specification of Data Elements
    Dublin Core Metadata Initiative: Dublin Core Data Elements to describe any electronic resource
  Specifications for the Data Architecture
    Common XML schema for managing information about data resources
    Common XML schema for messaging between distributed services
    Methods for integrating existing domain models within architecture
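
The mapping from a common data dictionary to the data models of individual sources can be illustrated with a small translation step. This is a sketch under assumed names: the dictionary elements, the two archives and their local field names are invented for the example and are not from any real data dictionary.

```python
from __future__ import annotations

# Hypothetical common data dictionary shared by all participating systems.
COMMON_DICTIONARY = {"target_name", "start_time"}

# Hypothetical mappings from common elements to each source's local data model.
LOCAL_MAPPINGS = {
    "archive_a": {"target_name": "TARGET", "start_time": "OBS_START"},
    "archive_b": {"target_name": "body", "start_time": "t0"},
}


def translate_query(query: dict, source: str) -> dict:
    """Rewrite a query phrased in common elements into one source's local terms."""
    unknown = set(query) - COMMON_DICTIONARY
    if unknown:
        raise KeyError(f"not in common data dictionary: {sorted(unknown)}")
    mapping = LOCAL_MAPPINGS[source]
    return {mapping[element]: value for element, value in query.items()}


query = {"target_name": "Mars"}
for_a = translate_query(query, "archive_a")
for_b = translate_query(query, "archive_b")
```

A caller phrases the query once in common terms, and each source keeps its own data model behind the mapping.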

OODT data architecture models
  Resource Metadata Model
  Request/Response Model
  Based on ISO/IEC
  Based on Dublin Core

OODT software components
  Profile Service – A server-based registry that is able to either serve local XML profiles or plug into an existing catalog. This component provides resource discovery.
  Product Service – A server component that plugs into existing repositories and serves products. This includes translation services, etc.
  Catalog and Archive Service – Transaction-based server that catalogs and archives products, providing profile and product servers for discovery and distribution
  Query Service – Provides query management across distributed services to enable discovery.
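
How these components could cooperate can be sketched in a few lines. The code is an illustration of the roles described above and not the OODT API: every class, method and field name here is hypothetical, and in-memory dictionaries stand in for real catalogs and repositories.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Profile:
    """Metadata describing one resource and where its product can be fetched."""
    resource_id: str
    metadata: dict
    product_server: str  # logical name of the server holding the product


class ProfileServer:
    """Registry role: answers discovery queries from profile metadata."""

    def __init__(self, profiles: list[Profile]) -> None:
        self._profiles = profiles

    def find(self, **criteria: str) -> list[Profile]:
        return [
            p for p in self._profiles
            if all(p.metadata.get(k) == v for k, v in criteria.items())
        ]


class ProductServer:
    """Repository role: serves the product bytes for a resource id."""

    def __init__(self, repository: dict[str, bytes]) -> None:
        self._repository = repository

    def get(self, resource_id: str) -> bytes:
        return self._repository[resource_id]


class QueryService:
    """Fans a query out to every profile server, then routes retrieval."""

    def __init__(self, profile_servers: list[ProfileServer],
                 product_servers: dict[str, ProductServer]) -> None:
        self._profile_servers = profile_servers
        self._product_servers = product_servers

    def discover(self, **criteria: str) -> list[Profile]:
        hits: list[Profile] = []
        for server in self._profile_servers:
            hits.extend(server.find(**criteria))
        return hits

    def retrieve(self, profile: Profile) -> bytes:
        return self._product_servers[profile.product_server].get(profile.resource_id)


node1 = ProfileServer([Profile("img-001", {"target": "Mars"}, "repo-a")])
node2 = ProfileServer([Profile("img-002", {"target": "Mars"}, "repo-b"),
                       Profile("img-003", {"target": "Europa"}, "repo-b")])
products = {
    "repo-a": ProductServer({"img-001": b"mars-a"}),
    "repo-b": ProductServer({"img-002": b"mars-b", "img-003": b"europa"}),
}
service = QueryService([node1, node2], products)
mars_hits = service.discover(target="Mars")
first_product = service.retrieve(mars_hits[0])
```

Discovery and retrieval are kept apart on purpose: a client finds resources through metadata and only then contacts the server that holds the data.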

Distributed architecture
  Repositories for storing and retrieving many types of data
  1. Science data tools and applications use “APIs” to connect to a virtual data repository
  2. Middleware creates the data grid infrastructure connecting distributed heterogeneous systems and data
  [Diagram labels: Visualization Tools; Analysis Tools; Web Search Tools; OODT API; OODT Reusable Data Grid Framework; Mission Data Repositories; Biomedical Data Repositories; Engineering Data Repositories]

Technology architecture
  [Diagram labels: Common Meta Models for Describing Space Information Objects; Common Data Dictionary end-to-end; Query Integration; Node 1 Profile Server; Registry Server; Repository/Archive Server; Repository; Product Server; Name Server; Service Registry; Web I/F; Desktop I/F; XML Request; Information Object; WSDL; Product Catalogs; Science Products]

OODT software implementation
  OODT is Open Source
    Developed using open source software (i.e. Java/J2EE and XML)
  Implemented reusable, extensible Java-based software components
    Core software for building and connecting data management systems
  Provided messaging as a “plug-in” component that can be replaced independently of the other core components. Messaging components include:
    CORBA, Java RMI, JXTA, Web Services, etc.
    REST seems to have prevailed
  Provided client APIs in Java, C++, HTTP, Python, IDL
  Simple installation on a variety of platforms (Windows, Unix, Mac OS X, etc.)
  Used international data architecture standards
    ISO/IEC – Specification and Standardization of Data Elements
    Dublin Core Metadata Initiative
    W3C’s Resource Description Framework (RDF) from Semantic Web Community

EDRN Knowledge Environment
  EDRN has been a pioneer in the use of informatics technologies to support biomarker research
  EDRN has developed a comprehensive infrastructure to support biomarker data management across EDRN’s distributed cancer centers
    Twelve institutions are sharing data
    Same architectural framework as planetary science
  It supports capture and access to a diverse set of information and results
    Biomarkers
    Proteomics
    Biospecimens
    Various technologies and data products (image, micro-satellite, …)
    Study Management

Deployed EDRN System

Application to planetary science
  Often unique, one of a kind missions
    – Can drive technological changes
  Instruments are competed and developed by academic, industry and industrial partners
    – Highly distributed acquisition and processing across partner organizations
    – Highly diverse data sets given heterogeneity of the instruments and the targets (i.e. solar system)
  Missions are required to share science data results with the research community requiring:
    – Common domain information model used to drive system implementations
    – Expert scientific help to the user community on using the data
    – Peer-review of data results to ensure quality
    – Distribution of data to the community
  Planetary science data from NASA (and some international) missions is deposited into the Planetary Data System

Earth Science Data Systems
  [Diagram labels: Other Data Systems; Catalogs; Distributed Data Analysis; Airborne Instruments; Surface Instruments; Local Storage (Models, Data, etc.); Multi-mission Policies & Rules; Data Acquisition/Ingestion; Special Product Processing Environment / Computational Infra; Web Portal; Data Production/Processing; Data Integration; Modeling and Visualization Facility; (Testbed and Operational Deployed Environments)]

Application to Climate Research
  Highly distributed modeling and observational systems
  Heterogeneous implementations
  Different purposes
  But, brought together as a virtual system, provides new science discovery opportunities
  [Diagram labels: (Observations); (Models)]

NASA & Earth System Grid

Lessons Learned
  A reference architecture is critical for driving a strategy and supporting large-scale/enterprise systems
  However, organizations have limited experience in building reference architectures
  Finding useful ways to represent the architecture can be tough!
  How detailed to make the reference architecture is an art! (Don’t let the implementation drive the RA)
  Product lines are useful for providing reusable components based on the reference architecture

More Lessons Learned…
  Distributed service architectures
    Not anything new (my experience with them goes back to the early 1990s)
    But, often, newer technologies and approaches are seen as a panacea
  Technology is not a replacement for a conceptual architecture
    My experience is that definition of the architecture independent of technology is critical
    The goal should be stability in the architecture model; the selection of appropriate technology will change over time
    This is why an architect is much more of a strategist than a technologist

Final Thoughts
  Software architecture in science is critical to
    Reducing cost of building science data systems
    Building virtual organizations
    Constructing software product lines
    Driving standards
    Supporting new paradigms in mission operations and scientific research
  Science is still learning how to best leverage technology in a collaborative discovery environment, but significant progress is being made!

Resources
  (1) Tracz, Will. Domain-Specific Software Architecture. ACM SIGSOFT, 1995.
  (2) D. Crichton, S. Kelly, C. Mattmann, Q. Xiao, J. S. Hughes, J. Oh, M. Thornquist, D. Johnsey, S. Srivastava, L. Esserman, and B. Bigbee. A Distributed Information Services Architecture to Support Biomarker Discovery in Early Detection of Cancer. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing, pp. 44, Amsterdam, the Netherlands, December 4th-6th, 2006.
  (3) C. Mattmann, D. Crichton, N. Medvidovic and S. Hughes. A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications. In Proceedings of the 28th International Conference on Software Engineering (ICSE06), pp , Shanghai, China, May 20th-28th, 2006.

Backup

EDRN’s Ontology Model
  EDRN has developed a high level ontology model for biomarker research which provides standards for the capture of biomarker information across the enterprise
  Specific models are derived from this high level model
    Model of biospecimens
    Model for each class of science data
  EDRN is specifically focusing on a granular model for annotating biomarkers, studies and scientific results
  EDRN has a set of EDRN Common Data Elements which is used to provide standard data elements and values for the capture and exchange of data
  [Figure labels: EDRN Biomarker Ontology Model; EDRN CDE Tools]
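
The idea of common data elements with standard values can be illustrated with a simple validation step applied before a record is captured or exchanged. The element names and permitted values below are made up for the sketch; they are not actual EDRN Common Data Elements.

```python
from __future__ import annotations

# Hypothetical common data elements with their permitted values.
EXAMPLE_CDES = {
    "SPECIMEN_TYPE": {"serum", "plasma", "tissue"},
    "ORGAN_SITE": {"lung", "breast", "prostate"},
}


def validate_record(record: dict, cdes: dict = EXAMPLE_CDES) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, value in record.items():
        if element not in cdes:
            problems.append(f"{element}: not a common data element")
        elif value not in cdes[element]:
            problems.append(f"{element}: value {value!r} not permitted")
    return problems


good = validate_record({"SPECIMEN_TYPE": "serum", "ORGAN_SITE": "lung"})
bad = validate_record({"SPECIMEN_TYPE": "saliva", "SITE": "lung"})
```

Checking records against one shared set of elements at every site is what lets data captured at different institutions be exchanged without per-site translation.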