
System Development & Operations: NSF DataNet site visit to MIT, February 8, 2010 (NSF Site Visit to MIT, DataSpace)

Presentation transcript:

1 System Development & Operations: NSF DataNet site visit to MIT, February 8, 2010

2 DataSpace High-Level Architecture
Networks: Global Network (Web); Local Network
Nodes: MIT Node...; Other USA Nodes; International Nodes
DataSpace Services
– Data Curation Services: Process, Catalog, Annotate, Preserve
– Basic Data User Services: Discovery, Quality, Conversion, Integration
– Additional Data User Services: Data Analytics, Data Visualization
– Distributed Data Management Services: Security, Replication, Administration, Policy Management, Workflow Services
Repositories
– Metadata Repository for Scientific Data
– Multiple Scientific Data Repositories (DataSpace Native Architecture)
– Interface to Legacy Scientific Data Repositories...
3rd Party Specialized Data Services
Basic Workflow
– Scientist: provides data and preliminary metadata
– Curator: processes and ingests data; completes metadata and policies (e.g. retention)
– User: searches (meta)data, accesses/integrates data, analyzes/visualizes data (via DataSpace data services or 3rd party data services)
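The Basic Workflow on this slide (scientist, curator, user) can be read as a small state machine over a dataset record. The sketch below is a minimal illustration under stated assumptions: the names Stage, Dataset, scientist_deposit, curator_ingest and user_search are hypothetical, not a DataSpace API, and the rule that only ingested datasets are discoverable is an assumption the slide does not state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DEPOSITED = "deposited"  # scientist supplied data and preliminary metadata
    INGESTED = "ingested"    # curator processed data, completed metadata and policies


@dataclass
class Dataset:
    """Hypothetical record for one dataset moving through the basic workflow."""
    data_uri: str
    metadata: dict
    policies: dict = field(default_factory=dict)
    stage: Stage = Stage.DEPOSITED


def scientist_deposit(data_uri: str, preliminary_metadata: dict) -> Dataset:
    """Scientist role: provide data plus preliminary metadata."""
    return Dataset(data_uri, dict(preliminary_metadata))


def curator_ingest(ds: Dataset, completed_metadata: dict, policies: dict) -> Dataset:
    """Curator role: complete the metadata, attach policies, mark as ingested."""
    if ds.stage is not Stage.DEPOSITED:
        raise ValueError("only newly deposited datasets can be ingested")
    ds.metadata.update(completed_metadata)
    ds.policies.update(policies)
    ds.stage = Stage.INGESTED
    return ds


def user_search(repository: list, **criteria) -> list:
    """User role: search (meta)data; in this sketch only ingested datasets match."""
    return [
        ds for ds in repository
        if ds.stage is Stage.INGESTED
        and all(ds.metadata.get(k) == v for k, v in criteria.items())
    ]
```

Keeping the stage on the record makes the hand-off between roles explicit: a curator cannot ingest the same deposit twice, and users see a dataset only once curation has completed its metadata and policies.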

3 PLATFORM ARCHITECTURE

4 Platform Architecture [figure: Version 0.1 and Version 1.0]

5

6 Federated Architecture

7 Multiple Implementations

8 Federated Model
Data can be widely distributed and Web-based
Services can be centralized or federated
– e.g. a centralized, domain-specific search service that harvests metadata from relevant archives (“Google for biological oceanography”)
– e.g. real-time data integration across small sets of archives identified via subject search
DataSpace will develop some of these services but, more importantly, will create an ecosystem that others can contribute to (e.g. technology and scientific companies, universities, researchers, labs)
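A minimal in-memory sketch contrasting the two service styles on this slide: a centralized service that harvests metadata from every archive into one index, and a federated step that uses that index only to identify a small set of archives and then queries those archives at request time. The Archive class and the function names are hypothetical, and a real harvester would pull metadata over a network protocol rather than from Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical data archive; records are kept in memory for this sketch."""
    name: str
    records: list = field(default_factory=list)

    def list_metadata(self) -> list:
        # Return copies so a harvester cannot mutate the archive's own records.
        return [dict(r) for r in self.records]


def harvest(archives: list) -> list:
    """Centralized model: copy every archive's metadata into one search index."""
    index = []
    for archive in archives:
        for record in archive.list_metadata():
            record["archive"] = archive.name  # remember where the record came from
            index.append(record)
    return index


def subject_search(index: list, subject: str) -> list:
    """Search the harvested index for records tagged with a subject."""
    return [r for r in index if subject in r.get("subjects", [])]


def federated_integrate(archives: list, index: list, subject: str) -> list:
    """Federated model: pick archives via the index, then query them live."""
    selected = {r["archive"] for r in subject_search(index, subject)}
    merged = []
    for archive in archives:
        if archive.name in selected:
            merged.extend(
                r for r in archive.list_metadata()
                if subject in r.get("subjects", [])
            )
    return merged
```

The trade-off shows up when an archive changes after harvesting: the central index stays stale until the next harvest, while federated_integrate returns the new records immediately for archives the index already associates with the subject.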

9 Development Methodology
– Behavior-Driven Development model
– Continuous Integration process: iterative research prototyping and production implementation phases
– Small centralized development team to start; institutional partners add developers in years 1-2
– Transparent, open source process
– Close collaboration with Data Conservancy
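Behavior-Driven Development writes requirements as Given/When/Then scenarios that double as executable tests. A small sketch using Python's standard unittest module; the Repository class, its deposit/find methods, and the "title is required" rule are hypothetical stand-ins chosen for illustration, not DataSpace code.

```python
import unittest


class Repository:
    """Hypothetical minimal repository used only to illustrate BDD scenarios."""

    def __init__(self):
        self._items = {}

    def deposit(self, item_id, metadata):
        # Assumed rule for this sketch: a title is required at deposit time.
        if not metadata.get("title"):
            raise ValueError("a title is required at deposit")
        self._items[item_id] = dict(metadata)

    def find(self, item_id):
        return self._items.get(item_id)


class DepositBehavior(unittest.TestCase):
    """Each test is one scenario, structured as Given / When / Then."""

    def test_deposit_with_title_is_retrievable(self):
        # Given an empty repository
        repo = Repository()
        # When a researcher deposits a dataset with a title
        repo.deposit("ds-1", {"title": "CTD casts"})
        # Then the dataset can be found by its identifier
        self.assertEqual(repo.find("ds-1"), {"title": "CTD casts"})

    def test_deposit_without_title_is_rejected(self):
        # Given an empty repository
        repo = Repository()
        # When a deposit arrives without a title, Then it is rejected
        with self.assertRaises(ValueError):
            repo.deposit("ds-2", {})
        # And nothing was stored
        self.assertIsNone(repo.find("ds-2"))
```

Saved as a module, the scenarios run with `python -m unittest`; in a Continuous Integration process they would run on every commit, so the written behavior and the code cannot drift apart unnoticed.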

10 OPERATIONS

11 Local Operations – MIT Example
Scientists
– data production, early-stage curation
– lots of domain expertise, little or no curation expertise
Libraries
– outreach and recruitment (e.g. HMI study)
– later-stage data curation, ingest
– some domain expertise, lots of curation expertise
IS&T
– identifying and operating hardware and systems
– enterprise systems management expertise
– lots of IT expertise, some curation expertise

12 Project-Wide Operations
Platform governance
– distributed open source software model
– transparent decision-making process
Service model(s) for each institutional partner
– including all data curation activities
– including CI templates (e.g. hardware, cloud)
– associated cost model for each service model
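One possible reading of "an associated cost model for each service model" is a linear model per partner: a fixed yearly cost set by the chosen CI template (e.g. local hardware or cloud), plus storage and curation costs that scale with volume. The class names, fields, and linear form below are assumptions for illustration only; the slide gives no formula or rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CITemplate:
    """Hypothetical cyberinfrastructure template, e.g. local hardware or cloud."""
    name: str
    fixed_cost_per_year: float       # e.g. amortized hardware, base support
    storage_cost_per_tb_year: float  # marginal cost of keeping one TB for a year


@dataclass(frozen=True)
class ServiceModel:
    """Hypothetical service model for one institutional partner."""
    partner: str
    ci: CITemplate
    curation_cost_per_dataset: float  # covers the partner's data curation activities

    def annual_cost(self, stored_tb: float, datasets_curated: int) -> float:
        """Assumed linear cost: fixed CI cost + storage + curation effort."""
        return (
            self.ci.fixed_cost_per_year
            + self.ci.storage_cost_per_tb_year * stored_tb
            + self.curation_cost_per_dataset * datasets_curated
        )
```

Separating the CI template from the service model lets two partners share a template (say, the same cloud offering) while curating at different costs, which keeps each partner's cost model comparable.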

13 Project-Wide Operations
– Ongoing usability studies with researchers, students, and public audiences
– Develop a certification strategy for TDRs using DataSpace (.arc domain)

14 Data Curation Lifecycle Highlights
Deposit workflows for researchers based on locally produced data (interactive and batch)
Data Curators
– outreach, marketing, data recruitment
– metadata creation and data ontology application
– development and application of curatorial policies
– tailored preservation strategies (local, consortial, outsourced)
Direct access to data creators and boots-on-the-ground support services
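Deposit workflows come in interactive and batch forms. The sketch below assumes a hypothetical minimum set of deposit-time fields (REQUIRED_AT_DEPOSIT); the batch path routes incomplete items to a curator instead of rejecting the whole batch, and the interactive path prompts for each missing field through a caller-supplied ask function. None of these names come from DataSpace.

```python
from dataclasses import dataclass

# Hypothetical minimum metadata a researcher supplies at deposit time;
# curators complete the rest later in the lifecycle.
REQUIRED_AT_DEPOSIT = ("title", "creator", "date")


@dataclass
class DepositResult:
    accepted: list       # item ids accepted as-is
    needs_curator: dict  # item id -> list of missing required fields


def batch_deposit(items):
    """Batch path: accept complete items, route incomplete ones to a curator."""
    accepted, needs_curator = [], {}
    for item_id, metadata in items:
        missing = [f for f in REQUIRED_AT_DEPOSIT if not metadata.get(f)]
        if missing:
            needs_curator[item_id] = missing
        else:
            accepted.append(item_id)
    return DepositResult(accepted, needs_curator)


def interactive_deposit(item_id, metadata, ask):
    """Interactive path: call ask(field) for each missing required field."""
    metadata = dict(metadata)  # do not modify the caller's dict
    for f in REQUIRED_AT_DEPOSIT:
        if not metadata.get(f):
            metadata[f] = ask(f)
    return item_id, metadata
```

Routing gaps to a curator rather than failing the batch matches the division of labor in the slides: researchers supply what they know, and curators finish the metadata.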

15 Data Curation Lifecycle Highlights
– Novel distributed, standards-based policy management strategy built on emerging Semantic Web standards and TRAC
– Semantic Web standards (e.g. RDF) to support improved data integration and interoperability
– Separation of the access layer (discovery, use) from the curation layer, in support of broad federation and distributed tool development
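RDF describes data as subject-predicate-object triples identified by URIs, so statements published by different nodes can be merged and queried together. A plain-Python sketch of that idea applied to a retention policy; it deliberately avoids an RDF library, and the namespace URI, the predicates hasPolicy and retentionYears, and the policy identifier are hypothetical, not a DataSpace or TRAC vocabulary.

```python
# Hypothetical namespace; a real deployment would use published RDF vocabularies.
EX = "http://example.org/dataspace#"


def triple(s, p, o):
    """Build a (subject, predicate, object) triple; subject and predicate get EX URIs."""
    return (EX + s, EX + p, o)


def match(graph, s=None, p=None, o=None):
    """Return the triples in graph matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def retention_years(graph, dataset):
    """Follow dataset -> hasPolicy -> policy -> retentionYears through the graph."""
    years = []
    for _, _, policy in match(graph, s=EX + dataset, p=EX + "hasPolicy"):
        for _, _, value in match(graph, s=policy, p=EX + "retentionYears"):
            years.append(value)
    return years


# Node A records which policy governs a dataset; node B defines the policy.
node_a = {triple("ds1", "hasPolicy", EX + "retain10y")}
node_b = {triple("retain10y", "retentionYears", 10)}
merged = node_a | node_b  # merging graphs is set union because the URIs are shared
```

Here retention_years(merged, "ds1") returns [10], while retention_years(node_a, "ds1") returns [] because node A alone lacks the policy definition. Policies stated this way can live at different nodes and still be applied together.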

