Overview – NSF Site Visit 8 February 2010 (v8) DataSpace



Vision
“To bring the dramatic benefits of the Web to scientists – comparable to the benefits the Web has had to commerce and other areas”
... Not just in the impact to science, but also a similar distributed federated ecosystem for:
– Technology Infrastructure
– Organizational Responsibilities
  View: Research data-generating institutions and their libraries should play an active role in curating their researchers’ data
– Financial and Technical Sustainability
– Openness: 3rd party extension and Open Source development
Support research across all domains, but initially:
– Neuroscience
– Biological Oceanography

DataSpace High-Level Architecture
[Diagram. Recoverable labels:]
Nodes: MIT Node, Other USA Nodes, International Nodes
Networks: Global Network (Web), Local Network
Repositories:
– Metadata Repository for Scientific Data
– Multiple Scientific Data Repositories (DataSpace Native Architecture)
– Interface to Legacy Scientific Data Repositories
DataSpace Services:
– Data Curation Services: Process, Catalog, Annotate, Preserve
– Basic Data User Services: Discovery, Quality, Conversion, Integration
– Additional Data User Services: Data Analytics, Data Visualization
– Distributed Data Management Services: Security, Replication, Administration, Policy Management, Workflow Services
3rd Party Specialized Data Services
Basic Workflow:
– Scientist: provides data and preliminary metadata
– Curator: processes and ingests data, completes metadata and policies (e.g. retention)
– User: searches (meta)data, accesses/integrates data, analyzes/visualizes data (via DataSpace data services or 3rd party data services)
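The basic workflow on this slide (scientist provides data, curator ingests and completes it, user searches it) can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Node` and `Dataset` classes, their method names, and the sample metadata values are all hypothetical and are not part of any DataSpace, DSpace, or Fedora API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record for one deposited dataset."""
    name: str
    payload: bytes
    metadata: dict = field(default_factory=dict)   # descriptive metadata
    policies: dict = field(default_factory=dict)   # e.g. retention rules
    curated: bool = False


class Node:
    """Hypothetical stand-in for a single node; not a real DataSpace class."""

    def __init__(self):
        self.pending = []   # submitted by scientists, awaiting curation
        self.catalog = []   # curated datasets, visible to search

    def submit(self, name, payload, preliminary_metadata):
        # Scientist step: provide data plus preliminary metadata.
        ds = Dataset(name, payload, dict(preliminary_metadata))
        self.pending.append(ds)
        return ds

    def curate(self, ds, extra_metadata, policies):
        # Curator step: complete the metadata, attach policies, ingest.
        ds.metadata.update(extra_metadata)
        ds.policies.update(policies)
        ds.curated = True
        self.pending.remove(ds)
        self.catalog.append(ds)

    def search(self, **criteria):
        # User step: match exact metadata values against curated datasets.
        return [
            ds for ds in self.catalog
            if all(ds.metadata.get(k) == v for k, v in criteria.items())
        ]


# Illustrative values only.
node = Node()
ds = node.submit("sample-dataset", b"raw bytes", {"domain": "neuroscience"})
before = node.search(domain="neuroscience")      # not yet curated
node.curate(ds, {"modality": "fMRI"}, {"retention_years": 10})
hits = node.search(domain="neuroscience", modality="fMRI")
```

In this sketch a dataset becomes searchable only after curation, which mirrors the slide's ordering of the three roles; a real system would of course add the security, replication, and policy services listed in the diagram.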

Initial Scientific Domains Chosen
Neuroscience and Biological Oceanography
– Sciences with complex interdisciplinary sub-domains
– Different and diverse types of scientific data
  Though some aspects overlap (genetic data)
– Faced with challenges related to
  Data expression, encoding, sharing, integration, visualizing, and preserving
  Difficult to perform research that crosses sub-domains or requires multi-source data
– Can build on existing collections and collaborations
  But must also address technical, social, and legal issues
Will bring in additional domains over time

DataSpace Organizational Structure (preliminary)

Some Key Goals for First Year
Complete hiring and staffing
Design and development of DataSpace v1 (interim architecture)
– Build on existing software base (DSpace, Fedora)
– Addition of initial DataSpace middleware
Ingest of initial Neuroscience and Biological Oceanography data
– Selection/development of ontologies
– Recording of metadata (including preservation policies, etc.)
Establish operational DataSpace v1
– Service models defined with partner nodes
Design of DataSpace long-term architecture
– Initial results from research groups for v2
Initial results of Business Development Management Team
Educational and Outreach efforts (i-schools, OCW)

Sustainability Approach
Core to Financial Sustainability
– Provide maximum value to science
– Minimize cost to any one organization by broad distribution
  Can actually reduce costs by eliminating duplication and inefficiencies
– Build on the long-standing role and sustainability of libraries
– Follows Web/Internet value (to both large and small orgs)
  Worldwide infrastructure, costs widely shared
Technological Sustainability
– Open Source software, multiple implementations possible, and encouragement of 3rd party augmentation
– Participation of commercial technology company partners
Some Resources: DataSpace Federation, partner experiences, Business Development Management Team (working with MIT Entrepreneurship Center, E&I students, etc.)

Some Key Features of the DataSpace Proposal
Distributed federated infrastructure for accessibility & long-term preservation
– Address privacy, property and data rights, etc. with legal and policy framework
Builds on successful DSpace/Fedora platform
Proposes new top-level internet domain (".arc")
Addresses need for “temporal semantics” and other advanced metadata
Risk mitigation:
– Research risk: Personnel with extensive experience
– Operational risk and sustainability: Distributed design and federated approach
Public/Private Partnership: Corporate partners help build a more sustainable ecosystem and ensure sustainability, MIT Entrepreneurship Center, etc.
Expert Advisory Board: Diverse fields (i.e. science, law, business, technology, libraries, and digital preservation) advise and promote the project
Advances scholarly communications through data/publication integration
Advances educational technology through data/courseware integration
Outreach to minority and pre-college students, and to underserved small and medium research groups
DataSpace will be a truly transformational project

Multi-disciplinary team of Principal Investigators
Hal Abelson, MIT Computer Science & Artificial Intelligence Laboratory (CSAIL)
Ed DeLong, MIT Departments of Civil and Environmental Engineering and Biological Engineering
John Gabrieli, MIT Department of Brain and Cognitive Sciences
Stuart Madnick, MIT Sloan School of Management & School of Engineering
MacKenzie Smith, MIT Libraries
Marilyn T. Smith, Director, MIT Information Systems & Technology (IS&T) [replaces Jerry Grochow]

Diverse and Experienced Senior Personnel
Timothy Berners-Lee (W3C, WSRI)
Alon Halevy (Google)
Geneva Henry (Rice University)
Mei Hsu (HP)
David Karger (MIT)
Michele Kimpton (DSpace Foundation)
Thomas Malone (MIT)
Dejan Milojicic (HP) [replaces John Erickson]
Joe Pato (HP)
Terry Reese (Oregon State University)
Michael Siegel (MIT)
Stephen Todd (EMC)
Tyler Walters (Georgia Tech)
Danny Weitzner (W3C, WSRI)
Steve White (Microsoft) [addition to team]
John Wilbanks (Science Commons)
Wei Lee Woon (MIST, Abu Dhabi)

Advisory Board
Christine L. Borgman, Department of Information Studies, Graduate School of Education and Information Science, UCLA
Randy Buckner, Psychology, Harvard University
Scott Doney, Marine Chemistry & Geochemistry, Woods Hole Oceanographic Institution
Keith Jeffery, European Research Consortium of Informatics and Mathematics (ERCIM) and UK Rutherford Appleton Laboratory
Liz Lyon, UKOLN and UK Digital Curation Centre
Ed Roberts, Management of Technology, MIT Sloan School of Management and MIT Entrepreneurship Center
Pam Samuelson, School of Information and School of Law, UC Berkeley
Dan Schutzer, Financial Services Technical Consortium (FSTC)
Andrew Treloar, ARCHER Project, Australian National Data Service, Monash University, Australia
Wanda Orlikowski, Information Technologies and Organization Studies, MIT Sloan School of Management


Backup Slides

1. New types of science enabled
Enhance scientific interdisciplinarity and innovation via standards-based data architecture and broad adoption
Disciplines: Neuroscience and Biological Oceanography
(a) Science and education goals help
– Library and Computer Science goals: minimize duplication of effort, maximize access to prior work, improve interoperation and quality
– Education goals: disseminated through multiple means (OCW) to enable semantic tagging of data and reuse of data
(b) Metrics of Success
– Usage: number of groups contributing and using, amount and diversity of data shared and used, etc.
– Impact: publications, discoveries

Neuroscience Domain
Address questions such as “Variation of cognitive and emotional traits due to age?”
Future requires access to large datasets, but
– Broadly distributed across many organizations
– Diverse types: DTI, fMRI, structural MRI, VBM
– Difficult to aggregate and annotate
Initial organizations include
– Martinos Imaging Center (at MIT)
– Center for Advanced Brain Imaging (Georgia Tech)
– Collaboration with Microsoft

Biological Oceanography Domain
Address questions such as “How does change in ocean current cause proliferation of microbial groups that, in turn, influence flux of carbon into and out of the sea?”
Need to interrelate diverse datasets
– Scale: from genome to biomes
– Types: 4D physical and biological oceanographic, satellite, genomic, metagenomic, taxonomic, nutrient analysis, bio-optical
– From diverse sources
DataSpace will enable research not possible today

2. Value to Previous Investments
For selected domains: Resources to organize, annotate, archive, and publish existing data
– Curated by partnership with library data curators
– Improve collaborations, e.g., C-MORE (interrelate difficult)
– Address complex legal, political, and social realities
– Sustainability by providing significant new value to scientists (e.g. ease of search, data integration, reuse)
(a) What data contributors gain from DataSpace
– More efficiently archive and reutilize their own data
– Able to utilize vast amounts of data from other sources
– Over time, data contribution will become a respected academic achievement (citations)
(b) Investment utilized and enhanced
– Significant prior R&D by team members, e.g., DSpace, temporal semantics, data quality and provenance, policy and legal, etc.

3. Barriers to Implementation and Adoption
In the past, scientists often did not participate because of:
– Insufficient time and expertise (which we address via better functionality and assistance from curators)
– Insufficient value back (which we address through re-use, etc.)
Some points:
– Demonstrable ease-of-use and value
  Especially for sciences that are struggling with these problems
  Examples from Neuroscience and Biological Oceanography
– Dedicated data curators
– DataSpace Federation to represent collective needs
– Openness: encourages scientific innovation and evolution
– Support for policy and legal issues
– Team has experience evolving systems (W3C, DSpace, etc.)

4. Cyberinfrastructure, Technical Sustainability
Much of DataSpace cyberinfrastructure builds on prior work (e.g. DSpace) and adds: (a) archive, (b) annotate to enable discovery and re-use, (c) interoperate with Ed Tech, “citizen science,” etc.
Technical sustainability: Software free and open source
– Establish architecture and standards
– Project will provide at least one reference implementation
– Enable multiple implementations (including commercial)
Will develop cost and service models as exemplars
– Institutions already expend large amounts
– DataSpace will streamline, rationalize, distribute costs
– Libraries have stood the test of time
– Additional business models
1st Year Goal: Initial system and ingest of data, test interop

5. Manage Program, Providers, International
Experience with highly distributed projects (DSpace)
Management: see organization chart
– Multiple levels and multiple sub-groups
– Public/private partnership to ensure industrial adoption and relevance to other sectors
– Added management and data expertise from Advisory Board
Data providers and assured participation
– Data initially from partners (Georgia Tech, MIT, OSU, Rice)
  Already communicating with scientists
– Then extend more broadly, initially to the DSpace community
International Counterparts: (1) direct collaboration (DuraSpace), (2) international partner (MIST), (3) international corporations (EMC, Google, HP, Microsoft), (4) Advisory Board, (5) indirect collaborations (C-MORE)