Data Conservancy: A Blueprint for Libraries in the Data Age Sayeed Choudhury Johns Hopkins University

Presentation transcript:


Data Conservancy
– One of two current awards through the National Science Foundation DataNet program
– The other award is DataONE, led by William Michener at the University of New Mexico
– Each award is a $20 million, 5-year award with multiple partners

Data Curation
The Data Conservancy embraces a shared vision: data curation is a means to collect, organize, validate and preserve data so that scientists can find new ways to address the grand research challenges that face society.

Goal
The overarching goal of DC is to support new forms of inquiry and learning to meet these challenges through the creation, implementation, and sustained management of an integrated and comprehensive data curation strategy.

Partner institutions
– Johns Hopkins University (Lead institution)
– Cornell University
– DuraSpace
– Marine Biological Laboratory
– National Center for Atmospheric Research
– National Snow and Ice Data Center
– Portico
– Tessella, Inc.
– University of California Los Angeles
– University of Illinois at Urbana-Champaign

…not a rigid road map but principles of navigation. There is no one way to design cyberinfrastructure, but there are tools we can teach the designers to help them appreciate the true size of the solution space – which is often much larger than they may think, if they are tied into technical fixes for all problems.

Principles
– Our strategy focuses on connection of systems into infrastructure through a program informed by user-centered design and research, sustained through a portfolio of funding streams, and managed through a shared, coordinated governance structure.
– Build on existing exemplar scientific projects, communities and virtual organizations that have deep engagement with citizen scientists and extensive experience with large-scale, distributed system development.

Objectives
– Infrastructure research and development: Technical requirements
– Information science and computer science research: Scientific or user requirements
– Broader impacts: Educational requirements
– Sustainability: Business requirements

Data Flow (Levels of Data)
– Pixel data collected by telescope
– Sent to Fermilab for processing
– Beowulf Cluster produces catalog
– Loaded in a SQL database

Domain coverage/methods
Multi-site user research methods are a blend of:
– Case study & domain comparisons
– Depth & breadth
– Local & global
Domains: Astronomy, Earth Sciences, Life Sciences, Social Sciences
Methods and outputs by site:
– UCAR: Task-based design and usability testing → Use cases, data requirements, system recommendations
– UCLA: Ethnography, virtual ethnography, oral histories → Use cases, data requirements
– UIUC: Interviews, surveys, worksheets, content analysis → Curation requirements, taxonomy, metadata/provenance framework

Information science research

Data Framework
– Start with a common conceptualization that applies across scientific domains
– Exploit semantic technologies
– Leverage existing work
– Prototype the framework in target communities
  – Iteratively refine, learn from experience
  – Demonstrate success, measured in terms of new science

Common Conceptualization
Observations are the foundation of all scientific studies, and are the closest approximation to facts.
Wiens, J. A. (1992). The Ecology of Bird Communities. Vol. 1: Foundations and Patterns; Vol. 2: Processes and Variations. Cambridge Studies in Ecology.

Emergence
The movement from low-level rules to higher-level sophistication is what we call emergence.
Steven Johnson, Emergence: The Connected Lives of Ants, Brains, Cities, and Software

Data Model using OAI-ORE

Acknowledgements
– Carole Palmer (information science slides)
– Carl Lagoze (Data Framework slides)
– Alex Szalay (Data Flow)
– Tim DiLauro (OAI-ORE)
– NLG grant award LG
– Office of Cyberinfrastructure DataNet Award #