Data Curation Issues and Challenges
ARL/CNI Fall Forum 2008
Sayeed Choudhury

Presentation transcript:

Data Flow (Levels of Data)
Pixel data collected by telescope
Sent to Fermilab for processing
Beowulf cluster produces catalog
Catalog loaded into a SQL database
Courtesy of Alex Szalay

Key Considerations
Work with existing scientific systems
Consider gateways for these systems as part of infrastructure development
Focus on both the human and technical components of infrastructure
Human interoperability is more difficult than technical interoperability
Trust

Questions (1)
How do we translate principles into new practices, especially given the scale and complexity?
What are the fundamental differences between data and collections?
Human readable vs. machine readable?
What about the “cloud” or the “crowd”?
Can Flickr help us with data curation?

Questions (2)
How does a partnership audit data (and associated services) distributed across the network?
Are audits about “completeness,” or perhaps about transparency and reliability?
Where are the existing data curators?
Maybe we shouldn’t use the terms “data librarian,” “data scientist,” or “data humanist.”

Questions (3)
What are the requirements?
Are there common requirements, which may be the most appropriate area for libraries?
Are there unifying concepts or themes?
“One scientist’s noise is another scientist’s signal…”
What are we trying to sustain? Data? Scholarship? Our organizations?