What Makes a Data Archive Tick: Marrying Content and User Support Steven Worley National Center for Atmospheric Research Computational and Information.

1 What Makes a Data Archive Tick: Marrying Content and User Support
Steven Worley
National Center for Atmospheric Research, Computational and Information Systems Laboratory
May 17-21, 2010
Summer Institute for Data Curation for Earth and Environmental Science
Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign

2 ① How to make and keep the archive content relevant to the users?
② How to engage the users?

3 How to make and keep the archive content relevant to the users?
Know your users
• Define your focus community
• Cannot serve everyone
• Design service not to limit others
• At decision points (e.g. changes in service) ask:
• “Is this a significant benefit for my users?”
• The case @ NCAR
• Atmospheric, oceanic, and some related geo-science research
• Graduate students and higher education
• NCAR scientists, researchers @ universities with graduate degree programs in meteorology and oceanography
• Over 50% of 6000+ unique users, annually, are outside focus group

4 How to make and keep the archive content relevant to the users?
Understand their science, currently, and trends
• Attend seminars, symposia, meetings where they present their work
• Corollary: Have science-educated staff
• The case @ NCAR – Research Data Archive
• All have MS degrees, or greater: meteorology (6), oceanography (2), computing science (1); exception – admin. (1)

5 How to make and keep the archive content relevant to the users?
Understand their science, currently, and trends
• Routinely review journals, bulletins, and relevant newsletters
• Search for science strongly dependent on your data focus
• Contact authors, offer data sharing service
• @ NCAR

6 How to make and keep the archive content relevant to the users?
Understand their science, currently, and trends
• Develop close contacts with a few key users
• Seek ‘honest’ opinions about your service
• Make your service known – presentations, publications
• @ NCAR

7 How to make and keep the archive content relevant to the users?
Know how your users work
• How do they prefer to handle data?
• Digital files – write and run program codes to evaluate content
• Digital files – specific formats that are application friendly
• E.g. netCDF, GIS, WMO
• ASCII text convenient for worksheets
• Images of analyses (charts, line graphs, 2D/3D contoured plots)
• @ NCAR
• Digital files are key
• Some images for discovery, but not critical
• Design the systems to deliver what users want

8 How to make and keep the archive content relevant to the users?
Choosing the content
• At decision points (e.g. adding a new dataset) ask:
• “Can we handle this efficiently?”
• Does it supplement or extend the central data foci?
• Does it address a new need or trend?
• Are the formats aligned with user preferences?
• If not, can we make a cost-effective conversion?
• Do you have staff (data scientists / stewards) who can understand the scientific content?
• @ NCAR
• Atmospheric, oceanic, related geo-sciences observations or analyses derived from observations to support climate and weather research.

9 How to make and keep the archive content relevant to the users?
Choosing the content
• Evaluate user metrics
• What datasets are most popular?
• Who is using what – can you distinguish your focus group?
• Are there any trends?
• Caution: this is only part of the story
• @ NCAR
• Our user registration allows us to track this
• Examples

10 Unique Users by service path
Users in four service categories
• MSS to CISL HPC environment
• Web to world-wide community
• Orders – one-off, consulting-assisted data preparation
• TIGGE
6 thousand users annually
• FY09: MSS=266, Web=5649, Orders=196, TIGGE=44

11 Amount of data by service path
Users in four service categories
• MSS to CISL HPC environment
• Web to world-wide community
• Orders – one-off, consulting-assisted data preparation
• TIGGE
162 TB in FY09
• FY09 (TB): MSS=31, Web=120, Orders=9, TIGGE=2
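The two service-path slides can be combined into a quick comparison of who uses each path and how much data each path moves. The sketch below uses only the FY09 figures quoted above; the function names are illustrative, and the per-path user counts may overlap (a user could appear under more than one path, which is an assumption, not something the slides state).

```python
# FY09 figures copied from the two service-path slides.
users_by_path = {"MSS": 266, "Web": 5649, "Orders": 196, "TIGGE": 44}
tb_by_path = {"MSS": 31, "Web": 120, "Orders": 9, "TIGGE": 2}


def shares(counts):
    """Return each path's fraction of the column total."""
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}


def tb_per_user(users, volumes):
    """Average terabytes delivered per user on each path."""
    return {path: volumes[path] / users[path] for path in users}


user_share = shares(users_by_path)
volume_share = shares(tb_by_path)
per_user = tb_per_user(users_by_path, tb_by_path)

for path in users_by_path:
    print(
        f"{path:7s} users {user_share[path]:6.1%}  "
        f"volume {volume_share[path]:6.1%}  TB/user {per_user[path]:.3f}"
    )
```

Read this way, the MSS path accounts for roughly 4% of the user counts but roughly 19% of the delivered volume, which is the kind of pattern that a user count alone would hide.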

12 User ranked popular datasets
Top 10 datasets/groups FY09 (~ 6000 Unique Users Annually)
Unique users FY09 | Datasets | Titles
2878 | ds082.0, ds083.2, ds083.0 | NCEP FNL Operational Model Global Tropospheric Analyses
924 | ds090.0 | NCEP/NCAR Global Reanalysis Products
510 | ds758.0, ds759.3, ds759.2 | NGDC Global 2' and 5' Elevations, USGS 30 ARC-second
477 | ds461.0, ds351.0, ds337.0, ds464.0, ds353.4 | NCEP ADP/PREPBUFR Global Surface and Upper Air Observations
358 | ds608.0 | NCEP North American Regional Reanalysis (NARR)
264 | ds609.2 | GCIP NCEP ETA model output
262 | ds540.1, ds540.0 | International Comprehensive Ocean-Atmosphere Data Set (ICOADS)
190 | ds744.4 | QSCAT/NCEP Blended Ocean Winds
173 | ds277.0 | NCEP V2.0 OI Global SST, V3.0 Extended Reconstructed Analyses
153 | ds335.0, ds336.0 | Unidata (IDD) Observations and Model Data
5921 | All Datasets | All DSS datasets
7 May 2010, NCAR-CSM Symposium on Climate and Energy

13 How to make and keep the archive content relevant to the users?
Remain flexible – expect constant change
• Be ready to take opportunities when they come along
• Re-adjust priorities
• Resist ‘tight’ mission control
• Take advice from advisory groups, but don’t depend on them exclusively
• Use holistic approach
• @ NCAR, an unplanned-for example
• Arctic System Reanalysis – NSF sponsored research critical to assess the changes happening in the Arctic
• Need controlled access to first prototype data – We do this!

14 How to make and keep the archive content relevant to the users?
Sustaining for the long-term
• Richness and data value grow over time
• Data assets tend to complement each other – add value to many different research questions
• Scientific publications lead to broader and increased interest
• Definitive data citation is a work in progress
• Staffing needs to be base/core funded
• Grant directed funding can lead to a fractured, ad hoc, incomplete archive
• Can be a major frustration for users
• @ NCAR – the Research Data Archive
• Began 40+ years ago
• Today sustained by 9 persons

15 How to make and keep the archive content relevant to the users?
Collaborations
• Participate/volunteer for committees and panels that tackle data issues (all sorts)
• Learn from others, share knowledge
• Share efforts and data with other organizations
• No one group can do it all (don’t have resources and all expertise required)
• @ NCAR (conf. like SIDC for EES)
• Volunteerism: NAS, AMS, NOAA, WMO, NASA
• National and International data agreements with:
• European Centre for Medium-Range Weather Forecasts
• Japan Meteorological Agency
• U.S. National Weather Service, National Centers for Environmental Prediction

16 How to Engage the Users?
Data Discovery – how can people find you?
All 600+ RDA Datasets have metadata in GCMD
Automatically, exported via OAI-PMH
Similarly: RDA > CDP@NCAR > BADC in UK
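For readers unfamiliar with OAI-PMH, a harvester asks a repository for records with a plain HTTP GET and receives XML back. The sketch below builds a ListRecords request and pulls identifiers and Dublin Core titles out of a response. The endpoint URL, the record identifier, and the sample response are made-up placeholders, not the actual RDA or GCMD service; only the verb, the oai_dc metadata prefix, and the two XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        ident = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((ident, title))
    return pairs


# Hypothetical response body, shaped like an oai_dc ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:ds277.0</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Placeholder dataset title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("https://example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow the resumptionToken that large repositories return to page through results; that is left out here to keep the sketch short.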

17 How to Engage the Users? Design your portal to evolve – it will/should 2002 Search Navigation List of menus Unique layout of links Picture of people

18 How to Engage the Users? 2008 Search Two ways Navigation Links News Text People

19 How to Engage the Users?

20 How to Engage the Users?
Primary design feature for web portal: Data Discovery – Find Data!
2010: All about search. Gone from top: people, text, news

21 How to Engage the Users?
Navigation once they arrive
• Working principles
• Uniform across web portal
• Keep organizational elements out of prime visual territory
• @ NCAR
• Have user registration – only required to get data
• All discovery metadata open – unlimited searching

22 How to Engage the Users?
The complete data knowledge package, and data cycle
• What is a complete data knowledge package?
• Rich metadata plus the data files!
• One example
• http://dss.ucar.edu/datasets/ds277.0/

23 How to Engage the Users?
The pieces that make rich metadata
• Dataset navigation (Access, Documentation, Software)
• Title
• Summary

24 How to Engage the Users?
The pieces that make rich metadata
• Period of data record
• Update cycle
• Scientific parameters (Variables)
• Earth reference levels

25 How to Engage the Users?
The pieces that make rich metadata
• Times – temporal increment
• Data types – points or grids
• Geo-spatial coverage
• Source organizations

26 How to Engage the Users?
The pieces that make rich metadata
• Related Internet sites
• Publications
• Acknowledgement statement

27 How to Engage the Users?
The pieces that make rich metadata
• Volume – size of the dataset
• Data formats
• Related datasets in the NCAR collection
• Consulting contact (email and phone)
• A 2nd pointer to Data Access
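The pieces listed across these metadata slides can be treated as a checklist. As a minimal sketch, the record below has one field per piece and a helper that reports which pieces are still empty. The class name, field names, and placeholder values are assumptions made for illustration; they are not the schema the RDA actually uses.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetMetadata:
    # One field per "piece" named on the slides (names are illustrative).
    title: str = ""
    summary: str = ""
    period_of_record: str = ""
    update_cycle: str = ""
    variables: list = field(default_factory=list)
    reference_levels: list = field(default_factory=list)
    temporal_increment: str = ""
    data_types: list = field(default_factory=list)  # e.g. points or grids
    spatial_coverage: str = ""
    source_organizations: list = field(default_factory=list)
    related_sites: list = field(default_factory=list)
    publications: list = field(default_factory=list)
    acknowledgement: str = ""
    volume: str = ""
    data_formats: list = field(default_factory=list)
    related_datasets: list = field(default_factory=list)
    consulting_contact: str = ""


def missing_fields(record):
    """Names of metadata pieces that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Placeholder values only; this does not describe any real dataset.
draft = DatasetMetadata(
    title="Example dataset (placeholder)",
    summary="Placeholder summary text.",
    data_formats=["netCDF"],
)
print(missing_fields(draft))
```

A check like this could run before a dataset page is published, so that gaps in the knowledge package are caught by the archive rather than by the user.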

28 How to Engage the Users?
The complete data knowledge package, and data cycle
Data Cycle Facts
• Datasets are re-published – new versions.
• Datasets are corrected and extended in time or space.
• Scientific analysis and publication will occur randomly along the data cycle.
Data referencing is more challenging than traditional publication referencing because of the data cycle. How can you accurately trace/recover what has been used for publication?

29 How to Engage the Users?
The complete data knowledge package, and data cycle
• @ NCAR
• Don’t have a systematic (organization-wide) way to handle the data cycle
• We do not discard/delete old versions of data
• Ad hoc approach
• Currently building version-tracking software
• Versioning will be included in DOI implementation
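One way to picture version tracking is an append-only log in which every published version of a dataset gets a number and a checksum, so a citation such as "ds277.0, version 2" can later be matched to exact content. The sketch below is a hypothetical illustration of that idea, not the software NCAR was building; the class names and the use of SHA-256 are assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    dataset_id: str
    version: int
    sha256: str
    note: str


class VersionLog:
    """Append-only record of published versions; nothing is ever deleted."""

    def __init__(self):
        self._versions = {}  # dataset_id -> list of DatasetVersion

    def publish(self, dataset_id, content, note=""):
        """Register new content as the next version of a dataset."""
        history = self._versions.setdefault(dataset_id, [])
        digest = hashlib.sha256(content).hexdigest()
        entry = DatasetVersion(dataset_id, len(history) + 1, digest, note)
        history.append(entry)
        return entry

    def resolve(self, dataset_id, version):
        """Look up the entry a citation refers to (versions start at 1)."""
        return self._versions[dataset_id][version - 1]

    def matches(self, dataset_id, version, content):
        """Check whether content is byte-identical to a cited version."""
        expected = self.resolve(dataset_id, version).sha256
        return expected == hashlib.sha256(content).hexdigest()


log = VersionLog()
log.publish("ds277.0", b"original values", note="initial release")
log.publish("ds277.0", b"corrected values", note="error corrected")
print(log.resolve("ds277.0", 1).note)
print(log.matches("ds277.0", 2, b"corrected values"))
```

Because old entries are never overwritten, a paper that used version 1 can still be traced to version 1 after a correction is published.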

30 How to Engage the Users?
Consultation
Critical two-way communication
• 1. Benefits for the user
• Guidance to best available datasets
• Consolidate research ideas into required data sources
• Software assistance
• Customized data preparation if necessary
• 2. Benefits to the archive stewardship
• Detect ways to improve our search process
• Learn about data requirement trends
• Occasionally, acquire new data resources from scientific efforts
• Learn about data problems we might have

31 How to Engage the Users?
Provide research tool support and documentation
• Provide users a starting point for data evaluation
• Simple access programs – the languages used by the focus community
• Pointers to applications (IDL, MatLab, NCL, NCO, etc.)
• Specific examples are VERY helpful!
• Must maintain software/applications and documentation for the long-term.
• Guarantee users will understand the meaning and have access.

32 How to Engage the Users?
Provide research tool support and documentation
• @ NCAR
• Remain aware of proprietary software traps
• E.g. for documents
• Will .xls be viable 50 years from now (.xlsx is now standard)? Is .pdf any better?
• Prefer data file formats that define everything to the byte/bit level
• Computer code could always be written to access these.
• All kinds of reports, project descriptions, and documents that explain the intent of the data are vital for the long-term.
• Use dedicated document directories for each dataset
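To illustrate why a format defined down to the byte level stays readable, the sketch below writes and reads a fixed-layout binary record using only a documented layout string. The record layout (station number, latitude, longitude, temperature in tenths of a degree) is invented for this example and does not correspond to any archive format mentioned in the talk.

```python
import struct

# Hypothetical record layout, big-endian with no padding:
#   int32   station number
#   float32 latitude (degrees)
#   float32 longitude (degrees)
#   int16   temperature in tenths of a degree C
RECORD = struct.Struct(">iffh")


def encode(station, lat, lon, temp_tenths):
    """Pack one observation into its 14-byte on-disk form."""
    return RECORD.pack(station, lat, lon, temp_tenths)


def decode(raw):
    """Unpack one 14-byte record into named values."""
    station, lat, lon, temp_tenths = RECORD.unpack(raw)
    return {
        "station": station,
        "lat": lat,
        "lon": lon,
        "temp_c": temp_tenths / 10,
    }


raw = encode(72469, 40.5, -105.25, 213)
print(RECORD.size, decode(raw))
```

Anyone holding the four-line layout description can write an equivalent reader in another language decades later, which is the property the slide argues for.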

33 How to Engage the Users?
Follow-up aid
• Notification service for significant dataset changes
• If an error is corrected – should notify all users of the data
• Subscription service
• Inform users when new data is available
• Prepare special products based on user determined template – e.g. past requests
• @ NCAR
• We have an automated notification service
• Provided users register accurately
• We do not have a subscription service – yet
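The notification idea depends on knowing who downloaded what, which is why accurate registration matters. As a minimal sketch under that assumption, the registry below remembers downloads per dataset and returns the addresses to contact when a correction is issued. The class, function names, and email addresses are hypothetical, and sending the message is left out.

```python
from collections import defaultdict


class DownloadRegistry:
    """Remembers which registered users downloaded which dataset."""

    def __init__(self):
        self._users_by_dataset = defaultdict(set)

    def record_download(self, email, dataset_id):
        # Normalise the address so one user is not listed twice.
        self._users_by_dataset[dataset_id].add(email.strip().lower())

    def users_to_notify(self, dataset_id):
        """Everyone who has ever downloaded the dataset, sorted."""
        return sorted(self._users_by_dataset.get(dataset_id, set()))


def correction_notice(dataset_id, description):
    """Compose the text of a change notification."""
    return f"Dataset {dataset_id} has changed: {description}"


registry = DownloadRegistry()
registry.record_download("a.student@example.edu", "ds277.0")
registry.record_download("A.Student@example.edu ", "ds277.0")
registry.record_download("b.researcher@example.org", "ds277.0")
registry.record_download("b.researcher@example.org", "ds090.0")

for address in registry.users_to_notify("ds277.0"):
    print(address, "<-", correction_notice("ds277.0", "error corrected"))
```

A subscription service would invert the lookup: users opt in ahead of time, rather than being found afterwards through the download history.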

34 ① How to make and keep the archive content relevant to the users?
② How to engage the users?
http://dss.ucar.edu/

