
1 RAARMM Atmospheric Radiation Measurement Regional Databases and Archives: the Effects of Scale… A Presentation for “Scalable Information Networks for the Environment Workshop” October 31, 2001 San Diego, California Raymond McCord Oak Ridge National Laboratory* *Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

2 Credits Concepts are derived from managing data for environmental projects over the past 25 years. Variations of the concepts have been observed in these disciplines: plant community research; impact assessment in marine systems; national acid rain surveys; environmental monitoring and cleanup projects at DOE facilities; military land use assessment; climate change research (atmospheric research). Ideas are freely traded with Dick Olson (ORNL).

3 Presentation Strategy Motivation and concerns Archive overview Definition, components, functions, why & why not, examples Archives and scale Effects of scale Mitigate scale effects Generate and manage metadata Future: Archive issues to resolve

4 My Motivation & Concerns Motivation Describe observations about the effects of scale on Archives Describe remedies to minimize scale effects Minimize remedy pain Concerns Preaching to the choir!! Nothing new will happen!! Continuing unnecessary limits to future science!! The enemy is our behavior. Will we change or whine???

5 Source: American Scientist, Vol 886 p 525. “You can’t keep running in here and demanding data every two years.” Challenge: engage scientists in the process of archiving their data and provide the mechanism for archiving.

6 Archives and Scale: Presumptions Regional data live in Archives; information sharing is important; the archiving can be improved; Archive “neurons” are metadata; multidisciplinary data will foster broader ecological discoveries; the limited number of permanent data archives for ecological data will increase.

7 What Is an Archive?

8 What Is a Data Archive? A data archive is a permanent, electronic collection of datasets with accompanying metadata such that users of the data can acquire, understand, and use the data. More than a long-term backup. More than an index or catalog with pointers to datasets stored elsewhere. For more details, see Michener, W. A. and J. W. Brunt. 2000. Ecological Data: Design, Management and Processing. Blackwell Science. 180 pp.

9 Components of an Archive Data and metadata Storage devices Information system Network connections Staff Data/metadata preparation and review Systems development and maintenance User support

10 Archive Functions Store data Submitted by others Build catalog and structure Maintain storage across technology generations Review new data (QA, metadata) “Advertise” contents Find data for users Query and browse logic Distribute data Provide access to data References to documentation

11 Data Centers at ORNL CDIAC - Carbon Dioxide Information Analysis Center; ARM Archive - Atmospheric Radiation Measurement Program; ORNL DAAC - Distributed Active Archive Center for Biogeochemical Dynamics; NARSTO - tropospheric air pollution information for North America; OREIS - Oak Ridge Environmental Information System

12 Atmospheric Radiation Measurement (ARM) Program ARM research questions: What happens to all of the sunlight energy? How is light absorbed by clouds? What does partly cloudy mean? Statistically? Spatially? What types of clouds form? When and How? ARM is a ‘once in a lifetime’ research adventure for atmospheric scientists. ARM research includes instrumentation, system development, data analysis, and modeling (climate and process).

13 ARM Measurements Scope All data collection is highly automated -- a REAL BLAST!! Data collection is now a peer outcome with scientific discovery

14 ARM Archive ARM Archive stores and provides access to the entire accumulation of data. Currently 5 million files and 14,000 GB and growing. The ARM data in the Archive will be accessed for research for many years (decades). Currently distributes 50,000-100,000 files per month (100-200 GB). More information: ARM Program www.arm.gov; ARM Archive www.archive.arm.gov

15 ARM Archive Schematic “Archive Input & Output” [Diagram labels: Other ARM Systems; Incoming Data Files; Data Reception; operations metadata; backup data files; catalog metadata; Mass Storage System; Archive web User Interface; query specifications (date, location, measurement); Data Retrieval; file list; Requested files; user copy]
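
The query path in this schematic (date, location, and measurement go in, a file list comes out) can be sketched as a small catalog lookup. This is only an illustrative sketch, not the ARM Archive's actual implementation: the `CatalogEntry` fields, the site codes, the measurement name, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row describing a stored file (hypothetical fields)."""
    filename: str
    site: str          # location code (assumed)
    measurement: str   # measurement name (assumed)
    day: date


def find_files(catalog, site, measurement, start, end):
    """Return the names of files that match a query specification."""
    return [
        entry.filename
        for entry in catalog
        if entry.site == site
        and entry.measurement == measurement
        and start <= entry.day <= end
    ]


# Hypothetical catalog contents, for illustration only.
catalog = [
    CatalogEntry("siteA_radiation_20010101.dat", "siteA", "radiation", date(2001, 1, 1)),
    CatalogEntry("siteA_radiation_20010102.dat", "siteA", "radiation", date(2001, 1, 2)),
    CatalogEntry("siteB_radiation_20010101.dat", "siteB", "radiation", date(2001, 1, 1)),
]

file_list = find_files(catalog, "siteA", "radiation", date(2001, 1, 1), date(2001, 1, 31))
print(file_list)
```

The point of the sketch is that the query runs against catalog metadata only; the data files themselves are touched later, at retrieval time.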

16 [Diagram labels: Data Flow; Data; Metadata; User Interface; Core archive functions; Archive Development and Maintenance; Archive support; Data and Metadata Submission; Data/Metadata Ingest; Backup, Security, Migration; Network; User Request pathways; User Support; User interactions]

17 Why Archive?? “I am doing Science. Trust me.”

18 Cycles of Research “An Information View” [Diagram labels: Planning; Automation and review; Information review; Problem Definition (Research Objectives); Analysis and modeling; Planning; Measurement Collection; Selection and extraction; Archive of Data; Publications; Original Observations; Secondary Observations; 200 yrs; 20 yrs]

19 Why Don’t I Archive My Data? No incentives - what’s in it for me? No acknowledgment - does a dataset = paper? Give up publication rights - will somebody scoop me? Poor planning - it was not in “the Plan” No resources - who’s going to pay for it? Lack of training - what do I do first? Unsure about metadata content - how much is enough?

20 Why Should I Archive My Data? (management hints!!) Career advancement (give them credit) you will get some recognition you can publish data paper in ESA Ecological Archives it may help me do science with broader scope Professional incentives (give them training) good scientific practice (create peer pressure) Institutional incentives (have expectations) required by the sponsor Technological advances (give them systems) it’s easier and there are more options

21 Archiving Supports Science Metadata required for archiving will improve data quality Extends data usefulness Increases your information base for doing research: data volume and diversity Permits replication of results A KEY concept of Science

22 The Effects of Project Scale on Archives “Metadata are archive neurons??”

23 Metadata Depends on Your “World View” Investigator Doesn’t need extensive formal metadata Project Metadata needed for project integration and modeling activities Project data manager may help write metadata Data archive More detailed metadata (e.g., spatial coordinates) More standardization (e.g., keywords) to communicate clearly with future users Who writes the metadata?

24 Measurement (In the beginning, was the measurement. It was formless and desolate. Without context…)

25 Single Experiment View [Diagram: Measurement, with attributes date, sample ID, parameter name, location]

26 Research Project View [Diagram: Measurement, with attributes QA flag, media, date, sample ID, parameter name, location]

27 Long-term or Multidisciplinary View [Diagram: Measurement, with attributes QA flag, media, generator, method, date, sample ID, parameter name, location, records, units]

28 Integrated System & Archive View [Diagram: Measurement, with attributes QA flag, media, generator, method, date, sample ID, parameter name, location, records, units. Surrounding definition-table labels, in extraction order: Sample def. type date location generator lab field Method def. words, words units method Parameter def. org.type name custodian address, etc. coord. elev. type depth Record system date words, words. QA def. Units def. GIS]

29 Another View of Scale

30 Project Scale and Recorded Metadata [Figure labels: Program; PI; Metadata; Group; Archive; Increasing User Scope. Metadata elements shown: Units, Method, QA flag, Media, Parameter name, Measurement, Date, Sample ID, Location, Generator, Records]

31 Data Maturation and Scale Individual Investigators: collect data, quality assure, document, analyze, publish. Groups or Science Teams: collate data, enhance, synthesize, model, publish. Project Information System: collate data, review completeness, maintain data for project. Data Distribution and Archive Center: long-term archive, distribute freely to users. Master Data Directory: searchable index with pointers to data.

32 Preparing for Archiving I will not wait. I will not …

33 Generic Environmental Data Model (Which Piece Is First…?) [Diagram: Measurement, with attributes QA flag, media, generator, method, date, sample ID, parameter name, location, records, units. Surrounding definition-table labels, in extraction order: Sample def. type date location generator lab field Method def. words, words units method Parameter def. org.type name custodian address, etc. coord. elev. type depth Record system date words, words. QA def. Units def. GIS]

34 Sequence of Information Birth [Diagram: Measurement, with attributes QA flag, media, generator, method, date, sample ID, parameter name, location, records, units. Surrounding definition-table labels, in extraction order: Sample def. type date location generator lab field Method def. words, words units method Parameter def. org.type name custodian address, etc. coord. elev. type depth Record system date words, words. QA def. Units def. GIS]

35 Research ~ Publishing ~ Metadata Metadata design can be a “checklist” for research planning. Metadata preparation can be integrated with publication process. Metadata are an investment in current and future science.

36 Where to Archive Data?

37 Archive Choices What determines your options? Sponsor requirements Repository access Metadata requirements Scalable storage Personal web pages and files Project or network data centers Federal data centers Links “transcend” storage structures Master directory Mercury

38 Personal Web Page It’s fun, rewarding, relatively easy, can share data quickly, can control access to data Data issues?? complete metadata QA checks Connected to basic archival center functions?? ready access to data (24 h/d, 7 d/wk) user support data available on multiple media secure, backed-up, long-term storage

39 ESA Ecological Archives Publishing datasets as peer reviewed, citable papers (with volume and page numbers). Data papers are announced in abstract form in a print journal with data available electronically. Citation example: Esser, G., H.F.H. Lieth, J.M.O. Scurlock and R.J. Olson. 2000. Osnabrück net primary productivity data set. (Ecological Archives data paper E081-011). Ecology 81, 1177-1177. Bill Michener, Editor http://esa.sdsc.edu/esapubs/Journals_main.htm

40 Master Data Directory Provides search capability and pointers to a source of the data (Center does not archive data). Maintains standard keywords/indices. Collects metadata from many sources. Examples: Global Change Master Directory (GCMD) http://gcmd.gsfc.nasa.gov; ORNL DAAC Mercury System http://mercury.ornl.gov

41 What is Mercury? Mercury is used to assist an investigator with documenting data and making these data available to others. 1. The data provider uses the Metadata Editor to create a metadata file containing links to the data and documentation. 2. Mercury harvests the metadata and builds an index. 3. Users query the index. 4. Full metadata are returned to the user, including links back to the data provider. 5. User links to data provider’s server. 6. Data and documentation are downloaded directly from the data provider. [Diagram labels: User; Metadata Index; NASA / ORNL; Data and documentation]
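
The harvest, index, and query steps on this slide can be illustrated with a small keyword index. This is a rough sketch of the general pattern only, not Mercury's actual code or metadata format: the record fields (`id`, `title`, `keywords`, `data_url`) and the example URLs are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(metadata_records):
    """Harvest step: map each lower-cased keyword to the record ids that use it."""
    index = defaultdict(set)
    for record in metadata_records:
        for keyword in record["keywords"]:
            index[keyword.lower()].add(record["id"])
    return index


def query(index, metadata_records, keyword):
    """Query step: return full metadata, including links back to the provider."""
    by_id = {record["id"]: record for record in metadata_records}
    hits = sorted(index.get(keyword.lower(), set()))
    return [by_id[record_id] for record_id in hits]


# Hypothetical metadata files written by data providers.
records = [
    {"id": "ds-1", "title": "Example NPP data set",
     "keywords": ["NPP", "vegetation"],
     "data_url": "http://provider.example/ds-1/data"},
    {"id": "ds-2", "title": "Example soil data set",
     "keywords": ["soil carbon"],
     "data_url": "http://provider.example/ds-2/data"},
]

index = build_index(records)
results = query(index, records, "npp")
for record in results:
    # The index only points at the data; the download comes from the provider.
    print(record["title"], record["data_url"])
```

As on the slide, the index holds metadata and links only, so the data and documentation stay on the provider's server.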

42 Regional Archives

43 Sources of Regional Data Carbon Dioxide Information Analysis Center; National Geophysical Data Center; National Environmental Satellite, Data, and Information Service; National Soils Data Access Facility; National Water Information System; Forest Inventory and Analysis; Breeding Bird Survey; Threatened and Endangered Species; Global Change Master Directory

44 NASA EOSDIS Distributed Active Archive Centers: JPL (Ocean Circulation and Air-Sea Interaction); U. Alaska (Sea Ice and Polar Processes); U. Colorado (Cryosphere); EDC (Land Processes); ORNL (Biogeochemical Dynamics); GSFC (Upper Atmosphere, Global Biosphere, and Geophysics); LaRC (Atmospheric Processes); SEDAC (Socio-economic)

45 [Figure: example data layers: Precipitation; Cloud Amount; Fossil Fuel Emissions; Soil Carbon; Topography; LW Radiation; Clear-Sky Albedo; Vegetation Biophysics (fPAR)] Global scale, 280 parameters: surface, atmospheric, fluxes

46 Future: Issues to Resolve Size, diversity, and longevity Accommodating change Teaching good practices

47 Issues: Size, Diversity, Longevity Size Online vs. Offline Database vs. File structure Multiple institutions Too big for technology migration?? Diversity Increased logic and documentation for “finding data” Spatial distribution Increased potential for uniqueness conflicts Longevity Too old to explain or decode Too much evolution of methods and practices Asynchronous change in data and metadata

48 Issues: Planning and Requirements Plan for archiving early and ongoing Avoids missing metadata Avoids panic Improves overall data quality and consistency Consider the timing of requirements Requirements Standards: “to be or not to be?” Documentation expectations Accessibility “It’s mine!! It’s my data!! You CAN’T have it!!”

49 Research Implies Change … repeat… New information requirements New questions Research Discovery Not always true for other information systems

50 Issues: Accommodating Change Change must be considered in the design Things that will change Access expectations Logical hierarchy of information scope New parameters New disciplines New study sites New data sources or methods

51 Issues: More Changes Unpredictable variation is: no excuse!! Often used as an excuse to avoid standards Cannot avoid all of it, but try… Missing values will occur; Plan ahead Do not do: Temp, temp, t, T, temperature Be clear, avoid ambiguity Minimal observational intensity is: no excuse!! Quick study = no documentation?? The unexpected are rare and most valuable??
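
The “Do not do: Temp, temp, t, T, temperature” point can be made concrete with a small alias table that maps every spelling found in incoming files to one controlled parameter name. This is a minimal sketch; the `ALIASES` table and the `canonical_parameter` helper are hypothetical names introduced here, not part of any archive's software.

```python
# Hypothetical alias table: every spelling seen in incoming files maps to one
# controlled parameter name.
ALIASES = {
    "temp": "temperature",
    "t": "temperature",
    "temperature": "temperature",
}


def canonical_parameter(raw_name):
    """Return the controlled name, or raise so unknown spellings get reviewed."""
    key = raw_name.strip().lower()
    if key not in ALIASES:
        raise KeyError(f"unrecognized parameter name: {raw_name!r}")
    return ALIASES[key]


names = ["Temp", "temp", "t", "T", "temperature"]
print({canonical_parameter(name) for name in names})
```

Raising on an unknown spelling, rather than passing it through, is a deliberate choice in this sketch: it forces a decision about the name before the value reaches the archive.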

52 Rules for Creating Databases for Archiving Unique occurrences Each type of measurement is represented in a consistent way Each measurement event is represented by only one value Identifiers Each value is associated with a parameter name Each measurement value has a quality indicator and link to a method description Place and time Each value is associated with a unique place name with a quantitatively defined location (geographic coordinates) Each value is associated with a date and time Data Storage and Transport Data are stored or managed with a database management system or equivalent
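
These rules map fairly directly onto a record layout plus a uniqueness check. The sketch below is one possible reading, under stated assumptions: the field names are illustrative, and treating (parameter, place, timestamp) as the identity of a measurement event is an assumption made here, not something the slide specifies.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One archived value; field names are illustrative, not a standard."""
    parameter: str        # parameter name
    value: float
    quality_flag: str     # quality indicator
    method_id: str        # link to a method description
    place: str            # unique place name
    latitude: float       # geographic coordinates of the place
    longitude: float
    timestamp: datetime   # date and time

    def key(self):
        """Assumed identity of a measurement event: what, where, when."""
        return (self.parameter, self.place, self.timestamp)


def check_unique(measurements):
    """Return keys that occur more than once (rule: one value per event)."""
    seen, duplicates = set(), set()
    for m in measurements:
        k = m.key()
        if k in seen:
            duplicates.add(k)
        seen.add(k)
    return duplicates


# Hypothetical rows: two values recorded for the same event.
when = datetime(2001, 10, 31, 12, 0)
rows = [
    Measurement("temperature", 18.5, "good", "method-1", "site-A", 36.0, -84.0, when),
    Measurement("temperature", 18.7, "good", "method-1", "site-A", 36.0, -84.0, when),
]
print(check_unique(rows))
```

In a database management system the same rule would typically be expressed as a uniqueness constraint on the key columns rather than as application code.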

53 Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive Best Practices include: assign descriptive file names; use consistent and stable file formats; define the parameters; use consistent data organization; perform basic quality assurance; assign descriptive data set titles; provide documentation. Published: Cook et al. 2001. Ecological Bulletin http://www.daac.ornl.gov/DAAC/bestpractices.html

54 Reflecting Into the Future…

55 Workshop Reactions Distributed (sensor) processing Yes / No Automated QA Getting data dirty Metadata early 10X easier, scalable Differentiate standards Intentional variance only Partition / isolate exceptions when possible Look for 3, 5, 10X changes 20-30% not worthwhile

56 Summary Points Archives need structure and standards Social and education solutions VERY important Metadata are the “neurons” of Archives Metadata early better than late Need to think about our choices.

57 Future Thoughts Will we be able to know “Where are we?” in the information structure How many 30 KB files are on a 100 GB tape cartridge? The future limits will not be technology But our minds… We need to plan NOW how to best leverage the future
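
The question about 30 KB files on a 100 GB cartridge has a quick arithmetic answer: roughly 3.3 million. The sketch below assumes decimal units (1 KB = 10^3 bytes, 1 GB = 10^9 bytes) and ignores per-file overhead on the tape; `files_per_cartridge` is a hypothetical helper name.

```python
def files_per_cartridge(cartridge_bytes, file_bytes):
    """Whole files that fit, ignoring tape marks and per-file overhead."""
    return cartridge_bytes // file_bytes


KB, GB = 10**3, 10**9   # decimal units assumed
count = files_per_cartridge(100 * GB, 30 * KB)
print(count)
```

With millions of files on a single cartridge, finding any one of them depends on the catalog and metadata, which is the point of the “Where are we?” question.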

58 A Future Scientist’s View I told my college-age daughter about the Japanese announcement of 1 TB of optical memory in 1 cubic centimeter. Her reply: “…We need to know how to think critically and select what kinds of projects and data we need to keep because the limiting factor will be our minds, not the technology.”

59 Looking Forward to a Future With Archives!!

