Information Management in a Non-Bibliographic Environment: Scientific Data Joseph A. Hourclé 2007-Nov-20 FLICC

1 Information Management in a Non-Bibliographic Environment: Scientific Data Joseph A. Hourclé 2007-Nov-20 FLICC Learning@Lunch

2 About Me

3 STEREO : Solar TErrestrial RElations Observatory

4 The Virtual Solar Observatory

5 Federated Search of Solar Physics Data
- 14 organizations (currently)
- 4 more organizations being integrated
- 62 instruments
- Hundreds of distinct data collections
- 10s of millions of records
- Terabytes of data

6 The data is growing …
- STEREO
  - Launched Oct 2006
  - Over 1.5 million images @ up to 8 MB
- Hinode (Sunrise aka Solar-B)
  - Launched Sept 2006
  - Over 3 million images @ up to 8 MB
- SDO
  - Scheduled to launch Aug 2008
  - 1 image per second @ 32 MB
  - 1.5 TB/day dedicated connection
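The figures on this slide can be turned into rough volume estimates. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 1,000,000 MB) and treats every image as being at the maximum stated size, so the archive totals are upper bounds. The function names are illustrative and not part of any mission software.

```python
MB_PER_TB = 1_000_000    # assumption: decimal units, 1 TB = 1,000,000 MB
SECONDS_PER_DAY = 86_400


def archive_upper_bound_tb(image_count, max_image_mb):
    """Upper bound on archive size if every image were at the maximum size."""
    return image_count * max_image_mb / MB_PER_TB


def daily_volume_tb(images_per_second, image_mb):
    """Uncompressed volume produced per day at a steady cadence."""
    return images_per_second * image_mb * SECONDS_PER_DAY / MB_PER_TB


# Figures taken from the slide.
stereo_tb = archive_upper_bound_tb(1_500_000, 8)
hinode_tb = archive_upper_bound_tb(3_000_000, 8)
sdo_tb_per_day = daily_volume_tb(1, 32)

print(f"STEREO: at most {stereo_tb:.0f} TB")
print(f"Hinode: at most {hinode_tb:.0f} TB")
print(f"SDO:    about {sdo_tb_per_day:.2f} TB/day uncompressed")
```

Under these assumptions the STEREO and Hinode holdings are bounded by 12 TB and 24 TB, and the SDO cadence works out to about 2.76 TB/day before any compression. The slide does not say how that figure relates to the 1.5 TB/day connection, so no conclusion is drawn here.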

7 Other disciplines have even more data
- NVO : US National Virtual Observatory
  - LSST (Large Synoptic Survey Telescope)
    - Scheduled to start observing in 2012
    - 7-10 TB/night, 3.2 Gpix images
    - ~10 PB/yr
- EOSDIS : Earth Observing System Data and Information System
  - About 2 TB/day, per satellite (8?)
  - Planned to be 16 PB

8 … and we’re not the only ones
- Heliospheric
- Magnetospheric
- Radiation Belt
- ITM (upper atmosphere)
- NVO / IVOA : nighttime astronomy
- PDS : planetary
- EOS : earth

9 What is Scientific Data?

10 How is Scientific Data Gathered?
- Scientist thinks up a problem
- Scientist (and engineers) create an instrument to conduct an investigation
- The instrument collects data via sensors
- Data are calibrated
- Data are written into scientifically useful formats
- Data are distributed to the scientists

11 But really, what is data?
- There is no formal definition.
- It’s as ambiguous as the term “book”
- Data may be shorthand for:
  - Data Collection
  - Data Series
  - Data Set
  - Data Product
  - Data Granule

12 The problem with “data”
- Every investigation has different data needs
- Each investigation organizes and catalogs the data to answer its own scientific question
- What is “good” data for one group may not be useful for another
- Because data is being collected continuously, there may not be a consistent boundary on one “granule” of data
- Some data is tracked as individual values, and only packaged upon request
  - Mostly time-series data, not images
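The last point, values tracked individually and packaged only when someone asks, can be sketched as follows. This is a minimal illustration rather than how any particular archive works: the function name `package_granule`, the half-open time window, and the in-memory lists are all assumptions made for the example.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def package_granule(times, values, start, end):
    """Bundle the samples with start <= time < end into one package.

    `times` must be sorted ascending and parallel to `values`.
    The granule boundary is whatever the requester asks for.
    """
    lo = bisect_left(times, start)
    hi = bisect_left(times, end)
    return {
        "start": start,
        "end": end,
        "times": times[lo:hi],
        "values": values[lo:hi],
    }


# Hypothetical one-minute time series, ten samples long.
t0 = datetime(2007, 11, 20, 12, 0, 0)
times = [t0 + timedelta(minutes=i) for i in range(10)]
values = list(range(10))

granule = package_granule(
    times, values, t0 + timedelta(minutes=2), t0 + timedelta(minutes=5)
)
print(len(granule["values"]), "samples packaged")
```

Two requests with different windows yield two different "granules" from the same stored values, which is why a fixed granule boundary may not exist for this kind of data.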

13 Types of Data Archives
- Instrument Archives
  - Maintained by the PI team
  - Little or no consideration towards re-use
- Resident Archive
  - Maintained by a specific discipline
  - Re-use within the given discipline
- Long-Term Archive
  - Required for federally funded studies
  - Focus on preservation, not use of data

14 Active Archives
- Still changing
  - May be ingesting from an active mission
  - May still be processing their data
  - May serve multiple editions or processed states of the data
  - Final Data in “Physical Units” typically isn’t available until one or more years after the mission
    - Not directly comparable with data from other instruments until then

15 Isn’t this just Knowledge Management?
- There is no knowledge in the raw data
  - But there is knowledge in the design of the instruments & sensors
    - What spectral range are the instruments sensitive to?
    - What are the instrument’s possible operating modes?
  - Knowledge of the instruments & sensors affects how the scientists interpret data
- The scientists have to interpret the results to determine the knowledge
  - May be reluctant to have others catalog their data, as it requires understanding the science

16 Multiple Operating Modes: Filters on SOHO/EIT 171Å 195Å 284Å 304Å

17 Known Sensor Issues: SOHO/LASCO

18 Knowledge Mgmt, cont’d
- We do have ‘Event’ and ‘Feature’ Catalogs
  - Scientists will record when/where they think something interesting is occurring, and share with others.

19 Data Processing : Raw Image (Linear)

20 Data Processing : Calibrated (Greyscale)

21 Data Processing : Before Calibration

22 Data Processing : Best Calibration

23 Data Processing: CCD Aging

24 CCD Calibration 171Å 195Å 284Å 304Å

25 Higher Level Data

26 The Problems …
- Cross discipline translation is difficult
  - Concepts of what makes data useful differ between disciplines
  - Different disciplines use different search parameters
    - VSO : time, spectral range, location on Sun
      - Always looking at the same object
    - VHO : location of observer, time, spectral range
      - Observatories are moving, in situ measurements
    - EOS : location of object observed
    - NVO : direction of pointing (assumed from Earth)
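The mismatch in search parameters can be made concrete with a small lookup table. The sketch below encodes only what the slide lists; the key names (such as `location_on_sun`) are invented for the example, and treating "location on Sun" and "location of object observed" as distinct parameters is an assumption.

```python
# Search parameters per virtual observatory, as listed on the slide.
# Key names are illustrative, not taken from any real query API.
SEARCH_PARAMETERS = {
    "VSO": {"time", "spectral_range", "location_on_sun"},
    "VHO": {"observer_location", "time", "spectral_range"},
    "EOS": {"observed_object_location"},
    "NVO": {"pointing_direction"},
}


def shared_parameters(*archives):
    """Return the parameters that every named archive can be searched by.

    Requires at least one archive name.
    """
    sets = [SEARCH_PARAMETERS[name] for name in archives]
    return set.intersection(*sets)


print("VSO and VHO share:", sorted(shared_parameters("VSO", "VHO")))
print("All four share:", sorted(shared_parameters("VSO", "VHO", "EOS", "NVO")))
```

With this encoding, VSO and VHO overlap on time and spectral range, while no single parameter is common to all four, which is one way to see why a cross-discipline query is hard to express.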

27 Problems, cont’d.
- Even when there is agreement, there are still problems
- Which time is important?
  - Start time?
  - Average time?
  - Spacecraft time?
- Which coordinate system is used?
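A small sketch shows why the choice of time tag matters when matching records. It assumes that "average time" means the midpoint of the exposure, which the slide does not state, and it leaves spacecraft time out entirely. The function name and convention labels are hypothetical.

```python
from datetime import datetime, timedelta


def to_start_time(tag, convention, exposure_seconds):
    """Normalize a time tag to the exposure start time.

    Assumption: an "average" tag is the midpoint of the exposure.
    """
    if convention == "start":
        return tag
    if convention == "average":
        return tag - timedelta(seconds=exposure_seconds) / 2
    raise ValueError(f"unknown time convention: {convention}")


# The same hypothetical 10-second exposure, tagged two different ways.
tag_a = datetime(2007, 11, 20, 12, 0, 0)   # catalog A records start time
tag_b = datetime(2007, 11, 20, 12, 0, 5)   # catalog B records average time

print("raw tags equal:", tag_a == tag_b)
print(
    "normalized tags equal:",
    to_start_time(tag_a, "start", 10) == to_start_time(tag_b, "average", 10),
)
```

The raw tags differ by five seconds even though they describe one observation; they only agree once both are converted to the same convention, which requires knowing the exposure length and which convention each catalog used.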

28 Problems, still cont’d
- Each discipline is working on solutions within their field
  - Build systems that suit the needs of their community
  - Each discipline has different “first class data”
- Currently working on metadata standards so data can be discovered and used by other disciplines
  - SPASE; MMI; GEON
- Some work on ontologies to help with discovery and use
  - VSTO; SWEET; GEON; SESDI

29 Lots of Permutations

30 I know what you’re thinking…

31 And it mostly works

32 How does this affect libraries?
- The library is a changing organism
  - Data is relatively unanalyzed in LIS
  - Data connects to bibliographic records, and vice versa
    - What data was used in this journal article?
    - Where can I get documentation on using this data?
    - Has anyone published anything using this data?
  - Data connects to other data
    - What other instruments observed a given event?
    - Is there an alternate version that better meets my needs?
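The two-way connection between data and bibliographic records can be sketched as a bidirectional index. This is an illustration of the idea only, not an existing library system: the class name `LinkIndex` and every identifier in the example are made up.

```python
from collections import defaultdict


class LinkIndex:
    """Bidirectional links between articles and the datasets they used."""

    def __init__(self):
        self._datasets_by_article = defaultdict(set)
        self._articles_by_dataset = defaultdict(set)

    def link(self, article_id, dataset_id):
        """Record that an article used a dataset, in both directions."""
        self._datasets_by_article[article_id].add(dataset_id)
        self._articles_by_dataset[dataset_id].add(article_id)

    def datasets_used_in(self, article_id):
        """What data was used in this journal article?"""
        return set(self._datasets_by_article.get(article_id, ()))

    def articles_using(self, dataset_id):
        """Has anyone published anything using this data?"""
        return set(self._articles_by_dataset.get(dataset_id, ()))


# Hypothetical identifiers, for illustration only.
index = LinkIndex()
index.link("article:example-a", "dataset:example-eit-195")
index.link("article:example-b", "dataset:example-eit-195")
index.link("article:example-b", "dataset:example-lasco-c2")

print(sorted(index.articles_using("dataset:example-eit-195")))
print(sorted(index.datasets_used_in("article:example-b")))
```

Storing each link in both directions is the design choice that lets either question from the slide be answered with a single lookup.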

33 There’s funding for research
- NSF:
  - CDI : Cyber-Enabled Discovery and Innovation
  - INTEROP : Community-based Data Interoperability Networks
  - IIS : Information and Intelligent Systems
  - DataNet : Sustainable Digital Data Preservation and Access Network Partners
- NASA:
  - AISR : Advanced Info. Systems Research
  - ACCESS : Advancing Collaborative Connections for Earth Science Access

34 Sunspot on 15 July 2002 from the Swedish 1-m Solar Telescope on La Palma

35 http://virtualsolar.org/ http://stereo.gsfc.nasa.gov joseph.a.hourcle@nasa.gov
