Published by Jayson Gibson. Modified over 8 years ago.
1
Information Management in a Non-Bibliographic Environment: Scientific Data
Joseph A. Hourclé
2007-Nov-20
FLICC Learning@Lunch
2
About Me
3
STEREO : Solar TErrestrial RElations Observatory
4
The Virtual Solar Observatory
5
Federated Search of Solar Physics Data
  14 organizations (currently)
  4 more organizations being integrated
  62 instruments
  Hundreds of distinct data collections
  10s of millions of records
  Terabytes of data
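A federated search of this kind can be sketched as one query fanned out to many independent archives, with the answers merged into a single list. The sketch below is an illustration only, not the VSO's actual interface: `Record`, `Provider` and `federated_search` are hypothetical names, and each archive is modelled as a plain function that takes a time range.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """One catalog entry returned by an archive (hypothetical shape)."""
    provider: str
    instrument: str
    start: datetime


# Assumption: every archive can be wrapped as a function of (start, end).
Provider = Callable[[datetime, datetime], Iterable[Record]]


def federated_search(
    providers: Dict[str, Provider], start: datetime, end: datetime
) -> Tuple[List[Record], List[str]]:
    """Send the same time-range query to every provider and merge the results.

    Returns the merged records sorted by start time, plus the names of
    providers that failed, so one unreachable archive does not sink the query.
    """
    records: List[Record] = []
    failed: List[str] = []
    for name, search in providers.items():
        try:
            records.extend(search(start, end))
        except Exception:
            failed.append(name)
    records.sort(key=lambda r: r.start)
    return records, failed
```

Reporting failed providers separately, rather than raising, is a design choice for this sketch: with many organizations behind one search, a partial answer is usually more useful than none.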
6
The data is growing …
STEREO
  Launched Oct 2006
  Over 1.5 million images @ up to 8 MB
Hinode (Sunrise aka Solar-B)
  Launched Sept 2006
  Over 3 million images @ up to 8 MB
SDO
  Scheduled to launch Aug 2008
  1 image per second @ 32 MB
  1.5 TB/day dedicated connection
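Taking the figures on this slide at face value, the volumes can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and treats "up to 8 MB" as an upper bound per image; the function names are illustrative.

```python
MB_PER_TB = 1_000_000      # assumption: decimal units
SECONDS_PER_DAY = 86_400


def daily_volume_tb(images_per_second: float, mb_per_image: float) -> float:
    """Uncompressed data volume per day, in TB, for a fixed image cadence."""
    return images_per_second * mb_per_image * SECONDS_PER_DAY / MB_PER_TB


def archive_upper_bound_tb(image_count: int, max_mb_per_image: float) -> float:
    """Upper bound on archive size if every image were the maximum size."""
    return image_count * max_mb_per_image / MB_PER_TB


sdo_per_day = daily_volume_tb(1, 32)               # SDO cadence from the slide
stereo_max = archive_upper_bound_tb(1_500_000, 8)  # STEREO so far
hinode_max = archive_upper_bound_tb(3_000_000, 8)  # Hinode so far
```

Under these assumptions, STEREO and Hinode hold at most about 12 TB and 24 TB of images respectively. One 32 MB image per second comes to roughly 2.8 TB/day, which is more than the 1.5 TB/day quoted for the connection. The slide does not say how those two figures relate.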
7
Other disciplines have even more data
NVO : US National Virtual Observatory
  LSST (Large Synoptic Survey Telescope)
    Scheduled to start observing in 2012
    7-10 TB/night, 3.2 Gpix images
    ~10 PB/yr
EOS/DIS : Earth Observing System/Data Information System
  About 2 TB/day, per satellite (8?)
  Planned to be 16 PB
8
… and we’re not the only one
  Heliospheric
  Magnetospheric
  Radiation Belt
  ITM (upper atmosphere)
  NVO / IVOA : nighttime astronomy
  PDS : planetary
  EOS : earth
9
What is Scientific Data?
10
How is Scientific Data Gathered?
  Scientist thinks up a problem
  Scientist (and engineers) create an instrument to conduct an investigation
  The instrument collects data via sensors
  Data are calibrated
  Data are written into scientifically useful formats
  Data are distributed to the scientists
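The calibration step above can be illustrated with a generic CCD correction: subtract a dark frame, then divide by a flat field. This is a common textbook correction and an assumption here; the talk does not say which corrections any particular instrument applies. The function name and the tiny frames are made up for the example.

```python
from typing import List

Frame = List[List[float]]  # a 2-D image as rows of pixel values


def calibrate(raw: Frame, dark: Frame, flat: Frame) -> Frame:
    """Apply (raw - dark) / flat pixel by pixel.

    dark: sensor signal recorded with no light (offset to remove)
    flat: relative pixel sensitivity (1.0 = nominal response)
    """
    return [
        [(r - d) / f for r, d, f in zip(raw_row, dark_row, flat_row)]
        for raw_row, dark_row, flat_row in zip(raw, dark, flat)
    ]


# Made-up 1x2 frame: the second pixel is twice as sensitive as the first.
raw = [[110.0, 210.0]]
dark = [[10.0, 10.0]]
flat = [[1.0, 2.0]]
calibrated = calibrate(raw, dark, flat)
```

After correction, both pixels in this made-up frame report the same value, which is the point of calibration: differences caused by the sensor are removed so that the remaining differences come from the object observed.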
11
But really, what is data?
There is no formal definition. It’s as ambiguous as the term “book”
Data may be shorthand for:
  Data Collection
  Data Series
  Data Set
  Data Product
  Data Granule
12
The problem with “data”
  Every investigation has different data needs
  Each investigation organizes and catalogs the data to answer their scientific question
  What is “good” data for one group may not be useful for another
  Because data is being collected continuously, there may not be a consistent boundary on one “granule” of data
  Some data is tracked as individual values, and only packaged upon request
    Mostly time-series data, not images
13
Types of Data Archives
Instrument Archives
  Maintained by the PI team
  Little or no consideration towards re-use
Resident Archive
  Maintained by a specific discipline
  Re-use within the given discipline
Long-Term Archive
  Required for federally funded studies
  Focus on preservation, not use of data
14
Active Archives
Still changing
  May be ingesting from an active mission
  May still be processing their data
  May serve multiple editions or processed states of the data
  Final data in “physical units” typically isn’t available until one or more years after the mission
  Not directly comparable with data from other instruments until then
15
Isn’t this just Knowledge Management?
There is no knowledge in the raw data
  But there is knowledge in the design of the instruments & sensors
    What spectral range are the instruments sensitive to?
    What are the instrument’s possible operating modes?
  Knowledge of the instruments & sensors affects how the scientists interpret data
The scientists have to interpret the results to determine the knowledge
  May be reluctant to have others catalog their data, as it requires understanding the science
16
Multiple Operating Modes: Filters on SOHO/EIT
171Å 195Å 284Å 304Å
17
Known Sensor Issues: SOHO/LASCO
18
Knowledge Mgmt, cont’d
We do have ‘Event’ and ‘Feature’ Catalogs
  Scientists will record when/where they think something interesting is occurring, and share with others.
19
Data Processing : Raw Image (Linear)
20
Data Processing : Calibrated (Greyscale)
21
Data Processing : Before Calibration
22
Data Processing : Best Calibration
23
Data Processing: CCD Aging
24
CCD Calibration
171Å 195Å 284Å 304Å
25
Higher Level Data
26
The Problems …
Cross discipline translation is difficult
  Concepts of what makes data useful differ between disciplines
  Different disciplines use different search parameters
    VSO : time, spectral range, location on sun
      Always looking at the same object
    VHO : location of observer, time, spectral range
      Observatories are moving, in situ measurements
    EOS : location of object observed
    NVO : direction of pointing (assumed from earth)
27
Problems, cont’d.
Even when there is agreement, there are still problems
Which time is important?
  Start time?
  Average time?
  Spacecraft time?
Which coordinate system is used?
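The "which time?" question can be made concrete with a single exposure. The sketch below assumes an observation is described by a start time and an exposure length, and derives the other candidate timestamps from those. The function name and dictionary keys are illustrative. Spacecraft clock time is left out, since converting it depends on mission-specific corrections the talk does not cover.

```python
from datetime import datetime, timedelta
from typing import Dict


def observation_times(start: datetime, exposure_seconds: float) -> Dict[str, datetime]:
    """Return the candidate timestamps a catalog might record for one exposure."""
    exposure = timedelta(seconds=exposure_seconds)
    return {
        "start": start,
        "average": start + exposure / 2,  # midpoint of the exposure
        "end": start + exposure,
    }


# Hypothetical 10-second exposure.
times = observation_times(datetime(2007, 11, 20, 12, 0, 0), 10)
```

Two archives that both say "time" but store different keys from this dictionary would disagree by several seconds for the same image. That is enough to matter when matching fast-changing events across instruments.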
28
Problems, still cont’d
Each discipline is working on solutions within their field
  Build systems that suit the needs of their community
  Each discipline has different “first class data”
Currently working on metadata standards so data can be discovered and used by other disciplines
  SPASE; MMI; GEON
Some work on ontologies to help with discovery and use
  VSTO; SWEET; GEON; SESDI
29
Lots of Permutations
30
I know what you’re thinking…
31
And it mostly works
32
How does this affect libraries?
The library is a changing organism
  Data is relatively unanalyzed in LIS
  Data connects to bibliographic records, and vice versa
    What data was used in this journal article?
    Where can I get documentation on using this data?
    Has anyone published anything using this data?
  Data connects to other data
    What other instruments observed a given event?
    Is there an alternate version that better meets my needs?
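The two-way link between data and bibliographic records can be sketched as a pair of lookup tables kept in step. This is a minimal illustration, not an existing library system: `CitationIndex` and the identifiers in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CitationIndex:
    """Bidirectional index between articles and the datasets they used."""

    def __init__(self) -> None:
        self._datasets_by_article: Dict[str, Set[str]] = defaultdict(set)
        self._articles_by_dataset: Dict[str, Set[str]] = defaultdict(set)

    def link(self, article_id: str, dataset_id: str) -> None:
        """Record that an article used a dataset (stored in both directions)."""
        self._datasets_by_article[article_id].add(dataset_id)
        self._articles_by_dataset[dataset_id].add(article_id)

    def datasets_used_in(self, article_id: str) -> List[str]:
        """Answer: what data was used in this journal article?"""
        return sorted(self._datasets_by_article.get(article_id, set()))

    def articles_using(self, dataset_id: str) -> List[str]:
        """Answer: has anyone published anything using this data?"""
        return sorted(self._articles_by_dataset.get(dataset_id, set()))


# Hypothetical identifiers, for illustration only.
index = CitationIndex()
index.link("article-1", "dataset-A")
index.link("article-1", "dataset-B")
index.link("article-2", "dataset-A")
```

Storing the link in both directions lets either question from the slide be answered with one lookup, at the cost of keeping the two tables consistent.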
33
There’s funding for research
NSF:
  CDI : Cyber-Enabled Discovery and Innovation
  INTEROP : Community-based Data Interoperability Networks
  IIS : Information and Intelligent Systems
  DataNet : Sustainable Digital Data Preservation and Access Network Partners
NASA:
  AISR : Advanced Info. Systems Research
  ACCESS : Advancing Collaborative Connections for Earth Science Access
34
Sunspot on 15 July 2002 from the Swedish 1-m Solar Telescope on La Palma
35
http://virtualsolar.org/
http://stereo.gsfc.nasa.gov
joseph.a.hourcle@nasa.gov