July 2010, Cospar10 Bremen
Slide 1: SDO Data Access and Distribution in Europe and the WisSDOm Data Centre in ROB, Brussels
Authors: David Boyes, Benjamin Mampaey, Cis Verbeeck, Veronique Delouille, Jean-François Hochedez (STCE + ROB)
Slide 2: What will be covered
- Where the data is, and the access architecture for users
- Some basic terms
- User access methods
  - modules
  - basic web access
  - virtual observatories
  - simplified web access
  - pseudo files and other developments
- Interesting issues
  - retention
  - saved searches
  - evolving calibration
- Neat stuff to come
  - cutouts
  - Helioviewer
  - grid integration
Slide 3: Where is the data for the users?
- Data is available from one or more data centres, all of them networked.
- Some users are "close" to a centre and some are "far"; distance matters.
- All data is available somewhere.
- Users can get data (an "export"):
  - directly from the nearest centre
  - via the nearest centre, which obtains it from a remote centre
  - directly from another centre
- Most of this is automatic; what you will notice are differences in, e.g., delays.
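The first two export routes above can be pictured as a simple selection rule. The sketch below is only an illustration, not netDRMS code: the `Centre` class, its `latency_ms` "distance" and `online` set, and the rule "serve from the nearest centre, which fetches from the nearest remote centre holding the record online" are all assumptions. The third route (going directly to another centre) is left to the user.

```python
from dataclasses import dataclass, field


@dataclass
class Centre:
    """Hypothetical model of a data centre as seen from one user."""
    name: str
    latency_ms: float                          # assumed measure of "distance"
    online: set = field(default_factory=set)   # record ids held on disk here


def choose_route(record_id, centres):
    """Return (serving centre, source centre) for an export request.

    Assumed rule: the nearest centre always serves the user. If it holds the
    record online it is also the source; otherwise the nearest remote centre
    holding it online is the source; if none does, source is None and the
    serving centre must retrieve it some slower way (e.g. from tape).
    """
    by_distance = sorted(centres, key=lambda c: c.latency_ms)
    nearest = by_distance[0]
    if record_id in nearest.online:
        return nearest.name, nearest.name
    for remote in by_distance[1:]:
        if record_id in remote.online:
            return nearest.name, remote.name
    return nearest.name, None


centres = [
    Centre("ROB", 5.0, {"rec1"}),              # latencies are made up
    Centre("JSOC", 150.0, {"rec1", "rec2"}),
]
print(choose_route("rec1", centres))  # ('ROB', 'ROB')
print(choose_route("rec2", centres))  # ('ROB', 'JSOC')
```

The point of the sketch is only that the user sees one entry point, while the choice of source, and hence the delay, varies per record.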
Slide 4: How the data is accessed (a bit technical)
- The system is netDRMS, created by the JSOC at Stanford.
- Files are generated on request from the stored content.
- The system holds data files plus metadata (SUMS + DRMS).
- The mediator is an "export" module, which makes your very own file (FITS, a tar of FITS files, etc.).
- SQL etc. is hidden from the user.
Slide 5: Access summary
- No files exist until you ask for them.
- Data is referenced by content and provided as one or more files, with whatever names you want.
- The exported files are built from stored elements, so, e.g., FITS with Rice compression is quite direct, as AIA data is stored internally in this format.
- You can get anything, but:
  - you may as well ask for all the metadata
  - the files can be large, so it is best not to ask for hundreds of them
Slide 6: Some basic terms
- series
  - a basic collection of data items with shared properties
  - named according to a convention (e.g. aia_test.lev1)
  - all records in a series share a metadata format (i.e. keywords)
- keywords
  - FITS-style keywords, plus added metadata-only keywords
  - correspond to columns in the metadata (DRMS) database
- online
  - means available from a disk at the site
  - so offline means not yet arrived/available, or deleted but can be fetched
- data format
  - whatever is stored is native (FITS, JPEG2000); conversion is post-processing
  - characterised by resolution and cadence (e.g. 4K x 4K at 10 s, 1K x 1K at 90 s)
  - naturally you can't do better than what is stored, but you can reduce it by "cutouts" in time or space
- data records
  - can be several items as a group (e.g. image + bad pixel map + alternative format)
  - the data is held in SUMS and referenced by metadata tables (DRMS), usually one to one
  - each record is self-contained; for example, cadence is not part of the data
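The terms above fit together roughly as in the following sketch. It is a hypothetical, in-memory model, not the DRMS schema: the `Series` and `Record` classes, the `storage_unit` field standing in for a SUMS reference, and the keyword names used in the example are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Series:
    """A named collection whose records all share one keyword layout."""
    name: str                 # e.g. "aia_test.synoptic2"
    keywords: tuple           # keyword names, i.e. metadata table columns
    records: list = field(default_factory=list)

    def add(self, values: dict, storage_unit: Optional[str] = None) -> "Record":
        # Every record must supply exactly the series' keywords.
        if set(values) != set(self.keywords):
            raise ValueError(f"keywords must be exactly {self.keywords}")
        rec = Record(len(self.records) + 1, dict(values), storage_unit)
        self.records.append(rec)
        return rec


@dataclass
class Record:
    recnum: int
    keywords: dict                       # one row of the metadata database
    storage_unit: Optional[str] = None   # hypothetical reference to stored data

    @property
    def online(self) -> bool:
        # Simplification: "online" means the data is on disk at this site.
        return self.storage_unit is not None


s = Series("aia_test.synoptic2", ("T_OBS", "WAVELNTH"))
r = s.add({"T_OBS": "2010-05-21T15:00:00.57Z", "WAVELNTH": 171}, "/sums/D1234")
print(r.recnum, r.online)  # 1 True
```

Keeping the metadata row separate from the data reference is what lets a query run on keywords alone, as described for query-based access.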
Slide 7: Example series
- aia_test.lev1: AIA images, 4Kx4K, full disk, full cadence
- aia_test.synoptic2: AIA images reduced to 1Kx1K, full disk, 90 s cadence
- hmi_test.M_45s: magnetograms, 45 s cadence
- hmi_test.v_45s: Dopplergrams, 45 s cadence
- JPEG2000 series to come, for browsing and forecasting
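How much the synoptic series reduces the data can be worked out directly from the resolutions and cadences quoted in the basic terms (4K x 4K at 10 s versus 1K x 1K at 90 s). This sketch counts pixels per wavelength per day only; it assumes square images and a fixed cadence, and ignores compression and metadata.

```python
SECONDS_PER_DAY = 86_400


def frames_per_day(cadence_s: float) -> int:
    """Number of images per wavelength in one day at a fixed cadence."""
    return int(SECONDS_PER_DAY // cadence_s)


def pixels_per_day(side: int, cadence_s: float) -> int:
    """Total pixels per wavelength per day for square side x side images."""
    return side * side * frames_per_day(cadence_s)


full = pixels_per_day(4096, 10)      # 4K x 4K at 10 s
synoptic = pixels_per_day(1024, 90)  # 1K x 1K at 90 s
print(frames_per_day(10), frames_per_day(90))  # 8640 960
print(full // synoptic)                        # 144
```

The factor of 144 is simply 16 (pixels per image) times 9 (images per day), which is why the synoptic series is so much easier to move around.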
Slide 8: User access methods
- Direct, via "modules", on the site of the data centre
- Query based
  - a precursor to full data access
  - checks part of the data (the metadata) without having to retrieve the very large part
- Indirect, via the network
  - web/HTTP based
  - delivers data somewhere, to fetch immediately or later
- Direct, via a wrapper on site, e.g. IDL (Matlab on the way)
Slide 9: A practical pause: limitations
- Sheer size of a request: even if you have a 2 TB USB stick, that holds only about 2 days
- Network speed: at about 200 Mb/s it takes a day to get a day's worth of data
- Search/database speed: there are millions of records
- Raw data access/retrieval speed: the basic image data takes time to get from disk
- Retention time: you can get anything, but if you ask for a full day from 2 years ago that nobody else has ever used, you will probably have to wait
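The first two limits are simple arithmetic. In this sketch the daily volume of about 1 TB is an assumption inferred from the slide's own "2 TB is only 2 days" figure, and the link rate is treated as fully sustained with no protocol overhead.

```python
TB = 10**12


def transfer_hours(volume_bytes: float, link_bits_per_s: float) -> float:
    """Hours needed to move volume_bytes over a link, ignoring overheads."""
    return volume_bytes * 8 / link_bits_per_s / 3600


def days_on_disk(capacity_bytes: float, daily_volume_bytes: float) -> float:
    """How many days of data fit in a given capacity."""
    return capacity_bytes / daily_volume_bytes


daily = 1 * TB  # assumption: ~1 TB/day, implied by "2 TB is only 2 days"
print(days_on_disk(2 * TB, daily))              # 2.0
print(round(transfer_hours(daily, 200e6), 1))   # 11.1
```

A full day at a sustained 200 Mb/s moves about 2.2 TB, so the "a day for a day's worth" rule of thumb holds for larger daily volumes or for effective throughput well below the nominal link rate.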
Slide 10: Access by: modules, the basic bricks
At the data centres, for example:
- show_series
- show_info
- jsoc_export_as_fits

    [jdb@db1 ~]$ show_info -s ds=aia_test.synoptic2
    First Record: aia_test.synoptic2[2010-05-21T15:00:00.57Z][171] is first of 6 records matching first keyword, Recnum = 1
    Last Record: aia_test.synoptic2[2010-07-14T11:58:41.07Z][335] is first of 2 records matching first keyword, Recnum = 445376
    Last Recnum: 445377
    [jdb@db1 ~]$ show_series
    aia_test.lev1
    aia_test.synoptic2
    drms.sites
    hmi.doptest
    hmi_test.m_45s
    hmi_test.s_720s
    lm_jps.lev1_test4k10s
    [jdb@db1 ~]$ jsoc_export_as_fits reqid=REQ_FTP expversion=0.5 rsquery=aia_test.lev1[:#209866] path=tmp method=url protocol=FITS
    '10552320' bytes exported.
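The record specifications in this session, such as `aia_test.synoptic2[2010-05-21T15:00:00.57Z][171]` and `aia_test.lev1[:#209866]`, have a series name followed by bracketed filters. The parser below is a deliberately simplified, hypothetical reading of that form: it assumes positional brackets hold prime-key values and that `[:#N]` selects a single record number. The real DRMS record-set syntax is richer (ranges, lists, SQL clauses, segments), so treat this as a sketch only.

```python
import re

# Simplified, hypothetical grammar: series name, then zero or more [filters].
_SPEC = re.compile(
    r"^(?P<series>[A-Za-z0-9_]+\.[A-Za-z0-9_]+)(?P<filters>(\[[^\]]*\])*)$"
)


def parse_recordset(spec: str) -> dict:
    """Split a record-set string into series, prime-key values and recnum."""
    m = _SPEC.match(spec.strip())
    if not m:
        raise ValueError(f"not a record-set spec: {spec!r}")
    filters = re.findall(r"\[([^\]]*)\]", m.group("filters"))
    result = {"series": m.group("series"), "prime_keys": [], "recnum": None}
    for f in filters:
        if f.startswith(":#"):
            result["recnum"] = int(f[2:])   # assumed: [:#N] selects record N
        else:
            result["prime_keys"].append(f)  # positional prime-key values
    return result


print(parse_recordset("aia_test.synoptic2[2010-05-21T15:00:00.57Z][171]"))
print(parse_recordset("aia_test.lev1[:#209866]"))
```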
Slide 11: Access by: basic web access
- System developed by JSOC: lookdata.html
- Online via the JSOC web site, but heavily loaded
- Being tested at ROB
- Provides easy access to an overview of all the available data
- Formulating a selection query does require knowledge of the query syntax
- Provides a wide variety of data packaging:
  - normal user FITS, or internal format (FITS with no keywords)
  - via the web for immediate or later access, as one or more individual files or as a tar file
  - ROB is working on fewer packaging options
Slide 12: Access by: basic web access
Slide 13: Access by: Virtual Observatories
- VSO
  - a development of the existing VSO
  - prototype for SDO running, definitive version in preparation
  - http://sdac.virtualsolar.org/cgi/search
- Soteria
  - demo provider made for ROB/USET, SDO provider being coded now
  - http://soteria-space.eu/
- Uniform search paradigm
- The infrastructure hides efficient searches with complex syntax, e.g. SQL in various flavours
Slide 14: Access by: Soteria Virtual Observatory
- One part of an EU project
- Based on current web access technology
- The example is for the ROB USET telescope as a data provider; each SDO site will be able to act as a provider
Slide 15: Access by: simplified web access
- Work in progress
- Limited to direct requests for tar files or individual FITS files; a front end for PFS
- Simplified enquiry, such as:
  - aia.lev1 + time + period + cadence + wavelengths
- Preparation is actually more complex than for basic access; for example, it requires deciding which keys are useful for which series
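The simplified enquiry has to be turned into a full selection query behind the scenes. This hypothetical sketch composes one from the five inputs. It assumes a DRMS-style `[start/duration@step][list]` form with time as the first prime key and wavelength as the second; that assumption is exactly the kind of per-series decision the slide says makes preparation harder, so check the prime keys of the target series before relying on it.

```python
from datetime import datetime


def simple_query(series: str, start: datetime, period: str,
                 cadence: str, wavelengths: list) -> str:
    """Compose a record-set string from the five simplified inputs.

    Assumed form: series[start/period@cadence][w1,w2,...], with time as the
    first prime key and wavelength as the second (hypothetical mapping).
    """
    if not wavelengths:
        raise ValueError("at least one wavelength is required")
    t = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    waves = ",".join(str(w) for w in wavelengths)
    return f"{series}[{t}/{period}@{cadence}][{waves}]"


print(simple_query("aia.lev1", datetime(2010, 6, 17), "5h", "15m", [171, 193]))
# aia.lev1[2010-06-17T00:00:00Z/5h@15m][171,193]
```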
Slide 16: Access by: pseudo files (PFS)
- Systematically named files in a directory tree, with no real files until you access them
- Typically based on a query covering a much wider range than you really need (or could use)
- Real files are kept in a cache, so further access is very cheap
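The "names now, data on first access, cache afterwards" behaviour can be sketched in a few lines. This does not describe how the ROB prototype is implemented: the `PseudoFiles` class is hypothetical and `fetch` is a stand-in for whatever export call a real PFS would make.

```python
from typing import Callable, Dict, Iterable, List


class PseudoFiles:
    """File names exist up front; content is fetched on first read, then cached."""

    def __init__(self, names: Iterable[str], fetch: Callable[[str], bytes]):
        self._names = set(names)
        self._fetch = fetch                  # hypothetical export/retrieval call
        self._cache: Dict[str, bytes] = {}
        self.fetch_count = 0

    def listdir(self) -> List[str]:
        # Listing is free: no data is retrieved.
        return sorted(self._names)

    def read(self, name: str) -> bytes:
        if name not in self._names:
            raise FileNotFoundError(name)
        if name not in self._cache:          # first access: real retrieval
            self._cache[name] = self._fetch(name)
            self.fetch_count += 1
        return self._cache[name]             # later accesses: served from cache


pfs = PseudoFiles(["a.fits", "b.fits"], fetch=lambda n: f"data for {n}".encode())
pfs.read("a.fits")
pfs.read("a.fits")
print(len(pfs.listdir()), pfs.fetch_count)  # 2 1
```

This is why a PFS can safely be built from a query much wider than you need: unread names cost nothing.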
Slide 17: Access by: pseudo files (PFS)
Example with 160 file names, all AIA wavelengths, 15 min cadence. In prototype at ROB; source downloadable.

    mnt
    `-- aia_test.lev1
        `-- 2010
            `-- 06
                `-- 17
                    |-- H0000
                    |   |-- AIA20100617_000000570000_0171.fits
                    |   |-- AIA20100617_000003570000_0304.fits
                    |   |-- AIA20100617_000009580000_94.fits
                    |   |-- AIA20100617_000018570000_1600.fits
                    |   |-- AIA20100617_000050070000_211.fits
                    |   |-- AIA20100617_000053050000_335.fits
                    |   |-- AIA20100617_000056100000_193.fits
                    |   ...
                    |   |-- AIA20100617_004505070000_335.fits
                    |   |-- AIA20100617_004506570000_1600.fits
                    |   |-- AIA20100617_004508070000_193.fits
                    |   |-- AIA20100617_004509580000_94.fits
                    |   `-- AIA20100617_004511070000_131.fits
                    |-- H0100
                    |   |-- AIA20100617_010000580000_0171.fits
                    |   |-- AIA20100617_010002080000_211.f...
                    ...
                        |-- AIA20100617_043008060000_193.fits
                        |-- AIA20100617_043009550000_94.fits
                        |-- AIA20100617_043011090000_131.fits
                        |-- AIA20100617_043018580000_1600.fits
                        |-- AIA20100617_044500560000_0171.fits
                        |-- AIA20100617_044502050000_211.fits
                        |-- AIA20100617_044503570000_0304.fits
                        |-- AIA20100617_044505070000_335.fits
                        |-- AIA20100617_044506570000_1600.fits
                        |-- AIA20100617_044508070000_193.fits
                        |-- AIA20100617_044509580000_94.fits
                        `-- AIA20100617_044511070000_131.fits

    9 directories, 160 files
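The file names in this listing appear to encode the observation time and wavelength. The parser below uses a pattern inferred from the listing alone, not from the PFS source: `AIA` + `YYYYMMDD` + `_` + `HHMMSS` + a 6-digit fraction of a second + `_` + wavelength (zero padding varies) + `.fits`, with files grouped into hourly `Hhh00` directories. Both functions are hypothetical helpers.

```python
import re
from datetime import datetime

# Pattern inferred from the listing (an assumption, not a documented format).
_NAME = re.compile(r"^AIA(\d{8})_(\d{6})(\d{6})_(\d+)\.fits$")


def parse_pfs_name(name: str):
    """Return (observation time, wavelength) for a PFS file name."""
    m = _NAME.match(name)
    if not m:
        raise ValueError(f"unexpected file name: {name}")
    day, hms, frac, wave = m.groups()
    t = datetime.strptime(day + hms, "%Y%m%d%H%M%S").replace(microsecond=int(frac))
    return t, int(wave)


def hour_dir(t: datetime) -> str:
    """Directory below the series level, e.g. 2010/06/17/H0000."""
    return f"{t:%Y/%m/%d}/H{t:%H}00"


t, wave = parse_pfs_name("AIA20100617_000009580000_94.fits")
print(t.isoformat(), wave, hour_dir(t))
# 2010-06-17T00:00:09.580000 94 2010/06/17/H0000
```

On this reading the totals are consistent: 00:00 to 04:45 at a 15 min cadence gives 20 time slots, times 8 wavelengths is 160 files, in 5 hourly directories plus the 4 levels above them.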
Slide 18: Access by: useful methods in development
- Order, with notification by e-mail, for manual fetch
- Order and automatic delivery (e.g. sftp)
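For scripted use of "order now, fetch later", a client needs to wait until the order is ready. The sketch below polls with exponential backoff. The `check_status` callable and its status strings (`"ready"`, `"failed"`, anything else meaning "still working") are hypothetical; e-mail notification, as on the slide, avoids polling altogether.

```python
import time
from typing import Callable


def wait_for_export(check_status: Callable[[], str], poll_s: float = 1.0,
                    max_poll_s: float = 60.0, timeout_s: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a (hypothetical) status function until the export finishes.

    The wait between polls doubles each time, capped at max_poll_s.
    Returns "ready" or "failed"; raises TimeoutError after timeout_s.
    """
    waited = 0.0
    delay = poll_s
    while True:
        status = check_status()
        if status in ("ready", "failed"):
            return status
        if waited >= timeout_s:
            raise TimeoutError("export not ready in time")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_poll_s)


states = iter(["queued", "processing", "ready"])
print(wait_for_export(lambda: next(states), sleep=lambda s: None))  # ready
```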
Slide 19: Interesting issue: retention
- All netDRMS sites have full information for selected series, their "subscribed" series
- But is the data online? Sites keep the latest data, but must selectively discard
- Enquiry modules can tell whether data is online, but what are the implications (delay, ...) if it is not?
- You can request it, but it can take some time to obtain: for now it is quick, but after a year or so a record nobody has looked at will come from tape
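"Keep the latest, discard selectively" can be modelled with a least-recently-used policy. LRU is only one plausible choice and is not stated to be the netDRMS retention policy; the `RetentionCache` class is hypothetical and counts records rather than bytes. A miss corresponds to the slow path (e.g. retrieval from tape).

```python
from collections import OrderedDict


class RetentionCache:
    """Keep at most `capacity` records online; evict the least recently used.

    A simplification: real sites may also weigh age, series and disk quotas.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._online = OrderedDict()

    def is_online(self, rec_id) -> bool:
        return rec_id in self._online

    def access(self, rec_id) -> bool:
        """Touch a record; return True if it was already online."""
        hit = rec_id in self._online
        if hit:
            self._online.move_to_end(rec_id)        # mark as recently used
        else:
            self._online[rec_id] = True             # brought back online
            if len(self._online) > self.capacity:
                self._online.popitem(last=False)    # discard least recently used
        return hit


cache = RetentionCache(capacity=2)
print([cache.access(r) for r in ["a", "b", "a", "c", "b"]])
# [False, False, True, False, False]
```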
Slide 20: Interesting issue: saved searches
- How do you describe a selection of data?
- You can save the result as a record list for a reasonable number of records, but this does not save the query
  - save both query and result?
- For both your own use and publication
- Re-running a saved query might give different results (e.g. if it selects online data only)
- Relates to the issue of calibration
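One answer to "save both query and result?" is to store them together and compare on re-run. This sketch uses a JSON document with the query string, a save timestamp and the record numbers; the format and the helper names are hypothetical, and record numbers are used only because each record has one.

```python
import json
from datetime import datetime, timezone


def save_search(query: str, recnums: list) -> str:
    """Serialise both the query and the records it returned."""
    return json.dumps({
        "query": query,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "recnums": sorted(recnums),
    })


def compare_rerun(saved_json: str, new_recnums: list) -> dict:
    """Report how a re-run of the saved query differs from the saved result."""
    old = set(json.loads(saved_json)["recnums"])
    new = set(new_recnums)
    return {"missing": sorted(old - new), "added": sorted(new - old)}


saved = save_search("aia_test.synoptic2[2010-05-21T15:00:00.57Z][171]", [1, 2, 3])
print(compare_rerun(saved, [2, 3, 4]))  # {'missing': [1], 'added': [4]}
```

Keeping the result alongside the query makes a published selection reproducible even if the query alone would now return something else.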
Slide 21: Interesting issue: evolving calibration, and which data did I use?
- More accurate calibration will become available as time goes on and more calibration points are acquired
- So the newest and best data can change
- For most users this is done by applying a calibration series, e.g. via SolarSoft
- But there can also be metadata changes
- The raw data is unlikely to change
Slide 22: Neat stuff to come: cutouts
- Well on the way, again being developed by JSOC and LMSAL, for those who don't need the full 4Kx4K
- Very much reduced data storage requirements
- Closely related to event tracking and the HEK
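A spatial cutout is essentially a box around a region of interest, kept inside the detector. In this sketch the centre (for example, an event position from the HEK) is given in pixel coordinates; the function names are hypothetical, and no tracking of solar rotation across frames is attempted.

```python
def cutout_box(cx: int, cy: int, size: int, detector: int = 4096):
    """Square cutout of `size` pixels centred on (cx, cy), kept on the detector.

    Returns (x0, y0, x1, y1) with x1 and y1 exclusive; near an edge the box
    is shifted inwards rather than shrunk.
    """
    if size > detector:
        raise ValueError("cutout larger than detector")
    half = size // 2
    x0 = min(max(cx - half, 0), detector - size)
    y0 = min(max(cy - half, 0), detector - size)
    return x0, y0, x0 + size, y0 + size


def reduction_factor(size: int, detector: int = 4096) -> float:
    """How many times fewer pixels the cutout has than the full frame."""
    return (detector * detector) / (size * size)


print(cutout_box(100, 2048, 512))  # (0, 1792, 512, 2304)
print(reduction_factor(512))       # 64.0
```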
Slide 23: Neat stuff to come: Helioviewer
- www.helioviewer.org
- Existing project, now being directed towards use with SDO data
- JPEG2000-based viewer with event marker overlay
- Integration with JPEG2000 series
- Rapid browsing with links to the full data
- ROB is a CoI in the requested next stage
Slide 24: Neat stuff to come: grid integration
- The data element size (tens of MB) is natural for use in a high-performance grid
- The data is already geographically distributed, with a variety of access routes
- A distributed variety of resources: large clusters, pipelines, GPUs
- Sites are on high-performance research networks
Slide 25: Thanks to
- JSOC at Stanford
- LMSAL
- Belnet and Geant2, for networking
- The partner data centres, for their enthusiastic cooperation
- Our sister institutes at the ROB site, for hosting the data centre and infrastructure
Slide 26: Web addresses
- The main source: JSOC at jsoc.stanford.edu
- HEK: www.lmsal.com/hek
- ROB: wissdom.oma.be
- SAO: www.cfa.harvard.edu/sao
- GDS: www.mps.mpg.de/projects/seismo/GDC-SDO
- UCLan: www.star.uclan.ac.uk
- IAS: idc-medoc.ias.u-psud.fr