JSOC Pipeline Processing Overview
Rasmus Munk Larsen, Stanford University
rmunk@quake.stanford.edu, 650-725-5485
Overview
–Hardware overview
–JSOC data model
–Pipeline infrastructure & subsystems
–Pipeline modules
JSOC Connectivity
[Network diagram: Stanford, the DDS, the MOC and LMSAL connected by a 1 Gb private line; the JSOC disk array; NASA Ames reached via the "White" Net]
JSOC Hardware Configuration
[Diagram]
JSOC data model: Motivation
Evolved from the MDI dataset concept to
–Enable record-level access to meta-data for queries and browsing
–Accommodate the more complex data models required by higher-level processing
Main design features
–Lesson learned from MDI: separate meta-data (keywords) from image data
  No need to re-write large image files when only keywords change (the lev1.8 problem)
  No out-of-date keyword values in FITS headers - can bind to the most recent values on export
–Data access through query-like dataset names
  All access is in terms of (sets of) data records, which are the "atomic units" of a data series
  A dataset name is a query specifying a set of data records:
    jsoc:hmi_lev1_V[#3000-#3020] (21 records from a series with known epoch and cadence)
    jsoc:hmi_lev0_fg[t_obs=2008-11-07_02:00:00/8h][cam='doppler'] (8 hours' worth of filtergrams)
–Storage and tape management must be transparent to the user
  Chunking of data records into storage units for efficient tape/disk usage is done internally
  Completely separate storage-unit and meta-data databases: a more modular design
  MDI data and modules will be migrated to use the new storage service
–Store meta-data (keywords) in a relational database
  Can use the power of a relational database to search and index data records
  Easy and fast to create time series of any keyword value (for trending etc.)
  Consequence: data records must be well defined (e.g. have a fixed set of keywords)
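To make the dataset-name idea concrete, here is a minimal sketch of how the record-range form of such a name could be parsed. This is not the actual DRMS parser; the function name and the restriction to the jsoc:series[#first-#last] form are assumptions for illustration (the real query language also supports keyword clauses such as [t_obs=.../8h]).

```c
#include <stdio.h>

/* Illustrative only: parse the record-range form of a JSOC dataset name,
 * e.g. "jsoc:hmi_lev1_V[#3000-#3020]", into a series name and an
 * inclusive record-number range.  Returns 0 on success, -1 otherwise. */
static int parse_recrange(const char *name, char *series, int *first, int *last)
{
    /* Expect "jsoc:<series>[#<first>-#<last>]". */
    if (sscanf(name, "jsoc:%63[^[][#%d-#%d]", series, first, last) != 3)
        return -1;
    return 0;
}
```

For the first example name on this slide, the parse yields the series hmi_lev1_V and the 21 records #3000 through #3020.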
JSOC data model
JSOC data will be organized according to a data model with the following classes:
Series: A sequence of like data records, typically the data products produced by a particular analysis
–Attributes include: name, owner, primary search index, storage unit size, storage group
Record: A single measurement/image/observation with associated meta-data
–Attributes include: ID, storage unit ID, storage unit slot#
–Contains keywords, links, and data segments
–Records are the main data objects seen by module programmers
Keyword: Named meta-data value, stored in the database
–Attributes include: name, type, value, physical unit
Link: Named pointer from one record to another, stored in the database
–Attributes include: name, target series, target record ID or primary index value
–Used to capture data dependencies and processing history
Data segment: Named data container representing the primary data on disk belonging to a record
–Attributes include: name, filename, datatype, naxis, axis[0…naxis-1], storage format
–Can be either structure-less (any file) or an n-dimensional array stored in a tiled, compressed file format
Storage unit: A chunk of data records from the same series stored in a single directory tree
–Attributes include: online location, offline location, tape group, retention time
–Managed by the Storage Unit Manager in a manner transparent to most module programmers
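Since a record carries a storage unit ID and a storage unit slot#, and a series fixes its storage unit size, the chunking of records into storage units can be sketched as simple arithmetic. This is an illustration under an assumed fixed-size chunking scheme, not the actual SUMS bookkeeping; the type and function names are invented.

```c
/* Illustrative sketch, not actual DRMS/SUMS code: with a fixed storage
 * unit size per series, a record's storage unit ID and slot# follow
 * directly from its record number. */
typedef struct {
    long su_id;   /* which storage unit (one directory tree) */
    int  slot;    /* slot within that storage unit */
} SuAddr;

static SuAddr su_address(long recordnum, int unit_size)
{
    SuAddr a;
    a.su_id = recordnum / unit_size;        /* records are chunked in order */
    a.slot  = (int)(recordnum % unit_size); /* position inside the chunk */
    return a;
}
```

Under this scheme, consecutive records land in the same directory tree until a unit fills up, which is what makes tape and disk usage efficient.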
JSOC data model example
[Diagram: JSOC data series (hmi_lev0_cam1_fg, aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V, aia_lev0_FE171, ...); the data records for series hmi_lev1_fd_V (hmi_lev1_fd_V#12345 through hmi_lev1_fd_V#12353, ...); and a single hmi_lev1_fd_V data record, whose storage unit is a directory:]
Keywords:
  RECORDNUM = 12345    # Unique serial number
  SERIESNUM = 5531704  # Slots since epoch
  T_OBS = '2009.01.05_23:22:40_TAI'
  DATAMIN = -2.537730543544E+03
  DATAMAX = 1.935749511719E+03
  ...
  P_ANGLE = LINK:ORBIT,KEYWORD:SOLAR_P
  ...
Links:
  ORBIT = hmi_lev0_orbit, SERIESNUM = 221268160
  CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7
  L1 = hmi_lev0_cam1_fg, RECORDNUM = 42345232
  R1 = hmi_lev0_cam1_fg, RECORDNUM = 42345233
  ...
Data Segments:
  V_DOPPLER = ...
JSOC subsystems
SUMS: Storage Unit Management System
–Maintains a database of storage units and their locations on disk and tape
–Manages the JSOC storage subsystems: disk array and robotic tape library
  Scrubs old data from the disk cache to maintain enough free workspace
  Loads and unloads tapes to/from the tape drives and robotic library
–Allocates disk storage needed by pipeline processes through DRMS
–Stages storage units requested by pipeline processes through DRMS
–Design features:
  RPC client-server protocol
  Oracle DBMS (to be migrated to PostgreSQL)
DRMS: Data Record Management System
–Maintains a database holding
  Master tables with definitions of all JSOC series and their keyword, link and data segment definitions
  One table per series containing record meta-data, e.g. keyword values
–Provides a distributed transaction processing framework for the pipeline
–Provides full meta-data searching through the JSOC query language
  Multi-column indexed searches on primary index values allow fast and simple querying for common cases
  Inclusion of free-form SQL clauses allows advanced querying
–Provides software libraries for querying, creating, retrieving and storing JSOC series, data records and their keywords, links, and data segments
  Currently available in C. Wrappers (with read-only restriction?) for Fortran, Matlab and IDL are planned.
–Design features:
  TCP/IP socket client-server protocol
  PostgreSQL DBMS
  Slony DB replication to be added for managing query load and enabling multi-site distributed archives
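The point about indexed searches can be illustrated by how a primary-index record-range query might be turned into SQL against a per-series record table. This is a hedged sketch only: the table layout and the recordnum column name are assumptions for illustration, not the actual DRMS schema.

```c
#include <stdio.h>

/* Illustrative only: translate a record-range query on a series into the
 * kind of indexed SQL search a per-series record table could answer.
 * The column name "recordnum" is an assumption, not the DRMS schema. */
static void recrange_to_sql(const char *series, long first, long last,
                            char *sql, size_t n)
{
    snprintf(sql, n,
             "SELECT * FROM %s WHERE recordnum BETWEEN %ld AND %ld",
             series, first, last);
}
```

Because such clauses hit an indexed column, the common case stays fast, while free-form SQL clauses can be appended for advanced queries.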
Pipeline software/hardware architecture
[Architecture diagram:]
–A pipeline program ("module") links against the DRMS library, the JSOC science libraries and utility libraries, and keeps a record cache (keywords + links + data paths)
–DRMS library API: OpenRecords, CloseRecords, GetKeyword, SetKeyword, GetLink, SetLink, OpenDataSegment, CloseDataSegment
–Modules talk to the Data Record Management Service (DRMS) over the DRMS socket protocol; DRMS issues SQL queries against the database server (series tables, record tables, record catalogs)
–DRMS requests storage from the Storage Unit Management Service (SUMS) via AllocUnit, GetUnit and PutUnit; SUMS maintains the storage unit tables and transfers storage units between the JSOC disks and the robotic tape archive
–Modules perform data segment and file I/O directly against the JSOC disks
JSOC Pipeline Workflow
[Workflow diagram:]
–The pipeline operator submits a pipeline processing plan
–The PUI, the Pipeline User Interface (scheduler), turns the plan into a processing script ("mapfile"): a list of pipeline modules with the datasets needed for input and output
–The modules (Module1, Module2, Module3, ...) run within a DRMS session against the DRMS, the Data Record Management Service
–The DRMS writes a processing history log and uses SUMS, the Storage Unit Management System, for storage
Analysis modules: co-I contributions and collaboration
Contributions from co-I teams:
–Software for intermediate and high-level analysis modules
–Data series definitions
  Keywords, links, data segments, size of storage units, primary index keywords, etc.
–Documentation
–Test data and intended results for verification
–Time
  Explain algorithms and implementation
  Help with verification
  Collaborate on improvements if required (e.g. performance or maintainability)
Contributions from the HMI team:
–Pipeline execution environment
–Software & hardware resources (development environment, libraries, tools)
–Time
  Help with defining data series
  Help with porting code to the JSOC API
  If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization
  Verification
HMI module status and MDI heritage
[Processing-flow diagram relating primary observables to intermediate and high-level data products:]
–Primary observables: Doppler velocity; Stokes I,V; continuum brightness; Stokes I,Q,U,V; brightness images; line-of-sight magnetic field maps
–Helioseismology products: heliographic Doppler velocity maps; tracked tiles of Dopplergrams; spherical harmonic time series; mode frequencies and splitting; internal rotation; internal sound speed; ring diagrams; local wave frequency shifts; time-distance cross-covariance functions; wave travel times; deep-focus v and c_s maps (0-200 Mm); high-resolution v and c_s maps (0-30 Mm); Carrington synoptic v and c_s maps (0-30 Mm); full-disk velocity and sound speed maps (0-30 Mm); egression and ingression maps; wave phase shift maps; far-side activity index
–Magnetic and brightness products: line-of-sight magnetograms; vector magnetograms (fast algorithm; inversion algorithm); full-disk 10-min averaged maps; tracked tiles; vector magnetic field maps; coronal magnetic field extrapolations; coronal and solar wind models; tracked full-disk 1-hour averaged continuum maps; brightness feature maps; solar limb parameters
–Legend: MDI pipeline modules exist; standalone "production" code routinely used; research code in use; code developed at HAO; code developed at Stanford
Example: Global Seismology Pipeline
[Diagram]
Questions to be discussed at working sessions
List of standard science data products
–Which data products, including intermediate ones, should be produced by JSOC to accomplish the science goals of the mission?
–What cadence, resolution, coverage, etc. should each data product have?
–Which data products should be computed on the fly, and which should be archived?
–What are the challenges to be overcome for each analysis technique?
Detailing each branch of the processing pipeline
–What are the detailed steps in each branch?
–Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)?
–What are the CPU and I/O resource requirements of the computational steps?
Contributed analysis modules
–Which groups or individuals will contribute code and incorporate it in the pipeline?
–If multiple candidate techniques and/or implementations exist, which should be included in the pipeline?
–What is the test plan, and what data is needed to verify the approach?
JSOC Series Definition
[Diagram]
Global Database Tables
[Diagram]
Database tables for the example series hmi_fd_v
Tables specific to each series contain the per-record values of
–Keywords
–Record numbers of the records pointed to by links
–DSIndex: an index identifying the SUMS storage unit containing the data segments of a record
–The series sequence counter used for generating unique record numbers
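The role of the per-series sequence counter can be sketched in a few lines. This is purely illustrative (the type and function names are invented, and the real counter lives in the database, not in process memory): each series hands out monotonically increasing record numbers, which is what makes RECORDNUM a unique serial number within the series.

```c
/* Illustrative sketch: a per-series sequence counter from which unique
 * record numbers are drawn.  In DRMS this counter is a database object,
 * not an in-memory struct; the names here are assumptions. */
typedef struct {
    const char *name;  /* series name, e.g. "hmi_fd_v" */
    long seq;          /* next record number to hand out */
} Series;

static long next_recordnum(Series *s)
{
    return s->seq++;   /* each call yields a fresh, unique record number */
}
```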
Pipeline batch processing
A pipeline batch is encapsulated in a single database transaction:
–If no module fails, all data records are committed and become visible to other clients of the JSOC catalog at the end of the session
–If a failure occurs, all data records are deleted and the database is rolled back
–It is possible to commit data produced up to intermediate checkpoints during a session
[Diagram: a pipeline batch is an atomic transaction. The DRMS service acts as session master; the modules (Module 1, Module 2.1, Module 2.2, ..., Module N) register the session through the DRMS API, read input data records and write output data records via SUMS and the record & series database, and finally commit data and deregister.]
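The all-or-nothing semantics of a batch can be sketched as follows. This is an illustration of the transaction behavior, not the DRMS implementation (it ignores intermediate checkpoints, and the function and parameter names are invented); each array entry stands for one module's exit status.

```c
/* Illustrative sketch of a batch as an atomic transaction: if any module
 * fails, the whole batch rolls back and no records become visible; if all
 * succeed, everything commits at the end of the session. */
static int run_batch(const int *module_status, int n, int *committed)
{
    int i;
    *committed = 0;
    for (i = 0; i < n; i++)
        if (module_status[i] != 0)
            return -1;   /* failure: roll back, records are deleted */
    *committed = 1;      /* no module failed: commit at end of session */
    return 0;
}
```

A failure in Module 2 of a three-module batch thus discards the output of Module 1 as well, which is exactly what keeps partial results out of the JSOC catalog.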
Example of module code
A module doing a (naïve) Doppler velocity calculation could look as shown below.

extern CmdParams_t cmdparams;  /* command line args */
extern DRMS_Env_t *drms_env;   /* DRMS environment */

int module_main(void)
{
  DRMS_RecordSet_t *filtergrams, *dopplergram;
  int first_frame, status;
  char query[1024], *start, *end;

  start = cmdparms_getarg(&cmdparams, 1);
  end = cmdparms_getarg(&cmdparams, 2);
  sprintf(query, "hmi_lev0_fg[T_Obs=%s-%s]", start, end);
  filtergrams = drms_open_records(drms_env, query, "RD", &status);
  if (filtergrams->num_recs == 0) {
    printf("Sorry, no filtergrams found for that time interval.\n");
    return -1;
  }
  first_frame = 0;
  /* Start looping over the record set. */
  for (;;) {
    first_frame = find_next_framelist(first_frame, filtergrams);
    if (first_frame == -1)   /* No more complete framelists. Exit. */
      break;
    dopplergram = drms_create_records(drms_env, "hmi_fd_v", 1, &status);
    if (status)
      return -1;
    compute_dopplergram(first_frame, filtergrams, dopplergram);
    drms_close_records(drms_env, dopplergram);
  }
  return 0;
}

Usage: doppler DRMSSESSION=helios:33546 "2009.09.01_16:00:00_TAI" "2009.09.01_17:00:00_TAI"
Example continued

int compute_dopplergram(int first_frame, DRMS_RecordSet_t *filtergrams,
                        DRMS_RecordSet_t *dopplergram)
{
  int i, n_rows, n_cols, tuning;
  DRMS_Segment_t *fg[10], *dop;
  short *fg_data[10];
  char *pol;
  double *dop_data;

  /* Get pointers for the Doppler data array. */
  dop = drms_open_datasegment(dopplergram->records[0], "v_doppler", "RDWR");
  n_cols = drms_getaxis(dop, 0);
  n_rows = drms_getaxis(dop, 1);
  dop_data = (double *)drms_getdata(dop, 0, 0);

  /* Get pointers for the filtergram data arrays. */
  for (i = 0; i < 10; i++) {
    fg[i] = drms_open_datasegment(filtergrams->records[first_frame+i],
                                  "intensity", "RD");
    fg_data[i] = (short *)drms_getdata(fg[i], 0, 0);
    pol = drms_getkey_string(filtergrams->records[first_frame+i], "Polarization");
    tuning = drms_getkey_int(filtergrams->records[first_frame+i], "Tuning");
    printf("Using filtergram (%s, %d)\n", pol, tuning);
  }

  /* Do the actual Doppler computation. */
  calc_v(fg_data, dop_data);
  return 0;
}