Page 1JSOC Review – 17 March 2005 JSOC Pipeline Processing: Data organization, Infrastructure, and Data Products Rasmus Munk Larsen, Stanford University.

Slides:



Advertisements
Similar presentations
MAP REDUCE PROGRAMMING Dr G Sudha Sadasivam. Map - reduce sort/merge based distributed processing Best for batch- oriented processing Sort/merge is primitive.
Advertisements

HMI Data Analysis Software Plan for Phase-D. JSOC - HMI Pipeline HMI Data Analysis Pipeline Doppler Velocity Heliographic Doppler velocity maps Tracked.
Page 1JSOC for SDO MOR October 2007 JSOC Status Review.
Rasmus Munk Larsen / Pipeline Processing 1 JSOC Pipeline Processing Overview Rasmus Munk Larsen, Stanford University
200Mm Time-Distance Pipeline Status 11/5/ Mm Time-Distance Lead – Tom Duvall Lead – Tom Duvall Task – From Dopplergrams generate deep-focus synoptic.
Rasmus Munk Larsen / Pipeline Processing 1HMI Science Team Meeting – January, 2005 JSOC Pipeline Processing Environment Rasmus Munk Larsen, Stanford University.
Page 1LWS Teams Day JSOC Overview HMI Data Products: Plan and Status.
HMI – Synoptic Data Sets HMI Team Meeting Jan. 26, 2005 Stanford, CA.
HMI/AIA Science Team Meeting, HMI Science Goals Alexander Kosovichev & HMI Team.
Harvard University Oracle Database Administration Session 2 System Level.
Time-Distance Pipeline for the Upper 30Mm Convection Zone Status 11/5/2007.
HMI Science Analysis. Primary Goal: Origins of Solar Variability The primary goal of the Helioseismic and Magnetic Imager (HMI) investigation is to study.
Page 1JSOC Review – 17 March 2005 DRMS Core System Karen Tian
25 September 2007eSDO and the VO, ADASS 2007Elizabeth Auden Accessing eSDO Solar Image Processing and Visualisation through AstroGrid Elizabeth Auden ADASS.
Advanced Technology Center 1 HMI Rasmus Larsen / Processing Modules Stanford University HMI Team Meeting – May 2003 Processing Module Development Rasmus.
HMI Science Objectives Convection-zone dynamics and the solar dynamo  Structure and dynamics of the tachocline  Variations in differential rotation 
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Overview of Database Languages and Architectures.
Working with SQL and PL/SQL/ Session 1 / 1 of 27 SQL Server Architecture.
Selecting and Implementing An Embedded Database System Presented by Jeff Webb March 2005 Article written by Michael Olson IEEE Software, 2000.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Avro Apache Course: Distributed class Student ID: AM Name: Azzaya Galbazar
The Worlds of Database Systems Chapter 1. Database Management Systems (DBMS) DBMS: Powerful tool for creating and managing large amounts of data efficiently.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
Database System Concepts and Architecture Lecture # 2 21 June 2012 National University of Computer and Emerging Sciences.
MASSACHUSETTS INSTITUTE OF TECHNOLOGY NASA GODDARD SPACE FLIGHT CENTER ORBITAL SCIENCES CORPORATION NASA AMES RESEARCH CENTER SPACE TELESCOPE SCIENCE INSTITUTE.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
By Lecturer / Aisha Dawood 1.  You can control the number of dispatcher processes in the instance. Unlike the number of shared servers, the number of.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Chapter 4 Storage Management (Memory Management).
GONG data and pipelines: Present & future. Present data products 800x800 full-disk images, one per minute, continuous (0.87 average duty cycle) Observables:
I/O Management and Disk Structure Introduction to Operating Systems: Module 14.
AIA Core-Team Meeting April 2009 JSOC Stuff Phil Scherrer.
11 3 / 12 CHAPTER Databases MIS105 Lec15 Irfan Ahmed Ilyas.
OSes: 11. FS Impl. 1 Operating Systems v Objectives –discuss file storage and access on secondary storage (a hard disk) Certificate Program in Software.
Views In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database.) In some.
Page 1LWS Teams Day JSOC Overview HMI-AIA Joint Science Operations Center Science Data Processing a.k.a. JSOC-SDP Overview.
ICS 321 Fall 2010 SQL in a Server Environment (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 11/1/20101Lipyeow.
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
Management Information Systems, 4 th Edition 1 Chapter 8 Data and Knowledge Management.
HMI Major Science Objectives The primary goal of the Helioseismic and Magnetic Imager (HMI) investigation is to study the origin of solar variability and.
Page 1JSOC Peer Review 17Mar2005 HMI & AIA JSOC Architecture Science Team Forecast Centers EPO Public Catalog Primary Archive HMI & AIA Operations House-
Page 1JSOC Overview August 2007 HMI Status HMI is virtually done. –Virtually  similar to but not in fact –Front window issue resolved, flight window now.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
Algorithm Preparation and Data Availability 1. Mullard Space Science Laboratory, University College London. 2. Physics and Astronomy Department, University.
Session 1 Module 1: Introduction to Data Integrity
IBM Global Services © 2005 IBM Corporation SAP Legacy System Migration Workbench| March-2005 ALE (Application Link Enabling)
Page 1SDO Teams Meeting, March 2008 Status and Overview of HMI–AIA Joint Science Operations Center (JSOC) Science Data Processing (SDP) P. Scherrer Science.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
8:30-9:10 AM: Philip Scherrer, What Can We Hope to Learn from SDO Overview of SDO HMI Investigation HMI Instrument.
2) Database System Concepts and Architecture. Slide 2- 2 Outline Data Models and Their Categories Schemas, Instances, and States Three-Schema Architecture.
Page 1JSOC Review – 17 March 2005 JSOC 17 March 2005 “Peer” Review Stanford SDP Development Plan Philip Scherrer
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
ISC321 Database Systems I Chapter 2: Overview of Database Languages and Architectures Fall 2015 Dr. Abdullah Almutairi.
Introduction to Database Programming with Python Gary Stewart
Oracle Database Architectural Components
Helioseismology for HMI Science objectives and tasks* Data analysis plan* Helioseismology working groups and meetings *HMI Concept Study Report, Appendix.
Compute and Storage For the Farm at Jlab
Managing Multi-User Databases
Module 11: File Structure
E.C. Auden1, J.L. Culhane1, Y. P. Elsworth2, A. Fludra3, M. Thompson4
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
HMI Data Analysis Pipeline
Status and Overview of HMI–AIA Joint Science Operations Center (JSOC) Science Data Processing (SDP) April 22, 2008 P. Scherrer.
JSOC Pipeline Processing System Components
Instrument Performance Requirements
HMI Data Analysis Pipeline
Chapter 16 File Management
Presentation transcript:

Page 1JSOC Review – 17 March 2005 JSOC Pipeline Processing: Data organization, Infrastructure, and Data Products Rasmus Munk Larsen, Stanford University

Page 2JSOC Review – 17 March 2005 Overview JSOC data series organization Pipeline execution environment and architecture User libraries Co-I analysis module contribution Pipeline Data Products

Page 3JSOC Review – 17 March 2005 JSOC data organization Evolved from FITS-based MDI dataset concept to –Fix known limitations/problems –Accommodate more complex data models required by higher-level processing Main design features –Lesson learned from MDI: Separate meta-data (keywords) and image data No need to re-write large image files when only keywords change (lev1.8 problem) No (fewer) out-of-date keyword values in FITS headers Can bind to most recent values on export –Easy data access through query-like dataset names All access in terms of sets of data records, which are the “atomic units” of a data series A dataset name is a query specifying a set of data records (possibly from multiple data series): –jsoc:hmi_lev0_fg[recordnum=12345] (a specific filtergram with unique record number 12345) –jsoc:hmi_lev0_fg[ ] (a minute’s worth of filtergrams) –jsoc:hmi_fd_V[T_OBS>=‘ ’ AND T_OBS<‘ ’ AND N_MISSING<100] –Storage and tape management must be transparent to user Chunking of data records into storage units for efficient tape/disk usage done internally Completely separate storage and catalog (i.e. series & record) databases: more modular design Legacy MDI modules should run on top of new storage service –Store meta-data (keywords) in relational database (Oracle) Can use power of relational database to rapidly find data records Easy and fast to create time series of any keyword value (for trending etc.) Consequence: Data records for a given series must be well defined (i.e. have a fixed set of keywords)

Page 4JSOC Review – 17 March 2005 hmi_lev0_cam1_fg Logical Data Organization JSOC Data SeriesData records for series hmi_fd_V Single hmi_fd_V data record aia_lev0_cont1700 hmi_lev1_fd_M hmi_lev1_fd_V aia_lev0_FE171 hmi_lev1_fd_V#12345 hmi_lev1_fd_V#12346 hmi_lev1_fd_V#12347 hmi_lev1_fd_V#12348 hmi_lev1_fd_V#12349 hmi_lev1_fd_V#12350 hmi_lev1_fd_V#12351 … … Keywords : RECORDNUM = # Unique serial number SERIESNUM = # Slots since epoch. T_OBS = ‘ _23:22:40_TAI’ DATAMIN = E+03 DATAMAX = E P_ANGLE = LINK:ORBIT,KEYWORD:SOLAR_P … Storage Unit = Directory Links: ORBIT = hmi_lev0_orbit, SERIESNUM = CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7 L1 = hmi_lev0_cam1_fg, RECORDNUM = R1 = hmi_lev0_cam1_fg, RECORDNUM = … Data Segments: Velocity = hmi_lev1_fd_V#12352 hmi_lev1_fd_V#12353

Page 5JSOC Review – 17 March 2005 JSOC Series Definition (JSD) #======================= Global series information ===================== Seriesname: “hmi_fd_v" Description: “HMI full-disk Doppler velocity...." Author: “Rasmus Munk Larsen" Owners: “production" Unitsize: 90 Archive: 1 Retention: Tapegroup: 2 Primary Index: T_Obs #============================ Keywords ================================= # Format: # Keyword:,,,,, # or # Keyword:, link,, # Keyword: “T_Obs", time, “ _00:00:00_TAI”, "%F %T", “s", “Nominal observation time" Keyword: “D_Mean", double, 0.0, “%lf", “m/s", “Data mean" Keyword: “D_Max”, double, 0.0, “%lf", “m/s", “Data maximum" Keyword: “D_Min", double, 0.0, “%lf", “m/s", “Data minimum" Keyword:... Keyword: “P_Angle”, link, “Attitude”, “P_Angle” #============================ Links ===================================== # Format: # Link:,, { static | dynamic } # Link: “L1", “hmi_lev0_fg", static Link: “R1", “hmi_lev0_fg", static Link: “L2", “hmi_lev0_fg", static Link: “R2", “hmi_lev0_fg", static Link:... Link: “Caltable”, “hmi_dopcal”, static Link: “Attitude”, “sdo_fds”, dynamic #============================ Data segments ============================= # Data:,,,,, # Data: "velocity", float, 2, 4096, 4096, "m/s", fitz SQL: INSERT INTO masterseries_table VALUES (‘hmi_fd_v’,’HMI full-disk…’, … SQL: CREATE TABLE hmi_fd_v (recnum integer not null unique, T_Obs binary_float, … SQL: CREATE INDEX hmi_fd_v_pidx on hmi_fd_v (T_Obs) SQL: CREATE INDEX hmi_fd_v_ridx on hmi_fd_v (recnum) SQL: CREATE SEQUENCE hmi_fd_v testclass1.jsd JSD parser Oracle database Creating a new Data Series:

Page 6JSOC Review – 17 March 2005 Global Database Tables

Page 7JSOC Review – 17 March 2005 Database tables for example series hmi_fd_v Tables specific for each series contain per record values of –Keywords –Record numbers of records pointed to by links –DSIndex = an index identifying the SUMS storage unit containing the data segments of a record –Series sequence counter used for generating unique record numbers

Page 8JSOC Review – 17 March 2005 Data Record Management Service (DRMS) Data Record Management Service (DRMS) Pipeline client-server architecture JSOC Disks Analysis code C/Fortran/IDL/Matlab JSOC Library Record Cache (Keywords+Links+Data paths) OpenRecords CloseRecords GetKeyword, SetKeyword GetLink, SetLink OpenDataSegment CloseDataSegment Pipeline client process JSOC Disks Data Record Management Service (DRMS) Storage Unit Management Service (SUMS) Tape Archive Service Series Tables Record Catalogs Storage Unit Tables Record Catalogs Record Tables Oracle Database Server AllocUnit GetUnit PutUnit SQL queries Storage unit transfer Data Segment I/O Storage unit transfer Generic file I/O DRMS socket protocol

Page 9JSOC Review – 17 March 2005 Pipeline batch processing A pipeline batch is encapsulated in a single database transaction: –If no module fails all data records are commited and become visible to other clients of the JSOC catalog at the end of the session –If failure occurs all data records are deleted and the database rolled back –It is possible to commit data produced up to intermediate checkpoints during sessions DRMS Service = Session Master Input data records Output data records DRMS API Register session DRMS API Module 1 DRMS API Module 2.2 … DRMS API Module N DRMS API Commit Data & Deregister Pipeline batch = atomic transaction Record & Series Database SUMS DRMS API Module 2.1

Page 10JSOC Review – 17 March 2005 Sample DRMS client API calls DRMS_Session_t *drms_connect(char *drms_locator, char *username) –Establish socket connection to DRMS server process designated by locator string –Retrieve and cache global series information from database void drms_disconnect(DRMS_Session_t *session, int abort) –If abort=0 then commit new storage units (data segments) to SUMS, and commit meta-data to the database –If abort=1 then tell SUMS to discard new storage units. Roll back all database (meta-data) manipulations since last commit –Close socket connection to DRMS server int drms_commit(DRMS_Session_t *session) –Commit without closing connection DRMS_RecordSet_t *drms_open_records(DRMS_Session_t *session, char *dataset, char *mode) –Parse dataset descriptor in “dataset” and generate SQL queries to retrieve the specified records –Populate DRMS_Record_t data structures with meta-data from the matching records –Extract set of unique DSIndex values from retrieved records and call SUMS to get their online location –This call may involve waiting for SUMS to stage data from tape. A non-blocking version will also be provided –mode can take the values RD (read), CLONE_COPY, CLONE_SHARE. The latter two make new copies of the data records with new unique record numbers DRMS_RecordSet_t *drms_create_records(DRMS_Session_t *session, char *series, int num_recs) –Create num_recs new records for the specified series. Assign them new unique record numbers generated by the database char *drms_get_record_path(DRMS_Record_t *record) –Returns a string with the directory in which data (segments) for the record is stored

Page 11JSOC Review – 17 March 2005 Example of module code: int main(int argc, char *argv[]) { DRMS_Session_t *session; DRMS_RecordSet_t *filtergrams, *dopplergram; int first_frame, status; char query[1024]; session = drms_connect(argv[1], “production“, “passwd”); sprintf(query, "hmi_lev0_fg[T_Obs>=%s AND T_Obs<%s]", argv[2], argv[3]); filtergrams = drms_open_records(session, query, "RD", &status); if (filtergrams->num_recs==0) { printf("Sorry, no filtergrams found for that time interval.\n"); drms_disconnect(session, 1); return -1; } first_frame = 0; /* Start looping over record set. */ for (;;) { first_frame = find_next_framelist(first_frame, filtergrams); if (first_frame == -1) /* No more complete framelists. Exit. */ break; dopplergram = drms_create_records(session, "hmi_fd_v", 1, &status); compute_dopplergram(first_frame, filtergrams, dopplergram); drms_close_records(session, dopplergram); } drms_disconnect(session, 0); return 0; } A module doing a (naïve) Doppler velocity calculation could look as shown below Usage example: doppler helios.stanford.edu:33546 " _16:00:00_TAI" " _17:00:00_TAI"

Page 12JSOC Review – 17 March 2005 Example continued… int compute_dopplergram(int first_frame, DRMS_RecordSet_t *filtergrams, DRMS_RecordSet_t * dopplergram) { int n_rows, n_cols, tuning; DRMS_Segment_t *fg[10], *dop; short *fg_data[10], *pol; float *dop_data; char linkname[3]; /* Get pointers for doppler data array. */ dop = drms_open_datasegment(dopplergram->records[0], "v_doppler", "RDWR"); n_cols = drms_getaxis(dop, 0); n_rows = drms_getaxis(dop, 1); dop_data = (float *)drms_getdata(dop, 0, 0); /* Get pointers for filtergram data arrays and set associated link in dopplergram record. */ for (i=first_frame; i<first_frame+10; i++) { fg[i] = drms_open_datasegment( filtergrams->records[i], "intensity", "RD"); fg_data[i] = (short *)drms_getdata(fg, 0, 0); pol = drms_getkey_string(filtergrams->records[i], "Polarization"); tuning = drms_getkey_int(filtergrams->records[i], "Tuning"); sprintf(linkname,”%c%1d”,pol[0],tuning); drms_set_link(dopplergram->records[0], linkname, filtergrams->records[i]->recnum); } /* Do the actual Doppler computation.*/ calc_v(n_cols, n_rows, fg_data, dop_data); }

Page 13JSOC Review – 17 March 2005 HMI module status and MDI heritage Doppler Velocity Heliographic Doppler velocity maps Tracked Tiles Of Dopplergrams Stokes I,V Continuum Brightness Tracked full-disk 1-hour averaged Continuum maps Brightness feature maps Solar limb parameters Stokes I,Q,U,V Full-disk 10-min Averaged maps Tracked Tiles Line-of-sight Magnetograms Vector Magnetograms Fast algorithm Vector Magnetograms Inversion algorithm Egression and Ingression maps Time-distance Cross-covariance function Ring diagrams Wave phase shift maps Wave travel times Local wave frequency shifts Spherical Harmonic Time series Mode frequencies And splitting Brightness Images Line-of-Sight Magnetic Field Maps Coronal magnetic Field Extrapolations Coronal and Solar wind models Far-side activity index Deep-focus v and c s maps (0-200Mm) High-resolution v and c s maps (0-30Mm) Carrington synoptic v and c s maps (0-30Mm) Full-disk velocity, sound speed, Maps (0-30Mm) Internal sound speed Internal rotation Vector Magnetic Field Maps MDI pipeline modules exist Standalone production codes in use at Stanford Research codes in use at Stanford Codes developed at HAO Codes being developed in the community Codes developed at Stanford Primary observables Intermediate and high level data products

Page 14JSOC Review – 17 March 2005 Analysis modules: co-I contributions and collaboration Contributions from co-I teams: –Software for intermediate and high level analysis modules –Output data series definition Keywords, links, data segments, size of storage units etc. –Documentation (detailed enough to understand the contributed code) –Test data and intended results for verification –Time Explain algorithms and implementation Help with verification Collaborate on improvements if required (e.g. performance or maintainability) Contributions from HMI team: –Pipeline execution environment –Software & hardware resources (Development environment, libraries, tools) –Time Help with defining data series Help with porting code to JSOC API If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization Verification

Page 15JSOC Review – 17 March 2005 Questions addressed by HMI team meeting Jan List of standard science data products –Which data products, including intermediate ones, should be produced by JSOC? –What cadence, resolution, coverage etc. will/should each data product have? Eventually a JSOC series description must be written for each one. –Which data products should be computed on the fly and which should be archived? –Have we got the basic pipeline right? Are there maturing new techniques that have been overlooked? –Have preliminary list. Aiming to have final list by November Detailing each branch of the processing pipeline –What are the detailed steps in each branch? –Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)? –What are the computer resource requirements of computational steps? Contributed analysis modules –Who will contribute code? –Which codes are mature enough for inclusion? Should be at least working research code now, since integration has to begin by c. mid –Fairly well defined for seismology pipeline, less so for vector magnetic processing. Aiming to have final list of codes to include by April 2006.

Page 16JSOC Review – 17 March 2005 Example: Global Seismology Pipeline