IRIS Services, Products, Quality Assurance Efforts, and Potential Links to High Performance Computing in the Era of BIG DATA By T. Ahern, M. Bahavar, R.Casey,

Presentation transcript:

IRIS Services, Products, Quality Assurance Efforts, and Potential Links to High Performance Computing in the Era of BIG DATA
By T. Ahern, M. Bahavar, R. Casey, C. Trabant, A. Clark, A. Hutko, R. Karstens, Y. Suleiman, B. Weertman

Primary TOPICS
- Data Access Services – a new paradigm
- Improved internal and external ease of use
- Products – stepping stones to further research
- Improved Quality Assurance
- Developing connections to HPC environments

IRIS Crown Jewel

IRIS Data Services Challenge
- The data holdings are large!
- How do we develop simple methods to discover, access, and utilize the data?
- How can we assist researchers in early stages of their research?
- How can we support tools that are commonly used in the community?
- How can IRIS improve the quality of global seismological data?

IRIS Services – service.iris.edu
- FDSN web services: dataselect, station, event
- Documentation
- IRIS web services: timeseries, rotation, sacpz, resp, evalresp, virtualnetwork, traveltime, flinnengdahl, distaz, products
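The services listed above are reached through plain HTTP query URLs. The following is a minimal sketch of how such a URL can be assembled, assuming the standard FDSN layout `/fdsnws/<service>/1/query` under service.iris.edu. The helper name `fdsn_url` and the example parameters are illustrative and do not come from the slides.

```python
from urllib.parse import urlencode

# Host named on the slide; the path layout below is the assumed FDSN convention.
BASE = "http://service.iris.edu"


def fdsn_url(service, **params):
    """Build a query URL for an FDSN web service (dataselect, station, event)."""
    # Sort parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{BASE}/fdsnws/{service}/1/query?{query}"


# Example: station-level metadata for IU.ANMO as plain text.
url = fdsn_url("station", net="IU", sta="ANMO", level="station", format="text")
print(url)
```

Because the request is only a URL, any of the languages with basic web service support can issue it without a dedicated client library.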

Programmatic support is widespread
Modern computer languages that support basic web services include: JavaScript, R (e.g. RCurl), C#, C/C++ (multiple libraries), Java, Perl, Python, PHP, MATLAB

Perl Fetch scripts: command line access
- FetchData
- FetchEvent
- FetchMetadata

FetchData options
FetchData retrieves miniSEED, simple metadata, SEED RESP and/or SAC Poles and Zeros using the following selection criteria:
- Network, Station, Location and Channel: all optional, can contain * and ? wildcards, virtual networks supported
- Start and end time range
- Geographic box or circular region
Selections: command line, selection list file or BREQ_FAST file

FetchData example
Request 1 hour of GSN/ANMO long-period vertical (LHZ) data and simple metadata for the M8.8 Chilean earthquake:
$ FetchData -N IU -S ANMO -L 00 -C LHZ -s ,06:34:00 -e ,07:34:00 -o /data/Chile-GSN-LHZ.mseed -m /data/Chile-GSN-LHZ.metadata
Convert the miniSEED to SAC with metadata:
$ mseed2sac Chile-GSN-LHZ.mseed -m Chile-GSN-LHZ.metadata -E '2010,058,06:34:11/ / /22.9'
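The event string passed to mseed2sac with -E starts with a year and a day-of-year (2010,058) rather than a calendar date. The sketch below shows one way to derive that form from a calendar timestamp. It assumes the origin date is 27 February 2010, the date of the M8.8 Maule, Chile earthquake, which is consistent with day 058 in the command above. The function name `to_year_doy` is hypothetical.

```python
from datetime import datetime


def to_year_doy(iso_timestamp):
    """Convert 'YYYY-MM-DDTHH:MM:SS' to 'YYYY,DDD,HH:MM:SS' (DDD = day of year)."""
    t = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S")
    # %j is the zero-padded day of the year, 001 through 366.
    return t.strftime("%Y,%j,%H:%M:%S")


# Assumed origin time; only the year, day-of-year, and time appear in the slide.
print(to_year_doy("2010-02-27T06:34:11"))  # -> 2010,058,06:34:11
```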

FetchData example results
2 minutes later… 121 SAC files and a quick-n-dirty record section

Performance
ws-dataselect has been shown to deliver 1 terabyte of data per day to a single remote user
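To put the 1 terabyte per day figure in network terms, the short calculation below converts it to a sustained transfer rate. It assumes a decimal terabyte (10^12 bytes) and a perfectly even transfer over 24 hours. Neither assumption is stated in the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(bytes_per_day):
    """Return (megabytes per second, megabits per second) for a daily volume."""
    bytes_per_s = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e6


# Assumption: 1 terabyte = 1e12 bytes.
mb_s, mbit_s = sustained_rate(1e12)
print(f"{mb_s:.1f} MB/s, {mbit_s:.1f} Mbit/s")  # -> 11.6 MB/s, 92.6 Mbit/s
```

Under those assumptions, a single user sustaining this rate would nearly saturate a 100 Mbit/s link.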

FetchEvent options
FetchEvent retrieves event information from ws-event and prints simple ASCII output. Events can be selected using these criteria:
- Start and end time range
- Geographic box or circular region
- Depth range
- Magnitude range and type
- Catalog and contributor
- IRIS event ID
Other options:
- Include secondary origins (default is primary only)
- Order results by magnitude or time
- Limit to origins updated after a specific date
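The selection criteria above can be illustrated with a small client-side filter. This is only a sketch of the selection logic, not how ws-event or FetchEvent is implemented. The `Event` record, the function names, and the sample values are hypothetical, and the circular region uses a spherical-Earth haversine distance.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    time: str        # ISO 8601 UTC, e.g. "2010-02-27T06:34:11"
    lat: float
    lon: float
    depth_km: float
    mag: float


def angular_distance_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees (haversine formula, spherical Earth)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def in_range(value, low, high):
    """True if value lies within [low, high]; None means unbounded."""
    return (low is None or value >= low) and (high is None or value <= high)


def select(events, start, end, center=None, max_radius_deg=None,
           depth=(None, None), mag=(None, None)):
    """Apply time, circular-region, depth, and magnitude criteria."""
    chosen = []
    for ev in events:
        # Same-format ISO timestamps compare correctly as strings.
        if not (start <= ev.time <= end):
            continue
        if center is not None and max_radius_deg is not None:
            if angular_distance_deg(center[0], center[1], ev.lat, ev.lon) > max_radius_deg:
                continue
        if in_range(ev.depth_km, *depth) and in_range(ev.mag, *mag):
            chosen.append(ev)
    # Order results by magnitude, largest first.
    return sorted(chosen, key=lambda ev: ev.mag, reverse=True)
```

A call such as `select(events, "2010-02-27T06:30:00", "2010-02-27T06:50:00", mag=(6.0, None))` would keep only events of magnitude 6 or larger in that 20 minute window.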

FetchEvent example
Request events for a 20 minute period including secondary origins:
$ FetchEvent -s ,6:30 -e ,6:50 -secondary

The success of web services

International Coordination
- FDSN web services are well coordinated between Europe and the US
- Intend to promote them elsewhere: Canada, Japan, China, SE Asia
- Many developers are producing web-service-aware clients: ObsPy, SOD, jWeed, WILBER 3

EFFORTS in Higher Level Products
- LEVEL 4: Integrated Research Products
- LEVEL 3: Seismological Research Products
- LEVEL 2: Derived Information, Standard Processing
- LEVEL 1: Quality Controlled Data
- LEVEL 0: Raw Data
Adapted from National Research Council Committee on Data Management and Computation (CODMAC)

Products from IRIS PRODUCT

Searchable ProdUct Depository (SPUD): event products, all products

Special Event Products

Hurricane Sandy: very bad for New York City, $75B in damage overall

Hurricane Sandy: very interesting seismic noise source
[Figure panels: Vertical, East-West, North-South, Pressure]

Russian bolide seen by Global Seismic Network stations (atmospheric to ground coupling generated long period surface waves) but…..

The 2009 & 2013 tests had very similar locations!

On-demand synthetic seismograms
We are computing a complete Green's function (GF) database for:
- High resolution 2D axisymmetric SEM (maybe 0.5 Hz?)
- All source depths/distances
- Seven 1D reference models (PREM, AK135, PREMoceanic, …)
- Available on demand/command line to anyone through IRIS
- Returns synthetic seismograms: filtered, GCMT or any moment tensor convolved
ETH: Tarje Nissen-Meyer, Martin van Driel, Niloufar Abolfathian
IRIS: Alex Hutko & Chad Trabant

Quality Assurance Using MUSTANG
Modular Utility for Statistical Knowledge Gathering
What is MUSTANG?
- A system initially providing about two dozen QA metrics
- Built on, and accessible through, a web service architecture
- Crawls through all data in the archive
- Changes in data and metadata trigger recalculation
- Integration with IRIS Web Services suite
- Can be part of a larger network of QA systems

How is MUSTANG designed?
Consists of 3 major components:
- A Master Scheduler (MCR)
- A central storage system (BSS)
- A metrics compute cluster
[Diagram labels: mcrmom, sched, resched, jobmgr, Node A, Node B, Node C, Node D, Node E, store]

Metrics Project Status
Simple metrics development
- Includes development of data acquisition, messaging, metadata processing, and other foundational details
- Gaps, STA/LTA, Overlaps, Availability, Max/Min/Mean/Median values, RMS
- SNR – event based using tau-p
- Data Latency adapted from existing QUACK code
- Polarity reversal will follow SNR
- Linearity is challenging
- State of health metrics
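Several of the simple metrics named above (gaps, overlaps, availability, min/max/mean/median, RMS) are easy to state precisely. The sketch below is one possible formulation and is not MUSTANG's code. It assumes a channel's data arrive as time-sorted (start, end) segments in seconds plus a list of sample values, and all function names are hypothetical.

```python
import math
from statistics import mean, median


def gaps_and_overlaps(segments, tolerance=0.0):
    """Compare each segment with the next one (segments sorted by start time).

    Returns (gaps, overlaps) as lists of (from_time, to_time) tuples.
    Simplification: only adjacent segments are compared.
    """
    gaps, overlaps = [], []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        delta = next_start - prev_end
        if delta > tolerance:
            gaps.append((prev_end, next_start))
        elif delta < -tolerance:
            overlaps.append((next_start, prev_end))
    return gaps, overlaps


def percent_availability(segments, window_start, window_end):
    """Percent of the window covered by data, counting overlapped time once."""
    covered = 0.0
    cursor = window_start  # everything before the cursor is already counted
    for start, end in sorted(segments):
        start = max(start, cursor)
        end = min(end, window_end)
        if end > start:
            covered += end - start
            cursor = end
    return 100.0 * covered / (window_end - window_start)


def basic_stats(samples):
    """Min, max, mean, median, and root-mean-square of the sample values."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return {"min": min(samples), "max": max(samples),
            "mean": mean(samples), "median": median(samples), "rms": rms}


segments = [(0, 10), (15, 30), (25, 40)]
print(gaps_and_overlaps(segments))           # -> ([(10, 15)], [(25, 30)])
print(percent_availability(segments, 0, 40)) # -> 87.5
```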

Metrics Project Status (2)
Multiple time series metrics
- Station percent completeness
- Multiple station min/max/mean/median
- Other metrics being worked on
Complex processing – in pipeline
- PSD algorithm just completed
- Processing just beginning
- Calculations do not have instrument corrections applied
- PDF plots will be generated dynamically to support aggregation and spectral differencing

More Metrics in Development
- Coherence of two separate time series
- Cross-correlation of two separate channels
- Differencing in PDFs, Aggregate PDFs
- Percent difference above HNM
- Check channel orientation – finding max coherence
- Compare cross-spectrum of two co-located channels
- Compare data to synthetic tide
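As an illustration of the cross-correlation idea in the list above, the sketch below finds the sample lag at which one channel best matches another. It is a brute-force time-domain version written for clarity, not the algorithm MUSTANG uses. The function name `best_lag` and the test signals are hypothetical.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) of b relative to a with the largest
    cross-correlation, searching lags from -max_lag to +max_lag."""
    def demean(x):
        m = sum(x) / len(x)
        return [v - m for v in x]

    a, b = demean(a), demean(b)
    best_value, best = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, av in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):  # only sum where the two series overlap
                total += av * b[j]
        if total > best_value:
            best_value, best = total, lag
    return best


# Two co-located channels recording the same pulse, the second delayed by 2 samples.
chan_a = [0, 0, 1, 2, 1, 0, 0, 0]
chan_b = [0, 0, 0, 0, 1, 2, 1, 0]
print(best_lag(chan_a, chan_b, max_lag=3))  # -> 2
```

A nonzero best lag between channels that should be simultaneous is one way such a comparison could point to a timing problem.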

Later Phase
Additional metrics to be produced
- Look for spectral trends through mode differencing
- Timing integrity check by comparing to TauP
- Correlation of data to atmospheric data
- Ping or glitch detection
- Histogram of DC offsets
- Dead channel detector

Visualization Client - LASSO
Flagship visualization client
- Provide ability to easily browse metrics data
- Provide ability to generate plots of indicated metrics
- Provide ability to organize results in web page
Intended audiences
- Network operators
- Scientific users

IRIS DMC: Enhanced Quality Assurance
[Diagram labels]
- Archived and Real Time Data
- MUSTANG Metric Estimators: gaps, overlaps, completeness, signal to noise, power density, PDF mode changes, glitches (~24 metrics in phase 2)
- PostgreSQL Database
- Data Quality Technician
- Domestic & Non-US Network Operators

IRIS DMC: Research Ready Data Sets
[Diagram labels]
- Archived and Real Time Data
- MUSTANG Metric Estimators: gaps, overlaps, completeness, signal to noise, power density, PDF mode changes, glitches (~24 metrics in phase 2)
- PostgreSQL Database
- Data Quality Technician
- Domestic & Non-US Network Operators
- Researcher Specifies Required Data Metric Constraints
- DMC Filters Data Request Using Defined Constraints
- Filtered Data Request Returned to Researcher
- Research Ready Data Sets
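The research-ready workflow on this slide (researcher specifies metric constraints, the DMC filters the request, the filtered request is returned) can be sketched as a simple filter. The metric names, the channel-day keys, and the `filter_request` function below are hypothetical and only illustrate the idea. The actual DMC interface is not described in the slides.

```python
def filter_request(requested, metrics, constraints):
    """Keep the requested channel-days whose metrics satisfy every constraint.

    requested:   list of (channel, day) keys
    metrics:     dict mapping (channel, day) -> {metric_name: value}
    constraints: dict mapping metric_name -> (minimum, maximum); None = unbounded
    """
    kept = []
    for key in requested:
        values = metrics.get(key, {})
        ok = True
        for name, (low, high) in constraints.items():
            value = values.get(name)
            if value is None:
                ok = False  # no measurement available: exclude rather than guess
            elif low is not None and value < low:
                ok = False
            elif high is not None and value > high:
                ok = False
        if ok:
            kept.append(key)
    return kept


requested = [("IU.ANMO.00.LHZ", "day-1"),
             ("IU.ANMO.00.LHZ", "day-2"),
             ("IU.ANMO.10.LHZ", "day-1")]
metrics = {
    requested[0]: {"percent_availability": 100.0, "num_gaps": 0},
    requested[1]: {"percent_availability": 92.5, "num_gaps": 3},
    # No metrics stored for the third channel-day.
}
constraints = {"percent_availability": (99.0, None), "num_gaps": (None, 0)}
research_ready = filter_request(requested, metrics, constraints)
print(research_ready)  # -> [('IU.ANMO.00.LHZ', 'day-1')]
```

Excluding channel-days with no stored metrics is a conservative design choice in this sketch; a real system might instead return them with a flag.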

Auxiliary Data Center
IRIS currently operates an Active Backup System in Boulder, CO at UNAVCO
- Replication of time series and DBMS
- And other key items such as software source, etc.
We wish to move toward a fully functional auxiliary data center model
- LLNL
- SDSC
- Argonne
This can provide cycles close to data

Multiple Fully Functioning DMCs
[Diagram labels]
- LLNL, Seattle, SDSC: each with DBMS and waveforms
- Ingestion: BUD Real Time System, File Ingestion System
- Web Services (entire suite), Breqfast, WILBER3, MUSTANG, SeismiQuery
- Breqfast Requests, WebRequest
- Load Balancer

Links with High Performance Computing
[Diagram labels]
- LLNL, Seattle: each with DBMS and waveforms
- Web Services With Research Readiness
- Scriptable Event Extraction
- Event Products: Research Ready, Formatted for HPC (netCDF, HDF5, ADIOS, other)

Coordination with University Researchers
Builds on IRIS DMC strengths
- Provide access to hi-graded event products
- Plumbing between the archive and HPC environment streamlined
Builds on LLNL strengths
- Data mining
- Algorithmic processing on an HPC environment
Fosters collaboration

Some short live demonstrations
- Fetch data
- Conversion to SAC
- The entire GSN in 2 minutes per event

THANK YOU FOR YOUR ATTENTION

Requirements
Identical (or very similar) hardware and software
Hardware
- 300 terabyte RAID
- ~5 Dell Enterprise Servers
- Firewalls, routers, local area network
- High speed connections to Internet2 or greater
Software
- VMware
- Oracle Linux
- Oracle RDBMS (trying to move to Enterprise Postgres)
- etc.

USArray GMVs Ground Motion Visualizations

Continually running infrasound auto-detections

All SPUD products are accessible through a webservice/XML. Translation: command line download GCMTs ( for

Dozens of record sections