Towards Personalized and Active Information Management for Meteorological Investigations Beth Plale Indiana University USA.



Problem Statement
Mesoscale meteorology research is highly data-driven.
– A large percentage of the data streams in from observational platforms and is available on OPeNDAP servers.
– Data that is over 10 minutes old is too old.
– Researchers are currently working on increasing real-time responsiveness to developing weather conditions.
Mesoscale meteorology is a vast information space.
– Forecasting models assimilate data from a growing number of sources.

Solution Statement
The Internet has proven the utility of a user-oriented view of information space management.
– Browsers and bookmarks to organize
– Blogs and web page tools (FrontPage, Dreamweaver) to publish
We apply the concept of a user-oriented view to the management of the mesoscale meteorology information space.
myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.

Motivation for LEAD Each year, mesoscale weather – floods, tornadoes, hail, strong winds, lightning, and winter storms – causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses > $13B.

Conventional Numerical Weather Prediction
OBSERVATIONS: Radar Data; Mobile Mesonets; Surface Observations; Upper-Air Balloons; Commercial Aircraft; Geostationary and Polar Orbiting Satellite; Wind Profilers; GPS Satellites
Analysis/Assimilation: Quality Control; Retrieval of Unobserved Quantities; Creation of Gridded Fields
Prediction: PCs to Teraflop Systems
Product Generation, Display, Dissemination
End Users: NWS; Private Companies; Students
The process is entirely serial and pre-scheduled: no response to weather!

The LEAD Vision: No Longer Serial or Static
OBSERVATIONS: Radar Data; Mobile Mesonets; Surface Observations; Upper-Air Balloons; Commercial Aircraft; Geostationary and Polar Orbiting Satellite; Wind Profilers; GPS Satellites
Analysis/Assimilation: Quality Control; Retrieval of Unobserved Quantities; Creation of Gridded Fields
Prediction: PCs to Teraflop Systems
Product Generation, Display, Dissemination
End Users: NWS; Private Companies; Students

LEAD data: initial working data set
ETA model gridded analysis
METAR surface observations
Rawinsondes – upper air balloon observations
ACARS – commercial aircraft temperature and wind observations
NEXRAD Level II data
GOES visible satellite data

Returning to Solution Statement
We apply the concept of a user-oriented view to the management of the mesoscale meteorology information space.
myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.

Information space management tool
At the core is a metadata catalog.
– Why? Observational products are already being stored elsewhere. The files are public and could be large, so we do not want to copy them into the user’s file system. Instead, maintain a “bookmark”.
Scales to support thousands of distributed users, including individual investigators, pre-college classroom investigators, and casual observers.
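The “bookmark” idea can be sketched as follows. This is a minimal illustration, not myLEAD's actual interface: the `Bookmark` and `Catalog` classes, the attribute names, and the example.org address are all hypothetical. The catalog holds only a description and a pointer to where the product already lives, never the data itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical catalog entry: metadata plus a pointer, never the file contents."""
    name: str
    location: str            # where the product already lives (made-up address below)
    attributes: tuple = ()   # (key, value) pairs describing the product


@dataclass
class Catalog:
    """Hypothetical per-user catalog keyed by product name."""
    entries: dict = field(default_factory=dict)

    def add(self, bookmark: Bookmark) -> None:
        self.entries[bookmark.name] = bookmark

    def locate(self, name: str) -> str:
        # Return only the pointer; the caller fetches the data from its home server.
        return self.entries[name].location


catalog = Catalog()
catalog.add(Bookmark(
    name="NEXRAD-LevelII-sample",
    location="http://example.org/opendap/nexrad/sample",  # illustrative, not a real server
    attributes=(("platform", "WSR-88"),),
))
print(catalog.locate("NEXRAD-LevelII-sample"))
```

Because an entry is only a few strings, the catalog stays small even when the products it points to are very large.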

Technical Challenges
Querying must be efficient
– Over data products described by rich domain-specific metadata
– Over data products whose description can be augmented over time
Obtaining metadata is hard
– Automate as much as possible
Privacy must be fully enforced
– Any data product that the user designates as private must remain private
Publishing
– Publish a product to the larger community: data file, model output, full experiment
– Must be under user control
– Discovery of information that has been made public
Build trust
– A user may work within the myLEAD space for 5 years of graduate work, for instance
– The user must be convinced of privacy, reliability, longevity, etc.
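The privacy and publishing requirements can be illustrated with a small sketch. Everything here is an assumption made for illustration (the `Product` class, the `visible_to` and `publish` helpers, and the user and product names); it shows one plausible rule set, private by default with owner-only publishing, and is not how myLEAD enforces access.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical data product record."""
    name: str
    owner: str
    public: bool = False  # private unless the owner publishes it


def visible_to(products, user):
    """Names of products a user may see: public ones plus the user's own."""
    return [p.name for p in products if p.public or p.owner == user]


def publish(product, requester):
    """Publishing stays under user control: only the owner may do it."""
    if requester != product.owner:
        raise PermissionError("only the owner may publish a product")
    product.public = True


products = [Product("wrf-out1", "abe"), Product("viz-eta", "bing")]
publish(products[0], "abe")
print(visible_to(products, "caru"))
```

A third user sees only what has been published; each owner continues to see their own private products.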

Rundown on Implementation Specs
Building on top of MCS and OGSA-DAI
– MCS for its extensible db schema, general db schema, and security infrastructure already in place
– OGSA-DAI for grid/web service architecture
Database used is MySQL 5.0
– Supports stored procedures
– OGSA-DAI connects to MySQL through JDBC
Data product descriptions in and out of the database conform to a LEAD-specific XML schema.
The myLEAD server and myLEAD agent are written in Java.

Related Work
mySpace – AstroGrid, UK
– Similar to myLEAD in reining in the information space
– Creates swatches in a large federation of data archives for the cache and persistent data of a “community”
– Provides common query access over cache space and persistent space
RDF (Resource Description Framework)
– Basic building block is the subject-predicate-object triple: [S] – P -> [O], e.g. [Dickens] – hasWritten -> [Pickwick Papers]
– Good for storing detailed relationship information (good for understanding the relationship between two terms)
NEESgrid – NCSA
– Uses RDF
– Little available in public literature
myGrid Information Repository (MIR) – myGrid, Manchester
– Most similar to myLEAD
– Supports text search of scientific papers; uses Life Sciences Identifier (LSID)
– myLEAD has a stronger personal orientation (guarantees, publishing, automatic metadata generation)
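The subject-predicate-object triple mentioned under RDF can be illustrated without any RDF library. The sketch below uses plain Python tuples; the `Triple` type and `objects_of` helper are hypothetical names, and the second fact is added only to make the query result more interesting.

```python
from collections import namedtuple

# A triple is just (subject, predicate, object); "obj" avoids shadowing a builtin.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

triples = [
    Triple("Dickens", "hasWritten", "Pickwick Papers"),  # the slide's example
    Triple("Dickens", "hasWritten", "Oliver Twist"),     # extra illustrative fact
]


def objects_of(store, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, "Dickens", "hasWritten"))
```

Each fact is one relationship between two terms, which is why the representation suits detailed relationship information.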

myLEAD Architecture
[Diagram: server-side services and client-side services. Labeled components: myLEAD service, data model, MCS, myLEAD stored procedures, OGSA-DAI, JDBC, MCS client, myLEAD agent, portal access to myLEAD, user interface, relational DB.]

myLEAD use scenario
[Diagram labels: Factory; myLEAD “agent” instance; WRF model; data mining task; workflow; myLEAD service; Storage Repository Service (RLS); myLEAD portlet as component of LEAD portal; /var/tmp/wrf_tmp; sites IU and NCSA.]
The workflow confers with the myLEAD “agent” to determine the location of scratch space.

Metadata Catalog Data Model
Users
Investigations
– Tornado April 20 Chicago Illinois
Experiments
– Ensemble: run of 100 simultaneous forecast models, each parameterized slightly differently
Collections
Logical files
– Input observational files, input parameters, derived files, analysis results, images, model results, workflows, execution status messages
[Figure labels: Abe, Bing, Caru]
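The levels listed above can be sketched as nested records. This is a rough illustration that assumes the levels nest strictly in the order given (user, investigation, experiment, collection, logical file); the class names, the `count_files` helper, and the sample data are hypothetical and do not reflect the MCS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicalFile:
    name: str


@dataclass
class Collection:
    name: str
    files: List[LogicalFile] = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    collections: List[Collection] = field(default_factory=list)


@dataclass
class Investigation:
    name: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class User:
    name: str
    investigations: List[Investigation] = field(default_factory=list)


def count_files(user: User) -> int:
    """Walk the whole hierarchy and count logical files."""
    return sum(
        len(collection.files)
        for investigation in user.investigations
        for experiment in investigation.experiments
        for collection in experiment.collections
    )


# Illustrative data only.
observations = Collection("Input observational", [LogicalFile("NEXRAD"), LogicalFile("METAR")])
model_output = Collection("WRF-out", [LogicalFile("Wrf-out1")])
ensemble = Experiment("Ensemble", [observations, model_output])
abe = User("Abe", [Investigation("Tornado April 20 Chicago Illinois", [ensemble])])
print(count_files(abe))
```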

Data Model
[Diagram labels: Investigation; User – Dublin Core; Logical file; Collection]
Attributes are stored in “type” tables, i.e., string, float, temporal, int. This gives great extensibility, but naming must be carefully controlled; efficient querying could be an issue as well.
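Storing attributes in per-type tables can be sketched with an in-memory SQLite database. The table and column names below are assumptions for illustration, not the MCS schema, and SQLite stands in for the MySQL database named in the talk; the attribute values are made up. The query shows why efficiency can become a concern: every attribute used as a search criterion adds another join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logical_file (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One table per attribute type; new attribute names need no schema change.
CREATE TABLE attr_string (file_id INTEGER, name TEXT, value TEXT);
CREATE TABLE attr_float  (file_id INTEGER, name TEXT, value REAL);
""")
conn.executemany("INSERT INTO logical_file VALUES (?, ?)",
                 [(1, "radar-a"), (2, "radar-b")])
conn.executemany("INSERT INTO attr_string VALUES (?, ?, ?)",
                 [(1, "platform", "WSR-88"), (2, "platform", "CASA")])
conn.executemany("INSERT INTO attr_float VALUES (?, ?, ?)",
                 [(1, "resolution_km", 1.0), (2, "resolution_km", 0.1)])  # made-up values

# Two search criteria require two joins, one per attribute table.
rows = conn.execute("""
SELECT f.name
FROM logical_file f
JOIN attr_string s ON s.file_id = f.id AND s.name = 'platform'
JOIN attr_float  x ON x.file_id = f.id AND x.name = 'resolution_km'
WHERE s.value = 'WSR-88' AND x.value <= 1.0
""").fetchall()
print(rows)
```

The same layout also shows why naming needs control: a product tagged `Platform` instead of `platform` would silently fall out of this query.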

Data Model
[Figure: hierarchical browser view of the workspace “myWorkspace: J. Kowaleski”. Labels: preferences; Experiment 1: Norman, OK 21Oct04:23:11:45; Workflow template vizEta 03Aug04:13:35:40; Workflow template WRF 15May04:05:25:59; Favorite spaces; Home disk space; Thor cluster scratch space; Input parameters; Input observational; NEXRAD 26Oct04:13:45:40; GOES-infrared 26Oct04:12:00:00; METAR 26Oct04:09:10:05; WRF-out; Wrf-out1-26Oct04:13:35:40; Wrf-out2-26Oct04:13:37:25; Wrf-out3-26Oct04:13:43:15; workflow instance; Collection level; Logical file level.]
Items have an associated set of attributes that describe the data product.
The browser provides the user a hierarchical view of a space that is essentially flat. Users like hierarchy.

myLEAD agent
Separate transient grid/web service
– Holds state about the user, current investigation, and experiment
– Embeds the myLEAD client API
Purpose:
– Controls naming
– Helps use the database structure in a repeatable, meaningful way
Maintains an FSM of the current state of execution; stores into a new collection based on state
– Input → model run → analysis → final results
– Derives metadata attributes for a new data product object when it is created during the course of a workflow, by means of: case-based reasoning, internal state, consulting an ontology
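The agent's finite state machine can be sketched as below. Only the four state names come from the slide; the `AgentFSM` class, its methods, and the product names are hypothetical. The point illustrated is that each state gets its own collection, so a product is filed according to where the workflow currently is.

```python
# Linear progression of execution states, as named on the slide.
TRANSITIONS = {
    "input": "model run",
    "model run": "analysis",
    "analysis": "final results",
}


class AgentFSM:
    """Hypothetical agent state: current stage plus one collection per stage."""

    def __init__(self):
        self.state = "input"
        self.collections = {"input": []}

    def store(self, product):
        # File the product under the collection for the current state.
        self.collections[self.state].append(product)

    def advance(self):
        if self.state not in TRANSITIONS:
            raise ValueError("already in the final state")
        self.state = TRANSITIONS[self.state]
        # Entering a new state opens a new collection.
        self.collections.setdefault(self.state, [])
        return self.state


fsm = AgentFSM()
fsm.store("METAR obs")
fsm.advance()
fsm.store("wrf-out1")
print(fsm.state, fsm.collections)
```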

Data Product Metadata
Resources: “things that need describing (i.e., metadata)”
[Diagram labels: Resources; Geo-; Data products; Observational data; Model generated data; Collections; Derived data; Data analytics; Data mining; Workflow scripts; compute resources, storage resource; Data analytics resources (statistics table); services; Model input resources.]

[Table: metadata attribute, followed by notes where given]
Global ID: “LSID” for geosciences
Temporal coverage: Same as spatial
Spatial coverage: GML, THREDDS, FGDC, COARDS-CF
Geophysical quantity: Defined by common vocabulary
Platform: Goes10, Goes8; WSR-88, CASA
Instrument type, site: East-west; KXYZ
Model run info: Model derived data product
Syntactic description: Binary format of data product
Contact info: Dublin core
Physical location of service
Protocol to access service
Dataset summary: Dublin core
List of predecessors: GID of input data products, workflow instance
Event: mesocyclone, storm cell, tornado
Quality: Complex
Completeness

Current Research Challenges
Publishing
– Publishing a data product to the larger community: data file, model output, full experiment
– Discovery of information that has been made public
Guarantees
– Any data product that the user designates as private must remain private
– When a request for a product is issued, the product must exist
Flexible yet efficient schema
– Inherited from MCS; supports evolved understanding of a data product over time by means of extended attributes
Immutable investigations
– Collections, views, and logical files can be reused from earlier investigations without destroying the integrity of the earlier investigation
Proactive agent
– Infers metadata attributes from the context of the active experiment using case-based reasoning
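One way to picture immutable investigations is with frozen records that are shared rather than modified. This is only a sketch of the idea under that assumption: the `Investigation` class, its `derive` method, and the names used are hypothetical and say nothing about how myLEAD implements the guarantee.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LogicalFile:
    name: str


@dataclass(frozen=True)
class Investigation:
    """Hypothetical immutable investigation; fields cannot be reassigned."""
    name: str
    files: Tuple[LogicalFile, ...] = ()

    def derive(self, name, extra=()):
        # The new investigation reuses the earlier files by reference
        # and leaves the earlier investigation untouched.
        return Investigation(name, self.files + tuple(extra))


earlier = Investigation("April run", (LogicalFile("METAR-obs"),))
later = earlier.derive("May rerun", [LogicalFile("wrf-out1")])
print(len(earlier.files), len(later.files))
```

Reuse here means sharing: the later investigation points at the same file record, and any attempt to reassign a field on the earlier one raises an error.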

Beth Plale 4 days away from our national elections … wish us well.