ESML, Subsetting, Mining Tools

Slides:



Advertisements
Similar presentations
1 NASA CEOP Status & Demo CEOS WGISS-25 Sanya, China February 27, 2008 Yonsook Enloe.
Advertisements

Office of SA to CNS GeoIntelligence Introduction Data Mining vs Image Mining Image Mining - Issues and Challenges CBIR Image Mining Process Ontology.
BEDI -Big Earth Data Initiative
MODIS Data at NSIDC MODIS Collection 5/Long Term Data Record Workshop Molly McAllister & Terry Haran January
Information Retrieval in Practice
Supervised by Prof. LYU, Rung Tsong Michael Department of Computer Science & Engineering The Chinese University of Hong Kong Prepared by: Chan Pik Wah,
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
USING GIS TO FOSTER DATA SHARING AND COMMUNICATION SEAN MURPHY IVS BURLINGTON, VT.
Chapter 3 Software Two major types of software
Ames Research Center NAS Division 1 Experience With NASA’s Grid Miner Thomas H. Hinke NASA Ames Research Center Moffett Field, California, USA.
DASHBOARDS Dashboard provides the managers with exactly the information they need in the correct format at the correct time. BI systems are the foundation.
Overview of Search Engines
November 2011 At A Glance GREAT is a flexible & highly portable set of mission operations analysis tools that increases the operational value of ground.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
UAH GRIDS Center Middleware Testing Sandra Redman Information Technology and Systems Center and Information Technology Research Center National Space Science.
Metadata (for the data users downstream) RFC GIS Workshop July 2007 NOAA/NESDIS/NGDC Documentation.
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE) Increasing Accessibility and Interoperability of NASA Data Products with GIS Tools.
Dr.S.Sridhar,Ph.D., RACI(Paris),RZFM(Germany),RMR(USA),RIEEEProc.
Data Management Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
EARTH SCIENCE MARKUP LANGUAGE “Define Once Use Anywhere” INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
, Implementing GIS for Expanded Data Accessibility and Discoverability ASDC Introduction The Atmospheric Science Data Center (ASDC) at NASA Langley Research.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
Exploring the Applicability of Scientific Data Management Tools and Techniques on the Records Management Requirements for the National Archives and Records.
Grid-enabled Collaborative Research Applications Internet2 Member Meeting Spring, 2003 Sara J. Graves Director, Information Technology and Systems Center.
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
NEPTUNE Canada Workshop Oceans 2.0 Project Environment NEPTUNE Canada DMAS Team Victoria, BC February 16, 2009.
ATMOSPHERIC SCIENCE DATA CENTER ‘Best’ Practices for Aggregating Subset Results from Archived Datasets Walter E. Baskin 1, Jennifer Perez 2 (1) Science.
Why do I want to know about HDF and HDF- EOS? Hierarchical Data Format for the Earth Observing System (HDF-EOS) is NASA's primary format for standard data.
EARTH SCIENCE MARKUP LANGUAGE Why do you need it? How can it help you? INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
UAH The University of Alabama in Huntsville SUBSETTING Matt Smith Information Technology and Systems Center (ITSC) University of Alabama in Huntsville.
Ames Research Center NAS Division 1 A Data Miner for the Information Power Grid Thomas H. Hinke NASA Ames Research Center Moffett Field, California, USA.
On-Board Mining in the Sensor Web NSF Next Generation Data Mining November 2, 2002 Dr. Rahul Ramachandran For Steve Tanner and.
HDF & HDF-EOS Workshop VIII 2004 October Aurora, CO Bruce Beaumont, Matt Smith, Helen Conover, Sara Graves Subsetting at UAH.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
, Key Components of a Successful Earth Science Subsetter Architecture ASDC Introduction The Atmospheric Science Data Center (ASDC) at NASA Langley Research.
ESIP Federation 2004 : L.B.Pham S. Berrick, L. Pham, G. Leptoukh, Z. Liu, H. Rui, S. Shen, W. Teng, T. Zhu NASA Goddard Earth Sciences (GES) Data & Information.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
2015 GLM Annual Science Team Meeting: Cal/Val Tools Developers Forum 9-11 September, 2015 DATA MANAGEMENT For GLM Cal/Val Activities Helen Conover Information.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
Jianchun Qin, Liguang Wu, Michael Theobald, A. K. Sharma, George Serafino, Sunmi Cho, Carrie Phelps NASA Goddard Space Flight Center, Code 902 Greenbelt,
A radiologist analyzes an X-ray image, and writes his observations on papers  Image Tagging improves the quality, consistency.  Usefulness of the data.
ITSC/University of Alabama in Huntsville ADaM System Architecture Rahul Ramachandran, Sara Graves and Ken Keiser Mathematical Challenges in Scientific.
ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville (UAH)
BOĞAZİÇİ UNIVERSITY DEPARTMENT OF MANAGEMENT INFORMATION SYSTEMS MATLAB AS A DATA MINING ENVIRONMENT.
Cyberinfrastructure to promote Model - Data Integration Robert Cook, Yaxing Wei, and Suresh S. Vannan Oak Ridge National Laboratory Presented at the Model-Data.
EARTH SCIENCE MARKUP LANGUAGE Tutorial on how to write an ESML Description File (for ESML Schema v3.0) “Define Once Use Anywhere” INFORMATION TECHNOLOGY.
INTRODUCTION TO GIS  Used to describe computer facilities which are used to handle data referenced to the spatial domain.  Has the ability to inter-
August 2003 At A Glance The IRC is a platform independent, extensible, and adaptive framework that provides robust, interactive, and distributed control.
September 23-25, 2003HDF-EOS Workshop VII ECS-HSA the HEW Subsetting Appliance HDF-EOS Workshop VII Silver Spring, MD – September 23-25, 2003 Dr. Sara.
CONVERSION OF CAD DATA TO GIS LAYERS Challenges and Techniques Compiled by: Tope Bello Summer 2005 Instructor POEC 6387 GIS Workshop Professor Ronald Briggs.
Information Retrieval in Practice
Dr.S.Sridhar,Ph.D., RACI(Paris),RZFM(Germany),RMR(USA),RIEEEProc.
NASA Earth Science Data Stewardship
Search Engine Architecture
MATLAB Distributed, and Other Toolboxes
INTRODUCTION TO GEOGRAPHICAL INFORMATION SYSTEM
Persistent Identifiers Implementation in EOSDIS
Joseph JaJa, Mike Smorul, and Sangchul Song
Enterprise Algorithm Change Process
Improving Data Access, Discovery, and Usability
The cf-python software library
Tom Rink Tom Whittaker Paolo Antonelli Kevin Baggett.
Creating a Data Mining Environment for Geosciences
What's New in eCognition 9
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
What's New in eCognition 9
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
What's New in eCognition 9
Presentation transcript:

ESML, Subsetting, Mining Tools MODIS Science Team Meeting July 24, 2002 Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville (UAH) www.itsc.uah.edu My name is Matt Smith Many of you have heard parts of this talk before, I presented a talk 3 years ago about a dataset-independent subsetter. But this time I want to explain something that some people have missed before.

Tools Encompassing All Phases of Scientific Analysis Science Data Usability Data/Application Interoperability Earth Science Markup Language (ESML) Science Data Preprocessing Subsetting Various Subsetting Tools such as HEW Science Data Analysis Data Mining Algorithm Development and Mining (ADaM) System Mission/Project/Field Campaign Coordination Electronic Collaboration

Science Data Usability http://esml.itsc.uah.edu

Earth Science Data Characteristics HDF HDF-EOS Different formats, types and structures (18 and counting for Atmospheric Science alone!) Different states of processing ( raw, calibrated, derived, modeled or interpreted ) Enormous volumes Heterogeneity leads to Data usability problem netCDF ASCII Binary GRIB

Data Usability Problem FORMAT 1 DATA FORMAT 2 DATA FORMAT 3 FORMAT CONVERTER READER 1 READER 2 APPLICATION Requires specialized code for every format Difficult to assimilate new data types Makes applications tightly coupled to data One possible solution - enforce a Standard Data Format Not practical, especially for legacy datasets

ESML Solution DATA FORMAT 1 DATA FORMAT 2 DATA FORMAT 3 ESML FILE ESML FILE ESML FILE ESML LIBRARY APPLICATION ESML (external metadata) files containing the structural description of the data format Applications utilize these descriptions to figure out how to read the data files resulting in data interoperability for applications

What is ESML? It is a specialized markup language for Earth Science metadata based on XML It is a machine-readable and -interpretable representation of the structure and content of any data file, regardless of data format ESML description files contain external metadata that can be generated by either data producer or data consumer (at collection, data set, and/or granule level) ESML provides the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, geoTIFF, …) without the cost of data conversion ESML is an Interchange Technology that allows data/application interoperability

ESML Tools/Products Available http://esml.itsc.uah.edu Features ESML Schema Defines ASCII, Binary (McIDAS), HDF-EOS, GRIB formats (more formats to follow) Provides preprocessing, wild card, symbols and semantic tagging capabilities ESML Library (C++/Java JNI ) WINDOWS and LINUX URL access Handles ASCII, Binary (McIDAS), HDF-EOS, GRIB files Handles preprocessing, wild card and symbols ESML Editor Ability to write and validate ESML descriptions ESML Data Browser Ability to browse data files using the ESML description files

MODIS/CERES Collocation Application MISR/ Others MODIS ESML file CERES ESML file ESML file Network ESML Library Collocation Algorithm Purpose: To study the relationship between shortwave flux and cloud/aerosol properties Important for climate change studies Scientists can: Select remote files across the network Select fields by modifying semantic tags in the ESML file Analysis

Science Data Preprocessing http://subset.org

Currently Available/Planned Subsetting Applications HEW Subsetting Complete System (available) Subsetting Engine Only (available) Subsetting Center (available) SPOT - Subsettability Checker (available) HEW Integration with ECS (in work) Remote Subsetting Service (planned) Subsetting as a Web Service (planned) Customized Subsetting MODIS tools (available) Coarse-grain SSM/I Subsetter (available) General Purpose Customizable Subsetting Based on ADaM Data Mining Engine (available) Subsetting Tool using ESML (in work)

Tools developed for MODIS Scientists MODIS – Land, Quality Assessment modland – subsetter for MODIS gridded data stitcher – pieces together 2 or 4 contiguous MODIS tiles MODIS – Atmosphere modair - specialized subsetter for MODIS swaths

HEW integration with ECS EDG System ECS End user 1 2 EDG ECS Order submission (HTML) 7 4 Output data 3 Data order (Reingested) and reply UAH/ITSC-written subsetting and interface software Ongoing testing with ECS 6a.05 and EDG 3.4 at NSIDC, LP DAAC, GDAAC Enhancements for DAACs may be made Subset ODL and reply 5 6 Output Subsetter Input data data Subsetting System

ESML enabled generic Subsetter Other Formats Binary/ ASCII HDF-EOS ESML file ESML file ESML file Network ESML Library Subsetting Algorithm For HDF-EOS data not formatted for subsetting with the HDF-EOS library: ESML file can be used to correct the semantic tag required to subset HDF-EOS data without the need to recreate the data file Subsetted Data

Science Data Analysis http://datamining.itsc.uah.edu

Data Mining Data Mining is the task of discovering interesting patterns/anomalies and extracting novel information from large amounts of data Data Mining is an interdisciplinary field drawing from areas such as statistics, machine learning, pattern recognition and others

Iterative Nature of the Data Mining Process EVALUATION And PRESENTATION KNOWLEDGE DISCOVERY MINING SELECTION And TRANSFORMATION CLEANING And INTEGRATION PREPROCESSING DATA

ADaM Engine Architecture Preprocessed Data Patterns/ Models Results Data Translated Data Processing Input HDF HDF-EOS GIF PIP-2 SSM/I Pathfinder SSM/I TDR SSM/I NESDIS Lvl 1B SSM/I MSFC Brightness Temp US Rain Landsat ASCII Grass Vectors (ASCII Text) Intergraph Raster Others... Preprocessing Analysis Output GIF Images HDF-EOS HDF Raster Images HDF SDS Polygons (ASCII, DXF) SSM/I MSFC Brightness Temp TIFF Images Others... Selection and Sampling Subsetting Subsampling Select by Value Coincidence Search Grid Manipulation Grid Creation Bin Aggregate Bin Select Grid Aggregate Grid Select Find Holes Image Processing Cropping Inversion Thresholding Others... Clustering K Means Isodata Maximum Pattern Recognition Bayes Classifier Min. Dist. Classifier Image Analysis Boundary Detection Cooccurrence Matrix Dilation and Erosion Histogram Operations Polygon Circumscript Spatial Filtering Texture Operations Genetic Algorithms Neural Networks Others...

Reasons for Building a Data Mining Environment Provide scientists with the capabilities to iterate Allow the flexibility of creative scientific analysis Provide data mining benefits of Automation of the analysis process Reduction of data volume Provide a framework to allow a well defined structure for the entire analysis process Provide a suite of mining algorithms for creative analysis Provide capabilities to add “science algorithms” to the framework

ADaM : Mining Environment for Scientific Data The system provides knowledge discovery, feature detection and content-based searching for data values, as well as for metadata. contains over 120 different operations Operations vary from specialized science data-set specific algorithms to various digital image processing techniques, processing modules for automatic pattern recognition, machine perception, neural networks, genetic algorithms and others

Extensibility of ADaM Training Data Input Visualization/ Analysis General Purpose Algorithms Trainable Classifiers User Defined Training Data Input ADaM Mining Engine Input Modules Analysis Modules Output Modules Visualization/ Analysis Packages

Reasons for using ADaM for Scientific Data Analysis Provide scientists with the capabilities to iterate Allow the flexibility of creative scientific analysis Is a powerful tool for research and analysis given the volume of science data Extremely useful when manual examination of data is impossible Allows scientists to add problem specific algorithms to the ADaM toolkit Minimizes scientists’ data handling to allow them to maximize research time Reduces “reinventing the wheel”

Mission/Project/Field Campaign Coordination Electronic Collaboration

Strategic and Tactical Coordination Technologies to coordinate complex projects Data acquisition and integration from multiple platforms, instruments and agencies for quick exploitation Intra-project communications before, during, and after CAMEX campaigns

CAMEX-4 Coordination: pre-flight NASA managers may review status of aircraft, instruments, flight plans at various times throughout the mission CAMEX-4 Coordination: pre-flight RDBMS Coordination Clearinghouse Scalable, reliable data management Web-based interface with customized information access for different user groups; rapid development, scalability and portability Aircraft Crew: Perform aircraft maintenance and report status. Experiment PI: Coordinates with all participants, posts plan of the day Preflight mission briefing and flight planning Forecaster: Contacts local weather, forecast centers, weather support web sites to prepare and post daily morning weather briefing NASA Aircraft USAF Aircraft NOAA Aircraft Radars

CAMEX-4 Coordination: in flight Download latest satellite imagery to web RDBMS Coordination Clearinghouse Communications Satellite Transmit selected data to National Hurricane Center for inclusion in computer forecast models Experiment PI: Modify flight plan as needed in response to changing weather events Forecaster: Continue monitoring satellite imagery, radar data, and landing forecasts Instrument scientists: Collect, process, and store data on board aircraft NASA Aircraft USAF Aircraft NOAA Aircraft Radars

CAMEX-4 Coordination: post-flight RDBMS Coordination Clearinghouse Mission and Instrument scientists: Post sortie and instrument reports and quicklook data Aircraft Crew: Prepare aircraft and instruments for next flight and update status. Forecaster: Prepare post-flight weather briefing NASA Aircraft USAF Aircraft NOAA Aircraft Radars