Event and Feature Catalogs in the Virtual Solar Observatory Joseph A. Hourclé and the VSO Team SP54A-07 : 2008 May 30.



Types of Catalogs
- Data Catalogs
  - Used to track all data available
  - May be ‘observation’ centric or ‘file’ centric
  - Typically maintained by the mission or PI
  - Have basic similarities across archives
- Event / Feature Catalogs
  - Added science input
  - Typically a byproduct of other research
  - Very heterogeneous

What are we cataloging?
- Features:
  - Active Regions
  - Sunspots
  - CMEs
  - Filaments
  - Prominences
  - Bright points
  - Coronal Loops
  - Oscillations
  - Coronal Holes
  - EIT Waves
- Events:
  - Radio Bursts
  - Flares
  - Campaigns
- Non Events:
  - Data Gaps
- Data: *
  - Publications
  - Annotation

Processing Catalogs
- Ingestion
  - Reading the information
  - Understanding the information
- Storage
- Presentation
  - Single Catalog Use
  - Multi Catalog Use

Ingestion
- Who is the authoritative source?
  - What if there are multiple value-added derivative products?
  - Which is the best format for ingestion?
- What is being cataloged?
  - What is the unit for each record?
- What data is in each record?
  - Columns != Data Fields
  - May need to infer values from other Fields

Ingestion, cont’d
- Formatting issues?
  - Fixed width values overflowing column
  - Formatting may store information
  - Color / Font effects
  - May vary the record depending on info
  - May be maintained by hand
    - … by multiple maintainers
  - Data in sub-headings
- Missing Data?
  - How are null values / error issues marked?
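
The missing-data and fixed-width questions above can be made concrete with a short Python sketch. It is only an illustration: the set of null markers, the column widths and the function names are all assumptions, not taken from any particular catalog. A real ingester would need the marker list from the catalog's documentation or its maintainers.

```python
from typing import List, Optional

# Hypothetical null markers; each real catalog defines (or fails to
# define) its own, so this set is an assumption for illustration.
NULL_MARKERS = {"", "-", "--", "n/a", "nan", "9999"}


def parse_field(raw: str) -> Optional[float]:
    """Return the numeric value of one field, or None if it is missing."""
    text = raw.strip()
    if text.lower() in NULL_MARKERS:
        return None
    # Some fixed-width writers print a run of asterisks when a value
    # does not fit in its column; treat that as missing as well.
    if text and set(text) == {"*"}:
        return None
    # Anything else must be numeric; float() raises ValueError otherwise,
    # which surfaces unexpected markers instead of hiding them.
    return float(text)


def parse_fixed_width(line: str, widths: List[int]) -> List[Optional[float]]:
    """Slice a fixed-width record into fields and parse each one."""
    fields = []
    start = 0
    for width in widths:
        fields.append(parse_field(line[start:start + width]))
        start += width
    return fields


# Three 6-character columns: a value, an overflowed value, a null marker.
record = "  12.5  ****     -"
print(parse_fixed_width(record, [6, 6, 6]))
```

Letting unknown text raise an error is a deliberate choice here: with hand-maintained catalogs, a silent fallback tends to hide exactly the inconsistencies the slide warns about.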

Storage
- IDL:
  - Difficult to access from other platforms
- XML
  - Self documenting; good for interchange, not so great for use
- Flat file
  - Compact, but must be loaded into another system to ‘do science’. Good for interchange if well formed & documented
- RDBMS
  - Good for searching & filtering … may not be able to handle all data types / multi-value fields without normalization
- Hierarchical databases
  - Can handle multi-value, but not designed for record cross-correlation
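
To illustrate the RDBMS point about multi-value fields, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and region numbers are hypothetical and chosen only for the example. A flare associated with several active regions gets one row per (flare, region) pair in a second table rather than a comma-separated list in a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flare (
        flare_id   INTEGER PRIMARY KEY,
        goes_class TEXT NOT NULL
    );
    -- One row per (flare, region) pair replaces a multi-value
    -- column such as '101,102' in the flare table.
    CREATE TABLE flare_region (
        flare_id INTEGER NOT NULL REFERENCES flare(flare_id),
        region   INTEGER NOT NULL,
        PRIMARY KEY (flare_id, region)
    );
""")

# Illustrative records only; not real catalog entries.
conn.executemany("INSERT INTO flare VALUES (?, ?)", [(1, "X2"), (2, "M9")])
conn.executemany(
    "INSERT INTO flare_region VALUES (?, ?)",
    [(1, 101), (1, 102), (2, 101)],
)


def flares_in_region(region: int):
    """Return (flare_id, goes_class) for every flare linked to a region."""
    return conn.execute(
        "SELECT f.flare_id, f.goes_class "
        "FROM flare f JOIN flare_region r ON r.flare_id = f.flare_id "
        "WHERE r.region = ? ORDER BY f.flare_id",
        (region,),
    ).fetchall()


print(flares_in_region(101))
print(flares_in_region(102))
```

The cost is the extra join on every query, which is part of why the slide describes this as something the RDBMS cannot do "without normalization".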

Storage of Field Values
- Columns may be multiple fields:
  - (value) or (comment)
  - (value) and (units)
- What do we store?
  - Store both a ‘display’ and a more useful value?
  - Eg, store display value & units, but also store value in a fixed unit.
  - Ensure correct sorting, eg for X-ray flares:
  - M9 < X2 < X10 … store as M09 / X02 / X10
  - What if X100? Store as W/m²?
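
The sorting problem for X-ray flares can be sketched in Python as follows. The example keeps the display value (the class string) and derives a sortable value in a fixed unit, using the standard GOES class letters, where A, B, C, M and X correspond to 1e-8 through 1e-4 W/m² and the trailing number is a multiplier. The function name is hypothetical.

```python
# Base peak flux in W/m^2 for each GOES soft X-ray class letter.
CLASS_BASE_FLUX = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def flare_class_to_flux(goes_class: str) -> float:
    """Convert a class string such as 'M9' or 'X2.5' to flux in W/m^2."""
    text = goes_class.strip().upper()
    letter, multiplier = text[0], text[1:]
    return CLASS_BASE_FLUX[letter] * float(multiplier)


classes = ["X10", "M9", "X2", "X100"]

# Plain string sorting puts 'X10' and 'X100' before 'X2'.
print(sorted(classes))

# Sorting on the derived flux gives the physical order,
# and needs no zero padding even for X100.
print(sorted(classes, key=flare_class_to_flux))
```

Storing the flux alongside the display string avoids the padding question entirely, since the sort key no longer depends on how many digits the multiplier has.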

Presentation
- Can I mimic the original display?
  - Does that limit the uses of the catalog?
- Do the concepts need to be adjusted?
  - Changes in definitions or accepted community standards
- Do columns need to be linked to make sense?
  - Min / Max / Units for a range
  - Field may not be sortable without another column
- Other presentation issues come from use.

Single Catalog Use: How do we access the catalog?
- SQL
  - very powerful, but difficult to learn / use. How do we export to ‘do science’ with it?
- IDL
  - good for scientists who use IDL, allows them to ‘do science’ without conversion
- HTML GUI
  - Javascript allows more processing of tables. Still need to export for more complicated science
- APIs
  - Yes, but what format output, and what features do we need to support?

Single Catalog Use, cont’d
- What is the best presentation format for a given use of the catalog?
  - Simple display / browsing
  - Searching / Filtering
  - … are there other common science tasks?
- Different uses may suggest or require different formats

Multi-Catalog Use
- Need to understand what the fields mean so we can cross-correlate the tables.
  - If correlations done by hand, is O(n²)
  - Just because it’s of the same unit doesn’t mean it’s directly comparable.
  - VOTable has ‘UCD+’, but may not be specific enough
- Some concepts don’t translate well:
  - Carrington Coordinates to Heliographic
  - Observations from well off the sun-earth line
  - Of items off the solar disk

Multi-Catalog Use, cont’d
- Ontologies
  - More descriptive
  - Can define relationships between column types (eg, how to convert)
  - Expensive to start
  - VSTO & SWEET could serve as a foundation
  - SESDI already has prototypes
  - Becomes an O(n) problem
  - Describe each catalog individually
  - Reasoner determines how to join them
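
The move from O(n²) to O(n) can be sketched in Python. This is not an ontology or a reasoner, only the underlying hub idea: each catalog is described once against a shared canonical concept, and any pair is joined through that concept. The catalog names, the duration column and its units are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ColumnDescription:
    """How one catalog's duration column maps to canonical seconds."""
    to_canonical: Callable[[float], float]
    from_canonical: Callable[[float], float]


# One description per catalog: n entries for n catalogs.
DESCRIPTIONS: Dict[str, ColumnDescription] = {
    "catalog_a": ColumnDescription(lambda s: s, lambda s: s),                    # seconds
    "catalog_b": ColumnDescription(lambda m: m * 60.0, lambda s: s / 60.0),      # minutes
    "catalog_c": ColumnDescription(lambda h: h * 3600.0, lambda s: s / 3600.0),  # hours
}


def convert(value: float, source: str, target: str) -> float:
    """Convert a value between two catalogs via the canonical unit."""
    canonical = DESCRIPTIONS[source].to_canonical(value)
    return DESCRIPTIONS[target].from_canonical(canonical)


def pairwise_converters_needed(n: int) -> int:
    """Directed converters needed if every pair is mapped by hand."""
    return n * (n - 1)


print(convert(90.0, "catalog_b", "catalog_c"))   # 90 minutes in hours
print(pairwise_converters_needed(10), "hand-written converters vs", 10, "descriptions")
```

A unit conversion is the easy case. The slide's caveat still applies: two columns can share a unit and still not be directly comparable, which is where a richer description than a pair of functions becomes necessary.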

For the Future
- Virtual Solar Observatory
  - Ingesting solar-related catalogs so they can be served via an API for other projects
  - Being tested by HelioViewer
- Heliophysics Event List Manager
  - Define API requirements for catalogs
  - Deal with cross-catalog issues

And now …
- For those playing along at home:
  - Look at the example image
  - See how many things you can find that might be a problem with the catalog
  - Go to the next slide for a list of (some of) the issues