HUBBLE LEGACY ARCHIVE STSCI
Astronomical Data Tagging
Web 2.0 meets Astronomy in the HLA
Niall I. Gaffney, W. Warren Miller (STScI)
What is the HLA
  Hubble Legacy Archive
    –Joint project of STScI, ST-ECF, and CADC
    –Providing the best archive data products from HST data
      Improving WCS solutions
      Combining data
      Extracting image photometry and GRISM spectra
  Create a Simple and Powerful User Interface
    –Typical HST archive user visits once a year
    –Get the right data into the user's own environment
      Users want to use their daily applications (e.g. web)
      Users have their own data analysis systems
HLA UI Philosophy
  UI “Requirements” from users
    –Interfaces must be simple, understandable, powerful, rich, self-explanatory (“Google like”)
    –Interface must feature the Data and not the Query
    –Interface must NOT get in the way of getting data and using them in the tools users are accustomed to
    –Interface should expose information that previous interfaces could not
Early Data Release - Target Oriented
Who else does this…
What is Web 2.0
  Web 2.0 is a change in how we use the network
  Web 2.0 is NOT dynamic web pages (AJAX)
    –Web 2.0 is enabled by AJAX
  Web 2.0 is applications and APIs delivered via the web
    –Netscape vs. Google
    –DoubleClick vs. AdSense
    –My Home Page vs. My Blog or MySpace
  A synergy between services and information to provide a more focused information service
  User aware and user provided (context)
  Tim O’Reilly article with long discussion
YouTube - Data and Tags
Where to get Tags for our Data
  Proposal data are not enough (one target in a sea)
  Astronomers are few and busy
    –It's not “Browse or Perish”, it's “Publish or Perish”
What we did
  Used a “basic footprint” (aka cone search) with Simbad to identify objects within a given field
    –Not a true footprint, as the objects returned are all points
  Used Simbad to then get bibcodes for those objects
  Used ADS to get keywords for each bibcode
  Harvested other data from HST proposal information (abstract, proposed targets…)
  Used Apache Lucene as our search engine
  Modified the Apache Lucene search demo
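The object -> bibcode -> keyword chain on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the HLA code: the Simbad and ADS lookups are replaced by small in-memory dictionaries, and every name here (`cone_search`, `tags_for_pointing`, the `OBJ-*` objects, the `BIB-*` bibcodes) is a hypothetical placeholder.

```python
import math
from collections import Counter


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(objects, ra, dec, radius_deg):
    """Return the point-like objects within radius_deg of (ra, dec)."""
    return [o for o in objects
            if angular_separation_deg(ra, dec, o["ra"], o["dec"]) <= radius_deg]


def tags_for_pointing(ra, dec, radius_deg, objects,
                      bibcodes_by_object, keywords_by_bibcode):
    """Count keywords over the unique bibcodes of all objects in the cone."""
    bibcodes = set()
    for obj in cone_search(objects, ra, dec, radius_deg):
        bibcodes.update(bibcodes_by_object.get(obj["name"], []))
    tags = Counter()
    for bibcode in sorted(bibcodes):
        tags.update(keywords_by_bibcode.get(bibcode, []))
    return tags


# Toy stand-ins for the Simbad (objects, bibcodes) and ADS (keywords) lookups.
objects = [
    {"name": "OBJ-A", "ra": 10.00, "dec": 20.00},
    {"name": "OBJ-B", "ra": 10.02, "dec": 20.01},
    {"name": "OBJ-C", "ra": 50.00, "dec": -5.00},
]
bibcodes_by_object = {"OBJ-A": ["BIB-1", "BIB-2"], "OBJ-B": ["BIB-2"]}
keywords_by_bibcode = {
    "BIB-1": ["galaxy clusters"],
    "BIB-2": ["galaxy clusters", "photometry"],
}

tags = tags_for_pointing(10.0, 20.0, 0.05, objects,
                         bibcodes_by_object, keywords_by_bibcode)
```

Counting over unique bibcodes is one possible design choice: a paper shared by two objects in the field then contributes its keywords once rather than twice. The resulting counts could be indexed by a search engine as per-observation tags.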
How well did this work
  43% of the 2769 ACS WFC “visits” in the past 2 years
  38% of “visits” are parallels (semi-random pointing)
  Average ~22 keywords per observation with keywords
DEMO
Where to go next
  Scientific input needed
    –Is “More Like This” scientifically useful or annoying more often than not? Can it be tweaked?
  Footprints and more Footprints
    –Intersecting observation footprints with object footprints would improve tags (especially for smaller fields)
    –Real-time evaluation for cutouts and surveys (seconds, not minutes)
  Standardize tags more
    –Case, spelling, removal of irrelevant words (e.g. “Galaxy Clusters General” -> “Galaxy Clusters”, “Colour” -> “Color”, “Charged Coupled Device” -> /dev/null)
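The tag standardization step could look something like the sketch below. It is a minimal illustration that covers only the three examples on this slide; the lookup tables (`SPELLING`, `TRAILING_FILLER`, `DROP_TAGS`) and function names are hypothetical, and a real vocabulary would need far larger tables.

```python
# Hypothetical lookup tables, seeded only with the examples from the slide.
SPELLING = {"colour": "color"}              # spelling variants to unify
TRAILING_FILLER = {"general"}               # words carrying no information
DROP_TAGS = {"charged coupled device"}      # tags judged irrelevant


def normalize_tag(tag):
    """Return a standardized tag, or None if the tag should be dropped."""
    words = [SPELLING.get(w, w) for w in tag.lower().split()]
    while words and words[-1] in TRAILING_FILLER:
        words.pop()
    cleaned = " ".join(words)
    if not cleaned or cleaned in DROP_TAGS:
        return None
    return cleaned.title()


def normalize_tags(tags):
    """Normalize a list of tags, dropping rejects and duplicates, keeping order."""
    result = []
    for tag in tags:
        normalized = normalize_tag(tag)
        if normalized is not None and normalized not in result:
            result.append(normalized)
    return result


raw = ["Galaxy Clusters General", "Colour", "Charged Coupled Device", "GALAXY clusters"]
clean = normalize_tags(raw)
```

Lowercasing before comparison means case variants collapse to one tag; `str.title()` is used only to restore a consistent display form and would need refinement for tags containing apostrophes or acronyms.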
AstroTube