Bob Mann, Chicago Provenance Workshop. Non-bio (necro-?) sciences (Jim Frew, Bob Mann): examples of current practice and issues.

Presentation transcript:

Non-bio (necro-?) sciences (Jim Frew, Bob Mann)
- Examples of current practice and issues
  - Astronomy: Bob Mann, Alex Szalay
  - Earth Sciences: Jim Frew, Dave Maier
  - Others…
- Draw up list of issues
- Discussion

Some provenance & data derivation issues in astronomy
Bob Mann, Institute for Astronomy, Edinburgh Univ. & National e-Science Centre

Outline
- Trends in astronomy & implications for provenance
- Two provenance issues
  - Recording provenance in the FITS data format
  - Provenance in database federation
- Alex Szalay:
  - Provenance in pipelines and databases
  - Annotations in astronomy databases

Evolution in astronomical practice
- “Collectivisation & the empowerment of the individual”
  - Fewer individual observational programmes and more sky surveys
  - More people access the data, via archives
- “The specialist is dead, long live the generalist”
  - Use multi-wavelength data
  - Expertise in classes of astronomical object, not observational techniques

Implications for provenance
- More science being done with data that the individual scientist didn’t take
  - …& about which the scientist knows less
- More reliance on pipeline processing
- More science with catalogues of source attributes derived from primary data
- More science being done through combining data from multiple sources (more later)

FITS: Flexible Image Transport System
- Format of a FITS file
  - Primary Header: metadata describing instrument, observation & file contents
  - Primary Data Array: a multi-dimensional array, usually a 2D image
- plus zero or more Extensions:
  - Array, ASCII Table or Binary Table, each with its own Header
- (New FITS-inspired XML format: VOTable)

FITS header entries
- Keyword-value pairs + optional comment
  e.g. PLTSCALE= '67.14 ' / [arcsec/mm] plate scale
- Three types of header keyword
  - Mandatory, e.g. NAXIS
  - Optional, e.g. DATAMAX
  - Additional, i.e. user-defined, but must not reuse a name from the restricted list (mandatory + optional)
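
The keyword-value-comment layout described above can be sketched in a few lines of Python. This is a minimal illustration and not a full FITS reader: it assumes the keyword occupies the first 8 characters, that a value card has "= " immediately after it, and that the comment follows the first "/" outside a quoted string. The function names `parse_card` and `_convert` are made up for this example.

```python
def _convert(raw):
    """Turn an unquoted value into bool, int or float where possible."""
    if raw in ("T", "F"):
        return raw == "T"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_card(card):
    """Split one header card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    # Cards without "= " after the keyword (e.g. HISTORY) carry free text only.
    if card[8:10] != "= ":
        return keyword, None, card[8:].strip()
    rest = card[10:].strip()
    if rest.startswith("'"):
        # Quoted string: a doubled quote ('') stands for one literal quote.
        chars = []
        i = 1
        while i < len(rest):
            if rest[i] == "'":
                if rest[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(rest[i])
            i += 1
        value = "".join(chars).rstrip()
        comment = rest[i + 1:].partition("/")[2].strip()
    else:
        raw, _, tail = rest.partition("/")
        value = _convert(raw.strip())
        comment = tail.strip()
    return keyword, value, comment


print(parse_card("PLTSCALE= '67.14   ' / [arcsec/mm] plate scale"))
```

Note that the "/" inside "[arcsec/mm]" is handled correctly only because the split happens at the first "/" after the closing quote.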

Provenance in FITS headers
- Many optional keywords related to provenance:
  - ORIGIN, DATE-OBS, TELESCOP, INSTRUME, OBSERVER, REFERENC
- plus HISTORY: ‘The text should contain a history of steps and procedures associated with the processing of the associated data. Any number of HISTORY card images may appear in a header.’ (FITS Standard)
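
As a rough sketch of how these keywords might be pulled out of a header, the Python below collects the six provenance keywords listed above and gathers every HISTORY card in order. It assumes the header has already been split into (keyword, text) pairs; the function name `provenance_summary` and the sample TELESCOP value are hypothetical.

```python
PROVENANCE_KEYWORDS = (
    "ORIGIN", "DATE-OBS", "TELESCOP", "INSTRUME", "OBSERVER", "REFERENC",
)


def provenance_summary(cards):
    """Return (fields, history) from an iterable of (keyword, text) pairs.

    fields  maps each provenance keyword found to its value;
    history lists the text of every HISTORY card, in header order.
    """
    fields = {}
    history = []
    for keyword, text in cards:
        if keyword == "HISTORY":
            history.append(text)
        elif keyword in PROVENANCE_KEYWORDS:
            fields[keyword] = text
    return fields, history


# Illustrative input only; the TELESCOP value is invented for the example.
cards = [
    ("NAXIS", "2"),
    ("TELESCOP", "EXAMPLE-SCOPE"),
    ("HISTORY", "This is the end of the header written by the ING observing-system."),
    ("HISTORY", "data written by xydcomp_ss."),
]
fields, history = provenance_summary(cards)
print(fields)
print(history)
```

The HISTORY text comes back as plain strings, so a program can collect it but still cannot interpret it.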

Example FITS header extracts (1)

    SIMPLE = T / file does conform to FITS standard
    BITPIX = 32 / number of bits per data pixel
    NAXIS = 2 / number of data axes
    NAXIS1 = 648 / length of data axis 1
    NAXIS2 = 648 / length of data axis 2
    EXTEND = T / FITS dataset may contain extensions
    BUNIT = 'Primary Array' / Units of the image
    XPROC0 = 'evselect table=''product/P PNU002PIEVLI0000.FIT:EVENTS'' w&'
    CONTINUE 'ithfilteredset=no filteredset=''filtered.fits'' keepfilteroutput=no&'
    CONTINUE ' destruct=yes flagcolumn=''EVFLAG'' flagbit=-1 filtertype=''expres&'
    CONTINUE 'sion'' expression=''GTI(intermediate/GlobalHK-all-1-Attitude_GTI-X0&'
    CONTINUE ' fits, TIME) && GTI(intermediate/pnEvents-epn-1-EPIC_flare&'
    CONTINUE '_GTI-U fits:STDGTI, TIME) && (RAWY>12) && (PATTERN<=4) &'
    CONTINUE ' (PI in (200:12000]) && (PI>=500 || (PI<500 && FLAG & 0x8 == 0 && P&'
    CONTINUE 'ATTERN==0)) && (FLAG & 0x2fa0024) == 0'' dssblock='''' writedss=yes&'
    CONTINUE ' cleandss=no updateexposure=yes filterexposure=yes blockstocopy=''&'
    CONTINUE ''' attributestocopy='''' energycolumn=''PHA'' withzcolumn=no zcolu&'
    …

Slide callouts: New keyword; multi-line entry.
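
The multi-line entry in the extract above uses a trailing "&" to mark a string that carries on into the next CONTINUE card. A minimal Python sketch of rejoining such a value follows. It assumes the cards have already been reduced to (keyword, string) pairs with the surrounding quotes removed, and the name `join_continued` is invented for this example.

```python
def join_continued(cards):
    """Merge CONTINUE cards into the string value they extend.

    cards: list of (keyword, value) pairs in header order, where string
    values have already had their surrounding quotes removed.
    A value ending in '&' is extended by the next CONTINUE card.
    """
    merged = []
    for keyword, value in cards:
        previous = merged[-1] if merged else None
        if (
            keyword == "CONTINUE"
            and previous is not None
            and isinstance(previous[1], str)
            and previous[1].endswith("&")
        ):
            # Drop the '&' marker and append the continuation text.
            merged[-1] = (previous[0], previous[1][:-1] + value)
        else:
            merged.append((keyword, value))
    return merged


# Shortened, illustrative version of the XPROC0 entry.
cards = [
    ("XPROC0", "evselect w&"),
    ("CONTINUE", "ithfilteredset=no keepfilteroutput=no&"),
    ("CONTINUE", " destruct=yes"),
    ("NAXIS", "2"),
]
for keyword, value in join_continued(cards):
    print(keyword, "=", value)
```

Even once rejoined, the value is a command line for one project's pipeline tool, so a reader still needs to know that tool to interpret it.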

Example FITS header extracts (2)

    XTENSION= 'IMAGE ' / Image extension
    BITPIX = 16 / Bits per pixel
    NAXIS = 2 / Number of axes
    …
    HISTORY This is the end of the header written by the ING observing-system.
    WAT0_001= 'system=image'
    WAT1_001= 'wtype=zpx axtype=ra projp1=1.0 projp3=220.0'
    WAT2_001= 'wtype=zpx axtype=dec projp1=1.0 projp3=220.0'
    …
    TRIM = 'Sep 2 16:14 Trim data section is [51:2098,1:4100]'
    BP-FLAG = 'Sep 2 16:14 Bad pixel file is /home/jrl/wfcred/stds/A bad'
    BT-FLAG = 'Sep 2 16:14 Overscan section is [1:50,1:4128] with mean= '
    BI-FLAG = 'Sep 2 16:14 Zero level correction image is /data/cass03a/was/mframe'
    FF-FLAG = 'Sep 2 16:14 Flat field image is /data/cass03d/was/mframes/r_ '
    ILLUMCOR= 'Sep 2 16:14 Illumination image is tmpill.pl with scale= '
    …

Slide callouts: End of header entries generated at telescope; keywords describing data reduction process.

Example FITS header extracts (3)

    SIMPLE = T / file does conform to FITS standard
    BITPIX = 16 / number of bits per data pixel
    …
    NHKLINES= 146 / Number of lines from house-keeping file
    HKLIN001= 'JOB.JOBNO UKJ349' /
    HKLIN002= 'JOB.DATE-MES 1998:09:29' /
    …
    HISTORY = 'SuperCOSMOS image analysis and mapping mode (IAM and MM)' /
    HISTORY = 'data written by xydcomp_ss.' /
    HISTORY = 'Any questions/comments/suggestions/bug reports should be sent' /
    HISTORY = 'to /

Slide callout: House-keeping = provenance metadata

FITS provenance: summary
- Header keywords designed for recording provenance information, esp. HISTORY
- HISTORY cards written in free text: not readily machine-interpretable
- Project-specific provenance keywords not readily interpretable at all outside project

Provenance in database federation
- Sky survey databases in many wavebands
- New science from federating them
- Need to associate entries in different DBs
- Unified Column Descriptors (UCDs):
  - Taxonomy based on collation of column names from hundreds of databases
- Location on sky provides natural indexing

Matching by proximity not always adequate
Need to know more about the astrophysical properties of the two populations to know which of the red objects is the most likely counterpart to the cyan source.
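
Proximity matching, and the ambiguity it can leave, can be illustrated with a short Python sketch. It assumes positions are given as right ascension and declination in degrees, uses the haversine formula for the great-circle separation, and does a brute-force search rather than using any sky index. The function names, object names, coordinates and search radius are all made up for the example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def candidates_within(source, catalogue, radius_arcsec):
    """List (separation_arcsec, name) for entries inside the radius, nearest first.

    source:    (ra, dec) in degrees
    catalogue: iterable of (name, ra, dec) in degrees
    """
    ra0, dec0 = source
    hits = []
    for name, ra, dec in catalogue:
        sep = angular_separation_deg(ra0, dec0, ra, dec) * 3600.0
        if sep <= radius_arcsec:
            hits.append((sep, name))
    return sorted(hits)


# Invented positions: two "red" objects lie within 10 arcsec of the source.
cyan_source = (10.0, 0.0)
red_objects = [("red-1", 10.0, 0.001), ("red-2", 10.0, -0.0015), ("red-3", 11.0, 0.0)]
hits = candidates_within(cyan_source, red_objects, radius_arcsec=10.0)
if len(hits) > 1:
    print("ambiguous match, candidates:", [name for _, name in hits])
```

Returning every candidate inside the radius, instead of only the nearest, keeps the ambiguity visible so that a later step can use astrophysical properties to choose between them.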

Recording association provenance
- Might want to record associations in DBs
- Users want to know whether to trust them
- Complex probabilistic association algorithms
  - Difficult to describe easily
- Associations may change in light of new data
  - Can users challenge them via annotation?
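
One possible shape for a stored association record is sketched below in Python. It is purely illustrative: the slides do not specify any schema, so every class, field and identifier here is a hypothetical choice. The record keeps the algorithm name and version alongside the match probability, lets users attach annotations, and marks an old association as superseded when new data changes it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    author: str
    text: str


@dataclass
class Association:
    source_a: str            # identifier of the entry in the first database
    source_b: str            # identifier of the entry in the second database
    algorithm: str           # name of the association algorithm used
    algorithm_version: str   # version, so the result can be traced to its method
    probability: float       # probability assigned to the match
    superseded: bool = False
    annotations: List[Annotation] = field(default_factory=list)

    def challenge(self, author, text):
        """Attach a user annotation questioning this association."""
        self.annotations.append(Annotation(author, text))


def revise(old, new_probability, new_version):
    """Mark the old record as superseded and return its replacement."""
    old.superseded = True
    return Association(
        old.source_a, old.source_b, old.algorithm, new_version, new_probability
    )


# Invented identifiers and values, for illustration only.
first = Association("optical:123", "xray:456", "example-matcher", "1.0", 0.72)
first.challenge("a.user", "A nearer red object looks like a better counterpart.")
second = revise(first, new_probability=0.41, new_version="1.1")
print(first.superseded, second.probability, len(first.annotations))
```

Keeping the superseded record, and not overwriting it, preserves the history of how an association changed.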

Summary
- Astronomers record lots of provenance info
  - Want machine-interpretability
- Some astronomical provenance is complex
  - Want means of describing algorithms
- Starting to get links between databases and online copies of scientific papers
- No culture of annotation by users, yet
