Capturing and Organizing Scientific Annotations


1 Capturing and Organizing Scientific Annotations
Greg Riccardi Florida State University Riccardi: Workshop on Data Management March 17, 2004

2 What is an Annotation? An assertion of a relationship among objects
Someone claims that several objects are connected by a relationship and gives evidence of the connection.
The annotation includes a record of the author and the date of the assertion.
The objects are often datasets with provenance.
Annotations often assert quality characteristics of data objects.
Crucial social components:
Attribution, confidence, and validity
Ontologies and compliance with standards
Establishment of an object naming strategy
Security policies
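
The definition above (objects, a relationship, evidence, author, date) maps naturally onto a record type. The following is a minimal sketch, assuming hypothetical field names and a hypothetical relationship vocabulary such as "same-object-as"; it is not a schema proposed in the talk. The confidence field reflects the "attribution, confidence, and validity" component.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """One assertion that some objects are connected by a relationship.

    Field names are illustrative assumptions, not a published schema.
    """
    relationship: str                    # hypothetical vocabulary, e.g. "same-object-as"
    objects: Tuple[str, ...]             # identifiers of the related objects
    author: str                          # attribution: who made the assertion
    asserted_on: date                    # when the assertion was made
    evidence: str = ""                   # free text, or a pointer to supporting data
    confidence: Optional[float] = None   # author's confidence in [0, 1], if stated

    def __post_init__(self) -> None:
        # An annotation must refer to at least one object: a quality annotation
        # may name a single dataset, while a relationship usually names several.
        if not self.objects:
            raise ValueError("an annotation must refer to at least one object")
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")


# A SkyServer-style example: two catalog entries asserted to be the same sky object.
# The identifiers and author are hypothetical.
same = Annotation(
    relationship="same-object-as",
    objects=("catalog-A/object-17", "catalog-B/object-42"),
    author="A. Astronomer",
    asserted_on=date(2004, 3, 17),
    evidence="positions agree within the astrometric error",
    confidence=0.9,
)
```

Making the record immutable (frozen) matches the idea that an annotation is a dated assertion by a named author: a correction is a new annotation, not an edit of the old one.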

3 Example from SkyServer
[Diagram: annotation "These objects are the same" connecting telescope and catalog info, two SkyQuery datasets, an analysis, and two query strings]

4 Types and Importance of Annotations
Three types of annotations: systematic, semi-structured, and ad hoc
Annotations are of primary importance in data semantics and analysis:
They record the semantics of data.
They record people's opinions about data.
We need tools that make annotations easy to create, organize, understand, and search.

5 Systematic Annotations
Collected automatically; anticipated and organized; factual
Experimental metadata
See the example of the Jefferson Lab run log.
A run log entry asserts a relationship between the metadata and the raw data.
The run number identifies each object: rows in the runBegin, runEnd, runFiles, and runComment tables.
Object identification is much more difficult in most cases.
As noted in earlier talks, experimental metadata is not always collected or curated properly.
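
In the run log, the run number is the shared key that ties the metadata rows to the raw data. A minimal sketch of that idea with an in-memory SQLite database follows; the four table names come from the slide, but every column, the helper run_log_entry, and the sample run are hypothetical.

```python
import sqlite3

# Table names come from the Jefferson Lab run log; every column is an assumption.
SCHEMA = """
CREATE TABLE runBegin   (run INTEGER PRIMARY KEY, start_time TEXT);
CREATE TABLE runEnd     (run INTEGER PRIMARY KEY, end_time TEXT, event_count INTEGER);
CREATE TABLE runFiles   (run INTEGER, path TEXT);
CREATE TABLE runComment (run INTEGER, author TEXT, comment TEXT);
"""


def run_log_entry(conn: sqlite3.Connection, run: int) -> dict:
    """Collect every row that shares `run`: the run number is the object identifier."""
    begin = conn.execute("SELECT start_time FROM runBegin WHERE run = ?", (run,)).fetchone()
    end = conn.execute(
        "SELECT end_time, event_count FROM runEnd WHERE run = ?", (run,)
    ).fetchone()
    files = [p for (p,) in conn.execute(
        "SELECT path FROM runFiles WHERE run = ? ORDER BY path", (run,))]
    comments = conn.execute(
        "SELECT author, comment FROM runComment WHERE run = ? ORDER BY rowid", (run,)
    ).fetchall()
    return {
        "run": run,
        "start": begin[0] if begin else None,
        "end": end[0] if end else None,
        "events": end[1] if end else None,
        "files": files,
        "comments": comments,
    }


# Hypothetical sample run.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO runBegin VALUES (1234, '2004-03-01T08:00')")
conn.execute("INSERT INTO runEnd VALUES (1234, '2004-03-01T09:30', 1500000)")
conn.executemany("INSERT INTO runFiles VALUES (?, ?)",
                 [(1234, "/raw/run1234.1"), (1234, "/raw/run1234.0")])
conn.execute("INSERT INTO runComment VALUES (1234, 'shift crew', 'target cold')")
entry = run_log_entry(conn, 1234)
```

This only works because a single integer reliably names the object; the slide's point is that most domains lack such a key.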

6 Systematic Provenance Annotations
Derivation provenance
A record of the computational creation of data
Must be collected directly by the computations themselves
Query provenance
In SkyQuery, a user submits a query and the result dataset is retained in MyDB.
The query must be retained to record the semantics of the dataset.
The GGF Database Access and Integration Working Group (DAIS) is developing standards for representing queries on databases and other data stores.
These provide a data access recipe that can be used to fetch a particular dataset.
Morphbank images of scanning electron micrographs
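
Query provenance means storing the query string next to the result it produced, so the query can later serve as a recipe for re-fetching or checking the dataset. The sketch below uses SQLite as a stand-in for MyDB; the helper names, the provenance table, and the sample catalog are all hypothetical and do not reflect SkyQuery's actual implementation.

```python
import sqlite3
from datetime import datetime, timezone


def retain_with_provenance(conn: sqlite3.Connection, query: str, name: str) -> None:
    """Run `query`, store its rows as table `name`, and record the query itself.

    Hypothetical stand-in for keeping a result in MyDB: the `provenance`
    table and its columns are assumptions.
    """
    if not name.isidentifier():
        raise ValueError("result name must be a plain identifier")
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    cols_sql = ", ".join('"' + c + '"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{name}" ({cols_sql})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)
    conn.execute("CREATE TABLE IF NOT EXISTS provenance "
                 "(name TEXT PRIMARY KEY, query TEXT, run_at TEXT)")
    conn.execute("INSERT INTO provenance VALUES (?, ?, ?)",
                 (name, query, datetime.now(timezone.utc).isoformat()))


def still_reproducible(conn: sqlite3.Connection, name: str) -> bool:
    """Re-run the recorded query (the data access recipe) and compare with stored rows."""
    (query,) = conn.execute("SELECT query FROM provenance WHERE name = ?", (name,)).fetchone()
    fresh = sorted(conn.execute(query).fetchall())
    stored = sorted(conn.execute(f'SELECT * FROM "{name}"').fetchall())
    return fresh == stored


# Hypothetical catalog table and query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (objid INTEGER, mag REAL)")
conn.executemany("INSERT INTO photo VALUES (?, ?)", [(1, 17.2), (2, 21.9), (3, 16.8)])
QUERY = "SELECT objid, mag FROM photo WHERE mag < 20"
retain_with_provenance(conn, QUERY, "bright_objects")
```

If the source catalog later changes, the stored rows and the recipe diverge; that divergence is exactly what retaining the query makes visible.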

7 Semi-Structured Annotations
Anticipated and organized
Collected mostly by hand
Experimental logbook from Jefferson Lab

8 Jefferson Lab Logbook
Run and log daily summaries
Standard logbook entry: many standard (expected) fields, plus a comment field filled with ad hoc annotations such as “ADB crate” and “voltage”
A complaint about logbook usage
A suggested strategy for creating logbook entries
Automatically generated logbook entry
Post-processing software creates database entries directly.
Image tags point to files on some computer.

9 Semi-Structured Annotations
Anticipated and organized
Collected mostly by hand
Experimental logbook from Jefferson Lab
A logbook entry has specific fields: run id, subject, author, entry_type, system.
The entry also has an ad hoc field.
Searching the comment field requires interpretation of words [ontologies?].
The search page for the logbook is based on a predefined structure, created and used by experts.
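
The contrast between the structured fields and the ad hoc comment shows up directly in search code: structured fields can be matched exactly, while the comment needs some interpretation of words. A minimal sketch, assuming the field names from the slide and a hypothetical concept-to-words table standing in for an ontology:

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class LogbookEntry:
    # Structured fields named on the slide; their types are assumptions.
    run_id: int
    subject: str
    author: str
    entry_type: str
    system: str
    comment: str  # the ad hoc, free-text part of the entry


# A toy stand-in for an ontology: a concept and the words that may express it.
# This vocabulary is hypothetical.
CONCEPTS: Dict[str, Set[str]] = {
    "voltage": {"voltage", "hv", "bias"},
}


def search(entries: List[LogbookEntry], concept: Optional[str] = None,
           **fields) -> List[LogbookEntry]:
    """Exact match on structured fields; word-level match on the comment field."""
    words = CONCEPTS.get(concept, {concept}) if concept else set()
    hits = []
    for e in entries:
        # Structured part: every requested field must match exactly.
        if any(getattr(e, key) != value for key, value in fields.items()):
            continue
        # Ad hoc part: the comment must mention some word for the concept.
        tokens = set(re.findall(r"[a-z0-9]+", e.comment.lower()))
        if words and not (words & tokens):
            continue
        hits.append(e)
    return hits


# Hypothetical entries.
log = [
    LogbookEntry(1234, "Beam tune", "shift1", "routine", "accelerator", "HV trip, reset."),
    LogbookEntry(1234, "DAQ", "shift1", "problem", "daq", "ADB crate rebooted"),
    LogbookEntry(1235, "Detector", "shift2", "problem", "detector", "Voltage sagging"),
]
```

Here a search for the concept "voltage" finds the "HV trip" entry only because the vocabulary says "hv" means voltage; without that interpretation step the comment field is only searchable by exact words.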

10 Ad Hoc Annotations
Assert connections between arbitrary objects
Example from morphology

11 Morphology Publication Example

12 Ad Hoc Annotations
Assert connections between arbitrary objects
Example from morphology
Searching is difficult: ambiguous and inefficient.
Google is a search engine for ad hoc annotations.
It is not based on an organized ontology.
It is not based on document structure.

13 Annotating data quality
Suppose that someone finds an error in a SkyQuery dataset.
They create an ad hoc annotation: “Objects X, Y, Z in data catalog D are incorrectly identified.”
Should the annotation be included in any query that touches those objects?
We don't know how to carry quality annotations into query results.
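
The slide leaves open how quality annotations should travel with query results. One naive possibility, offered only as a sketch and not as the talk's proposal, is to pair each result row with any notes recorded against the object it carries. It assumes the query returns the object id in its first column and that notes live in a hypothetical quality_annotation table; the object ids standing in for X, Y, Z are also hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def query_with_quality(conn: sqlite3.Connection, query: str,
                       catalog: str) -> List[Tuple[tuple, List[str]]]:
    """Run `query` and pair each row with any quality notes on its object.

    Assumes the query's first column is an object id and that notes live in
    a hypothetical quality_annotation(catalog, objid, note) table.
    """
    rows = conn.execute(query).fetchall()
    notes: Dict[int, List[str]] = {}
    for objid, note in conn.execute(
            "SELECT objid, note FROM quality_annotation WHERE catalog = ?", (catalog,)):
        notes.setdefault(objid, []).append(note)
    return [(row, notes.get(row[0], [])) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (objid INTEGER, mag REAL)")
conn.executemany("INSERT INTO photo VALUES (?, ?)", [(10, 18.1), (11, 19.4), (13, 17.0)])
conn.execute("CREATE TABLE quality_annotation (catalog TEXT, objid INTEGER, note TEXT)")
# "Objects X, Y, Z in data catalog D are incorrectly identified" (X, Y, Z = 10, 11, 12 here).
conn.executemany("INSERT INTO quality_annotation VALUES ('D', ?, 'incorrectly identified')",
                 [(10,), (11,), (12,)])
results = query_with_quality(conn, "SELECT objid, mag FROM photo ORDER BY objid", "D")
```

This covers only results that still carry the annotated object's id; aggregates, joins, and other derived values lose that link, which is where the open problem the slide names really lies.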

14 Organizing Annotations
We need to find ways to structure ad hoc annotations.
When structure emerges, capture it:
Create specific schemas.
Create specific interfaces for collection, display, and search.
The main goal is to make this easy enough for scientists.
They must see advantages to the extra work of structuring their thoughts and conforming to ontologies.
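
"When structure emerges, capture it" could start with something as simple as noticing which "key: value" patterns recur across free-text annotations and proposing them as schema fields. A minimal sketch, assuming a hypothetical convention that annotators separate clauses with ";" and a hypothetical 50% recurrence threshold:

```python
from collections import Counter
from typing import List


def candidate_fields(comments: List[str], min_share: float = 0.5) -> List[str]:
    """Suggest schema fields: keys written as "key: value" in enough comments.

    Assumes annotators separate clauses with ";" (a hypothetical convention).
    """
    counts: Counter = Counter()
    for text in comments:
        keys = set()  # count each key at most once per comment
        for part in text.split(";"):
            key, sep, value = part.partition(":")
            if sep and key.strip() and value.strip():
                keys.add(key.strip().lower())
        counts.update(keys)
    threshold = min_share * len(comments)
    return sorted(k for k, n in counts.items() if n >= threshold)


# Hypothetical free-text annotations on specimen images.
comments = [
    "specimen: S1; view: dorsal; looks damaged",
    "Specimen: S2; view: lateral",
    "specimen: S3; stain: none",
    "blurry image",
]
```

The suggested fields would then feed a specific schema and entry form, while clauses that fit no pattern ("looks damaged") stay as free text.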

15 Querying the Annotation Activity
Publish/subscribe database strategies:
Publish the history of updates.
Subscribe to queries on the history.
Suppose you are the curator of a SkyQuery database and someone claims that the object catalog is wrong: you should be informed.
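
The publish/subscribe strategy can be sketched in a few lines: annotation events are appended to a published history, and subscribers register standing queries (here, predicates) that are checked against each new event. The class names, the event fields, and the "skyquery/catalog/..." naming scheme are hypothetical; a real system would evaluate database queries over the update history rather than Python functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AnnotationEvent:
    author: str
    target: str  # identifier of the annotated object (naming scheme is hypothetical)
    text: str


Predicate = Callable[[AnnotationEvent], bool]
Callback = Callable[[AnnotationEvent], None]


class AnnotationHistory:
    """Append-only history of annotation events with standing queries on it."""

    def __init__(self) -> None:
        self.events: List[AnnotationEvent] = []
        self._subscriptions: List[Tuple[Predicate, Callback]] = []

    def subscribe(self, predicate: Predicate, callback: Callback,
                  replay: bool = False) -> None:
        """Register a standing query; optionally run it over past events too."""
        self._subscriptions.append((predicate, callback))
        if replay:
            for event in self.events:
                if predicate(event):
                    callback(event)

    def publish(self, event: AnnotationEvent) -> None:
        """Record the update, then notify every subscriber whose query matches it."""
        self.events.append(event)
        for predicate, callback in self._subscriptions:
            if predicate(event):
                callback(event)


# The curator subscribes to anything said about the catalog they maintain.
history = AnnotationHistory()
curator_inbox: List[AnnotationEvent] = []
history.subscribe(lambda e: e.target.startswith("skyquery/catalog/"), curator_inbox.append)
history.publish(AnnotationEvent("user1", "skyquery/catalog/D", "object catalog is wrong"))
history.publish(AnnotationEvent("user2", "morphbank/image/7", "mislabeled view"))
```

The replay option reflects "subscribe to queries on the history": a late subscriber can also be told about matching claims made before it subscribed.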

16 Example of Annotation Query
[Diagram: the slide 3 example (annotation "These objects are the same"; telescope and catalog info; two SkyQuery datasets; analysis; two query strings) with the curator added]

17 Challenges of Ad Hoc Annotations
Establishing globally unique, persistent names for data objects
Optimizing searches
Result semantics
Ontologies
Capturing the structure of frequent annotation styles
Providing user interfaces to define semi-structured annotations
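
One widely used technique for globally unique, persistent names, swapped in here for illustration rather than taken from the talk, is to derive the name from a cryptographic hash of the object's content. A minimal sketch, assuming a dataset is an unordered collection of JSON-serializable, mutually comparable rows and using a hypothetical "namespace:sha256:hex" format:

```python
import hashlib
import json
from typing import Iterable, Sequence


def content_name(rows: Iterable[Sequence], namespace: str = "dataset") -> str:
    """Derive a persistent name from a dataset's content (hypothetical scheme).

    Rows are sorted first, so the name ignores row order; drop the sort if
    order is part of the dataset's meaning.
    """
    canonical = json.dumps(sorted(rows), separators=(",", ":")).encode("utf-8")
    return f"{namespace}:sha256:{hashlib.sha256(canonical).hexdigest()}"
```

For example, content_name([(1, "a"), (2, "b")]) and content_name([(2, "b"), (1, "a")]) give the same name. The trade-off is that such a name identifies one version: any change to the data yields a new name, so an evolving dataset still needs a separate, stable identifier that points to its versions.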

18 Annotations Technology: SAM
Scientific Annotation Middleware (SAM): Jim Myers and Al Geist
EMSL Electronic Notebook

19 Annotations Technology: Amaya
Annotations of HTML and XML documents
The project includes a browser and a document editor.
Text annotations attached to XHTML, XML, MathML, and SVG
Annotea collaborative annotation technology

20 References
SkyQuery and SkyServer
Jefferson Lab Logbooks: home page; today's runs; today's logbook entries; run detail page; logbook entry
Morphbank: Johan Liljeblad & Fredrik Ronquist
Scientific Annotation Middleware
W3C Amaya XML Annotation project
