Grid Data Requirements Scoping: Metadata & Provenance
Dave Pearson, Oracle Corporation UK
Data Provenance, 17th October 2002

2. Grid Data Requirements Scoping

Requirements Gathering
- Establish the need for database interoperability in the Grid
- 3-month exercise
- Interviews and questionnaire
- Deliverable: a scoping report

Participants
- UK e-Science communities and Grid projects
- Astrophysics, Bioinformatics, Combinatorial Chemistry, Ecology, Engineering, Environmental Sciences, HEP, Neuroscience

Findings
- Widespread requirement for provenance information for:
  - Establishing reliability & quality of data
  - Traceability from raw data to publication
  - Automated lab book
  - Reproducing and recreating results
  - Impact analysis
- Few existing solutions
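
Traceability and impact analysis are two directions of travel through the same lineage record. The sketch below is a minimal illustration, not a method from the scoping report: it assumes each derived item simply records its direct inputs, and the class LineageGraph and all item names are hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical lineage store: each derived item records its direct inputs."""

    def __init__(self):
        self.parents = defaultdict(set)   # item -> items it was derived from
        self.children = defaultdict(set)  # item -> items derived from it

    def record(self, output, inputs):
        """Record that `output` was derived from every item in `inputs`."""
        for src in inputs:
            self.parents[output].add(src)
            self.children[src].add(output)

    @staticmethod
    def _reachable(start, edges):
        # Iterative depth-first walk returning every item reachable from start.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def trace(self, item):
        """Traceability: every upstream source that `item` depends on."""
        return self._reachable(item, self.parents)

    def impact(self, item):
        """Impact analysis: every downstream item affected if `item` changes."""
        return self._reachable(item, self.children)


g = LineageGraph()
g.record("calibrated", ["raw", "calibration_table"])
g.record("filtered", ["calibrated"])
g.record("figure_3", ["filtered"])
print(sorted(g.trace("figure_3")))
print(sorted(g.impact("calibration_table")))
```

Tracing a published figure walks the input links upstream to the raw data; impact analysis walks the same links downstream to find everything that must be rechecked when, say, a calibration table changes.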

3. Data Processing

Processing Characteristics
- Well-defined workflow
- Correction, calibration, transformation, filtering, merging
- Relatively static reference data
- Stable processing functions (audited changes)
- Periodic reprocessing from archive

[Diagram: Instrument, Raw Data, Reference Data, Multi-stage Processing, Processed Data Archive, In Silico]
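
A well-defined, multi-stage workflow with audited functions lends itself to recording one provenance entry per stage. The sketch below assumes a hypothetical design where each stage appends the function name, its audited version and any reference data used; the stage functions, the gain factor and the reference name "gain_table_2002" are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceStep:
    stage: str       # e.g. "calibration", "filtering"
    function: str    # name of the processing function applied
    version: str     # audited version of that function
    reference: str   # reference data used, if any
    timestamp: str


@dataclass
class ProcessedData:
    values: list
    provenance: list = field(default_factory=list)


def apply_stage(data, stage, func, version, reference=""):
    """Apply one stage and return new data with a provenance entry appended."""
    step = ProvenanceStep(stage, func.__name__, version, reference,
                          datetime.now(timezone.utc).isoformat())
    return ProcessedData(func(data.values), data.provenance + [step])


# Hypothetical stage functions standing in for real instrument processing.
def calibrate(values):
    return [v * 1.02 for v in values]  # assumed gain correction


def drop_negative(values):
    return [v for v in values if v >= 0]


STAGES = {"calibrate": calibrate, "drop_negative": drop_negative}


def reprocess(raw_values, provenance):
    """Re-run the recorded stages in order on archived raw data.
    Looks functions up by name only; a real system would also check versions."""
    values = raw_values
    for step in provenance:
        values = STAGES[step.function](values)
    return values


raw = ProcessedData([10.0, -1.0, 5.0])
out = apply_stage(raw, "calibration", calibrate, "1.3", reference="gain_table_2002")
out = apply_stage(out, "filtering", drop_negative, "2.0")
for step in out.provenance:
    print(step.stage, step.function, step.version)
```

Because each stage returns a new object rather than mutating its input, the raw data's own provenance stays empty, and the recorded steps are enough to reprocess the archive later.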

4. Analysis and Interpretation

[Diagram: Processed Data Archive, Summarisation, Summarised Data]

Analysis Characteristics
- Variable workflow
- Standard functions
- Standard and personal filtering and summarisation
- Retain drill-down capability
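
Retaining drill-down capability means a summary row must remember which records produced it. A minimal sketch, assuming records are dictionaries with hypothetical "id" and "value" fields and that the summary is a per-group mean:

```python
from collections import defaultdict


def summarise(records, key):
    """Group records by `key`, keeping the IDs of the contributing records
    alongside the summary values so each row can be drilled into."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return {
        k: {
            "mean": sum(r["value"] for r in recs) / len(recs),
            "count": len(recs),
            "source_ids": [r["id"] for r in recs],  # drill-down link
        }
        for k, recs in groups.items()
    }


def drill_down(summary, group, records):
    """Return the original records behind one summary row."""
    wanted = set(summary[group]["source_ids"])
    return [r for r in records if r["id"] in wanted]


records = [
    {"id": 1, "site": "A", "value": 2.0},
    {"id": 2, "site": "A", "value": 4.0},
    {"id": 3, "site": "B", "value": 7.0},
]
s = summarise(records, "site")
print(s["A"]["mean"], drill_down(s, "A", records))
```

Storing source IDs rather than copies keeps personal summaries small while still pointing back into the processed data archive.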

5. Analysis and Interpretation

Analysis and Interpretation Characteristics
- Highly dynamic workflow
- Multiple data types
- Volatile data
- Annotations, inferences, conclusions
- Evidential reasoning
- Shared, multiple versions of the truth
- Periodic version consolidation

[Diagram: Processed Data, Result Data, Summarised Data, Retrieval & Update, Personalised Database, Conclusions/Inferences (Descriptions, Trends, Correlations, Relationships)]
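
Evidential reasoning over volatile data suggests that each conclusion should keep links to the data it rests on, so that it can be revisited when that data is revised. A minimal sketch, assuming a hypothetical Conclusion record holding (data item, justification) pairs; the statements and data IDs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Conclusion:
    """Hypothetical record of an inference and the evidence behind it."""
    statement: str
    author: str
    evidence: list = field(default_factory=list)  # (data_id, justification)

    def cite(self, data_id, justification):
        self.evidence.append((data_id, justification))

    def depends_on(self, data_id):
        return any(d == data_id for d, _ in self.evidence)


def to_revisit(conclusions, data_id):
    """Statements whose evidence includes `data_id`, e.g. after it is revised."""
    return [c.statement for c in conclusions if c.depends_on(data_id)]


c1 = Conclusion("Trend T holds in series A", "alice")
c1.cite("run_17", "upward trend across all samples")
c2 = Conclusion("Site B is an outlier", "bob")
c2.cite("run_18", "mean far above other sites")
print(to_revisit([c1, c2], "run_17"))
```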

6. Metadata Requirements

Technical Metadata
- Direct referencing: physical location and data schema/structure
- Data currency/status: version, time stamping
- Accreditation/access permissions: ownership (Dublin Core)
- Query time/governance: data volume, number of records, access paths

Contextual Metadata
- Logical referencing of physical data: semantic/syntactic ontologies
- Lexical translation: thesaurus, ontological mapping
- Named derivations (summarisations)

Scope of Requirements
- All science communities
- Related to provenance
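
The technical metadata listed above can be pictured as a single record per dataset. The sketch below is hypothetical: the field names are invented, and mapping ownership to the Dublin Core "creator" element and access permissions to "rights" is an assumption about how the Dublin Core reference might be applied.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """Hypothetical technical-metadata record for one dataset."""
    # Direct referencing
    location: str        # physical location, e.g. a URL or file path
    schema: str          # identifier of the data schema/structure
    # Data currency/status
    version: str
    timestamp: str       # ISO 8601 time stamp
    # Accreditation/access permissions
    owner: str
    access: str          # who may read the data
    # Query time/governance
    volume_bytes: int
    record_count: int
    access_paths: tuple = ()  # e.g. indexed columns usable by queries

    def to_dublin_core(self):
        """Map the fields that have an (assumed) Dublin Core counterpart."""
        return {
            "dc:identifier": self.location,
            "dc:date": self.timestamp,
            "dc:creator": self.owner,
            "dc:rights": self.access,
        }


md = TechnicalMetadata(
    location="file:///archive/run42.dat",
    schema="run_schema_v2",
    version="3",
    timestamp="2002-10-17T12:00:00Z",
    owner="Example Lab",
    access="project members only",
    volume_bytes=1_048_576,
    record_count=20_000,
    access_paths=("run_id",),
)
print(md.to_dublin_core())
```

Volume, record count and access paths are the inputs a query planner or governance check would need to estimate query time before touching the data.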

7. Metadata Requirements

Data Versioning
- Distinguish the latest/agreed version of data
- Maintain a history record of changes
- Synchronise and mirror replicated data
- Distinguish shared and personal interpretations and/or annotations

Provenance
- Record of data processing: calibration, filtering, transformation
- Record of workflow: methods, standards and protocols
- Reasoning: evidential justification for inferences & conclusions

Scope of Requirements
- All science communities
- Includes technical and contextual metadata
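
Three of the versioning requirements (history of changes, latest versus agreed version, shared versus personal annotations) can be sketched with one small class; replication and mirroring are not covered. VersionedDataset and its methods are hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


class VersionedDataset:
    """Hypothetical store keeping every version plus the agreed one."""

    def __init__(self, name):
        self.name = name
        self.history = []      # (version, data, author, note, timestamp)
        self.agreed = None     # version number accepted as the agreed version
        self.annotations = []  # (version, author, text, shared)

    def commit(self, data, author, note):
        """Append a new version; earlier versions are never overwritten."""
        version = len(self.history) + 1
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append((version, data, author, note, stamp))
        return version

    def agree(self, version):
        if not 1 <= version <= len(self.history):
            raise ValueError(f"unknown version {version}")
        self.agreed = version

    def latest(self):
        return self.history[-1][1] if self.history else None

    def agreed_data(self):
        return None if self.agreed is None else self.history[self.agreed - 1][1]

    def annotate(self, version, author, text, shared=False):
        self.annotations.append((version, author, text, shared))

    def annotations_for(self, version, reader):
        """Shared annotations plus the reader's own personal ones."""
        return [text for v, author, text, shared in self.annotations
                if v == version and (shared or author == reader)]


ds = VersionedDataset("survey")
v1 = ds.commit([1, 2, 3], "alice", "initial load")
v2 = ds.commit([1, 2, 4], "bob", "corrected third value")
ds.agree(v1)
ds.annotate(v2, "bob", "check third value")              # personal
ds.annotate(v2, "alice", "pending review", shared=True)  # shared
print(ds.latest(), ds.agreed_data(), ds.annotations_for(v2, "alice"))
```

Keeping "latest" and "agreed" as separate pointers lets a community work on new versions while still citing the version it has consolidated on.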

8. Provenance Issues
- Schema evolution
- Granularity of record
  - Processed v Derived
- Inheritance
- Lack of structured annotations, ontologies
- Interactive analysis = dynamic workflow
- Multiple derived data sources
- Context of usage
- Best practice can change
- Multiple versions of the truth
- Evidential reasoning
- Existing data & applications
- Where is the provenance record stored?
