Published by Christal Osborne. Modified over 9 years ago.
1
17th October 2002
Data Provenance
Grid Data Requirements Scoping: Metadata & Provenance
Dave Pearson, Oracle Corporation UK
2
Grid Data Requirements Scoping
Requirements Gathering
- Establish the need for database interoperability in the Grid
- 3-month exercise
- Interviews and questionnaire
- Deliverable: scoping report
Participants
- UK e-Science communities and Grid projects
- Astrophysics, Bioinformatics, Combinatorial Chemistry, Ecology, Engineering, Environmental Sciences, HEP, Neuroscience
Findings
- Widespread requirement for provenance information for:
  - Establishing reliability & quality of data
  - Traceability from raw data to publication
  - Automated lab book
  - Reproducing and recreating results
  - Impact analysis
- Few existing solutions
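Traceability and impact analysis are two directions of the same question over a lineage record. The Python sketch below assumes lineage is stored as a simple "derived from" map; the item names and the functions `trace_upstream` and `impact_of` are hypothetical, chosen only to illustrate walking the record backwards (raw data behind a publication) and forwards (everything affected by a changed input).

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage record: each item maps to the items it was derived from.
DERIVED_FROM: Dict[str, List[str]] = {
    "calibrated-1": ["raw-1", "calib-table-v2"],
    "summary-1": ["calibrated-1"],
    "figure-3": ["summary-1"],
    "paper-A": ["figure-3"],
}


def reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search: every node reachable from start by following edges."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def trace_upstream(item: str) -> Set[str]:
    """Traceability: everything the item was (transitively) derived from."""
    return reachable(item, DERIVED_FROM)


def impact_of(item: str) -> Set[str]:
    """Impact analysis: everything (transitively) derived from the item."""
    # Invert the lineage so edges point from inputs to their derived products.
    derived_into: Dict[str, List[str]] = {}
    for child, parents in DERIVED_FROM.items():
        for parent in parents:
            derived_into.setdefault(parent, []).append(child)
    return reachable(item, derived_into)
```

For example, `trace_upstream("paper-A")` reaches back to `raw-1` and `calib-table-v2`, while `impact_of("calib-table-v2")` lists every product that would need reprocessing if that calibration table changed.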
3
Data Processing
Processing Characteristics
- Well-defined workflow
- Correction, calibration, transformation, filtering, merging
- Relatively static reference data
- Stable processing functions (audited changes)
- Periodic reprocessing from archive
[Diagram: Instrument, Raw Data, Reference Data, Multi-stage Processing, Processed Data Archive, In Silico]
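Because the processing workflow is well defined and its functions and reference data change only under audit, each stage can append a provenance entry as it runs. The following is a minimal sketch under that assumption; `run_pipeline`, `ProvenanceStep`, the version labels and the calibration constants are hypothetical illustrations, not taken from the presentation. Content digests link each stage's input to the previous stage's output, so a later reprocessing run from the archive can be compared against the original record.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Data = List[float]
Stage = Tuple[str, str, Callable[[Data, Dict[str, float]], Data]]  # (name, function version, function)


@dataclass(frozen=True)
class ProvenanceStep:
    """One entry in the processing record for a single stage."""
    stage: str
    function_version: str
    reference_version: str
    input_digest: str
    output_digest: str


def digest(data: Data) -> str:
    """Short content hash identifying a data set at a given stage."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()[:12]


def run_pipeline(raw: Data, stages: List[Stage], reference: Dict[str, float],
                 reference_version: str) -> Tuple[Data, List[ProvenanceStep]]:
    """Apply each stage in order, appending a provenance entry per stage."""
    data, record = raw, []
    for name, version, fn in stages:
        out = fn(data, reference)
        record.append(ProvenanceStep(name, version, reference_version, digest(data), digest(out)))
        data = out
    return data, record


# Hypothetical stages and reference data, for illustration only.
def calibrate(data: Data, ref: Dict[str, float]) -> Data:
    return [x * ref["gain"] + ref["offset"] for x in data]


def drop_negative(data: Data, ref: Dict[str, float]) -> Data:
    return [x for x in data if x >= 0]


STAGES: List[Stage] = [("calibration", "v1.2", calibrate), ("filtering", "v1.0", drop_negative)]
processed, record = run_pipeline([0.2, 1.0, 3.0], STAGES, {"gain": 2.0, "offset": -1.0}, "ref-v3")
```

Rerunning the same raw data with the same function and reference versions reproduces an identical record, which is one simple way to check that a reprocessing run recreated the earlier result.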
4
Analysis and Interpretation
[Diagram: Processed Data Archive, Summarisation, Summarised Data]
Analysis Characteristics
- Variable workflow
- Standard functions
- Standard and personal filtering and summarisation
- Retain drill-down capability
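One way to retain drill-down capability is for every summarised record to keep the identifiers of the processed records it was computed from. The sketch below assumes a per-site mean as the summarisation; the `Measurement` and `Summary` types, the site labels and the values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Measurement:
    record_id: str
    site: str
    value: float


@dataclass
class Summary:
    site: str
    mean: float
    source_ids: List[str]  # retained so the summary can be drilled down to its inputs


def summarise_by_site(rows: List[Measurement]) -> Dict[str, Summary]:
    """Group processed records by site and keep the contributing record ids."""
    groups: Dict[str, List[Measurement]] = defaultdict(list)
    for row in rows:
        groups[row.site].append(row)
    return {
        site: Summary(site, sum(m.value for m in ms) / len(ms), [m.record_id for m in ms])
        for site, ms in groups.items()
    }


def drill_down(summary: Summary, archive: Dict[str, Measurement]) -> List[Measurement]:
    """Return the archived records behind a summary."""
    return [archive[rid] for rid in summary.source_ids]


# Hypothetical processed-data archive.
archive = {m.record_id: m for m in [
    Measurement("r1", "A", 2.0),
    Measurement("r2", "A", 4.0),
    Measurement("r3", "B", 5.0),
]}
summaries = summarise_by_site(list(archive.values()))
```

Storing record identifiers rather than copies keeps summaries small, at the cost of requiring the archive to keep those identifiers stable.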
5
Analysis and Interpretation
Analysis and Interpretation Characteristics
- Highly dynamic workflow
- Multiple data types
- Volatile data
- Annotations, inferences, conclusions
- Evidential reasoning
- Shared multiple versions of truth
- Periodic version consolidation
[Diagram: Processed Data, Summarised Data, Result Data, Retrieval & Update, Personalised Database, Conclusions/Inferences (Descriptions, Trends, Correlations, Relationships)]
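Evidential reasoning implies that each conclusion records the result data that justify it, and that personal and shared interpretations can coexist. The sketch below is a hypothetical illustration of that idea: the `Conclusion` type, the authors, statements and evidence identifiers are invented, and the "shared" flag is just one assumed way of separating personal from shared views.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Conclusion:
    author: str
    statement: str
    kind: str                 # e.g. "trend" or "correlation"
    evidence: FrozenSet[str]  # identifiers of the result data that justify it
    shared: bool = False      # personal until published to the shared view


def resting_on(item_id: str, conclusions: List[Conclusion]) -> List[Conclusion]:
    """Conclusions whose evidence includes the given data item."""
    return [c for c in conclusions if item_id in c.evidence]


def shared_view(conclusions: List[Conclusion]) -> List[Conclusion]:
    """The conclusions visible to everyone; personal ones stay private."""
    return [c for c in conclusions if c.shared]


# Hypothetical annotations from two researchers.
notes = [
    Conclusion("alice", "Signal rises over time", "trend", frozenset({"res-7", "res-9"})),
    Conclusion("bob", "Signal rises over time", "trend", frozenset({"res-9"}), shared=True),
    Conclusion("bob", "Signal A tracks signal B", "correlation", frozenset({"res-3"})),
]
```

If result `res-9` is later found to be faulty, `resting_on("res-9", notes)` identifies both versions of the trend conclusion for review.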
6
Metadata Requirements
Technical Metadata
- Direct referencing: physical location and data schema/structure
- Data currency/status: version, time stamping
- Accreditation/access permissions: ownership (Dublin Core)
- Query time/governance: data volume, number of records, access paths
Contextual Metadata
- Logical referencing of physical data: semantic/syntactic ontologies
- Lexical translation: thesaurus, ontological mapping
- Named derivations (summarisations)
Scope of Requirements
- All science communities
- Related to provenance
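The split between technical and contextual metadata can be pictured as a catalogue entry that carries a logical name alongside the physical details, with a thesaurus translating users' terms to logical names. This is a hypothetical sketch: the field names, the `db1.example.org` location and the thesaurus entries are invented, and the reference to Dublin Core elements such as "creator" and "rights" is only an indication of where ownership could be mapped.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class TechnicalMetadata:
    location: str             # direct referencing: physical location
    schema: Dict[str, str]    # column name -> type
    version: int              # currency/status
    timestamp: datetime
    owner: str                # ownership, cf. Dublin Core "creator"/"rights"
    readers: List[str]        # access permissions
    record_count: int         # governance: data volume


@dataclass
class CatalogueEntry:
    logical_name: str         # contextual metadata: logical name for the physical data
    technical: TechnicalMetadata


# Hypothetical thesaurus for lexical translation of user terms to logical names.
THESAURUS: Dict[str, str] = {"temp": "temperature", "sst": "temperature"}


def resolve(term: str, catalogue: Dict[str, CatalogueEntry]) -> Optional[TechnicalMetadata]:
    """Translate a user's term to a logical name, then to its technical metadata."""
    name = THESAURUS.get(term.lower(), term.lower())
    entry = catalogue.get(name)
    return entry.technical if entry is not None else None


catalogue = {
    "temperature": CatalogueEntry("temperature", TechnicalMetadata(
        location="db1.example.org/obs/temperature",
        schema={"station": "TEXT", "celsius": "REAL"},
        version=3,
        timestamp=datetime(2002, 10, 1),
        owner="ocean-group",
        readers=["ocean-group"],
        record_count=120_000,
    )),
}
```

A user asking for "SST" is routed through the thesaurus to the `temperature` entry, so the physical location can change without breaking the logical reference.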
7
Metadata Requirements
Data Versioning
- Distinguish the latest/agreed version of data
- Maintain a history record of change
- Synchronise and mirror replicated data
- Distinguish shared and personal interpretations and/or annotations
Provenance
- Record of data processing: calibration, filtering, transformation
- Record of workflow: methods, standards and protocols
- Reasoning: evidential justification for inferences & conclusions
Scope of Requirements
- All science communities
- Includes technical and contextual metadata
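Distinguishing the latest version from the agreed version, while keeping a history of change, can be sketched as an append-only version list with a separate pointer to the agreed version. The `VersionedDataset` class, its method names and the sample commits below are hypothetical; each version carries a short reason as a minimal provenance note.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    number: int
    content: str
    changed_by: str
    reason: str  # provenance note: why this version exists


@dataclass
class VersionedDataset:
    name: str
    history: List[Version] = field(default_factory=list)
    agreed: Optional[int] = None  # number of the version the community has agreed on

    def commit(self, content: str, changed_by: str, reason: str) -> Version:
        """Append a new version; earlier versions are never overwritten."""
        v = Version(len(self.history) + 1, content, changed_by, reason)
        self.history.append(v)
        return v

    def latest(self) -> Version:
        return self.history[-1]

    def agree(self, number: int) -> None:
        """Mark an existing version as the agreed one."""
        if not 1 <= number <= len(self.history):
            raise ValueError(f"no version {number}")
        self.agreed = number

    def agreed_version(self) -> Optional[Version]:
        return None if self.agreed is None else self.history[self.agreed - 1]


ds = VersionedDataset("gene-annotations")
ds.commit("v1 data", "alice", "initial load")
ds.commit("v2 data", "bob", "recalibrated with new reference data")
ds.agree(1)
```

Here the latest version is 2 while the agreed version remains 1 until the community consolidates, which mirrors the slide's distinction between latest and agreed data.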
8
Provenance Issues
- Schema evolution
- Granularity of record
  - Processed v. derived
- Inheritance
- Lack of structured annotations, ontologies
- Interactive analysis = dynamic workflow
- Multiple derived data sources
- Context of usage
- Best practice can change
- Multiple versions of the truth
- Evidential reasoning
- Existing data & applications
- Where is the provenance record stored?