Published by Stephen Garrett. Modified over 10 years ago.
1
“Lineage/Provenance” Workgroup Report
Birgitta, Amol, Ihab, Thomas, Anish, Martin, Matthias
2
Why lineage?
Users – Want lineage
– Trace results
  Huge data warehouses, complicated queries/views
  Understand processes/workflows (Biomedical databases, Genomics databases, etc.)
Systems – Need lineage
  Closed & complete representation models
– Typically: Boolean constraints among tuples
3
Lineage in Uncertain and Probabilistic Databases
Closedness of operations
Complete representation models
Capture semantics of relational operations w/constraints on the data
– Extensional semantics
  Identify “safe” plans
– Intensional semantics
  Need to track constraints
– Recursive vs. transitive
Query processing issues/opportunities
– Highly system-specific
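The extensional/intensional distinction above can be illustrated with a small sketch. It assumes independent base tuples A, B, C with made-up probabilities and a hypothetical lineage formula (A & B) OR (A & C); none of these names or numbers come from the slides. The intensional route evaluates the tracked Boolean constraint exactly by enumerating possible worlds, while the naive extensional shortcut treats the two conjuncts as independent even though they share A.

```python
from itertools import product

def lineage_probability(formula, probs):
    """Exact probability of a Boolean lineage formula over independent
    base tuples, computed by enumerating all possible worlds."""
    names = sorted(probs)
    total = 0.0
    for world in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, world))
        # Probability of this world under tuple independence.
        weight = 1.0
        for name, present in assignment.items():
            weight *= probs[name] if present else 1.0 - probs[name]
        if formula(assignment):
            total += weight
    return total

# Hypothetical base-tuple probabilities (illustrative only).
probs = {"A": 0.5, "B": 0.5, "C": 0.5}

# Hypothetical lineage of one result tuple: (A & B) OR (A & C).
def lineage(w):
    return (w["A"] and w["B"]) or (w["A"] and w["C"])

# Intensional: evaluate the tracked constraint exactly.
exact = lineage_probability(lineage, probs)

# Naive extensional shortcut: combine the two conjuncts as if independent.
naive = 1 - (1 - probs["A"] * probs["B"]) * (1 - probs["A"] * probs["C"])

print(exact, naive)
```

The two numbers differ because both conjuncts depend on A; this is the kind of plan that would not count as “safe”. World enumeration is exponential in the number of base tuples, so it serves only as a reference point for small lineage formulas.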
4
Approximations
Granularities of lineage
– Schema-level, record-level, external
Avoid expensive cases
– E.g.: WHERE count(*)=3
  (A & B & C) OR (B & C & D) OR ….
Approximate lineage distributions for “expensive” predicates
Convolution-like summary of the impact of input tuples into the output distribution
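One possible reading of the "convolution-like summary" is sketched below; it is an assumption, not necessarily what the workgroup had in mind. Instead of materializing a lineage formula with one conjunct per 3-subset of input tuples, the sketch folds each independent input tuple into the distribution of the count, one tuple at a time. The tuple probabilities and function name are hypothetical.

```python
from math import comb

def count_distribution(tuple_probs):
    """Distribution of count(*) over independent input tuples.

    dist[k] is the probability that exactly k tuples are present.
    Each tuple is folded in by convolving with its own
    (absent, present) distribution.
    """
    dist = [1.0]  # with no tuples, the count is 0 with certainty
    for p in tuple_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # tuple absent: count stays k
            nxt[k + 1] += mass * p       # tuple present: count becomes k + 1
        dist = nxt
    return dist

# Hypothetical probabilities for four input tuples A, B, C, D.
tuple_probs = [0.5, 0.5, 0.5, 0.5]

dist = count_distribution(tuple_probs)
p_exactly_3 = dist[3]            # probability that count(*) = 3
p_at_least_3 = sum(dist[3:])     # probability that count(*) >= 3

# Number of 3-tuple conjuncts an explicit lineage formula would need.
n_conjuncts = comb(len(tuple_probs), 3)

print(p_exactly_3, p_at_least_3, n_conjuncts)
```

The summary costs time quadratic in the number of input tuples, whereas the number of explicit conjuncts grows as n choose 3. The price is that the summary no longer records which tuples contributed, only how they shift the output distribution.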
5
Uncertainty in the Lineage Itself
Not sure where information comes from (external source), attach confidences to lineage?
Uncertainty in data integration
“Probabilistic rules”
Anonymize lineage/show multiple explanations
Aggregate lineage/granularity
6
Privacy Issues
May not be allowed to expose exact lineage
Query lineage, explain lineage, or use summaries/approximations
7
Relation to Graphical Models
Encoding issues
– E.g., Bayesian Nets, additional CPTs
Qualitative issues
– Changes in the uncertainty, inference
– Exploit metadata/relationships between input variables
– Updates in the lineage
8
Presentation of Lineage
Navigate through different granularities
Aggregate lineage/show summaries
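Moving between granularities could look roughly like the sketch below. It assumes record-level lineage is stored as (source table, tuple id) pairs per output tuple; the table names, ids, and the `to_schema_level` helper are all hypothetical and serve only to show a roll-up from record level to schema level.

```python
from collections import Counter

# Hypothetical record-level lineage: output tuple -> contributing
# (source_table, tuple_id) pairs.
record_lineage = {
    "out1": [("orders", 17), ("orders", 42), ("customers", 3)],
    "out2": [("orders", 42), ("customers", 8)],
}

def to_schema_level(lineage):
    """Roll record-level lineage up to schema level: for each output
    tuple, count how many contributing records come from each table."""
    return {
        out: Counter(table for table, _ in sources)
        for out, sources in lineage.items()
    }

summary = to_schema_level(record_lineage)
for out, tables in summary.items():
    print(out, dict(tables))
```

A summary like this hides individual tuple ids, so it could also serve where exact lineage may not be exposed.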
9
Data Integration w/Lineage
Patch lineage pointers?
Identify regularities/common patterns in lineage to reduce uncertainty
Detect dependencies among data items from different databases
Supporting data mining tasks, lineage as additional metadata