Download presentation
Presentation is loading. Please wait.
Published byNorma Joseph Modified over 9 years ago
1
Schema Summarization cong Yu Department of EECS University of Michigan congy@eecs.umich.edu H. V. Jagadish Department of EECS University of Michigan jag@eecs.umich.edu VLDB 2006
2
Example :ER Schema
3
Example: Hierarchical and Relational schema Schema element attribute structural link value links SetOf type
4
Introduction Consider both schema structure and data distribution define a schema summary. introduce summary importance and summary coverage as desirable properties for a good schema summary. develop algorithms to automatically generate schema summaries based on these two goals. propose a Query Discovery Cost for objectively evaluating the quality of schema summary Endeavors and contributions in this paper
5
Example: Schema Summary
6
Full schema summary Expanded schema summary Abstract link Value link Abstract element
7
Schema Summary Definition 1: Schema Definition 2: Schema Summary A schema is defined to be a labelled directed graph, SG = A summary of schema SG = is defined to be a labelled directed graph, SS = r’ = r; E’ ⊆ E, S’ ⊆ S, V’ ⊆ V; M is a finite set of correspondences between original schema elements and abstract elements, each (e e’ ) ∈ M, where e ∈ E and e’ ∈ AE, means that e is directly or indirectly represented by e’ in the summary; AE is a finite set of abstract elements; AL is a finite set of abstract links;
8
Schema Summary Quality Summary Importance : selected elements should be more representative of the schema; (connectivity&&cardinality) Summary Coverage : selected elements should be broadly distributed throughout the schema (closeness).
9
Summary Element Importance Formula 1 (Schema Element Importance). the relative cardinality RC(e1 → e2) is calculated from the database as the average number of e2 data nodes connected to each e1 data node. r is the number of iterations. p is a tuning parameter called neighborhood factor. is set to the cardinality of the element in the database.
10
Summary Importance Definition 3 (Summary Importance). the denominator of the fraction for importance is really the sum of all element cardinalities in the database.
11
Summary Element Coverage Formula 2 (Schema Element Affinity). Formula 3 (Summary Element Coverage).
12
Summary Coverage Definition 4 :Summary Coverage the denominator equals to the total cardinality of all the schema elements.
13
Efficient Schema Summarization Annotating Schema Graph Maximizing Summary Importance Maximizing Summary Coverage Balancing Importance and Coverage Output: SG Output: E, the set of K most important elements Output : E, the set of K elements with highest summary coverage among mutually non-dominant elements; DS, the set of element pairs with dominance relationship. Output: E, the set of K elements for the balanced summary.
14
EVALUATING SCHEMA SUMMARY Datasets A suitable dataset must: 1)come with both data and schema information; 2)be complex enough for its summaries to be meaningful, and 3)have a set of known queries associated with it. we chose two benchmarks, one XML and one relational, and one real dataset 1) XMark ( XML benchmark dataset ) 2) TPC-H (relational benchmark dataset ) 3) MiMI (real word scientific dataset )
15
EVALUATING SCHEMA SUMMARY comparison with Expert Summaries Agreement between automatic summaries and expert summaries on XMark and MiMI datasets. User agreement measures the percentage of elements all three experts agree on.
16
EVALUATING SCHEMA SUMMARY Query Discovery Cost Query Discovery: models how an ordinary user, defined as one with little schema knowledge about the database, formulates a concrete query specification given the information need she has in mind.
17
EVALUATING SCHEMA SUMMARY Summary Benefits for Query Discovery: Impact of Summary Size
18
EVALUATING SCHEMA SUMMARY Summary Benefits for Query Discovery: Impact of Schema Structure and Data Distribution
19
EVALUATING SCHEMA SUMMARY Summary Benefits for Query Discovery: Impact of Balancing Importance and Coverage
20
EVALUATING SCHEMA SUMMARY Summary Benefits for Query Discovery: Impact of Data Evolution Agreement between summaries on different versions of MiMI dataset. Current version is Jan 2006.
21
EVALUATING SCHEMA SUMMARY Summary Benefits for Query Discovery: Comparison with ER Model Abstraction Comparing against ER model abstraction techniques on MiMI. Summaries are of size 10.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.