
1 Non-binary Evaluation for Schema Matching
Tomer Sagi and Avigdor Gal, Technion - Israel Institute of Technology
Presentation @ ER 2012, October 2012, Florence, Italy

2 Presentation Outline
- Background: Schema Matching
- Schema Matching Evaluation
  - Current model: set-based Precision and Recall
  - Proposed model: Similarity Spaces, a vector-space model
  - Non-binary measures
- Usage example: tuning schema matchers using a non-binary measure

3 Background, Schema Matching
- Schema matching is the task of providing correspondences between concepts describing the meaning of data.
- Schema matching is recognized as a basic operation required by data integration and web query-interface integration.

4 Background, Schema Matching: Schemas
- Schemas contain attributes.
- Each attribute may have a name, label, type, domain (allowed values), instances, etc.
- Structural links and relationships are defined between attributes.
[Figure: a web-form schema and a small business-document schema with 5 concepts.]

5 Background, Schema Matching: First-Line Matchers
- First-line matchers (a.k.a. similarity measures) compare two schemas, generating correspondences between them.
- Each correspondence is assigned a confidence value in [0,1].
- The result is often represented as a similarity matrix.
[Figure: example similarity matrix with entries such as 0.84, 0.64, 0.62, 0.35, 0.32.]
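A minimal sketch of a name-based first-line matcher; using difflib's SequenceMatcher as the similarity measure is an illustrative choice, not one of the paper's matchers. The attribute names are taken from the example on slides 11-12.

```python
from difflib import SequenceMatcher

def string_matcher(schema1, schema2):
    """First-line matcher sketch: compare attribute names with a generic
    string-similarity measure, yielding a confidence in [0, 1] per pair."""
    matrix = {}
    for a1 in schema1:
        for a2 in schema2:
            matrix[(a1, a2)] = SequenceMatcher(None, a1.lower(), a2.lower()).ratio()
    return matrix

# Attribute names taken from the example on slides 11-12.
s1 = ["clientNum", "city", "checkInDay"]
s2 = ["cardNum", "city", "arrivalDay", "checkInTime"]
sim = string_matcher(s1, s2)
print(round(sim[("city", "city")], 2))  # 1.0
```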

6 Background, Schema Matching: Second-Line Matchers
- Second-line matchers transform similarity matrices.
- Filters transform a matrix by removing the values that do not satisfy a constraint function. Examples: Threshold, MaxDelta.
[Figure: a similarity matrix and its transformed (filtered) counterpart.]

7 Background, Schema Matching: Second-Line Matchers
- Second-line matchers transform similarity matrices.
- Filters transform a matrix by removing the values that do not satisfy a constraint function. Examples: Threshold, MaxDelta.
- Decision makers transform a matrix by changing the values of some correspondences to 1 and the rest to 0.
[Figure: a similarity matrix and the resulting binary similarity matrix.]
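Sketches of the two second-line matcher kinds: the threshold filter follows the slide's description, while the decision maker below simply keeps each attribute's best correspondence (a simplification; real decision makers may enforce 1:1 matchings).

```python
def threshold_filter(matrix, t=0.5):
    """Filter sketch: zero out correspondences whose confidence is below t."""
    return {pair: v if v >= t else 0.0 for pair, v in matrix.items()}

def max_decision_maker(matrix, schema1):
    """Decision-maker sketch: for each attribute of the first schema keep only
    its highest-confidence correspondence, set it to 1 and the rest to 0."""
    binary = {pair: 0.0 for pair in matrix}
    for a1 in schema1:
        row = {pair: v for pair, v in matrix.items() if pair[0] == a1}
        best = max(row, key=row.get)
        if row[best] > 0:
            binary[best] = 1.0
    return binary

# Reusing `sim` and `s1` from the previous sketch:
binary = max_decision_maker(threshold_filter(sim, t=0.5), s1)
```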

8 Background, Schema Matching: Systems
- Schema matching systems employ various first- and second-line matchers; their results are composed, aggregated and combined.
[Figure: matching-system pipeline. A schema pair feeds first-line matchers (String, Domain, Parent-Child, Instance); their matrices are aggregated, filtered and passed to a decision maker, producing a binary similarity matrix.]
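A toy composition of the sketches above into a minimal matching system; the averaging aggregator is an assumption, and a real system would add domain, instance and structural matchers.

```python
def average_aggregate(matrices):
    """Aggregation sketch: combine several first-line matcher outputs by
    averaging their confidences per attribute pair."""
    pairs = matrices[0].keys()
    return {p: sum(m[p] for m in matrices) / len(matrices) for p in pairs}

# A toy "system": one first-line matcher, aggregation, a filter and a
# decision maker, reusing the sketches above.
first_line = [string_matcher]   # a real system would add domain, instance, structural matchers
raw = [matcher(s1, s2) for matcher in first_line]
result = max_decision_maker(threshold_filter(average_aggregate(raw), t=0.5), s1)
```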

9 Presentation Outline
- Background: Schema Matching
- Schema Matching Evaluation
  - Current model: set-based Precision and Recall
  - Proposed model: Similarity Spaces, a vector-space model
  - Non-binary measures
- Usage example: tuning schema matchers using a non-binary measure

10 Schema Matching Evaluation, Current Model: Exact Match
- The current evaluation model provides measures for evaluating a complete system using set-based measures: the system's binary similarity matrix is compared against an exact match, partitioning correspondences into true positives (TP), false positives (FP) and false negatives (FN), with Precision = TP / (TP + FP) and Recall = TP / (TP + FN).
- Major shortcoming: evaluation of individual components (e.g., first-line matchers) and of non-binary results (uncertain schema matching systems) is undefined.
[Figure: the matching pipeline from slide 8; only its final binary similarity matrix is compared against the exact match.]
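A short sketch of the set-based computation, applied to a binary match result and an exact match represented as dictionaries keyed by attribute pairs:

```python
def set_precision_recall(binary, exact):
    """Set-based evaluation of a binary match result against an exact match.
    Both arguments map attribute pairs to 0/1 values."""
    tp = sum(1 for p, v in binary.items() if v == 1 and exact.get(p, 0) == 1)
    fp = sum(1 for p, v in binary.items() if v == 1 and exact.get(p, 0) == 0)
    fn = sum(1 for p, v in exact.items() if v == 1 and binary.get(p, 0) == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```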

11 Schema Matching Evaluation, Similarity Spaces: A Vector Space Model
- Begin with a similarity matrix:

  S1 \ S2        1 cardNum   2 city   3 arrivalDay   4 checkInTime
  1 clientNum       0.84       0.32        0.32            0.30
  2 city            0.29       1.00        0.33            0.30
  3 checkInDay      0.34       0.33        0.35            0.64

- Taking each entry as an element in a vector (reading the matrix column by column) transforms this matrix into a similarity vector:
  (0.84, 0.29, 0.34, 0.32, 1.00, 0.33, 0.32, 0.33, 0.35, 0.30, 0.30, 0.64)
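The same vectorization in code, using the matrix from the slide; column-by-column flattening reproduces the vector shown above.

```python
import numpy as np

# Similarity matrix from the slide: rows are S1 attributes
# (clientNum, city, checkInDay), columns are S2 attributes
# (cardNum, city, arrivalDay, checkInTime).
sim_matrix = np.array([
    [0.84, 0.32, 0.32, 0.30],
    [0.29, 1.00, 0.33, 0.30],
    [0.34, 0.33, 0.35, 0.64],
])

# Column-by-column flattening reproduces the similarity vector from the slide.
sim_vector = sim_matrix.flatten(order="F")
print(np.round(sim_vector, 2))
# [0.84 0.29 0.34 0.32 1.   0.33 0.32 0.33 0.35 0.3  0.3  0.64]
```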

12 Schema Matching Evaluation, Similarity Spaces: A Vector Space Model
- We propose a vector space model for evaluation.
- Dimensions are the possible correspondences between attribute pairs.
- Vectors are matching results.

  Exact match (binary):
  S1 \ S2        1 cardNum   2 city   3 arrivalDay   4 checkInTime
  1 clientNum        1          0           0              0
  2 city             0          1           0              0
  3 checkInDay       0          0           0              1

  Matcher output:
  S1 \ S2        1 cardNum   2 city   3 arrivalDay   4 checkInTime
  1 clientNum       0.84       0.32        0.32            0.30
  2 city            0.29       1.00        0.33            0.30
  3 checkInDay      0.34       0.33        0.35            0.64

13 The Schema Matching Evaluation Problem
[Table: the evaluation problem classified along two axes, a parameter K (0, 1, 2, >2) and whether the evaluation is informed (Yes/No), with each cell split into binary and non-binary results.]
- The area in green marks where most research has focused to date; the areas in yellow designate limited work done.

14 Schema Matching Evaluation, Similarity Spaces: A Vector Space Model
- Over this vector space, evaluation functions are defined.
- For example, the well-known precision and recall functions become functions of two vectors (v = outcome of a decision maker, v_e = exact match).
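The formulas themselves appear only as an image on the slide. A plausible vector-space reconstruction, consistent with the set-based definitions on slide 10 but offered as a sketch rather than the paper's exact notation:

$$P(v, v_e) = \frac{\langle v, v_e \rangle}{\lVert v \rVert_1}, \qquad R(v, v_e) = \frac{\langle v, v_e \rangle}{\lVert v_e \rVert_1}.$$

For binary vectors, $\langle v, v_e \rangle$ counts the true positives, $\lVert v \rVert_1$ the selected correspondences, and $\lVert v_e \rVert_1$ the exact-match correspondences, so these coincide with Precision = TP/(TP+FP) and Recall = TP/(TP+FN).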

15 Schema Matching Evaluation: Non-Binary Measures
- Accommodating non-binary evaluation is now trivial: allow v to be non-binary.
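A small numeric check of the non-binary versions, reusing the vectors from slide 12 and the sketch formulas above; the normalization is an assumption about the paper's definition.

```python
import numpy as np

def nb_precision(v, v_e):
    """Non-binary precision sketch: overlap with the exact-match vector,
    normalized by the total confidence mass of v (normalization per the
    sketch formulas above, an assumption about the paper's definition)."""
    return float(np.dot(v, v_e) / np.sum(v)) if np.sum(v) > 0 else 0.0

def nb_recall(v, v_e):
    """Non-binary recall sketch: overlap normalized by the exact-match mass."""
    return float(np.dot(v, v_e) / np.sum(v_e)) if np.sum(v_e) > 0 else 0.0

# Vectors from slide 12 (column-by-column order).
v_e = np.array([1, 0, 0,  0, 1, 0,  0, 0, 0,  0, 0, 1])
v   = np.array([0.84, 0.29, 0.34, 0.32, 1.00, 0.33,
                0.32, 0.33, 0.35, 0.30, 0.30, 0.64])
print(round(nb_precision(v, v_e), 2), round(nb_recall(v, v_e), 2))  # 0.46 0.83
```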

16 Schema Matching Evaluation: Non-Binary Measures, Implications
With non-binary measures we can now evaluate:
- individual first-line matchers,
- interim results,
- uncertain (non-binary) results.
[Figure: the matching pipeline from slide 8, annotated with the points where evaluation becomes possible.]

17 Match Distance: Sometimes You Need a Metric…
- We define two complementary distance metrics.
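The two metrics themselves are given as formulas on the slide and are not reproduced in the transcript. Purely as an illustration of a distance over similarity vectors (an assumption, not the paper's Match Distance), one could use a normalized Euclidean distance:

```python
import numpy as np

def match_distance(v, v_e):
    """Illustrative distance between a similarity vector and the exact match:
    Euclidean distance normalized by sqrt(dimension) so that values stay in
    [0, 1] for confidences in [0, 1]. A stand-in, not the paper's definition."""
    v, v_e = np.asarray(v, dtype=float), np.asarray(v_e, dtype=float)
    return float(np.linalg.norm(v - v_e) / np.sqrt(v.size))
```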

18 Match Distance: Behavior vs. NBPrecision and NBRecall
- Results of a synthetic evaluation.

19 Match Distance: Behavior vs. NBPrecision and NBRecall
- Results of a synthetic evaluation.
[Figure panels: a nonsense solution of increasing magnitude; a noisy solution with an increasingly strict filter applied.]

20 Schema Matching Evaluation: Predictors
What if you don't have an exact match? In most applications, this is the case…

21 Schema Matching Evaluation: Predictors
- Predictors are a special class of schema matching evaluation methods which do not use an exact match as part of the input.
- Predictors can be classified into two subclasses:
  - Internalizers refer to the internal structure of a vector as an indication of match quality (e.g., max, stdev, average).
  - Idealizers assume the existence of an ideal vector and compare against it.
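A minimal sketch of the two predictor families, assuming the internalizers are simple summary statistics of the similarity vector and the idealizer constructs its ideal vector from the vector itself; the specific constructions are illustrative, not the paper's.

```python
import numpy as np

def max_predictor(v):
    """Internalizer sketch: predict quality from the largest confidence."""
    return float(np.max(v))

def avg_predictor(v):
    """Internalizer sketch: predict quality from the average confidence."""
    return float(np.mean(v))

def stdev_predictor(v):
    """Internalizer sketch: spread of confidences as a hint of decisiveness."""
    return float(np.std(v))

def idealizer_predictor(v):
    """Idealizer sketch: build an 'ideal' vector from v itself (here, 1 at the
    single largest entry, 0 elsewhere) and report the cosine similarity of v
    to that ideal."""
    v = np.asarray(v, dtype=float)
    ideal = np.zeros_like(v)
    ideal[np.argmax(v)] = 1.0
    return float(np.dot(v, ideal) / (np.linalg.norm(v) * np.linalg.norm(ideal)))
```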

22 Schema Matching Evaluation: Idealizers
[Figure: an idealizer compares the match result vector against an ideal vector.]

23 Schema Matching Evaluation: Desired Properties of Predictors
- Desired design properties:
  - Tunable: we should be able to tune predictors towards the desired quality in a specific scenario.
  - Generalizable: predictors should be based on principles that apply at several levels of granularity and can be specialized to a particular level.
- Desired empirical properties:
  - Correlated: well correlated with the quality they are designed to evaluate.
  - Robust: correlations are robust and statistically significant over varied matching systems and datasets.

24 Using Prediction: Tunable Prediction Models
- Loosely correlated predictors can be composed into a model.
- The weights of the participating predictors can be tuned.
- Construction by (multiple) stepwise regression.
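A sketch of fitting such a model, reusing nb_precision, v_e and the predictor functions from the earlier snippets; the training vectors are synthetic, and plain multiple linear regression stands in for the stepwise regression mentioned on the slide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data: for each training match vector we compute the
# predictor values and (on a training set where an exact match exists)
# the quality we want the model to predict.
rng = np.random.default_rng(0)
training_vectors = [rng.uniform(0.0, 1.0, size=v_e.size) for _ in range(50)]
y = np.array([nb_precision(v, v_e) for v in training_vectors])
X = np.array([[max_predictor(v), avg_predictor(v), stdev_predictor(v)]
              for v in training_vectors])

# Plain multiple linear regression stands in for stepwise regression here.
model = LinearRegression().fit(X, y)
print(np.round(model.coef_, 3), round(float(model.intercept_), 3))
```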

25 Using Prediction: Tunable Prediction Models
- Added bonus: increased correlation.

26 Schema Matching Evaluation: Why Granularity Matters
- Consider the following example and how a matrix-level vs. an attribute-level predictor would fare on it.
[Figure: an exact match alongside several matcher vectors.]

27 The Schema Matching Evaluation Problem
[Table: the evaluation problem classified along two axes, a parameter K (0, 1, 2, >2) and whether the evaluation is informed (Yes/No), with each cell split into binary and non-binary results.]
- The area in green marks where most research has focused to date; the areas in yellow designate limited work done.

28 Presentation Outline
- Background: Schema Matching
- Schema Matching Evaluation
  - Current model: set-based Precision and Recall
  - Proposed model: Similarity Spaces, a vector-space model
  - Non-binary measures
- Usage examples
  - Tuning schema matchers using a non-binary measure
  - Weighting ensembles using attribute-level prediction

29 Usage Examples: Tuning Scenario
- The first-line matcher Term has a tunable parameter, the label score weight (α), defining the relative weight of a term's label and name.
- Tuning can be done via machine learning methods or statistical methods.
- All tuning methods benefit from:
  - Smoothness: gradual changes in α lead to gradual changes in the measure.
  - Robustness: observed behavior is robust w.r.t. the number of test cases.
[Figure: a web-form field showing a visible label and the underlying name "leaveSlice".]
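A sketch of how the label score weight might enter a Term-style score; the linear combination, the attribute dictionaries and the string similarity used here are assumptions for illustration, not the actual Term matcher.

```python
from difflib import SequenceMatcher

def term_score(attr1, attr2, alpha=0.5):
    """Sketch of a Term-style score: a weighted combination of label similarity
    and name similarity, with alpha as the label score weight. The linear
    combination and the underlying string similarity are assumptions; the real
    Term matcher may combine its components differently."""
    def sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    label_sim = sim(attr1["label"], attr2["label"])
    name_sim = sim(attr1["name"], attr2["name"])
    return alpha * label_sim + (1 - alpha) * name_sim

# Hypothetical attributes: a web-form field whose visible label differs from
# its internal name, as in the slide's "leaveSlice" example.
a1 = {"label": "Check-out date", "name": "leaveSlice"}
a2 = {"label": "Departure date", "name": "checkOutDay"}
print(round(term_score(a1, a2, alpha=0.7), 2))
```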

30 Usage Example: Smoothness
- To use binary precision, a decision maker is required.
- Introducing a decision maker adds random noise caused by arbitrary thresholds.

31 Usage Example: Robustness, Effect of Sample Size
- An outlier in schema pair no. 2 causes binary precision (fig. (b)) to diverge greatly from the behavior eventually observed with 10 pairs.
- Unperturbed by outliers, NBPrecision (fig. (a)) displays robust behavior.

32 Using Prediction: Dynamic Prediction Models, Results

33 Conclusions
- Non-binary measures are a useful addition to the schema matching evaluation toolkit.
- Non-binary evaluation exhibits desirable characteristics in tuning scenarios (smoothness and robustness).
- Using the similarity vector space model, we can generate additional measures, breaking from traditional measures (e.g., binary precision and recall) toward measures more attuned to modern schema matching needs.

34 Thank You. Questions?

