Affinity-based Schema Matching
Silvana Castano, Università di Milano
D2I, Modena, 27 April 2001
Affinity relationships
- Schema-level matching (no instances)
- Find mappings between schema elements that correspond semantically to each other
- Affinity mappings have an associated, calculated degree of similarity
- ARTEMIS is the schema matching component of the MOMIS system; it evaluates the affinity of schema elements (e.g., ODLI3 classes) based on the following comparison features:
  - class names
  - class attributes (name and domain)
  - class references (name and domain)
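A minimal Python sketch of the comparison features listed above, assuming a simplified stand-in for an ODLI3 class. The names `Attribute`, `Reference` and `SchemaClass` are hypothetical and not part of ARTEMIS or MOMIS; the School_Member attributes are taken from the Semantic Correspondences example in these slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    domain: str  # e.g. "string", "integer"


@dataclass(frozen=True)
class Reference:
    name: str
    target: str  # name of the referenced class (the reference "domain")


@dataclass
class SchemaClass:
    """Hypothetical, simplified ODLI3 class: only the features ARTEMIS compares."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    references: list[Reference] = field(default_factory=list)

    def features(self) -> dict:
        """Return the class name plus (name, domain) pairs for attributes and references."""
        return {
            "name": self.name,
            "attributes": [(a.name, a.domain) for a in self.attributes],
            "references": [(r.name, r.target) for r in self.references],
        }


school_member = SchemaClass(
    "School_Member",
    [Attribute("name", "string"), Attribute("faculty", "string")],
)
print(school_member.features())
# {'name': 'School_Member', 'attributes': [('name', 'string'), ('faculty', 'string')], 'references': []}
```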
Affinity Coefficients
Name Affinity ∈ [0,1]
- Names are compared by exploiting the knowledge provided by a reference ontology O (e.g., the Common Thesaurus)
  - Ontology O is organized according to given strengthened terminological relationships (e.g., synonymy, hypernymy)
- Given two names n and n', an affinity function A(n,n') ∈ [0,1] returns the strength of the path between them in O; n and n' have affinity iff A(n,n') is greater than a given threshold
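One way to hold the reference ontology O as data, sketched under assumptions: the thesaurus stores typed relationships in both directions, and each relationship type carries a strength. The type codes SYN, BT (broader term) and NT (narrower term), the strength values, the class `Thesaurus` and the sample entries are all illustrative, not the contents of the Common Thesaurus.

```python
from collections import defaultdict

# Illustrative strengths (assumption): synonymy strongest, hypernymy weaker.
STRENGTH = {"SYN": 1.0, "BT": 0.8, "NT": 0.8}

# BT (broader term) and NT (narrower term) are inverses; SYN is symmetric.
INVERSE = {"SYN": "SYN", "BT": "NT", "NT": "BT"}


class Thesaurus:
    """Hypothetical minimal thesaurus: terms linked by typed, weighted relationships."""

    def __init__(self) -> None:
        self.edges: dict[str, dict[str, str]] = defaultdict(dict)

    def add(self, t1: str, rel: str, t2: str) -> None:
        """Record 't1 rel t2' and its inverse, so paths can be followed both ways."""
        self.edges[t1][t2] = rel
        self.edges[t2][t1] = INVERSE[rel]

    def neighbours(self, term: str) -> list[tuple[str, str, float]]:
        """Return (related term, relationship, strength) for every link of a term."""
        return [(other, rel, STRENGTH[rel]) for other, rel in self.edges.get(term, {}).items()]


th = Thesaurus()
th.add("Professor", "BT", "CS_Person")          # hypothetical: CS_Person is broader than Professor
th.add("University_Student", "SYN", "Student")  # hypothetical synonym entry
print(th.neighbours("CS_Person"))  # [('Professor', 'NT', 0.8)]
```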
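Name Affinity
Example: NA(School_Member, Professor) = 0.8 · 0.8 = 0.64
[Figure: thesaurus fragment over School_Member, Professor, Student, University_Student, Research_Staff and CS_Person, with a relationship strength of 0.8 shown]
- Thesaurus definition
- Determining the path with highest strength
- Computing the affinity coefficient
$$
NA(o,o') =
\begin{cases}
\sigma_1 \cdot \sigma_2 \cdot \ldots \cdot \sigma_{m-1} & \text{if } o \xrightarrow{m} o' \text{ and } \sigma_1 \cdot \sigma_2 \cdot \ldots \cdot \sigma_{m-1} \text{ is greater than the threshold}\\
0 & \text{otherwise}
\end{cases}
$$
where $o \xrightarrow{m} o'$ is the path of length $m$ ($m \geq 1$) with highest strength between $o$ and $o'$, and $\sigma_i$ is the strength of the $i$-th terminological relationship in $o \xrightarrow{m} o'$.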
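A sketch of the Name Affinity computation, assuming the thesaurus is given as a graph `term -> {neighbour: strength}` with strengths in (0, 1]. Because multiplying by a strength ≤ 1 never increases a path's strength, a Dijkstra-style search that always extends the strongest partial path finds the path with highest strength. The function name `name_affinity`, the edges and the threshold values are hypothetical; the edges are chosen only so that the strongest path from School_Member to Professor consists of two relationships of strength 0.8, mirroring the arithmetic of the example.

```python
import heapq


def name_affinity(graph: dict[str, dict[str, float]], o: str, o2: str, threshold: float) -> float:
    """NA(o, o'): product of strengths along the highest-strength path from o to o',
    or 0.0 if that product is not greater than the threshold or no path exists.

    graph maps each term to {neighbour: strength} with strengths in (0, 1].
    """
    if o == o2:
        return 1.0  # assumption: a name has full affinity with itself
    best = {o: 1.0}     # strongest path found so far to each term
    heap = [(-1.0, o)]  # max-heap via negated strengths
    while heap:
        neg_s, term = heapq.heappop(heap)
        s = -neg_s
        if s < best[term]:
            continue  # stale heap entry: a stronger path was already found
        if term == o2:
            # Strengths are <= 1, so no path popped later can be stronger.
            return s if s > threshold else 0.0
        for nxt, sigma in graph.get(term, {}).items():
            candidate = s * sigma
            if candidate > best.get(nxt, 0.0):
                best[nxt] = candidate
                heapq.heappush(heap, (-candidate, nxt))
    return 0.0  # o' is not reachable from o in the thesaurus


# Hypothetical thesaurus fragment (symmetric links, invented strengths).
graph = {
    "School_Member": {"CS_Person": 0.8, "Research_Staff": 0.5},
    "CS_Person": {"School_Member": 0.8, "Professor": 0.8},
    "Professor": {"CS_Person": 0.8, "Research_Staff": 0.5},
    "Research_Staff": {"School_Member": 0.5, "Professor": 0.5},
}

print(round(name_affinity(graph, "School_Member", "Professor", threshold=0.5), 2))  # 0.64
print(name_affinity(graph, "School_Member", "Professor", threshold=0.7))  # 0.0
```

The weaker path through Research_Staff (0.5 · 0.5 = 0.25) is ignored because only the path with highest strength counts; raising the threshold to 0.7 makes the pair lose affinity.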
Affinity Coefficients
Structural Affinity ∈ [0,1]
- Domains are compared based on compatibility relationships (e.g., validated attributes in the Common Thesaurus)
- The coefficient is proportional to the number of matching attributes and referenced classes
Global Affinity ∈ [0,1]
- Comprehensive affinity value, computed as a weighted sum of the two previous coefficients (Name Affinity and Structural Affinity)
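A hedged sketch of the two remaining coefficients. The slides state only that Structural Affinity is proportional to the number of matching attributes and referenced classes and that Global Affinity is a weighted sum; the Dice-style ratio, the function names and the equal default weights below are assumptions. The second usage line plugs in the NA = 0.64 and SA = 0.25 values that these slides report for School_Member and Professor.

```python
def structural_affinity(n_matching: int, n_features_1: int, n_features_2: int) -> float:
    """Hypothetical SA: Dice-style ratio 2 * matches / (features of c + features of c'),
    where features are attributes plus references; clamped to [0, 1]."""
    total = n_features_1 + n_features_2
    if total == 0:
        return 0.0
    return min(1.0, 2 * n_matching / total)


def global_affinity(na: float, sa: float, w_na: float = 0.5, w_sa: float = 0.5) -> float:
    """GA as a weighted sum of NA and SA. The weights are assumptions; they must
    sum to 1 so that GA stays in [0, 1]."""
    if abs(w_na + w_sa - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_na * na + w_sa * sa


print(structural_affinity(2, 4, 4))                 # 0.5
print(round(global_affinity(na=0.64, sa=0.25), 3))  # 0.445
```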
Semantic Correspondences
[Figure: Thesaurus, Validity check (interactive validation), Domain compatibility, Structural affinity]
ODLI3 classes:
- School_Member: name (string), faculty (string)
- Professor: title (string), first_name (string), last_name (string)
Attribute correspondences: first_name, last_name (Professor) ↔ name (School_Member)
SA(School_Member, Professor) = 0.25
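The correspondence step can be sketched as a filter over attribute pairs: a pair counts when its names were accepted during the interactive validity check and its domains are compatible. The compatibility table and the functions `domains_compatible` and `correspondences` are hypothetical; the two validated pairs are read off the figure. The sketch stops at the correspondences, since the slide does not give the formula that turns them into SA = 0.25.

```python
# Hypothetical domain-compatibility relation (symmetric); identical domains
# are always treated as compatible.
COMPATIBLE = {frozenset({"integer", "real"}), frozenset({"string", "char"})}


def domains_compatible(d1: str, d2: str) -> bool:
    return d1 == d2 or frozenset({d1, d2}) in COMPATIBLE


def correspondences(attrs1: dict[str, str], attrs2: dict[str, str],
                    validated_pairs: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return attribute pairs (a1, a2) that were validated against the thesaurus
    and whose domains are compatible.

    attrs1, attrs2: attribute name -> domain
    validated_pairs: (name1, name2) pairs accepted during interactive validation
    """
    result = []
    for a1, d1 in attrs1.items():
        for a2, d2 in attrs2.items():
            if (a1, a2) in validated_pairs and domains_compatible(d1, d2):
                result.append((a1, a2))
    return result


school_member = {"name": "string", "faculty": "string"}
professor = {"title": "string", "first_name": "string", "last_name": "string"}
validated = {("name", "first_name"), ("name", "last_name")}
print(correspondences(school_member, professor, validated))
# [('name', 'first_name'), ('name', 'last_name')]
```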
Affinity-based Clustering
- Hierarchical clustering techniques are employed to identify all schema elements that are candidates for integration, based on the evaluated affinities
- Clusters of candidates are selected interactively based on affinity thresholds
- Clusters are selected so that the schema elements inside a cluster have high affinity with the other elements of the cluster and lower affinity with elements outside it
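A minimal agglomerative (bottom-up) clustering sketch driven by affinity rather than distance: it repeatedly merges the two clusters with the highest inter-cluster affinity and records each merge, which yields an affinity tree. Average linkage (mean pairwise affinity between two clusters), the function name `affinity_tree` and the affinity values are assumptions made for illustration.

```python
from itertools import combinations


def affinity_tree(elements: list[str], ga: dict[frozenset, float]) -> list[tuple[frozenset, frozenset, float]]:
    """Agglomerative clustering where higher affinity means more similar.

    ga maps frozenset({x, y}) to the affinity of x and y; missing pairs count as 0.
    Inter-cluster affinity is the average of pairwise affinities (average linkage).
    Returns the merges, bottom-up, as (cluster1, cluster2, affinity).
    """
    def aff(c1: frozenset, c2: frozenset) -> float:
        pairs = [ga.get(frozenset({x, y}), 0.0) for x in c1 for y in c2]
        return sum(pairs) / len(pairs)

    clusters = [frozenset({e}) for e in elements]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the highest affinity.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: aff(*p))
        merges.append((c1, c2, aff(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical affinity values between a few classes from the example.
elements = ["Professor", "CS_Person", "Student", "University_Student", "Room"]
ga = {
    frozenset({"Student", "University_Student"}): 0.9,
    frozenset({"Professor", "CS_Person"}): 0.8,
    frozenset({"Student", "CS_Person"}): 0.6,
    frozenset({"Professor", "Student"}): 0.5,
}
for c1, c2, a in affinity_tree(elements, ga):
    print(sorted(c1 | c2), round(a, 3))
```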
Affinity Tree: Candidate Clusters
[Figure: affinity tree over Professor, Student, CS_Person, Course, Section, Room, School_Member, University_Student, Research_Staff, Location, Office, Department; candidate clusters are chosen by selecting a threshold (e.g., 0.6)]
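Choosing a threshold amounts to cutting the affinity tree at that level. A sketch, assuming the tree is given as the ordered list of merges (cluster, cluster, affinity) produced bottom-up with non-increasing affinities; the function `cut_tree` and the merge list are invented, and 0.6 is the example threshold from the slide.

```python
def cut_tree(elements: list[str], merges: list[tuple[frozenset, frozenset, float]],
             threshold: float) -> set[frozenset]:
    """Candidate clusters: apply merges in the order they were made while their
    affinity is at least the threshold, i.e. cut the affinity tree at that level."""
    clusters = {frozenset({e}) for e in elements}
    for c1, c2, affinity in merges:
        if affinity < threshold:
            break  # every later merge happens at a lower affinity level
        clusters -= {c1, c2}
        clusters.add(c1 | c2)
    return clusters


# Hypothetical affinity tree, written out as bottom-up merges.
elements = ["Professor", "CS_Person", "Student", "University_Student", "Room"]
merges = [
    (frozenset({"Student"}), frozenset({"University_Student"}), 0.9),
    (frozenset({"Professor"}), frozenset({"CS_Person"}), 0.8),
    (frozenset({"Student", "University_Student"}), frozenset({"Professor", "CS_Person"}), 0.55),
    (frozenset({"Student", "University_Student", "Professor", "CS_Person"}), frozenset({"Room"}), 0.1),
]
print(sorted(sorted(c) for c in cut_tree(elements, merges, 0.6)))
# [['CS_Person', 'Professor'], ['Room'], ['Student', 'University_Student']]
```

Lowering the threshold to 0.5 would also accept the 0.55 merge, leaving a single four-class candidate cluster plus Room.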
Ongoing work and open issues
- Application of the affinity/clustering schema matching techniques to the integration of XML data sources has been started by Milano, and first results have been published [IWASS00, Retis01, DS-901]. Ongoing work addresses XML data reconciliation by providing a further ontological layer on top of the integrated representations.
- Affinity and clustering are currently performed based on intensional inter-schema properties
- Extension of the affinity and clustering techniques to also consider extensional inter-schema properties will be carried out in collaboration with Modena and Reggio Emilia.