1 Weighted-Tree Simplicity Algorithm for Similarity Matching of Partial Product Descriptions Lu Yang, Biplab Sarker, Virendra C. Bhavsar and Harold Boley IASSE Talk, July 21, 2005
2 Outline: Buyer-Seller AgentMatcher in e-Markets; Product Descriptions: Arc-Labeled, Arc-Weighted Trees; Tree Similarity; Tree Simplicity; Computational Results; Use Case; Conclusion
3 Buyer-Seller AgentMatcher in e-Markets [Figure: system architecture. Labels: Web Browser, User, Main Server, User Info, User Profiles, User Agents, Agents, Matcher 1, Matcher n, e-Market, To other sites (network)]
4 Buyer-Seller AgentMatcher in e-Markets [Figure: matching diagram. Labels: Buyer 1, Buyer 2, Buyer n; Seller 1, Seller 2, Seller m; Matcher i; Seller, Rank, Similarity; Recommendations for a buyer]
5 Product Descriptions: Arc-Labeled, Arc-Weighted Trees Scenario: used-car buying/selling. [Figure: a buyer tree and a seller tree, both rooted at Car. Labels: Category, Make, Model, Year, Engine, Size, Kms; Sedan, Ford, Taurus, Small, V, 2000, 10000; arc weights shown include 0.8 and 0.4; one subtree is marked as a missing subtree. The Similarity value shown on the slide did not survive extraction.] A missing subtree represents a partial product description.
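6 Tree Similarity Algorithm Recursively traverse (sub)trees top-down. Compute similarity bottom-up: $S(T_1, T_2) = \sum_i S_i \, (w_i + w'_i)/2$, where $S_i$ is the similarity of the $i$-th pair of subtrees and $w_i$, $w'_i$ are the weights of the corresponding arcs in $T_1$ and $T_2$. When a subtree of one tree is missing in the other tree, compute the simplicity of that subtree.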
7 Tree Simplicity Algorithm (Cont’d) The tree simplicity measure computes the simplicity of a single (sub)tree. Tree simplicity contributes to tree similarity. It takes into account: arc weights; out-going degree of non-leaf nodes (tree breadth); depth of leaf nodes. Wider and deeper trees lead to smaller tree simplicity values.
8 Tree Simplicity Algorithm (Cont’d) $Š(T) = D_I \cdot (D_F)^d$ if $T$ is a leaf node, $Š(T) = \frac{1}{m} \sum_{j=1}^{m} w_j \cdot Š(T_j)$ otherwise. $Š(T)$: the simplicity value of a single tree $T$; $D_I$: depth degradation index; $D_F$: depth degradation factor; $d$: depth of a leaf node; $m$: root node degree of a tree $T$ that is not a leaf node; $w_j$: arc weight of the $j$-th outgoing arc from the root node of tree $T$; $T_j$: subtree connected to the $j$-th arc with arc weight $w_j$.
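A minimal Python sketch of a simplicity recursion built from the quantities defined on this slide is given below. The formula on the slide was lost in extraction, so the form used here (leaf value `d_i * d_f ** depth`, inner value the weighted sum of child values divided by the root degree `m`) is an assumption. The `Node` class, the parameter names, and the default values `d_i = 1.0` and `d_f = 0.5` are also hypothetical; `d_i = 1.0` is chosen so that a single-node tree gets simplicity 1.0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    # Outgoing arcs as (arc_weight, child) pairs; empty for a leaf node.
    arcs: List[Tuple[float, "Node"]] = field(default_factory=list)


def simplicity(tree: Node, d_i: float = 1.0, d_f: float = 0.5, depth: int = 0) -> float:
    """Assumed recursion: degrade with depth at leaves, average weighted children otherwise."""
    if not tree.arcs:
        # Leaf: depth degradation index times depth degradation factor to the power d.
        return d_i * d_f ** depth
    m = len(tree.arcs)  # root node degree of this (sub)tree
    return sum(w * simplicity(child, d_i, d_f, depth + 1) for w, child in tree.arcs) / m


single = Node("A")
two_level = Node("A", [(0.5, Node("B")), (0.5, Node("C"))])
print(simplicity(single))     # 1.0
print(simplicity(two_level))  # 0.25
```

Dividing by `m` penalizes breadth and the power of `d_f` penalizes depth, which matches the qualitative statement that wider and deeper trees are less simple.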
9 Tree Simplicity: Extreme Cases The simplest tree is a single-node tree, with simplicity 1.0. The most complex tree is an infinite tree, for which Š(T) = 0 (our previous tree complexity measure made the tree complexity go toward infinity in this case).
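10 Tree Simplicity: Example [Figure: tree with root A and children B, C, D, E; B has children F, G, H; E has children I, J; arcs are labeled b through j.] $Š(T) = \frac{1}{4}\,(0.2 \cdot Š(T_B) + 0.1 \cdot Š(T_C) + 0.3 \cdot Š(T_D) + 0.4 \cdot Š(T_E))$, $Š(T_B) = \frac{1}{3}\,(0.8 \cdot Š(T_F) + 0.1 \cdot Š(T_G) + 0.1 \cdot Š(T_H))$, $Š(T_E) = \frac{1}{2}\,(0.5 \cdot Š(T_I) + 0.5 \cdot Š(T_J))$. The numeric results shown on the slide did not survive extraction.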
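The example tree can be evaluated exactly with a short Python sketch. Several things here are assumptions rather than values from the slide: the recursion itself (leaf value `D_I * D_F ** depth`, inner value the weighted sum of child values divided by the node degree), the constants `D_I = 1` and `D_F = 1/2`, and the weight 0.1 on the arc to H, which is inferred from the other two weights under B summing to 0.9. The results therefore illustrate the arithmetic only and are not the numbers reported in the talk.

```python
from fractions import Fraction as F

D_I = F(1)     # assumed depth degradation index
D_F = F(1, 2)  # assumed depth degradation factor


def simplicity(arcs, depth=0):
    """arcs: list of (arc_weight, child_arcs) pairs; an empty list is a leaf node."""
    if not arcs:
        return D_I * D_F ** depth
    return sum(w * simplicity(child, depth + 1) for w, child in arcs) / len(arcs)


LEAF = []
T_B = [(F("0.8"), LEAF), (F("0.1"), LEAF), (F("0.1"), LEAF)]  # children F, G, H
T_E = [(F("0.5"), LEAF), (F("0.5"), LEAF)]                     # children I, J
T = [(F("0.2"), T_B), (F("0.1"), LEAF), (F("0.3"), LEAF), (F("0.4"), T_E)]  # B, C, D, E

# Subtrees B and E hang one level below the root, so they start at depth 1.
print(simplicity(T_B, depth=1))  # 1/12
print(simplicity(T_E, depth=1))  # 1/8
print(simplicity(T))             # 1/15
```

Exact fractions are used so that the intermediate values can be checked by hand against the three equations.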
11 Tree Simplicity: Analysis of k-ary trees [Figure: k-ary tree with root A, children A_1 to A_k, arc labels l_1 to l_k, arc weight 1/k.] k-ary trees: each node (except the leaf nodes) has k children; each path from the root to a leaf node has the same length.
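For such a tree the simplicity collapses to a closed form. The derivation below is a sketch under assumptions: it takes the simplicity of a leaf at depth $d$ to be $D_I (D_F)^d$, the simplicity of a non-leaf tree to be $\frac{1}{m}\sum_j w_j \check{S}(T_j)$, and every arc weight to be $1/k$. The symbol $\check{S}_\ell$ (the common value of all subtrees rooted at level $\ell$) is introduced here only for the derivation.

```latex
\begin{aligned}
\check{S}_d    &= D_I \,(D_F)^{d}
  && \text{(all leaves sit at depth } d\text{)} \\
\check{S}_\ell &= \frac{1}{k}\sum_{j=1}^{k} \frac{1}{k}\,\check{S}_{\ell+1}
                = \frac{1}{k}\,\check{S}_{\ell+1}
  && (0 \le \ell < d) \\
\check{S}(T)   &= \check{S}_0
                = \frac{D_I\,(D_F)^{d}}{k^{d}}
                = D_I \left(\frac{D_F}{k}\right)^{d}
\end{aligned}
```

Under these assumptions simplicity decays geometrically in the depth $d$ and polynomially in the degree $k$, and for $k = 1$, $d = 0$ it reduces to $D_I$.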
12 Computational Results Simplicity as a function of depth d
13 Computational Results Simplicity as a function of root-node degree k
14 Computational Results Simplicity as a function of root-node degree k and depth d
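The plots themselves are not recoverable, but values of this kind can be tabulated with a few lines of Python. The sketch assumes a k-ary tree with equal arc weights 1/k, for which a leaf value of `d_i * d_f ** d` and a division by the degree at every inner node give the closed form `d_i * (d_f / k) ** d`. The function name `kary_simplicity` and the defaults `d_i = 1.0`, `d_f = 0.5` are hypothetical, so the printed numbers are not the ones from the talk.

```python
def kary_simplicity(k: int, d: int, d_i: float = 1.0, d_f: float = 0.5) -> float:
    """Assumed closed form for a k-ary tree of depth d with arc weights 1/k."""
    return d_i * (d_f / k) ** d


degrees = (1, 2, 3, 4)
print("d\\k " + "  ".join(f"{k:>8d}" for k in degrees))
for d in range(0, 5):
    row = "  ".join(f"{kary_simplicity(k, d):8.5f}" for k in degrees)
    print(f"{d:<4d}{row}")
```

Each row fixes the depth and each column fixes the degree, so reading along a row or down a column shows how simplicity falls as the tree gets wider or deeper.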
15 Use Case: Learning Object Tree [Figure: arc-labeled, arc-weighted tree rooted at lom. Labels: general, general-set, title, language, keyword, keyword-set; classification, classification-set, taxonpath, taxon, taxon-set; lifeCycle, lifeCycle-set, contribute, Contribute-set, entity; educational, educational-set, language, learningResourceType, context, intendedEndUserRole, typicalAgeRange; technical, technical-set, format, otherPlatformRequirements; rights, rights-set, copyrightAndOtherRestrictions, descriptions. Values: Introduction To Oracle, en, database, Applied Sciences, Technology, course, Learner, HTML, oracle, Yes, Don’t care. Arc weights shown: 1.0, 0.2, 0.5.]
16 Use Case: Learner Query Tree [Figure: arc-labeled, arc-weighted tree rooted at lom. Labels: classification, classification-set, taxonpath, taxon, taxon-set; lifeCycle, lifeCycle-set, contribute, Contribute-set, entity; educational, educational-set, language, learningResourceType, context, intendedEndUserRole, typicalAgeRange; technical, technical-set, format, otherPlatformRequirements; rights, rights-set, copyrightAndOtherRestrictions, descriptions. Values: Applied Sciences, Technology, course, Learner, HTML, No. Arc weights shown: 1.0, 0.2, 0.5. No general branch appears in this tree.] The slide reports a Similarity value and a Simplicity value; neither number survived extraction.
17 Conclusion Arc-labeled, arc-weighted tree representations for buyer-seller product descriptions. Tree similarity measure for buyer-seller matching; allows partial product descriptions. Tree simplicity measure for matching partial product descriptions: simpler missing (sub)trees lead to greater simplicity values and thus greater similarity values. Applications: eduSource (learner and learning object matching); Teclantic (project profile matching).
18 Thank you! Questions?