Web Couple: Coupling web information Sourav Bhowmick Center of Advanced Information Systems 1/2/2019
Why web coupling? Related information on the web is supplied by different information providers. Web documents containing similar information can reside in different web tables in the web database.
Why web coupling? Directly querying the WWW to gather this information is expensive and repetitive, since the information is already materialized in different web tables in the web database. There should be a means to gather this similar information by further manipulating the materialized web tables.
Why web coupling? The web couple operator lets us manipulate these web tables to harness useful related information.
Web Couple Operator Web couple is a composite operator: a web cartesian product followed by a web select. In our web database, a web cartesian product followed by a web select is a frequently used operation, which motivates a separate composite operator to handle it.
Some notations Let W be a web table with schema Let p be a predicate of P such that: is the argument of the predicate; is an attribute; op(p) is the operator in the predicate;
Some notations Val(p) is the operand of op(p).
Web Couple operator Web couple gathers similar web documents or information from two web tables. Two web tuples, one from each table, can be coupled if there exists at least one pair of nodes, one from each tuple, that contain similar information.
Web Couple operator The web couple operator is basically a web cartesian product followed by web select: We denote web couple by the symbol:
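The composition described above can be sketched in a few lines. This is only an illustration under stated assumptions: a web tuple is modeled as a list of nodes, a node as a dictionary of attributes, and the names `web_cartesian_product`, `web_select`, `web_couple`, and `share_title` are hypothetical helpers, not the operators' actual definitions.

```python
from itertools import product


def web_cartesian_product(table1, table2):
    """Pair every web tuple of table1 with every web tuple of table2."""
    return list(product(table1, table2))


def web_select(pairs, predicate):
    """Keep only the tuple pairs that satisfy the selection predicate."""
    return [pair for pair in pairs if predicate(*pair)]


def web_couple(table1, table2, predicate):
    """Composite operator: cartesian product followed by select."""
    return web_select(web_cartesian_product(table1, table2), predicate)


def share_title(tuple1, tuple2):
    """Hypothetical coupling predicate: some node pair has the same title."""
    titles1 = {node["title"] for node in tuple1}
    titles2 = {node["title"] for node in tuple2}
    return bool(titles1 & titles2)


# Toy data (assumption): each web tuple is a list of nodes.
table1 = [[{"title": "Software Agents"}], [{"title": "Databases"}]]
table2 = [[{"title": "Software Agents"}], [{"title": "Networks"}]]

coupled = web_couple(table1, table2, share_title)
print(len(web_cartesian_product(table1, table2)), len(coupled))
```

Keeping the product and the select as separate functions mirrors the composite nature of the operator; a real implementation would likely fuse the two steps to avoid materializing every pair.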
Definitions Coupling Nodes: We define coupling nodes as the node variables participating in the web coupling. We express the coupling nodes of two web schemas as a pair, i.e., (c, z), since a coupling node cannot exist as a single node variable.
Definitions One coupling node variable can appear in more than one pair. That is, the pairs of coupling nodes are not necessarily disjoint. The attribute of the coupling node, as defined in the predicate of the node, is called the coupling attribute. The predicate is called the coupling predicate.
Web Coupling
Types of web coupling Single node coupling: web coupling in which only one node variable in each schema is involved. Multinode coupling: web coupling in which more than one node variable in each schema participates.
Types of web coupling System driven web coupling: In this case the system decides which node variables are to be coupled (coupling nodes). If not even one pair of coupling nodes can be identified, the web tables cannot be coupled.
System driven web coupling COUPLE TABLE3 FROM TABLE1 AND TABLE2 AT SCHEMA/TUPLE
Types of web coupling User driven web coupling: In this case the user decides which node variables are to be coupled (coupling nodes). Coupling is performed only on those user-specified node variable(s).
User driven web coupling COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON NODES (x.TABLE1, y.TABLE2) AT SCHEMA/TUPLE
Types of web coupling Attribute driven web coupling: In this case the user specifies the coupling attributes. Coupling is performed only on those user-specified coupling attribute(s).
Attribute driven web coupling COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON ATTRIBUTE “TEXT” AT SCHEMA/TUPLE (optional)
Types of web coupling Value driven web coupling: In this case the user specifies the values of the node attributes on which coupling should be performed. Coupling is performed only on those user-specified attribute values.
Value driven web coupling COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON VALUE “Software Agents” AT SCHEMA/TUPLE (optional)
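To make the four statement forms easier to compare, here is a rough sketch of a parser that classifies a COUPLE statement by its coupling type. The grammar is an assumption inferred only from the four examples: SCHEMA/TUPLE is read as a choice between two keywords, the AT clause is treated as optional everywhere, and plain double quotes are expected around attribute names and values. `parse_couple` and `COUPLE_RE` are hypothetical names.

```python
import re

# Assumed grammar, inferred from the four example statements only.
COUPLE_RE = re.compile(
    r'''
    ^COUPLE\s+(?P<target>\w+)
    \s+FROM\s+(?P<left>\w+)\s+AND\s+(?P<right>\w+)
    (?:\s+ON\s+(?P<kind>NODES|ATTRIBUTE|VALUE)\s+(?P<arg>\(.*?\)|".*?"))?
    (?:\s+AT\s+(?P<level>SCHEMA|TUPLE))?
    \s*$
    ''',
    re.VERBOSE,
)

# No ON clause means the system picks the coupling nodes.
MODES = {None: "system", "NODES": "user", "ATTRIBUTE": "attribute", "VALUE": "value"}


def parse_couple(statement):
    """Split a COUPLE statement into target, inputs, mode, argument, level."""
    match = COUPLE_RE.match(statement.strip())
    if match is None:
        raise ValueError(f"not a recognized COUPLE statement: {statement!r}")
    arg = match.group("arg")
    return {
        "target": match.group("target"),
        "inputs": (match.group("left"), match.group("right")),
        "mode": MODES[match.group("kind")],
        "argument": arg.strip('()"') if arg else None,
        "level": match.group("level"),
    }


print(parse_couple('COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON VALUE "Software Agents" AT TUPLE'))
```

A statement without an ON clause is classified as system driven, which matches the first form shown above.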
Levels of web coupling Schema level web coupling. Tuple level web coupling.
Schema level web coupling We inspect the schemas to decide whether the two web tables can be coupled. If coupling conditions cannot be identified then the two web tables cannot be coupled. We do not inspect the web tuples in the web table.
Schema level web coupling Let n and m be the number of web tuples of the two input web tables. Then the coupled web table based on schema level web coupling will always have n*m web tuples.
Tuple level web coupling We inspect the web tuples of the two input web tables to identify nodes with similar information. The number of web tuples in the coupled web table is at most n*m.
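The difference in result size between the two levels can be illustrated with a toy example. The sketch assumes the schemas have already been judged coupleable, models each web tuple as a list of keyword strings, and uses plain string equality as a stand-in for "similar information"; all names are hypothetical.

```python
from itertools import product


def schema_level_couple(table1, table2):
    """Tuples are not inspected, so every pairing is kept: n*m results."""
    return list(product(table1, table2))


def tuple_level_couple(table1, table2, similar):
    """Keep a pairing only if some node pair holds similar information."""
    return [
        (t1, t2)
        for t1, t2 in product(table1, table2)
        if any(similar(a, b) for a in t1 for b in t2)
    ]


def same_keyword(a, b):
    """Stand-in similarity test (assumption): case-insensitive equality."""
    return a.lower() == b.lower()


table1 = [["agents", "java"], ["databases"]]          # n = 2
table2 = [["agents"], ["xml"], ["databases", "sql"]]  # m = 3

schema_result = schema_level_couple(table1, table2)
tuple_result = tuple_level_couple(table1, table2, same_keyword)
print(len(schema_result), len(tuple_result))
```

Here the schema level result has 2*3 = 6 tuples, while the tuple level result keeps only the two pairings that actually share a keyword.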
Why two levels? A schema does not capture all the information of the web documents in a web table. Thus it is not always possible to identify a coupling condition by inspecting the schemas. Coupling nodes that are not defined in the schemas may still exist.
Why two levels? Tuple level coupling gives us a means to correlate web documents containing similar information from the web tables (which cannot be identified from their schemas) at the expense of additional processing.
Conditions for web coupling The coupling nodes are and
Conditions for web coupling The coupling nodes are and For example: computer.html
Conditions for web coupling URLs with the same directory name, such as “/computer/”, may contain similar information. Paths with “/cgi-bin/” are not considered. Include all conditions for web join.
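The directory name condition can be sketched as a small check on two URLs. This is one possible reading, stated as an assumption: two URLs qualify if their paths share at least one directory component, and any URL whose path passes through "cgi-bin" is excluded outright. The function names and the example hosts are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def directory_names(url):
    """Return the set of directory components in the URL path."""
    directory = posixpath.dirname(urlparse(url).path)
    return {part for part in directory.split("/") if part}


def share_directory_name(url1, url2, excluded=("cgi-bin",)):
    """True if both URLs have a common directory name and neither is excluded."""
    names1, names2 = directory_names(url1), directory_names(url2)
    skip = set(excluded)
    if names1 & skip or names2 & skip:
        return False  # assumption: cgi-bin paths are never coupled
    return bool(names1 & names2)


print(share_directory_name(
    "http://a.example/computer/index.html",
    "http://b.example/shop/computer/list.html",
))
```

Only the directory part of the path is compared, so file names such as `index.html` do not influence the result.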
Construction of coupled schema (schema level) When at least one pair of coupling nodes is identical (same URL). When none of the pairs is identical.
Case 1 If at least one pair of coupling nodes is identical, we construct the coupled schema as discussed in the web join paper.
Case 2
Coupling Strength Measures the degree of similarity between two coupling nodes. Hot tuple: A tuple is considered hot if its coupling strength passes a cutoff called the hotness threshold.
Coupling Strength Hot tuples are tuples with a high degree of similarity between the coupling nodes. Hot table factor: the ratio of the number of hot tuples to the total number of tuples in the web table.
Coupling Strength Ranking by coupling strength lets the user see the tuples with a high degree of similar information (hot tuples) first, since all hot tuples are ranked above the other tuples. The hot tuples can be viewed without scanning the whole table.
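The hot tuple test, the hot table factor, and the ranking can be sketched as follows. How coupling strength is computed is left open here, so the sketch takes strengths as given numbers; the non-strict comparison against the threshold and all function names are assumptions for illustration.

```python
def is_hot(strength, threshold):
    """Assumption: a tuple is hot when its strength is at or above the threshold."""
    return strength >= threshold


def hot_table_factor(strengths, threshold):
    """Ratio of hot tuples to all tuples in the coupled web table."""
    if not strengths:
        return 0.0
    hot = sum(1 for s in strengths if is_hot(s, threshold))
    return hot / len(strengths)


def rank_by_strength(tuples_with_strength):
    """Order (tuple_id, strength) pairs so the strongest come first."""
    return sorted(tuples_with_strength, key=lambda item: item[1], reverse=True)


# Made-up strengths for four coupled tuples.
scored = [("t1", 0.2), ("t2", 0.9), ("t3", 0.7), ("t4", 0.4)]
ranked = rank_by_strength(scored)
factor = hot_table_factor([s for _, s in scored], threshold=0.5)
print(ranked, factor)
```

Because the list is sorted in descending order of strength, every hot tuple precedes every non-hot one, so a viewer can stop reading at the first tuple that falls below the threshold.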
Coupling Ratio The coupling ratio is the number of pairs of coupling nodes divided by the total number of possible pairs of nodes in the web tuple.
Coupling Ratio A higher coupling ratio signifies that the tuples participating in the coupling contain a high degree of similar information.
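As a small worked example of the ratio, the sketch below takes both counts as inputs rather than deriving them, since how the possible node pairs are enumerated is not spelled out here. The function name and the sample numbers are hypothetical.

```python
def coupling_ratio(coupling_pairs, possible_pairs):
    """Fraction of possible node pairs that are coupling node pairs."""
    if possible_pairs <= 0:
        raise ValueError("possible_pairs must be positive")
    if not 0 <= coupling_pairs <= possible_pairs:
        raise ValueError("coupling_pairs must lie between 0 and possible_pairs")
    return coupling_pairs / possible_pairs


# Made-up counts: 2 coupling node pairs out of 8 possible node pairs.
ratio = coupling_ratio(2, 8)
print(ratio)
```

The ratio always lies between 0 and 1, which makes values from different coupled tuples directly comparable.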
Issues Construction of coupled schema at the tuple level. How to calculate the coupling strength? What is the ranking function? Algorithm for ranking coupled tuples. Properties of web couple operator. Difference between web couple and web join.