1
Web Couple: Coupling web information
Sourav Bhowmick, Center of Advanced Information Systems, 1/2/2019
2
Why web coupling?
Related information on the web is supplied by different information providers. Web documents containing similar information can reside in different web tables in the web database.
3
Why web coupling?
Directly querying the WWW to gather this information is expensive and repetitive, since the information is already materialized in different web tables in the web database. There should be a means to gather such similar information by further manipulating the materialized web tables.
4
Why web coupling?
The web couple operator gives us the capability to manipulate these web tables to harness useful related information.
5
Web Couple Operator
The web couple operator is a composite operator: a web Cartesian product followed by a web select. In our web database, a web Cartesian product followed by a web select is a frequently used operation, which motivates a separate composite operator to handle it.
6
Some notations
Let W be a web table with schema S, and let P be the set of predicates in S.
Let p be a predicate in P such that: arg(p) is the argument of the predicate; attr(p) is an attribute; op(p) is the operator in the predicate;
7
Some notations
val(p) is the operand of op(p).
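A predicate can be pictured as a small record holding these four parts. The sketch below is a hypothetical Python rendering: the operator names EQUALS and CONTAINS and the dict-based node representation are assumptions for illustration, not the web query language's actual operator set.

```python
import operator
from dataclasses import dataclass

# Hypothetical operator table; the actual operators of the web query
# language are not listed in these slides.
OPS = {
    "EQUALS": operator.eq,
    "CONTAINS": lambda value, operand: operand in value,
}


@dataclass(frozen=True)
class Predicate:
    arg: str   # arg(p): the node variable the predicate constrains
    attr: str  # attr(p): the attribute of that node, e.g. "text"
    op: str    # op(p): the operator
    val: str   # val(p): the operand of op(p)

    def holds(self, node: dict) -> bool:
        """Evaluate the predicate on one node, given as {attribute: value}."""
        value = node.get(self.attr)
        return value is not None and OPS[self.op](value, self.val)


p = Predicate(arg="x", attr="text", op="CONTAINS", val="Software Agents")
print(p.holds({"text": "Research on Software Agents"}))  # True
```

A node lacking the attribute simply fails the predicate, which keeps evaluation total over heterogeneous web documents.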
8
Web Couple operator
Web couple gathers similar web documents or information from two web tables. A web tuple from one table and a web tuple from the other can be coupled if there exists at least one pair of nodes, one from each tuple, that contain similar information.
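The slides do not define "similar information". As a minimal sketch, assume each web tuple is reduced to the text of its nodes and that two nodes are similar when the word-set Jaccard similarity of their texts reaches a threshold (both the measure and the 0.5 default are assumptions). Two tuples can then be coupled if at least one cross pair of nodes is similar.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def can_couple(tuple1: list[str], tuple2: list[str], threshold: float = 0.5) -> bool:
    """True if some node of tuple1 and some node of tuple2 are similar enough."""
    return any(jaccard(n1, n2) >= threshold for n1 in tuple1 for n2 in tuple2)


# Each web tuple is reduced to the text of its nodes (a simplification).
t1 = ["software agents research", "home page"]
t2 = ["software agents survey", "contact us"]
t3 = ["database systems"]
print(can_couple(t1, t2))  # True: 2 shared words out of 4 gives 0.5
print(can_couple(t1, t3))  # False
```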
9
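Web Couple operator
The web couple operator is a web Cartesian product followed by a web select, i.e. $\sigma_{\theta}(W_1 \times W_2)$, where $W_1$ and $W_2$ are the two input web tables and $\theta$ is the coupling condition. We denote web couple by the symbol: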
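The composite definition translates directly into code. In this hypothetical sketch, web tables are lists of tuples and each tuple is reduced to a set of node URLs; the coupling condition used here (the two tuples share a node URL) is only an illustrative choice.

```python
from itertools import product


def web_cartesian_product(w1, w2):
    """Every pairing of a tuple from w1 with a tuple from w2 (lazy iterator)."""
    return product(w1, w2)


def web_select(pairs, condition):
    """Keep only the pairs that satisfy the coupling condition."""
    return [(t1, t2) for t1, t2 in pairs if condition(t1, t2)]


def web_couple(w1, w2, condition):
    """Web couple as a composite operator: web select over the web Cartesian product."""
    return web_select(web_cartesian_product(w1, w2), condition)


def shares_url(t1, t2):
    # Hypothetical coupling condition: the two tuples have a node URL in common.
    return bool(t1 & t2)


# Hypothetical web tables whose tuples are reduced to sets of node URLs.
w1 = [{"a.edu/agents.html", "a.edu/index.html"}, {"a.edu/db.html"}]
w2 = [{"a.edu/agents.html"}, {"b.com/music.html"}]
print(len(web_couple(w1, w2, shares_url)))  # 1
```

Because `itertools.product` yields pairs lazily, the selection filters them as they are produced and the full product is never stored, which is one practical benefit of treating the pair as a single composite operator.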
10
Definitions
Coupling nodes: We define coupling nodes as the node variables participating in the web coupling. We express the coupling nodes of two web schemas as a pair, e.g. (c, z), since a coupling node never exists as a single node variable: it always pairs a node variable of one schema with a node variable of the other.
11
Definitions
One coupling node variable can be in more than one pair; that is, the pairs in a set of coupling nodes need not be disjoint. The attribute of a coupling node, as defined in the predicate on that node, is called the coupling attribute, and that predicate is called the coupling predicate.
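A small illustration of both points, using hypothetical node variables c, d, w, y, z and a predicate table reduced to "node variable maps to attr(p)". It assumes node variable names are distinct across the two schemas so they can be counted together.

```python
from collections import Counter

# Hypothetical coupling-node pairs (node variable of schema 1, node variable of
# schema 2); "c" occurs in two pairs, so the pairs are not disjoint.
pairs = [("c", "z"), ("c", "y"), ("d", "w")]

# Hypothetical predicates, reduced to: node variable -> attr(p) of its predicate.
predicate_attr = {"c": "title", "d": "text", "z": "title", "y": "text", "w": "text"}


def shared_node_variables(pairs):
    """Node variables that take part in more than one coupling pair."""
    counts = Counter(var for pair in pairs for var in pair)
    return {var for var, n in counts.items() if n > 1}


def coupling_attributes(pairs, predicate_attr):
    """Coupling attribute of each node in each pair, read from its predicate."""
    return {(a, b): (predicate_attr[a], predicate_attr[b]) for a, b in pairs}


print(shared_node_variables(pairs))                             # {'c'}
print(coupling_attributes(pairs, predicate_attr)[("c", "y")])   # ('title', 'text')
```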
12
Web Coupling
13
Types of web coupling
Single node coupling: only one node variable in each schema is involved in the web coupling. Multinode coupling: more than one node variable in each schema participates in the web coupling.
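Given a set of coupling-node pairs, the two types can be told apart by counting the distinct node variables on each side. The slide does not cover the mixed case (one variable in one schema, several in the other); this hypothetical sketch treats it as multinode, which is an assumption.

```python
def coupling_type(pairs):
    """Classify a set of coupling-node pairs as single node or multinode coupling."""
    if not pairs:
        raise ValueError("no coupling nodes: the tables cannot be coupled")
    left = {a for a, _ in pairs}   # node variables from the first schema
    right = {b for _, b in pairs}  # node variables from the second schema
    if len(left) == 1 and len(right) == 1:
        return "single node"
    return "multinode"  # assumption: more than one variable on either side


print(coupling_type([("c", "z")]))              # single node
print(coupling_type([("c", "z"), ("d", "w")]))  # multinode
```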
14
Types of web coupling
System driven web coupling: the system decides which node variables are to be coupled (the coupling nodes). If not even one pair of coupling nodes can be identified, the web tables cannot be coupled.
15
System driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 AT SCHEMA/TUPLE
16
Types of web coupling
User driven web coupling: the user decides which node variables are to be coupled (the coupling nodes). Coupling is performed only on the user-specified node variable(s).
17
User driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON NODES (x.TABLE1, y.TABLE2) AT SCHEMA/TUPLE
18
Types of web coupling
Attribute driven web coupling: the user specifies the coupling attributes. Coupling is performed only on the user-specified coupling attribute(s).
19
Attribute driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON ATTRIBUTE “TEXT” AT SCHEMA/TUPLE (optional)
20
Types of web coupling
Value driven web coupling: the user specifies the values of the node attributes on which coupling should be performed. Coupling is performed only on those user-specified attribute values.
21
Value driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON VALUE “Software Agents” AT SCHEMA/TUPLE (optional)
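The four driven types differ only in which node pairs and attributes are examined. The sketch below works at the tuple level and is hypothetical throughout: a web tuple is reduced to "node variable maps to {attribute: value}", similarity is a placeholder case-insensitive equality, and the keyword arguments `nodes`, `attribute` and `value` stand in for the ON NODES, ON ATTRIBUTE and ON VALUE clauses.

```python
from itertools import product

# A web tuple reduced to: node variable -> {attribute: value}. Hypothetical.
WebTuple = dict[str, dict[str, str]]


def similar(a: str, b: str) -> bool:
    """Placeholder similarity test (case-insensitive equality), an assumption."""
    return a.strip().lower() == b.strip().lower()


def coupling_pairs(t1: WebTuple, t2: WebTuple, mode: str, *,
                   nodes=None, attribute=None, value=None) -> list[tuple[str, str]]:
    if mode not in {"system", "user", "attribute", "value"}:
        raise ValueError(f"unknown mode: {mode}")
    pairs = []
    for (v1, n1), (v2, n2) in product(t1.items(), t2.items()):
        if mode == "user" and (v1, v2) != nodes:
            continue  # user driven: only the node pair named in ON NODES
        shared = n1.keys() & n2.keys()
        if mode == "attribute":
            shared &= {attribute}  # attribute driven: only the named attribute
        for attr in sorted(shared):
            if mode == "value":  # value driven: both values contain the given value
                match = (value.lower() in n1[attr].lower()
                         and value.lower() in n2[attr].lower())
            else:  # system, user and attribute driven: compare for similarity
                match = similar(n1[attr], n2[attr])
            if match:
                pairs.append((v1, v2))
                break
    return pairs


t1 = {"x": {"title": "Software Agents", "text": "Agents research"}, "u": {"title": "Home"}}
t2 = {"y": {"title": "software agents", "text": "A survey of Software Agents"},
      "w": {"title": "Contact"}}
print(coupling_pairs(t1, t2, "system"))                          # [('x', 'y')]
print(coupling_pairs(t1, t2, "attribute", attribute="text"))     # []
print(coupling_pairs(t1, t2, "value", value="Software Agents"))  # [('x', 'y')]
```

If the returned list is empty, no pair of coupling nodes was found and, per the system driven rule, the tuples cannot be coupled.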
22
Levels of web coupling
Schema level web coupling.
Tuple level web coupling.
23
Schema level web coupling
We inspect only the schemas to decide whether the two web tables can be coupled; the web tuples in the web tables are not inspected. If coupling conditions cannot be identified, the two web tables cannot be coupled.
24
Schema level web coupling
Let n and m be the numbers of web tuples in the two input web tables. The coupled web table produced by schema level web coupling then always has n × m web tuples.
25
Tuple level web coupling
We inspect the web tuples of the two input web tables to identify nodes with similar information. The number of web tuples in the coupled web table is at most n × m.
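The difference in output size between the two levels can be seen in a few lines. This is a hypothetical sketch: schema level coupling is modelled as keeping every tuple pair once the schemas have yielded coupling conditions, and the tuple level test (the two tuples share a node URL) is an illustrative stand-in for "nodes with similar information".

```python
from itertools import product


def schema_level_couple(w1, w2):
    """Coupling decided from the schemas alone: every tuple pair is kept (n * m)."""
    return list(product(w1, w2))


def tuple_level_couple(w1, w2, can_couple):
    """Each tuple pair is inspected; only pairs with similar nodes are kept (<= n * m)."""
    return [(t1, t2) for t1, t2 in product(w1, w2) if can_couple(t1, t2)]


def share_node(t1, t2):
    # Hypothetical tuple-level condition: the two tuples contain a common node URL.
    return bool(t1 & t2)


w1 = [{"a.html"}, {"b.html"}]               # n = 2
w2 = [{"a.html"}, {"c.html"}, {"b.html"}]   # m = 3
print(len(schema_level_couple(w1, w2)))             # 6
print(len(tuple_level_couple(w1, w2, share_node)))  # 2
```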
26
Why two levels?
A schema does not capture all the information in the web documents of a web table, so it is not always possible to identify coupling conditions by inspecting the schemas alone. Coupling nodes may exist that are not defined in the schemas.
27
Why two levels?
Tuple level coupling gives us a means to correlate web documents in the web tables that contain similar information not identifiable from their schemas, at the expense of additional processing.
28
Conditions for web coupling
The coupling nodes are and
29
Conditions for web coupling
The coupling nodes are and
30
Conditions for web coupling
The coupling nodes are and
31
Conditions for web coupling
The coupling nodes are and
32
Conditions for web coupling
The coupling nodes are and
33
Conditions for web coupling
The coupling nodes are and
34
Conditions for web coupling
The coupling nodes are and For example: computer.html
35
Conditions for web coupling
The coupling nodes are and
36
Conditions for web coupling
URLs with the same directory name, such as “/computer/”, may contain similar information. Paths containing “/cgi-bin/” are not considered. All conditions for web join are also included.
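The URL heuristic can be sketched with the standard library's `urllib.parse.urlparse` and `posixpath`. Two assumptions are made here: "directory name" is taken to be the immediate parent directory of the document, and the check only proposes candidate coupling nodes rather than deciding coupling on its own.

```python
import posixpath
from urllib.parse import urlparse


def same_directory_name(url1: str, url2: str) -> bool:
    """Heuristic: both URLs sit in a directory with the same name, e.g. /computer/."""
    path1, path2 = urlparse(url1).path, urlparse(url2).path
    if "/cgi-bin/" in path1 or "/cgi-bin/" in path2:
        return False  # paths with /cgi-bin/ are not considered
    # Name of the immediate parent directory; "" for documents at the site root.
    dir1 = posixpath.basename(posixpath.dirname(path1))
    dir2 = posixpath.basename(posixpath.dirname(path2))
    return dir1 != "" and dir1 == dir2


print(same_directory_name("http://www.a.edu/computer/index.html",
                          "http://www.b.com/dept/computer/news.html"))  # True
```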
37
Construction of coupled schema (schema level)
Case 1: at least one pair of coupling nodes is identical (same URL). Case 2: none of the pairs is identical.
38
Case 1
If at least one pair of coupling nodes is identical, we construct the coupled schema as discussed in the web join paper.
39
Case 2
40
Coupling Strength
Coupling strength measures the degree of similarity between two coupling nodes. Hot tuple: a tuple t is considered hot if $s(t) \geq \tau$, where $s(t)$ is the coupling strength of t and $\tau$ is called the hotness threshold.
41
Coupling Strength
Hot tuples are tuples with a high degree of similarity between their coupling nodes. Hot table factor: the ratio of the number of hot tuples to the total number of tuples in the web table.
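How to calculate coupling strength is listed as an open issue, so this sketch starts from made-up strengths in [0, 1]. It assumes a tuple is hot when its strength is at least the threshold, and computes the hot table factor as hot tuples over all tuples.

```python
def hot_tuples(strengths: dict[str, float], threshold: float) -> set[str]:
    """Tuples whose coupling strength reaches the hotness threshold (assumed >=)."""
    return {t for t, s in strengths.items() if s >= threshold}


def hot_table_factor(strengths: dict[str, float], threshold: float) -> float:
    """Ratio of hot tuples to all tuples in the coupled web table."""
    if not strengths:
        return 0.0
    return len(hot_tuples(strengths, threshold)) / len(strengths)


# Made-up coupling strengths for four coupled tuples.
strengths = {"t1": 0.9, "t2": 0.4, "t3": 0.75, "t4": 0.1}
print(sorted(hot_tuples(strengths, 0.7)))  # ['t1', 't3']
print(hot_table_factor(strengths, 0.7))    # 0.5
```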
42
Coupling Strength
Ranking by coupling strength lets the user view the tuples containing a high degree of similar information (hot tuples) first, since all hot tuples are ranked above the other tuples. The hot tuples can thus be viewed without scanning the whole table.
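The ranking function is also an open issue; ranking by raw coupling strength is the simplest assumption. Once tuples are sorted in descending order of strength, every hot tuple precedes every non-hot one, so a reader can take hot tuples from the front and stop at the first non-hot tuple (here with `itertools.takewhile`).

```python
from itertools import takewhile


def rank_by_strength(strengths: dict[str, float]) -> list[tuple[str, float]]:
    """Coupled tuples in descending order of coupling strength."""
    return sorted(strengths.items(), key=lambda item: item[1], reverse=True)


def leading_hot_tuples(ranked: list[tuple[str, float]], threshold: float) -> list[str]:
    """Read hot tuples from the front of the ranking; stop at the first non-hot one."""
    return [t for t, _ in takewhile(lambda item: item[1] >= threshold, ranked)]


# Made-up coupling strengths for four coupled tuples.
strengths = {"t1": 0.9, "t2": 0.4, "t3": 0.75, "t4": 0.1}
ranked = rank_by_strength(strengths)
print(ranked[0])                        # ('t1', 0.9)
print(leading_hot_tuples(ranked, 0.7))  # ['t1', 't3']
```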
43
Coupling Ratio
The coupling ratio, denoted by $\rho$, is $\rho = k / N$, where $k$ is the number of pairs of coupling nodes and $N$ is the total number of possible pairs of nodes in the web tuple.
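A minimal sketch of the coupling ratio as the number of coupling-node pairs divided by the number of possible node pairs. The slide does not say how the possible pairs are counted; this sketch assumes every node of one constituent tuple is paired with every node of the other, so a coupled tuple built from tuples with 3 and 4 nodes has 12 possible pairs.

```python
def coupling_ratio(coupling_pairs: int, nodes_in_t1: int, nodes_in_t2: int) -> float:
    """Coupling-node pairs divided by all possible cross pairs of nodes (assumed count)."""
    total_pairs = nodes_in_t1 * nodes_in_t2  # every node of t1 paired with every node of t2
    if total_pairs == 0:
        raise ValueError("a web tuple must contain at least one node")
    if not 0 <= coupling_pairs <= total_pairs:
        raise ValueError("coupling pairs must lie between 0 and the number of possible pairs")
    return coupling_pairs / total_pairs


print(coupling_ratio(3, 3, 4))  # 0.25
```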
44
Coupling Ratio
A higher coupling ratio signifies that the tuples participating in the coupling contain a higher degree of similar information.
45
Issues
Construction of coupled schema at the tuple level.
How to calculate the coupling strength?
What is the ranking function?
Algorithm for ranking coupled tuples.
Properties of web couple operator.
Difference between web couple and web join.