Web Couple: Coupling web information

1 Web Couple: Coupling web information
Sourav Bhowmick, Center of Advanced Information Systems, 1/2/2019

2 Why web coupling? Related information on the web is supplied by different information providers. Web documents containing similar information can reside in different web tables in the web database.

3 Why web coupling? Directly querying the WWW to gather this information is expensive and repetitive, since the information is already materialized in different web tables in the web database. There should be a means of gathering this similar information by further manipulating the materialized web tables.

4 Why web coupling? The web couple operator gives us the capability to manipulate these web tables to harness useful related information.

5 Web Couple Operator The web couple operator is a composite operator.
It is a combination of a web cartesian product followed by a web select. In our web database, a web cartesian product followed by a web select is a frequently used operation, which motivates a separate composite operator to handle it.
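The composition described on this slide can be sketched in a few lines. This is only an illustration under simplifying assumptions: a web table is modeled as a list of Python dictionaries, and the function names `web_cartesian_product`, `web_select`, and `web_couple` are hypothetical, not taken from the presentation.

```python
from itertools import product


def web_cartesian_product(table1, table2):
    """Pair every web tuple of table1 with every web tuple of table2."""
    return list(product(table1, table2))


def web_select(pairs, predicate):
    """Keep only the paired tuples that satisfy the predicate."""
    return [(t1, t2) for t1, t2 in pairs if predicate(t1, t2)]


def web_couple(table1, table2, predicate):
    """Composite operator: cartesian product followed by select."""
    return web_select(web_cartesian_product(table1, table2), predicate)


# Hypothetical data: each web tuple holds a single node URL.
table1 = [{"x": "http://a.example/computer/"}, {"x": "http://a.example/books/"}]
table2 = [{"y": "http://a.example/computer/"}]

coupled = web_couple(table1, table2, lambda t1, t2: t1["x"] == t2["y"])
```

Packaging the two steps as one function mirrors the motivation given above: the pair of operations is used often enough to deserve a single name.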

6 Some notations Let W be a web table whose schema includes a set of predicates P.
Let p be a predicate of P. Each predicate p has an argument, an attribute, and an operator; the operator in the predicate is written op(p).

7 Some notations Val(p) is the operand of op(p).

8 Web Couple operator Web couple gathers similar web documents or information from two web tables. Two web tuples, one from each table, can be coupled if there exists at least one pair of nodes, one from each tuple, that contain similar information.

9 Web Couple operator The web couple operator is essentially a web cartesian product followed by a web select, and it is denoted by its own operator symbol.

10 Definitions Coupling nodes: We define coupling nodes as the node variables participating in the web coupling. We express the coupling nodes of two web schemas as a pair, i.e., (c, z), since a coupling node cannot exist as a single node variable.

11 Definitions One coupling node variable can appear in more than one pair; that is, the pairs of coupling nodes are not necessarily disjoint. The attribute of the coupling node, as defined in the predicate of the node, is called the coupling attribute. The predicate is called the coupling predicate.

12 Web Coupling

13 Types of web coupling Single node coupling: Web coupling in which only one node variable in each schema is involved. Multinode coupling: Web coupling in which more than one node variable in each schema participates.

14 Types of web coupling System driven web coupling: In this case the system decides which node variables are to be coupled (the coupling nodes). If not even one pair of coupling nodes can be identified, the web tables cannot be coupled.

15 System driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 AT SCHEMA/TUPLE

16 Types of web coupling User driven web coupling: In this case the user decides which node variables are to be coupled (the coupling nodes). Coupling is performed only on those user-specified node variable(s).

17 User driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON NODES (x.TABLE1, y.TABLE2) AT SCHEMA/TUPLE

18 Types of web coupling Attribute driven web coupling: In this case the user specifies the coupling attributes. Coupling is performed only on those user-specified coupling attribute(s).

19 Attribute driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON ATTRIBUTE “TEXT” AT SCHEMA/TUPLE (optional)

20 Types of web coupling Value driven web coupling: In this case the user specifies the values of the node attributes on which coupling should be performed. Coupling is performed only on those user-specified attribute values.

21 Value driven web coupling
COUPLE TABLE3 FROM TABLE1 AND TABLE2 ON VALUE “Software Agents” AT SCHEMA/TUPLE (optional)

22 Levels of web coupling Schema level web coupling.
Tuple level web coupling.

23 Schema level web coupling
We inspect the schemas to decide whether the two web tables can be coupled. If no coupling condition can be identified, the two web tables cannot be coupled. We do not inspect the web tuples in the web tables.

24 Schema level web coupling
Let n and m be the numbers of web tuples in the two input web tables. Then the coupled web table produced by schema level web coupling always has n*m web tuples.

25 Tuple level web coupling
We inspect the web tuples of the two input web tables to identify nodes with similar information. The number of web tuples in the coupled web table is at most n*m, where n and m are the numbers of web tuples in the two input web tables.
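The difference in result size between the two levels can be illustrated with a small sketch. It assumes a web tuple is modeled as a list of node titles and that "similar" means a case-insensitive match; the helpers `couple_schema_level`, `couple_tuple_level`, and `similar` are hypothetical names, not part of the presentation.

```python
from itertools import product


def couple_schema_level(table1, table2):
    """Schema level: tuples are not inspected, so every pairing is kept."""
    return list(product(table1, table2))


def couple_tuple_level(table1, table2, similar):
    """Tuple level: keep a pairing only if some pair of nodes is similar."""
    return [
        (t1, t2)
        for t1, t2 in product(table1, table2)
        if any(similar(a, b) for a in t1 for b in t2)
    ]


def similar(a, b):
    # Assumption: similarity is a case-insensitive title match.
    return a.lower() == b.lower()


# Hypothetical tables with n = 2 and m = 3 web tuples.
table1 = [["Software Agents", "Home"], ["Databases"]]
table2 = [["software agents"], ["Networks"], ["Compilers"]]

schema_result = couple_schema_level(table1, table2)          # n*m pairings
tuple_result = couple_tuple_level(table1, table2, similar)   # at most n*m
```

With this data the schema level result has all six pairings, while the tuple level result keeps only the one pairing whose nodes match.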

26 Why two levels? A schema does not capture all the information in the web documents of a web table, so it is not always possible to identify a coupling condition by inspecting the schemas. Coupling nodes may exist that are not defined in the schemas.

27 Why two levels? Tuple level coupling gives us a means to correlate web documents containing similar information (which cannot be identified from the schemas) at the expense of additional processing.

28 Conditions for web coupling
The coupling nodes are and

29 Conditions for web coupling
The coupling nodes are and

30 Conditions for web coupling
The coupling nodes are and

31 Conditions for web coupling
The coupling nodes are and

32 Conditions for web coupling
The coupling nodes are and

33 Conditions for web coupling
The coupling nodes are and

34 Conditions for web coupling
The coupling nodes are and For example: computer.html

35 Conditions for web coupling
The coupling nodes are and

36 Conditions for web coupling
URLs with the same directory name, such as “/computer/”, may contain similar information. Paths containing “/cgi-bin/” are not considered. All the conditions for web join apply as well.
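The directory-name condition can be sketched as follows. This is an illustration only: the helper names `directory_names` and `may_couple_by_directory` are hypothetical, the example URLs are invented, and treating the last path segment as a file name whenever the path does not end in "/" is an assumption.

```python
from urllib.parse import urlsplit


def directory_names(url):
    """Return the set of directory names in the URL path."""
    path = urlsplit(url).path
    parts = [p for p in path.split("/") if p]
    # Assumption: a path that does not end in "/" ends with a file name.
    if parts and not path.endswith("/"):
        parts = parts[:-1]
    return set(parts)


def may_couple_by_directory(url1, url2):
    """True if the URLs share a directory name and neither uses cgi-bin."""
    dirs1, dirs2 = directory_names(url1), directory_names(url2)
    if "cgi-bin" in dirs1 or "cgi-bin" in dirs2:
        return False
    return bool(dirs1 & dirs2)
```

For example, `http://a.example/computer/index.html` and `http://b.example/shop/computer/list.html` share the directory name `computer`, so the check returns `True`; any URL under `/cgi-bin/` makes it return `False`.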

37 Construction of coupled schema (schema level)
Two cases: (1) at least one pair of coupling nodes is identical (same URL); (2) no pair of coupling nodes is identical.

38 Case 1 If there exists at least one pair of coupling nodes that are identical to one another, we construct the coupled schema as discussed in the web join paper.

39 Case 2

40 Coupling Strength Coupling strength measures the degree of similarity between two coupling nodes. Hot tuple: A tuple is considered hot if its coupling strength is high enough relative to a cutoff called the hotness threshold.

41 Coupling Strength Hot tuples are tuples with a high degree of similarity between the coupling nodes. Hot table factor: the ratio of the number of hot tuples to the total number of tuples in the web table.
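A minimal sketch of the hot table factor follows. The exact comparison against the hotness threshold is not preserved in this transcript, so the code assumes "greater than or equal to"; the function names `is_hot` and `hot_table_factor` and the sample strengths are hypothetical.

```python
def is_hot(strength, threshold):
    # Assumption: a tuple is hot when its strength is >= the threshold.
    return strength >= threshold


def hot_table_factor(strengths, threshold):
    """Ratio of hot tuples to all tuples; 0.0 for an empty table."""
    if not strengths:
        return 0.0
    hot_count = sum(1 for s in strengths if is_hot(s, threshold))
    return hot_count / len(strengths)


# Hypothetical coupling strengths for a table of four web tuples.
factor = hot_table_factor([0.9, 0.2, 0.7, 0.4], threshold=0.5)
```

Here two of the four tuples meet the threshold, so the factor is 0.5.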

42 Coupling Strength Ranking by coupling strength lets the user see the tuples with a high degree of similar information (the hot tuples) first, since all hot tuples are ranked above the other tuples. The hot tuples can be viewed without scanning the whole table.
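The ranking idea can be sketched as a sort followed by an early stop. The ranking function itself is listed as an open issue in this presentation, so sorting by descending strength is only an assumption, as are the names `rank_by_strength` and `hot_prefix` and the "greater than or equal to" threshold test.

```python
from itertools import takewhile


def rank_by_strength(scored_tuples):
    """Sort (tuple_id, strength) pairs by descending coupling strength."""
    return sorted(scored_tuples, key=lambda pair: pair[1], reverse=True)


def hot_prefix(ranked, threshold):
    """Read hot tuples from the top of the ranking, stopping at the first cold one."""
    # Assumption: hot means strength >= threshold.
    return list(takewhile(lambda pair: pair[1] >= threshold, ranked))


# Hypothetical tuple identifiers and coupling strengths.
ranked = rank_by_strength([("t1", 0.2), ("t2", 0.9), ("t3", 0.7)])
hot = hot_prefix(ranked, threshold=0.5)
```

Because the hot tuples form a prefix of the ranked list, `hot_prefix` stops as soon as it meets a tuple below the threshold instead of scanning the rest of the table.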

43 Coupling Ratio The coupling ratio is the ratio of the number of pairs of coupling nodes to the total number of possible pairs of nodes in the web tuple.
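As a worked illustration of the ratio, the sketch below assumes that the "possible pairs of nodes" are all pairings of one node from each of the two tuples being coupled, which gives a product of the two node counts. That reading, and the function name `coupling_ratio`, are assumptions; the original formula is not preserved in this transcript.

```python
def coupling_ratio(num_coupling_pairs, nodes_in_first, nodes_in_second):
    """Coupling pairs divided by all possible cross pairs of nodes."""
    # Assumption: possible pairs = one node from each tuple.
    possible_pairs = nodes_in_first * nodes_in_second
    if possible_pairs == 0:
        return 0.0
    if num_coupling_pairs > possible_pairs:
        raise ValueError("more coupling pairs than possible pairs")
    return num_coupling_pairs / possible_pairs


# Hypothetical tuples with 3 and 4 nodes, of which 3 pairs are coupling nodes.
ratio = coupling_ratio(3, 3, 4)
```

In this example 3 of the 12 possible pairs are coupling nodes, giving a ratio of 0.25.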

44 Coupling Ratio A higher coupling ratio signifies that the tuples participating in the coupling contain a higher degree of similar information.

45 Issues Construction of coupled schema at the tuple level.
How to calculate the coupling strength? What is the ranking function? Algorithm for ranking coupled tuples. Properties of web couple operator. Difference between web couple and web join.

