Structural Web Search Using a Graph-Based Discovery System Nitish Manocha, Diane J. Cook, and Lawrence B. Holder University of Texas at Arlington


1 Structural Web Search Using a Graph-Based Discovery System
Nitish Manocha, Diane J. Cook, and Lawrence B. Holder
University of Texas at Arlington
cook@cse.uta.edu
http://www-cse.uta.edu/~cook

2 Structured Web Search
- Existing search engines use linear feature match
- Web contains structural information as well
- Hyperlink information
- Web viewed as a graph [Kleinberg]
- Subdue searches based on structure
- Use as foundation of a structural search engine
- Incorporation of WordNet allows for synonym match

3
- Discovers structural patterns in input graphs
- A substructure is a connected subgraph
- An instance of a substructure is a subgraph that is isomorphic to the substructure definition
- Pattern discovery, classification, clustering
[Figure: Input Database (R1, C1, T1-T4, S1-S4); Substructure S1 (graph form; labels: object, triangle, square, on, shape); Compressed Database (R1, C1)]

4 Subdue Algorithm
- Start with individual vertices
- Keep only best substructures on queue
- Expand substructure by adding edge/vertex
- Compress graph and repeat to generate hierarchical description
- Optional use of background knowledge

5 Inexact Graph Match
- Some variations may occur between instances
- Want to abstract over minor differences
- Difference = cost of transforming one graph to make it isomorphic to another
- Match if cost/size < threshold
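The match rule on this slide can be written down directly. The sketch below is only an illustration: the function names are made up here, the transformation cost is taken as a given number rather than computed, and the slide does not say how "size" is measured, so it is treated as an opaque positive integer. The size of 21 in the usage note is a hypothetical value, picked so that a threshold of 0.2 allows a cost of up to 4.2.

```python
def is_inexact_match(transform_cost: float, graph_size: int, threshold: float) -> bool:
    """Apply the slide's rule: match if cost / size < threshold.

    transform_cost is assumed to be computed elsewhere (the cost of the
    edits that make one graph isomorphic to the other).
    """
    if graph_size <= 0:
        raise ValueError("graph_size must be positive")
    return transform_cost / graph_size < threshold


def max_allowed_cost(graph_size: int, threshold: float) -> float:
    """Cost bound implied by the rule: costs strictly below this value match."""
    return threshold * graph_size
```

For a hypothetical graph of size 21 and threshold 0.2, `max_allowed_cost(21, 0.2)` is about 4.2, so a cost of 4 matches and a cost of 5 does not.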

6 Application Domains
- Protein data
- Human Genome DNA data
- Spatial-temporal domains
- Earthquake data
- Aircraft Safety and Reporting System
- Telecommunications data
- Program source code
- Web data

7 Represent Web as Graph
- Breadth-first search of domain to generate graph
- Nodes represent pages / documents
- Edges represent hyperlinks
- Additional nodes represent document keywords
[Figure: example graph with "page" vertices, "hyperlink" edges between pages, and "word" edges to keyword vertices: university, texas, learning, group, projects, subdue, robotics, parallel, work, planning]
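A minimal sketch of the graph construction described on this slide, under stated assumptions. It does no network access: `TOY_SITE` is a hypothetical in-memory stand-in for a crawled domain, and giving every keyword occurrence its own vertex is a choice made here for simplicity, not something the slide specifies.

```python
from collections import deque

# Hypothetical toy site: URL -> (outgoing links, keywords found on the page).
TOY_SITE = {
    "/index.html": (["/projects.html", "/people.html"], ["university", "texas"]),
    "/projects.html": (["/index.html"], ["subdue", "robotics"]),
    "/people.html": ([], ["group"]),
}


def build_web_graph(site, start):
    """Breadth-first traversal from `start`, producing a labeled graph.

    Returns (vertices, edges, page_id): vertices is a list of labels where
    the vertex id is the list index + 1, edges is a list of
    (source_id, target_id, label) triples, and page_id maps URL -> vertex id.
    """
    vertices = []
    edges = []

    def add_vertex(label):
        vertices.append(label)
        return len(vertices)

    page_id = {start: add_vertex("page")}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        links, words = site[url]
        for word in words:
            # One keyword vertex per occurrence (an assumption of this sketch).
            edges.append((page_id[url], add_vertex(word), "word"))
        for target in links:
            if target not in site:
                continue  # link leaves the domain; ignore it
            if target not in page_id:
                page_id[target] = add_vertex("page")
                queue.append(target)
            edges.append((page_id[url], page_id[target], "hyperlink"))
    return vertices, edges, page_id
```

Calling `build_web_graph(TOY_SITE, "/index.html")` yields three "page" vertices plus one vertex per keyword, with "hyperlink" and "word" edges between them.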

8 WebSubdue’s Structural Search
- Formulate query as graph
- Use Subdue’s predefined substructure option to search for instances of query
[Figure: example page content with labels Instructor, Teaching, Robotics, Research, Robotics, Publication, Robotics, http, Postscript | PDF]

9 Query: Find all pages which link to a page containing term ‘Subdue’
Subgraph vertices:
1 page URL: http://cygnus.uta.edu
7 page URL: http://cygnus.uta.edu/projects.html
8 Subdue
[1->7] hyperlink
[7->8] word
[Figure: query graph with vertices page, page, Subdue and edges hyperlink, word]
/* Vertex ID Label */
s
v 1 page
v 2 page
v 3 Subdue
/* Edge Vertex 1 Vertex 2 Label */
d 1 2 hyperlink
d 2 3 word
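To make the query file on this slide concrete, here is a small sketch that parses the `v` / `d` line format shown above and looks for exact instances of the query in a data graph. It is a brute-force illustration, not Subdue's search: it tries every assignment of data vertices to query vertices, so it is only usable on tiny graphs. `DATA` is a hypothetical data graph built around the vertex ids reported on the slide (1, 7, 8), with one extra page added for contrast.

```python
from itertools import permutations

QUERY = """\
/* Vertex ID Label */
s
v 1 page
v 2 page
v 3 Subdue
/* Edge Vertex 1 Vertex 2 Label */
d 1 2 hyperlink
d 2 3 word
"""

# Hypothetical data graph; vertices 1, 7, 8 mirror the instance on the slide.
DATA = """\
v 1 page
v 2 page
v 3 robotics
v 7 page
v 8 Subdue
d 1 2 hyperlink
d 2 3 word
d 1 7 hyperlink
d 7 8 word
"""


def parse_graph(text):
    """Parse 'v <id> <label>' and 'd <src> <dst> <label>' lines.

    Comment lines starting with '/*' and the bare 's' line are skipped.
    """
    vertices, edges = {}, []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("/*") or parts == ["s"]:
            continue
        if parts[0] == "v":
            vertices[int(parts[1])] = " ".join(parts[2:])
        elif parts[0] == "d":
            edges.append((int(parts[1]), int(parts[2]), " ".join(parts[3:])))
    return vertices, edges


def find_instances(query, data):
    """Return every mapping {query vertex id: data vertex id} that preserves
    vertex labels and all directed, labeled query edges (exact match only)."""
    q_vertices, q_edges = query
    d_vertices, d_edges = data
    d_edge_set = set(d_edges)
    q_ids = sorted(q_vertices)
    instances = []
    for candidate in permutations(sorted(d_vertices), len(q_ids)):
        mapping = dict(zip(q_ids, candidate))
        if any(q_vertices[q] != d_vertices[d] for q, d in mapping.items()):
            continue
        if all((mapping[a], mapping[b], lbl) in d_edge_set for a, b, lbl in q_edges):
            instances.append(mapping)
    return instances
```

`find_instances(parse_graph(QUERY), parse_graph(DATA))` maps query vertices 1, 2, 3 onto data vertices 1, 7, 8, which corresponds to the `[1->7] hyperlink` and `[7->8] word` edges listed on the slide.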

10 Search for Presentation Pages
- WebSubdue
  - 22 instances
- AltaVista
  - Query “host:www-cse.uta.edu AND image:next_motif.gif AND image:up_motif.gif AND image:previous_motif.gif.”
  - 12 instances
[Figure: query graph of "page" vertices joined by "hyperlink" edges]

11 Search for Reference Pages
- Search for page with at least 35 in-links
- WebSubdue found 5 pages in www-cse
- AltaVista cannot perform this type of search
[Figure: query graph of "page" vertices with "hyperlink" edges …]
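The in-link criterion can be illustrated with a direct count over the edge list. This is a sketch of the criterion only: WebSubdue expresses it as a structural query, whereas the hypothetical function below simply counts distinct linking pages. `EDGES` is invented toy data.

```python
from collections import defaultdict


def pages_with_min_in_links(edges, min_in_links):
    """Return sorted ids of pages linked to by at least `min_in_links`
    distinct other pages. `edges` holds (source, target, label) triples;
    only 'hyperlink' edges count, and self-links are ignored."""
    sources = defaultdict(set)
    for src, dst, label in edges:
        if label == "hyperlink" and src != dst:
            sources[dst].add(src)
    return sorted(p for p, srcs in sources.items() if len(srcs) >= min_in_links)


# Hypothetical toy graph: page 100 has 35 in-links, page 200 has 34.
EDGES = [(src, 100, "hyperlink") for src in range(1, 36)]
EDGES += [(src, 200, "hyperlink") for src in range(1, 35)]
EDGES.append((100, 300, "word"))  # non-hyperlink edges are not counted
```

With this data, `pages_with_min_in_links(EDGES, 35)` returns only page 100.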

12 Inclusion of WordNet
- When generating graph
  - Use common stopword list
- When searching for subgraph instances
  - Morphology functions
    - October = Oct
    - teaching = teach
  - Synsets
  - Optional allowance of synonyms

13 Search for pages on ‘jobs in computer science’
- Inexact match: allow one level of synonyms
- WebSubdue found 33 matches
- Words include employment, work, job, problem, task
- AltaVista found 2 matches
[Figure: query graph of a "page" vertex with "word" edges to jobs, computer, science]
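One way to picture "one level of synonyms" is a label comparison that first reduces words to a base form and then consults a synonym set. The sketch below does not call WordNet: `BASE_FORMS` and `SYNONYMS` are small hand-written, hypothetical tables, seeded with the words listed on this slide, that stand in for morphology functions and synsets.

```python
# Hypothetical stand-in for morphology functions (word -> base form).
BASE_FORMS = {"jobs": "job"}

# Hypothetical stand-in for one level of synsets, using the slide's word list.
SYNONYMS = {"job": {"employment", "work", "problem", "task"}}


def base_form(word: str) -> str:
    """Lowercase the word and map it to its base form if one is known."""
    lowered = word.lower()
    return BASE_FORMS.get(lowered, lowered)


def labels_match(query_word: str, page_word: str, allow_synonyms: bool = True) -> bool:
    """True if the words share a base form or, when synonyms are allowed,
    if one is listed as a direct synonym of the other."""
    q, p = base_form(query_word), base_form(page_word)
    if q == p:
        return True
    if not allow_synonyms:
        return False
    return p in SYNONYMS.get(q, set()) or q in SYNONYMS.get(p, set())
```

Under these tables, `labels_match("jobs", "employment")` holds only when synonyms are allowed, while `labels_match("jobs", "Job")` holds either way.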

14 Search for ‘authority’ hub and authority pages
- WebSubdue found 3 hub (and 3 authority) pages
- AltaVista cannot perform this type of search
- Inexact match applied with threshold = 0.2 (4.2 transformations allowed)
- WebSubdue found 13 matches
[Figure: query graph of HUBS and AUTHORITIES "page" vertices joined by "hyperlink" edges, with a "word" edge to "algorithms"]

15 Subdue Learning from Web Data
- Distinguish professors’ and students’ web pages
  - Learned concept (professors have “box” in address field)
- Distinguish online stores and professors’ web pages
  - Learned concept (stores have more levels in graph)
[Figure: learned substructures with labels page, box, word, page]

16 Conclusions
- WebSubdue can be used to search for structural web data
- Could be enhanced with additional WordNet features such as synset path length
- Efficient structural search necessary for future of web search tools

17 To Learn More
cygnus.uta.edu/subdue
cook@cse.uta.edu
http://www-cse.uta.edu/~cook

