SemRank: Ranking Complex Relationship Search Results on the Semantic Web. Kemafor Anyanwu, Angela Maduko, Amit Sheth. LSDIS lab, University of Georgia.

Presentation transcript:

SemRank: Ranking Complex Relationship Search Results on the Semantic Web
Kemafor Anyanwu, Angela Maduko, Amit Sheth
LSDIS lab, University of Georgia
Paper presentation at WWW2005, Chiba, Japan
Kemafor Anyanwu, Angela Maduko, and Amit Sheth. SemRank: Ranking Complex Relationship Search Results on the Semantic Web, Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005, pp
This work is funded by NSF-ITR-IDM Award# titled ‘SemDIS: Discovering Complex Relationships in the Semantic Web’ and NSF-ITR-IDM Award# titled ‘Semantic Association Identification and Knowledge Discovery for National Security Applications.’

Outline
The Problem
The SemRank relevance model
SemRank computational issues in the SSARK system
Evaluating SemRank: strategy and issues
Related Work
Conclusion and Future work

The Problem
[Anyanwu et al WWW2003] proposed a query operator for finding complex relationships between entities
[Angles et al ESWC05] is a survey of graph-based query operations that should be enabled on the Semantic Web
Question: How can results of relationship query operations be ranked?

The Relationship Ranking Problem
Query q = (1, 3) (a pair of nodes)
1. Find the subgraph that covers q
2. List the results in order of relevance (could be done with step 1 or as a separate step)
[Figure: example graph with numbered nodes and edges labeled a to h]

Things to think about
Relevance as best match vs. ????
Homogeneous (hyperlinks) vs. heterogeneous relationships
Should relevance be fixed for all situations?
Size of result set potentially large

This paper has relationships to
Semantic Searching
Graph theory
Database Systems
–path expression queries, ranked queries, query processing, join algorithms, indexing, etc.
Data mining
Linear algebra
But … are all these relationships equally relevant when presenting to this audience?

The SemRank Model

SemRank’s Design Philosophy
Tenet 1: Thou shall support variable rankings
Tenet 2: Thou must not burden the user with complex query specification
Tenet 3: Thou shall support mainstream search paradigms

SemRank’s Key Concepts
Modulative Ranking
Relevance: Search Mode + Predictability
Refraction Count
–How varied is the result from what is expected from the schema?
Information Gain
–How much information does a user gain by being informed about a result?
S-Match
–Best semantic match with user need (if provided)

Adjustable search mode
[Figure: the search mode ranges between two settings]
High Information Gain, High Refraction Count, High S-Match
Low Information Gain, Low Refraction Count, High S-Match

Modulative Rank Function
Typical preference or rank function
–$Rank_i = \sum_j w_{ij} \cdot attr_{ij}$
What we want is, given
–µ, a weight function parameter
–and attributes $attr_1, attr_2, \ldots, attr_k$, e.g. length
–for each attribute, select an appropriate weight function from $g_1, g_2, \ldots, g_m$, where each $g_i$ is some function of µ, e.g. $g_i(\mu) = \mu$
Then
–$Rank_i(\mu) = \sum_k g_j(\mu) \cdot attr_{ik}$, where $g_j$ is the weight function selected for $attr_k$
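The rank function above can be sketched in a few lines of Python. This is only an illustration: the attribute names (`length`, `rarity`), their values, and the second weight function g(µ) = 1 − µ are assumptions made for the example; the slide itself only offers g(µ) = µ as a sample weight function.

```python
from typing import Callable, Dict

# Hypothetical weight functions of the mode parameter mu in [0, 1].
def g_direct(mu: float) -> float:
    return mu          # the slide's example, g(mu) = mu

def g_inverse(mu: float) -> float:
    return 1.0 - mu    # assumed complement, for illustration only

def modulative_rank(attrs: Dict[str, float],
                    weight_fns: Dict[str, Callable[[float], float]],
                    mu: float) -> float:
    """Rank(mu) = sum over attributes k of g_k(mu) * attr_k."""
    return sum(weight_fns[name](mu) * value for name, value in attrs.items())

# Hypothetical attributes of one result and the weight function chosen for each.
attrs = {"length": 3.0, "rarity": 0.5}
weight_fns = {"length": g_inverse, "rarity": g_direct}

rank_at_0 = modulative_rank(attrs, weight_fns, 0.0)    # only "length" counts
rank_at_1 = modulative_rank(attrs, weight_fns, 1.0)    # only "rarity" counts
rank_at_half = modulative_rank(attrs, weight_fns, 0.5)
```

Moving µ from 0 to 1 shifts the weight from one attribute to the other without the user having to restate the query, which is the point of a modulative ranking.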

Refraction as a measure of predictability

Refraction
The path “enrolled_in → taught_by → married_to” doesn’t exist anywhere at the schema layer
We say that the path refracts at node 3
High refraction count in a path → low predictability
[Figure: schema layer with classes Student, Course, Professor and Spouse and properties enrolled_in, taught_by and married_to; instance path through nodes 1, 2, 3 and 4 via enrolled_in, taught_by and married_to]

Semantic Summary
[Figure: matrix and graph over classes C1 to C5 with properties p1 to p5 as arcs; the class pairs C1, C3 and C2, C4 are each labeled as a Representative Ontology Class]

Semantic Summary & Refraction
A Semantic Summary is a graph of representative ontology classes with appropriate relations as arcs
For a path $p = r_1, p_1, r_2, p_2, r_3$, there is a refraction at $r_2$ if $p_1 \in (ROC_i, ROC_j)$ and $p_2 \notin (ROC_j, ROC_k)$ (or vice versa), where
–$ROC_i, ROC_j, ROC_k$ are representative ontology classes of $r_1, r_2, r_3$ respectively
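A rough Python sketch of counting refractions along a path follows. It assumes one reading of the definition: a node refracts when exactly one of its two adjacent properties is an arc between the corresponding representative ontology classes in the summary. The data structures and the small summary, including the `Person` class, are hypothetical and only loosely modeled on the Student/Course/Professor example.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str, str]  # (source ROC, property, target ROC)

def refraction_count(nodes: List[str], props: List[str],
                     roc_of: Dict[str, str], summary: Set[Arc]) -> int:
    """Count interior nodes where exactly one adjacent property is a summary arc."""
    count = 0
    for i in range(1, len(nodes) - 1):
        incoming = (roc_of[nodes[i - 1]], props[i - 1], roc_of[nodes[i]]) in summary
        outgoing = (roc_of[nodes[i]], props[i], roc_of[nodes[i + 1]]) in summary
        if incoming != outgoing:
            count += 1
    return count

# Hypothetical summary: married_to is declared on Person, not on Professor.
summary = {
    ("Student", "enrolled_in", "Course"),
    ("Course", "taught_by", "Professor"),
    ("Person", "married_to", "Spouse"),
}
roc_of = {"1": "Student", "2": "Course", "3": "Professor", "4": "Spouse"}

full_path = refraction_count(["1", "2", "3", "4"],
                             ["enrolled_in", "taught_by", "married_to"],
                             roc_of, summary)
prefix_path = refraction_count(["1", "2", "3"],
                               ["enrolled_in", "taught_by"],
                               roc_of, summary)
```

With this toy summary the full path refracts once, at node 3, while its two-edge prefix follows the schema and does not refract.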

Information content and Information gain

Measuring Information Content of a Property
Content is related to uncertainty removed
Typically measured as some function of its probability
–High probability → low information content
For $p \in P$, where P is the set of property types, its information content $I_{SP}$ can be measured as:
–$I_{SP}(p_k) = \log_2(1 / Pr(p = p_k)) = -\log_2([[p_k]] / [[P]])$
$I_{SP}(p)$ is maximum when
–$Pr(p = p_k) = 1 / [[P]]$, giving $I_{SP}(p_k) = \log_2 [[P]]$

Information Content of a Property Sequence – global perspective
The information content of a sequence of properties $p_1 \to p_2 \to p_3 \to \cdots \to p_k$ is
–$\max(I_{SP}(p_i)), 1 \le i \le k$
[Figure: sequence p1, p2, p3 with Prob = high, Prob = low, Prob = high]
Information content is dependent on p2, the weak point
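The two definitions above can be illustrated with a short Python sketch. It assumes that [[p]] is the number of statements using property p and [[P]] is the total number of statements over all properties; the property names and counts are made up for the example.

```python
import math
from collections import Counter
from typing import Iterable

def information_content(prop: str, counts: Counter) -> float:
    """I(p) = -log2(count(p) / total count), under the assumption stated above."""
    total = sum(counts.values())
    return -math.log2(counts[prop] / total)

def sequence_information_content(props: Iterable[str], counts: Counter) -> float:
    """Global perspective: the maximum information content over the sequence."""
    return max(information_content(p, counts) for p in props)

# Hypothetical frequency distribution of properties in a data set.
counts = Counter({"enrolled_in": 4, "taught_by": 3, "audits": 1})

common = information_content("enrolled_in", counts)   # frequent property
rare = information_content("audits", counts)          # occurs once out of 8
path_ic = sequence_information_content(
    ["enrolled_in", "audits", "taught_by"], counts)
```

The rare property dominates: a sequence containing `audits` gets the information content of `audits`, which here equals log2 of the total count because the property occurs exactly once.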

Information Content – Local Perspective
A path can have high information content globally but low information content locally
Given (a, p_1, b), what is the information content with respect to only the valid possibilities between a and b?
For (a, p_1, b), valid(p_1) is the set of properties between (ROC_i, ROC_j), where a ∈ ROC_i and b ∈ ROC_j, together with their superproperties
Recompute probabilities based on this local set
–I = min(NI(p_i)) + average of the other NI values

Total Information Content Total information content = Information content from global perspective + Information content from local perspective

S-Match Relevance Specification as keywords

Keywords: published_in, located_in

S-Match
Uses the “best semantic match” paradigm
For a keyword $k_i$ and a property $p_j$ on a path:
–$SemMatch(k_i, p_j) = (2^d)^{-1}$, with $0 < (2^d)^{-1} \le 1$, where d is the minimum distance between the properties in a property hierarchy
For a path ps, its S-Match value is:
–the sum of the $\max(SemMatch(k_i, p_j))$

Putting it all together …….

SemRank
For a search mode µ and a path ps:
Modulated information gain for ps, $I_\mu(ps)$
–$I_\mu(ps) = (1 - \mu)(I(ps))^{-1} + \mu I(ps)$
Modulated Refraction Count $RC_\mu(ps)$
–$RC_\mu(ps) = \mu \cdot RC(ps)$
$SEMRANK(ps) = I_\mu(ps) \times (1 + RC_\mu(ps)) \times (1 + S\text{-}Match(ps))$
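A small Python sketch shows how the search mode changes the score. It assumes the mode parameter is µ in [0, 1], entering the formulas as I_µ = (1 − µ)/I + µ·I and RC_µ = µ·RC; the input values for information content, refraction count and S-Match are hypothetical.

```python
def modulated_information_gain(info: float, mu: float) -> float:
    """I_mu = (1 - mu) * I^-1 + mu * I; info must be positive."""
    return (1.0 - mu) / info + mu * info

def semrank(info: float, refraction_count: int, s_match: float, mu: float) -> float:
    """SEMRANK = I_mu * (1 + RC_mu) * (1 + S-Match), with RC_mu = mu * RC."""
    i_mu = modulated_information_gain(info, mu)
    rc_mu = mu * refraction_count
    return i_mu * (1.0 + rc_mu) * (1.0 + s_match)

# One hypothetical path: information content 2.0, one refraction, S-Match 0.5.
score_mu0 = semrank(2.0, 1, 0.5, mu=0.0)     # favors low information gain
score_mu1 = semrank(2.0, 1, 0.5, mu=1.0)     # favors high gain and refraction
score_half = semrank(2.0, 1, 0.5, mu=0.5)
```

At µ = 0 the path is scored by the inverse of its information content and refraction is ignored; at µ = 1 both information content and refraction count raise the score, so the same path moves up or down the ranking as the mode changes.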

Computing SemRank in SSARK

The SSARK system
[Figure: system architecture]
User, Query & Result Interface (query form shown: x ?? ?? ?? y)
Preprocessor
Query Processor
Ranking Engine (pipelined top-k results)
RDF Documents
Index Manager: FDIX, PHIX, ROIX
Storage Manager: LtStore, UtStore
Loader
LAC: Look Ahead Cache
RC: Result Cache
Phases: Preprocessing phase, Query Processing phase, Ranking phase

Approach
[Figure: example graph with numbered nodes and labeled edges, and the tree built by the Query Processor over those edges]
Ranking engine: assigns SemRank* values to the leaves of the tree, i.e. the edges on the path
* without refraction count

The Index Subsystem
FDIX – Frequency Distribution IndeX
–Stores the frequency distribution of properties
ROIX – Representative Ontology IndeX
–Maps classes to Representative Ontology Classes
–Stores the semantic summary graph
PHIX – Property Hierarchy IndeX
–Uses the Dewey Decimal labeling scheme to encode the hierarchical relationships in a property hierarchy
–Used for computing S-Match (match between keywords and properties in a path)

Index Subsystem contd.
PHIX example
–Example label set: {1.2.2, 2.1}
–If the keyword is 1 and the property in the path is at distance 2 from it, then S-Match = 1/2^2
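The distance computation behind S-Match can be sketched in Python with Dewey labels. Several choices here are assumptions rather than the SSARK implementation: labels are compared through their longest common prefix, labels with no common prefix are treated as unrelated (score 0), a property carrying several labels uses its closest one, and the path score sums, over keywords, the best match against any property on the path. The property with labels {1.2.2, 2.1} is used as a hypothetical example.

```python
from typing import List, Optional, Sequence, Tuple

Label = Tuple[int, ...]  # Dewey label, e.g. (1, 2, 2) for "1.2.2"

def parse_label(text: str) -> Label:
    return tuple(int(part) for part in text.split("."))

def dewey_distance(a: Label, b: Label) -> Optional[int]:
    """Edges between two labels via their common prefix; None if unrelated."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    if common == 0:
        return None
    return (len(a) - common) + (len(b) - common)

def sem_match(keyword_labels: Sequence[Label], prop_labels: Sequence[Label]) -> float:
    """1 / 2^d for the minimum distance d over all label pairs, else 0."""
    best: Optional[int] = None
    for k in keyword_labels:
        for p in prop_labels:
            d = dewey_distance(k, p)
            if d is not None and (best is None or d < best):
                best = d
    return 0.0 if best is None else 1.0 / (2 ** best)

def s_match(keywords: List[Sequence[Label]],
            path_props: List[Sequence[Label]]) -> float:
    """Sum over keywords of the best match against any property on the path."""
    return sum(max(sem_match(k, p) for p in path_props) for k in keywords)

keyword = [parse_label("1")]
prop = [parse_label("1.2.2"), parse_label("2.1")]  # hypothetical property
pair_score = sem_match(keyword, prop)               # distance 2
path_score = s_match([keyword], [prop, [parse_label("1")]])
```

Because a label encodes the full path from the root, the distance needs only the two labels and no traversal of the hierarchy.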

Top-K Evaluation
[Figure: step-by-step top-k evaluation over the tree. Leaves hold scored edges (a, 3; b, 2; c, 4; d, 1; e, 2; f, 5; g, 3; h, 1; i, 6); inner nodes combine them into scored paths (a.b, 5; c.e, 6; c.f, 9; g.h, 4; g.i, 9)]
Final Top_k: g.i, 9; c.f, 9

Top-K Evaluation phase 2 – refraction count
The total refraction count for a path is not known until the whole path has been assembled at the root node, so it is not used in the first phase
In phase 2, we integrate the refraction count into the top-k results at the root node and rerank
–The final ordering is not an exact SemRank ordering but is a reasonable tradeoff
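The two-phase idea can be sketched in Python under simplifying assumptions: phase 1 is reduced to keeping the k best assembled paths by their partial score, and phase 2 multiplies each survivor by (1 + µ·RC) before reranking. The partial scores reuse the path scores from the Top-K Evaluation figure; the refraction counts are invented for the example.

```python
import heapq
from typing import Dict, List, Tuple

def phase1_top_k(partial_scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Keep the k best paths by partial score (no refraction count yet)."""
    return heapq.nlargest(k, partial_scores.items(), key=lambda item: item[1])

def phase2_rerank(top_k: List[Tuple[str, float]],
                  refraction_counts: Dict[str, int],
                  mu: float) -> List[Tuple[str, float]]:
    """Fold the refraction count into the surviving paths and reorder them."""
    rescored = [(path, score * (1.0 + mu * refraction_counts[path]))
                for path, score in top_k]
    return sorted(rescored, key=lambda item: item[1], reverse=True)

partial = {"a.b": 5.0, "c.e": 6.0, "c.f": 9.0, "g.h": 4.0, "g.i": 9.0}
refractions = {"a.b": 5, "c.e": 3, "c.f": 0, "g.h": 0, "g.i": 1}  # hypothetical

survivors = phase1_top_k(partial, k=3)
final = phase2_rerank(survivors, refractions, mu=1.0)
final_order = [path for path, _ in final]
```

In this toy run the path a.b would score 30 with its refraction count included, higher than any survivor, but it was pruned in phase 1. That is the sense in which the final ordering is an approximation of the exact SemRank ordering.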

Evaluation Issues
Data set needs
–Entities described with a variety of relationships
–Richly connected hierarchies
–Realistic frequency distributions
Synthetically generated realistic small data set using human-defined rules
–e.g. |(p = “audits”)| ≤ 0.1 × |(p = “enrolls”)|

µ = 0

µ = 1

Related Work
Semantic searching and ranking of entities on the Semantic Web
–Rocha et al WWW2004, Guha et al WWW2003, Stojanovic et al ISWC 2003, Zhuge et al WWW2003
Semantic ranking of relationships
–Halaschek VLDB demo 2004, Aleman-Meza et al SWDB03

Future Work
Comprehensive evaluation
Including some measures for importance of nodes in the paths
Revise the Modulation function
Optimizing Top-K evaluation
–Decreasing height of tree
–Estimation techniques for a closer approximation to SemRank ordering

Data, demos, more publications at SemDis project web site (Google: semdis)
Thank You