
GEORGIOS FAKAS
Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK
Automated Generation of Object Summaries from Relational Databases: A Novel Keyword Searching Paradigm

Related Work: Keyword Search in Relational DBs
- Full-text search (e.g. Oracle 9i Text)
- Kw searching in relational DBs (DISCOVER, BANKS)
Kw search: Leverling, Peacock
Result: e3-o2-c2, e4-o6-c2

Related Work: Web Search Engines (Keyword Search)
Kw search: Peacock
Result: a ranked set of web pages

A Novel Keyword Searching Paradigm: Object Summaries (OSs)
Kw search: Peacock
Result: a ranked set of OSs
Problems and challenges: how can we automatically (1) generate and (2) rank OSs, liberating users from knowledge of (1) the schema and (2) the query language?

OS Generation - Methodology
Our solutions assume that each database has central relations (denoted R_DS) that represent the Data Subjects (DSs). E.g. for Northwind, R_DS = {R_Employees, R_Customers}. Relations linked around an R_DS hold additional information about the DS.
- t_DS: a central tuple containing the Kw; the tuples around t_DS contain additional information about the Data Subject.
- R_DS: the corresponding central relation; similarly, the relations around it contain additional information.

OS Generation - Methodology: G_DS
Problem: not all relations in G_DS are relevant. How do we decide (1) which relations to select and (2) when to stop traversing?
Solution: investigate relational semantics (schema connectivity, cardinality, relative cardinality, etc.) and quantify the affinity of relations.

Affinity of Relations to R_DS in G_DS: Distance
- Physical distance (fd) and logical distance (ld): ld = fd - |M:N|
- E.g. Orders is closer to Employees than Customers and CustomerDemo are.
- Hubs: spurious shortcuts to rather irrelevant or lateral information; RC(R1, R2)
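
The distance measure can be illustrated with a short Python sketch. It assumes that fd is the number of relationship hops from R_DS in the schema graph and that |M:N| counts the junction relations passed through on the way (relations that only implement an M:N relationship). The schema fragment, the junction set and the function name are illustrative and do not come from the slides.

```python
from collections import deque

# Hypothetical fragment of the Northwind schema graph (undirected FK links).
SCHEMA = {
    "Employees": ["Orders", "EmployeeTerritories"],
    "Orders": ["Employees", "Customers"],
    "Customers": ["Orders", "CustomerCustomerDemo"],
    "CustomerCustomerDemo": ["Customers", "CustomerDemographics"],
    "CustomerDemographics": ["CustomerCustomerDemo"],
    "EmployeeTerritories": ["Employees", "Territories"],
    "Territories": ["EmployeeTerritories"],
}
# Assumption: these relations exist only to implement M:N relationships.
JUNCTIONS = {"CustomerCustomerDemo", "EmployeeTerritories"}


def distances(schema, junctions, r_ds):
    """Return {relation: (fd, ld)} for every relation reachable from r_ds."""
    result = {r_ds: (0, 0)}
    queue = deque([r_ds])
    while queue:
        current = queue.popleft()
        fd, ld = result[current]
        # Passing through a junction relation adds a physical hop
        # but no logical distance (ld = fd - |M:N|).
        step = 0 if current in junctions else 1
        for neighbour in schema[current]:
            if neighbour not in result:
                result[neighbour] = (fd + 1, ld + step)
                queue.append(neighbour)
    return result


if __name__ == "__main__":
    for relation, (fd, ld) in distances(SCHEMA, JUNCTIONS, "Employees").items():
        print(f"{relation}: fd={fd}, ld={ld}")
```

Under these assumptions Orders ends up logically closer to Employees than Customers, which in turn is closer than CustomerDemographics, matching the ordering stated on the slide.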

Affinity of Relations to R_DS in G_DS: Connectivity
- Schema connectivity (Co_i)
- Data-graph connectivity:
  - Relative Cardinality (RC_i→j): the average number of tuples of R_i that are connected with each tuple of R_j. For 1:M, RC_i→j = |R_i|/|R_j|; for M:1, RC_i→j = 1.
  - Reverse Relative Cardinality (RRC_i→j): the reverse of RC_i→j, i.e. RRC_i→j = RC_j→i.
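
A minimal Python sketch of the two cardinality measures follows. It reads "1:M" as one R_j tuple being linked to many R_i tuples, and it reads "reverse" as swapping the roles of the two relations; both readings are interpretations of the slide. The function names and the row counts in the usage lines (830 orders, 9 employees) are illustrative.

```python
def relative_cardinality(size_i, size_j, relationship):
    """RC_i->j: average number of R_i tuples connected with each R_j tuple.

    relationship is "1:M" (one R_j tuple, many R_i tuples) or "M:1".
    """
    if size_i <= 0 or size_j <= 0:
        raise ValueError("relation sizes must be positive")
    if relationship == "1:M":
        return size_i / size_j
    if relationship == "M:1":
        return 1.0
    raise ValueError(f"unsupported relationship: {relationship!r}")


def reverse_relative_cardinality(size_i, size_j, relationship):
    """RRC_i->j, read here as RC_j->i (roles of the relations swapped)."""
    if relationship == "1:M":
        flipped = "M:1"
    elif relationship == "M:1":
        flipped = "1:M"
    else:
        raise ValueError(f"unsupported relationship: {relationship!r}")
    return relative_cardinality(size_j, size_i, flipped)


if __name__ == "__main__":
    # Illustrative sizes: Orders (R_i) seen from Employees (R_j).
    print(relative_cardinality(830, 9, "1:M"))
    print(reverse_relative_cardinality(830, 9, "1:M"))
```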

Affinity of Relations to R_DS in G_DS
DAf(R_i) = {(m_1, w_1), (m_2, w_2), ..., (m_n, w_n)}
m_1 = f1(ld_i), m_2 = f1(log(10·RC_i)), m_3 = f1(log(10·RRC_i)), m_4 = f1(log(10·Co_i))
f1(α) = (11 - α)/10
For a hub-child: m_1 = f1(ld_i·h_i) and m_2 = f1(RC_i)
Formula 1 (Semantic Affinity): the affinity of R_i to R_DS, denoted Af(R_i), with respect to a schema and a database conforming to the schema, can be calculated with the following formula (formula not captured in this transcript), where Af(R_Parent) is the affinity of R_i's parent to R_DS, or 1 if R_Parent ≡ R_DS. □
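
The metric definitions can be sketched in Python. The body of Formula 1 is not present in the transcript, so the combination used here, a weighted sum of the metrics multiplied by the parent's affinity, is an assumption suggested by the (m, w) pairs and by the parent affinity defaulting to 1. The base-10 logarithm, the equal weights and all function names are likewise assumptions. The hub-child variant is not covered.

```python
import math


def f1(alpha):
    """Normalising function from the slide: f1(a) = (11 - a) / 10."""
    return (11 - alpha) / 10


def metrics(ld, rc, rrc, co):
    """m_1..m_4 as listed on the slide; log is assumed to be base 10."""
    return [
        f1(ld),
        f1(math.log10(10 * rc)),
        f1(math.log10(10 * rrc)),
        f1(math.log10(10 * co)),
    ]


def affinity(ms, weights, parent_affinity=1.0):
    """ASSUMED form of Formula 1: (sum_j m_j * w_j) * Af(R_Parent)."""
    if len(ms) != len(weights):
        raise ValueError("one weight per metric is required")
    return sum(m * w for m, w in zip(ms, weights)) * parent_affinity


if __name__ == "__main__":
    weights = [0.25, 0.25, 0.25, 0.25]  # hypothetical equal weights
    child = affinity(metrics(ld=1, rc=10, rrc=1, co=2), weights)
    grandchild = affinity(metrics(ld=2, rc=3, rrc=1, co=1), weights, child)
    print(child, grandchild)
```

With this form, affinity can only shrink along a path as long as every metric stays at or below 1, which fits the idea of stopping the traversal once relations become too weakly related to R_DS.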

Affinity of Relations to R_DS in G_DS: G_DS(θ) (figure not reproduced)
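
One plausible reading of G_DS(θ) is the part of G_DS whose relations have an affinity of at least θ, which would also settle when to stop traversing. The sketch below works under that assumption only; the tree, the affinity values, the threshold and the function name are made up for illustration.

```python
def prune(tree, affinities, root, theta):
    """Return the relations kept when traversal stops below threshold theta.

    tree maps each relation to its children in G_DS; a child whose affinity
    is below theta is dropped together with everything underneath it.
    """
    kept = [root]
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            if affinities[child] >= theta:
                kept.append(child)
                stack.append(child)
    return kept


if __name__ == "__main__":
    # Hypothetical G_DS rooted at Employees, with made-up affinities.
    tree = {
        "Employees": ["Orders", "EmployeeTerritories"],
        "Orders": ["Customers", "OrderDetails"],
        "Customers": ["CustomerCustomerDemo"],
    }
    affinities = {
        "Orders": 0.9,
        "EmployeeTerritories": 0.8,
        "Customers": 0.75,
        "OrderDetails": 0.6,
        "CustomerCustomerDemo": 0.4,
    }
    print(prune(tree, affinities, "Employees", theta=0.7))
```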

OS Ranking
Figure labels: a ranked set of partial OSs; a complete OS

OS Ranking - Problems and Challenges
Existing keyword-search ranking semantics: the smaller the result, the higher its rank.
In contrast, in the proposed paradigm an OS containing many well-connected tuples should certainly have greater importance than an OS with fewer tuples; for instance, a Customer or Employee OS involved in many Orders, or an Author who has written many important papers and books.

OS Ranking - Importance
Im(OS) = (formula not captured in this transcript), where:
- t_i is a tuple of the OS
- Im(t_i) is the importance of t_i (i.e. its PageRank)
- |OS| is the number of tuples in the OS
- Af_R(t_i) is the affinity of the relation R that t_i belongs to
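
Since the formula body is missing, the following Python sketch shows only one possible way to combine the listed ingredients: each tuple's importance is weighted by the affinity of its relation, the products are summed, and the sum is damped by a function of |OS|. The default logarithmic damping is an assumption chosen so that larger, well-connected OSs can still rank higher; the function name and the input format are also hypothetical.

```python
import math


def os_importance(tuples, normaliser=lambda n: math.log(n + 1)):
    """Combine tuple importance, relation affinity and OS size.

    tuples is a list of (Im(t_i), Af_R(t_i)) pairs, one per tuple in the OS.
    normaliser maps |OS| to the damping factor (ASSUMED: log(|OS| + 1)).
    """
    if not tuples:
        return 0.0
    weighted = sum(importance * aff for importance, aff in tuples)
    return weighted / normaliser(len(tuples))


if __name__ == "__main__":
    small_os = [(0.5, 1.0), (0.2, 0.5)]
    large_os = small_os * 5
    print(os_importance(small_os), os_importance(large_os))
```

Passing `normaliser=lambda n: n` turns the score into a plain average, which would no longer favour larger OSs; the choice of damping is therefore the main design decision in such a function.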

Experimental Evaluation
- Databases: MS Northwind and TPC-H
- Measures: precision, recall, F-score
- Compared the G_DSs and OSs produced by 12 G_DS(θ) vs. G_DS(h)
- G_DS(h) was proposed by 10 participants
- Results: average F-score of 86.77 for G_DSs and 83 for OSs
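
The three measures are the standard set-based ones. The sketch below assumes that an automatically produced G_DS(θ) is compared with a participant-proposed G_DS(h) at the level of which relations each contains, with G_DS(h) treated as the reference; that setup, the relation names and the function name are illustrative, not taken from the slides.

```python
def precision_recall_f(produced, reference):
    """Return (precision, recall, F-score) of produced against reference."""
    produced, reference = set(produced), set(reference)
    true_positives = len(produced & reference)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    g_ds_theta = {"Employees", "Orders", "Customers", "Territories"}
    g_ds_h = {"Employees", "Orders", "Customers", "OrderDetails", "Products"}
    print(precision_recall_f(g_ds_theta, g_ds_h))
```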

Conclusions - Future Work
- Top-k OS results
- Top-k size of an OS
  - Challenge: the weights of new tuples are not monotonic, since a tuple's PageRank may increase while its affinity decreases.
- Alternatives to PageRank as the weighting system are currently being investigated, e.g. ObjectRank.

Conclusions - Novel Contributions
- The formal definition of the novel searching paradigm, which automatically produces a ranked set of OSs for a Data Subject.
  - Minimum contribution from the user (i.e. only a Kw).
  - No prior knowledge of the DB schema or query language needed.
  - Excellent precision, recall and F-score results (average F-score of 86.77 for G_DSs and 83 for OSs).
- The formal definition and quantification of a relation's affinity in the context of G_DS.
  - Considers both schema design and data distributions.
- A novel ranking paradigm to calculate Im(OS).
  - The quantification of tuples' and OSs' importance.
  - A combining function that considers the weight (e.g. PageRank) of tuples, their affinity and the size of the OS.

Affinity Ranking Correctness (Average)