FINDING RELEVANT INFORMATION OF CERTAIN TYPES FROM ENTERPRISE DATA Date: 2012/04/30 Source: Xitong Liu (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling.

Slides:



Advertisements
Similar presentations
Date: 2013/1/17 Author: Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie and Ji-Rong Wen Source: SIGIR12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Adaptive.
Advertisements

Date: 2014/05/06 Author: Michael Schuhmacher, Simon Paolo Ponzetto Source: WSDM’14 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Knowledge-based Graph Document.
Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,
Diversity Maximization Under Matroid Constraints Date : 2013/11/06 Source : KDD’13 Authors : Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur Advisor :
Date : 2014/12/04 Author : Parikshit Sondhi, ChengXiang Zhai Source : CIKM’14 Advisor : Jia-ling Koh Speaker : Sz-Han,Wang.
Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.
A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date : 2014/04/15 Source : KDD’13 Authors : Chi Wang, Marina Danilevsky, Nihit.
Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge Date: 2011/11/21 Source: Claudiu S. Firan (CIKM’10)
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Searchable Web sites Recommendation Date : 2012/2/20 Source : WSDM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh Jia-ling 1.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
DATABASE MANAGEMENT SYSTEMS BASIC CONCEPTS 1. What is a database? A database is a collection of data which can be used: alone, or alone, or combined /
DATABASE MANAGEMENT SYSTEMS BASIC CONCEPTS 1. What is a database? A database is a collection of data which can be used: alone, or alone, or combined /
On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date : 2014/05/13 Source : CIKM’13 Advisor : Prof. Jia-Ling, Koh Speaker : Yi-Hsuan.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
SEEKING STATEMENT-SUPPORTING TOP-K WITNESSES Date: 2012/03/12 Source: Steffen Metzger (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Automated Creation of a Forms- based Database Query Interface Magesh Jayapandian H.V. Jagadish Univ. of Michigan VLDB
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Beyond Co-occurrence: Discovering and Visualizing Tag Relationships from Geo-spatial and Temporal Similarities Date : 2012/8/6 Resource : WSDM’12 Advisor.
Linking Wikipedia to the Web Antonio Flores Bernal Department of Computer Sciencies San Pablo Catholic University 2010.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Incident Threading for News Passages (CIKM 09) Speaker: Yi-lin,Hsu Advisor: Dr. Koh, Jia-ling. Date:2010/06/14.
Querying Structured Text in an XML Database By Xuemei Luo.
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Cuoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong.
Recommending Twitter Users to Follow Using Content and Collaborative Filtering Approaches John HannonJohn Hannon, Mike Bennett, Barry SmythBarry Smyth.
RANKING SUPPORT FOR KEYWORD SEARCH ON STRUCTURED DATA USING RELEVANCE MODEL Date: 2012/06/04 Source: Veli Bicer(CIKM’11) Speaker: Er-gang Liu Advisor:
Date : 2012/10/25 Author : Yosi Mass, Yehoshua Sagiv Source : WSDM’12 Speaker : Er-Gang Liu Advisor : Dr. Jia-ling Koh 1.
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval Ben Cartrette and Praveen Chandar Dept. of Computer and Information Science.
Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
Date: 2012/07/02 Source: Marina Drosou, Evaggelia Pitoura (CIKM’11) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
LOGO Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor : Dr. Koh Jia-Ling Speaker : Tu.
Date: 2015/11/19 Author: Reza Zafarani, Huan Liu Source: CIKM '15
Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Presented by: Sandeep Chittal Minimum-Effort Driven Dynamic Faceted Search in Structured Databases Authors: Senjuti Basu Roy, Haidong Wang, Gautam Das,
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness.
1 Evaluating High Accuracy Retrieval Techniques Chirag Shah,W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science.
Ranking-based Processing of SQL Queries Date: 2012/1/16 Source: Hany Azzam (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh.
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Date: 2012/5/28 Source: Alexander Kotov. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Interactive Sense Feedback for Difficult Queries.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
Leveraging Knowledge Bases for Contextual Entity Exploration Categories Date:2015/09/17 Author:Joonseok Lee, Ariel Fuxman, Bo Zhao, Yuanhua Lv Source:KDD'15.
{ Adaptive Relevance Feedback in Information Retrieval Yuanhua Lv and ChengXiang Zhai (CIKM ‘09) Date: 2010/10/12 Advisor: Dr. Koh, Jia-Ling Speaker: Lin,
E-commerce in Your Inbox Product Recommendations at Scale
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
1 Link Privacy in Social Networks Aleksandra Korolova, Rajeev Motwani, Shubha U. Nabar CIKM’08 Advisor: Dr. Koh, JiaLing Speaker: Li, HueiJyun Date: 2009/3/30.
QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT Date : 2014/08/18 Author : Fiana Raiber, Oren Kurland Source : SIGIR’14 Advisor : Jia-ling.
Customized of Social Media Contents using Focused Topic Hierarchy
Improving Search Relevance for Short Queries in Community Question Answering Date: 2014/09/25 Author : Haocheng Wu, Wei Wu, Ming Zhou, Enhong Chen, Lei.
Where Did You Go: Personalized Annotation of Mobility Records
Guangbing Yang Presentation for Xerox Docushare Symposium in 2011
and Knowledge Graphs for Query Expansion Saeid Balaneshinkordan
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Web Information retrieval (Web IR)
Date : 2013/1/10 Author : Lanbo Zhang, Yi Zhang, Yunfei Chen
Enriching Taxonomies With Functional Domain Knowledge
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Information Retrieval and Web Design
Topic: Semantic Text Mining
Presentation transcript:

FINDING RELEVANT INFORMATION OF CERTAIN TYPES FROM ENTERPRISE DATA Date: 2012/04/30 Source: Xitong Liu (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1

Outline Introduction Problem Formulation Content requirements Type requirements Requirements Identification Similarity based method Language Modeling based Method Ranking Methods Experiment Conclusion 2

3 Introduction - Motivation

4 Q= “John Smith contact information ” “John Smith ” “contact information ” Find relevant information of certain types

Introduction Main Step: Requirements Identification Search based on both requirements Ranking Methods 5 This problem as keyword search over structured or semi-structured data, and then propose to leverage the complementary unstructured information in the enterprise data to solve the problem.

6 Content requirement Type requirement Similarity –based Language Modeling - based Structured Data Semi-Structured Datd Problem Formulation Requirements Identification Ranking Methods

Problem Formulation Q = (Q C U Q T ) Content requirements kind of information is relevant. Type requirements type of information is desirable. 7 Data Source Content requirements Attribute Value in tuple Type requirements Attribute Names Employee IDNameDepartment 1339 John Smith Employee

8 Problem Formulation Requirements Identification Ranking Methods Content requirement Type requirement Similarity –based Language Modeling - based Structured Data Semi-Structured Datd

Similarity based method 9 Q = “John Smith contact information” “John”, “Smith” Smith contact information John information “John”, “Smith” “information”, “contact” QCQC Calculate Similarity contact QTQT Identify Cluster

Similarity – Mutual information 10 John= 1John= 0 Smith= Smith=060250

Identify Cluster 11 Employee Employee ID, Name, Department, .. Table name + Attribute names Profile Document “John”, “Smith” “information”, “contact” Calculate Similarity “information”, “contact” Higher similarity QTQT

Language Modeling based Method 12 ………… ………………... ……………….. ……………….. Unstructured information θCθC Employee ID, Name, Department, .. θTθT Attribute set Total Term : 500 Total attribute : 15 QCQC QTQT Content requirements Type requirements

Adjust Identification Result 13 Similarity Based Method Language Modeling Based Method C/CC/T T/CT/T Query term = “information” Requirements Identification C : Content requirements T : Type requirements

Adjust Identification Result 14 Similarity Based Method Language Modeling Based Method If > then Q T If < then Q C

15 Problem Formulation Requirements Identification Ranking Methods Content requirement Type requirement Similarity –based Language Modeling - based Structured Data Semi-Structured Data

Ranking Method - Structured Data 16 Type Requirement Content Requirement T2T2 T1T1 T3T3 A2TA2T A3TA3T

Ranking Method - Structured Data 17 nl(T1, T2) is the number of foreign key links between table T1 and T2 nl(Employee ID, Department ) = 1

Query Expansion 18 Q T = “dimension” Top K Q T = “contact information” Relevant attributes “ ”, “Phone”,“Address” not overlapped term Attribute “Contact Person” has one overlapped term, but it’s not a relevant attribute with regard to the query. “width” “depth” “height”.

Ranking Method - Semi Structured Data 19 E i = {E C i,E T i } Entity E i = “Pride and Prejudice” E T i = “Novel” E C i = “Pride and Prejudice, 1/28, 1983, Jane Austen”

Ranking Method - Semi Structured Data 20 N(E i ) is the set of all the neighbor nodes of entity |N(E i )| is the size of N(E i ). | N(“Pride and Prejudice”) | = 1 (Node “Jane Austen”)

Experiment Real-world Enterprise Data Set The data set contains both unstructured and structured information about HP, which is referred to as REAL Simulated Data Set Constructed a simulated data set by choosing the Billion Triple Challenge 2009 dataset which consists of a RDF graph Chose the Category B of ClueWeb09 collection as the complementary unstructured data The data set is referred to as SIMU 21

Experiment Performance of requirement identification 22

23 Baseline Method Paper Method Experiment - REAL

Experiment - SIMU 24 Baseline Method 1dBL : 1-dimensional retrieval 2dBL : 2dSem : 2dBL + Q = Q T-EXP Paper Method 2dGraph : 2dGraphSEM: 2dGraph + Q = Q T-EXP

Conclusion The paper demonstrated the feasibility of leveraging unstructured information to improve the search quality over structured and semi-structured information. Ranking methods utilized unstructured information to identify type requirement in keyword queries and bridge the vocabulary gap between the query and the data collection. 25

26

27

28