Crowd-Augmented Social Aware Search Soudip Roy Chowdhury & Bogdan Cautis.


What are we talking about?

Social Aware Search
– Finding results relevant for the query and for the user (seeker)
– Web search (tf-idf) + social search (social connections, e.g., follower-following links)

However,
– The required number of results (K items) is not found
– The algorithm does not ensure the quality of the retrieved results

Our aim is to use [image] for datasourcing, to address these problems efficiently.

Let's see an example! Query: get the top 4 tweets for the query terms “#jesuischarlie #jesuisahmed”

Term frequency per tweet for each hashtag term, and the aggregate over both terms:

Tweet ID   Frequency          Frequency        Frequency
           (#jesuischarlie)   (#jesuisahmed)   (#jesuischarlie #jesuisahmed)
D1         1                  0                1
D2         1                  1                2
D3         0                  1                1
D4         2                  1                3
D5         1                  1                2
D6         0                  0                0

By aggregating the term frequencies we get the final result, and the top-4 items are taken from the aggregated column.
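The aggregation step can be sketched in a few lines of Python. This is only an illustration using the frequencies from the slide; the helper names `aggregate` and `top_k` are hypothetical, and the tie-break by tweet ID is an assumption (the slide does not say how ties such as D1 and D3 are resolved).

```python
# Per-term frequencies copied from the slide.
charlie = {"D1": 1, "D2": 1, "D3": 0, "D4": 2, "D5": 1, "D6": 0}
ahmed = {"D1": 0, "D2": 1, "D3": 1, "D4": 1, "D5": 1, "D6": 0}


def aggregate(*per_term):
    """Sum the per-term frequencies of every tweet (hypothetical helper)."""
    total = {}
    for freqs in per_term:
        for tweet, freq in freqs.items():
            total[tweet] = total.get(tweet, 0) + freq
    return total


def top_k(scores, k):
    """Return the k best tweet IDs; ties are broken by ID (an assumption)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


aggregated = aggregate(charlie, ahmed)
print(aggregated)
print(top_k(aggregated, 4))
```

The first three positions (D4, D2, D5) do not depend on the tie-break; the fourth does, because D1 and D3 both have an aggregate frequency of 1.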

Similarly, the social scores are calculated:

Tweet ID   Hashtag term      Author   Social score
D1         #jesuischarlie    Elham    0.9 × 0.9 × 0.5
D2         #jesuischarlie    Elham    0.9 × 0.9 × 0.5
           #jesuischarlie    Das      0.9 × 0.9
D3         #jesuisahmed      Bob      0.9
D4         #jesuischarlie    Elham    0.9 × 0.9 × 0.5
           #jesuischarlie    Das      0.9 × 0.9
           #jesuisahmed      Das      0.9 × 0.9
D5         #jesuischarlie    Chang    0.6
           #jesuisahmed      Chang    0.6

Top-k results with social score!

Tweet ID   Social score       Social score     Social score
           (#jesuischarlie)   (#jesuisahmed)   (#jesuischarlie #jesuisahmed)
D1         0.4                0                0.4
D2         1.21               0                1.21
D3         0                  0.9              0.9
D4         1.21               0.81             2.02
D5         0.6                0.6              1.2

The top-4 items are taken from the combined column.
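A small sketch of how these social scores can be reproduced. It assumes that each product on the slide (for example 0.9 × 0.9 × 0.5) is the product of link weights on the path from the seeker to the tagging author, and that the contributions of all taggers of a tweet are summed; both readings are inferred from the numbers and are not stated on the slides. The names `path_weights`, `taggings` and `social_scores` are hypothetical.

```python
from math import prod

# Assumed link weights on the path from the seeker to each author.
path_weights = {
    "Elham": [0.9, 0.9, 0.5],
    "Das": [0.9, 0.9],
    "Bob": [0.9],
    "Chang": [0.6],
}

# (tweet, hashtag, author) rows copied from the slide.
taggings = [
    ("D1", "#jesuischarlie", "Elham"),
    ("D2", "#jesuischarlie", "Elham"),
    ("D2", "#jesuischarlie", "Das"),
    ("D3", "#jesuisahmed", "Bob"),
    ("D4", "#jesuischarlie", "Elham"),
    ("D4", "#jesuischarlie", "Das"),
    ("D4", "#jesuisahmed", "Das"),
    ("D5", "#jesuischarlie", "Chang"),
    ("D5", "#jesuisahmed", "Chang"),
]


def social_scores(rows, weights):
    """Sum, per tweet, the path-weight product of every tagging author."""
    scores = {}
    for tweet, _tag, author in rows:
        scores[tweet] = scores.get(tweet, 0.0) + prod(weights[author])
    return scores


scores = social_scores(taggings, path_weights)
ranking = sorted(scores, key=lambda t: -scores[t])
print(scores)
print(ranking[:4])
```

The exact sums are 0.405, 1.215, 0.9, 2.025 and 1.2; the slide values 0.4, 1.21 and 2.02 appear to be the same numbers cut to two decimals.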

Social-aware search
Final results are calculated based on the score model
– score(item|seeker,tag) = α × tf-idf(tag,item) + (1-α) × sc(item|seeker,tag)
Following this model, the top-4 results for our example scenario are
– D4, D2, D5, and D3
Let us now consider some additional constraints to make sure the results are of good quality
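The score model can be checked against the example with a short sketch. Two things here are assumptions rather than part of the slides: the raw aggregated term frequency stands in for tf-idf, and α is set to 0.5. The function name `combined_score` is hypothetical.

```python
def combined_score(textual, social, alpha):
    """score = alpha * textual relevance + (1 - alpha) * social score."""
    return alpha * textual + (1 - alpha) * social


# Aggregated values from the example slides (D6 has no social score listed).
frequency = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2}
social = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}

ALPHA = 0.5  # assumed value; the slides do not fix alpha

final = {t: combined_score(frequency[t], social[t], ALPHA) for t in frequency}
top4 = sorted(final, key=lambda t: -final[t])[:4]
print(top4)  # ['D4', 'D2', 'D5', 'D3']
```

With these inputs the order D4, D2, D5, D3 holds for any α below 1, since each of these tweets is at least as good as the next one on frequency and strictly better on social score.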

Example scenario with quality constraints

#Constraint: Each result item must be tagged at least twice

Tweet ID   Frequency (#jesuischarlie #jesuisahmed)
D1         1
D2         2
D3         1
D4         3
D5         2
D6         0

Hence the top-4 items must come from the items that meet this constraint.

Example scenario with quality constraints

#Constraint: The social score of an item must be > 1 for the item to be in the final result list

Tweet ID   Social score (#jesuischarlie #jesuisahmed)
D1         0.4
D2         1.21
D3         0.9
D4         2.02
D5         1.2

Hence the top-4 items must come from the items that meet this constraint.
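Applying the two example constraints to the numbers above shows why the result list can fall short of k. The sketch below is illustrative only; `satisfying` is a hypothetical helper and the thresholds are the ones stated in the two example slides.

```python
K = 4

frequency = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2, "D6": 0}
social = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}


def satisfying(values, predicate):
    """Return the IDs (sorted) whose value passes the constraint predicate."""
    return [item for item in sorted(values) if predicate(values[item])]


tagged_twice = satisfying(frequency, lambda f: f >= 2)   # tagged at least twice
social_enough = satisfying(social, lambda s: s > 1)      # social score > 1

print(tagged_twice)            # ['D2', 'D4', 'D5']
print(social_enough)           # ['D2', 'D4', 'D5']
print(K - len(tagged_twice))   # 1 item still missing from the top-4 list
```

Under either constraint only three items qualify, so one slot of the top-4 list stays open; this is the situation the friendsourcing tasks are meant to resolve.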

Contributions Formalize quality constraints for social-aware search – List of quality constraints

List of quality constraints
1. Min # of posts for an item-tag pair
2. Min # of distinct tags per item
3. Min # of tag occurrences per item
4. Threshold for the social score
5. Threshold on stability measures for tags
   – Based on the moving average of the relative tag frequency distribution [1]
To be in the top-k result list, an item must satisfy these constraints in addition to the threshold of the social-aware search itself

Friendsourcing
– Items that do not meet the constraints are friendsourced
– Friendsourcing tasks are designed to improve the quality of the top-k result
– Friendsourcing task = (I, T, U), where items I are friendsourced to friends U and U provide tags T for the items

Human Tasks
T1: Minimum number of posts for an item-tag pair: I1, t1, {u1, u2, ..., uk}
T2: Minimum number of distinct tags: I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}
T3: Minimum number of tag occurrences: { {{I1, I2, ..., In}, t1, {u1, u2, ..., uk}}, {{I1, I2, ..., In}, t2, {u1, u2, ..., uk}}, ..., {{I1, I2, ..., In}, tn, {u1, u2, ..., uk}} }

Human Tasks
T4: Minimum number of taggers: I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}
T5: Minimum network-aware score: I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}
T6: Stability-based tag quality: I1, t1, {u1, u2, ..., uk}
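One way to picture the (I, T, U) task shape is as a small immutable record. The sketch below is an assumption about representation only, not the structure used in CANTO; the class `FriendsourcingTask` and the two constructor helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FriendsourcingTask:
    """A task (I, T, U): items I are sent to friends U, who supply tags T."""
    items: frozenset
    tags: frozenset
    users: frozenset


def min_posts_task(item, tag, users):
    """Shape of T1: one item, one tag, a set of friends."""
    return FriendsourcingTask(frozenset([item]), frozenset([tag]), frozenset(users))


def min_distinct_tags_task(item, tags, users):
    """Shape of T2: one item, a set of candidate tags, a set of friends."""
    return FriendsourcingTask(frozenset([item]), frozenset(tags), frozenset(users))


t1 = min_posts_task("I1", "t1", ["u1", "u2"])
t2 = min_distinct_tags_task("I1", ["t1", "t2", "t3"], ["u1", "u2"])
print(t1)
print(len(t2.tags))  # 3
```

T4, T5 and T6 have the same shape as T2 or T1 on the slides, so they would reuse these constructors; T3 is a collection of such records, one per tag.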

Human Task Optimization
Problem 1:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy the constraints (C1, C2, …, C6)
– Choose an item or set of items that can complete the top-k result list with the minimum number of tasks
– Inter-item gain

Human Task Optimization contd.
Problem 2:
– Given a chosen item I ∈ {I1, …, In} that does not satisfy the constraints (C1, C2, …, C6)
– Choose a task Ti ∈ {T1, …, T6} that satisfies the maximum number of constraints while minimizing the total number of tasks per item
– Intra-item gain

Human Task Optimization contd.
Problem 3:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy the constraints (C1, C2, …, C6)
– Choose a set of tasks, indexed by i ∈ {1, …, n} and j ∈ {1, …, 6}, such that the intra- and inter-item gain is maximized

Solution hints
Problem 1.
– For each constraint, we create a partially ordered list of the n items in the result list with respect to the constraint threshold
– Items that appear at the highest positions in most of the lists are chosen first to be friendsourced
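A minimal sketch of this hint, under assumptions that go beyond the slide: each item is ordered per constraint by how much it still lacks to reach the threshold (smaller gap first, equal gaps share a position), and "highest in most lists" is approximated by the lowest sum of positions. The function name and the input layout are hypothetical.

```python
def pick_item_to_friendsource(values, thresholds):
    """Pick the failing item that is closest to the thresholds overall.

    values:     {item: {constraint: current value}}
    thresholds: {constraint: required value}
    """
    total_position = {item: 0 for item in values}
    for constraint, required in thresholds.items():
        # Remaining gap to the threshold; 0 means already satisfied.
        gaps = {
            item: max(0.0, required - per_item[constraint])
            for item, per_item in values.items()
        }
        # Items with equal gaps share a position (partial order).
        distinct_gaps = sorted(set(gaps.values()))
        for item, gap in gaps.items():
            total_position[item] += distinct_gaps.index(gap)
    # Lowest summed position wins; item ID breaks remaining ties.
    return min(total_position, key=lambda item: (total_position[item], item))


# D1 and D3 fail both example constraints (tagged twice, social score > 1).
failing = {
    "D1": {"tags": 1, "social": 0.4},
    "D3": {"tags": 1, "social": 0.9},
}
choice = pick_item_to_friendsource(failing, {"tags": 2, "social": 1.0})
print(choice)  # D3
```

Both items need one more tag, but D3 is much closer to the social-score threshold, so it is the cheaper item to bring into the top-k list.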

Solution hints
Problem 2.
– There exist conditional probabilistic dependencies among constraints
– E.g., [formula not preserved], where P(T2) = 1
– We aim to find i such that the conditional probability [formula not preserved] is maximized

Expert Selection Criteria
Users are selected based on
– User expertise score
– Communication cost
User expertise score (Exp_ui)
– User profile/activity attributes
– Question-specific user expertise
– Algorithm-specific user attributes
Communication cost (Cost_ui)
– Social score
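The slides name the two criteria but not how they are combined. Purely as an illustration, the sketch below picks friends greedily by expertise per unit of cost until a budget is used up; the ratio rule, the budget parameter and the function name `select_experts` are all assumptions, and the expertise and cost numbers are made up for the example.

```python
def select_experts(candidates, budget):
    """Greedy selection by expertise-to-cost ratio under a total cost budget.

    candidates: {user: (expertise score, communication cost)}, cost > 0
    """
    by_ratio = sorted(
        candidates,
        key=lambda u: (-candidates[u][0] / candidates[u][1], u),
    )
    chosen, spent = [], 0.0
    for user in by_ratio:
        _expertise, cost = candidates[user]
        if spent + cost <= budget:
            chosen.append(user)
            spent += cost
    return chosen


# Hypothetical values for three friends of the seeker.
friends = {"u1": (0.9, 2.0), "u2": (0.6, 1.0), "u3": (0.3, 1.0)}
picked = select_experts(friends, budget=2.0)
print(picked)  # ['u2', 'u3']
```

Here u1 has the highest expertise but costs the whole budget alone, so the greedy rule prefers the two cheaper friends; a different combination rule would give a different choice.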

System Architecture
[Figure: system architecture diagram with numbered steps 1 to 7]

Screenshots of CANTO (Search Interfaces)

Screenshots of CANTO (Friendsourcing Seeker perspective)

Screenshots of CANTO (Friendsourcing provider perspective)

Summary
We are working on
– Augmenting social-aware search with the crowd, i.e., friends
– An algorithm for generating both ad hoc and best-effort social-aware search results
– An advanced expert selection algorithm that considers both budget and time constraints
– Planning to explore MAB for expert selection
So far we have used tweets and hashtags for experimentation; we plan to experiment with the Vodkaster dataset (user network, films, comments, micro-critique data).

Thank you for your attention!

References
1. William Webber, Alistair Moffat, and Justin Zobel. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst.
2. Xuan S. Yang, David W. Cheung, Luyi Mo, Reynold Cheng, and Ben Kao. On incentive-based tagging. In Proceedings of the 2013 IEEE International Conference on Data Engineering.