Published by Jared Palmer. Modified over 9 years ago.
1
Crowd-Augmented Social Aware Search
Soudip Roy Chowdhury & Bogdan Cautis
2
What are we talking about?
Social Aware Search
– Finding results relevant for the query and for the user (seeker)
– Web search (tf-idf) + social search (social connections, e.g., follower-following links)
However,
– The required number of results (K items) may not be found
– The algorithm does not ensure the quality of the retrieved results
Our aim is to use the crowd for datasourcing, to address these problems efficiently
6
Let's see an example!
Query: get the top 4 tweets for the query terms “#jesuischarlie #jesuisahmed”
7
Hashtag term: #jesuischarlie
Tweet ID   Frequency
D1         1
D2         1
D3         0
D4         2
D5         1
D6         0

Hashtag term: #jesuisahmed
Tweet ID   Frequency
D1         0
D2         1
D3         1
D4         1
D5         1
D6         0

By aggregating term frequencies we get the final result

Hashtag terms: #jesuischarlie #jesuisahmed
Tweet ID   Frequency
D1         1
D2         2
D3         1
D4         3
D5         2
D6         0

The top-4 items are taken from this list: D4, D2 and D5 lead, while D1 and D3 tie at frequency 1
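The aggregation step can be sketched in a few lines of Python. This is only an illustration of the example above: the function names `aggregate` and `top_k` are made up here, and breaking ties by tweet ID is an assumption, since the slides do not say how D1 and D3 (both at frequency 1) are ordered.

```python
# Per-hashtag term frequencies copied from the example tables.
tf = {
    "#jesuischarlie": {"D1": 1, "D2": 1, "D3": 0, "D4": 2, "D5": 1, "D6": 0},
    "#jesuisahmed":   {"D1": 0, "D2": 1, "D3": 1, "D4": 1, "D5": 1, "D6": 0},
}

def aggregate(tf_by_tag):
    """Sum each tweet's frequency over all query hashtags."""
    total = {}
    for freqs in tf_by_tag.values():
        for tweet, f in freqs.items():
            total[tweet] = total.get(tweet, 0) + f
    return total

def top_k(scores, k):
    """Return the k best tweets by descending score.

    Ties are broken by tweet ID; this tie-break is an assumption.
    """
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

aggregated = aggregate(tf)
print(aggregated)
print(top_k(aggregated, 4))
```

With this tie-break the fourth slot goes to D1; a different rule could equally pick D3.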
9
Similarly the social scores are calculated

Tweet ID   Hashtag term     Author   Social score
D1         #jesuischarlie   Elham    0.9 x 0.9 x 0.5
D2         #jesuischarlie   Elham    0.9 x 0.9 x 0.5
           #jesuischarlie   Das      0.9 x 0.9
D3         #jesuisahmed     Bob      0.9
D4         #jesuischarlie   Elham    0.9 x 0.9 x 0.5
           #jesuischarlie   Das      0.9 x 0.9
           #jesuisahmed     Das      0.9 x 0.9
D5         #jesuischarlie   Chang    0.6
           #jesuisahmed     Chang    0.6
10
Top-k results with social score!

Hashtag term: #jesuischarlie
Tweet ID   Social score
D1         0.4
D2         1.21
D3         0
D4         1.21
D5         0.6

Hashtag term: #jesuisahmed
Tweet ID   Social score
D1         0
D2         0
D3         0.9
D4         0.81
D5         0.6

Hashtag terms: #jesuischarlie #jesuisahmed
Tweet ID   Social score
D1         0.4
D2         1.21
D3         0.9
D4         2.02
D5         1.2

The top-4 items by social score are D4, D2, D5 and D3
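A minimal sketch of how these social scores could be computed follows. It assumes, based only on the products shown in the table (e.g. 0.9 x 0.9 x 0.5), that each tagger contributes the product of the link weights on the path from the seeker, and that an item's score is the sum over its taggers. The name `social_scores` is hypothetical.

```python
from math import prod

# (tweet, hashtag, author, link weights) rows copied from the social score table.
# Reading the factors as weights along the seeker-to-author path is an assumption.
taggings = [
    ("D1", "#jesuischarlie", "Elham", [0.9, 0.9, 0.5]),
    ("D2", "#jesuischarlie", "Elham", [0.9, 0.9, 0.5]),
    ("D2", "#jesuischarlie", "Das",   [0.9, 0.9]),
    ("D3", "#jesuisahmed",   "Bob",   [0.9]),
    ("D4", "#jesuischarlie", "Elham", [0.9, 0.9, 0.5]),
    ("D4", "#jesuischarlie", "Das",   [0.9, 0.9]),
    ("D4", "#jesuisahmed",   "Das",   [0.9, 0.9]),
    ("D5", "#jesuischarlie", "Chang", [0.6]),
    ("D5", "#jesuisahmed",   "Chang", [0.6]),
]

def social_scores(rows):
    """Sum, per tweet, the product of link weights of every tagger."""
    scores = {}
    for tweet, _tag, _author, weights in rows:
        scores[tweet] = scores.get(tweet, 0.0) + prod(weights)
    return scores

scores = social_scores(taggings)
top4 = sorted(scores, key=scores.get, reverse=True)[:4]
print({t: round(s, 3) for t, s in scores.items()})
print(top4)
```

The unrounded sums are 0.405, 1.215 and 2.025 for D1, D2 and D4; the slide values 0.4, 1.21 and 2.02 appear to round 0.405 down to 0.4 before adding. The ranking is the same either way.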
11
Social-aware search
Final results are calculated based on the score model
– score(item | seeker, tag) = α × tf-idf(tag, item) + (1-α) × sc(item | seeker, tag)
Following this model, the top-4 results for our example scenario are
– D4, D2, D5, and D3
Let us now consider some additional constraints to make sure the results are of good quality
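The score model can be tried on the example data with a short sketch. Two things here are assumptions, not taken from the slides: α = 0.5, and the use of the aggregated raw frequencies as a stand-in for tf-idf (the slides show only frequencies). The names `ALPHA` and `combined_score` are illustrative.

```python
ALPHA = 0.5  # assumed weight; the slides do not give a value for alpha

# Aggregated frequencies and social scores from the example tables.
frequency = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2, "D6": 0}
social = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}

def combined_score(item, alpha=ALPHA):
    """alpha * textual score + (1 - alpha) * social score."""
    textual = frequency.get(item, 0)  # stand-in for tf-idf(tag, item)
    sc = social.get(item, 0.0)        # items with no tagger score 0
    return alpha * textual + (1 - alpha) * sc

ranking = sorted(frequency, key=combined_score, reverse=True)
top4 = ranking[:4]
print(top4)
```

Under these assumptions the sketch reproduces the ordering D4, D2, D5, D3 stated on the slide.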
12
Example scenario with quality constraints
#Constraint: Each result item must be tagged at least twice

Hashtag terms: #jesuischarlie #jesuisahmed
Tweet ID   Frequency
D1         1
D2         2
D3         1
D4         3
D5         2
D6         0

Hence only D4, D2 and D5 qualify, leaving the top-4 list one item short
13
Example scenario with quality constraints
#Constraint: The social score of an item must be > 1 for the item to be in the final result list

Hashtag terms: #jesuischarlie #jesuisahmed
Tweet ID   Social score
D1         0.4
D2         1.21
D3         0.9
D4         2.02
D5         1.2

Hence only D4, D2 and D5 qualify, again leaving the top-4 list one item short
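Both example constraints can be checked mechanically. The sketch below uses the slide values for D1 to D5; the names `MIN_TAG_OCCURRENCES`, `MIN_SOCIAL_SCORE` and `satisfies` are made up for illustration.

```python
MIN_TAG_OCCURRENCES = 2  # "tagged at least twice"
MIN_SOCIAL_SCORE = 1.0   # social score must be strictly greater than this

# Aggregated values from the example tables.
frequency = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2}
social = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}

def satisfies(item):
    """True if the item meets both example quality constraints."""
    return (frequency[item] >= MIN_TAG_OCCURRENCES
            and social[item] > MIN_SOCIAL_SCORE)

accepted = [item for item in frequency if satisfies(item)]
rejected = [item for item in frequency if not satisfies(item)]
print(accepted)  # items that may stay in the result list
print(rejected)  # items that currently fail a constraint
```

Only three items pass, so a top-4 request cannot be answered from the constrained list alone.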
14
Contributions
Formalize quality constraints for social-aware search
– List of quality constraints
15
List of quality constraints
1. Min # of posts for item-tag pair
2. Min # of distinct tags per item
3. Min # of tag occurrences per item
4. Threshold for social score
5. Threshold stability measures for tags
   – Based on moving average of relative tag frequency distribution [1]
To be in the top-k result list, an item must satisfy these constraints in addition to the social-aware search threshold
16
Friendsourcing
Items that do not meet the constraints are friendsourced
Friendsourcing tasks are designed to improve the quality of the top-k result
Friendsourcing task = ⟨I, T, U⟩, where items I are friendsourced to friends U, and U provide tags T for the items
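One way to represent such a task in code is a small immutable record. This is a sketch only: the class `FriendsourcingTask`, the helper `make_tasks`, and the choice of one task per failing item are assumptions for illustration, and the friend names are simply authors borrowed from the earlier example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FriendsourcingTask:
    """A task <I, T, U>: friends U are asked to tag items I with tags T."""
    items: tuple  # I
    tags: tuple   # T
    users: tuple  # U

def make_tasks(failing_items, query_tags, friends):
    """Create one task per item that fails a constraint (assumed policy)."""
    return [
        FriendsourcingTask((item,), tuple(query_tags), tuple(friends))
        for item in failing_items
    ]

tasks = make_tasks(
    ["D1", "D3"],                        # items that failed the example constraints
    ["#jesuischarlie", "#jesuisahmed"],  # the query hashtags
    ["Elham", "Das"],                    # hypothetical choice of friends
)
for task in tasks:
    print(task)
```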
17
Human Tasks
T1: Minimum number of posts for an item-tag pair: ⟨I1, t1, {u1, u2, ..., uk}⟩
T2: Minimum number of distinct tags: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
T3: Minimum number of tag occurrences: {⟨{I1, I2, ..., In}, t1, {u1, u2, ..., uk}⟩, ⟨{I1, I2, ..., In}, t2, {u1, u2, ..., uk}⟩, ..., ⟨{I1, I2, ..., In}, tn, {u1, u2, ..., uk}⟩}
18
Human Tasks
T4: Minimum number of taggers: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
T5: Minimum network-aware score: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
T6: Stability-based tag quality: ⟨I1, t1, {u1, u2, ..., uk}⟩
19
Human Task Optimization
Problem 1:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy the constraints (C1, C2, …, C6)
– Choose an item or set of items that can complete the top-k result list with the minimum number of tasks
– Inter-item gain
20
Human Task Optimization contd.
Problem 2:
– Given a chosen item I ∈ {I1, …, In} that does not satisfy the constraints (C1, C2, …, C6)
– Choose a task Ti ∈ {T1, …, T6} that satisfies the maximum number of constraints, minimizing the total number of tasks per item
– Intra-item gain
21
Human Task Optimization contd.
Problem 3:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy the constraints (C1, C2, …, C6)
– Choose a set of tasks, indexed by item i ∈ {1, …, n} and task type j ∈ {1, …, 6}, such that the intra-item and inter-item gain is maximized
22
Solution hints
Problem 1.
– For each constraint, create a partially ordered list of the n items in the result list with respect to that constraint's threshold
– Items that appear at the highest positions in most of the lists are chosen first to be friendsourced
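The hint for Problem 1 can be illustrated with a small ranking sketch. Several choices here are assumptions rather than the authors' method: items are ordered per constraint by how far they fall short of the threshold, "highest positions in most of the lists" is approximated by the sum of positions, and D6 is given a social score of 0 because no tagger is listed for it. All names are hypothetical.

```python
# The two example constraints: frequency >= 2 and social score > 1.
THRESHOLDS = {"frequency": 2, "social": 1.0}

# Items that fail at least one constraint in the example.
items = {
    "D1": {"frequency": 1, "social": 0.4},
    "D3": {"frequency": 1, "social": 0.9},
    "D6": {"frequency": 0, "social": 0.0},  # assumed: D6 has no taggers
}

def deficit(item, constraint):
    """How far the item falls short of the constraint's threshold."""
    return max(0.0, THRESHOLDS[constraint] - items[item][constraint])

def position(item, constraint):
    """0-based position in the list ordered by increasing deficit.

    Items with equal deficit share a position, giving a partial order.
    """
    mine = deficit(item, constraint)
    return sum(deficit(other, constraint) < mine for other in items)

def friendsourcing_order():
    """Order items by summed position over all constraint lists."""
    total = {i: sum(position(i, c) for c in THRESHOLDS) for i in items}
    return sorted(items, key=lambda i: (total[i], i))

print(friendsourcing_order())
```

Under these assumptions D3 comes first: it ties with D1 on frequency but is much closer to the social score threshold, so it is the cheapest candidate for completing the top-4 list.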
23
Solution hints
Problem 2.
– There exist conditional probabilistic dependencies among constraints
– E.g., [formula missing], where P(T2) = 1
– We aim to find i such that the value of the conditional probability [formula missing] is maximized
24
Expert Selection Criteria
Users are selected based on
– User expertise score
– Communication cost
User expertise score (Exp ui)
– User profile/activity attributes
– Question-specific user expertise
– Algorithm-specific user attributes
Communication cost (Cost ui)
– Social score
26
System Architecture
[Architecture diagram, built up in seven numbered steps]
33
Screenshots of CANTO (Search Interfaces)
35
Screenshots of CANTO (Friendsourcing Seeker perspective)
36
Screenshots of CANTO (Friendsourcing provider perspective)
37
Summary
We are working on
– Augmenting social-aware search with the crowd (aka friends)
– An algorithm for generating both ad hoc and best-effort social-aware search results
– An advanced expert selection algorithm
  – Considering both budget and time constraints
  – Planning to explore multi-armed bandits (MAB) for expert selection
So far we have used tweets and hashtags for experimentation; we plan to experiment with the Vodkaster dataset (user network, films, comments, micro-critique data).
38
Thank you for your attention!
39
References
1. William Webber, Alistair Moffat, and Justin Zobel. 2010. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst.
2. Xuan S. Yang, David W. Cheung, Luyi Mo, Reynold Cheng, and Ben Kao. 2013. On incentive-based tagging. In Proceedings of the 2013 IEEE International Conference on Data Engineering.