Published by Anderson Pilch. Modified over 10 years ago.
1
Capturing User Contexts: Dynamic Profiling for Information Seeking Tasks. Roman Y. Shtykh, Waseda University, Japan
2
Information Need as a Driving Force of Human Information Behaviour
* Recognition of one’s knowledge inadequacy to satisfy a particular goal (Case, 2002)
* A “consciously identified gap” in one’s knowledge (Ingwersen and Jarvelin, 2005)
How can a system be user-centric and sufficiently satisfy the user’s information need without knowing that need?
3
Context
* An information need emerges in one’s individual context, and both the context and the information need evolve over time.
* The information behaviours that serve to satisfy the information need, and that lead to the selection of an information object, take place in that same context.
4
Context of “fragmentary” nature
5
BESS (BEtter Search and Sharing)
A framework for collaborative information seeking and sharing. It uses uniform relevance feedback to infer user interests as they change over time, and uses the knowledge about those interests
* to better satisfy seeking intents by providing information that is likely to match the inferred user interests: PERSONALIZED SEEKING
* to evaluate shared information based on the user’s interests (expertise): PERSONALIZED SHARING
6
Profile Structure l – layer, k – concept number
7
Concept Formation with H2S2D (High Similarity Data-Driven) Clustering
* Online incremental clustering method for sequential relevance feedback data.
* Based on the peculiarities of a user’s seeking behavior (see the assumptions on the next slide).
8
Assumptions
* When a user searches, he/she usually views (clicks on, focuses attention on, etc.) several documents (links or other objects) until the most relevant one is found. Most of these documents are likely to be similar to one another to some extent and can give an idea of a particular user interest.
* Even if similar documents have not yet appeared in sequence, some documents relate to persistent user interests, and repeated searches on these interests are likely to occur. In these cases the user clicks either on links he/she found before or on links leading to documents highly similar to those found before.
9
Assumptions (1): Relevance Feedback
Feedback is sequentially incoming uniform data $S = S_1 S_2 \ldots S_n \ldots$ containing subsequences of n or more (n > 1) highly similar items linked by a particular information need. Such subsequences can be considered potentially new semantic clusters (concepts).
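The idea of high-similarity subsequences can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes feedback items are sparse term-weight dictionaries, that similarity is cosine similarity, and that S_th acts as a minimum similarity threshold. The names `cosine` and `similar_runs` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_runs(stream, s_th, n=2):
    """Return index lists of runs of at least n consecutive items in which
    every item is at least s_th-similar to the item right before it.

    Assumption: comparing each item only with its predecessor is a
    simplification chosen for this sketch.
    """
    runs, current = [], []
    for i, item in enumerate(stream):
        if current and cosine(stream[current[-1]], item) >= s_th:
            current.append(i)          # the run continues
        else:
            if len(current) >= n:      # close a long-enough run
                runs.append(current)
            current = [i]              # start a new run at this item
    if len(current) >= n:
        runs.append(current)
    return runs


feedback = [
    {"firefox": 1, "plugin": 1},
    {"firefox": 1, "browser": 1},
    {"tokyo": 1, "massage": 1},
    {"hotel": 1, "malta": 1},
    {"hotel": 1, "ticket": 1},
]
runs = similar_runs(feedback, s_th=0.2)
```

In this toy stream, items 0 and 1 and items 3 and 4 form runs, while item 2 is left outside any run.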
10
Assumptions (2): Relevance Feedback
Items that do not arrive in high-similarity subsequences are still considered potentially related to user interests. Since they are not yet very useful for profiles, they are put into a candidate pool, to be retrieved and used for concept formation later, when a subsequence of similar feedback items is observed.
11
Assumptions: User Study. Assumption 1: subsequence percentage (12 users, two weeks). [Figure: results for S_th = 0.05 and S_th = 0.1]
12
Assumptions: User Study. Assumption 2: percentage of re-accessed and all inter-similar documents

| S_th = 0.05 | S_th = 0.1 | S_th = 0.2 |
|---|---|---|
| 83% | 74% | 57% |
13
H2S2D Algorithm (1)
* Online
* Incremental
* Unsupervised
Key features:
* the definition of a new cluster relies on the sequential characteristics of relevance feedback;
* assignment of an incoming data item is delayed if there is no sufficiently similar cluster, and is performed once such a cluster is created.
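The two key features can be sketched roughly as follows. This is an illustration under stated assumptions, not the H2S2D algorithm itself: items are assumed to be sparse term-weight dictionaries compared by cosine similarity, S_th is treated as a minimum similarity threshold, a new cluster is seeded by two consecutive similar unassigned items, and cluster centroids are plain sums of term weights. The class name `DelayedClusterer` and all of its members are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class DelayedClusterer:
    """Online clustering sketch with a candidate pool (assumed design)."""

    def __init__(self, s_th):
        self.s_th = s_th
        self.clusters = []        # list of item lists
        self.centroids = []       # summed term weights, one dict per cluster
        self.pool = []            # candidate items whose assignment is delayed
        self.prev_pooled = False  # True if the previous item ended in the pool

    def _best_cluster(self, item):
        best, best_sim = None, self.s_th
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(centroid, item)
            if sim >= best_sim:
                best, best_sim = idx, sim
        return best

    def _add(self, idx, item):
        self.clusters[idx].append(item)
        for term, w in item.items():
            self.centroids[idx][term] = self.centroids[idx].get(term, 0.0) + w

    def feed(self, item):
        idx = self._best_cluster(item)
        if idx is not None:
            self._add(idx, item)                  # similar cluster exists
            self.prev_pooled = False
        elif self.prev_pooled and cosine(self.pool[-1], item) >= self.s_th:
            seed = self.pool.pop()                # sequential pair: new cluster
            self.clusters.append([])
            self.centroids.append({})
            new = len(self.clusters) - 1
            self._add(new, seed)
            self._add(new, item)
            waiting, self.pool = self.pool, []    # delayed assignment pass
            for old in waiting:
                if cosine(self.centroids[new], old) >= self.s_th:
                    self._add(new, old)
                else:
                    self.pool.append(old)
            self.prev_pooled = False
        else:
            self.pool.append(item)                # no similar cluster yet
            self.prev_pooled = True


a = {"hotel": 1, "malta": 1}
b = {"firefox": 1, "plugin": 1}
c = {"firefox": 1, "browser": 1}
d = {"hotel": 1, "ticket": 1}
e = {"hotel": 1, "ticket": 1, "tour": 1}

clusterer = DelayedClusterer(s_th=0.2)
for item in (a, b, c, d, e):
    clusterer.feed(item)
```

Here item `a` waits in the pool until `d` and `e` arrive in sequence and seed a travel-related cluster; only then is `a` assigned to it.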
14
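H2S2D Algorithm: Evaluation Results (Reuters collection)

| S_th | Items | H2S2D Acc | H2S2D P | H2S2D F | ECM Acc | ECM P | ECM F |
|---|---|---|---|---|---|---|---|
| 0.05 | 500 | 0.78 | 0.56 | 0.62 | 0.80 | 0.50 | 0.56 |
| 0.05 | 1000 | 0.83 | 0.50 | 0.58 | 0.81 | 0.47 | 0.49 |
| 0.05 | 2000 | 0.86 | 0.52 | | 0.82 | 0.50 | 0.52 |
| 0.05 | … | … | … | … | … | … | … |
| 0.05 | 7000 | 0.89 | 0.48 | 0.49 | 0.85 | 0.47 | 0.49 |
| 0.05 | 8000 | 0.89 | 0.49 | 0.50 | 0.86 | 0.47 | 0.50 |
| 0.1 | 500 | 0.95 | 0.61 | 0.64 | 0.89 | 0.61 | 0.57 |
| 0.1 | 1000 | 0.90 | 0.64 | 0.60 | 0.85 | 0.52 | 0.45 |
| 0.1 | 2000 | 0.90 | 0.65 | 0.54 | 0.85 | 0.54 | 0.44 |
| 0.1 | … | … | … | … | … | … | … |
| 0.1 | 7000 | 0.95 | 0.56 | | 0.85 | 0.58 | 0.39 |
| 0.1 | 8000 | 0.95 | 0.57 | | 0.86 | 0.60 | 0.38 |
| 0.2 | 500 | 0.99 | 0.95 | 0.94 | 0.84 | 0.76 | 0.30 |
| 0.2 | 1000 | 0.91 | 0.93 | 0.89 | 0.85 | 0.73 | 0.30 |
| 0.2 | 2000 | 0.91 | 0.82 | 0.79 | 0.85 | 0.71 | 0.26 |
| 0.2 | … | … | … | … | … | … | … |
| 0.2 | 7000 | 0.93 | 0.81 | 0.70 | 0.84 | 0.67 | 0.20 |
| 0.2 | 8000 | 0.93 | 0.81 | 0.70 | 0.83 | 0.68 | 0.19 |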
15
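H2S2D Algorithm: Evaluation Results (Reuters collection)
Number of clusters created after n items are processed

| Items | H2S2D (S_th = 0.05) | ECM (S_th = 0.05) | H2S2D (S_th = 0.1) | ECM (S_th = 0.1) | H2S2D (S_th = 0.2) | ECM (S_th = 0.2) |
|---|---|---|---|---|---|---|
| 500 | 5 | 4 | 8 | 10 | 5 | 48 |
| 1000 | 6 | 4 | 11 | 13 | 10 | 61 |
| 2000 | 9 | 4 | 13 | 14 | 11 | 74 |
| 3000 | 9 | 4 | 14 | 16 | 12 | 84 |
| 4000 | 11 | 5 | 20 | 21 | 20 | 93 |
| 5000 | 13 | 6 | 22 | | 21 | 102 |
| 6000 | 14 | 6 | 24 | 23 | 22 | 113 |
| 7000 | 14 | 6 | 25 | | 24 | 117 |
| 8000 | 14 | 6 | 25 | 27 | 24 | 123 |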
16
H2S2D Algorithm Evaluation Results (Reuters collection) Ratio of candidate items to the number of processed items
17
Role and Position of User Profile in BESS
18
Interest-change-driven Profile Construction
Construction criteria: Recency, Frequency, Persistency
19
Profile Structure l – layer, k – concept number
20
Profile Construction: Session Layer
The latest created or updated concept from $C_a = \{C_{a1}, \ldots, C_{an}\}$ of user a. Criterion: Recency.
21
Profile Construction: Short-term Layer
The m most frequently updated and used concepts, chosen in turn from the r most recent (top) concepts in the concept recency list. Criteria: Recency and Frequency.
22
Profile Construction: Long-term Layer
Derived from the concepts of the short-term layer that were most frequently observed as components of the short-term layer. Criterion: Persistency.
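A rough sketch of how the three layers might be derived from the recency, frequency and persistency criteria is given below. It is only an illustration under assumptions: concepts are identified by integer ids, `recency` lists ids with the most recently created or updated concept first, `update_counts` stands in for how often each concept was updated and used, and the long-term layer keeps the `p` concepts seen most often in past short-term layers. The function names and the parameter `p` are hypothetical; `r` and `m` follow the short-term layer description.

```python
from collections import Counter


def session_layer(recency):
    """Latest created or updated concept (recency)."""
    return recency[:1]


def short_term_layer(recency, update_counts, r, m):
    """The m most frequently updated/used concepts among the r most recent."""
    top_recent = recency[:r]
    ranked = sorted(top_recent,
                    key=lambda cid: update_counts.get(cid, 0),
                    reverse=True)
    return ranked[:m]


def long_term_layer(short_term_history, p):
    """The p concepts observed most often in past short-term layers
    (persistency). p is an assumed size limit for this sketch."""
    seen = Counter(cid for snapshot in short_term_history for cid in snapshot)
    return [cid for cid, _ in seen.most_common(p)]


# Toy data: concept ids ordered by recency, with made-up update counts.
recency = [5, 1, 8, 7, 6, 3, 4]
update_counts = {5: 9, 1: 2, 8: 4, 7: 7, 6: 1, 3: 3, 4: 8}

session = session_layer(recency)
short_term = short_term_layer(recency, update_counts, r=5, m=3)

# Snapshots of the short-term layer taken at earlier moments (made up).
history = [[5, 7, 8], [5, 7, 4], [5, 4, 1], [5, 7, 1]]
long_term = long_term_layer(history, p=2)
```

With these toy values, concept 4 is updated often but falls outside the r = 5 most recent concepts, so it does not enter the short-term layer, which shows how recency gates frequency.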
23
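Profile Construction: An example (1). Short-term layer

| 5 | 1 | 8 | 7 | 6 | 3 | 4 |
|---|---|---|---|---|---|---|
| Firefox | real estate | software | Japanese news | conference | massage | travel |
| customize | square meters | computer | company | workshop | Tokyo | tour |
| plugins | area | | business | mobile | | Malta |
| colored scrollbar | | | politics | multimedia | | ticket |
| browsers | | | finance | IR | | hotel |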
24
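Profile Construction: An example (2). Long-term layer

| 5 | 1 | 8 | 7 | 6 | 3 | 4 |
|---|---|---|---|---|---|---|
| Firefox | real estate | software | Japanese news | conference | massage | travel |
| customize | square meters | computer | company | workshop | Tokyo | tour |
| plugins | area | | business | mobile | | Malta |
| colored scrollbar | | | politics | multimedia | | ticket |
| browsers | | | finance | IR | | hotel |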
25
Conclusions (1)
* To implement user-centric services, knowledge about a user’s information need (IN) is required.
* The IN itself cannot be captured (at least with today’s advances in the human sciences).
* An attempt can be made to obtain “fragmentary” context in order to further facilitate a user’s activities.
26
Conclusions (2)
* We proposed a multi-layered user modelling approach to dynamically organise and update a user’s contextual information according to its volatility and persistency characteristics.
* We proposed the Similarity Sequence Data-Driven clustering algorithm for concept construction. Despite its relative simplicity, the proposed H2S2D method demonstrated reasonably good clustering results in terms of accuracy and precision, and proved suitable for fast real-time processing of relevance feedback, which keeps concepts always up to date.