1
AI on the Web, Part I
CS592 Class, Spring 2000
2
Part I: Personalized Prediction
3
NewsDude (UCI)[*]
[*] D. Billsus and M. Pazzani. A Hybrid User Model for News Story Classification. In Proc. UM99, Banff, Canada, June 1999.
An intelligent agent compiles a daily news program for an individual user (information retrieval).
Architecture: how does it work?
Short-term vs. long-term models for user modeling
Time-coded feedback to increase prediction accuracy
4
NewsDude Architecture
- Go through an example here!
- Retrieval Agent: connects to news sites on the Internet, downloads the latest news stories, and inserts them into the local story cache (which queues all stories waiting to receive a relevance score based on the hybrid user model).
- Recommender Agent: takes stories out of the story cache, uses the current user model to compute a relevance score, and inserts each story into the sorted recommendation queue.
- User Interface: when the user requests a news story, the UI component takes the top-rated story from the recommendation queue and reads it to the user.
- The user provides feedback, which updates the current user model.
- The system maintains an ordered sequence of news stories, inserts new stories into this sequence as soon as they are evaluated, and always presents the currently highest-ranked story to the user.
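A minimal sketch of this loop in Python; the class and method names (RecommendationQueue, user_model.score) are illustrative, not from the original system:

```python
import heapq
import itertools

class RecommendationQueue:
    """Keeps stories ordered by relevance score (highest first)."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal scores

    def insert(self, story, score):
        # Negate the score so heapq (a min-heap) pops the best story first.
        heapq.heappush(self._heap, (-score, next(self._counter), story))

    def next_story(self):
        neg_score, _, story = heapq.heappop(self._heap)
        return story, -neg_score

def recommend(story_cache, user_model, queue):
    # Recommender agent: score each cached story with the hybrid user
    # model and move it into the sorted recommendation queue.
    for story in story_cache:
        queue.insert(story, user_model.score(story))
```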
5
Learning Models
short-term model (NN): news threads for ongoing recent events
long-term model (Naïve Bayes classifier): general news preferences
hybrid: use the short-term model; if it cannot classify a story, use the long-term model; otherwise assign a default score

Purpose of the short-term model:
1. Contains information about recently rated events, to identify stories in the same threads.
2. Allows identifying stories the user already knows.
Technology: NN (nearest neighbor). It stores rated stories as training examples in memory. To classify a new, unlabeled instance, NN compares it to all stored ones under some similarity measure and determines the k nearest neighbors; the class label for the new instance is derived from the neighbors' class labels. Similarity measure between two text documents: stories are converted to TF-IDF vectors, and the cosine similarity measure quantifies the similarity of two vectors (Salton). Advantage of the NN algorithm: it requires only one training example (a story with a new topic) to start tracking a news thread.
Purpose of the long-term model: model the user's general preferences for news stories and compute predictions for stories that could not be classified by the short-term model. Technology: Naïve Bayesian classifier. Each story is represented as a Boolean feature vector, where each feature indicates the presence or absence of a keyword in the news story.
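The short-term model can be sketched as follows; stories are assumed to be sparse {word: TF-IDF weight} dicts, and the values of k and the minimum-similarity threshold are illustrative assumptions, not the paper's settings:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse TF-IDF vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(story_vec, rated_stories, k=3, threshold=0.3):
    """rated_stories: list of (tfidf_vector, label) pairs.
    Returns a label voted by the k nearest rated stories, or None if
    no stored story is similar enough (fall through to the long-term
    model, as in the hybrid scheme above)."""
    sims = sorted(((cosine(story_vec, v), label)
                   for v, label in rated_stories), reverse=True)
    top = [(s, lbl) for s, lbl in sims[:k] if s >= threshold]
    if not top:
        return None  # short-term model abstains
    votes = Counter(lbl for _, lbl in top)
    return votes.most_common(1)[0][0]
```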
6
Time-coded feedback
Use the amount of time a user has listened to a story as implicit feedback.
User's direct binary feedback + time-coded feedback = a fine-grained scale without extra burden on the user (similar to Lieberman's Letizia).
pl = proportion of a story the user has heard
If the story was rated as uninteresting: score = 0.3 * pl
If the story was rated as interesting: score = 0.7 + 0.3 * pl
If the user asked for more information: score = 1.0
Future work in time-coded feedback: modify the text representation of a news story by reweighting the portion of the story to which the user has actually listened. For example, suppose the user interrupts a story after the first 10 words and labels the message as uninteresting. The system should treat those first 10 words differently from the remaining text of the story: since the user reached a decision after the first 10 words, we assume that one or more strong indicators for class membership are among them. Because news stories are represented as TF-IDF vectors, those weights can be artificially modified to assign more weight to the words the user has listened to.
Show the Internet interface. Go through an example on the Internet and explain what happened with the architecture diagram.
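A direct transcription of the scoring rule in Python; note that the 0.7 offset in the "interesting" case is reconstructed so the three cases cover a consistent 0-to-1 scale (uninteresting in [0, 0.3], interesting in [0.7, 1.0]):

```python
def time_coded_score(rating, pl):
    """rating: 'uninteresting', 'interesting', or 'more_info'.
    pl: proportion of the story the user listened to, in [0, 1]."""
    if rating == "more_info":
        return 1.0
    if rating == "interesting":
        return 0.7 + 0.3 * pl   # upper part of the scale
    return 0.3 * pl             # uninteresting: lower part of the scale
```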
7
NewsDude evaluation
8
NewsDude: Strengths and Limitations
Tracks the user's changing interests in real time without sacrificing general interests
Simple feedback but accurate prediction
The user must rate enough stories before personalization takes effect
Not flexible: the classifier must be recalculated when new keywords are added
Similar systems: GroupLens (U. Minnesota)
9
Adaptive Web Sites (Mike Perkowitz and Oren Etzioni, IJCAI97)
Web sites that improve their organization and presentation by learning user access patterns. Some examples follow.
10
Issues in Adaptive Web Sites Design
Observation -- discover users' interests:
a specific user's interests
a group of users' interests
all users' interests
Adjustment -- adjust the original web-page design according to the observation:
customization (by link, by keyword reordering)
optimization (by refining search results)
11
Adaptive Web Sites: Two Examples
WebWatcher: modifies the original design by promotion, demotion, and highlighting of links, and by linking web pages. Developed at Carnegie Mellon University.
PageGather: generates new web pages (index page synthesis). Developed at the University of Washington, Seattle.
12
WebWatcher (T. Joachims, D. Freitag, T. Mitchell, IJCAI97)
A software agent that acts as a tour guide for web visitors: it makes suggestions on where to go next; it learns from the information users provide when they enter and exit the web site, and also learns users' access patterns; it reorganizes web pages for the user.
13
WebWatcher: architecture
[Architecture diagram: the user's requests pass through WebWatcher, which forwards them to the World Wide Web and returns pages with replaced URLs and highlighted advice.]
WebWatcher: a proxy agent between users and the WWW
14
Learning from Previous Tours
Users provide keywords of interest before the tour starts.
Those keywords are added to the descriptions of every hyperlink the user follows.
Interests and hyperlink descriptions are represented by high-dimensional feature vectors, whose elements are calculated using the TF-IDF heuristic.
LinkQuality: the estimated probability that a user follows this hyperlink, computed as the average similarity of the k highest-ranked keywords associated with the hyperlink.
15
WebWatcher: TFIDF
WebWatcher uses TFIDF with the cosine similarity measure to calculate the current user's similarity to a hyperlink description.
TFIDF calculates the feature vector V as follows:
V_i = Freq(w_i) * [ log2(n) - log2(DocFreq(w_i)) ]
Freq(w_i): the number of occurrences of word w_i in this page
DocFreq(w_i): the number of pages in which word w_i appears
n: the total number of pages
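The weighting formula translates directly to Python; the corpus structure (a dict from page id to its list of words) is an illustrative assumption:

```python
import math
from collections import Counter

def build_doc_freq(pages):
    """DocFreq: in how many pages does each word appear?"""
    df = Counter()
    for words in pages.values():
        df.update(set(words))
    return df

def tfidf_vector(words, doc_freq, n):
    """V_i = Freq(w_i) * (log2(n) - log2(DocFreq(w_i)))"""
    freq = Counter(words)
    return {w: f * (math.log2(n) - math.log2(doc_freq[w]))
            for w, f in freq.items() if doc_freq.get(w)}
```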
16
WebWatcher: Reinforcement Learning
Reinforcement learning: learn control strategies that select optimal actions.
R(s): reward function at state s
Q(s, a): the goodness of action a in state s
Q(s, a) = R(s) + γ · max_{a'} Q(s', a')
where s is the current state, s' is the next state reached through action a, and γ (0 ≤ γ < 1) is a discount factor that determines how severely to discount the value of rewards received further into the future.
17
Reinforcement Learning in WebWatcher
States correspond to web pages; actions correspond to hyperlinks.
R_keyword(s): the TFIDF value of the keyword for page s
Q_keyword(s, a) is learned as the sum of discounted TFIDF values of the keyword over the optimal tour beginning with a.
For every word w, WebWatcher uses a separate reward function R_w(s) and learns a distinct Q_w(s, a).
18
WebWatcher: An Example of state space
[State-space diagram: web pages as states, hyperlinks as actions. Initially R = 0 everywhere except R = 1 at the destination web page; with γ = 0.9, the learned values decay along the tour back toward the start S: 1, 0.9, 0.81, 0.73, ...]
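A toy value-iteration sketch that reproduces the discount pattern in the diagram (1, 0.9, 0.81, 0.73) on a hypothetical chain of pages; the graph itself is illustrative:

```python
GAMMA = 0.9
links = {"S": ["A"], "A": ["B"], "B": ["C"], "C": ["goal"], "goal": []}
R = {s: 0.0 for s in links}
R["goal"] = 1.0  # reward only at the destination page

V = {s: 0.0 for s in links}
for _ in range(len(links)):      # enough sweeps for values to settle on a chain
    for s in links:
        best = max((V[t] for t in links[s]), default=0.0)
        V[s] = R[s] + GAMMA * best

# Result: goal = 1.0, C = 0.9, B = 0.81, A = 0.729 ("0.73" in the
# diagram), S = 0.656 -- each step away from the goal is discounted.
```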
19
PageGather (Mike Perkowitz and Oren Etzioni, IJCAI97, 99)
Index Page Synthesis: instead of modifying the original web page design, PageGather creates new index pages that contain collections of links to related but currently unlinked pages.
Based on cluster mining to find collections of related pages.
20
Cluster Mining: co-occurrence frequencies
For each pair of pages P1 and P2, compute Pr(P1|P2), the probability of visiting P1 if P2 is visited.
The co-occurrence frequency between P1 and P2 is the minimum of Pr(P1|P2) and Pr(P2|P1).
The co-occurrence frequency is zero if the two pages are already linked.
Compute a similarity matrix.
Apply a threshold and set low similarities to zero.
21
PageGather Algorithm
1. Process the access log into visit data.
2. Compute the co-occurrence frequencies between pages and create a similarity matrix.
3. Create the graph corresponding to the matrix, and find cliques (or connected components) in the graph.
4. For each cluster found, create a web page consisting of links to the documents in the cluster.
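A compact sketch of steps 2-4, using connected components rather than cliques for simplicity; the visit-data format (a list of per-visit page collections), the `linked` set of frozenset page pairs, and the threshold value are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

def similarity_matrix(visits, linked, threshold=0.1):
    """Co-occurrence frequency min(Pr(P1|P2), Pr(P2|P1)) per page pair,
    zeroed for already-linked pairs and values below the threshold."""
    page_count = Counter(p for v in visits for p in set(v))
    pair_count = Counter(frozenset(pr) for v in visits
                         for pr in combinations(set(v), 2))
    sim = {}
    for pair, c in pair_count.items():
        if pair in linked:
            continue  # already-linked pages get similarity zero
        p1, p2 = tuple(pair)
        s = min(c / page_count[p2], c / page_count[p1])
        if s >= threshold:
            sim[pair] = s
    return sim

def clusters(sim):
    """Connected components of the thresholded similarity graph;
    each cluster becomes one new index page of links."""
    neighbors = {}
    for p1, p2 in (tuple(pair) for pair in sim):
        neighbors.setdefault(p1, set()).add(p2)
        neighbors.setdefault(p2, set()).add(p1)
    seen, comps = set(), []
    for start in neighbors:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            p = stack.pop()
            if p not in comp:
                comp.add(p)
                stack.extend(neighbors[p] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```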
22
Next Web Document Prediction
Papers by Albrecht, Zukerman, and Nicholson:
“Predicting Users' Requests on the WWW”, UM99
“Pre-sending Documents on the WWW”, IJCAI99
Theme: use Markov models to predict the next document requested, and pre-send it.
23
Prediction Models
Prediction models are of the form P(DR1, TR1 | previous requests)
Assumptions:
the distribution of the time for requesting a document is independent of the actual document
the next document depends only on the previous document
the time of the next request depends only on the time of the last request
24
Prediction Models (Cont...)
From these assumptions, we can derive:
P(DR1, TR1 | previous requests) = P(DR1 | previous documents) × P(TR1 | TR)
We need to estimate the value of each of the two terms in this equation.
25
Request Time Prediction
26
Document Prediction
Four models are used for prediction:
Time Markov Model
Space Markov Model
Second-order Time Markov Model
Linked Space-Time Markov Model
Graphical representations are used for each document prediction model.
27
Document Prediction (Cont…)
If a document Di is requested after an event Ei-1, then there is an arc between them.
For the Time Markov Model, Ei-1 is the last document requested (Di-1).
For the Space Markov Model, Ei-1 is the referring document of Di.
28
Document Prediction (Cont…)
For the Second-order Time Markov Model, Ei-1 is a tuple containing the last two documents requested.
For the Linked Space-Time Model, Ei-1 is a tuple containing the last document requested and its referer.
29
Document Prediction (Cont…)
Each arc from event Ei-1 to Di has an associated weight w(Ei-1, Di), which is the frequency of that event-document pair across all training sessions.
The probability of the request is then the normalized weight:
P(Di | Ei-1) = w(Ei-1, Di) / Σ_D w(Ei-1, D)
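A sketch of this estimate for the Time Markov Model (where Ei-1 is simply the previous document); the session format, a list of document-request sequences, is an illustrative assumption:

```python
from collections import Counter, defaultdict

def train_time_markov(sessions):
    """w[E][D]: frequency of event-document pairs across sessions."""
    w = defaultdict(Counter)
    for docs in sessions:
        for prev, cur in zip(docs, docs[1:]):
            w[prev][cur] += 1
    return w

def predict(w, event):
    """P(D | E) = w(E, D) / sum_D' w(E, D').
    Returns the most likely (doc, prob), or None if the model has
    never seen this event and cannot make a prediction."""
    if event not in w:
        return None
    total = sum(w[event].values())
    doc, count = w[event].most_common(1)[0]
    return doc, count / total
```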
30
Document Prediction (Cont…)
31
Hybrid Prediction Models
MaxHybrid Model: consults all the Markov prediction models and selects the one whose most likely prediction has the highest probability.
OrderedHybrid Model: orders the Markov models according to their performance (Linked, Second-order, Time, Space) and selects the first one that can make a prediction.
32
Hybrid Prediction Models(Cont…)
SpaceLinkedHybrid Model: if the maximum prediction made by the Space Markov Model is > 0.77, use its prediction; otherwise, use the Linked Markov Model's.
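The three hybrid policies reduce to a few lines, assuming each base model is wrapped as a function that takes its event and returns (document, probability), or None when it cannot predict; the wrapper interface is an illustrative assumption:

```python
def max_hybrid(models, event_of):
    """models: {name: prediction function}; event_of: {name: event}.
    Pick the single most confident prediction across all models."""
    preds = [p for name, f in models.items()
             if (p := f(event_of[name])) is not None]
    return max(preds, key=lambda p: p[1], default=None)

def ordered_hybrid(models, event_of,
                   order=("linked", "second-order", "time", "space")):
    # Try models in their measured performance order; take the first
    # one able to make a prediction.
    for name in order:
        p = models[name](event_of[name])
        if p is not None:
            return p
    return None

def space_linked_hybrid(models, event_of, cutoff=0.77):
    # Use the Space model only when it is confident enough (> 0.77).
    p = models["space"](event_of["space"])
    if p is not None and p[1] > cutoff:
        return p
    return models["linked"](event_of["linked"])
```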
33
Results
The experimental data is 50 days of server logs in the form {client, referer, requestedDoc, time, size}.
Prediction models were assessed in terms of the probability with which they predict the actual next request.
34
Results (Cont…)
35
Pre-sending documents
IJCAI99 paper (same data set), including two costs:
the cost of waiting for a document (cost-per-second)
the cost of transmitting a document (cost-per-byte)
Calculate the expected benefit using the document probabilities:
Expected-Benefit = Expected-Wait-Reduction - Expected-Total-Cost
Result: pre-sending with an 8-hour cache performed best!
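A hedged sketch of the pre-send decision; the exact parameterization (a bandwidth figure to turn bytes into waiting time, and counting transmission cost only when the document turns out not to be requested) is an assumption for illustration, not the paper's precise model:

```python
def expected_benefit(prob_next, size_bytes, bandwidth_bps,
                     cost_per_second, cost_per_byte):
    """prob_next: predicted probability this document is requested next."""
    wait_seconds = size_bytes / bandwidth_bps
    # Wait is saved only if the document really is requested next.
    expected_wait_reduction = prob_next * wait_seconds * cost_per_second
    # Transmission is wasted only if the document is NOT requested
    # (if requested, it would have been transmitted anyway).
    expected_total_cost = (1 - prob_next) * size_bytes * cost_per_byte
    return expected_wait_reduction - expected_total_cost

# Pre-send a document only when its expected benefit is positive:
# if expected_benefit(p, size, bw, c_sec, c_byte) > 0: pre_send(doc)
```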