Empirical Investigations of WWW Surfing Paths Jim Pitkow User Interface Research Xerox Palo Alto Research Center.

1 Empirical Investigations of WWW Surfing Paths Jim Pitkow User Interface Research Xerox Palo Alto Research Center

2 August 1999 Agenda A few characteristics of the Web and clicks Aggregate click models –user surfing behaviors –post-hoc hit prediction Individual click models –entropy –path prediction

3 August 1999 Web is big, Web is good 1 new server every 2 seconds 7.5 new pages per second

4 August 1999 Users, sessions, and clicks
Current Internet Universe Estimate: 97.1 million
Time spent/month: 7:28:16
Number of unique sites visited/month: 15
Page views/month: 313
Number of sessions/month: 16
Page views/session: 19
Time spent/session: 0:28:01
Time spent/site: 0:31:27
Duration of a page viewed: 0:01:28
Source: Nielsen//NetRatings, February 1999

5 August 1999 Popularity of pages: Zipf Zipf distribution: –frequency is inversely proportional to rank Zipf's law: –on a log-log plot of frequency against rank, the slope equals minus one
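
To illustrate the slope-of-minus-one statement, here is a minimal sketch that fits a least-squares line to log(frequency) against log(rank). The function name `loglog_slope` and the synthetic counts are illustrative assumptions, not data from the talk.

```python
import math

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Frequencies must be positive; rank 1 is the most popular page.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic page counts built to follow frequency = 1000 / rank exactly.
synthetic_hits = [1000.0 / rank for rank in range(1, 51)]
slope = loglog_slope(synthetic_hits)
print(round(slope, 6))
```

Real page-popularity counts would only approximate this slope; the synthetic data is constructed so the fit recovers it.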

6 August 1999 Users enter a website at various pages and begin surfing Continuing surfers distribute themselves down various paths Surfers arrive at pages having traveled different paths After some number of page visits surfers leave the web site [Figure: panels (a) to (d); labels Enter, Exit, p1, p2, p3]

7 August 1999 Model of surfing $V_L = V_{L-1} + \varepsilon_L$, where L is the number of clicks and the $\varepsilon_L$ are independent and identically distributed Gaussian random variables. Surfing proceeds until the perceived cost is larger than the discounted expected future value
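
The random-walk model can be sketched as a small simulation. This is only an illustration under stated assumptions: the stopping rule is simplified to "stop when the running value first drops to a fixed threshold", and the start value, threshold, drift, and sigma are made-up parameters, not values from the talk.

```python
import random

def clicks_until_stop(start_value, threshold, drift, sigma, rng, max_clicks=10_000):
    """Count clicks until the running value V_L first drops to the threshold."""
    value = start_value
    for clicks in range(1, max_clicks + 1):
        value += rng.gauss(drift, sigma)  # V_L = V_{L-1} + epsilon_L
        if value <= threshold:
            return clicks
    return max_clicks  # safety cap, reached only if the walk never crosses

rng = random.Random(1999)  # fixed seed so the run is repeatable
lengths = sorted(
    clicks_until_stop(start_value=2.0, threshold=0.0, drift=-0.5, sigma=1.0, rng=rng)
    for _ in range(2000)
)
mean_clicks = sum(lengths) / len(lengths)
median_clicks = lengths[len(lengths) // 2]
print(mean_clicks, median_clicks)
```

With a negative drift the simulated session lengths are right-skewed, so the median sits below the mean.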

8 August 1999 Random walk with a stopping threshold Two-parameter inverse Gaussian distribution with mean $E[L] = \mu$ and variance $\mathrm{Var}[L] = \mu^3 / \lambda$
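
As a sanity check on the two-parameter form, the sketch below evaluates the standard inverse Gaussian density and integrates it numerically. The parameter names `mu` (mean) and `lam` (shape) follow the usual textbook parameterization, and the values 3 and 6 are arbitrary illustrations rather than fitted values from the talk.

```python
import math

def inverse_gaussian_pdf(L, mu, lam):
    """Density of the two-parameter inverse Gaussian at L > 0."""
    return math.sqrt(lam / (2.0 * math.pi * L ** 3)) * math.exp(
        -lam * (L - mu) ** 2 / (2.0 * mu ** 2 * L)
    )

def numeric_moments(mu, lam, upper=200.0, step=0.005):
    """Approximate total mass, mean, and variance by the midpoint rule on (0, upper)."""
    total = first = second = 0.0
    for i in range(int(upper / step)):
        x = (i + 0.5) * step
        weight = inverse_gaussian_pdf(x, mu, lam) * step
        total += weight
        first += x * weight
        second += x * x * weight
    return total, first, second - first ** 2

total, mean, variance = numeric_moments(mu=3.0, lam=6.0)
print(total, mean, variance)  # variance should be close to mu**3 / lam
```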

9 August 1999 Experimental design Client data –Georgia Tech (3 weeks August, 1994) –Boston University (1995) tens of thousands of requests Proxy data –AOL (5 days in December, 1997) tens of millions of requests Server data –Xerox WWW Site (week during May 1997)

10 August 1999 Probability distribution function [Figure: probability against clicks] Mode: 1 click/site. Median: 3-4 clicks/site. Mean: 8-10 clicks/site

11 August 1999 Cumulative distribution function [Figure legend: *** experimental data; line: inverse Gaussian] 75% of the distribution accounted for in three clicks

12 August 1999 Two observations Inverse Gaussian distribution has very long tail –expect to see large deviations from average Due to asymmetric nature of the Inverse Gaussian distribution, typical behavior does not equal average behavior

13 August 1999 An interesting derivation Up to a constant given by the third term, the probability of finding a group surfing at a given level scales inversely in proportion to its depth
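
The equation for this slide is not in the transcript. One derivation consistent with the slide text, assuming the standard two-parameter inverse Gaussian density with mean $\mu$ and shape $\lambda$, is to take logarithms of the density:

```latex
P(L) = \sqrt{\frac{\lambda}{2\pi L^{3}}}\,
       \exp\!\left[-\frac{\lambda (L-\mu)^{2}}{2\mu^{2} L}\right]

\log P(L) = -\frac{3}{2}\log L
            \;-\; \frac{\lambda (L-\mu)^{2}}{2\mu^{2} L}
            \;+\; \log\sqrt{\frac{\lambda}{2\pi}}
```

The third term is the constant the slide mentions. Where the second term is small, a log-log plot of $P(L)$ against $L$ is close to a straight line of slope $-3/2$; reading the slide's "scales inversely in proportion to its depth" as this power law is an interpretation, not something the transcript states.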

14 August 1999 Number of surfers at each level

15 August 1999 Implications of the Law of Surfing Implications for techniques designed to enhance performance –HTTP Keep-Alive, pre-fetching of content Can adapt content based on location on the curve in a cost-sensitive manner –different user modalities (browser, searcher, etc.) –expend different CPU resources for different users Web site visitation modeling

16 August 1999 Spreading Activation Pump activation into source Activation spreads through the network Activation settles into asymptotic pattern

17 August 1999 Matrix Formulation [Figure: source vector C, 6-node network matrix R, activation vector A] Sources, Network, Activation. Content + Usage + Topology. Technique: $A(t) = C + M A(t-1)$, $M = (1 - \gamma) I + \alpha R$. Networks: User Paths, Text Similarity, Topology
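
The iteration A(t) = C + M A(t - 1) can be sketched in a few lines. This is an illustration under assumptions: the two coefficients in M are taken to be a decay rate and a flow rate (named `gamma` and `alpha` here), `R[i][j]` is taken as the link strength from node j to node i, A(0) is set to C, and the 3-node chain is a toy network, not the 6-node example from the slide.

```python
def spread_activation(C, R, alpha, gamma, steps):
    """Iterate A(t) = C + M A(t-1) with M = (1 - gamma) I + alpha R."""
    n = len(C)
    M = [
        [(1.0 - gamma) * (1.0 if i == j else 0.0) + alpha * R[i][j] for j in range(n)]
        for i in range(n)
    ]
    A = list(C)  # assumed starting point A(0) = C
    for _ in range(steps):
        A = [C[i] + sum(M[i][j] * A[j] for j in range(n)) for i in range(n)]
    return A

# Toy chain 0 -> 1 -> 2; activation is pumped into node 0 only.
R = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
C = [1.0, 0.0, 0.0]
activation = spread_activation(C, R, alpha=0.5, gamma=1.0, steps=10)
print(activation)
```

The pattern settles only when repeated multiplication by M shrinks activation (spectral radius of M below one); the toy values are chosen so that it does.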

18 August 1999 Application to hit prediction Let $f_L$ be the fraction of users who, having surfed along $L-1$ links, continue to surf to depth $L$. Define the activation value $N_{i,L}$ as the number of users who are at node $i$ after surfing through $L$ clicks
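
The transcript does not include the update equation for this slide. A plausible sketch, assuming the number of users at each page at the next depth is the continuing fraction times the flow from linking pages, N[i][L+1] = f(L) * sum_k S[i][k] * N[k][L], is shown below. The transition proportions `S`, the callable `continue_fraction`, its indexing, and all numbers are hypothetical.

```python
def predict_hits(entry_counts, S, continue_fraction, depth):
    """Predict total hits per page by pushing users through `depth` levels.

    entry_counts[i]: users who start at page i (level 1).
    S[i][k]: assumed proportion of continuing users at page k who go to page i.
    continue_fraction(L): assumed fraction of users at level L who click again.
    """
    n = len(entry_counts)
    level = list(entry_counts)
    hits = list(level)
    for L in range(1, depth):
        level = [
            continue_fraction(L) * sum(S[i][k] * level[k] for k in range(n))
            for i in range(n)
        ]
        hits = [h + x for h, x in zip(hits, level)]
    return hits

# Toy site: page 0 links to pages 1 and 2, page 1 links to 2, page 2 links to 0.
S = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
hits = predict_hits([100.0, 0.0, 0.0], S, lambda L: 0.5, depth=3)
print(hits)
```

In practice the continuing fraction would come from the fitted distribution of session lengths rather than a constant.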

19 August 1999 Predicted versus observed

20 August 1999 Surfing probabilities by outlink density

21 August 1999 Investigating user paths Each user path can be thought of as an ngram and represented as tuples of the form $\langle x_1, x_2, \ldots, x_n \rangle$ to indicate sequences of page clicks –Distribution of ngrams is the Law of Surfing Determine the conditional probability of seeing the next page given a matching prior ngram
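
A minimal sketch of estimating the conditional probability of the next page from observed paths follows. The function name, the page labels, and the four toy paths are illustrative assumptions.

```python
from collections import Counter, defaultdict

def next_page_probabilities(paths, k):
    """Estimate p(next page | previous k pages) from observed click paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(len(path) - k):
            prefix = tuple(path[i:i + k])   # the matching prior ngram
            counts[prefix][path[i + k]] += 1
    return {
        prefix: {page: c / sum(nxt.values()) for page, c in nxt.items()}
        for prefix, nxt in counts.items()
    }

toy_paths = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "B", "C"],
    ["B", "C", "A"],
]
probs = next_page_probabilities(toy_paths, k=2)
print(probs[("A", "B")])
```

Setting `k=1` gives a first-order model; larger `k` conditions on longer prior paths at the cost of fewer matching examples.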

22 August 1999 Uncertainty and entropy Conditional probabilities are also known as kth-order Markov approximations/models Entropy is the expected (average) uncertainty of the random variable measured in bits –minimal number of bits to encode information –uncertainty in the sequence of letters in languages

23 August 1999 Mathematics of entropy
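
The formulas for this slide are missing from the transcript. The standard definition of entropy in bits, for a random variable $X$ with distribution $p(x)$, is given here as a reference point:

```latex
H(X) = -\sum_{x} p(x)\,\log_{2} p(x)
```

For example, a next-page choice spread evenly over four pages has $H(X) = 2$ bits, while a choice that always lands on the same page has $H(X) = 0$.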

24 August 1999 Conditional entropy Conditional probability Chaining rule Joint probabilities
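
The terms on this slide fit the standard identities H(Y|X) = -sum p(x, y) log2 p(y|x) and the chain rule H(X, Y) = H(X) + H(Y|X). The sketch below checks them on toy counts of (current page, next page) pairs; the counts and names are invented for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Entropy in bits of the distribution given by a table of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

def conditional_entropy(pair_counts):
    """H(Y|X) in bits from counts of (x, y) pairs."""
    total = sum(pair_counts.values())
    x_counts = Counter()
    for (x, _), c in pair_counts.items():
        x_counts[x] += c
    h = 0.0
    for (x, _), c in pair_counts.items():
        h -= (c / total) * math.log2(c / x_counts[x])  # p(x, y) * log2 p(y | x)
    return h

# Toy counts of (current page, next page) transitions.
pair_counts = Counter({("A", "B"): 2, ("A", "C"): 2, ("B", "C"): 4})
first_page_counts = Counter({"A": 4, "B": 4})

h_joint = entropy(pair_counts)
h_first = entropy(first_page_counts)
h_cond = conditional_entropy(pair_counts)
print(h_joint, h_first, h_cond)
```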

25 August 1999 Entropy versus ngram length

26 August 1999 Path predictions Pr(PPM): the probability that a penultimate path, $\langle x_{n-1}, \ldots, x_{n-k} \rangle$, observed in the test data was matched by the same penultimate path in the training data. Pr(Hit|PPM): the probability that page $x_n$ is visited, given that $\langle x_{n-1}, \ldots, x_{n-k} \rangle$ is the penultimate path and the highest probability conditional on that path is $p(x_n \mid x_{n-1}, \ldots, x_{n-k})$. Pr(Hit) = Pr(Hit|PPM) × Pr(PPM): the probability that the page visited in the test set is the one estimated from the training data as the most likely to occur
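
The three quantities can be computed from a training set and a test set of paths roughly as follows. This is a sketch under assumptions: every length-k window in a test path counts as one prediction case, and the function names and toy paths are hypothetical.

```python
from collections import Counter, defaultdict

def train_path_model(paths, k):
    """Count which page follows each length-k path in the training data."""
    model = defaultdict(Counter)
    for path in paths:
        for i in range(len(path) - k):
            model[tuple(path[i:i + k])][path[i + k]] += 1
    return model

def evaluate_path_model(model, test_paths, k):
    """Return (Pr(PPM), Pr(Hit|PPM), Pr(Hit)) measured on the test paths."""
    cases = matches = hits = 0
    for path in test_paths:
        for i in range(len(path) - k):
            cases += 1
            prefix = tuple(path[i:i + k])
            if prefix in model:  # penultimate path was seen in training
                matches += 1
                predicted = model[prefix].most_common(1)[0][0]
                if predicted == path[i + k]:
                    hits += 1
    pr_ppm = matches / cases if cases else 0.0
    pr_hit_given_ppm = hits / matches if matches else 0.0
    return pr_ppm, pr_hit_given_ppm, pr_hit_given_ppm * pr_ppm

training = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"]]
testing = [["A", "B", "C"], ["A", "B", "D"], ["X", "Y", "Z"], ["A", "B", "C"]]
model = train_path_model(training, k=2)
pr_ppm, pr_hit_given_ppm, pr_hit = evaluate_path_model(model, testing, k=2)
print(pr_ppm, pr_hit_given_ppm, pr_hit)
```

Longer paths (larger `k`) tend to lower Pr(PPM) because fewer test paths find a match, which is the trade-off the product Pr(Hit) captures.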

27 August 1999 Pr(Path Match)

28 August 1999 Pr(Hit|Path Match)

29 August 1999 Pr(Hit)

30 August 1999 Principles of path prediction Paths are not completely modeled by first-order Markov approximations Path Specificity Principle: longer paths contain more predictive power than shorter, lower-order paths Complexity Reduction: keep models as simple and as small as possible

31 August 1999 Agenda revisited A few characteristics of the Web and clicks Aggregate click models –user surfing behaviors –post-hoc hit prediction Individual click models –entropy –path prediction

32 August 1999 Areas of further investigation Advance the state of predictive modeling –Move from post-hoc to a-priori prediction of user interactions on the Web –Test hypothetical models of Web site usage Validate existing and new models on more representative data sets Understand new and emerging applications –streaming video and audio, mobile, etc.

33 August 1999 More information pitkow@parc.xerox.com http://www.parc.xerox.com/istl/projects/uir/projects/Webology.html

