1 Information Management on the World-Wide Web
Junghoo “John” Cho
UCLA Computer Science
2 The Web and Information Galore
3 10 Years Ago
Reading papers for research
–Stacks of papers
–Long wait
4 With the Web
5 Challenges (1)
Information overload
–Too much information, too little time
6 Information Overload
Searching Google for “XML”
–14 million matching documents!
Searching Amazon for “XML”
–464 matching books!
Which one to read?
7 Challenges (2)
Hidden Web
–Not indexed by search engines
–“Hidden” from the average user
–Browse every site manually? …
8 Challenges (3)
Transience
9 Challenges (4)
Scattered & unstructured data
–All Computer Science faculty members and graduate students in the US?
10 Projects in Our Group
Web Archive
Hidden Web Integration
Page Ranking Algorithm
User Recommendation System
11 User Recommendation System
464 books on XML
Which one to read?
–The one that my colleagues and friends recommend?
12 Amazon’s Recommendation System
1–5 star ratings by individual users
Books can be sorted by “average user rating”
13 My Typical Scenario
Sort books by their average user rating
Browse the top 20 books to decide what to read
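A minimal Python sketch of this workflow, using made-up sample books and ratings (the titles, the numbers, and the helper names `average_rating` and `top_k_by_average` are all hypothetical): sort by the plain average rating and keep the top k. It shows how a book with a single 5-star rating lands above a book averaging 4.9 stars over 100 ratings.

```python
# Hypothetical sample data: book title -> list of 1-5 star ratings
books = {
    "Book A": [5],                   # a single 5-star rating
    "Book B": [5] * 90 + [4] * 10,   # 100 ratings averaging 4.9
    "Book C": [4, 3, 4, 5],
}


def average_rating(ratings):
    """Plain arithmetic mean of the star ratings."""
    return sum(ratings) / len(ratings)


def top_k_by_average(books, k=20):
    """Sort titles by average rating, highest first, and keep the top k."""
    ranked = sorted(books, key=lambda title: average_rating(books[title]), reverse=True)
    return ranked[:k]


print(top_k_by_average(books))
```

The one-rating book comes out on top, because the average ignores how many ratings stand behind it.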
14 Questions
Is a “5-star” rating by one user better than a “4.9-star” average from 100 users?
–Intuitively, I prefer 4.9 stars from 100 users
–A more “reliable” rating
How much can I trust the rating of a particular person?
–How do I know whether that person’s ratings are reliable?
15 Our Approach
“Inherent quality” or “rating” of a book
–How many users would recommend the book (i.e., give it a high rating) if all users had read it?
More user ratings give more information about the book’s “quality”
–An average user is likely to give a high rating to a high-quality book
16 Probabilistic Rating Model
How likely is it that the book deserves a “4-star rating”?
–Rating probability distribution
[Figure: probability density over book rating/quality]
17 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density over book rating/quality]
18 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density over book rating/quality after a five-star rating by one user]
19 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density over book rating/quality after a one-star rating by a user]
20 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density over book rating/quality after many ratings]
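A minimal sketch of these updates on a discretized quality scale. It assumes a flat prior and a hypothetical likelihood in which a user's star rating is the book's inherent quality plus Gaussian noise; the grid, the noise level `NOISE`, and the sample ratings are illustrative assumptions, not the model from the talk. Each `update` call applies Bayes' rule once, and the mean and standard deviation summarize how the distribution shifts and narrows.

```python
import math

# Hypothetical discretization of inherent quality IR: 1.0, 1.1, ..., 5.0
GRID = [1.0 + 0.1 * i for i in range(41)]
STARS = [1, 2, 3, 4, 5]
NOISE = 1.0  # assumed spread of one user's rating around the true quality


def likelihood(star, quality):
    """Assumed P(UR = star | IR = quality): a Gaussian bump centered on the
    quality, renormalized over the five possible star values."""
    weights = [math.exp(-((s - quality) ** 2) / (2 * NOISE ** 2)) for s in STARS]
    return weights[STARS.index(star)] / sum(weights)


def update(dist, star):
    """Bayes' rule on the grid: multiply by the likelihood, then renormalize."""
    unnormalized = [p * likelihood(star, q) for p, q in zip(dist, GRID)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mean(dist):
    return sum(p * q for p, q in zip(dist, GRID))


def std(dist):
    m = mean(dist)
    return math.sqrt(sum(p * (q - m) ** 2 for p, q in zip(dist, GRID)))


prior = [1.0 / len(GRID)] * len(GRID)  # flat prior: no information yet
after_five = update(prior, 5)          # one user gives five stars
after_one = update(after_five, 1)      # another user gives one star
after_many = after_one
for star in [4, 5, 4, 4, 5, 4, 3, 4, 5, 4]:  # hypothetical further ratings
    after_many = update(after_many, star)

for label, dist in [("prior", prior), ("after 5-star", after_five),
                    ("after 1-star", after_one), ("after many", after_many)]:
    print(f"{label:13s} mean={mean(dist):.2f}  sd={std(dist):.2f}")
```

Under these assumptions the five-star rating pulls the mean upward, the following one-star rating pulls it back to the middle of the scale, and many further ratings concentrate the distribution around the consensus.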
21 Bayesian Inference Theory
Given a user rating UR, what is the inherent rating IR?
$P(IR \mid UR) = \dfrac{P(UR \mid IR)\,P(IR)}{P(UR)}$
–P(IR): probability of the book rating BEFORE the user rating
–P(IR | UR): probability of the book rating AFTER the user rating
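Spelled out for a discretized quality scale, the denominator P(UR) is a sum over all candidate quality levels q′. Under the additional assumption (not stated in the talk) that different users' ratings are conditionally independent given IR, repeated updates simply multiply the likelihoods:

```latex
P(IR = q \mid UR = r)
  = \frac{P(UR = r \mid IR = q)\, P(IR = q)}
         {\sum_{q'} P(UR = r \mid IR = q')\, P(IR = q')}

P(IR = q \mid UR_1 = r_1, \dots, UR_n = r_n)
  \propto P(IR = q) \prod_{i=1}^{n} P(UR_i = r_i \mid IR = q)
```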
22 User Model
The characteristics of a user
Sensitivity: slope of the curve
–Slope +1: good; –1: bad; 0: not useful
[Figure: user rating vs. book quality for a “Good” user and a “Bad” user]
23 User Model
The characteristics of a user
Bias: average “height” of the curve
[Figure: user rating vs. book quality for users with positive and negative bias]
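A minimal sketch of a user model with these two characteristics, assuming a hypothetical linear curve (rating = midpoint + bias + sensitivity × (quality − midpoint)) plus Gaussian noise; `MIDPOINT`, the noise level, and the example users are illustrative assumptions. Under this model a user with zero sensitivity has a likelihood that does not depend on book quality, so a Bayesian update with that user's rating leaves the distribution unchanged.

```python
import math

STARS = [1, 2, 3, 4, 5]
MIDPOINT = 3.0  # assumed center of the 1-5 quality scale


def expected_rating(quality, sensitivity, bias):
    """Hypothetical linear user model: `sensitivity` is the slope of the
    user's rating curve and `bias` shifts its height up or down."""
    return MIDPOINT + bias + sensitivity * (quality - MIDPOINT)


def rating_likelihood(star, quality, sensitivity, bias, noise=1.0):
    """Assumed P(UR = star | IR = quality) for one user: Gaussian noise around
    the user's expected rating, renormalized over the five star values."""
    center = expected_rating(quality, sensitivity, bias)
    weights = [math.exp(-((s - center) ** 2) / (2 * noise ** 2)) for s in STARS]
    return weights[STARS.index(star)] / sum(weights)


USERS = {
    "good":     {"sensitivity": 1.0,  "bias": 0.0},
    "bad":      {"sensitivity": -1.0, "bias": 0.0},
    "useless":  {"sensitivity": 0.0,  "bias": 0.0},
    "generous": {"sensitivity": 1.0,  "bias": 1.0},
}

for name, params in USERS.items():
    low = rating_likelihood(5, 1.0, **params)   # 5 stars for a poor book
    high = rating_likelihood(5, 5.0, **params)  # 5 stars for an excellent book
    print(f"{name:9s} P(5 | q=1)={low:.3f}  P(5 | q=5)={high:.3f}")
```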
24 Iterative Model Refinement
As more users rate a book, we get better estimates of its quality
As our estimate of a book’s quality improves, we get a better idea of each user’s sensitivity and bias
25 Iterative Model Refinement
[Figure: diagram connecting user-provided ratings, book rating estimates, and user characteristics]
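The refinement loop can be sketched with a deliberately simplified stand-in: alternating least-squares updates instead of full Bayesian inference, and a user model with bias only (sensitivity held at 1). The function name `refine`, the `RATINGS` data, and the zero-mean-bias convention are hypothetical choices for illustration, not the talk's method.

```python
from statistics import mean

# Hypothetical ratings keyed by (user, book). Bob rates about one star higher
# than Alice, and the two users only overlap on book "B2".
RATINGS = {
    ("alice", "B1"): 2, ("alice", "B2"): 3,
    ("bob", "B2"): 4, ("bob", "B3"): 5,
}


def refine(ratings, iterations=50):
    """Alternate between estimating book quality and user bias.

    Simplified stand-in: each rating is treated as quality + user bias
    (sensitivity fixed at 1), and each step is a least-squares update
    (a mean) rather than a Bayesian one.
    """
    users = {u for u, _ in ratings}
    books = {b for _, b in ratings}
    bias = {u: 0.0 for u in users}
    quality = {}
    for _ in range(iterations):
        # Step 1: a book's quality is the mean of its bias-corrected ratings.
        for book in books:
            quality[book] = mean(r - bias[u] for (u, b), r in ratings.items() if b == book)
        # Step 2: a user's bias is the mean gap between their ratings and
        # the current quality estimates.
        for user in users:
            bias[user] = mean(r - quality[b] for (u, b), r in ratings.items() if u == user)
        # Quality and bias are only determined up to a shared offset,
        # so pin the average bias at zero.
        offset = mean(bias.values())
        bias = {u: v - offset for u, v in bias.items()}
    return quality, bias


quality, bias = refine(RATINGS)
print({b: round(q, 3) for b, q in sorted(quality.items())})
print({u: round(v, 3) for u, v in sorted(bias.items())})
```

Plain averages would put B1 at 2.0 and B3 at 5.0. After refinement the three books sit one point apart (about 2.5, 3.5 and 4.5) and Bob's bias comes out one point above Alice's; only these differences are meaningful, since the absolute level comes from the zero-mean-bias convention.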
26 Final Recommendation
Recommend the book with the highest expected rating
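One way to make “expected rating” concrete, swapping in a simpler model than the one in the talk: with a Gaussian prior on quality and Gaussian rating noise of known variance (ignoring the 1–5 bounds), the posterior mean is a weighted average in which the prior counts as `prior_strength` pseudo-ratings at `prior_mean`. Both parameter values and the function names below are arbitrary assumptions for illustration.

```python
def expected_quality(ratings, prior_mean=3.0, prior_strength=5.0):
    """Posterior mean of a book's quality under a conjugate Gaussian model:
    the prior acts like `prior_strength` pseudo-ratings at `prior_mean`,
    and every real rating has the same known noise level."""
    return (prior_strength * prior_mean + sum(ratings)) / (prior_strength + len(ratings))


def recommend(books, **kwargs):
    """Return the title with the highest expected quality."""
    return max(books, key=lambda title: expected_quality(books[title], **kwargs))


books = {
    "Book A": [5],                  # one 5-star rating
    "Book B": [5] * 90 + [4] * 10,  # 100 ratings averaging 4.9
}
for title, ratings in books.items():
    print(title, round(expected_quality(ratings), 3))
print("Recommend:", recommend(books))
```

With these settings Book A's expected quality is about 3.33 and Book B's about 4.81, so the book with 100 ratings is recommended: a single rating barely moves the estimate away from the prior.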
27 Initial Results
Our system prefers a book rated 4.9 stars by 100 users to one rated 5 stars by a single user
If a user gives random ratings, the system ignores that user’s ratings
A more thorough evaluation is under way
28 Other Projects
Web Archive
Hidden Web Integration
Page Ranking Algorithm
29 Ph.D. Students on the Projects
Alex Ntoulas
Rob Adams
Victor Liu
–In Dr. Chu’s group
30 Thank You
Questions?