1 Information Management on the World-Wide Web Junghoo “John” Cho UCLA Computer Science.

2 The Web and Information Galore

3 10 Years Ago
Reading papers for research
–Stacks of papers
–Long wait

4 With Web

5 Challenges (1)
Information overload
–Too much information, too little time

6 Information Overload
“XML” to Google
–14 million matching documents!
“XML” to Amazon
–464 matching books!
Which one to read?

7 Challenges (2)
Hidden Web
–Not indexed by search engines
–“Hidden” from an average user
–Browse every site manually? …

8 Challenges (3)
Transience

9 Challenges (4)
Scattered & unstructured data
–All Computer Science faculty members and graduate students in the US?

10 Projects In Our Group
Web Archive
Hidden Web Integration
Page Ranking Algorithm
User Recommendation System

11 User Recommendation System
464 books on XML
Which one to read?
–The one that my colleagues and friends recommend?

12 Amazon’s Recommendation System
1–5 star rating by individual users
Books can be sorted by “average user rating”

13 My Typical Scenario
Sort books by their average user rating
Browse top 20 books to decide what to read

14 Questions
Is “5 star” by one user better than “4.9 star” by 100 users?
–Intuitively, I prefer 4.9 star by 100 users
–More “reliable” rating
How much can I trust the rating of a particular person?
–How do I know that the person’s rating is reliable?

15 Our Approach
“Inherent quality” or “rating” of a book
–How many users would recommend the book (i.e., give it a high rating) if all users had read the book?
More user ratings → more information on the “quality” of the book
–An average user is likely to give a high rating to a high-quality book

16 Probabilistic Rating Model
How likely is it that the book has a “4 star rating”?
–Rating probability distribution
[Figure: probability density plotted against book rating/quality]

17 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density plotted against book rating/quality]

18 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density plotted against book rating/quality, after a five-star rating by a user]

19 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density plotted against book rating/quality, after a one-star rating by a user]

20 Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density plotted against book rating/quality, after many ratings]

21 Bayesian Inference Theory
Given a user rating UR, what is the inherent rating IR?
$$P(IR \mid UR) = \frac{P(UR \mid IR)\,P(IR)}{P(UR)}$$
–$P(IR)$: probability of book rating BEFORE user rating
–$P(IR \mid UR)$: probability of book rating AFTER user rating
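
The update on this slide can be sketched numerically. The following is a minimal illustration, not the method used in the talk: it assumes the inherent rating IR takes one of five discrete star levels, starts from a flat prior, and uses a made-up likelihood P(UR | IR) in which user ratings close to IR are more likely. The names `bayes_update` and `rating_likelihood` are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(IR | UR) from P(IR) and P(UR | IR).

    prior:      dict mapping each candidate inherent rating IR to P(IR)
    likelihood: dict mapping each candidate IR to P(UR | IR) for the
                single user rating UR that was just observed
    """
    unnormalized = {ir: likelihood[ir] * p for ir, p in prior.items()}
    evidence = sum(unnormalized.values())  # this is P(UR)
    return {ir: v / evidence for ir, v in unnormalized.items()}


def rating_likelihood(user_rating, levels=(1, 2, 3, 4, 5)):
    """Assumed P(UR | IR): a rating is more likely the closer it is to IR."""
    out = {}
    for ir in levels:
        weights = {ur: 1.0 / (1 + abs(ur - ir)) for ur in levels}
        out[ir] = weights[user_rating] / sum(weights.values())
    return out


levels = (1, 2, 3, 4, 5)
prior = {ir: 1 / len(levels) for ir in levels}          # flat prior P(IR)
posterior = bayes_update(prior, rating_likelihood(5))   # a user gives 5 stars
```

After the five-star rating, probability mass moves toward the high end of the scale; feeding `posterior` back in as the next `prior` repeats the update for each further rating.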

22 User Model
The characteristics of a user
Sensitivity: slope of the curve
+1: good, –1: bad, 0: not useful
[Figure: two plots of user rating against book quality, labeled “Good” and “Bad”]

23 User Model
The characteristics of a user
Bias: average “height” of the curve
[Figure: two plots of user rating against book quality, labeled “Positive bias” and “Negative bias”]
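
One way to read the two user characteristics is as the parameters of a straight line fitted to a user's ratings against book quality. The sketch below assumes exactly that: sensitivity is taken as the least-squares slope and bias as the average amount by which the user's ratings sit above or below the book qualities. The slides do not give the actual definitions, and `fit_user_model` is a hypothetical name.

```python
from statistics import mean


def fit_user_model(qualities, ratings):
    """Estimate (sensitivity, bias) for one user.

    qualities: estimated quality of each book the user rated
    ratings:   the rating the user gave to each of those books
    """
    q_bar, r_bar = mean(qualities), mean(ratings)
    var_q = sum((q - q_bar) ** 2 for q in qualities)
    if var_q == 0:
        raise ValueError("need books of differing quality to estimate a slope")
    cov = sum((q - q_bar) * (r - r_bar) for q, r in zip(qualities, ratings))
    sensitivity = cov / var_q   # slope of the fitted line
    bias = r_bar - q_bar        # average height relative to book quality
    return sensitivity, bias


qualities = [1, 2, 3, 4]
good_user = fit_user_model(qualities, [1, 2, 3, 4])      # tracks quality
contrary_user = fit_user_model(qualities, [4, 3, 2, 1])  # inverts quality
flat_user = fit_user_model(qualities, [3, 3, 3, 3])      # ignores quality
generous_user = fit_user_model(qualities, [2, 3, 4, 5])  # one star too high
```

Under these assumptions the four example users come out with slopes of +1, –1, 0, and +1 respectively, and only the last one has a nonzero (positive) bias.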

24 Iterative Model Refinement
As more users rate a book, we get better estimates of the book’s quality
As we estimate a book’s quality better, we get a better idea of a user’s sensitivity and bias

25 Iterative Model Refinement
[Diagram with three elements: User-provided Rating, Book Rating Estimate, User Characteristics]
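
The back-and-forth between book estimates and user characteristics can be sketched as an alternating loop. This is an illustrative stand-in, not the algorithm from the talk: it assumes each user has an additive bias and a reliability weight (used here in place of sensitivity), and it estimates book quality as a weighted average of bias-corrected ratings. The function `refine` and the parameter `eps` are hypothetical.

```python
from statistics import mean


def refine(ratings, iterations=10, eps=0.01):
    """Alternate between estimating book quality and user characteristics.

    ratings: dict mapping user -> {book: rating}
    Returns (quality, bias, weight) dictionaries.
    """
    books = {b for by_book in ratings.values() for b in by_book}
    # Start from the plain average rating of each book.
    quality = {b: mean(r[b] for r in ratings.values() if b in r) for b in books}
    bias, weight = {}, {}
    for _ in range(iterations):
        # Step 1: re-estimate each user from the current quality estimates.
        for user, by_book in ratings.items():
            residuals = [r - quality[b] for b, r in by_book.items()]
            bias[user] = mean(residuals)
            spread = mean((x - bias[user]) ** 2 for x in residuals)
            weight[user] = 1.0 / (eps + spread)  # erratic users count less
        # Step 2: re-estimate each book from bias-corrected, weighted ratings.
        for b in books:
            raters = [u for u in ratings if b in ratings[u]]
            total = sum(weight[u] * (ratings[u][b] - bias[u]) for u in raters)
            quality[b] = total / sum(weight[u] for u in raters)
    return quality, bias, weight


ratings = {
    "u1": {"A": 4, "B": 3, "C": 2},
    "u2": {"A": 5, "B": 4, "C": 3},  # same ordering as u1, one star higher
    "u3": {"A": 2, "B": 5, "C": 3},  # disagrees with the other two
}
quality, bias, weight = refine(ratings)
```

In this toy data the plain averages initially rank book B above book A because of u3. After a few rounds the loop down-weights u3, treats u2 as a consistently generous version of u1, and ranks the books A, B, C.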

26 Final Recommendation
Recommend the book with the highest expected rating

27 Initial Results
Our system prefers a 4.9-star book by 100 people to a 5-star book by 1 user
If a user gives random ratings, the system ignores the user’s rating
More thorough evaluation on the way
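
The first result is what one would expect from any estimate that accounts for how many ratings back up an average. As a rough illustration only, the sketch below uses a simple shrinkage estimate: the observed average is blended with a prior guess, and the prior matters less as ratings accumulate. The prior mean of 3 stars and prior weight of 5 pseudo-ratings are arbitrary assumptions, and `expected_rating` is a hypothetical name, not the system described in the talk.

```python
def expected_rating(avg_rating, num_ratings, prior_mean=3.0, prior_weight=5.0):
    """Blend the observed average with a prior guess.

    prior_weight acts like a number of imaginary ratings at prior_mean,
    so a book with few real ratings stays close to the prior.
    """
    total = prior_mean * prior_weight + avg_rating * num_ratings
    return total / (prior_weight + num_ratings)


many_raters = expected_rating(4.9, 100)  # about 4.81
one_rater = expected_rating(5.0, 1)      # about 3.33
```

With these assumed settings, the 4.9-star book rated by 100 people scores well above the 5-star book rated by a single user, matching the preference stated on the slide.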

28 Other Projects
Web Archive
Hidden Web Integration
Page Ranking Algorithm

29 Ph.D. Students on the Projects
Alex Ntoulas
Rob Adams
Victor Liu
–In Dr Chu’s group

30 Thank You
Questions?
