
1 „IP“ is not always „Internet Protocol“: A long and a very short example for IP problems in Web 2.0 research. Ralf Schenkel. Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum

2 September 25, 2008, Dagstuhl Perspectives Workshop Web 2.0. Social Tagging Networks. Definition: a social tagging network is a website where people publish and tag information, review and rate information, publish their interests, maintain a network of friends, and interact with friends. Common examples: Flickr (images), YouTube (videos), del.icio.us (bookmarks), LibraryThing (books), Discogs (CDs), CiteULike (papers), Facebook, Myspace (media)

3 Part 1: Search in Social Tagging Networks (long)

4 Some Statistics. Flickr (as of Nov 2007): 2+ billion photos. Facebook (as of Apr 2007): 1.8 billion photos, 31 million active users, 100,000 new users per day. Myspace (as of Apr 2007): 135 million users (6th largest country on Earth), 2+ billion images (150,000 req/s) with millions added daily, 25 million songs, 60 TB of videos. Huge volume of highly dynamic data.

5 Showcase: librarything.com (figure labels: Ratings, Tags, Books, Others)

6 librarything.com: Social Interaction (figure labels: Explicit Friends, Similar Users, Comments)

7 librarything.com: Tag Clouds

8 librarything.com: Search. Search results are independent of the querying user (and the social context).

9 Outline: Introduction; Modelling Social Tagging Networks (Graph Model, Different Information Needs); Effective Query Scoring; Efficient Query Evaluation; Summary & Further Challenges

10 Social Network Model (figure: graph with layers USERS, ITEMS, TAGS; labels: travel Norway travel China queueing theory)

11 Social Network Model (figure: graph with layers USERS, ITEMS, TAGS; labels: travel Norway travel China queueing theory)

12 Social Network Model (figure: graph with layers USERS, ITEMS, TAGS; labels: travel Norway travel China queueing theory; travel trip vldb travel probability queues travel probability harry potter)

13 Information Need 1: Global (figure: graph with layers USERS, ITEMS, TAGS; labels: travel Norway travel China queueing theory; travel trip vldb travel probability queues travel probability harry potter). Tags by all users are equally important.

14 Information Need 2: Similar Users (figure: graph with layers USERS, ITEMS, TAGS; labels: travel Norway travel China queueing theory; travel trip vldb travel probability queues travel probability harry potter; query: travel ?). Tags by users with similar tags/items are more important.

15 Information Need 3: Trusted Friends (figure: graph with layers USERS, ITEMS, TAGS; labels: travel Norway travel China queueing theory; travel trip vldb travel probability queues travel probability harry potter; query: probability ?). Tags by closely related users are more important.
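The three information needs can be read as three ways of weighting a tagger relative to the querying user. The following is a minimal sketch under stated assumptions, not the scoring model from the talk: the toy taggings, the friendship map, the Jaccard-based similarity, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (user, item, tag) triples and a friendship map.
TAGGINGS = [
    ("alice", "norway_guide", "travel"),
    ("bob", "norway_guide", "travel"),
    ("bob", "china_guide", "travel"),
    ("carol", "china_guide", "travel"),
    ("carol", "queueing_book", "probability"),
    ("dave", "queueing_book", "travel"),
]
FRIENDS = {"alice": {"carol"}}


def tags_of(user):
    """Set of tags a user has assigned to any item."""
    return {tag for u, _, tag in TAGGINGS if u == user}


def weight_global(querier, tagger):
    """Information need 1: every tagger counts equally."""
    return 1.0


def weight_similar(querier, tagger):
    """Information need 2: taggers with overlapping tag vocabularies count more.

    Jaccard overlap is an assumed stand-in for a real similarity measure.
    """
    a, b = tags_of(querier), tags_of(tagger)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_friends(querier, tagger):
    """Information need 3: only explicit friends of the querier count."""
    return 1.0 if tagger in FRIENDS.get(querier, set()) else 0.0


def score(querier, tag, weight):
    """Sum tagger weights per item for a single-tag query.

    The querier's own taggings are skipped (a design choice of this sketch).
    """
    scores = defaultdict(float)
    for user, item, t in TAGGINGS:
        if t == tag and user != querier:
            scores[item] += weight(querier, user)
    return dict(scores)
```

Calling `score("alice", "travel", weight_friends)` ranks only the item tagged by Alice's friend above zero, while `weight_global` ranks items purely by how many other users tagged them. A real system would likely combine such weights rather than pick one.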

16 Wishlist for Social-Aware Social Search. Search results depend on: global popularity of items; collection context of the querying user (books, tags); social context of the querying user (trusted friends). Automatic tag expansion (beyond synonyms). Scalable query processing. Explanation of results. (Similar wishlist for social recommendations.)

17 Fast Forward… Imagine a 20-minute talk about quantified friendship measures, personalized scoring models, dynamic tag expansion, scalable query processing, … Essence: context-aware personalized search. Tags from closely related users are more important. Different kinds of „relatedness“ are possible. [SIGIR 2008]

18 Experimental Evaluation: Effectiveness. Systematic evaluation of result quality is difficult. Three possible setups: (1) manual queries + human assessments; (2) queries + assessments derived from external info (e.g., DMOZ categories); (3) automated assessments from the context of the user (items tagged by friends; items tagged in the future).
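The "items tagged in the future" setup can be sketched as a temporal split: taggings up to a cutoff form the searchable collection, and items the user first tags afterwards are treated as relevant. This is only an illustration; the timestamped tuple layout, the cutoff parameter, and the function name are assumptions, not details given in the talk.

```python
def future_relevant_items(taggings, user, cutoff):
    """Return items that `user` tags only after `cutoff`.

    taggings: iterable of (user, item, tag, timestamp) tuples (assumed layout).
    Items the user already tagged at or before the cutoff are excluded,
    since they are known to the user and not a fair test of the ranking.
    """
    taggings = list(taggings)
    seen = {item for u, item, _, ts in taggings if u == user and ts <= cutoff}
    later = {item for u, item, _, ts in taggings if u == user and ts > cutoff}
    return later - seen


# Hypothetical example data; timestamps are plain integers for brevity.
example = [
    ("alice", "norway_guide", "travel", 1),
    ("alice", "china_guide", "travel", 5),
    ("alice", "norway_guide", "trip", 6),
    ("bob", "queueing_book", "probability", 7),
]
relevant = future_relevant_items(example, "alice", cutoff=3)
```

Here `relevant` contains only `china_guide`: the later re-tagging of `norway_guide` does not count because that item was already in Alice's collection before the cutoff.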

19 Prototype Implementation [SIGIR Demo 2008], [VLDB Demo 2008]

20 Preliminary User Study. LibraryThing user study [Data Engineering Bulletin, June 2008]: 6 LibraryThing users with reasonably large library and friend sets; 49 queries overall. Crawled (part of) LibraryThing: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends. Measured NDCG[10] (column values: 0.0, 0.2, 0.5, 0.8, 1.0):
0.0: 0.546 0.572 0.568 0.565
0.2: 0.564 0.572 0.579 0.581 -
0.5: 0.539 0.552 0.559 - -
0.8: 0.515 0.546 - - -
1.0: 0.465 - - - -
Axis labels: (1-α) (graph), (1-α) (content). Callout: Authors of the paper.
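For readers unfamiliar with the measure, here is a minimal sketch of NDCG at a rank cutoff, reading NDCG[10] as NDCG over the top 10 results (an assumption). It uses one common formulation, gain divided by log2(rank + 1), and derives the ideal ranking from the same list of judged gains; the study itself may have used a different variant.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(gains, k=10):
    """NDCG of a ranked list of graded relevance values, cut off at rank k.

    gains: relevance grades in the order the system returned the results.
    The ideal ordering is the same grades sorted descending (an assumption:
    only judged, returned results are considered).
    """
    ideal = sorted(gains, reverse=True)
    best = dcg(ideal[:k])
    return dcg(gains[:k]) / best if best > 0 else 0.0
```

A perfectly ordered list such as `[3, 2, 1]` yields 1.0, and a list with no relevant results yields 0.0 by the convention chosen in this sketch.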

21 We need a benchmark collection, but… Everybody „has“ data from Flickr, LibraryThing. Data contains private information by definition. Data cannot be successfully anonymized (AOL). Data must not be anonymized (we need the users to assess results). Data must be large-scale (a few volunteers are not enough). Collection must be completely available offline for stability of results (including images, …).

22 Part 2: Web Archiving (very short)

23 Online Information is Volatile. A huge amount of information is available online only today. It is easily lost (hardware failure, software failure, human failure, deletion, attack, …), easily made inaccessible (does anybody know Interleaf?), and easily manipulated. How will historians learn about the 21st century? Strong need for long-term preservation of the evolving Web.

