
Deriving Emergent Web Page Semantics D.V. Sreenath*, W.I. Grosky**, and F. Fotouhi* *Wayne State University **University of Michigan-Dearborn.


1 Deriving Emergent Web Page Semantics D.V. Sreenath*, W.I. Grosky**, and F. Fotouhi* *Wayne State University **University of Michigan-Dearborn

2 Semantics
The semantics of a web page is potentially richer than can be defined by the page’s author(s)
–Some semantics emerge through context
–A multimedia document has multiple semantics through being placed in multiple contexts

3 Content-Based Retrieval
Development of feature-based techniques for content-based retrieval is a mature area, at least for images
CBR researchers should now concentrate on extracting semantics from multimedia documents so that retrievals using concept-based queries can be tailored to individual users
–The semantic gap
(Semi-)automated multimedia annotation

4 Multimedia Annotation(s)
Multimedia annotations should be semantically rich
–Multiple semantics
This can be discovered by placing multimedia information in a natural, context-rich environment
–A social theory based on how multimedia information is used

5 Context-Rich Environments
Structural context – Author’s contribution
–Document’s author places semantically similar pieces of information close to each other
Dynamic context – User’s contribution
–Short browsing sub-paths are semantically coherent

6 Context-Rich Environments
The Web is a perfect example of a context-rich environment
Develop multimedia annotations through cross-modal techniques
–Audio
–Images
–Text
–Video

7 Goal
Derive document semantics based on user browsing behavior
–The same document has multiple semantics
»Different people see different meanings in the same document
–Over short browsing paths, an individual user’s wants and needs are uniform
»The pages visited over these short paths exhibit semantics in congruence with these wants and needs

8 Questions
How can the semantics of a web page be derived given a set of user browsing paths that end at that page?
How can we characterize the semantics of a user browsing path?
How can web page semantics help us in navigating the web more efficiently?
How can our approach actually be implemented in the real web world?

9 Our Approach
We use actual browsing paths to find the latent semantics of web pages
–Textual features
–Image features
–Structural features
We hope to find general concepts comprising various textual and image features which frequently co-occur

10 Semantic Coherence
We believe that a user’s browsing path exhibits semantic coherence
–The path as a whole exhibits multiple semantics, particularly across pages that are far apart on it, but neighboring pages are semantically close to each other, especially in the portions near the links taken

11 Semantic Break Points
We would like to characterize the contiguous sub-paths of a user’s browsing path that exhibit similar semantics, and to detect the semantic break points along the path where the semantics change appreciably
–Collect these sub-paths into a multiset
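The slides do not say how break points are detected. As a minimal sketch, assume each page is summarized by a keyword-count dictionary and that a break occurs wherever the cosine similarity between consecutive pages falls below a threshold. The function names, the threshold of 0.3, and the sample pages are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse keyword-count dicts."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_at_break_points(path, features, threshold=0.3):
    """Split a browsing path (list of page ids) into contiguous sub-paths.

    Assumption: a semantic break point lies between two consecutive pages
    whose cosine similarity is below `threshold`.
    """
    if not path:
        return []
    sub_paths = [[path[0]]]
    for prev, cur in zip(path, path[1:]):
        if cosine(features[prev], features[cur]) < threshold:
            sub_paths.append([cur])      # break point: start a new sub-path
        else:
            sub_paths[-1].append(cur)    # still coherent: extend current one
    return [tuple(s) for s in sub_paths]


# Hypothetical pages with made-up keyword counts.
features = {
    "hope_bio": {"hope": 3, "vaudeville": 2},
    "hope_stage": {"hope": 2, "vaudeville": 1, "broadway": 2},
    "golf_tips": {"golf": 4, "swing": 2},
}

# The multiset M of sub-paths, collected over two users' browsing paths.
multiset = Counter()
for path in (["hope_bio", "hope_stage", "golf_tips"], ["hope_bio", "hope_stage"]):
    multiset.update(split_at_break_points(path, features))
```

A `Counter` keyed by sub-path tuples serves as the multiset: its counts record how often each coherent sub-path was observed.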

12 Web Page Semantics
We categorize the semantics of each web page based on a history of the semantically-coherent browsing paths of all users which end at that page
A browsing path will be represented by a high-dimensional vector
The various positions of the vector correspond to the presence of
–textual keywords
–image features (visual keywords)
–structural features (structural keywords)
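As an illustration of the vector representation, the sketch below builds a binary presence vector for one browsing path over a small combined vocabulary. The vocabulary entries, the `text:` / `image:` / `struct:` prefixes, and the function name are hypothetical, and plain presence/absence is the simplest possible weighting, assumed here only for clarity.

```python
def path_vector(path_keywords, vocabulary):
    """Return a 0/1 vector with one position per vocabulary keyword.

    `path_keywords` is a list of sets, one per page on the path, holding the
    textual, visual, and structural keywords found on that page.
    """
    present = set().union(*path_keywords) if path_keywords else set()
    return [1 if keyword in present else 0 for keyword in vocabulary]


# Hypothetical combined vocabulary: textual, visual, and structural keywords.
vocabulary = ["text:vaudeville", "text:radio", "image:stage_photo", "struct:table"]

# A two-page path with made-up keywords per page.
pages = [{"text:vaudeville"}, {"image:stage_photo", "text:vaudeville"}]
vector = path_vector(pages, vocabulary)
```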

13 Deriving Emergent Web Page Semantics
From the complete set of web pages under consideration, we extract a set of textual, visual, and structural keywords
For each multiset, M, of sub-paths that we are to analyze, we form three matrices
–term-path matrix
–image-path matrix
–structure-path matrix

14 Deriving Emergent Web Page Semantics
The (i,j)th element of each of these matrices is determined by
–The strength of the presence of the ith keyword along the jth browsing path, determined by
»How many times the keyword occurs on the pages along the path
»How much time the user spends examining these pages
»How close each occurrence of the ith keyword is to both the outgoing and incoming anchor positions
–How many times this browsing path occurs in M
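The slides name the factors behind each matrix element but not how they are combined. The sketch below shows one possible combination, assumed purely for illustration: each occurrence is weighted by its proximity to the incoming and outgoing anchors, each page's total is scaled by its share of the time spent on the path, and the sum is multiplied by the path's count in M. `PageVisit`, `proximity`, and `matrix_entry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PageVisit:
    dwell_seconds: float
    # One (distance_to_incoming_anchor, distance_to_outgoing_anchor) pair,
    # measured in tokens, per occurrence of the keyword on this page.
    occurrences: list


def proximity(d_in, d_out):
    """Assumed weight: 1.0 at both anchors, decaying with distance from each."""
    return 0.5 * (1.0 / (1.0 + d_in) + 1.0 / (1.0 + d_out))


def matrix_entry(visits, path_count):
    """Entry (i, j) for keyword i along path j, under the assumptions above."""
    total_time = sum(v.dwell_seconds for v in visits)
    if total_time == 0:
        return 0.0
    strength = 0.0
    for v in visits:
        time_share = v.dwell_seconds / total_time
        strength += time_share * sum(proximity(a, b) for a, b in v.occurrences)
    return path_count * strength


# Made-up example: a two-page path that occurs twice in M.
visits = [
    PageVisit(dwell_seconds=30, occurrences=[(0, 0), (1, 1)]),
    PageVisit(dwell_seconds=10, occurrences=[(3, 3)]),
]
entry = matrix_entry(visits, path_count=2)  # 2 * (0.75 * 1.5 + 0.25 * 0.25)
```

Occurrence count enters through the sum over occurrences, dwell time through `time_share`, and anchor closeness through `proximity`; any of these could reasonably be replaced by a different weighting.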

15 Deriving Emergent Web Page Semantics
These matrices may be concatenated together in various ways to produce an overall keyword-path matrix
Perform latent-semantic analysis to get concepts
A page is then represented by a set of concept classes
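Latent semantic analysis is usually carried out with a truncated singular value decomposition. The sketch below assumes the three matrices are stacked vertically (rows are keywords, columns are paths), which is only one of the "various ways" of concatenating them, and keeps the top k = 2 concepts. The matrix values are made up and `latent_concepts` is a hypothetical helper.

```python
import numpy as np

# Made-up matrices: rows are keywords, columns are three browsing paths.
term_path = np.array([[2.0, 1.0, 0.0],
                      [1.0, 2.0, 0.0]])
image_path = np.array([[1.0, 1.0, 0.0]])
structure_path = np.array([[0.0, 0.0, 3.0]])

# Assumed concatenation: stack keyword rows to get one keyword-path matrix.
keyword_path = np.vstack([term_path, image_path, structure_path])


def latent_concepts(matrix, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


u_k, s_k, vt_k = latent_concepts(keyword_path, k=2)

# Coordinates of each path in the k-dimensional concept space (one row per path).
path_coords = (np.diag(s_k) @ vt_k).T

similar = cosine(path_coords[0], path_coords[1])    # paths sharing keywords
unrelated = cosine(path_coords[0], path_coords[2])  # paths with disjoint keywords
```

In this toy data the first two paths share textual and visual keywords and land on the same concept, while the third path, which has only a structural keyword, falls on a separate one.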

16 Architecture

17 Vantage Points

18 Local Iterative Technique

19 Bob Hope Path – Page 1

20 Bob Hope Path – Page 2

21 Bob Hope Path – Page 3

22 Bob Hope Path – Page 4

23 Bob Hope Path – Page 5

24 Bob Hope Path – Page 6

25 Bob Hope Path – Page 7

26 Bob Hope Path – Page 8

27 Bob Hope Path – Page 9

28 Vaudeville

29 Broadway

30 Radio

31 Troops

32 Experiment 1 – Paths/Paths [figure; labels: Bob Hope, Broadway, Golf, Movies, Radio, Troops, Vaudeville]

33 Experiment 2 – Paths/URLs [figure; labels: Bob Hope, Broadway, Golf, Movies, Radio, Troops, Vaudeville]

34 Experiment 3 – URLs/URLs [figure; labels: Bob Hope, Broadway, Golf, Movies, Radio, Troops, Vaudeville]

35 Issues
Data capture – privacy issues
Compute intensive SVD updating
Dynamic content
Constantly evolving websites

