Published by Jasmine Barrett. Modified over 8 years ago.
1
University of Malta CSA4080: Topic 3 © 2004- Chris Staff 1 of 63 cstaff@cs.um.edu.mt
CSA4080: Adaptive Hypertext Systems II
Dr. Christopher Staff
Department of Computer Science & AI
University of Malta
Topic 3: Hypertext
2
Aims and Objectives
DHRM revisited
Why the Web isn’t a good example of hypertext
Links and Queries
What the organisation of information can tell us
Context
3
Aims and Objectives
Hypertext and Robots
HyperContext
Topic Segmentation
Link analysis
Context and HyperContext
Document Feature Extraction
4
Hypertext
A hypertext system is simply a collection of documents and links
Usually, one or more human authors create content and decide when two documents should be linked
Ted Nelson assumed that users would need assistance in navigating through Xanadu
5
Hypertext
DHRM also assumes that users may need assistance
  – by making nodes searchable (resolver function)
WWW assumes that users know the URL of the required document
  – Search support is provided by 3rd parties!
6
Hypertext
What about node content?
Can the apparent content of a document change?
  – DHRM
      Presentation specification
      Composite nodes
  – Xanadu
      “Compound Windowing Documents”
      “Versioning by inclusion”
7
Hypertext
  – Web
      Through dynamic content
      3rd party search engines cannot (easily) index dynamic web pages! (Dark/Deep Web)
8
WWW
WWW is the single largest, most popular hypertext
It has inherent problems that make it a bad hypertext
The next-generation Semantic Web may address some of these problems
9
WWW
Many AHSs/user-adaptive systems assume that the Web acts as the delivery platform
Much research aims to “patch” the Web to support user-adaptive systems
  – Link analysis, Queries and Links
  – Context analysis
  – Topic Segmentation/Document Classification & Clustering
  – Information Extraction
10
Why the Web is a bad hypertext
See, for instance, http://ted.hyperland.com/buyin.txt
The Problems of Hypertext (Literary Machines 3/8)
Discuss
11
So, what is “good” hypertext?
Nelson is concerned with usability, system integrity (typed links), copyright issues...
Nielsen is concerned with usability (‘lost in hyperspace’ problem, e.g., p296-nielsen.pdf)
Dexter (DHRM) is more concerned with the integrity of the hypertext structure
12
So, what is “good” hypertext?
Links separate from document
‘Manageable’ number of links per document or adaptive support
Assist user with context/location/history
No difference between author/user?
Link integrity
Typed links?
...
13
“Patching” the Web
URLs and Web links are “fixed”
  – XPointer and XLink are meant to fix this
  – Also see Open Distributed Hypertext, University of Southampton
The creator of a link must have edit permissions on the document containing the link
  – Need to separate links and documents
14
What is a link?
Why do authors create links?
  – Minimally, because there is some relationship between source and destination
  – Frequently, to help users re-orient themselves
      Especially because a search engine will merely dump a user into a page
      And there are no standard mechanisms for finding out where you are
15
What is a link?
But are those really the only reasons why links are created?
  – Identify others...
16
What is a link?
TextNet (Trigg, 1983) has dozens of link types
Trigg moved to Xerox PARC, where, with Frank Halasz, he developed NoteCards
NoteCards also had typed links, but studies showed that users didn’t assign types
  – in case they assigned the “wrong” one
Xanadu also supports link types
Brusilovsky has identified several implicit link types in AHS
17
Link analysis
In the Web, it helps if we can understand the relationship between two linked documents
18
Typical Link Types
Trigg (TextNet)
  – http://www.workpractice.com/trigg/thesis-chap4.html
  – extracts “semantic content from text by making the relationships between nodes explicit... [using] typed links”
  – “Normal” and “commentary” link types
  – About 80 in all!
19
Typical Link Types
Brusilovsky (UMUAI.pdf)
  – Describes implicit link types that are meaningful to adaptive systems
  – Local non-contextual, contextual or real, index, table of contents, map
20
Typical Link Types
Mizuuchi et al (p13-mizuuchi.pdf)
  – Attempts to find ‘context paths’ for web pages
  – Identifies link patterns (intradirectory, downward, upward, sibling, intersite) and link roles (entrance, back, jump)
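As a rough illustration of how such link patterns might be recovered automatically, the sketch below classifies a link purely from the two URLs. The directory-based definitions, the function name `link_pattern` and the `example.org` URLs are assumptions made for this example; they are not taken from the Mizuuchi paper, and link roles (entrance, back, jump) are not attempted.

```python
from posixpath import dirname
from urllib.parse import urlsplit


def link_pattern(source_url, target_url):
    """Guess a link pattern from URL structure alone (assumed definitions)."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    if src.netloc != dst.netloc:
        return "intersite"  # different host
    # Normalise each directory so it always ends with exactly one slash.
    src_dir = dirname(src.path).rstrip("/") + "/"
    dst_dir = dirname(dst.path).rstrip("/") + "/"
    if src_dir == dst_dir:
        return "intradirectory"  # same directory
    if dst_dir.startswith(src_dir):
        return "downward"  # target lives below the source directory
    if src_dir.startswith(dst_dir):
        return "upward"  # target lives above the source directory
    return "sibling"  # same site, neither directory contains the other


print(link_pattern("http://example.org/a/index.html",
                   "http://example.org/a/b/page.html"))
```

A real system would also need the page content and the surrounding link structure; URL shape alone is only a heuristic.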
21
Hypertext organisation
The organisation of documents in hyperspace can also help us recover semantic information about the relationship between documents
  – Rather than looking at the relationship between just two documents, we investigate “clusters” of documents
22
Hypertext organisation
What sorts of semantic information can we recover from the organisation of information?
  – DBMS
Two simple examples
  – PageRank
  – Web Communities
23
Google’s PageRank
‘Bringing order to the Web’ (brin.pdf)
Takes advantage of the implicit ‘citation’ link type
Essentially counts the number of inlinks
Pages with many inlinks are important and can be prioritised in the results list
PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
  – T1 ... Tn are the pages that link to A, C(T) is the number of outlinks from T, and d is a damping factor
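The formula can be evaluated by repeated substitution. The following is a minimal sketch, assuming the link graph fits in a dictionary and that d = 0.85 with 50 iterations is adequate; the function name, the defaults and the three-page toy graph are illustrative assumptions, not part of the lecture.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the set of pages it links to. Pages with no
    outlinks simply pass on no rank in this simplified version.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting value
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T linking here contributes PR(T) / C(T).
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical toy web: A links to B and C, B links to C, C links to A.
toy_web = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph C collects rank from both A and B, so it ends up ranked above A, which in turn ranks above B. This shows that the score depends on who links to a page, not only on how many inlinks it has.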
24
Web Communities
Pages that have a high incidence of outlinks (hubs) can identify pages/sites that are similar/related
If these pages also have high PageRank, then they are authoritative
4-2.pdf
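One common way to make the hub idea concrete is Kleinberg-style mutual reinforcement (HITS): good hubs point to good authorities, and good authorities are pointed to by good hubs. Note that this is a swapped-in technique for illustration; the slide ties authority to PageRank, and it is an assumption that the cited paper uses anything like this. The function name and toy graph are hypothetical.

```python
import math


def hits(links, iterations=20):
    """Return (hub, authority) scores for a graph given as page -> set of targets."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {
            page: sum(hub[src] for src, targets in links.items() if page in targets)
            for page in pages
        }
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {page: v / norm for page, v in auth.items()}
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {page: sum(auth[t] for t in links.get(page, ())) for page in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {page: v / norm for page, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: two link-rich pages and one page with a single outlink.
toy_links = {"h1": {"a", "b"}, "h2": {"a", "b", "c"}, "x": {"c"}}
hub_scores, auth_scores = hits(toy_links)
print(max(hub_scores, key=hub_scores.get))
```

Pages that share the same strong hubs (here a and b) end up with similar authority scores, which is one way a "community" of related pages can surface.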
25
Topics and Context
If all documents contained only one topic...
and the meaning of statements in a document always meant the same thing...
... life would be easy
... but they don’t, they don’t, and it isn’t
26
Topic Segmentation
(Web) documents may contain information related to one or more topics
In early hypertext systems, debate focused on how much info should be stored in nodes
  – HyperCard, NoteCards, etc.: one topic per card, and only as much info as would fit onto the screen
  – KMS, DHRM, etc.: supported full freedom
27
Topic Segmentation
Not too much of a problem for human readers
  – Although we may have to read through a lot before we encounter relevant info
Web vs. DHRM: span-to-document vs. span-to-span links
Big problem for robots, and for adaptive user interfaces, though!
28
Topic Segmentation
Approaches
  – based loosely on passage-level retrieval
  – HyperContext (simple)
  – C99
  – TextTiling
29
Topic Segmentation
HyperContext
  – Find the ‘context window’ around the link in the parent
      Using HTML tags (now can also use DOM)
  – Construct a weighted vector of terms in the context window
  – Divide the child into ‘context blocks’ and prepare a weighted term vector for each (& hierarchy)
  – Most similar context blocks and context windows belong to the same topic
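The matching step in the list above can be sketched as a cosine comparison between term vectors. This is an illustration only, not the HyperContext implementation: it assumes the context window and context blocks have already been extracted as plain strings, uses raw term frequency as the weight, and ignores the block hierarchy. All names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Term-frequency vector; raw counts stand in for the unspecified weighting."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def best_block(context_window, context_blocks):
    """Return the index of the child block most similar to the parent's window."""
    window_vec = term_vector(context_window)
    scores = [cosine(window_vec, term_vector(block)) for block in context_blocks]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical parent link context and child document blocks.
window = "our review of jazz guitar recordings"
blocks = [
    "Opening hours and directions to the shop",
    "Jazz guitar recordings from the 1950s reviewed",
    "Contact the webmaster",
]
index, scores = best_block(window, blocks)
print(index, [round(s, 2) for s in scores])
```

Because the score depends on the parent's window, the same child document can yield a different best block when reached from a different parent.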
30
Topic Segmentation
C99
  – Doesn’t require context provided by other documents
  – Splits current document into topics using ‘topic shift detection’ algorithms based on similarity scores between sentences and the location of similar sentences in the text
choi00advances.pdf
31
Topic Segmentation
TextTiling (hearst94multiparagraph.pdf)
  – subdivide text into chunks of size w (w = 20)
  – keep a record of where and how often each stemmed term occurs
  – compare similarity between adjacent blocks
  – detect boundaries
Assumes that the same author will use the same phraseology
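A much-simplified sketch in the spirit of these steps follows. It skips stemming, compares single adjacent chunks using raw counts, and marks a boundary wherever similarity drops below a fixed threshold. The threshold, the function names and the toy token stream are assumptions for illustration, and the boundary rule is a stand-in, not the scoring used in Hearst's paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def tile_boundaries(tokens, w=20, threshold=0.1):
    """Return token offsets where adjacent w-token chunks look dissimilar."""
    # Step 1: subdivide the text into chunks of w tokens and count terms in each.
    chunks = [Counter(tokens[i:i + w]) for i in range(0, len(tokens), w)]
    # Step 2: compare each chunk with its right-hand neighbour.
    sims = [cosine(chunks[i], chunks[i + 1]) for i in range(len(chunks) - 1)]
    # Step 3: a low similarity suggests a topic shift at the gap between them.
    return [(i + 1) * w for i, sim in enumerate(sims) if sim < threshold]


# Hypothetical token stream: 40 tokens about pets, then 40 about finance.
tokens = ["cat", "dog"] * 20 + ["stock", "bond"] * 20
print(tile_boundaries(tokens))
```

The sketch relies on the same assumption the slide notes: vocabulary stays stable within a topic, so a change in vocabulary signals a boundary.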
32
Context
Choi and Hearst detect topics within a document, with no reference to others
HyperContext uses “overlap” between a parent and child to determine which blocks are about the same topic
  – Different blocks in the child may be combined depending on the info in the parent’s window!
  – Different users are potentially interested in different topics depending on access path
33
Context
Many different types of “context”
  – HyperContext: document access
  – Discourse analysis
  – McCarthy: context in which information exists/will be used
  – Context of the accessor vs. context of ‘where things are’ (situation theory) vs. context in which task is to be performed
  – Mizuuchi: Context path of Web pages
  – Nelson: the Framing Problem
34
Philosophies of Context
“The King of France wears a wig”
  – What does it mean?
  – Is it true?
HCTCh6.pdf
35
Philosophies of Context
What is “context”?
  – McCarthy won’t define it, though he describes what contexts do (McCarthy96.pdf)
  – Context can only be spoken of in reference to its use (context97report.pdf)
36
Philosophies of Context
What is “context”?
“Context is something surrounding an item and giving meaning to this item... context acts then on the relationships between items [rather] than on the items themselves” (context97report.pdf)
37
Philosophies of Context
What is “context”?
“we will accept a very general notion of context as a collection of ‘things’ (parameters, assumptions, presuppositions,...) a representation depends on” (9701-07.ps)
38
Philosophies of Context
What is “context”?
“a beliefs environment, a structure of nested belief-spaces for supporting the interpretation and production of natural language utterances (and other actions)” (shelmrei.pdf)
39
Philosophies of Context
What is “context”?
“a context c [is] that subset of the complete state of an individual that is used for reasoning about a given goal” (9211-20.ps)
40
Philosophies of Context
What is “context”?
“the explicit use of context limits the domain of validity of the acquired knowledge and indicates the correct moment of use.” (usekincontext.pdf)
41
Philosophies of Context
Pragmatic Context: Is a thing a thing because it has some innate thingness?
  – Bar-Hillel, Kaplan
Cognitive Context: Is a thing a thing in the mind of the beholder?
  – McCarthy, Sperber and Wilson, Kokinov...
The Kuleshov Effect
9705-19.ps
42
The Kuleshov Effect
“Around 1920, the great Russian filmmaker Lev Kuleshov took a close-up of an actor with a completely neutral expression on his face and intercut it with three different shots: a bowl of soup, a woman in a coffin, and a little girl playing. An audience praised the actor for his wonderful, subtle acting. The look of hunger! The grief for his dead wife! The love for his daughter!”
  Galen Fott, 2001, “Tempting Text: Creating Professional Titles”, http://www.macworld.com/2001/02/features/text/
“the juxtaposition of images creat[ed] the context for meaning”
  http://www.channel4000.com/sh/technology/techviews/digitalculture/national-technology-digitalculture-990923-210120.html
43
Document Feature Extraction
We are looking at user-adaptive systems, and in particular at adaptive hypertext systems
Somewhere along the line we need to know:
  – what the user is interested in
  – what about the document is of interest to the user
44
Document Feature Extraction
As a user browses, how might we tell what the user is interested in?
What topics in the document might be of interest to the user?
45
Document Feature Extraction
If we can analyse the links, we might tell what sort of information the user hopes to find by following a link
We can also build a “context” in which the user is seeking information, by pulling in relevant information that the user has seen while browsing
46
Document Feature Extraction
We can create indexes of combinations of context paths (or partial paths) so that they are searchable
Can we automatically recreate queries to derive the user’s information seeking task?
47
Document Feature Extraction
Examples
  – ‘Silk from a Sow’s Ear’
  – ParaSite
  – HyperContext
48
‘Silk from a Sow’s Ear’
To assist with visualization of and navigation through complex hyperspaces
Annotating Web pages with functional type (node typing)
Aggregating nodes into collections
pirolli96silk.pdf
49
‘Silk from a Sow’s Ear’
Node types:
  – Index, Source Index, Reference, (Destination Reference/Sink), Head (Organisation Home Page, Personal Home Page), Content
Represent the following as networks:
  – Links between pages in a locality
  – Similarity between linked pages
  – User traffic flow across links
50
ParaSite
Uses “link geometry” to find overlaps between topics
paraSite.pdf
51
ParaSite
Distinguishes between link types as “not all links are equally useful”
  – Upward, downward, crosswise, outward (external)
Finding pages related to some subset P indicated by a user
  – Find pages R that point to a maximal subset of P and return to user pages that point to R
52
HyperContext
We segment a document into topics in the context of the documents linking into it
This forms the basis of a description of the document in context, or an interpretation
A document’s interpretation is used to update a model of the user’s interests as a user navigates
53
HyperContext
If we index the interpretations, then an information retrieval system can perform information retrieval-in-context, placing the user in the correct context to receive relevant information
Can provide better results than “normal” IR, because potentially non-relevant, but rank-affecting, information is not present in the interpretation
54
Context and the Web
Examples of other approaches that use “context” to assist with search/navigation
  – Mizuuchi et al
  – Kim et al
55
Context and the Web
Web pages are written by authors who assume that they are read by humans, and that humans follow paths
  – Dig out Web browsing behaviour info
But the Web is utilised in two main ways
  – Directed from search engines
  – Traversed by Web robots
Mizuuchi (p13-mizuuchi.pdf)
56
Context and the Web
The information:document problem
Information is not always contained in a document, but may be contained in a path
  – Search engines (mainly) index documents as individual entities
      Two linked documents containing precise info will be ignored
A single document may contain information about multiple topics
  – Doc may have its rank affected
57
Context and the Web
We write Web pages assuming that readers have accessed them from other pages we have linked from
But anybody can create a link to a page
Web IR systems index individual pages
58
Context and the Web
Terms can be ambiguous: the vocabulary problem (p964-furnas.pdf, furnas85experience.pdf)
How can the intended meaning of a term in documents and queries be discovered?
  – Context: the juxtaposition of terms in context
59
Context and the Web
(Weak) Assumption that (within a single topic) ambiguous terms are used consistently
  – An ambiguous term, once used in one particular word sense, will not be re-used in another
Find the senses of unambiguous terms in a topic, and give ambiguous terms in the same topic segment the same sense
Problem disambiguating query terms, because there are so few
p258-kim.pdf
60
Context and the Web
Alternatively...
Vocabulary problem almost implies that documents containing two (or more) ambiguous terms describing the same concept will be sparse
Can we take advantage of that to learn e.g., synonyms from queries or Web page parents?
61
Conclusion
The Web is very different from the ideas of what constitutes a “good” hypertext
But adaptive techniques need to understand both the domain and the user
We’ll look at the Semantic Web in future lectures, but here we’ve seen how heuristics can help bring order to, and allow systems to reason about, the loosely structured space that is the Web
62
Conclusion
We covered some approaches to link and node typing
  – taking advantage of “citation” links to boost relevance using Google’s PageRank
  – and finding information that is “missing” from documents using their context path
Topic segmentation to identify what might really be of interest to a user
63
Conclusion
Finally, we looked at contextual information to see how we can take advantage of it to learn more accurately about the user, and to better direct the user to relevant information