CSA3212: Adaptive Hypertext Systems Dr. Christopher Staff Department of Intelligent Computer Systems University of Malta Topic 4: Hypertext.


1 CSA3212: Adaptive Hypertext Systems Dr. Christopher Staff Department of Intelligent Computer Systems University of Malta Topic 4: Hypertext

2 Aims and Objectives
2 of 120 chris.staff@um.edu.mt University of Malta CSA3212: Lecture 4 © 2005- Chris Staff
- Hypertext timeline
- Hypertext issues and the Web
- Formal Models of hypertext
- Understanding hypertext
- What’s a “good” hypertext?
- Links and queries
- Implicit semantic information in the organisation of hyperspace
- Context

3 Aims and Objectives
- Philosophies of Context
- Extracting useful information from documents in hyperspace

4 Part I: Hypertext Timeline

5 Definition
“[B]y adaptive hypermedia we mean all hypertext and hypermedia systems which reflect some features of the user in the user model and apply this model to adapt various visible aspects of the system to the user”
Brusilovsky, P. (1996). Methods and techniques of adaptive hypermedia, in User Modeling and User-Adapted Interaction 6 (2-3), pp. 87-129. Available on-line at: http://www.contrib.andrew.cmu.edu/~plb/UMUAI.ps / UMUAI.pdf

6 Hypertext Timeline
What is hypertext?
- “non-sequential writing--text that branches and allows choices to the reader” Ted Nelson, 1987. Literary Machines, Edition 87.1.
- “Hypertext is text which is not constrained to be linear. Hypertext is text which contains links to other texts.” http://www.w3.org/WhatIs.html

7 Hypertext Timeline
- 1945: Vannevar Bush, Memex
- 1965: Ted Nelson, coins term “Hypertext” (p84-nelson.pdf)
- 1967: Ted Nelson, Xanadu
- 1967: Andy van Dam, HES and FRESS
- 1968: Doug Engelbart, NLS
- 1975: CMU, ZOG/KMS
Balasubramian94state.pdf

8 Hypertext Timeline
- 1985: Brown University, Intermedia
- 1986: Peter Brown, OWL/Guide
- 1987: Apple Inc., HyperCard
- 1987: ACM, First major Conference on Hypertext
- 1990: NIST, Dexter, HAM, etc.
- 1991: Tim Berners-Lee, WWW
- 1993: NCSA, Mosaic

9 Adaptive Hypertext Timeline
- 1990: HypAdapter, Böcker et al
- 1990: Lisp-Critic, Fischer et al
- 1992: Manuel Excel, de La Passardiere & Dufresne
- 1993: ITEM/PG, Brusilovsky et al
- 1993: HYPERFLEX, Kaplan et al
- 1994: 1st Workshop on AH
UMUAI.pdf, brusilovsky2001.pdf

10 Adaptive Hypertext Timeline
- 1994: MetaDoc, Boyle & Encarnacion
- 1994: Adaptive Hyperman, Mathé & Chen
- 1994: KN-AHS, Kobsa et al
- 1995: Webwatcher, Armstrong et al
- 1995: Letizia, Lieberman
- 1996: Special issue of UMUAI on AH
- 1996: Syskill & Webert, Pazzani et al

11 Adaptive Hypertext Timeline
- 1996: Personal WebWatcher, Mladenic
- 1997: ELM-ART, Weber & Specht
- 1998: Interbook, Brusilovsky et al
- 1998: AHA!, De Bra & Calvi
- 1999: ART-Web, Weber
- 2000: First major conference on AH, Italy

12 Current and Future Work
- I stopped the hypertext timeline at the launch of the graphical Web, and the AH timeline in 2000 with the first major AH conference
- Much of the intervening and current hypertext and AH research is concerned with making the Web a better hypertext to use!
- Semantic Web and Adaptive Web

13 Part II: Hypertext Issues and the WWW

14 Issues in Hypertext
- We’ve looked at some problems that we (and Douglas Adams) face when using Hypertext/IR
- Halasz wrote “Reflections on NoteCards…” in 1987
- It re-surfaces frequently at conferences on Hypertext, provoking much discussion and updating
- Halasz believed that “hypertext” would “disappear”, becoming an underlying mechanism for storing and linking information
- But hypertext is still very much “in our face”…

15 Issues in Hypertext: “Seven Issues”
References:
- Halasz, Frank G. Reflections on NoteCards: seven issues for the next generation of hypermedia systems. Communications of the ACM, Volume 31, Issue 7, July 1988.
- ACM Journal of Computer Documentation (JCD), Volume 25, Issue 3 (http://portal.acm.org/toc.cfm?id=507317&type=issue&coll=ACM&dl=ACM&CFID=14254782&CFTOKEN=22435962). Entire issue devoted to “Seven Issues”.
- Seven Issues, Revisited. Panel Session, Hypertext ‘02.

16 The Seven Issues
- Search and Query
- Composites
- Virtual Structures and dynamic information
- Computation
- Versioning
- Support for collaborative work
- Extensibility and Tailorability

17 Issues in Hypertext
- Search and Query as part of the hypertext model!
- Current generation web has 3rd party search engines
- Semantic Web *may* be able to refer to objects via their content, rather than URL (or at least, do it seamlessly!)

18 Issues in Hypertext
- Composites
- Web still doesn’t really support composites, though it can be achieved through dynamic HTML
- But watch out for the Deep and Dark Web!

19 Issues in Hypertext
- Virtual structures and dynamic information
- So that the network can reconfigure itself according to the information it contains
- Self-repairing links, links which bind to the best destination when it becomes available
- Web approximates by redirecting to relocated information…

20 Issues in Hypertext
- Computation
- The end of a link can be a computation
- The computation can decide what destination to visit, etc.
- Web can do… e.g., search engines!

21 Issues in Hypertext
- Versioning
- Shudder!!!!
- Some systems/editors provide versioning (e.g., SCCS for source code development)
- Web absolutely does not!

22 Issues in Hypertext
- Support for collaborative work
- Web/internet is a collaborative place. We are sometimes aware of other people in this space
- Yet collaboration on, say, development of a web site is not possible within the Web (i.e., there is no explicit support for it)
- Updating a Web site merely replaces the currently live page in the document directory
- No locking, etc., of files supported

23 Issues in Hypertext
- Extensibility and tailorability
- The “programmable” Web
- Servers can be independently configured/extended
- Plug-ins increase support for doc types
- Web browsers can be configured for individual user, etc.

24 WWW
- The WWW is the single largest example of a distributed hypertext system
- But is it a good example of a hypertext system?
- And does it really matter if it’s a good example?

25 WWW
- The WWW was not developed with a formal model in mind
- Based on the concept of a Uniform Document Identifier, HTTP, and a standard markup language (HTML)
- TCP/IP used as the transport protocol
- Link source is marked by an <A> tag, with an embedded destination
Reference: Berners-Lee, T., et al., 1994, “The World-Wide Web” in Communications of the ACM, Vol. 37, No. 8, August 1994.

26 WWW
- Simple model, yet powerful
- Can share documents across the globe
- Anyone can author a Web page
- With extensions to the original model, can create pages dynamically
- Can manipulate multimedia data
- HTML is still a presentation markup language

27 Semantic Web
- Next generation web attempts to overcome some of these problems
- Thing is, “fixes” are built on top of existing structure, rather than bottom-up re-modelling

28 WWW and Semantic Web
What are the differences?

29 Semantic Web
Feature | WWW | Semantic Web
Links | Unidirectional, unary; embedded in doc | Bidirectional, n-ary; separated from document
Authorship (link creation) | Document owner | Anyone
Dangling links | Allowed
Search/Component resolution | Not supported | Indirectly supported through, e.g., UDDI
Dynamic links | Supported
“Aware” of surroundings | Node knows children only | Yes, though link separation
Dynamicity | Provided by external programs | May be supported

30 Semantic Web
Feature | WWW | Semantic Web
Link semantics | No | Yes, though as yet no standard
Composite nodes | Media composition, Frames, HTML Objects | As Web, rather than as DHRM
Link maintenance | Difficult | Not known yet
Adding links to existing component | No | Not known yet
Overlapping link anchors | No | Possibly, but might be considered error
Destination anchor point | Document, offset (beginning, no end) | As DHRM

31 So… does it matter?
- The (Semantic) Web will address some of the concerns in Seven Issues (but don’t forget about the other issues addressed by AHS!)
- SemWeb promises to become a knowledge base that may eventually remove the need for user navigation altogether

32 Part III: Formal Models of Hypertext: Dexter Hypertext Reference Model

33 Aims and Objectives
- Adaptive Hypertext Systems are built using hypertext navigation support, and information is inter-linked, hypertext style
- Most user-adaptive systems (UASs) are deployed over the Web, but the Web isn’t a particularly good example of a hypertext…
- So what are the properties and characteristics of “good” hypertexts?

34 Aims and Objectives of Hypertext
- ‘Well, by “hypertext” I mean non-sequential writing--text that branches and allows choices to the reader, best read on an interactive screen’ Ted Nelson, 1987. Literary Machines, Edition 87.1.
- “Hypertext is text which is not constrained to be linear. Hypertext is text which contains links to other texts.” http://www.w3.org/WhatIs.html
References: http://www.google.com/search?q=define:Hypertext

35 Hypertext 1988 and beyond
- The WWW was first launched in 1991, but only gained popularity in 1993
- The Hypertext community came together in 1990 to iron out many inconsistencies and incompatibilities in terminology
- Many models of hypertext were also proposed, based on Petri nets, sets, etc.
- The most popular model is based on graph theory

36 Dexter Hypertext Reference Model
- Why a formal model?
- “The goal of the model is to provide a principled basis for comparing systems as well as for developing interchange and interoperability standards” [Halasz94]
- DHRM has been implemented as Amsterdam, CMIFed, AHAM, DeVise/WebVise, RHYTHM (Bologna)…
References: Halasz, F. and Schwartz, M. 1994. The Dexter Hypertext Reference Model, in Communications of the ACM, 37(2), February, 1994, 30-39.

37 DHRM Fundamentals
DHRM separates the representation of documents (nodes) from the linking of nodes and the navigation through hyperspace [Halasz94]

38 DHRM Fundamentals

39 DHRM Fundamentals
- Components
- DHRM components are the equivalent of nodes, and are represented in the Storage Layer
- Nodes were called frames, cards, documents, and articles
- Even today, on the Web a node is referred to as a document or, more commonly, a page
- DHRM doesn’t really care about what happens within a component, only how the hypertext network interfaces with the component

40 DHRM Fundamentals
- Anchors
- References to locations or items within documents
- Components can be composites, hierarchical combinations of atomic components (DAG)
- Anchors can be the source or destination of links
- Anchors can be entire components, or spans (segments of a component)

41 DHRM Fundamentals
- More about anchors…
- Anchors have two parts
  - Anchor ID (used by Storage Layer)
  - Anchor Value (used by Within-Component Layer)
- The anchor value can be a region within a component, and the value is sensible only to the application responsible for editing/accessing the component
- Anchors are unique:

42 DHRM Fundamentals
- Links
- Links are represented in the Storage Layer
- Specify a source anchor, a destination anchor, and a direction that specifies how the link can be traversed
- Links can also be the destinations for other links
- Links, therefore, are totally separate from the components that contain them

43 DHRM Fundamentals
- Run-Time Layer
- A hypertext isn’t much good if you cannot manipulate it and navigate through it
- From the Run-Time Layer users can access, view, and manipulate the hypertext

44 DHRM Fundamentals
- All components (including links) have presentation specifications
- The Run-Time Layer can also impose presentation specifications on the accessed links and components to capture user preferences, for instance

45 Referring to components
- Components have unique identifiers (UIDs) and component specifications
- Component specifications are essential, because a user may be able to describe a component without knowing its UID
- Components may be identified from their description using a resolver function, and then retrieved using an accessor function
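The resolver/accessor split can be sketched in Python. This is only an illustration under stated assumptions: the class and field names (`Component`, `Hypertext`, `description`) are hypothetical, and a "component specification" is reduced here to a few keywords matched against a free-text description, which is far simpler than anything the Dexter model prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    uid: str          # unique identifier (UID)
    description: str  # hypothetical free-text description used for resolution
    content: str


class Hypertext:
    """Toy storage layer: components are looked up by UID or by description."""

    def __init__(self, components):
        self._by_uid = {c.uid: c for c in components}

    def resolver(self, spec):
        """Map a component specification (here: keywords) to matching UIDs."""
        words = spec.lower().split()
        return [uid for uid, c in self._by_uid.items()
                if all(w in c.description.lower() for w in words)]

    def accessor(self, uid):
        """Retrieve the component that carries the given UID."""
        return self._by_uid[uid]


ht = Hypertext([
    Component("c1", "Dexter reference model overview", "DHRM text"),
    Component("c2", "Memex history", "Bush text"),
])
uids = ht.resolver("dexter model")            # describe first ...
found = [ht.accessor(u) for u in uids]        # ... then retrieve by UID
```

The point of the two-step design is that the user never needs to know `c1`; the description alone is enough to reach the component.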

46 DHRM Fundamentals [Halasz94]

47 DHRM Fundamentals
- More about links
- Links are first class objects
- Links are created by combining a component specification, anchor ID, direction, and presentation information into a specifier
- Direction can be FROM, TO, BIDIRECT, NONE
- A link is a sequence of two or more specifiers, at least one of which must be TO or BIDIRECT
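The constraints on links listed above can be written down as a small data structure. The following Python sketch is an assumption-laden illustration, not the Dexter specification: the class names and the use of plain strings for component specifications and presentation information are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    FROM = "FROM"
    TO = "TO"
    BIDIRECT = "BIDIRECT"
    NONE = "NONE"


@dataclass(frozen=True)
class Specifier:
    component_spec: str      # resolved to a component UID elsewhere
    anchor_id: str
    direction: Direction
    presentation: str = ""   # placeholder for presentation information


@dataclass(frozen=True)
class Link:
    specifiers: tuple

    def __post_init__(self):
        # A link is a sequence of two or more specifiers ...
        if len(self.specifiers) < 2:
            raise ValueError("a link needs at least two specifiers")
        # ... at least one of which must be TO or BIDIRECT.
        if not any(s.direction in (Direction.TO, Direction.BIDIRECT)
                   for s in self.specifiers):
            raise ValueError("a link needs a TO or BIDIRECT specifier")


link = Link((
    Specifier("slide 47", "a1", Direction.FROM),
    Specifier("Halasz94", "a7", Direction.TO),
))
```

Because a `Link` holds any number of specifiers, n-ary links come for free, and nothing in the structure lives inside the components it connects.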

48 Conclusion
Interesting features of DHRM
- Links are separate from documents containing them
- Anybody can be an author (link creator)
- Search capability is built into hypertext model (resolver function)
- Presentation specifications can change behaviour of component when displayed
- Links “know” their origin and destination
- Components can be composite
- Dangling links are not allowed (supposedly!)

49 Conclusion
- DHRM was defined in 1990
- Most existing hypertext systems were small scale, catering for individuals and small workgroups
- The Internet (using TCP/IP) had existed for 7 years
- The WWW did not yet exist…

50 Conclusion
- DHRM has very few implementation examples
- The WWW, while not DHRM-conformant, is the single largest and most popular example of a distributed hypertext system
- There are general hypertext issues, which DHRM attempted to address
- The implementation of the WWW has led to other issues, which AHS attempt to address

51 WWW and DHRM
What are the differences?

52 WWW and DHRM
Feature | DHRM | WWW
Links | Bidirectional, n-ary; separate from doc | Unidirectional, unary; embedded in doc
Authorship (link creation) | Anyone | Document owner
Dangling links | Not allowed | Allowed
Search/Component resolution | Explicit support | Not supported
Dynamic links | Supported
“Aware” of surroundings | Node knows parents/children | Node knows children only
Dynamicity | Built into model | Provided by external programs

53 WWW and DHRM
Feature | DHRM | WWW
Link semantics | Possible, through presentation specification | No
Composite nodes | Yes, but not implemented | Media composition, Frames, HTML Objects
Link maintenance | Yes, deleting component deletes dependencies | Difficult
Adding links to existing components | Yes | No
Overlapping link anchors | Supported | No
Destination anchor point | Document, span (beginning and end) | Document, offset (beginning, no end)

54 Part IV: Understanding Hypertext

55 Aims and Objectives
- Why the Web isn’t a good example of hypertext
- Links and Queries
- What the organisation of information can tell us
- Context

56 Aims and Objectives
- HyperContext
- Topic Segmentation
- Link analysis
- Context and HyperContext
- Document Feature Extraction

57 Hypertext
- A hypertext system is simply a collection of documents and links
- Usually, one or more human authors create content and decide when two documents should be linked
- Ted Nelson assumed that users would need assistance in navigating through Xanadu

58 Hypertext
- DHRM also assumes that users may need assistance, and provides it by making nodes searchable (resolver function)
- WWW assumes that users know the URL of the required document
- Search support provided by 3rd parties!

59 Hypertext
- What about node content? Can the apparent content of a document change?
- DHRM
  - Presentation specification
  - Composite nodes
- Xanadu
  - “Compound Windowing Documents”
  - “Versioning by inclusion”

60 Hypertext
- Web
  - Through dynamic content
  - 3rd party search engines cannot (easily) index dynamic web pages! (Dark/Deep Web)

61 WWW
- WWW is the single largest, most popular hypertext
- It has inherent problems that make it a bad hypertext
- Next generation Semantic Web/Linked Data may address some problems

62 WWW
- Many user-adaptive systems assume that Web acts as delivery platform
- Much research to “patch” Web to support user-adaptive systems
  - Link analysis, Queries and Links
  - Context analysis
  - Topic Segmentation/Document Classification & Clustering
  - Information Extraction

63 Why the Web is a bad hypertext
- See, for instance, http://ted.hyperland.com/buyin.txt
- The Problems of Hypertext (Literary Machines 3/8)
- Discuss

64 So, what is “good” hypertext?
- Nelson concerned with usability, system integrity (typed links), copyright issues...
- Nielsen concerned with usability (‘lost in hyperspace’ problem, e.g., p296-nielsen.pdf)
- Dexter (DHRM) more concerned with integrity of hypertext structure

65 So, what is “good” hypertext?
- Links separate from document
- ‘Manageable’ number of links per document or adaptive support
- Assist user with context/location/history
- No difference between author/user?
- Link integrity
- Typed links?...

66 “Patching” the Web
- URLs and Web links are “fixed”
  - XPointer and XLink are meant to fix this
  - Also see Open Distributed Hypertext, University of Southampton
- The creator of a link must have edit permissions on the document containing the link
  - Need to separate links and documents

67 What is a link?
- Why do authors create links?
- Minimally, because there is some relationship between source and destination
- Frequently, to help users re-orient themselves
  - Especially because a search engine will merely dump a user into a page
  - And there are no standard mechanisms for finding out where you are

68 What is a link?
- But are those really the only reasons why links are created?
- Identify others...

69 What is a link?
- TextNet (Trigg, 1983) has dozens of link types
- Trigg moved to Xerox PARC, where, with Frank Halasz, he developed NoteCards
- NoteCards also had typed links, but studies showed that users didn’t assign types in case they assigned the “wrong” one
- Xanadu also supports link types
- Brusilovsky has identified several implicit link types in AHS

70 Link analysis
In the Web, it helps if we can understand the relationship between two linked documents

71 Typical Link Types
- Trigg (TextNet) http://www.workpractice.com/trigg/thesis-chap4.html
- Extracts “semantic content from text by making the relationships between nodes explicit... [using] typed links”
- “Normal” and “commentary” link types
- About 80 in all!

72 Typical Link Types
- Brusilovsky (UMUAI.pdf)
- Describes implicit link types that are meaningful to adaptive systems
- Local non-contextual, contextual or real, index, table of content, map

73 Typical Link Types
- Mizuuchi et al (p13-mizuuchi.pdf)
- Attempts to find ‘context paths’ for web pages
- Identifies (link patterns) intradirectory, downward, upward, sibling, intersite, (link roles) entrance, back, jump
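A rough Python sketch of how the five link patterns might be told apart from the two URLs alone. The slide gives only the pattern names, so the definitions used here are assumptions rather than Mizuuchi et al's: patterns are decided purely by host and directory path, and any same-site link that is neither in the same directory, below it, nor above it is labelled "sibling".

```python
import posixpath
from urllib.parse import urlsplit


def link_pattern(source_url, dest_url):
    """Classify a link by comparing the host and directory of both ends."""
    src, dst = urlsplit(source_url), urlsplit(dest_url)
    if src.netloc.lower() != dst.netloc.lower():
        return "intersite"
    src_dir = posixpath.dirname(src.path) or "/"
    dst_dir = posixpath.dirname(dst.path) or "/"
    if src_dir == dst_dir:
        return "intradirectory"
    if dst_dir.startswith(src_dir.rstrip("/") + "/"):
        return "downward"   # destination lies in a subdirectory of the source
    if src_dir.startswith(dst_dir.rstrip("/") + "/"):
        return "upward"     # destination lies in an ancestor directory
    return "sibling"        # assumed catch-all for other same-site links


page = "http://example.org/course/topic4/index.html"
pattern = link_pattern(page, "http://example.org/course/topic4/slides/s1.html")
```

The link roles (entrance, back, jump) are not covered by this sketch, since they would presumably need more of the site structure than a single pair of URLs.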

74 Hypertext organisation
- The organisation of documents in hyperspace can also help us recover semantic information about the relationship between documents
- Rather than looking at the relationship between just two documents, we investigate “clusters” of documents

75 Hypertext organisation
- What sorts of semantic information can we recover from the organisation of information?
- DBMS
- Two simple examples
  - PageRank
  - Web Communities

76 Google’s PageRank
- ‘Bringing order to the Web’ (brin.pdf)
- Takes advantage of implicit ‘citation’ link type
- Essentially counts the number of inlinks, each weighted by the rank of the page it comes from
- Pages with many inlinks are important and can be prioritised in the results list
- PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages that link to A, C(T) is the number of outlinks from T, and d is a damping factor
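The formula can be evaluated by simple repeated substitution. The Python sketch below is a minimal illustration, not Google's implementation: the function name, the dictionary-of-outlinks input, the fixed iteration count, and the starting value of 1.0 per page are all assumptions made here; d = 0.85 is the value usually quoted from the Brin and Page paper.

```python
def pagerank(outlinks, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    outlinks maps each page to the list of pages it links to.
    """
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page T contributes its rank divided by C(T).
            incoming = sum(pr[src] / len(targets)
                           for src, targets in outlinks.items()
                           if page in targets)
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this toy graph B has no inlinks, so it settles at the minimum value 1-d, while C, which is cited by both A and B, ends up ranked highest.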

77 Web Communities
- Pages that have a high incidence of outlinks (hubs) can identify pages/sites that are similar/related
- If these pages also have high PageRank, then they are authoritative
- 4-2.pdf
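One well-known way to score hubs and authorities is Kleinberg's HITS algorithm. The slide does not say which method the cited paper uses, and HITS derives authority from hub scores rather than from PageRank, so the Python sketch below is a substituted illustration of the hub idea only; the function name and input format are assumptions.

```python
import math


def hits(outlinks, iterations=20):
    """Return (hub, authority) scores for a dict mapping page -> linked pages."""
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page is a good authority if good hubs point to it.
        auth = {p: sum(hub[s] for s, targets in outlinks.items() if p in targets)
                for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A page is a good hub if it points to good authorities.
        hub = {p: sum(auth[t] for t in outlinks.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


web = {"h1": ["a", "b"], "h2": ["a", "b"], "x": ["a"]}
hub_scores, auth_scores = hits(web)
```

Here h1 and h2 point at the same pair of pages, which is the kind of co-citation pattern that suggests a and b belong to the same community.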

78 Topics and Context
- If all documents contained only one topic...
- and the meaning of statements in a document always meant the same thing...
- ... life would be easy...
- but they don’t, they don’t, and it isn’t

79 Topic Segmentation
- (Web) documents may contain information related to 1 or more topics
- In early hypertext systems, debate focused on how much info should be stored in nodes
  - HyperCard, NoteCards, etc.: one topic per card, and only as much info as would fit onto screen
  - KMS, DHRM, etc.: supported full freedom

80 Topic Segmentation
- Not too much of a problem for human readers
  - Although we may have to read through much before we encounter relevant info
  - Web vs. DHRM: span-to-document vs. span-to-span links
- Big problem for robots, and for adaptive user interfaces though!

81 Topic Segmentation
- Approaches based loosely on passage-level retrieval
  - HyperContext (simple)
  - C99
  - TextTiling

82 Topic Segmentation
- HyperContext
  - Find the ‘context window’ around link in parent
  - Using HTML tags (now can also use DOM)
  - Construct vector of terms in context window
  - Divide child into ‘context blocks’ and prepare weighted term vector for each (& hierarchy)
  - Most similar context blocks and context windows belong to the same topic
- http://www.cs.um.edu.mt/~cstaff/HCT/thesis/hct6.pdf
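The matching step, comparing the parent's context window with each context block of the child, can be sketched in Python. This is not the HyperContext implementation: the function names are hypothetical, raw term counts stand in for the weighted term vectors, cosine similarity is an assumed choice of measure, and finding the window and blocks in the HTML is skipped by passing them in as plain strings.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector with raw counts (real systems would weight terms)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_block(context_window, context_blocks):
    """Return the index of the child block most similar to the parent's window."""
    window = term_vector(context_window)
    scores = [cosine(window, term_vector(block)) for block in context_blocks]
    return max(range(len(scores)), key=scores.__getitem__), scores


window_text = "adaptive hypertext systems build a user model"
blocks = [
    "football results and league tables",
    "a user model lets adaptive hypertext tailor links",
]
index, scores = best_block(window_text, blocks)
```

The block with the highest score is taken to be about the same topic as the text around the link the user followed.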

83 Topic Segmentation
- C99
  - Doesn’t require context provided by other documents
  - Splits current document into topics using ‘topic shift detection’ algorithms based on similarity scores between sentences and the location of similar sentences in the text
- choi00advances.pdf

84 Topic Segmentation
- TextTiling
  - Subdivide text into chunks of size w (w = 20)
  - Keep record of where and how often each stemmed term occurs
  - Compare similarity between adjacent blocks
  - Detect boundaries
  - Assumes that same author will use same phraseology
- hearst94multiparagraph.pdf
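The four steps can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Hearst's algorithm: there is no stemming, chunks do not overlap, and a boundary is declared wherever the similarity of two adjacent chunks falls below a fixed threshold (an assumed stand-in for the depth scoring used in the paper). The function names and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tile(text, w=20, threshold=0.1):
    """Return (similarities, boundaries); boundaries are token offsets."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 1 and 2: fixed-size chunks, each with its own term counts.
    chunks = [Counter(tokens[i:i + w]) for i in range(0, len(tokens), w)]
    # Step 3: similarity between each pair of adjacent chunks.
    sims = [cosine(chunks[i], chunks[i + 1]) for i in range(len(chunks) - 1)]
    # Step 4: a low similarity suggests a topic boundary between the two chunks.
    boundaries = [(i + 1) * w for i, s in enumerate(sims) if s < threshold]
    return sims, boundaries


sample = "cat dog " * 10 + "sun moon " * 10
sims, boundaries = tile(sample, w=10)
```

In the sample the vocabulary changes completely after the twentieth token, so that is where the only boundary is reported.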

85 Context
- Choi and Hearst detect topics within a document, with no reference to others
- HyperContext uses “overlap” between a parent and child to determine which blocks are about same topic
- Different blocks in the child may be combined depending on the info in the parent’s window!
- Different users potentially interested in different topics depending on access path

86 Context
- Many different types of “context”
  - HyperContext: document access
  - Discourse analysis
  - McCarthy: context in which information exists/will be used
  - Context of the accessor vs. context of ‘where things are’ (situation theory) vs. context in which task is to be performed
  - Mizuuchi: Context path of Web pages
  - Nelson: the Framing Problem

87 Part V: Philosophies of Context

88 Philosophies of Context
- “The King of France wears a wig”
- What does it mean?
- Is it true?
- HCTCh6.pdf

89 Philosophies of Context
What is “context”?
- McCarthy won’t define it, though he describes what contexts do (McCarthy96.pdf)
- Context can only be spoken of in reference to its use (context97report.pdf)

90 Philosophies of Context
What is “context”?
- “Context is something surrounding an item and giving meaning to this item... context acts then on the relationships between items [rather] than on the items themselves” (context97report.pdf)

91 Philosophies of Context
What is “context”?
- “we will accept a very general notion of context as a collection of ‘things’ (parameters, assumptions, presuppositions,...) a representation depends on” (9701-07.ps)

92 Philosophies of Context
What is “context”?
- “a beliefs environment, a structure of nested belief-spaces for supporting the interpretation and production of natural language utterances (and other actions)” (shelmrei.pdf)

93 Philosophies of Context
What is “context”?
- “a context c [is] that subset of the complete state of an individual that is used for reasoning about a given goal” (9211-20.ps)

94 Philosophies of Context
What is “context”?
- “the explicit use of context limits the domain of validity of the acquired knowledge and indicates the correct moment of use.” (usekincontext.pdf)

95 Philosophies of Context
- Pragmatic Context: Is a thing a thing because it has some innate thingness?
  - Bar-Hillel, Kaplan
- Cognitive Context: Is a thing a thing in the mind of the beholder?
  - McCarthy, Sperber and Wilson, Kokinov...
- The Kuleshov Effect
- 9705-19.ps

96 The Kuleshov Effect
- “Around 1920, the great Russian filmmaker Lev Kuleshov took a close-up of an actor with a completely neutral expression on his face and intercut it with three different shots: a bowl of soup, a woman in a coffin, and a little girl playing. An audience praised the actor for his wonderful, subtle acting. The look of hunger! The grief for his dead wife! The love for his daughter!” Galen Fott, 2001, “Tempting Text: Creating Professional Titles”, http://www.macworld.com/2001/02/features/text/
- “the juxtaposition of images creat[es] the context for meaning” http://www.channel4000.com/sh/technology/techviews/digitalculture/national-technology-digitalculture-990923-210120.html

97 Who, about what? “Those who acquire it will cease to exercise their memory and become forgetful; they will rely on [it] to bring things to their remembrance by external signs instead of on their own resources… it shows great folly… to suppose that one can transmit or acquire clear and certain knowledge of an art through the medium …, or that … [they] can do more than remind the … [person] of what he knows on any given subject”

99 Part VI: Information Extraction from Hypertext

100 Document Feature Extraction: We are looking at user-adaptive systems, and in particular at adaptive hypertext systems. Somewhere along the line we need to know: what the user is interested in; what about the document is of interest to the user.

101 Document Feature Extraction: As a user browses, how might we tell what the user is interested in? What topics in the document might be of interest to the user?

102 Document Feature Extraction: If we can analyse the links, we might tell what sort of information the user hopes to find by following a link. We can also build a “context” in which the user is seeking information, by pulling in relevant information that the user has seen while browsing.

103 Document Feature Extraction: We can create indexes of combinations of context paths (or partial paths) so that they are searchable. Can we automatically recreate queries to derive the user’s information seeking task?
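One way to picture a searchable index of partial context paths is sketched below. It is a minimal illustration rather than the method of any system named in this lecture: the function names, the `max_len` cut-off and the sample browsing paths are all assumptions made for the example.

```python
from collections import defaultdict


def partial_paths(path, max_len=3):
    """Yield every contiguous sub-path of `path` containing 1..max_len pages."""
    for start in range(len(path)):
        last_end = min(start + max_len, len(path))
        for end in range(start + 1, last_end + 1):
            yield tuple(path[start:end])


def build_path_index(paths, max_len=3):
    """Map each partial path to the ids of the full paths that contain it."""
    index = defaultdict(set)
    for path_id, path in enumerate(paths):
        for sub_path in partial_paths(path, max_len):
            index[sub_path].add(path_id)
    return index


# Hypothetical browsing paths (page names are made up for the example).
paths = [
    ["home", "courses", "csa3212"],
    ["home", "staff", "csa3212"],
]
index = build_path_index(paths)
```

Looking up `index[("courses", "csa3212")]` then identifies the browsing paths that reached the course page via the courses page, which is the kind of path-sensitive lookup a document-only index cannot answer.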

104 Document Feature Extraction: Examples: ‘Silk from a Sow’s Ear’; ParaSite; HyperContext

105 ‘Silk from a Sow’s Ear’: To assist with visualization of and navigation through complex hyperspaces. Annotating Web pages with functional type (node typing). Aggregating nodes into collections. (pirolli96silk.pdf)

106 ‘Silk from a Sow’s Ear’: Node types: Head (Organisation Home Page, Personal Home Page), Index, Source Index (index that’s also a head), Reference (Destination Reference/Sink [inlinks, but no outlinks]), Content. Represent the following as networks: links between pages in a locality; similarity between linked pages; user traffic flow across links.
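A rough sketch of node typing from link structure alone follows. Only the sink rule (inlinks but no outlinks) comes from the slide; the rule that a page with many outlinks is an index, the `index_threshold` value and the sample graph are assumptions for illustration, and the head and source index types are not modelled here.

```python
def classify_nodes(links, index_threshold=3):
    """Label each page using only its in-degree and out-degree.

    `links` maps a page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Count distinct inlinks, ignoring self-links.
    in_degree = {page: 0 for page in pages}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                in_degree[target] += 1

    labels = {}
    for page in pages:
        out_degree = len(set(links.get(page, [])) - {page})
        if in_degree[page] > 0 and out_degree == 0:
            labels[page] = "reference (sink)"  # rule stated on the slide
        elif out_degree >= index_threshold:
            labels[page] = "index"  # assumed rule: many outlinks
        else:
            labels[page] = "content"  # assumed fallback
    return labels


# Hypothetical link graph.
links = {"home": ["a", "b", "c"], "a": ["c"], "b": ["a"]}
labels = classify_nodes(links)
```

Anything beyond degree counts, such as page text or usage data, is deliberately left out of this sketch.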

107 ParaSite: Uses “link geometry” to find overlaps between topics, to support finding pages that have moved (or are not yet indexed), finding related pages, and finding people. (paraSite.pdf)

108 ParaSite: Distinguishes between link types as “not all links are equally useful”: upward, downward, crosswise, outward (external). Finding pages related to some subset P indicated by a user: find pages R that point to a maximal subset of P, and return to the user the other pages that R points to.
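The related-pages heuristic can be sketched as follows, under the assumption that the pages returned to the user are the other pages linked from R (a co-citation reading). The function name and the sample link graph are hypothetical, and the distinction between upward, downward, crosswise and outward links is ignored.

```python
def related_pages(links, liked):
    """Return pages co-cited with `liked` by the best-overlapping referrers.

    `links` maps a page to the list of pages it links to;
    `liked` is the set P of pages indicated by the user.
    """
    liked = set(liked)
    # For every page, count how many pages of P it points to.
    overlap = {page: len(liked & set(targets)) for page, targets in links.items()}
    best = max(overlap.values(), default=0)
    if best == 0:
        return set()

    # R: the pages that point to a maximal subset of P.
    referrers = {page for page, count in overlap.items() if count == best}

    # Everything R points to, minus what the user already has.
    related = set()
    for referrer in referrers:
        related.update(links[referrer])
    return related - liked


# Hypothetical link graph.
links = {
    "r1": ["p1", "p2", "x"],
    "r2": ["p1", "y"],
    "r3": ["p1", "p2", "z"],
}
suggestions = related_pages(links, {"p1", "p2"})
```

Here `r1` and `r3` each point to both liked pages, so the pages they additionally link to are suggested, while `y` is not because `r2` covers only one liked page.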

109 HyperContext: We segment a document into topics in the context of the documents linking into it. This forms the basis of a description of the document in context, or an interpretation. A document’s interpretation is used to update a model of the user’s interests as a user navigates.

110 HyperContext: If we index the interpretations, then an information retrieval system can perform information retrieval-in-context, placing the user in the correct context to receive relevant information. This can provide better results than “normal” IR, because potentially non-relevant, but rank-affecting, information is not present in the interpretation.

111 Context and the Web: Examples of other approaches that use “context” to assist with search/navigation: Mizuuchi et al; Kim et al

112 Context and the Web: Web pages are written by authors who assume that they are read by humans, and that humans follow paths. Dig out Web browsing behaviour info. But the Web is utilised in two main ways: directed from search engines; traversed by Web robots. Mizuuchi (p13-mizuuchi.pdf)

113 Context and the Web: The information:document problem. Information is not always contained in a document, but may be contained in a path. Search engines (mainly) index documents as individual entities, so two linked documents that together contain precise info will be ignored. A single document may contain information about multiple topics, so the doc may have its rank affected.
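The information:document problem can be made concrete with a toy example. The sketch below is not the approach of Mizuuchi et al. or of any search engine; the page texts, the link graph and both function names are assumptions, and matching is plain whole-word containment.

```python
def covers(query, text):
    """True if every query term occurs as a word in `text`."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in query)


def search_pages(query, pages):
    """Per-document search: a page matches only if it alone covers the query."""
    return [name for name, text in pages.items() if covers(query, text)]


def search_linked_pairs(query, pages, links):
    """Path-aware search over single links.

    Returns (source, target) pairs whose combined text covers the query
    even though neither page covers it alone.
    """
    hits = []
    for source, targets in links.items():
        for target in targets:
            combined = pages[source] + " " + pages[target]
            alone = covers(query, pages[source]) or covers(query, pages[target])
            if covers(query, combined) and not alone:
                hits.append((source, target))
    return hits


# Hypothetical two-page site.
pages = {
    "dept": "department of intelligent computer systems",
    "course": "adaptive hypertext lecture notes",
}
links = {"dept": ["course"]}
query = ["department", "hypertext"]

page_hits = search_pages(query, pages)
pair_hits = search_linked_pairs(query, pages, links)
```

The per-document search finds nothing for this query, while the pair search recovers the path from the department page to the course page.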

114 Context and the Web: We write Web pages assuming that readers have accessed them from other pages we have linked from. But anybody can create a link to a page. Web IR systems index individual pages.

115 Context and the Web: Terms can be ambiguous: the vocabulary problem (p964-furnas.pdf, furnas85experience.pdf). How can the intended meaning of a term in documents and queries be discovered? Context: the juxtaposition of terms in context.

116 Context and the Web: (Weak) assumption that, within a single topic, ambiguous terms are used consistently: an ambiguous term, once used in one particular word sense, will not be re-used in another. Find the senses of unambiguous terms in a topic, and give ambiguous terms in the same topic segment the same sense. Query terms are hard to disambiguate, because there are so few of them. (p258-kim.pdf)
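The heuristic of giving ambiguous terms the sense of the unambiguous terms around them might be sketched like this. It is an illustration only, not the algorithm from the cited paper: the toy sense lexicon, the voting scheme and the function name are all assumptions.

```python
from collections import Counter


def disambiguate_segment(terms, senses):
    """Resolve term senses within one topic segment.

    `senses` maps a term to its set of candidate senses (a toy lexicon).
    Unambiguous terms vote for their sense; each ambiguous term takes the
    most-voted sense among its own candidates, or None if none got a vote.
    """
    votes = Counter()
    for term in terms:
        candidates = senses.get(term, set())
        if len(candidates) == 1:
            votes[next(iter(candidates))] += 1

    resolved = {}
    for term in terms:
        candidates = senses.get(term, set())
        if len(candidates) == 1:
            resolved[term] = next(iter(candidates))
        elif len(candidates) > 1:
            supported = sorted(s for s in candidates if votes[s] > 0)
            resolved[term] = (
                max(supported, key=lambda s: votes[s]) if supported else None
            )
        # Terms missing from the lexicon are left out of the result.
    return resolved


# Hypothetical lexicon and topic segment.
senses = {
    "java": {"programming", "geography", "coffee"},
    "compiler": {"programming"},
    "bytecode": {"programming"},
    "island": {"geography"},
}
segment = ["java", "compiler", "bytecode"]
resolved = disambiguate_segment(segment, senses)
```

A one-word query such as `["java"]` resolves to `None` here, which mirrors the slide's point that queries offer too little context to disambiguate.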

117 Context and the Web: Alternatively... The vocabulary problem almost implies that documents containing two (or more) ambiguous terms describing the same concept will be sparse. Can we take advantage of that to learn, e.g., synonyms from queries or Web page parents?

118 Conclusion: The Web is very different from the ideas of what constitutes a “good” hypertext. But adaptive techniques need to understand both the domain and the user. We’ll look at the Semantic Web in future lectures, but here we’ve seen how heuristics can help bring order to, and allow systems to reason about, the loosely structured space that is the Web.

119 Conclusion: We covered: some approaches to link and node typing; taking advantage of “citation” links to boost relevance using Google’s PageRank; finding information that is “missing” from documents using their context path; topic segmentation to identify what might really be of interest to a user.

120 Conclusion: Finally, we looked at contextual information to see how we can take advantage of it to learn more accurately about the user, and to better direct the user to relevant information.

