CSA3212: Adaptive Hypertext Systems
Topic 4: Hypertext
Dr. Christopher Staff, Department of Intelligent Computer Systems, University of Malta



University of Malta CSA3212: Lecture 4 © Chris Staff

Aims and Objectives
Hypertext timeline
Hypertext issues and the Web
Formal Models of hypertext
Understanding hypertext
What’s a “good” hypertext?
Links and queries
Implicit semantic information in the organisation of hyperspace
Context
Philosophies of Context
Extracting useful information from documents in hyperspace

Part I: Hypertext Timeline

Definition
“[B]y adaptive hypermedia we mean all hypertext and hypermedia systems which reflect some features of the user in the user model and apply this model to adapt various visible aspects of the system to the user”
Brusilovsky, P. (1996). Methods and techniques of adaptive hypermedia, in User Modeling and User-Adapted Interaction 6 (2-3). Available on-line: UMUAI.pdf

Hypertext Timeline: What is hypertext?
“non-sequential writing--text that branches and allows choices to the reader” (Ted Nelson, Literary Machines, Edition)
“Hypertext is text which is not constrained to be linear. Hypertext is text which contains links to other texts.”

Hypertext Timeline
1945: Vannevar Bush, Memex
1965: Ted Nelson coins the term “Hypertext” (p84-nelson.pdf)
1967: Ted Nelson, Xanadu
1967: Andy van Dam, HES and FRESS
1968: Doug Engelbart, NLS
1975: CMU, ZOG/KMS
1985: Brown University, Intermedia
1986: Peter Brown, OWL/Guide
1987: Apple Inc., HyperCard
1987: ACM, first major conference on Hypertext
1990: NIST, Dexter, HAM, etc.
1991: Tim Berners-Lee, WWW
1993: NCSA, Mosaic
Reference: Balasubramian94state.pdf

Adaptive Hypertext Timeline
1990: HypAdapter, Böcker et al
1990: Lisp-Critic, Fischer et al
1992: Manuel Excel, de La Passardiere & Dufresne
1993: ITEM/PG, Brusilovsky et al
1993: HYPERFLEX, Kaplan et al
1994: 1st Workshop on AH
1994: MetaDoc, Boyle & Encarnacion
1994: Adaptive Hyperman, Mathé & Chen
1994: KN-AHS, Kobsa et al
1995: WebWatcher, Armstrong et al
1995: Letizia, Lieberman
1996: Special issue of UMUAI on AH
1996: Syskill & Webert, Pazzani et al
1996: Personal WebWatcher, Mladenic
1997: ELM-ART, Weber & Specht
1998: Interbook, Brusilovsky et al
1998: AHA!, De Bra & Calvi
1999: ART-Web, Weber
2000: First major conference on AH, Italy
References: UMUAI.pdf, brusilovsky2001.pdf

Current and Future Work
I stopped the Hypertext timeline at the launch of the graphical Web, and the AH timeline in 2000 with the first major AH conference
Much of the intervening and current hypertext and AH research is concerned with making the Web a better hypertext to use!
Semantic Web and Adaptive Web

Part II: Hypertext Issues and the WWW

Issues in Hypertext
We’ve looked at some problems that we (and Douglas Adams) face when using Hypertext/IR
Halasz wrote “Reflections on Notecards…” in 1987
It re-surfaces frequently at conferences on Hypertext, provoking much discussion and updating
Halasz believed that “hypertext” would “disappear”, becoming an underlying mechanism for storing and linking information
But hypertext is still very much “in our face”…

Issues in Hypertext: “Seven Issues” References
Halasz, Frank G. Reflections on NoteCards: seven issues for the next generation of hypermedia systems. Communications of the ACM, Volume 31, Issue 7, July 1988
ACM Journal of Computer Documentation (JCD), Volume 25, Issue 3. Entire issue devoted to “Seven Issues”
Seven Issues, Revisited. Panel Session, Hypertext ‘02.

The Seven Issues
Search and Query
Composites
Virtual Structures and dynamic information
Computation
Versioning
Support for collaborative work
Extensibility and Tailorability

Issues in Hypertext: Search and Query
As part of the hypertext model!
Current generation web has 3rd party search engines
Semantic Web *may* be able to refer to objects via their content, rather than URL (or at least, do it seamlessly!)

Issues in Hypertext: Composites
Web still doesn’t really support composites, though it can be achieved through dynamic HTML
But watch out for the Deep and Dark Web!

Issues in Hypertext: Virtual structures and dynamic information
So that the network can reconfigure itself according to the information it contains
Self-repairing links, links which bind to the best destination when it becomes available
Web approximates by redirecting to relocated information…

Issues in Hypertext: Computation
The end of a link can be a computation
The computation can decide what destination to visit, etc.
Web can do this, e.g., search engines!

Issues in Hypertext: Versioning
Shudder!!!!
Some systems/editors provide versioning (e.g., SCCS for source code development)
Web absolutely does not!

Issues in Hypertext: Support for collaborative work
Web/internet is a collaborative place. We are sometimes aware of other people in this space
Yet collaboration on, say, development of a web site is not possible within the Web (i.e., there is no explicit support for it)
Updating a web site merely replaces the currently live page in the document directory
No locking, etc., of files supported

Issues in Hypertext: Extensibility and tailorability
The “programmable” Web
Servers can be independently configured/extended
Plug-ins increase support for document types
Web browsers can be configured for the individual user, etc.

WWW
The WWW is the single largest example of a distributed hypertext system
But is it a good example of a hypertext system?
And does it really matter if it’s a good example?

WWW
The WWW was not developed with a formal model in mind
Based on the concept of a Uniform Document Identifier, HTTP, and a standard markup language (HTML)
TCP/IP used as the transport protocol
Link source is marked by a tag, with an embedded destination
Reference: Berners-Lee, T., et al., 1994, “The World-Wide Web” in Communications of the ACM, Vol. 37, No. 8, August 1994.
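The point that a Web link's source is a tag carrying its own destination can be illustrated with a minimal sketch. It assumes the tag in question is the HTML `<a href="...">` element and uses Python's standard `html.parser`; the class name `LinkExtractor`, the helper `extract_links`, and the example URL are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (destination, anchor text) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # destination of the link currently open, if any
        self._text = []      # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every embedded link found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: the destination lives inside the source document itself.
page = '<p>See the <a href="http://example.org/memex.html">Memex</a> page.</p>'
print(extract_links(page))
```

Because the destination is stored inside the source document, only someone who can edit that document can add or change the link, which is the contrast with DHRM drawn later in the lecture.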

WWW
Simple model, yet powerful
Can share documents across the globe
Anyone can author a Web page
With extensions to the original model, can create pages dynamically
Can manipulate multimedia data
HTML is still a presentation markup language

Semantic Web
Next generation web attempts to overcome some of these problems
Thing is, “fixes” are built on top of existing structure, rather than bottom-up re-modelling

WWW and Semantic Web
What are the differences?

Semantic Web
Feature | WWW | Semantic Web
Links | Unidirectional, unary; embedded in doc | Bidirectional, n-ary; separated from document
Authorship (link creation) | Document owner | Anyone
Dangling links | Allowed (both)
Search/Component resolution | Not supported | Indirectly supported through, e.g., UDDI
Dynamic links | Supported (both)
“Aware” of surroundings | Node knows children only | Yes, through link separation
Dynamicity | Provided by external programs | May be supported
Link semantics | No | Yes, though as yet no standard
Composite nodes | Media composition, Frames, HTML Objects | As Web, rather than as DHRM
Link maintenance | Difficult | Not known yet
Adding links to existing component | No | Not known yet
Overlapping link anchors | No | Possibly, but might be considered error
Destination anchor point | Document, offset (beginning, no end) | As DHRM

So… does it matter?
The (Semantic) Web will address some of the concerns in Seven Issues (but don’t forget about the other issues addressed by AHS!)
SemWeb promises to become a knowledge base that may eventually remove the need for user navigation altogether

Part III: Formal Models of Hypertext
Dexter Hypertext Reference Model

Aims and Objectives
Adaptive Hypertext Systems are built using hypertext navigation support, and information is inter-linked, hypertext style
Most UASs are deployed over the Web, but the Web isn’t a particularly good example of a hypertext…
So what are the properties and characteristics of “good” hypertexts?

Aims and Objectives of Hypertext
‘Well, by “hypertext” I mean non-sequential writing--text that branches and allows choices to the reader, best read on an interactive screen’ (Ted Nelson, Literary Machines, Edition)
“Hypertext is text which is not constrained to be linear. Hypertext is text which contains links to other texts.”
References:

Hypertext 1988 and beyond
The WWW was first launched in 1991, but only gained popularity in 1993
The Hypertext community came together in 1990 to iron out many inconsistencies and incompatibilities in terminology
Many models of hypertext were also proposed, based on Petri nets, sets, etc.
The most popular model is based on graph theory

Dexter Hypertext Reference Model
Why a formal model?
“The goal of the model is to provide a principled basis for comparing systems as well as for developing interchange and interoperability standards” [Halasz94]
DHRM has been implemented as Amsterdam, CMIFed, AHAM, DeVise/WebVise, RHYTHM (Bologna)…
References: Halasz, F. and Schwartz, M. The Dexter Hypertext Reference Model, in Communications of the ACM, 37(2), February 1994.

DHRM Fundamentals
DHRM separates the representation of documents (nodes) from the linking of nodes and the navigation through hyperspace [Halasz94]

DHRM Fundamentals: Components
DHRM components are the equivalent of nodes, and are represented in the Storage Layer
Nodes were called frames, cards, documents, and articles
Even today, on the Web a node is referred to as a document or, more commonly, a page
DHRM doesn’t really care about what happens within a component, only how the hypertext network interfaces with the component

DHRM Fundamentals: Anchors
References to locations or items within documents
Components can be composites, hierarchical combinations of atomic components (DAG)
Anchors can be the source or destination of links
Anchors can be entire components, or spans (segments of a component)

DHRM Fundamentals: More about anchors…
Anchors have two parts: Anchor ID (used by Storage Layer) and Anchor Value (used by Within-Component Layer)
The anchor value can be a region within a component, and the value is sensible only to the application responsible for editing/accessing the component
Anchors are unique:

DHRM Fundamentals: Links
Links are represented in the Storage Layer
Specify a source anchor, a destination anchor, and a direction that specifies how the link can be traversed
Links can also be the destinations for other links
Links, therefore, are totally separate from the components that contain them

DHRM Fundamentals: Run-Time Layer
A hypertext isn’t much good if you cannot manipulate it and navigate through it
From the Run-Time Layer users can access, view, and manipulate the hypertext

DHRM Fundamentals
All components (including links) have presentation specifications
The Run-Time Layer can also impose presentation specifications on the accessed links and components to capture user preferences, for instance

Referring to components
Components have unique identifiers (UIDs) and component specifications
Component specifications are essential, because a user may be able to describe a component without knowing its UID
Components may be identified from their description using a resolver function, and then retrieved using an accessor function
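The two-step lookup (resolve a description to a UID, then access the component by UID) can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `components` store, the UIDs, and the keyword-matching rule inside `resolver` are all hypothetical, and the Dexter model does not prescribe how a specification is matched.

```python
from typing import Dict, Optional

# Hypothetical storage layer: UID -> component content.
components: Dict[str, str] = {
    "c1": "Vannevar Bush described the Memex in 1945.",
    "c2": "Ted Nelson coined the term hypertext.",
}


def resolver(specification: str) -> Optional[str]:
    """Map a component specification (here: a keyword) to a UID, or None.

    Assumption: a specification is satisfied by the first component whose
    content contains the keyword, ignoring case.
    """
    for uid, content in components.items():
        if specification.lower() in content.lower():
            return uid
    return None


def accessor(uid: str) -> str:
    """Retrieve the component stored under a UID."""
    return components[uid]


# A user who does not know the UID can still reach the component.
uid = resolver("memex")
if uid is not None:
    print(uid, "->", accessor(uid))
```

Keeping the two functions separate means a link can store a description instead of a fixed address, so the destination is looked up at traversal time.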

DHRM Fundamentals (figure from [Halasz94])

DHRM Fundamentals: More about links
Links are first class objects
Links are created by combining a component specification, anchor ID, direction, and presentation information into a specifier
Direction can be FROM, TO, BIDIRECT, NONE
A link is a sequence of two or more specifiers, at least one of which must be TO or BIDIRECT
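The structure of a DHRM link described above can be written down as a small data model. This is a sketch, not the Dexter specification: the class and field names (`Specifier`, `Link`, `component_spec`, `anchor_id`, `presentation`) are hypothetical, and the presentation information is reduced to a free-text string. Only the four directions and the two constraints on a link come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Direction(Enum):
    FROM = "FROM"
    TO = "TO"
    BIDIRECT = "BIDIRECT"
    NONE = "NONE"


@dataclass(frozen=True)
class Specifier:
    component_spec: str      # description that a resolver would map to a UID
    anchor_id: str           # anchor identifier within that component
    direction: Direction
    presentation: str = ""   # presentation information (free text in this sketch)


@dataclass(frozen=True)
class Link:
    specifiers: Tuple[Specifier, ...]

    def __post_init__(self):
        # A link is a sequence of two or more specifiers...
        if len(self.specifiers) < 2:
            raise ValueError("a link needs two or more specifiers")
        # ...at least one of which must be TO or BIDIRECT.
        if not any(s.direction in (Direction.TO, Direction.BIDIRECT)
                   for s in self.specifiers):
            raise ValueError("at least one specifier must be TO or BIDIRECT")


# Hypothetical example: a simple source-to-destination link.
link = Link((
    Specifier("memex overview", "a1", Direction.FROM),
    Specifier("xanadu notes", "a7", Direction.TO),
))
print(len(link.specifiers))
```

Because a `Link` holds any number of specifiers and lives outside the components it connects, n-ary and bidirectional links fall out of the same structure.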

Conclusion: Interesting features of DHRM
Links are separate from documents containing them
Anybody can be an author (link creator)
Search capability is built into hypertext model (resolver function)
Presentation specifications can change behaviour of component when displayed
Links “know” their origin and destination
Components can be composite
Dangling links are not allowed (supposedly!)

Conclusion
DHRM was defined in 1990
Most existing hypertext systems were small scale, catering for individuals and small workgroups
The Internet (using TCP/IP) had existed for 7 years
The WWW did not yet exist…

Conclusion
DHRM has very few implementation examples
The WWW, while not DHRM-conformant, is the single largest and most popular example of a distributed hypertext system
There are general hypertext issues, which DHRM attempted to address
The implementation of the WWW has led to other issues, which AHS attempt to address

WWW and DHRM
What are the differences?

WWW and DHRM
Feature | DHRM | WWW
Links | Bidirectional, n-ary; separate from doc | Unidirectional, unary; embedded in doc
Authorship (link creation) | Anyone | Document owner
Dangling links | Not allowed | Allowed
Search/Component resolution | Explicit support | Not supported
Dynamic links | Supported (both)
“Aware” of surroundings | Node knows parents/children | Node knows children only
Dynamicity | Built into model | Provided by external programs
Link semantics | Possible, through presentation specification | No
Composite nodes | Yes, but not implemented | Media composition, Frames, HTML Objects
Link maintenance | Yes, deleting component deletes dependencies | Difficult
Adding links to existing components | Yes | No
Overlapping link anchors | Supported | No
Destination anchor point | Document, span (beginning and end) | Document, offset (beginning, no end)

Part IV: Understanding Hypertext

Aims and Objectives
Why the Web isn’t a good example of hypertext
Links and Queries
What the organisation of information can tell us
Context
HyperContext
Topic Segmentation
Link analysis
Context and HyperContext
Document Feature Extraction

Hypertext
A hypertext system is simply a collection of documents and links
Usually, one or more human authors create content and decide when two documents should be linked
Ted Nelson assumed that users would need assistance in navigating through Xanadu

Hypertext
DHRM also assumes that users may need assistance, by making nodes searchable (resolver function)
WWW assumes that users know the URL of the required document
Search support provided by 3rd parties!

Hypertext
What about node content? Can the apparent content of a document change?
DHRM: presentation specification, composite nodes
Xanadu: “Compound Windowing Documents”, “Versioning by inclusion”
Web: through dynamic content
3rd party search engines cannot (easily) index dynamic web pages! (Dark/Deep Web)

WWW
WWW is the single largest, most popular hypertext
It has inherent problems that make it a bad hypertext
Next generation Semantic Web/Linked Data may address some problems

WWW
Many user-adaptive systems assume that the Web acts as delivery platform
Much research to “patch” the Web to support user-adaptive systems:
Link analysis, Queries and Links
Context analysis
Topic Segmentation/Document Classification & Clustering
Information Extraction

Why the Web is a bad hypertext
See, for instance, The Problems of Hypertext (Literary Machines 3/8)
Discuss

So, what is “good” hypertext?
Nelson concerned with usability, system integrity (typed links), copyright issues...
Nielsen concerned with usability (‘lost in hyperspace’ problem, e.g., p296-nielsen.pdf)
Dexter (DHRM) more concerned with integrity of hypertext structure

So, what is “good” hypertext?
Links separate from document
‘Manageable’ number of links per document or adaptive support
Assist user with context/location/history
No difference between author/user?
Link integrity
Typed links?...

“Patching” the Web
URLs and Web links are “fixed”
XPointer and XLink are meant to fix this
Also see Open Distributed Hypertext, University of Southampton
The creator of a link must have edit permissions on the document containing the link
Need to separate links and documents

What is a link?
Why do authors create links?
Minimally, because there is some relationship between source and destination
Frequently, to help users re-orient themselves
Especially because a search engine will merely dump a user into a page
And there are no standard mechanisms for finding out where you are

What is a link?
But are those really the only reasons why links are created?
Identify others...

What is a link?
Textnet (Trigg, 1983) has dozens of link types
Trigg moved to Xerox PARC, where, with Frank Halasz, he developed NoteCards
NoteCards also had typed links, but studies showed that users didn’t assign types in case they assigned the “wrong” one
Xanadu also supports link types
Brusilovsky has identified several implicit link types in AHS

Link analysis
In the Web, it helps if we can understand the relationship between two linked documents

Typical Link Types: Trigg (TextNet, chap4.html)
Extracts “semantic content from text by making the relationships between nodes explicit... [using] typed links”
“Normal” and “commentary” link types
About 80 in all!

Typical Link Types: Brusilovsky (UMUAI.pdf)
Describes implicit link types that are meaningful to adaptive systems
Local non-contextual, contextual or real, index, table of content, map

Typical Link Types: Mizuuchi et al (p13-mizuuchi.pdf)
Attempts to find ‘context paths’ for web pages
Identifies link patterns (intradirectory, downward, upward, sibling, intersite) and link roles (entrance, back, jump)

Hypertext organisation
The organisation of documents in hyperspace can also help us recover semantic information about the relationship between documents
Rather than looking at the relationship between just two documents, we investigate “clusters” of documents

Hypertext organisation
What sorts of semantic information can we recover from the organisation of information?
DBMS
Two simple examples: PageRank, Web Communities

Google’s PageRank
‘Bringing order to the Web’ (brin.pdf)
Takes advantage of implicit ‘citation’ link type
Essentially counts number of inlinks
Pages with many inlinks are important and can be prioritised in the results list
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where T1...Tn are the pages that link to A, C(T) is the number of outlinks of page T, and d is a damping factor between 0 and 1
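The PageRank formula can be evaluated by repeated substitution until the values settle. The sketch below is a minimal illustration, not Google's implementation: the function name `pagerank`, the dictionary-of-outlinks representation, the fixed iteration count, and the default d = 0.85 are assumptions made for the example. It also assumes each outlink list contains no duplicates and only names pages that appear as keys.

```python
def pagerank(outlinks, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    outlinks maps each page to the list of pages it links to.
    """
    pages = list(outlinks)
    pr = {p: 1.0 for p in pages}  # arbitrary starting value for every page
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T that links here passes on PR(T) shared over C(T) outlinks.
            incoming = sum(pr[t] / len(outlinks[t])
                           for t in pages if page in outlinks[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this small graph C receives links from both A and B, so it ends up ranked above B, which has a single inlink shared with another target. This shows why the measure is more than a raw inlink count: each inlink is weighted by the rank of the page it comes from.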

Web Communities
Pages that have a high incidence of outlinks (hubs) can identify pages/sites that are similar/related
If these pages also have high PageRank, then they are authoritative
Reference: 4-2.pdf
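One well-known way to score hubs and authorities from link structure alone is the mutually reinforcing iteration used in Kleinberg's HITS algorithm. The sketch below uses that technique as an illustration; it is not necessarily the method of the paper cited on the slide, and it scores authority from hub links rather than from PageRank. The function name, the graph, and the iteration count are hypothetical, and every link target is assumed to appear as a key.

```python
import math


def hubs_and_authorities(outlinks, iterations=20):
    """Return (hub, authority) score dictionaries for a link graph."""
    pages = list(outlinks)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority score sums the hub scores of pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in outlinks[q]) for p in pages}
        # A page's hub score sums the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in outlinks[p]) for p in pages}
        # Normalise so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: two directory-style pages both point at x and y.
graph = {"dir1": ["x", "y"], "dir2": ["x", "y"], "x": [], "y": ["x"]}
hub, auth = hubs_and_authorities(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Pages that are pointed to by the same good hubs end up with similar authority scores, which is one way a "community" of related pages can be read off the link graph.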

Topics and Context
If all documents contained only one topic...
and the meaning of statements in a document always meant the same thing
life would be easy...
but they don’t, they don’t, and it isn’t

Topic Segmentation
(Web) documents may contain information related to 1 or more topics
In early hypertext systems, debate focused on how much info should be stored in nodes
HyperCard, NoteCards, etc.: one topic per card, and only as much info as would fit onto screen
KMS, DHRM, etc.: supported full freedom

Topic Segmentation
Not too much of a problem for human readers
Although we may have to read through much before we encounter relevant info
Web vs. DHRM: span-to-document vs. span-to-span links
Big problem for robots, and for adaptive user interfaces though!

Topic Segmentation
Approaches based loosely on passage-level retrieval:
HyperContext (simple)
C99
TextTiling

Topic Segmentation: HyperContext
Find the ‘context window’ around link in parent
Using HTML tags (now can also use DOM)
Construct vector of terms in context window
Divide child into ‘context blocks’ and prepare weighted term vector for each (& hierarchy)
Most similar context blocks and context windows belong to the same topic
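The matching step (compare the parent's context window with each context block of the child) can be sketched with term vectors and cosine similarity. This is a simplified illustration and not the HyperContext implementation: it uses raw term counts instead of weighted terms, whitespace tokenisation, no stop-word removal or hierarchy, and the names `term_vector`, `cosine`, and `best_block` plus the sample texts are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: lower-cased term -> raw count (no weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_block(context_window, context_blocks):
    """Index of the child block most similar to the parent's context window."""
    window = term_vector(context_window)
    scores = [cosine(window, term_vector(block)) for block in context_blocks]
    return max(range(len(context_blocks)), key=scores.__getitem__)


# Hypothetical text around a link in the parent, and three blocks of the child.
window = "adaptive hypertext systems use a user model"
blocks = [
    "opening hours and contact details",
    "the user model stores features of the user",
    "hypertext history from memex to the web",
]
print(best_block(window, blocks))
```

With a different parent (and so a different context window) a different block of the same child could win, which is how the access path influences what the child is taken to be "about".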

Topic Segmentation: C99
Doesn’t require context provided by other documents
Splits current document into topics using ‘topic shift detection’ algorithms based on similarity scores between sentences and the location of similar sentences in the text
Reference: choi00advances.pdf

Topic Segmentation: TextTiling
Subdivide text into chunks of size w (w = 20)
Keep record of where and how often each stemmed term occurs
Compare similarity between adjacent blocks
Detect boundaries
Assumes that same author will use same phraseology
Reference: hearst94multiparagraph.pdf
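The core loop of these steps can be sketched in a few lines. This is a deliberately reduced illustration rather than Hearst's algorithm: terms are not stemmed, each block is a single chunk of w tokens, and a boundary is declared wherever adjacent-block similarity drops below a fixed threshold instead of using depth scores. The function names and the toy token stream are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def block_similarities(tokens, w=20):
    """Similarity between each pair of adjacent w-token chunks."""
    blocks = [Counter(tokens[i:i + w]) for i in range(0, len(tokens), w)]
    return [cosine(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]


def boundaries(similarities, threshold):
    """Indices of the gaps whose similarity falls below the threshold."""
    return [i for i, s in enumerate(similarities) if s < threshold]


# Toy token stream: the vocabulary changes completely after the second chunk.
tokens = ["cat"] * 20 + ["cat", "dog"] * 10 + ["tax"] * 20
sims = block_similarities(tokens, w=20)
print(sims, boundaries(sims, threshold=0.5))
```

Gap 0 (between the first two chunks) keeps a high similarity because both chunks mention "cat", while gap 1 scores zero, so only gap 1 is reported as a topic boundary. This is the "same author, same phraseology" assumption at work.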

85 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context: Choi and Hearst detect topics within a document, with no reference to other documents. HyperContext uses the “overlap” between a parent and child to determine which blocks are about the same topic. Different blocks in the child may be combined depending on the information in the parent’s window! Different users are potentially interested in different topics, depending on access path.

86 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context: Many different types of “context”. HyperContext: document access. Discourse analysis. McCarthy: context in which information exists/will be used. Context of the accessor vs. context of ‘where things are’ (situation theory) vs. context in which the task is to be performed. Mizuuchi: context path of Web pages. Nelson: the Framing Problem.

Part V: Philosophies of Context

88 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context “The King of France wears a wig” What does it mean? Is it true? HCTCh6.pdf

89 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context What is “context”? McCarthy won’t define it, though he describes what contexts do (McCarthy96.pdf) Context can only be spoken of in reference to its use (context97report.pdf)

90 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context What is “context”? “Context is something surrounding an item and giving meaning to this item... context acts then on the relationships between items [rather] than on the items themselves” (context97report.pdf)

91 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context What is “context”? “we will accept a very general notion of context as a collection of ‘things’ (parameters, assumptions, presuppositions,...) a representation depends on” ( ps)

92 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context What is “context”? “a beliefs environment, a structure of nested belief-spaces for supporting the interpretation and production of natural language utterances (and other actions)” shelmrei.pdf

93 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context What is “context”? “a context c [is] that subset of the complete state of an individual that is used for reasoning about a given goal” ( ps)

94 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context What is “context”? “the explicit use of context limits the domain of validity of the acquired knowledge and indicates the correct moment of use.” (usekincontext.pdf)

95 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Philosophies of Context Pragmatic Context: Is a thing a thing because it has some innate thingness? (Bar-Hillel, Kaplan) Cognitive Context: Is a thing a thing in the mind of the beholder? (McCarthy, Sperber and Wilson, Kokinov...) The Kuleshov Effect ps

96 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff The Kuleshov Effect “Around 1920, the great Russian filmmaker Lev Kuleshov took a close-up of an actor with a completely neutral expression on his face and intercut it with three different shots: a bowl of soup, a woman in a coffin, and a little girl playing. An audience praised the actor for his wonderful, subtle acting. The look of hunger! The grief for his dead wife! The love for his daughter!” Galen Fott, 2001, “Tempting Text: Creating Professional Titles”, “the juxtaposition of images creat[es] the context for meaning” html html

97 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Who, about what? “Those who acquire it will cease to exercise their memory and become forgetful; they will rely on [it] to bring things to their remembrance by external signs instead of on their own resources… it shows great folly… to suppose that one can transmit or acquire clear and certain knowledge of an art through the medium …, or that … [they] can do more than remind the … [person] of what he knows on any given subject”


Part VI: Information Extraction from Hypertext

100 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Document Feature Extraction: We are looking at user-adaptive systems, and in particular at adaptive hypertext systems. Somewhere along the line we need to know what the user is interested in, and what about the document is of interest to the user.

101 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Document Feature Extraction As a user browses, how might we tell what the user is interested in? What topics in the document might be of interest to the user?

102 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Document Feature Extraction: If we can analyse the links, we might be able to tell what sort of information the user hopes to find by following a link. We can also build a “context” in which the user is seeking information, by pulling in relevant information that the user has seen while browsing.

103 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Document Feature Extraction: We can create indexes of combinations of context paths (or partial paths) so that they are searchable. Can we automatically recreate queries to derive the user’s information-seeking task?

104 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Document Feature Extraction examples: ‘Silk from a Sow’s Ear’, ParaSite, HyperContext.

105 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff ‘Silk from a Sow’s Ear’: To assist with visualization of, and navigation through, complex hyperspaces. Annotating Web pages with a functional type (node typing). Aggregating nodes into collections (pirolli96silk.pdf).

106 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff ‘Silk from a Sow’s Ear’: Node types: Head (Organisation Home Page, Personal Home Page), Index, Source Index (an index that’s also a head), Reference (Destination Reference/Sink [inlinks, but no outlinks]), Content. Represent the following as networks: links between pages in a locality; similarity between linked pages; user traffic flow across links.

107 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff ParaSite: Uses “link geometry” to find overlaps between topics, to support finding pages that have moved (or are not yet indexed), finding related pages, and finding people (paraSite.pdf).

108 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff ParaSite: Distinguishes between link types, as “not all links are equally useful”: upward, downward, crosswise, outward (external). Finding pages related to some subset P indicated by a user: find pages R that point to a maximal subset of P, and return to the user pages that point to R.
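
One way to picture the four link types is to compare the URLs at either end of a link. The sketch below is an illustration only, not ParaSite's own code: it assumes that link type can be read off the host name and directory path, that a link within the same directory counts as crosswise, and the function names `link_type` and `_dir_parts` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def _dir_parts(url_path):
    """Directory components of a URL path, e.g. '/a/b/p.html' -> ['a', 'b']."""
    directory = posixpath.dirname(url_path or "/")
    return [part for part in directory.split("/") if part]


def link_type(source_url, target_url):
    """Classify a link as 'upward', 'downward', 'crosswise' or 'outward'."""
    src, dst = urlparse(source_url), urlparse(target_url)
    if src.netloc.lower() != dst.netloc.lower():
        return "outward"  # leaves the site
    s, d = _dir_parts(src.path), _dir_parts(dst.path)
    if len(d) < len(s) and s[:len(d)] == d:
        return "upward"  # target sits in an ancestor directory
    if len(s) < len(d) and d[:len(s)] == s:
        return "downward"  # target sits in a descendant directory
    return "crosswise"  # same site, neither above nor below (assumption)


kinds = [
    link_type("http://example.org/a/b/p.html", "http://example.org/a/index.html"),
    link_type("http://example.org/a/index.html", "http://example.org/a/b/p.html"),
    link_type("http://example.org/a/x.html", "http://example.org/c/y.html"),
    link_type("http://example.org/a/x.html", "http://example.com/"),
]
```

A system could then weight or filter links by type, for example ignoring outward links when reasoning about the structure of a single site.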

109 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff HyperContext: We segment a document into topics in the context of the documents linking into it. This forms the basis of a description of the document in context, or an interpretation. A document’s interpretation is used to update a model of the user’s interests as the user navigates.

110 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff HyperContext: If we index the interpretations, then an information retrieval system can perform information retrieval-in-context, placing the user in the correct context to receive relevant information. This can provide better results than “normal” IR, because potentially non-relevant but rank-affecting information is not present in the interpretation.

111 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context and the Web: Examples of other approaches that use “context” to assist with search/navigation: Mizuuchi et al., Kim et al.

112 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context and the Web: Web pages are written by authors who assume that they are read by humans, and that humans follow paths. Dig out Web browsing behaviour info. But the Web is utilised in two main ways: directed from search engines, and traversed by Web robots. Mizuuchi (p13-mizuuchi.pdf)

113 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context and the Web: The information:document problem. Information is not always contained in a document, but may be contained in a path. Search engines (mainly) index documents as individual entities. Two linked documents containing precise info will be ignored. A single document may contain information about multiple topics, so the document may have its rank affected.

114 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context and the Web: We write Web pages assuming that readers have accessed them from other pages we have linked from. But anybody can create a link to a page, and Web IR systems index individual pages.

115 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context and the Web: Terms can be ambiguous: the vocabulary problem (p964-furnas.pdf, furnas85experience.pdf). How can the intended meaning of a term in documents and queries be discovered? Context: the juxtaposition of terms in context.

116 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context and the Web: (Weak) assumption that, within a single topic, ambiguous terms are used consistently: an ambiguous term, once used in one particular word sense, will not be re-used in another. Find the senses of unambiguous terms in a topic, and give ambiguous terms in the same topic segment the same sense. There is a problem disambiguating query terms, because queries contain so few of them (p258-kim.pdf).
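
The heuristic on this slide can be shown with a toy example. This is only an illustration of the idea as stated here, not the method of Kim et al.: the sense lexicon is invented, the topic sense is chosen by a simple majority vote among unambiguous terms, and the name `disambiguate` is hypothetical.

```python
from collections import Counter


def disambiguate(segment_terms, lexicon):
    """Give ambiguous terms in a topic segment the sense of its unambiguous terms.

    lexicon maps a term to the set of senses it may have (hypothetical data).
    Returns {ambiguous_term: chosen_sense} for terms that admit the topic sense.
    """
    # Unambiguous terms (exactly one sense) vote for the sense of the segment.
    votes = Counter(
        next(iter(lexicon[term]))
        for term in segment_terms
        if term in lexicon and len(lexicon[term]) == 1
    )
    if not votes:
        return {}  # nothing to anchor on, e.g. a very short query
    topic_sense = votes.most_common(1)[0][0]
    return {
        term: topic_sense
        for term in segment_terms
        if term in lexicon and len(lexicon[term]) > 1 and topic_sense in lexicon[term]
    }


lexicon = {
    "bank": {"finance", "river"},
    "interest": {"finance", "attention"},
    "loan": {"finance"},
    "water": {"river"},
}
senses = disambiguate(["bank", "loan", "interest", "mortgage"], lexicon)
query_senses = disambiguate(["bank"], lexicon)
```

The second call shows the problem the slide raises for queries: with a single ambiguous term and no unambiguous neighbours, there is nothing to vote with, so no sense can be assigned.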

117 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Context and the Web: Alternatively... The vocabulary problem almost implies that documents containing two (or more) ambiguous terms describing the same concept will be sparse. Can we take advantage of that to learn, e.g., synonyms from queries or Web page parents?

118 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Conclusion: The Web is very different from the ideas of what constitutes a “good” hypertext. But adaptive techniques need to understand both the domain and the user. We’ll look at the Semantic Web in future lectures, but here we’ve seen how heuristics can help bring order to, and allow systems to reason about, the loosely structured space that is the Web.

119 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Conclusion: We covered some approaches to link and node typing; taking advantage of “citation” links to boost relevance using Google’s PageRank; finding information that is “missing” from documents using their context path; and topic segmentation to identify what might really be of interest to a user.

120 of 120 University of Malta CSA3212: Lecture 4 © Chris Staff Conclusion Finally, we looked at contextual information to see how we can take advantage of it to learn more accurately about the user, and to better direct the user to relevant information