Kirrkirr: A Java-based visualisation tool for XML dictionaries of Australian Languages Kevin Jansz Department of Computer Science, University of Sydney,

Presentation transcript:

Kirrkirr: A Java-based visualisation tool for XML dictionaries of Australian Languages
Kevin Jansz, Department of Computer Science, University of Sydney, Australia
Christopher Manning, Computer Science and Linguistics, Stanford University, USA
Nitin Indurkhya, School of Applied Science, Nanyang Technological University, Singapore

Project Objectives
• Providing innovative ways of representing a dictionary, through creative use of the medium of computers
• Providing practical, educationally useful programs as a result (at low labour cost)
• Examining the richness of lexical structure
Initial target: the Warlpiri dictionary.

Talk Outline
• The research agendas
• Kirrkirr: A Warlpiri dictionary browser
• The (XML) Lexical Database
  – exploiting the strengths of XML
  – indexing XML data
• Visualisation of Dictionary Content
• User studies

Research Program: Lexicon
• A language is more than individual words with a definition
  – it is a vast network of associations between words and within and across the concepts represented by words
• The aim of this work is to provide people with a better understanding of this conceptual map.
• Traditional paper dictionaries offer very limited ways of making such networks visible
• On a computer, there are no such limitations on the way information can be displayed

Research: Computational Lexicography
• Dictionaries on computers are now commonplace
  – But there has been little attempt to utilise the potential of the new medium
  – Many present a plain, search-oriented representation of the paper version
• Goal: fun dictionary tools that are effective for language learning and browsing
  – Like flicking through the pages of a paper dictionary
  – The difference is that words are grouped by meaning rather than spelling

MRD Structure
• The internal structures of current Machine Readable Dictionaries (MRDs) usually merely mimic the structure of the printed form (Boguraev 1990)
• Some work, notably WordNet (Miller 1995), has involved a fundamental rethinking of dictionary content and organisation (in WordNet, organisation via “synsets”, which are related via links of part, subkind, opposite)
• But there has been little in the way of software to make such research truly usable by different communities of users.

Initial focus: Warlpiri
• Warlpiri is an Australian Aboriginal language spoken in the Tanami desert (NW of Alice)
• There are a number of factors influencing this choice:
  – Rich lexical materials have been collected by linguists over decades (Ken Hale, MIT, from the 1950s), resulting in one of the most comprehensive lexical databases for any Australian language
  – There is a relatively large community of people interested in learning their traditional language
  – Until now, results haven’t been produced in a format usable by the community (only raw printouts)

Educational goals
• Dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met
• The low level of literacy in the region makes an e-dictionary potentially more useful than a paper edition
  – It is less dependent on good knowledge of spelling and alphabetical order.
  – Making it fun and easy to use, and providing multimedia content and the pronunciations of words, is a considerable help as well.

Target user community

Kirrkirr: A Warlpiri dictionary browser (Jansz 1998; Jansz, Manning and Indurkhya 1999)
• An environment for the interactive exploration of dictionaries.
• Although our current work has just been with Warlpiri, the design is general (Arrernte coming soon!)
• Attempts to more fully utilise graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information
• Written in Java, it can either be run over the web [high bandwidth] or run locally (here Java’s main advantage is cross-platform support).

Overview
Kirrkirr provides various modules:
• Animated graph layout of word relationships
• Formatted dictionary entries
• A notes facility for ‘jotting in the margin’
• Multimedia: audio, pictures
• Advanced searching interfaces
• Others in planning: formatting (XSL) editing, figuration patterns, semantic domain browsing
These attempt to cater to users with different interests and competence levels.

(Kirrkirr screen shot)

The lexical database
• Original materials are stored in an ad hoc markup format using backslash codes with some (rather odd) nesting of structural tags
• These were converted to XML using an error-correcting stack-based parser (written in Perl).
  – The inconsistency and flexibility of dictionary entries actually made this a surprisingly difficult task.
  – But the parser tries to impose data integrity
• Use of XML gives a clear structure to the lexical data, and makes available many (free) tools
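The conversion idea can be illustrated with a minimal Python sketch of a stack-based, error-correcting converter. It is not the Perl parser described on the slide: the markup syntax (`\tag` opens, `\-tag` closes), the function name `to_xml`, and the repair rules are all assumptions made for illustration, since the slides do not specify the original backslash-code format.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical syntax: "\tag text" opens an element, "\-tag" closes it.
TOKEN = re.compile(r"\\(-?)(\w+)\s*([^\\]*)")

def to_xml(source: str) -> str:
    """Convert backslash-coded text to XML, repairing bad nesting."""
    out = []
    stack = []  # names of the elements that are currently open
    for closing, tag, text in TOKEN.findall(source):
        if closing:
            if tag not in stack:
                continue  # stray close code with no matching open: drop it
            # Close every element opened after `tag`, then `tag` itself.
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == tag:
                    break
        else:
            stack.append(tag)
            out.append(f"<{tag}>")
        text = text.strip()
        if text:
            out.append(escape(text))
    # Close anything still open at the end of the input.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

The stack is what makes the repair possible: when a close code arrives for an element that is not on top, everything above it is closed first, so the output is always well-formed even when the input is not.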

XML
• XML separates the structure of the data from its presentation
• Much of the recent enthusiasm for XML has centred around representing simple and rigid structures such as database records
• Dictionary entries are thoroughly suited to XML
  – rich hierarchical structure
  – entries vary greatly depending on the word being defined
• Result remains a portable, tangible text file

Alternative: a standard database
• Has clear advantages: structure, indexing, query language, relationships, integrity.
• Many people have suggested using a database for lexical data and some have actually done it (IITLEX, Austin and Nathan)
• But in general lexicographers oppose the rigidity, and, in practice, standard relational databases are quite ill-suited to dictionaries
  – Dictionary entries vary enormously in structure
  – A database model makes it hard to extend the dictionary structure
  – Lessens portability

Alternative: Object Databases
• Dictionary can be viewed as a set of entries (objects)
• Problems: off-the-shelf products not widely accepted
  – Retrieval via customised query languages
  – Proprietary storage formats reduce portability
  – ObjectStore, Versant, Objectivity the main big vendors
  – Restricted API places limits on extensibility
  – Generic object browsers not suitable for dictionaries

XML database
• Document Object Model widely accepted
• XML document can be searched and accessed
• XML tools such as XML parsers and XSL processors are freely available and easy to use
• Query languages on the way
  – XQL: a recent (and evolving) W3C proposal for querying XML documents

XQL - Potential
• An alternative to investigate for the future is using a standard query language, such as XQL, to get material out of the XML dictionary, rather than using our ad hoc index.
• At the moment this is not a huge issue, since most retrieval is focussed on components of a particular word
• XQL standard not stable yet
• Very preliminary implementations from vendors

XML indexing - challenges
• Despite the various XML parsers available, it is surprising that little consideration has been given to making single entries retrievable from the file
• Present XML parsers tend to put the entire XML document (or its parsed tree form) in memory before the data extraction process begins
• This is not practical when parsing significant XML databases (e.g., the Warlpiri dictionary is approx. 10 MB).

XML Indexing - solutions
• The hierarchical structure of XML lends itself to indexing, as each separate entry in the XML file can be considered as a separate entity
• To make the Warlpiri dictionary usable for Kirrkirr an ad hoc indexing system was developed
  – Uses a slightly modified Ælfred XML parser
  – Entries are indexed by headword in a separate index file
• The system returns an XML document object containing the single dictionary entry, facilitating:
  – processing for related words (Graph layout)
  – XSL processing to HTML
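As a rough illustration of indexing by headword, the Python sketch below records the byte offset and length of each entry, then seeks to one entry and parses only that fragment. It is not Kirrkirr's Java implementation and does not use the Ælfred parser; the element names `ENTRY` and `HW` and the function names are assumptions, and the index is held in a dictionary rather than a separate index file.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed element names; the real dictionary schema is not given in the slides.
ENTRY_OPEN = b"<ENTRY>"
ENTRY_CLOSE = b"</ENTRY>"
HEADWORD = re.compile(rb"<HW>(.*?)</HW>", re.S)

def build_index(f):
    """Scan the file once and map headword -> (byte offset, length)."""
    data = f.read()  # one-off scan; the index could then be saved to disk
    index = {}
    pos = 0
    while True:
        start = data.find(ENTRY_OPEN, pos)
        if start < 0:
            break
        end = data.find(ENTRY_CLOSE, start)
        if end < 0:
            break
        end += len(ENTRY_CLOSE)
        match = HEADWORD.search(data, start, end)
        if match:
            index[match.group(1).decode("utf-8")] = (start, end - start)
        pos = end
    return index

def load_entry(f, index, headword):
    """Seek to a single entry and parse only that fragment."""
    offset, length = index[headword]
    f.seek(offset)
    return ET.fromstring(f.read(length))

# Usage with an in-memory stand-in for the dictionary file.
sample = io.BytesIO(
    b"<DICTIONARY>"
    b"<ENTRY><HW>alpha</HW><GL>first</GL></ENTRY>"
    b"<ENTRY><HW>beta</HW><GL>second</GL></ENTRY>"
    b"</DICTIONARY>"
)
sample_index = build_index(sample)
beta = load_entry(sample, sample_index, "beta")
```

Only the requested entry is handed to the XML parser, so the cost of a lookup depends on the size of one entry rather than the size of the whole file.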

Kirrkirr’s XML Index Process (diagram)
Components shown: Kirrkirr Dictionary Browser; Index in Memory (headword → file position); XML Formatted Warlpiri dictionary file (across file system or web); XML Parser; XML Document Object.

Kirrkirr Index Processing
• The XML indexing process considerably improves efficiency, as only requested entries are parsed, conserving time and bandwidth
• Once whole entries are parsed, they are kept temporarily in a cache
• Thus Kirrkirr uses XML as a middle ground between the structure and indexing of a relational database and the freedom and functionality of text.
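The caching of parsed entries could look something like the Python sketch below. The slides say only that entries are kept temporarily in a cache; the least-recently-used eviction policy, the class name `EntryCache`, and the `loader` callback are assumptions made for illustration.

```python
from collections import OrderedDict

class EntryCache:
    """Keep recently used parsed entries; evict the least recently used."""

    def __init__(self, loader, capacity=100):
        self.loader = loader      # function: headword -> parsed entry
        self.capacity = capacity
        self.misses = 0           # number of times the loader was called
        self._items = OrderedDict()

    def get(self, headword):
        if headword in self._items:
            self._items.move_to_end(headword)  # mark as most recently used
            return self._items[headword]
        self.misses += 1
        entry = self.loader(headword)
        self._items[headword] = entry
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)    # drop the oldest entry
        return entry
```

Here `loader` stands in for whatever fetches and parses one entry, so repeated visits to the same word while browsing avoid both the file (or network) read and the parse.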

Visualisation of dictionary information
• For dictionaries with simple textual content behind them, there is little that can be done but an on-line reflection of a printed page
• We present much more than just definitions of words
  – we want to know their relationships to other words, and the patterning in these relationships
• In a computational approach, the program can mediate between the lexical data and the user
• The interface can select from the information and choose how to present it (according to the user’s preferences) in many different ways

Previous work
• Current systems present the search-dominated interface of classic Information Retrieval systems: you type a word in a search box
• Results try to mimic, but are generally inferior to, the printed version of the dictionary
• But these systems do little to utilise the captivating qualities of computers: interactivity, user control and adaptability (Brown 1985).

Previous work (2)
• Search-oriented systems are only effective when the user has a clearly specified information need; even here, we are ignoring the distinction between information gained and knowledge sought (Sharpe 1995)
• They lack browsing, and chances for incidental or curiosity-driven learning
• We wish to exploit the essence of hypertext, which is “click to explore” browsing

Graph-based visualisation
• There is a little previous work on graphical representations of dictionaries
• For instance, the visual-thesaurus by plumbdesign, derived from WordNet
• But it is also a good demonstration of how chaotic and confusing graphical interfaces can become.

Perils of visualisation

Graph-based visualisation (Jansz 1998; Jansz, Manning and Indurkhya 1999)
• Classic graph layout problem
• Adapts work by Eades et al. (1998) and Huang et al. (1998) on visualisation and navigation of WWW document linkages
• Uses the spring algorithm. Its big advantage is that it is an iterative updating algorithm, and so gives easy interactivity:
  – it wiggles and people can play with it.
• Clarity and simplicity of graph: the software maintains a set of focus nodes to prevent overcrowding
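To make the iterative nature of a spring layout concrete, here is a minimal Python sketch of one update step of a generic force-directed layout. It is not Kirrkirr's Java code and not the specific algorithm of Eades et al. or Huang et al.; the force formulas, the constants `rest`, `k_spring` and `k_repel`, and the function name `spring_step` are assumptions, and the focus-node mechanism is not modelled.

```python
import math

def spring_step(pos, edges, rest=1.0, k_spring=0.1, k_repel=0.05):
    """Return new positions after one iteration of a simple spring layout.

    pos   -- dict: node -> (x, y)
    edges -- iterable of (node_a, node_b) pairs
    """
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)

    # Every pair of nodes repels, which keeps the drawing spread out.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k_repel / (d * d)
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d

    # Each edge acts as a spring pulling its ends toward the rest length.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        d = math.hypot(dx, dy) or 1e-9
        f = k_spring * (d - rest)
        force[a][0] += f * dx / d
        force[a][1] += f * dy / d
        force[b][0] -= f * dx / d
        force[b][1] -= f * dy / d

    return {n: (pos[n][0] + force[n][0], pos[n][1] + force[n][1]) for n in pos}
```

Because each call moves the nodes only a little, a display loop can redraw after every step, and a node dragged by the user simply becomes the starting position for the next step. That is the kind of interactivity the slide refers to.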

Educational advantages
• Alphabetical order is important, but a web of words offers other effective opportunities for learning
• A student can opportunistically explore words that are related in various ways
• Important semantic relationships can be understood

Kirrkirr network display

Formatted dictionary entries
• Are produced automatically from the XML by using XSL (via James Clark’s XT)
• XSL allows easy modelling of some user preferences.
• Most trivially, one can leave out information such as part of speech, or detailed definitions, which we do by providing several stylesheets to choose from
• This is useful as many users find information overload quite confusing and demotivating
• Can produce a bilingual or monolingual dictionary
• Opportunities for various output styles, and formats such as RTF or TeX for printing.
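The idea of offering several stylesheets that show or hide fields can be mimicked in a few lines of Python. The sketch below is a plain-Python stand-in, not XSL and not XT: the style names, the element names (`HW`, `POS`, `GL`, `DEF`), and the `render` function are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Each hypothetical "stylesheet" is just the list of fields it displays.
STYLES = {
    "full": ["HW", "POS", "GL", "DEF"],
    "brief": ["HW", "GL"],  # leaves out part of speech and definitions
}

def render(entry_xml, style="full"):
    """Format one dictionary entry as HTML, showing only the chosen fields."""
    entry = ET.fromstring(entry_xml)
    parts = []
    for tag in STYLES[style]:
        for element in entry.findall(tag):
            text = escape(element.text or "")
            parts.append(f'<span class="{tag.lower()}">{text}</span>')
    return "<p>" + " ".join(parts) + "</p>"
```

Because the entry data is untouched and only the selection changes, adding a new view for a different audience means adding one more entry to `STYLES`, much as one would add one more stylesheet.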

Formatted dictionary entries

Rich typology of link types
• The semantically rich types of linkages present in a dictionary (synonym, antonym, hyponym, subheadword, variant, coverbs, …) solve one of the major problems of the web: we have many link types with a clear semantic interpretation
• Use consistent colour-coded text and edges to show these link types
• Gives a richer browsing experience
• Unlike HTML, you can tell where you are going before clicking

Browsing
• Work (at PARC and elsewhere: Pirolli et al. 1996) has stressed the role of browsing as well as searching in information access
• It provides a context for learning
• We provide browsing in several ways:
  – network-based display of words
  – conventional hypertext, but with rich, semantically interpreted links whose colour-coding matches the network edges
• Other methods being investigated:
  – browsing through semantic domains
  – deriving terminology sets (words that are used together in culturally important activities) automatically from text corpora

Other components
• Multimedia (currently pictures and audio)
  – Can hear pronunciations, which enables better understanding than phonetic symbols
  – Pictures of plants and animals are more intelligible than descriptions
  – (future: videos of Warlpiri sign language …)
• Advanced search page
  – search various fields, regular expressions, fuzzy spelling etc.
• Notes: one can annotate dictionary entries (to correct or personalise)
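Regular-expression and fuzzy-spelling search over headwords can be sketched with Python's standard library. This is not Kirrkirr's search code: the function names are hypothetical, and `difflib` similarity is used here as a generic stand-in for whatever fuzzy-spelling method the tool applies.

```python
import difflib
import re

def regex_search(pattern, headwords):
    """Return the headwords that match a regular expression."""
    compiled = re.compile(pattern)
    return [word for word in headwords if compiled.search(word)]

def fuzzy_search(query, headwords, limit=5, cutoff=0.6):
    """Return headwords spelled similarly to the query, best match first."""
    return difflib.get_close_matches(query, headwords, n=limit, cutoff=cutoff)
```

Fuzzy matching matters for the user community described earlier: a learner who is unsure of the spelling can still reach the intended headword from an approximate attempt.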

User study
Mim Corris (Yuendumu, Willowra), Jane Simpson (Lajamanu)
• User testing with primary and (lower) secondary students
• Observation of trainee Warlpiri literacy workers
• Comments from teachers, other adults etc.
• Purely qualitative observational study of dictionary use. (Doing anything much else would be difficult.)
• Initial reactions are very enthusiastic
• Could use as a basis for classroom activities (better with some further development: games and puzzles)

A positive anecdote “One of the introductory Warlpiri literacy students, who had not been very interested in the literacy class, spent nearly 3/4 hour looking at Kirrkirr apparently in absorbed concentration. She wasn’t especially interested in the sound and picture possibilities. She moved between words, scrolling along the list, typing in the search, clicking on the words in the network pane. She wasn’t even put off when the dictionary definitions stopped appearing – looking at the networks of words instead. This is quite unlike her attitude to the backslash coded electronic dictionary (where she lost interest quickly because of the difficulty for her of narrowing down searches). After the Kirrkirr demo she asked if she could have a printed dictionary to take away with her to use in camp to learn the words. I interpret this as a desire to learn words in her own time and place.”

Conclusions
• Kirrkirr is just a prototype of what one can do to develop new ways to visualise lexicons
• We have addressed the challenge of making dictionary information usable by creating an application which mediates between well-structured data and users’ needs for searching/browsing and presentation
• While we have focused our research on Warlpiri, the system can be easily applied to other languages

Conclusions (cont.)
• “... The best future applications of MRDs in education will be those most able to respond to the insights and needs of their users” (Kegl 1995)
• Kirrkirr can be seen as a step towards the future of e-dictionaries

Links Kevin’s Thesis Homepage: Kirrkirr homepage:
