Kirrkirr: Software for browsing and visual exploration of a structured Warlpiri dictionary
Kevin Jansz, Department of Linguistics, University of Sydney, Australia
Christopher Manning, Departments of Computer Science and Linguistics, Stanford University, USA
Nitin Indurkhya, School of Applied Science, Nanyang Technological University, Singapore

Objectives
• Provide innovative ways of representing a dictionary, through creative use of web technology
• Provide practical, educationally useful access to information that can be customised to suit the needs of many users (at low labour cost)
• Examine the richness of lexical structure
Initial target: the Warlpiri dictionary.

Research Program: Lexicon
• A language is more than individual words with a definition
  – It is a vast network of associations between words and within and across the concepts represented by words
• Aim to provide people with a better understanding of this conceptual map.
• Traditional paper dictionaries offer very limited ways of making such networks visible
• There are no such limitations on a computer

Research: Computational Lexicography
• Dictionaries on computers are now commonplace
  – Few utilise the potential of the new medium
  – Many present a plain, search-oriented representation of the paper version
• Goal: fun dictionary tools that are effective for language learning, browsing
  – Like flicking through pages of a paper dictionary
  – Words are grouped by their meaning and their association with each other
  – Key to the effectiveness of this browsing is that the user has control over the way this is presented.

Initial focus: Warlpiri
• Warlpiri is an Australian Aboriginal language spoken in the Tanami desert (north-west of Alice Springs)
• There are a number of factors influencing this choice:
  – One of the most comprehensive lexical databases for any Australian language (Laughren & Nash 1983)
  – Relatively large community of people interested in learning their traditional language
  – Until now, results haven’t been produced in a format usable by the community (only raw printouts)

Educational goals
• Dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met
• The low level of literacy in the region makes an e-dictionary potentially more useful than a paper edition
  – Less dependent on good knowledge of spelling and alphabetical order.
  – Making it fun and easy to use, and providing multimedia content and the pronunciations of words, is a considerable help as well

Target user community

Kirrkirr: A Warlpiri dictionary browser (Jansz 1998; Jansz, Manning and Indurkhya 1999)
• An environment for the interactive exploration of dictionaries.
• Current work has been only with Warlpiri, but the design is general (Arrernte coming soon!)
• Attempts to more fully utilise graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information
• It can either be run over the web [high bandwidth] or run locally (here Java’s main advantage is cross-platform support).

Overview
• Animated graph layout of word relationships
• Formatted entries
• A Notes facility for ‘jotting in the margin’
• Multimedia: audio, pictures
• Advanced searching interfaces
• Semantic domain browsing
• Others in planning: formatting (XSL) editing, figuration patterns.
• These attempt to cater to users with different interests and competence levels

MRD Structure
• The internal structures of current Machine Readable Dictionaries (MRDs) usually merely mimic the structure of the printed form (Boguraev 1990)
• Some work, notably WordNet (Miller 1995), has involved a fundamental rethinking of dictionary content and organisation (in WordNet, organisation via “synsets” which are related via links of part, subkind, opposite)
• But there has been little in the way of software to make such research truly usable by different communities of users.

The lexical database
• Original materials stored in an ad hoc markup format using backslash codes, with some (rather odd) nesting of structural tags
• These were converted to XML using an error-correcting stack-based parser (written in Perl).
  – The inconsistency and flexibility of dictionary entries actually made this a surprisingly difficult task.
  – But the parser tries to impose data integrity
• Use of XML gives a clear structure to the lexical data, and makes available many (free) tools
• Result remains a portable, tangible text file
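
To make the idea of an error-correcting, stack-based conversion concrete, here is a minimal sketch in Python (the original parser was written in Perl and is not shown in the slides). The input format is a toy assumption: `\tag` opens a field and `\etag` closes it. The function name `backslash_to_xml`, the tag names `entry`, `hw` and `gl`, and the gloss text are all hypothetical. Only the repair strategy is the point: a close code implicitly closes anything still open above it, stray close codes are dropped, and unclosed fields are closed at the end.

```python
import re
from xml.sax.saxutils import escape

# A token is either a backslash code (\hw, \ehw, ...) or a run of plain text.
TOKEN = re.compile(r"\\(\w+)|([^\\]+)")

def backslash_to_xml(source, known_tags):
    """Convert toy backslash markup to well-formed XML, repairing bad nesting."""
    out = []
    stack = []  # names of the elements that are currently open
    for match in TOKEN.finditer(source):
        code, text = match.group(1), match.group(2)
        if text is not None:
            if text.strip():
                out.append(escape(text.strip()))
            continue
        if code in known_tags:
            # Opening code: push it and emit the start tag.
            stack.append(code)
            out.append(f"<{code.upper()}>")
        elif code.startswith("e") and code[1:] in known_tags:
            name = code[1:]
            if name not in stack:
                continue  # stray close code: drop it
            # Close everything opened after `name`, then `name` itself.
            while True:
                top = stack.pop()
                out.append(f"</{top.upper()}>")
                if top == name:
                    break
        # Unknown codes are silently dropped in this sketch.
    while stack:
        # Input ended with fields still open: close them all.
        out.append(f"</{stack.pop().upper()}>")
    return "".join(out)

# The gloss field is never closed; the \eentry code repairs that.
sample = r"\entry \hw jaja \ehw \gl placeholder gloss \eentry"
xml_text = backslash_to_xml(sample, {"entry", "hw", "gl"})
print(xml_text)
```

A real converter would also need to log each repair so that a lexicographer can review it; this sketch repairs silently.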

XML indexing - challenges
• Few XML parsers make single entries retrievable from the file
• Typically, the entire XML document is put in memory
• This is not practical when parsing significant XML databases (e.g., the Warlpiri dictionary is approx. 10 MB).

XML Dictionary Indexing (XDI)
• Hierarchical structure of XML lends itself to indexing
  – Each entry in the XML file can be considered as a separate entity
• To make the Warlpiri dictionary usable for Kirrkirr, an ad hoc indexing system was developed
  – Uses a slightly modified Ælfred XML parser
  – Entries indexed by headword in a separate index file
• The system returns an XML document object containing the single dictionary entry, facilitating:
  – processing for related words (Graph layout)
  – XSL processing to HTML
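
The indexing idea can be sketched in a few lines of Python. This is not Kirrkirr's implementation (which is Java and uses a modified Ælfred parser); it only illustrates mapping each headword to a file position so that a single entry can be read and parsed on demand. The element names `DICTIONARY`, `ENTRY` and `HW` follow the query examples in the slides; `GL`, the sample data, and the function names `build_index` and `fetch_entry` are assumptions. The sketch also assumes that entries do not nest and that `<ENTRY>` carries no attributes.

```python
import io
import re
import xml.etree.ElementTree as ET

ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HW_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)

def build_index(data):
    """Map each headword to (byte offset, byte length) of its entry.

    This pass runs once; in practice the result would be saved to a
    separate index file rather than rebuilt at every start-up.
    """
    index = {}
    for match in ENTRY_RE.finditer(data):
        headword = HW_RE.search(match.group(0))
        if headword:
            key = headword.group(1).decode("utf-8").strip()
            # Keep the first entry if a headword occurs more than once.
            index.setdefault(key, (match.start(), match.end() - match.start()))
    return index

def fetch_entry(stream, index, headword):
    """Seek to one entry and parse only that fragment."""
    offset, length = index[headword]
    stream.seek(offset)
    return ET.fromstring(stream.read(length))

# Placeholder data: two entries with invented glosses.
SAMPLE = (
    b"<DICTIONARY>"
    b"<ENTRY><HW>jaja</HW><GL>gloss one</GL></ENTRY>"
    b"<ENTRY><HW>placeholder</HW><GL>gloss two</GL></ENTRY>"
    b"</DICTIONARY>"
)
index = build_index(SAMPLE)
entry = fetch_entry(io.BytesIO(SAMPLE), index, "placeholder")
print(entry.findtext("GL"))
```

Here `io.BytesIO` stands in for the dictionary file; any seekable binary stream, such as a file opened with mode `"rb"`, could be used the same way.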

Kirrkirr’s XML Index Process
[Diagram. Labels: index in memory (headword → file position); XML-formatted Warlpiri dictionary file (across file system or web); Kirrkirr Dictionary Browser; XML parser; XML document object; XSL file + XSL processor; HTML document]

XDI in Kirrkirr
• The XML indexing process considerably improves efficiency, as only requested entries are parsed
• Parsed entries are kept temporarily in a cache
• Thus Kirrkirr uses XML as a middle ground between the structure and indexing of a relational database and the freedom and functionality of text.
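
A small Python sketch of keeping parsed entries in a cache follows. The slides do not say how Kirrkirr's cache is sized or evicted, so the least-recently-used policy of `functools.lru_cache` and the size of 256 are assumptions, as are the names `RAW_ENTRIES`, `get_entry` and `parse_count`.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET

# Stand-in for "read the raw entry text via the index"; hypothetical data.
RAW_ENTRIES = {"jaja": "<ENTRY><HW>jaja</HW></ENTRY>"}

parse_count = 0  # counts how often an entry is actually parsed

@lru_cache(maxsize=256)
def get_entry(headword):
    """Parse an entry on first request; later requests reuse the parsed object."""
    global parse_count
    parse_count += 1
    return ET.fromstring(RAW_ENTRIES[headword])

first = get_entry("jaja")
second = get_entry("jaja")  # served from the cache, no second parse
print(parse_count)
```

Because the cached object is shared, callers should treat it as read-only.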

XQL - Potential
• An alternative to investigate for the future is using a standard query language, such as XQL, to get material out of the XML dictionary, rather than using our ad hoc index.
• At the moment not a huge issue, since most retrieval is focussed on components of a particular word

XQL - Optimizations
• Revamp data structure: reduce redundancy and the amount to load at start-up
• PDOM (Persistent Document Object Model): represents an XML document as a collection of objects in a tree-like model
• XQL (XML Query Language): query language for XML, e.g. /DICTIONARY/ENTRY[9]
  – DICTIONARY/ENTRY[HW='jaja']
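
The two example queries can be tried out with the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is a stand-in, not an XQL engine: it happens to accept the same positional and child-text predicates. The sample document is invented, and because the parsed root is already the `DICTIONARY` element, the paths are written relative to it.

```python
import xml.etree.ElementTree as ET

# Invented sample: nine numbered filler entries followed by 'jaja'.
entries = "".join(f"<ENTRY><HW>word{i}</HW></ENTRY>" for i in range(1, 10))
doc = ET.fromstring(
    "<DICTIONARY>" + entries + "<ENTRY><HW>jaja</HW></ENTRY></DICTIONARY>"
)

# Counterpart of /DICTIONARY/ENTRY[9]: the ninth entry (positions start at 1).
ninth = doc.find("ENTRY[9]")

# Counterpart of DICTIONARY/ENTRY[HW='jaja']: entries whose HW child is 'jaja'.
by_headword = doc.findall("ENTRY[HW='jaja']")

print(ninth.findtext("HW"), len(by_headword))
```

Note that this still parses the whole document into memory, which is exactly the cost that the index and a persistent DOM are meant to avoid.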

Performance - Startup time
• Impact on start-up time.

Customised Presentation of Dictionary Content
• Produced dynamically from the XML by using XSL (via James Clark’s XT)
• XSL allows easy modelling of some user preferences.
• This is useful as many users find information overload quite confusing and demotivating
• Can produce bilingual or monolingual dictionary
• Opportunities for various output styles, and formats such as RTF or TeX for printing.
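
The effect of preference-driven formatting can be illustrated without XSL. The sketch below is plain Python, not a stylesheet and not XT; it only shows the same principle of deriving several views from one XML entry. The element names `GL` (gloss) and `EX` (example), the flags `show_gloss` and `show_examples`, and the sample text are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

def render_entry(entry, show_gloss=True, show_examples=True):
    """Render one entry to HTML, including only the parts the user wants."""
    parts = [f"<h2>{escape(entry.findtext('HW', default=''))}</h2>"]
    if show_gloss:
        for gloss in entry.findall("GL"):
            parts.append(f"<p class='gloss'>{escape(gloss.text or '')}</p>")
    if show_examples:
        for example in entry.findall("EX"):
            parts.append(f"<p class='example'>{escape(example.text or '')}</p>")
    return "\n".join(parts)

entry = ET.fromstring(
    "<ENTRY><HW>jaja</HW><GL>placeholder gloss</GL>"
    "<EX>placeholder example sentence</EX></ENTRY>"
)

full_view = render_entry(entry)                        # everything shown
no_gloss_view = render_entry(entry, show_gloss=False)  # glosses hidden
print(no_gloss_view)
```

With XSL the same choice would live in the stylesheet rather than in application code, which is what lets users swap or edit the formatting without changing the program.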

Performance - XSL Presentation
• Creates minimal load on the application
• Requires file creation permission for the applet
• Takes load off file system (no need for pre-generated files)
• Gives the user the opportunity to customise the formatting.

User study
Mim Corris & Jane Simpson
• User testing with Warlpiri children (primary and secondary students), adults and teachers.
• Purely qualitative observational study of dictionary use. (Doing anything much else would be difficult)
• Teachers using a domain-specific dictionary extract still found the interface more efficient to use for language tasks.

Initial reactions - enthusiastic
• Despite teachers’ concerns that the system would be too hard for children, primary students used the software with relative ease.
• Students were given the opportunity to spend ‘free time’ with Kirrkirr
  – Time was spent looking up unfamiliar words from the day before.

Conclusions
• While we have focused our research on Warlpiri, the system can be easily applied to other languages
• The key to the effectiveness of the browsing interfaces is that the user can customise their functionality, thanks to the flexibility of the XML & Kirrkirr technology
• Throughout this research, the educational interests of the user have been the highest priority.
• Hope to better understand the usefulness & practicality of innovative dictionary browsing environments.

Links Kirrkirr homepage: Kevin’s Thesis Homepage:
