Zanichelli XML-based Dictionaries Editing System Daniele Fusi.



1 - System Requirements
Multiple presentations, legacy content, operating environment

One content, multiple presentations
A single body of data feeds:
- CD-ROM / DVD
- web sites or services
- paper books
- e-books

Existing environment: requirements
- authors: accustomed to WYSIWYG editing in word processors; no technical training
- IT point of view: text as a database; query and interactivity; multiple media and forms
- editors: content validation and normalization; text-based tools; simple content structure
- designers: DTP pagination; flattened structure; import / export

Existing content: conversion
- word processor documents
- 3rd party formats

Digital format requirements
- text-based storage, both machine- and user-readable
- using standard technologies (portable & durable)
- open to expansion and customization
- easy to manipulate
- easy to transform for import/export
- focused on semantics: content rather than its presentation

Content and semantics: dictionary
[Figure: sample dictionary entry (Greek) with its lemma marked]

Marking semantics in text: ‘fields’
- lemma
- morphology
- etymon
- translation
- sample
- work
- etc...

Semantic markup: applications
- lemma: alphabetical lemmata list, normal or inverted
- morphology: list of lemmata grouped by grammatical category
- etymon: list of lemmata grouped by etymon (roots dictionary)
- translation: rudimentary bidirectional dictionary
- sample: quotation lookup
- work: list of quoted works and authors
- etc...
- complex searches combining lemma, morphology, etymon, work, etc...
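As a rough illustration of why field-level markup makes these applications cheap, the sketch below derives two of the lists above from tagged entries. The entries, the `(field, value)` pair representation, and the function names are all hypothetical; the slides do not show the system's actual data model.

```python
from collections import defaultdict

# Hypothetical entries: each lemma is a list of (field name, value) pairs.
entries = [
    [("lemma", "dizionario"), ("grammar", "s.m."), ("translation", "dictionary")],
    [("lemma", "casa"), ("grammar", "s.f."), ("translation", "house")],
    [("lemma", "libro"), ("grammar", "s.m."), ("translation", "book")],
]

def first_value(entry, field):
    """Return the first value of the named field in an entry, or None."""
    for name, value in entry:
        if name == field:
            return value
    return None

def group_by_field(entries, field):
    """Group lemma headwords by the value of one semantic field."""
    groups = defaultdict(list)
    for entry in entries:
        key = first_value(entry, field)
        if key is not None:
            groups[key].append(first_value(entry, "lemma"))
    return {key: sorted(lemmata) for key, lemmata in groups.items()}

def inverted_list(entries):
    """Lemmata sorted by their reversed spelling (an inverted list)."""
    headwords = (first_value(entry, "lemma") for entry in entries)
    return sorted(headwords, key=lambda word: word[::-1])

by_grammar = group_by_field(entries, "grammar")
```

Grouping by `"etymon"` instead of `"grammar"` would give the roots-dictionary view with the same function, which is the point of marking semantics rather than formatting.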

2 – Solution overview
XML-based implementation

Implementation: XML
XML:
- Unicode text files
- widely used standard
- built for openness and transformation (XSLT)
- represents any kind of data, independently of its presentation
- hierarchical model
Dictionary:
- well suited to a hierarchical model: letter, lemma, fields
- typically stored as text for existing works
Hierarchy: dictionary > letter > lemma > field

Sample: lemma and fields
- lemma = dizionário
- date = 1965
- grammar = s.m.
- translation 1 = complesso dei lemmi di un dizionario e sim.
- separator
- translation 2 = lista dei lemmi
Rendered entry: dizionário [1965] s.m. complesso dei lemmi di un dizionario e sim. lista dei lemmi
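One way the sample entry could look as XML is sketched below with Python's standard `xml.etree.ElementTree`. The element name `field` and the attribute `type` are assumptions made for illustration; the slides do not show the real schema.

```python
import xml.etree.ElementTree as ET

def build_lemma(fields):
    """Build a <lemma> element holding one flat <field> child per field.

    The element and attribute names are hypothetical, not the system's schema.
    """
    lemma = ET.Element("lemma")
    for name, value in fields:
        field = ET.SubElement(lemma, "field", {"type": name})
        field.text = value  # None yields an empty element (e.g. a separator)
    return lemma

# The fields of the sample entry, in document order.
fields = [
    ("lemma", "dizionário"),
    ("date", "1965"),
    ("grammar", "s.m."),
    ("translation", "complesso dei lemmi di un dizionario e sim."),
    ("separator", None),
    ("translation", "lista dei lemmi"),
]

lemma = build_lemma(fields)
xml_text = ET.tostring(lemma, encoding="unicode")
print(xml_text)
```

Note that the two translations are siblings of the separator rather than children of a deeper element, matching the flat lemma-fields layout the slides describe.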

Hierarchical structure
dictionary > letters... > letter > lemmata... > lemma > fields (date, grammar, translation, separator, translation)

Minimalist structure
Flat, yet extensible
- smallest depth satisfies practical requirements
- fields vary at will according to the dictionary language and type
- variability of fields compensates for relatively flat hierarchy
Hierarchy: dictionary > letter > lemma > field...

Structure and compromises
Practical devices
- fields define lemma parts: etymon, translation, grammar, samples,...
- formatting is automatically derived from semantic structure (lemma = bold, grammar = italic, author = smallcaps,...)
- text escapes define specific formatting for portions of field values, whenever they are not considered semantically relevant
Focus on semantics
- example: "I came by cab" is 1 field (sample) in the lemma: the hierarchy need not be deeper, yet emphasis on “by” is still possible
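The two formatting devices above can be sketched as follows. The style table follows the slide's examples, but the `{i}...{/i}` escape syntax is invented purely for illustration; the slides do not say how text escapes are written.

```python
import re

# Style derived from the field's semantic type, as in the slide's examples.
FIELD_STYLES = {"lemma": "bold", "grammar": "italic", "author": "smallcaps"}

# Hypothetical escape syntax: {i}...{/i} marks emphasis inside a field value.
ESCAPE = re.compile(r"\{i\}(.*?)\{/i\}")

def render_field(name, value):
    """Return a list of (style, text) runs for one field.

    The base style comes from the field type; escaped portions get 'italic'.
    """
    base = FIELD_STYLES.get(name, "plain")
    runs = []
    pos = 0
    for match in ESCAPE.finditer(value):
        if match.start() > pos:
            runs.append((base, value[pos:match.start()]))
        runs.append(("italic", match.group(1)))
        pos = match.end()
    if pos < len(value):
        runs.append((base, value[pos:]))
    return runs

sample_runs = render_field("sample", "I came {i}by{/i} cab")
```

The sample stays a single field, so the XML hierarchy is no deeper, while the preview can still emphasize one word.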

Storage: data
- XML files: one file per letter
- each dictionary has its own alphabet and sorting scheme
- lemmata: automatically inserted in the proper file and at the proper position according to their content
- lemma ID overriding for special sorting

lemma            overriding ID
à côté (du)      acote
ABS              abiesse
10 minutes       tenminutes
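A minimal sketch of automatic placement with ID overriding, assuming each letter file can be modeled as a sorted list of `(sort ID, lemma)` pairs. The default ID rule (strip diacritics and punctuation, lowercase) and plain code-point ordering are simplifying assumptions; the real system uses a per-dictionary alphabet.

```python
import bisect
import unicodedata

def default_sort_id(lemma):
    """Derive a sort ID: drop diacritics and non-alphanumerics, then lowercase."""
    decomposed = unicodedata.normalize("NFD", lemma)
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()

def add_to_dictionary(files, lemma, sort_id=None):
    """Insert a lemma into the proper letter file at the proper position.

    `files` maps a letter to a sorted list of (sort ID, lemma) pairs.
    An explicit `sort_id` overrides the derived one.
    """
    key = sort_id if sort_id is not None else default_sort_id(lemma)
    letter = key[0]
    bisect.insort(files.setdefault(letter, []), (key, lemma))
    return letter

files = {}
add_to_dictionary(files, "abate")
add_to_dictionary(files, "à côté (du)", sort_id="acote")
add_to_dictionary(files, "ABS", sort_id="abiesse")
add_to_dictionary(files, "10 minutes", sort_id="tenminutes")
```

Without the override, "10 minutes" would get the ID `10minutes` and land under "1"; the override files it under "t", as in the table above.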

Storage: metadata
- self-descriptive dictionary: additional XML files define:
  - fields list and types within each dictionary
  - alphabet and sort order for each dictionary, including diacritics sensitivity
  - other dictionary-specific support resources (e.g. frequently typed symbols, preview styles)
Sample fields list: prelemma, etymon, abbreviation, phonetics, translation, variant, grammar, category (A, B, C...), section (1, 2, 3...), separator (– ∎ ∙ ∘ ⋆ ...), ...
Sample alphabet (Croatian): a b c č ć d dž đ e f g h i j k l lj m n nj o p r s š t u v z ž
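To show what a metadata-defined alphabet buys, here is a small sketch that sorts words by the Croatian alphabet listed above, treating dž, lj and nj as single letters. The greedy digraph matching is a simplification (it cannot tell a true digraph from n followed by j across a morpheme boundary), and the input is assumed to use precomposed characters.

```python
# Alphabet as given in the metadata sample; digraphs count as single letters.
CROATIAN = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j", "k",
    "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v", "z", "ž",
]
RANK = {letter: index for index, letter in enumerate(CROATIAN)}

def sort_key(word):
    """Map a word to a list of letter ranks, matching digraphs greedily."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            key.append(RANK[pair])
            i += 2
        elif word[i] in RANK:
            key.append(RANK[word[i]])
            i += 1
        else:
            i += 1  # characters outside the alphabet are ignored
    return key

words = ["nula", "njiva", "luk", "ljeto", "cesta", "čaj", "ćup", "dan", "džep", "đak"]
ordered = sorted(words, key=sort_key)
```

Because the alphabet is data rather than code, swapping in another dictionary's letter list changes the ordering without touching `sort_key`.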

3 – Editing
Authors

Visual Editing
- visual UI:
  - authors build lemmata visually by blocks, and are shielded from the underlying XML code
  - XML code integrity is guaranteed by the software
  - a typographical preview is provided for authors accustomed to WYSIWYG
Diagram: XML data file (= letter: lemmata, fields) and XML metadata

Editing software: editing by blocks
[Screenshot: letter selector; lemmata list; visual editing (fields in lemma); typographical preview]

Editing in distributed scenarios
Web-based visual editing

Web: distributed scenario
- dictionaries are stored centrally on a web server
- an ASP.NET web site manages access and versioning for different authors and works
- visual editing is implemented as a Silverlight RIA, running on the author's own computer, yet inside a web page:
  - desktop-class responsiveness for the application
  - true platform independence (Mac / PC, IE / Mozilla / Safari)
  - no need for software distribution and installation
  - centralized software maintenance

Distributed editing
- SQL database for managing access
- ASP.NET server application manages users and versions of works
- Silverlight application runs on the client computer for visual editing
Diagram: XML storage accessed by author, specialized author, and editor

Visual editing in your web browser
[Screenshot: letter selector; lemmata list; visual editing (fields in lemma); typographical preview]

4 - Revision
Editors

Content revisions and transformations
- merging different versions (multiple-author scenarios)
- editors: validation and normalization
- DTP: pagination for printing

Automated revision and correction
[Screenshot: test selection; test description; results]

5 - Publication
Editors

One content, multiple outputs
- print
- CD / DVD
- mobile devices (Mobipocket)
- web sites

Extending the model
Sample: RTL languages and root-based dictionaries

Arabic-Italian dictionary
- clashing RTL/LTR text flows
- special alphabetical order:
  - several letters share the same rank
  - different sorting according to level
- root-based dictionary: letter > root > lemma > field
- the structure of existing dictionaries must be kept unchanged, even though a deeper hierarchy would be required
- roots are sorted according to a predefined scheme; lemmata within roots are arbitrarily sorted by authors

Hierarchy depths
- Other dictionaries: letter > item = lemma = set of fields
- Arabic: letter > item = root = set of fields, some delimiting lemmata boundaries

Deeper hierarchy illusion: special editor
Arabic-X editor
- XML structure unchanged: each file is a letter containing items, each item contains fields
- items are roots, not lemmata
- a special field defines lemmata boundaries within each root
- the user sees letters, roots, lemmata in root, and fields in lemmata; the XML structure remains letter-items-fields
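The boundary-field device described above can be sketched as a simple regrouping pass over a root's flat field list. Which field acts as the boundary is not stated in the slides; here the `lemma` field itself is assumed to open each lemma, and the transliterated sample data is hypothetical.

```python
def split_root_into_lemmata(fields, boundary="lemma"):
    """Split one root's flat field list into (root_fields, lemmata).

    Fields before the first boundary belong to the root itself; each
    boundary field opens a new lemma that collects the fields after it.
    The boundary field name is an assumption made for this sketch.
    """
    root_fields = []
    lemmata = []
    for name, value in fields:
        if name == boundary:
            lemmata.append([(name, value)])
        elif lemmata:
            lemmata[-1].append((name, value))
        else:
            root_fields.append((name, value))
    return root_fields, lemmata

# Hypothetical item: one root stored as a flat sequence of fields.
root_item = [
    ("root", "k-t-b"),
    ("lemma", "kitāb"),
    ("grammar", "s.m."),
    ("translation", "libro"),
    ("lemma", "kātib"),
    ("translation", "scrittore"),
]

root_fields, lemmata = split_root_into_lemmata(root_item)
```

The stored XML keeps its letter-items-fields shape; only the editor's view applies this grouping, so the other editorial processes need not know about it.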

Specialized editor: Arabic
[Screenshot: letter selector; roots in letter; lemmata in root; visual editing (fields in lemma); typographical preview with bidirectional flows]

Arabic editor trick: advantages
- user experience is almost unchanged (there are 2 lists to choose from for editing instead of 1: roots and lemmata)
- XML structure unchanged: all the other editorial processes require no change, so the new dictionary fits into them easily
- field variability (already responsible for the structure's extensibility) makes this trick possible
One model, several views

Daniele Fusi