Language Tags W3C Project Review. Presenter and Agenda Addison Phillips Internationalization Architect, Yahoo! Co-Editor, Language Tag Registry Update.


Language Tags W3C Project Review

Presenter and Agenda Addison Phillips Internationalization Architect, Yahoo! Co-Editor, Language Tag Registry Update (LTRU) Working Group – RFC 4646 Tags for Identification of Languages – RFC 4647 Matching of Language Tags

Language Tags What's a language tag? Why are they changing them (again)? What do we need to do?

Language Tags Enable presentation, selection, and negotiation of content Defined by BCP 47 – Widely used! XML, HTML, RSS, MIME, SOAP, SMTP, LDAP, CSS, XSL, CCXML, Java, C#, ASP, Perl, … – Well understood (?)

Locale Identifiers Different ideas: – Accept-Locale vs. Accept-Language – URIs/URNs, etc. – CLDR/LDML And Requirements: – Operating environments and harmonization – App Servers – Web Services

In the Beginning Received Wisdom from the Dark Ages Locales: – japanese, french, german, C – ENU, FRA, JPN – ja_JP.PCK – AMERICAN_AMERICA.WE8ISO8859P1 Languages… … looked a lot like locales (and vice versa)

Locales and Language Tags meet Conversations in Prague… – Language tags are being locale identifiers anyway… – Not going to need a big new thing… – Just a few things to fix… … we can do this really fast

BCP 47 Basic Structure Alphanumeric (ASCII only) subtags Up to eight characters long Separated by hyphens Case not important (i.e. zh = ZH = zH = Zh) 1*8alphanum * [ - 1*8 alphanum ]
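A minimal Python sketch of the basic structure above, assuming only what the slide states: ASCII alphanumeric subtags of one to eight characters, separated by hyphens, compared without regard to case. The function names are illustrative and not part of any standard library or of BCP 47 itself.

```python
import re

# Generic shape from the slide: 1*8alphanum *( "-" 1*8alphanum ), ASCII only.
_GENERIC_TAG = re.compile(r"[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def looks_like_language_tag(tag: str) -> bool:
    """Return True if the string has the generic hyphen-separated shape."""
    return _GENERIC_TAG.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Case is not significant, so compare the lowercased forms."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    print(looks_like_language_tag("zh-TW"))   # well-formed shape
    print(looks_like_language_tag("zh_TW"))   # underscore is not a valid separator
    print(same_tag("zh", "Zh"))
```

This checks shape only. It says nothing about whether the subtags are registered or what they mean.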

RFC 1766 zh-TW ISO 639-1 (alpha2) ISO 3166 (alpha2) i-klingon Registered value

RFC 3066 sco-GB ISO 639-2 (alpha 3 codes) But use alpha 2 codes when they exist (en-GB, not eng-GB)

Problems Script Variation: – zh-Hant/zh-Hans – (sr-Cyrl/sr-Latn, az-Arab/az-Latn/az-Cyrl, etc.) Obsolescence of registrations: – art-lojban (now jbo), i-klingon (now tlh) Instability in underlying standards: – sr-CS (CS used to be Czechoslovakia …

And More Problems Lack of scripts Little support for registered values in software Reassignment of values by ISO 3166 Lack of consistent tag formation (Chinese dialects?) Standards not readily available, bad references Bad implementation assumptions – 1*8 alphanum *[ - 1*8 alphanum] – 2*3 ALPHA [ - 2ALPHA ] Many registrations to cover small variations – 8 German registrations to cover two variations

RFC 4646 (3066bis) Defines a generative syntax – machine readable – future proof, extensible Defines a single source (IANA Language Subtag Registry) – Stable subtags, no conflicts – Machine readable Defines when to use subtags – (sometimes)

RFC 3066bis and LTRU sl-Latn-IT-rozaj-x-mine – sl: ISO 639-1/2 (alpha2/3) – Latn: ISO 15924 script codes (alpha 4) – IT: ISO 3166 (alpha2) or UN M.49 – rozaj: Registered variants (any number) – x-mine: Private Use and Extension

More Examples es-419 (Spanish for Americas) en-US (English for USA) de-CH-1996 (Old tags are all valid) sl-rozaj-nedis (Multiple variants) zh-t-wadegile (Extensions) x-tim-b-lee (Private Use, opaque) en-US-x-twain (Private Use, composed)

Benefits Subtag registry in one place: one source. Subtags identified by length/content Extensible Compatible with RFC 3066 tags Stable: subtags are forever
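"Subtags identified by length/content" can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the input already has the basic ASCII shape, grandfathered tags such as i-klingon are not handled, and the order of subtags is not enforced. The function name and the label strings are illustrative.

```python
def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Label each subtag using only its position, length, and content."""
    labelled = []
    section = None  # set once a singleton starts an extension or private-use run
    for index, subtag in enumerate(tag.split("-")):
        if section == "privateuse":
            labelled.append(("privateuse", subtag))      # everything after "x" is opaque
        elif len(subtag) == 1:
            section = "privateuse" if subtag.lower() == "x" else "extension"
            labelled.append((section, subtag))
        elif section == "extension":
            labelled.append(("extension", subtag))
        elif index == 0:
            labelled.append(("language", subtag))
        elif len(subtag) == 4 and subtag.isalpha():
            labelled.append(("script", subtag))          # four letters
        elif (len(subtag) == 2 and subtag.isalpha()) or (len(subtag) == 3 and subtag.isdigit()):
            labelled.append(("region", subtag))          # two letters or three digits
        elif len(subtag) == 3 and subtag.isalpha():
            labelled.append(("extlang", subtag))         # three letters after the language
        elif 5 <= len(subtag) <= 8 or (len(subtag) == 4 and subtag[0].isdigit()):
            labelled.append(("variant", subtag))         # 5-8 chars, or 4 starting with a digit
        else:
            labelled.append(("unknown", subtag))
    return labelled


if __name__ == "__main__":
    for example in ("sl-Latn-IT-rozaj-x-mine", "es-419", "de-CH-1996", "x-tim-b-lee"):
        print(example, classify_subtags(example))
```

A validating implementation would additionally look each subtag up in the registry. This sketch only covers the length and content rules.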

ABNF

Registry Stability guarantees on normative information, especially subtags Fixed registration rules (no junk) Deprecation Preferred Values File and Subtag dates, deprecation dates Prefixes (what subtags go together) Descriptions and Comments

Example: Language % Type: language Subtag: in Description: Indonesian Added: Preferred-Value: id Deprecated: Suppress-Script: Latn %

Example: Variant % Type: variant Subtag: nedis Description: Natisone dialect Description: Nadiza dialect Added: Prefix: sl %

Example: Grandfathered % Type: grandfathered Tag: art-lojban Description: Lojban Added: Preferred-Value: jbo Deprecated: Comments: replaced by ISO code jbo %
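Because the registry is machine readable, records like the three examples above can be loaded with very little code. The Python sketch below assumes a record separator line made only of percent signs and one "Field: value" pair per line, with indented lines continuing the previous value. The sample text repeats fields from the slides and leaves out the dates, which are elided there. The function name is illustrative.

```python
def parse_records(text: str) -> list[dict[str, list[str]]]:
    """Split registry-style text into records mapping field name to a list of values."""
    records = []
    current: dict[str, list[str]] = {}
    last_field = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.strip("%") == "":                 # separator line: only percent signs
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[0].isspace() and last_field:    # folded continuation of the previous value
            current[last_field][-1] += " " + line.strip()
        else:
            field, _, value = line.partition(":")
            last_field = field.strip()
            # A field such as Description may repeat, so keep every value.
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


SAMPLE = """\
%%
Type: language
Subtag: in
Description: Indonesian
Preferred-Value: id
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Prefix: sl
%%
"""

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```

With records in this form, replacing a deprecated subtag is a lookup of its Preferred-Value field, and checking a variant against its Prefix field is a string comparison.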

Problems Matching – Does en-US match en-Latn-US ? Tag Choices – Users have more to choose from. Implementations – More to do, more to think about – (easier to parse, process, support the good stuff)

Tag Matching Uses Language Ranges in a Language Priority List to select sets of content according to the language tag. Basically what we already had, but in one place. Three Schemes – Basic Filtering – Extended Filtering – Lookup

Filtering Ranges specify the least specific item – en matches: en, en-US, en-Brai, en-boont Can select zero or more items (selects a set, including empty set)

Basic Filtering Basic matching uses plain prefixes – en-US matches: en-us, en-us-boont – en-US does NOT match: en-Latn-US, en-boont, en-x-US
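Basic filtering is a case-insensitive prefix test that must end on a subtag boundary. A minimal Python sketch follows. The function name is illustrative, and the treatment of a bare "*" range as "match everything" follows RFC 4647 rather than the slide.

```python
def basic_filter(language_range: str, tags: list[str]) -> list[str]:
    """Return the tags that equal the range or extend it at a hyphen boundary."""
    prefix = language_range.lower()
    if prefix == "*":
        return list(tags)
    return [
        tag for tag in tags
        if tag.lower() == prefix or tag.lower().startswith(prefix + "-")
    ]


if __name__ == "__main__":
    candidates = ["en-us", "en-us-boont", "en-Latn-US", "en-boont", "en-x-US"]
    print(basic_filter("en-US", candidates))
```

The result is a set that may be empty, which is the difference from lookup.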

Extended Filtering Extended matching can match inside bits – en-*-US matches: en-Brai-US, en-us, en-us-boont – Does NOT match: en-x-US, en-Brai Wildcard only has meaning in first position – for example: *-DE – en-US equivalent to en-*-US matches en-Brai-US!!!
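The extended matching rules can be sketched in Python along the lines of the algorithm in RFC 4647: subtags of the tag may be skipped while searching for the next subtag of the range, but a singleton such as "x" stops the search. The function name is illustrative and the code is a sketch, not a validated implementation.

```python
def extended_match(language_range: str, tag: str) -> bool:
    """Return True if the tag matches the extended language range."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # The first subtags must agree unless the range starts with a wildcard.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # a wildcard after the first position is skipped
        elif t >= len(tag_subtags):
            return False                # range still wants a subtag the tag lacks
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # never skip over a singleton such as "x"
        else:
            t += 1                      # skip an intervening subtag in the tag
    return True


if __name__ == "__main__":
    for candidate in ("en-Brai-US", "en-us", "en-us-boont", "en-x-US", "en-Brai"):
        print(candidate, extended_match("en-*-US", candidate))
```

Since the inner wildcard is simply skipped, "en-US" and "en-*-US" behave identically under this scheme.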

Lookup Range specifies the most specific tag in a match. – en-US matches en and en-US but not en-US-boont Mirrors the locale fallback mechanism and many language negotiation schemes. Implementations MUST specify defaulting behavior.

Fallback Range to match: zh-Hant-CN-x-private1-private2 1. zh-Hant-CN-x-private1-private2 2. zh-Hant-CN-x-private1 3. zh-Hant-CN 4. zh-Hant 5. zh 6. (default)
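The fallback sequence above can be generated by truncating the range one subtag at a time, also dropping a single-letter subtag (such as "x") when it would be left dangling at the end. A minimal Python sketch with an illustrative function name. The final "(default)" step is left to the caller.

```python
def fallback_chain(language_range: str) -> list[str]:
    """List the progressively shorter ranges tried during lookup."""
    subtags = language_range.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A trailing singleton cannot stand on its own, so remove it as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


if __name__ == "__main__":
    for step in fallback_chain("zh-Hant-CN-x-private1-private2"):
        print(step)
```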

Defaulting Language Preference List: fr-fr,zh-hant 1. fr-FR 2. fr 3. zh-Hant // next language 4. zh 5. ja-JP // now searching for the default content 6. ja 7. (implementation defined default)
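A sketch of lookup over a whole language priority list, in Python. Each range is tried with progressive truncation before moving to the next range, and a caller-supplied default is returned when nothing matches. The function name, the `default` parameter, and the idea of appending a default language range such as ja-JP to the list are assumptions made for illustration.

```python
from typing import Optional


def lookup(priority_list: list[str], available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the single best available tag, or the default if none matches."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue                              # a bare wildcard selects nothing by itself
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()                     # drop a dangling singleton such as "x"
    return default


if __name__ == "__main__":
    # Priority list from the slide, with the default language range appended last.
    print(lookup(["fr-fr", "zh-hant", "ja-JP"], ["en", "ja", "zh"]))
```

Unlike filtering, this returns exactly one result, which is why the defaulting behavior has to be specified.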

Filtering vs. Lookup Filtering can produce zero or more matches – example: CSS :lang pseudo-class – … but can produce exactly one behavior Lookup produces exactly one match – example: resource lookup

What to Reference BCP 47 (urn:ietf:bcp:47) Tags: RFC 4646 or successor – tags Matching: RFC 4647 or successor – language ranges, language preferences (language priority list), matching schemes

References to Replace RFC 1766, RFC 3066 (tags) ISO 639, ISO 3166 (XML 1.0 4e!) [reference IANA Language Subtag Registry] RFC 2616 (HTTP 1.1, §14.4: language ranges, basic matching)

Approach Changes, Issues Reference the registry Specify well-formed or validating Choose matching schemes carefully – consider using Extended Filtering, e.g. in XPath – use Lookup for locale-like operations xml:lang= matching

What Do I Do (Content Author)? Not much. – Existing tags are all still valid: tagging is mostly unchanged. – Resist temptation to (ab)use the private use subtags. If your language typically has script variations (or if your content exhibits it): – ONLY THEN tag content with script subtag(s) Script subtags only apply to a small number of languages: zh, sr, uz, az, mn, and a very small number of others.

What Do I Do (Programmer)? Check code for compliance with 4646 – Decide on well-formed or validating implementation (note requirements well) – Implement suppress-script – Change to using the registry – Bother infrastructure folks (Java, MS, Mozilla, etc) to implement the standard

What Do I Do (End-User)? Check and update your language ranges. Tag content wisely.

LTRU Milestone Dates RFC 4645, 4646, 4647 published Coming: RFC 4646bis (3066ter) – This includes ISO 639-3 support and extended language support

RFC 4646bis: What, more changes?!? Adds support for ISO 639-3 (about 7000 additional alpha3 language codes) – Two flavors: language subtags and extlangs sgn-ase [sign language, ASL] zh-cmn [Chinese, Mandarin] zh-yue [Chinese, Cantonese] azz [Highland Puebla Nahuatl] Nothing else??

W3C and Unicode Activities W3C – LTLI (Language Tags and Locale Identifiers) – Web services (WS-I18N) – XML, HTML – Notes and Best Practices (I18N GEO WG) Unicode Consortium – LDML – CLDR

Questions/Discussion