Draft-ietf-iri-iri-3987bis-03 Issues Overview IETF 79, Beijing IRI WG Meeting 2010-11-09 Martin J. Dürst, co-Editor.

Slides:



Advertisements
Similar presentations
Httpbis IETF 721 RFC2616bis Draft Overview IETF 72, Dublin Julian Reschke Mailing List: Jabber:
Advertisements

Keys to Building a Multilingual Search Engine Thierry Sourbier.
© 1998, Progress Software Corporation 1 Migration of a 4GL and Relational Database to Unicode Tex Texin International Product Manager.
Naming, Addressing, & Discovery
Copyright 2004 Monash University IMS5401 Web-based Systems Development Topic 2: Elements of the Web (g) Interactivity.
Tutorial 6 Working with Web Forms
Chapter 3 Program translation1 Chapt. 3 Language Translation Syntax and Semantics Translation phases Formal translation models.
Data Representation Kieran Mathieson. Outline Digital constraints Data types Integer Real Character Boolean Memory address.
Tutorial 6 Working with Web Forms. XP Objectives Explore how Web forms interact with Web servers Create form elements Create field sets and legends Create.
XML Primer. 2 History: SGML vs. HTML vs. XML SGML (1960) XML(1996) HTML(1990) XHTML(2000)
1 HTML’s Transition to XHTML. 2 XHTML is the next evolution of HTML Extensible HTML eXtensible based on XML (extensible markup language) XML like HTML.
Course Textbook: Build Your Own ASP.Net Website: Chapter 2
RFC Baby Steps Adding UTF-8 Support Tony Hansen IETF 83 March 27, 2012.
1 © 2000, Cisco Systems, Inc. DNSSEC IDN Patrik Fältström
IDN over EPP (IDNPROV) IETF BOF, Washington DC November 2004.
Internationalized Domain Names Technical Review and Policy Implications John C Klensin APTLD Manila 23 February 2009.
Internationalized Domain Names: Overview of ICANN Activities Masanobu Katoh, Chair, IDN Committee Director, ICANN Board CDNC-CNSG-MINC IDN Joint Meeting.
Unicode & W3C Jataayu Software C. Kumar January 2007.
Web forms in PHP Forms Recap  Way of allowing user interaction  Allows users to input data that can then be processed by a program / stored in a back-end.
JavaScript & jQuery the missing manual Chapter 11
IDN Standards and Implications Kenny Huang Board, PIR
Computer System Basics 1 Number Systems & Text Representation Computer Forensics BACS 371.
China Internet Network Information Center Chinese Internet Name Services - Domain Name and Common Name China Internet Network Information Center Xiaodong.
Internationalized Domain Names (IDN) APAN Busan James Seng former co-chair, IDN Working Group.
JavaScript, Fourth Edition
10 Adding Interactivity to a Web Site Section 10.1 Define scripting Summarize interactivity design guidelines Identify scripting languages Compare common.
Names Variables Type Checking Strong Typing Type Compatibility 1.
IBM Globalization Center of Competency © 2006 IBM Corporation IUC 29, Burlingame, CAMarch 2006 Automatic Character Set Recognition Eric Mader, IBM Andy.
9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.
Chapter 27 The World Wide Web and XML. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.27-2 Topics in this Chapter The Web and the Internet.
Chapter 8 Cookies And Security JavaScript, Third Edition.
Chapter 8 Introduction to HTML and Applets Fundamentals of Java.
1 Dr Alexiei Dingli XML Technologies XML Advanced.
DHTML AND JAVASCRIPT Genetic Computer School LESSON 5 INTRODUCTION JAVASCRIPT G H E F.
URLs and Resources Herng-Yow Chen.
Computer System Basics 1 Number Systems & Text Representation Computer Forensics BACS 371.
FIMS v1.1 Version numbers in schema Richard Cartwright Quantel July 2013.
Tutorial 6 Working with Web Forms. 2New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition Objectives Explore how Web forms interact with.
4395bis irireg Tony Hansen, Larry Masinter, Ted Hardie IETF 82, Nov 16, 2011.
StarTeam URLs: Creating and Using Persistent Links to StarTeam Artifacts  Jim Wogulis  Principal Architect, Borland Software Corporation.
CRITERIA FOR STANDARDIZING DATA CATEGORIES The Well-Formed Data Category Specification SUE ELLEN WRIGHT METADATA TDG WEBINAR
Content and Systems Week 3. Today’s goals Obtaining, describing, indexing content –XML –Metadata Preparing for the installation of Dspace –Computers available.
Netprog 2002 CGI Programming1 CGI Programming CLIENT HTTP SERVER CGI Program http request http response setenv(), dup(), fork(), exec(),...
Rfc3280bis-00 David Cooper, NIST Tim Polk, NIST. Development Process ● October 2004: Tim Polk requested that people submit any issues that needed to be.
Forms Collecting Data CSS Class 5. Forms Create a form Add text box Add labels Add check boxes and radio buttons Build a drop-down list Group drop-down.
Web Technologies Lecture 3 Web forms. HTML5 forms A component of a webpage that has form controls – Text fields – Buttons – Checkboxes – Range controls.
IDNAbis and Security Protocols or Internationalization Issues with Short Strings John C Klensin SAAG – 26 July 2007.
Internet Applications (Cont’d) Basic Internet Applications – World Wide Web (WWW) Browser Architecture Static Documents Dynamic Documents Active Documents.
8 Chapter Eight Server-side Scripts. 8 Chapter Objectives Create dynamic Web pages that retrieve and display database data using Active Server Pages Process.
Tutorial 6 Working with Web Forms. 2New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition Objectives Explore how Web forms interact with.
RADEXT WG RADIUS Attribute Guidelines Greg Weber March 21 st, 2006 IETF-65, Dallas v1 draft-weber-radius-attr-guidelines-02.txt draft-wolff-radext-ext-attribute-00.txt.
RTP Splicing Status Update draft-ietf-avtext-splicing-for-rtp-11 Jinwei Xia.
Multilingual Domain Name 22 Feb 2001 YONEYA, Yoshiro JPNIC IDN-TF.
The original Internationalized Domain Name (IDN) WG set the requirements for international characters in domain names in RFC 3454, RFC3490, RFC3491 and.
Internationalization of Domain Names James Seng CTO, i-DNS.net International co-chair, IETF IDN Working Group.
© 2001, Penn State University Encoding on the Internet Elizabeth J. Pyatt CETS.
Agenda Marc Blanchet and Chris Weber July 2011 IRI WG IETF 81 1.
Copyright © 2004, Keith D Swenson, All Rights Reserved. OASIS Asynchronous Service Access Protocol (ASAP) Tutorial Overview, OASIS ASAP TC May 4, 2004.
Keyprov PSKC spec Philip Hoyer 71-st IETF, Philadelphia.
Keyprov PSKC spec Philip Hoyer 71-st IETF, Philadelphia.
1 Internationalized Domain Names Paul Twomey 7 April 2008.
111 State Management Beginning ASP.NET in C# and VB Chapter 4 Pages
SIP wg Items Jonathan Rosenberg dynamicsoft Caller Preferences: Changes Discussion of Redirects –Previous draft only proxy –Nothing different for redirect.
Multilingual Domain Name
Director of Data Communications
RADEXT WG RADIUS Attribute Guidelines
Type Checking Generalizes the concept of operands and operators to include subprograms and assignments Type checking is the activity of ensuring that the.
Chapter 27 WWW and HTTP.
COMP 150-IDS: Internet Scale Distributed Systems (Spring 2016)
Multilingual Domain Name
Presentation transcript:

draft-ietf-iri-iri-3987bis-03 Issues Overview IETF 79, Beijing IRI WG Meeting Martin J. Dürst, co-Editor

Overview Overview/Background Issues in Groups: (discussion after each group) – Conversion by Decomposition – Registry Names – Query Parts – Bidirectionality – Weed-out – Legacy and Bugwards Compatibility 2

Background IRI: Internationalized Resource Identifier, currently RFC 3987RFC 3987 Internationalized (i.e. not-ASCII-only) version of URI (STD66, RFC 3986)STD66, RFC 3986 Updating draft-ietf-iri-3987bis-03.txtdraft-ietf-iri-3987bis-03.txt List of open issues at: SVN revision log: ietf-iri-3987bis/draft-ietf-iri-3987bis.xml ietf-iri-3987bis/draft-ietf-iri-3987bis.xml 3

IRI Examples 清华大学.中国, 清華大學.中国 中国互联网协会 بوابة. تونس 青山学院大学 4

Please Don’t Forget URIs/IRIs are a META-syntax Many pieces with different requirements get thrown together URIs/IRIs can be: – Absolute, complete from scheme to fragment id – Relative, just one or a few pieces – User-oriented (short, memorable) – Back-end (long, complicated) 5

Biggest Change from RFC 3987 Conversion: IRI ⇒ URI Old: Conversion by Decompo- sition URI Unicode in UTF-8 %-encode non-ASCII bytes IRI: Unicode Characters (electrons, ink, air waves) encode as UTF-8 6

Conversion: New scheme://iregname/ipath1/ipath2/…/ipath.ext?iquery#ifragment Decompose: Convert: Compose: schemeiregnameipathiqueryifragment as-ispunycodeUTF-8→%doc-enc → %UTF-8 → % scheme://regname/path1/path2/…/path.ext?query#fragment

New Conversion: Advantages Deal with special cases – ireg-name – Query Part Base for API (HTML5,…) 8

New Conversion: Status IRI ⇒ URI: – Basic writedown completed – Need to check details [issue #13; please help!]#13 URI ⇒ IRI: TODO [issue #14]#14 9

Registered Names: Mapping to URIs [issue #35]issue #35 STD 66/RFC 3986: Allows %-encoding in reg-name RFC 3987: Allows conversion using %-encoding for ireg-name draft-03: MUST convert to punycode 10

Advantages of -03 Approach Less variability for URIs (only.xn--fiqs8s, not also %E4%B8%AD%E5%9B%BD for 中国 ) Better resolution: For IDNA, punycode always resolves  Practical approach, but MUST too much (!?) 11

Disadvantages Conversion less uniform, more effort needed Does not deal with other systems (MDNS,…) Unnecessary restriction on schemes Does not deal with opaque syntax, domain names in query part,… Interaction with mapping Forward-compatibility %-encoding is still legal Layer violation 12

Reg-name: Mapping among Unicode To map or not to map? When to map? How to map? (IDNA 2003? RFC 5895? TR 46?) [issue #44]IDNA 2003RFC 5895TR 46issue #44 Proposal: – IRI creation: SHOULD only use U-Labels – Conversion to URI: MAY map (RFC 5895 or TR 46) 13

Query Part Encodings [issues #24, #40]#24#40 From an HTML (GET request): – Document encoding [issue #11], orissue #11 – Follow accept-charset attributeaccept-charset Input by user: – UTF-8: Preferred by big sites – Local user encoding: Preferred by some sites in Asia, non-interoperable In a document (e.g. <a src=…): – Document encoding 14

Query Part: Scheme Dependency Document encoding (where available): – – What else? [Please help!] UTF-8: – mailto: – What else? [Please help!] Schemes without query part: – What? [Please help!] 15

Legacy and Bugwards Compatibility LEIRI: Legacy Extended IRI, for XML – Problem: Main XML specs diverged on an early draft of RFC 3987 – Solution: Allow spaces,… in LEIRIs, with lots of health warnings – Status: One of the most carefully checked parts of the draft (Section 7.1) [issue #30]issue #30 16

Bugwards Compatibility: HTML5 [issues #1, #2, #3,#1#2#3 Browsers do a lot more than what the specs require Browser makers want to get the spec up to speed with reality 17

Bugwards Compatibility Examples Allow single '%'? [issue #41]issue #41 Allow '#' in fragment part [issue #42]issue #42 Illegal IRI characters [issue #43]issue #43 18

Bidi(rectionality) Basics Arabic, Hebrew,… scripts read TFEL2THGIR (in examples, we use ESAC REPPU for right-to-left) Storage is in logical order (parsing,… is easy) Display for running text is specified by Unicode TR 9 Unicode TR 9 – Directionality of punctuation follows surrounding letters – In computer syntax, stuff gets thrown around 19

Bidi IRI Goals Easily readable (for native readers) Easy to display (ideally no deviation from TR 9) Consistent conversion logical ⇔ display 20

IRI Bidi Concepts Component: String between syntax characters – Domain name label – Path component – Query parameter name/value – … Component directionality: Each component clearly one way Run: Same-directionality component sequence 21

Bidi Issues Adapt Bidi character restrictions to IDNA2008 [issue #25]issue #25 – Allow combining marks at end of component (no- brainer) – Allow digits at end of component (probably yes, issue #28)issue #28 – Establish non-jumping restrictions for IRIs (needs work, please help) Overall display strategy Move to separate document? [issue #6]issue #6 22

Bidi IRI Ordering Alternatives Overall Directi- onality Reordering by ExampleRFC 3987 Uni- code TR #9 Users# LTR →runhttp://ab.FE/DC/gh?ij=NM#LKokaypossible  1 LTR →componenthttp://ab.DC/FE/gh?ij=LK#NMbadneed ex- ception  2 RTL ←runNM#LK=gh?ij/FE/DC.  3 RTL ←componentNM#KL=ij?gh/FE/DC.ab//:httpbadneed ex- ception ?4 23 Logical: Worst-case example, shows main design choices Conflict between users (and user-oriented vendors) and security concerns

Weed-out Section 6, Use of IRIs [Please help reviewing!] Section 8, URI/IRI Processing Guidelines (Informative) [Please help reviewing!] Security Section: Replace large parts by pointers to Unicode TR 36 [issue #18]issue #18 Appendix A, Design Considerations: Replace with pointer to RFC 3987 [issue #53]issue #53 24

Other Issues (except trivial) #5: Distinguish IRI vs. "Presentation of IRI"? #5 #15: Move comparision section to separate document? #15 #20: Update Acknowledgements Section #20 #22: Fix "IRIs as identity tokens MUST" #22 #23: When to use NFC normalizing transcoder? (close?) #23 #26: No combining marks at start of component? #26 #27: Anything to say about ZWNJ/ZWJ? #27 #29: Include tag ranges in (close?) #29 #34: Incomplete sentence #34 #36: Some HTTP implementations send UTF-8 paths #36 #39: Warn about wrong conversion of non-BMP characters #39 #45: Secure comparisons #45 #46, #47: Length limits #46#47 #52: Update reference to Unicode 6.0 (patch available) #52 25