Unicode Security Mark Davis. The Unicode Consortium Software globalization standards: define properties and behavior for every character in every script.

Slides:



Advertisements
Similar presentations
New in Unicode Mark Davis, John Jenkins. Agenda Unicode UCA Regular Expressions Security Considerations Character Mapping Common Locale Data.
Advertisements

Unicode/IDN Security Mark Davis President, Unicode Consortium Chief SW Globalization Arch., IBM.
Globalization Gotchas
Whats New in Globalization Mark Davis. Unicode Character Database: UCD 5.0 Schedule Currently in β2 Due June, 2006 Major part of the Unicode Standard.
Unicode Mark Davis Unicode Consortium President IBM Chief SW Globalization Architect
Unicode Normalization Mark Davis
Mark Davis President, Unicode Consortium
Unicode Mark Davis Unicode Consortium President IBM Chief SW Globalization Architect.
CSCI N241: Fundamentals of Web Design Copyright ©2004 Department of Computer & Information Science Introducing XHTML: Module B: HTML to XHTML.
Internationalizing WHOIS Preliminary Approaches for Discussion Internationalized Registration Data Working Group ICANN Meeting, Brussels, Belgium Jeremy.
ICANN Rio Meeting IDN Authorization for TLDs with ICANN agreements 26 March, 2003 Andrew McLaughlin.
Internationalized Domain Names Introduction & Update MENOG 1 Bahrain April 3-5, 2007 By: Baher Esmat Middle East Liaison.
IDN Variant Issues Project (VIP) Project Update and Next Steps 11 April 2012.
DC Architecture WG meeting Monday Sept 12 Slot 1: Slot 2: Location: Seminar Room 4.1.E01.
Worldwide typography (and how to apply JIS-X to Unicode) Michel Suignard Microsoft Corporation.
Global Registry Services 1 INTERNATIONALIZED Domain Names Testbed presented to ITU/WIPO Joint Symposium Geneva 6-7 Dec An Overview On VeriSign Global.
4. Internet Programming ENG224 INFORMATION TECHNOLOGY – Part I
1 Character Conversions and Mapping Tables Presented By: Markus Scherer George Rhoten Raghuram (Ram) Viswanadha.
Internationalization Status and Directions: IETF, JET, and ICANN John C Klensin October 2002 © 2002 John C Klensin.
CGI & HTML forms CGI Common Gateway Interface  A web server is only a pipe between user-agents  and content – it does not generate content.
Text #ICANN50. Text #ICANN50 IDN Variant TLD Program GNSO Update Saturday 21 June 2014.
Graphics 2D 1 Subject:T0934 / Multimedia Programming Foundation Session:6 Tahun:2009 Versi:1/0.
XML Why is XML so popular? –It’s flexible –It enables you to create your own documents with elements (tags) that you define Non-XML example: This is a.
Thayer School of Engineering Dartmouth Lecture 2 Overview Web Services concept XML introduction Visual Studio.net.
Data Representation Kieran Mathieson. Outline Digital constraints Data types Integer Real Character Boolean Memory address.
1 HTML’s Transition to XHTML. 2 XHTML is the next evolution of HTML Extensible HTML eXtensible based on XML (extensible markup language) XML like HTML.
Review1 What is multilingual computing? Bilingual, trilingual, vs. Multilingual What are the fundamental issues in multi-lingual computing? –Representation.
Introducing XHTML: Module B: HTML to XHTML. Goals Understand how XHTML evolved as a language for Web delivery Understand the importance of DTDs Understand.
Introducing HTML & XHTML:. Goals  Understand hyperlinking  Understand how tags are formed and used.  Understand HTML as a markup language  Understand.
Computer Literacy BASICS: A Comprehensive Guide to IC 3, 5 th Edition Lesson 14 Sharing Documents 1 Morrison / Wells / Ruffolo.
1 © 2000, Cisco Systems, Inc. DNSSEC IDN Patrik Fältström
IDN over EPP (IDNPROV) IETF BOF, Washington DC November 2004.
Introduction to Human Language Technologies Tomaž Erjavec Karl-Franzens-Universität Graz Tomaž Erjavec Lecture: Character sets
August Chapter 1 - Essential PHP spring into PHP 5 by Steven Holzner Slides were developed by Jack Davis College of Information Science and Technology.
Internet Skills An Introduction to HTML Alan Noble Room 504 Tel: (44562 internal)
IDN Standards and Implications Kenny Huang Board, PIR
Creating Interfaces: Localization Language & other issues character codes Homework: preparation for future topics.
CcTLD IDN TF Report ccTLD Meeting, Rio de Janero Mar. 25, 2003 Young-Eum Chair, ccTLD IDN TF.
Encoding and fonts Edward Garrett Software Developer, ELAR.
1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
NASK: at the cutting edge of technology Andrzej Bartosiewicz ITU-T SG17 meeting Moscow, 2005.
1 CS 502: Computing Methods for Digital Libraries Lecture 4 Text.
JavaScript, Fourth Edition
Chapter 8 Cookies And Security JavaScript, Third Edition.
ccTLD IDN Report ccTLD Meeting, Montreol June 24, 2003 Young-Eum
4395bis irireg Tony Hansen, Larry Masinter, Ted Hardie IETF 82, Nov 16, 2011.
Analyzing Systems Using Data Dictionaries Systems Analysis and Design, 8e Kendall & Kendall 8.
Forms Collecting Data CSS Class 5. Forms Create a form Add text box Add labels Add check boxes and radio buttons Build a drop-down list Group drop-down.
IDNAbis and Security Protocols or Internationalization Issues with Short Strings John C Klensin SAAG – 26 July 2007.
27 Mar 2000IETF IDN-WG1 Requirements for IDN and its Implementations from Japan Yoshiro YONEYA JPNIC IDN-TF / NTT Software Co.
Multilingual Domain Name 22 Feb 2001 YONEYA, Yoshiro JPNIC IDN-TF.
The original Internationalized Domain Name (IDN) WG set the requirements for international characters in domain names in RFC 3454, RFC3490, RFC3491 and.
Page 1 of 42 To the ETS – Create Client Account & Maintenance Online Training Course Individual accounts (called a Client Account) are subsets of the Site.
Objective: To describe the evolution of the Internet and the Web. Explain the need for web standards. Describe universal design. Identify benefits of accessible.
© 2001, Penn State University Encoding on the Internet Elizabeth J. Pyatt CETS.
Tutorial 9 Working with XHTML. New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition 2 Objectives Describe the history and theory of XHTML.
1 Lesson 14 Sharing Documents Computer Literacy BASICS: A Comprehensive Guide to IC 3, 4 th Edition Morrison / Wells.
Session 11: Cookies, Sessions ans Security iNET Academy Open Source Web Development.
The NSDL, OAI and Your Metadata Core Infrastructure Metadata Repository (“union catalog”) Naomi Dushay Cornell University.
IDN issues ITU-T SG16-Q7 26 July Objective Provide a set of examples why IDN study at ITU-T could be considered Demonstrate the need of ITU recommendation.
1 Internationalized Domain Names Paul Twomey 7 April 2008.
Unit 4 Representing Web Data: XML
Director of Data Communications
Welcome! To the ETS – Create Client Account & Maintenance
Characters & Fonts Digital Multimedia, 2nd edition
Programming the Web Using Visual Studio .NET
IDN Variant TLDs Program Update
XML Problems and Solutions
Characters & Fonts Digital Multimedia, 2nd edition
John C Klensin APNIC Beijing, 25 August 2009
Presentation transcript:

Unicode Security Mark Davis

The Unicode Consortium Software globalization standards: define properties and behavior for every character in every script Unicode Standard: a unique code for every character Common Locale Data Repository: LDML format plus repository for required locale data Collation, line breaking, regex, charset mapping, … Used by every major modern operating system, browser, office software, client,… Core of XML, HTML, Java, C#, C (with ICU), Javascript, …

Key issue: Identity A thinks: X = x B thinks: X x

IDN Example You get an about your paypal.com account, click on the link… You carefully examine your browser's address box to make sure that it is actually going to … But actually it is going to a spoof site: paypal.com with the Cyrillic letter p. You (System A) think that they are the same DNS (System B) thinks they are different

Examples: Visual Confusables Cross-Script p in Latin vs p in Cyrillic In-Script rn may appear at display sizes like m + typically looks identical to so̷s looks like søs Rendering Support ä with two umlauts may look the same as ä with one el ̣ is actually e + l ̣

The answer to the ultimate question of the Universe: ! Thus = 42 Western Bengali Oriya

Malicious Rendering Font technologies such as TrueType/OpenType are extremely powerful. A glyph in such a font actually may use a small program to deform the shape radically according to resolution, platform, or language. Powerful enough to change the appearance of $ on the screen to $ when printed.

Syntax Spoofing / = fraction-slash Also possible without Unicode: characters.mydomain.com

Non-Visual Attacks Exploiting Expectations Collation:X < Y, so X + H < Y + Hwrong Encoding:/ is always represented by 2F 16 wrong Casing:len(X) = len(toUpper(X)) wrong Norm.:NFC(x + y) = NFC(x) + NFC(y)wrong Buffer overflows & identity mismatches

Casing: Buffer Overflows OperationUTFFactorSample Lower 81.5X U+023A 16, 321XA U+0041 Upper / Title / Fold8, 16, 323X ΐ U+0390

Comparison Vulnerability: Example LDAP doesnt specify comparison operation Two different implementations can use different mechanisms, thus: Malfunction: The user with valid access rights to a certain resource actually cannot access it, because the binary representation of user ID used for the user registry counts as different from the one specified in the access control list. Security Hole: a new user whose ID is equivalent to another user's in the directory system can get the access right to a protected resource.

Comparison Issues Two binary Unicode orders: code point/UTF-8/UTF-32 vs UTF16 order. Case-Sensitive vs Insensitive Language-Sensitive vs Insensitive Normalized vs Not Vendor Differences Regex matching: where important for security, ensure that: conforms to the requirements of [UTS18], andUTS18 uses an up-to-date version of the Unicode Standard for its properties. See Proposed Collation Registry

Other Problems Charset Issues IANA / MIME charset names are ill-defined: vendors often convert the same charset different ways. When converting charsets, dont simply omit characters that cannot be converted. Never use Private Use characters, unassigned characters. Always tag data! Example: tag currencies with an explicit currency ID (from ISO 4217); a "naked" amount may be misinterpreted as the wrong currency. Dont assume currencies, timezones, etc can be derived from locales (but ok to default and then confirm) See Globalization Gotchas

UAX #31: Identifier & Pattern Syntax For identification of entities programming variables, resources, domain names,... Appropriate characters -- stable across versions Not all natural language words: cant U.S.A. Provides foundation: specifications can tailor it for different environments: adding or removing characters.

StringPrep Processing Map Normalize Prohibit Aa c + ¸ç + f + i & /., …

UAX #15: Unicode Normalization Forms Normalizes most visually confusable sequences to unique form c + ¸ ç + f + i Core part of StringPrep, other Identifier Profiles

Domain Names StringUTF-16Internal - IDNA 1a ät.com E F 006D xn--t-zfa.com 1b ät.com 00E E F 006D xn--t-zfa.com 2a tοp.com BF E F 006D xn--tp-jbc.com 2b tοp.com F E F 006D top.com 4a so̷s.com F E F 006D xn--sos-rjc.com 4b søs.com F E F 006D xn--ss-lka.com

UTR #36: Security Recommendations General Security Issues (not just IDN) V1 approved mid-2005; V2 in progress Describes the problems, recommends best practices Users Programmers User-Agents (browsers, , office apps) Registries Registrars

UTS #39: Security Mechanisms Supplies data /algorithms for implementations Restricted character repertoire: Based on Unicode Identifier Profile Intersect with current NamePrep Characters scripts, confusable characters Originally in UTR #36 Version 1; split out for clarity

Current NamePrep Unicode Identifiers a œ и ש س 2 … § / … http ://unicode.org/reports/tr36/idn-chars.html U3.2 Symbols (2,974) Non-Mod. (52,842) U3.2 Alphanum* (37,200) … U5.0 Alphanum* (+2,810)

Restriction Levels* Highly Restrictive Single script, or from limited combinations: Han + Hiragana + Katakana Only Identifier Profile: Letters, Numbers; no Symbols, Punctuation,… Moderately Restrictive: Allow Latin with others except Cyrillic, Greek, Cherokee: ip-.co.jpx خدمة rss.eg Minimally Restrictive: Allow arbitrary mixtures of scripts: sony-βίντεο.grигро-shop.com Subject also to restrictions on confusables

Q & A

Backup Slides

Agenda Unicode Background Security Issues

ICANN Guidelines v2 Improvement on v1,… but needs new revision: Procedurally Insufficient time for thorough review The disposition (with rationale) of comments not available Only single cycle of public review Technically Any specification needs a much clearer structure – the exact implications of a claim to adhere to the guidelines are currently impossible to measure, and useless for security #3 (script/language limitations) has far too many loopholes. #4 (symbols) is too permissive, and not well-defined #5 (registration) should use the post-namepreped form

Guideline 3 (lang./script limitations) a)Associate with script except with language and script, or except with set of languages, or except with more than one designator b)Publish set of code points, define variant code points; indicate script/language. Why language? (too fuzzy to be testable) Why script? (derivable from characters) c)Single script in label, except when language requires, except with mixed-script confusables, except with policy & table defined. Who decides when required? Allows single-script confusables. d)All registry policies documented and publicly available, with table for each set of code points Machine readable? Discursive description?

Guideline 4 (disallowed symbols) a)Line symbol-drawing characters (as those in the Unicode Box Drawing block) One small set of the many symbols b)Symbols and icons that are neither alphanumeric nor ideographic language characters, Numbers? Combining Marks? Letter modifiers? Kana length mark? Ill-defined, untestable. c)Characters with well-established functions as protocol elements / is confusable with a protocol element but isnt one. Ill-defined, untestable. d)Punctuation marks used solely to indicate the structure of sentences Em-dash? Who decides? Ill-defined, untestable. e)Punctuation marks that are used within words except essential to the language & associated with explicit prescriptive rules Ill-defined, untestable. f)Except under corresponding conditions, a single specified character may be used as a separator within a label, … by designating a functionally equivalent punctuation mark from within the script. Ill-defined, untestable.

Guideline 5 (registration) A registry will define an IDN registration in terms of both its Unicode and ASCII-encoded representations. Should use output Unicode representation (after mapping and normalization): otherwise many more visually confusable characters are present Should say ACE, not ASCII.

Unicode Recommendations Precise Specification, Mechanically Testable: Guideline 3 (script/language limitations) Publicly document the Restriction Level being enforced ( Level 4) Publicly document the enforcement policy on confusables: whether any two domain names are allowed to be whole-script or mixed script confusables according to [UTR39].UTR39 Guideline 4 (symbols) Only characters in IDN Security Profiles for Identifiers [UTR39].UTR39 Guideline 5 (registration) Define an IDN registration in terms of its: Nameprep-Normalized Unicode representation (output format) ACE representation Work with IETF to update NamePrep to Unicode 5.0