Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt

"Tim Bray" <tbray@textuality.com> Wed, 31 January 2007 00:45 UTC

Message-ID: <517bf110701301645u1a0e5658v3f8beeca2a1136ce@mail.gmail.com>
Date: Tue, 30 Jan 2007 16:45:52 -0800
From: Tim Bray <tbray@textuality.com>
To: John C Klensin <john-ietf@jck.com>
Subject: Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt
In-Reply-To: <875A124D75A8B481E176CF06@p3.JCK.COM>
References: <875A124D75A8B481E176CF06@p3.JCK.COM>
Cc: discuss@apps.ietf.org

Pardon me for being late to this party, I was on vacation in
Australia.  I think this is a positive contribution.

First, a point of detail: in section 5.4, it's probably relevant that
per the Java Language Specification
(http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#95413p)
a Java character literal or variable represents not a Unicode
character but a UTF-16 code unit.  I guess the conclusion is that it
may be OK in certain circumstances to use \uNNNN, but it's not OK to
explain that by calling out to Java.
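To make the Java point concrete, here is a minimal sketch (in Python
rather than Java, purely for brevity) that splits a character into its
UTF-16 code units and prints the \uNNNN escapes a Java source file
would need.  The function names are hypothetical, chosen for this
illustration only.

```python
def utf16_code_units(ch: str) -> list:
    """Return the 16-bit UTF-16 code units that encode the string ch."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def java_style_escapes(ch: str) -> str:
    """Spell ch as Java-style \\uNNNN escapes, one per UTF-16 code unit."""
    return "".join(f"\\u{unit:04X}" for unit in utf16_code_units(ch))


# A character inside the Basic Multilingual Plane needs one escape.
print(java_style_escapes("\u00e9"))       # \u00E9
# A supplementary character (U+1D11E) needs a surrogate pair: two escapes.
print(java_style_escapes("\U0001D11E"))   # \uD834\uDD1E
```

The second case is the one that matters here: a single Unicode
character outside the BMP takes two \uNNNN escapes, so the escape
names a code unit, not a character.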

Second: I think the discussion shows that the syntax problems
around representing Unicode characters in ASCII and other
Unicode-oblivious texts are tricky; witness the issues with delimiters
and ABNF/case.  This is further evidence, were any needed, that IETF
Working Groups SHOULD NOT specify Internet protocols which may be used
to transfer text but are not capable of representing the Unicode
character set.  They can get that capability by specifying either
hard-wired UTF-8 or XML, both of which have cracked this nut.

So here's a proposed recasting of the second paragraph of section 1.1:

   When one moves to Unicode [Unicode] [ISO10646], where characters
   occupy two or more octets and may be coded in several different
   forms, the question of escapes becomes even more complicated.  In
   particular, we have seen fairly extensive use of both hexadecimal
   representations of the UTF-8 encoding [RFC3629] of a character and
   variations on the U+NNNN[N[N]] notation commonly used in conjunction
   with the Unicode Standard.

   New protocols that are required to carry textual content SHOULD be
   designed in such a way that the full repertoire of Unicode
   characters may be represented in that text; UTF-8 and XML are both
   good options.

   This document proposes that existing protocols being
   internationalized SHOULD use some contextually-appropriate variation
   of the U+NNNN[N[N]] notation unless other considerations outweigh
   those described here.
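For anyone comparing the two escape styles the paragraph mentions, a
small Python sketch may help.  It is only an illustration: the helper
names are made up for this message, and the draft's actual delimiter
syntax is not modelled.

```python
def u_plus(ch: str) -> str:
    """Format one character in the U+NNNN[N[N]] style (4 to 6 hex digits)."""
    return f"U+{ord(ch):04X}"


def utf8_hex(ch: str) -> str:
    """Return the hexadecimal octets of the UTF-8 encoding of ch."""
    return " ".join(f"{octet:02X}" for octet in ch.encode("utf-8"))


for ch in ("\u00e9", "\U0001D11E"):
    print(u_plus(ch), "->", utf8_hex(ch))
# U+00E9 -> C3 A9
# U+1D11E -> F0 9D 84 9E
```

The U+ form names the code point directly, while the hex form depends
on the encoding, which is one reason to prefer the former when
retrofitting an existing protocol.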
