[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[MMUSIC] RFC 4566bis - issue 10 - clarification of 'charset' related terminology, plus typos



On 10 Nov 2006, at 00:10, Alfred HÎnes wrote:
(10) clarification of 'charset' related terminology, plus typos

Section 5.13 makes use of legacy terminology related to character
sets and "charsets".  It should use the stable, IESG policy based
IETF terminology.

a)
Hence, the first paragraph on page 22:

   Attribute values are octet strings, and MAY use any octet value
   except 0x00 (Nul), 0x0A (LF), and 0x0D (CR).  By default, attribute
values are to be interpreted as in ISO-10646 character set with UTF-8
   encoding.  Unlike other text fields, attribute values are NOT
   normally affected by the "charset" attribute as this would make
   comparisons against known values problematic.  However, when an
   attribute is defined, it can be defined to be charset dependent, in
   which case its value should be interpreted in the session charset
   rather than in ISO-10646.

should better say:

   Attribute values are octet strings, and MAY use any octet value
   except 0x00 (Nul), 0x0A (LF), and 0x0D (CR).  By default, attribute
|  values are to be interpreted as in the ISO-10646 character set with
   UTF-8 encoding.  Unlike other text fields, attribute values are NOT
   normally affected by the "charset" attribute as this would make
   comparisons against known values problematic.  However, when an
   attribute is defined, it can be defined to be charset dependent, in
   which case its value should be interpreted in the session charset
   rather than in UTF-8.

Note: RFC 1815 has been deprecated; 'ISO-10646' is *not* considered a
"charset", whereas 'UTF-8' is.

b)
In Section 6, at the bottom of page 28, the RFC says:

      a=charset:<character set>

         This specifies the character set to be used to display the
         session name and information data.  By default, the ISO-10646
         character set in UTF-8 encoding is used.  If a more compact
         representation is required, other character sets may be used.
For example, the ISO 8859-1 is specified with the following SDP
         attribute:

            a=charset:ISO-8859-1

The above headline should say:

      a=charset:<charset>

or perhaps even better:

      a=charset:<IANA-charset>

and the text body above should be changed to say:

| This specifies the character set and encoding ("charset") to be
|        used for the session name and information data.  By default,
| the ISO-10646 character set in UTF-8 encoding (charset "UTF-8")
         is used.  If a more compact representation is required, other
|        charsets may be used.  For example, the ISO 8859-1 charset is
         specified with the following SDP attribute:
         [...]

Note: The session description does not determine how parts of it
are *displayed* (theres may be some transcoding used, and fonts
or typefaces, etc., the charset must only be specified for SDP.)

The subsequent text on page 29,

         This is a session-level attribute and is not dependent on
charset. The charset specified MUST be one of those registered with IANA, such as ISO-8859-1. The character set identifier is
         a US-ASCII string and MUST be compared against the IANA
         identifiers using a case-insensitive comparison.  If the
identifier is not recognised or not supported, all strings that
         are affected by it SHOULD be regarded as octet strings.

Note that a character set specified MUST still prohibit the use of bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR). Character sets
         requiring the use of these characters MUST define a quoting
mechanism that prevents these bytes from appearing within text
         fields.

should be changed to say (fixing typos as well):

         This is a session-level attribute and is not dependent on
charset. The charset specified MUST be one of those registered
|        with IANA, such as ISO-8859-1.  The charset identifier is a
         US-ASCII string and MUST be compared against the IANA
         identifiers using a case-insensitive comparison.  If the
identifier is not recognised or not supported, all strings that
         are affected by it SHOULD be regarded as octet strings.

|        Note that a charset specified MUST still prohibit the use of
| bytes 0x00 (NUL), 0x0A (LF), and 0x0D (CR). Charsets requiring
         the use of these characters MUST define a quoting mechanism
         that prevents these bytes from appearing within text fields.

This all sounds plausible, but I don't have the experience to comment. Any input from the working group before I ask the Applications Area Directors?

--
Colin Perkins
http://csperkins.org/



_______________________________________________
mmusic mailing list
mmusic at ietf.org
https://www.ietf.org/mailman/listinfo/mmusic