(10) clarification of 'charset' related terminology, plus typos
Section 5.13 uses legacy terminology for character sets and
"charsets". It should use the stable, IESG-policy-based IETF
terminology instead.
a)
Hence, the first paragraph on page 22:
Attribute values are octet strings, and MAY use any octet value
except 0x00 (Nul), 0x0A (LF), and 0x0D (CR). By default, attribute
values are to be interpreted as in ISO-10646 character set with
UTF-8 encoding. Unlike other text fields, attribute values are NOT
normally affected by the "charset" attribute as this would make
comparisons against known values problematic. However, when an
attribute is defined, it can be defined to be charset dependent, in
which case its value should be interpreted in the session charset
rather than in ISO-10646.
should instead say:
Attribute values are octet strings, and MAY use any octet value
except 0x00 (Nul), 0x0A (LF), and 0x0D (CR). By default, attribute
| values are to be interpreted as in the ISO-10646 character set with
UTF-8 encoding. Unlike other text fields, attribute values are NOT
normally affected by the "charset" attribute as this would make
comparisons against known values problematic. However, when an
attribute is defined, it can be defined to be charset dependent, in
which case its value should be interpreted in the session charset
rather than in UTF-8.
Note: RFC 1815 has been deprecated; 'ISO-10646' is *not* considered a
"charset", whereas 'UTF-8' is.
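To make the proposed default concrete, here is a minimal Python sketch of how a receiver might check and decode an attribute value. The helper `decode_attribute_value` and its `charset_dependent` flag are hypothetical (the flag stands in for whatever an attribute's definition says), `session_charset` is assumed to already be a Python codec name, and falling back to raw octets when UTF-8 decoding fails is an assumption rather than something the text mandates.

```python
# Octets that may not appear in SDP attribute values: NUL, LF, CR.
FORBIDDEN_OCTETS = {0x00, 0x0A, 0x0D}


def decode_attribute_value(value: bytes, charset_dependent: bool = False,
                           session_charset: str = "utf-8"):
    """Return the attribute value as str, or as raw bytes if undecodable.

    Hypothetical helper; `charset_dependent` models an attribute whose
    definition says it follows the session charset.
    """
    bad = FORBIDDEN_OCTETS.intersection(value)  # iterating bytes yields ints
    if bad:
        raise ValueError(f"forbidden octet(s) in attribute value: {sorted(bad)}")
    # Default is UTF-8; only charset-dependent attributes use a=charset.
    encoding = session_charset if charset_dependent else "utf-8"
    try:
        return value.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Assumption: keep the value as an opaque octet string.
        return value
```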
b)
In Section 6, at the bottom of page 28, the RFC says:
a=charset:<character set>
This specifies the character set to be used to display the
session name and information data. By default, the ISO-10646
character set in UTF-8 encoding is used. If a more compact
representation is required, other character sets may be used.
For example, the ISO 8859-1 is specified with the following SDP
attribute:
a=charset:ISO-8859-1
The heading line above should say:
a=charset:<charset>
or perhaps even better:
a=charset:<IANA-charset>
and the text body above should be changed to say:
| This specifies the character set and encoding ("charset") to be
| used for the session name and information data. By default,
| the ISO-10646 character set in UTF-8 encoding (charset "UTF-8")
is used. If a more compact representation is required, other
| charsets may be used. For example, the ISO 8859-1 charset is
specified with the following SDP attribute:
[...]
Note: The session description does not determine how parts of it
are *displayed* (there may be some transcoding involved, as well as
the choice of fonts, typefaces, etc.); the charset only specifies
how the text within the SDP itself is encoded.
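The following Python sketch illustrates how the session name (`s=`) and information (`i=`) fields might be decoded: UTF-8 unless a session-level `a=charset` attribute says otherwise. The function `session_text_fields` is hypothetical; it relies on Python's own codec lookup to recognise the identifier and stops at the first `m=` line so that media-level lines are ignored.

```python
def session_text_fields(sdp: bytes) -> dict:
    """Decode session-level s= and i= using a=charset (default UTF-8).

    Sketch only: accepts CRLF or LF line endings, ignores media sections.
    """
    lines = sdp.replace(b"\r\n", b"\n").split(b"\n")
    charset = "UTF-8"  # default when no a=charset attribute is present
    raw = {}
    for line in lines:
        if line.startswith(b"m="):
            break  # a=charset is session-level; stop at first media section
        if line.startswith(b"a=charset:"):
            # The charset identifier itself is a US-ASCII string.
            charset = line[len(b"a=charset:"):].decode("ascii").strip()
        elif line[:2] in (b"s=", b"i="):
            raw[line[:1].decode("ascii")] = line[2:]
    # Decode only after the whole session section is read, since a=
    # lines follow s= and i= in an SDP description.
    result = {}
    for key, value in raw.items():
        try:
            result[key] = value.decode(charset)
        except (LookupError, UnicodeDecodeError):
            result[key] = value  # unknown or unsupported: keep octets
    return result
```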
The subsequent text on page 29:
This is a session-level attribute and is not dependent on
charset. The charset specified MUST be one of those registered
with IANA, such as ISO-8859-1. The character set identifier is
a US-ASCII string and MUST be compared against the IANA
identifiers using a case-insensitive comparison. If the
identifier is not recognised or not supported, all strings that
are affected by it SHOULD be regarded as octet strings.
Note that a character set specified MUST still prohibit the use
of bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR). Character sets
requiring the use of these characters MUST define a quoting
mechanism that prevents these bytes from appearing within text
fields.
should be changed to say (fixing typos as well):
This is a session-level attribute and is not dependent on
charset. The charset specified MUST be one of those registered
| with IANA, such as ISO-8859-1. The charset identifier is a
US-ASCII string and MUST be compared against the IANA
identifiers using a case-insensitive comparison. If the
identifier is not recognised or not supported, all strings that
are affected by it SHOULD be regarded as octet strings.
| Note that a charset specified MUST still prohibit the use of
| bytes 0x00 (NUL), 0x0A (LF), and 0x0D (CR). Charsets requiring
the use of these characters MUST define a quoting mechanism
that prevents these bytes from appearing within text fields.
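The case-insensitive comparison and the octet restriction can be sketched in Python as follows. `IANA_CHARSETS` is a hypothetical, hand-picked excerpt of the IANA character set registry mapped to Python codec names; a real implementation would load the full registry. A `None` result stands for "not recognised", in which case affected strings would be treated as octet strings.

```python
from typing import Optional

# Hypothetical excerpt of the IANA registry (names and some aliases),
# keyed in upper case and mapped to Python codec names.
IANA_CHARSETS = {
    "UTF-8": "utf-8",
    "ISO-8859-1": "latin-1",
    "ISO_8859-1": "latin-1",
    "LATIN1": "latin-1",
    "US-ASCII": "ascii",
    "UTF-16": "utf-16",
}


def resolve_charset(identifier: str) -> Optional[str]:
    """Map an a=charset identifier to a codec name, case-insensitively.

    Returns None if unrecognised (strings then stay octet strings).
    """
    return IANA_CHARSETS.get(identifier.strip().upper())


def encodes_forbidden_octets(text: str, codec: str) -> bool:
    """True if encoding `text` with `codec` yields NUL, LF or CR octets."""
    return any(b in (0x00, 0x0A, 0x0D) for b in text.encode(codec))
```

The second helper shows why the restriction matters: UTF-16 encodes plain ASCII text with 0x00 octets, so it could only be used together with a quoting mechanism, whereas UTF-8 and ISO-8859-1 produce those octets only for the NUL, LF and CR characters themselves.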