[Ltru] Private Use Tags

(This is a re-send of an e-mail I originally sent to the authors of a 
previous draft; I have since been educated as to the proper way to comment.)

Dear Mr. Phillips and Mr. Davis,

First, please forgive me if I'm not following proper procedure in 
commenting on this draft; while I do have a strong programmer's interest 
in this standard, I admit that I'm not typically a participant in these 
procedures and haven't thoroughly educated myself on the policies for 
submitting comments.

I would like to recommend an addition to this draft, for which I think I 
can make a rather compelling case based on hypothetical but quite 
reasonable scenarios. Personally, I hope very much that your draft 
becomes a standard, as the problems with a canonical parsing of current 
RFC 3066 language tags are well-known and bothersome to developers 
everywhere. Your draft strikes me as an excellent way to finally 
standardize the practice in a way which will be accessible to all 
developers without having to investigate thirty different standards and 
documents from ten different organizations.

Regarding Section 3.4 on extensions and extension namespace: You already 
have here a mechanism in place for extending this specification. I would 
like to suggest an extension which should probably be incorporated into 
the main specification. I believe you should define an "organization 
convention" extension for use by private companies and organizations for 
their own purposes.

I realize that a "private use" extension is already defined in section 
2.2.7. However, I maintain that the private use extension is not 
sufficient for potential development and interdevelopment among 
important organizations, as there is no way a parsing agent could assume 
anything significant about the tags which follow. And yet, the 
registration of 3.4 extensions is also insufficient because, frankly, 
you'll rapidly run out of letters if you make a sincere effort to define 
namespace for private companies and organizations.

Let's take a concrete example. Let's say that the American Library 
Association (ALA) decides to define an extension to help them classify 
books by reading level. As your specification stands, they have two 
choices: they can register a 3.4 extension (we'll say they register "L") 
and then use their subtags as follows:

en-US-L-g6: A book written in English as spoken in the United States at 
the sixth-grade reading level.

The ALA would have excellent reasons for wanting such a tag, as it would 
greatly facilitate the database querying and transfer of material to 
public schools.

However, we see the first problem: the ALA has their tag, which many 
schools would use. Then, Associated Press would want their tag to 
indicate regional assumptions. We'll give them "P" (for "press"):

en-US-P-ky: An article written in English as spoken in the United States 
which assumes readers are already familiar with names, cities, politics, 
etc., in Kentucky. (They would use this to distribute versions to 
Kentucky press where they don't have to explain that Frankfort is the 
capital, distinguishing them from national or international versions 
which would make no such assumption and explicitly specify that 
Frankfort is the capital.)
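
To make the parsing side of this concrete, here is a minimal Python sketch of how an agent might separate singleton extensions from the rest of a tag. The function name split_extensions is my own invention, and the sketch assumes only that a one-character subtag after the first position opens an extension and that everything after "x" is private use; it does not validate the language, script, or region subtags.

```python
def split_extensions(tag):
    """Split a tag into its leading subtags and its singleton extensions.

    Returns (base, extensions), where extensions maps each lowercased
    singleton to the list of subtags that follow it.
    """
    subtags = tag.split("-")
    base = []
    extensions = {}
    current = None
    for position, subtag in enumerate(subtags):
        is_singleton = position > 0 and len(subtag) == 1
        # After "x" everything is private use, so stop opening new extensions.
        if is_singleton and current != "x":
            current = subtag.lower()
            extensions[current] = []
        elif current is None:
            base.append(subtag)
        else:
            extensions[current].append(subtag)
    return base, extensions


print(split_extensions("en-US-L-g6"))  # (['en', 'US'], {'l': ['g6']})
print(split_extensions("en-US-P-ky"))  # (['en', 'US'], {'p': ['ky']})
```

Each organization in these examples consumes one key of the extensions mapping, which is exactly the limited real estate at issue.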

If we keep up like this, as I mentioned, we'll rapidly run out of 
singleton letters. Everyone will want one, some for valid reasons, 
others for silly reasons, and then your registration authority would be 
in the unenviable position of having to make value judgments regarding 
what is valid and what is silly, given such limited real estate.

Furthermore, you'll be putting the organizations themselves in a 
difficult position. For example, if the ALA decides to modify their 
convention, this is something that is only of interest to them and the 
people who use their specification. However, in order to make their own 
internal changes, they will technically have to go through the entire 
process of revising a stable specification through the registration 
authority (according to 3.4, which requires stability and canonical 
representation), something which is never advisable.

And finally, parsing agents which have no interest in the ALA's tag 
(which will be most of them) will nonetheless have the burden of 
checking conformance.

If we take the other approach, and say, "We have the 'x' tag for private 
use. The ALA and AP can take that tag and follow it up however they 
want," then we're creating another problem. All of the parsing agents 
which do have an interest in those tags cannot be guaranteed that they 
mean what they think they mean.

For example, if the ALA decides to go with:

en-US-x-ala-g6

But subsequently the Associated Press decides that their private tag 
"x-ala" means articles of interest to Alabamans, then what is the ALA to 
do when they want to classify articles written by the AP? The problem is 
that parsing user agents will be unable to assume anything about the tag 
that follows, and once a conflict occurs, both tags become either 
useless, or subject to the type of interpretation that a human might 
perform easily but a machine cannot.
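
The conflict can be shown in a few lines of Python. Everything here is hypothetical: the two lookup tables stand in for whatever private conventions the ALA and the AP might adopt, and the function names are mine. The point of the sketch is only that both readers accept the same tag and neither can tell that the other convention exists.

```python
# Hypothetical private conventions; neither is registered anywhere.
ALA_LEVELS = {"g6": "sixth-grade reading level"}
AP_REGIONS = {"ala": "of interest to Alabamans"}


def private_subtags(tag):
    """Return the subtags that follow the private-use singleton 'x'."""
    parts = tag.lower().split("-")
    if "x" not in parts:
        return []
    return parts[parts.index("x") + 1:]


def ala_reading(tag):
    """Read the tag the way the ALA's convention would."""
    subtags = private_subtags(tag)
    if len(subtags) >= 2 and subtags[0] == "ala":
        return ALA_LEVELS.get(subtags[1])
    return None


def ap_reading(tag):
    """Read the tag the way the AP's convention would."""
    return [AP_REGIONS[s] for s in private_subtags(tag) if s in AP_REGIONS]


tag = "en-US-x-ala-g6"
print(ala_reading(tag))  # sixth-grade reading level
print(ap_reading(tag))   # ['of interest to Alabamans']
```

Both answers are valid under their own convention, so a machine has no basis for choosing between them.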

The solution is simply to define an organizational namespace. We take a 
random tag--we'll say "P" for private--and then allow companies and 
organizations to register their own namespace. Everything that follows 
their namespace tag is then interpreted according to their standard, 
whatever that may be. For example, the ALA would register "ala," the AP 
would register "ap," Microsoft would register "mcrsoft," Adobe would 
register "adobe" and so on.

Then, anyone seeing a tag like this:

en-US-P-ala-g6

could know unambiguously that whatever follows the P-ala is to be 
interpreted by the ALA's own convention, whatever that might be. Each 
registering organization could then be responsible for the stability and 
canonical representations of their own namespace without affecting the 
stability of the specification as a whole.

Parsing agents which are not interested in the AP's tags simply know to 
ignore anything after the "P" tag that isn't an organization in which 
they have an interest. Parsing agents that are interested can now know with 
assurance that the information is what they're looking for. Companies 
and organizations can establish their own standards which can easily 
evolve to suit their needs. Private companies can establish 
compatibility standards between themselves which won't affect the 
specification as a whole.
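
As a rough illustration of what this would mean for a parsing agent, here is a Python sketch. It assumes the "P" singleton proposed above, and the "g6" reading-level convention is the hypothetical ALA one from my earlier example; the names organization_subtags, ala_reading_level, and interpret are invented for the sketch and are not part of any draft.

```python
def organization_subtags(tag, singleton="p"):
    """Return (organization, subtags) from the proposed extension, or None."""
    parts = tag.lower().split("-")
    for i in range(1, len(parts)):
        if parts[i] == "x":
            return None  # private use: nothing after this point is ours
        if parts[i] == singleton:
            rest = []
            for subtag in parts[i + 1:]:
                if len(subtag) == 1:
                    break  # the next singleton starts another extension
                rest.append(subtag)
            if not rest:
                return None
            return rest[0], rest[1:]
    return None


def ala_reading_level(subtags):
    """Hypothetical ALA convention: 'g6' means sixth-grade reading level."""
    if subtags and subtags[0].startswith("g") and subtags[0][1:].isdigit():
        return int(subtags[0][1:])
    return None


def interpret(tag, handlers):
    """Dispatch to the handler for the tag's organization, if we have one."""
    found = organization_subtags(tag)
    if found is None:
        return None
    organization, subtags = found
    handler = handlers.get(organization)
    return handler(subtags) if handler else None


handlers = {"ala": ala_reading_level}  # this agent only cares about the ALA
print(interpret("en-US-P-ala-g6", handlers))  # 6
print(interpret("en-US-P-ap-ky", handlers))   # None
```

An agent with no entry for "ap" never needs to know what the AP's subtags mean; it skips them without checking their conformance.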

This could be infinitely extensible merely by setting aside one of the 
organizational tags to mean "check the next set." For example, if the 
American Library Association registers "ala" as above, and then later 
the Association of Libertarians and Anarchists shows up, finds that all 
the mnemonic representations of their name are already used and there's 
not much space left on the registry (and with 36^8 alphanumeric 
possibilities, that's not likely, but let's pretend), they could define 
their namespace as "set2-ala" (assuming we've already decided that 
"set2" is the tag which means "check the next set").
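
A sketch of how an agent might resolve such a chained namespace, in Python. The escape value "set2" is the made-up one from the paragraph above, and namespace_key is a name I chose for illustration; the function takes the subtags that follow the organizational singleton.

```python
ESCAPE = "set2"  # hypothetical reserved subtag meaning "check the next set"

# Number of distinct subtags of exactly eight alphanumeric characters.
EIGHT_CHAR_SUBTAGS = 36 ** 8  # 2821109907456


def namespace_key(subtags):
    """Split subtags into (namespace key, remaining subtags).

    Leading escape subtags become part of the key, so "ala" and
    "set2-ala" identify two different registrants.
    """
    key = []
    index = 0
    while index < len(subtags) and subtags[index] == ESCAPE:
        key.append(subtags[index])
        index += 1
    if index == len(subtags):
        raise ValueError("escape subtag with no namespace after it")
    key.append(subtags[index])
    return tuple(key), subtags[index + 1:]


print(namespace_key(["ala", "g6"]))          # (('ala',), ['g6'])
print(namespace_key(["set2", "ala", "g6"]))  # (('set2', 'ala'), ['g6'])
```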

This allows all companies and organizations which have a need to define 
their own namespaces and then use them as the needs of their particular 
domain indicate in a way that is nonetheless unambiguously established 
for parsing agents which can then make error-free decisions about 
whether or not the information which follows is useful to their needs, 
all done without sacrificing the stability of the main specification.

This is the extent of my speculation on the issue. I did consider the 
possibility of using Java-package-name-like identifiers tied to domain 
registration, so that Microsoft could have the "com-microsoft" tag and 
the ALA could have the "org-ala" tag, but this would end up violating 
the eight-character rule and allow just any yahoo with a website to 
include whatever he sees fit (en-US-com-sexychicks-38D comes to mind), 
which I don't think is a desirable solution at all.
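
The eight-character problem is easy to check mechanically. The following Python sketch assumes only the rule that each subtag is one to eight ASCII letters or digits; the function name invalid_subtags is mine.

```python
import re

# One to eight ASCII letters or digits per subtag.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def invalid_subtags(tag):
    """Return the subtags of a tag that break the length/alphabet rule."""
    return [s for s in tag.split("-") if not SUBTAG.fullmatch(s)]


print(invalid_subtags("en-US-com-microsoft"))  # ['microsoft']
print(invalid_subtags("en-US-P-ala-g6"))       # []
```

"microsoft" is nine characters, so a domain-based identifier fails for one of the most obvious registrants.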

If you have found this comment at all useful, I would appreciate hearing 
back.

Sincerely,
Dylan N. Pierce
IT Coordinator, TykeTek

TykeTek/Diapositivas Gloria
Salvador Quevedo y Zubieta #821 Int. 6
Col. la Perla
C.P. 44360 Guadalajara, Jal.
MEXICO

E-Mail: dylanpierce@megared.net.mx
Telephone: +52 (33) 3617.3660
Cellular: +52 (33) 1149.7057
