Copyright © 2002-2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
Internet Media Types are an important part of the Web architecture. This TAG Finding discusses two aspects of Internet Media Types: registration by W3C Working Groups and consistency in the communication of character encoding information.
This document is an approved Finding of the W3C Technical Architecture Group (TAG). The TAG approved this Finding at its 26 April 2004 teleconference. Although the Editor (Tim Bray) was no longer a TAG participant on this date, he approved the current updates.
The previous approved version of this Finding was published by the TAG 3 June 2002 and revised 4 September 2002. The current version takes into account new processes for W3C Working Groups to register media types with IANA and the publication of [AUTHMETA], which supersedes the (deleted) section on inconsistency between metadata and data.
This Finding was derived from discussion of TAG issues w3cMediaType-1, customMediaType-2, and nsMediaType-3 but in some cases extends beyond the specifics of the issue that was raised.
Additional TAG Findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other Findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.
The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with RFC 2119 [RFC2119].
Please send comments on this Finding to the publicly archived TAG mailing list [email protected] (archive).
W3C Working Groups engaged in defining a format follow How to Register a Media Type with IANA [IANAREG] to register an Internet Media Type (defined in [RFC2046]) for the format.
Web architecture depends on applications having a shared understanding of the messages exchanged between agents (for example, clients, servers, and intermediaries) and a shared expectation of how the payload of a message -- a representation -- will be interpreted by the recipient. The Web architecture uses representation metadata, when supported by the communication protocol, to indicate the sender's intentions to the recipient. The TAG Finding Authoritative Metadata [AUTHMETA] discusses a number of problems that arise when metadata and data are inconsistent.
One such inconsistency that has been observed on the Web is between the character encoding of XML content (in an HTTP message body) and metadata about the character encoding. When users read XML content, such inconsistencies are quickly detected; these inconsistencies may be more elusive when XML is exchanged by processors.
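A processor can detect this kind of inconsistency mechanically. The following is a minimal Python sketch, not taken from any W3C or IETF specification: it compares the charset parameter of a Content-Type value with the encoding declaration in the XML body. The function names are hypothetical, and the sketch assumes an ASCII-compatible body (UTF-16 and other non-ASCII-compatible encodings are not handled).

```python
import codecs
import re
from email.message import Message

# Encoding declaration inside an XML declaration, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>  (ASCII-compatible bodies only)
_ENC_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def charset_param(content_type):
    """Return the (lowercased) charset parameter of a Content-Type, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(body):
    """Return the encoding the XML body declares for itself."""
    if body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]
    match = _ENC_DECL.match(body)
    # Without an encoding declaration (and without a UTF-16 BOM), XML is UTF-8.
    return match.group(1).decode("ascii") if match else "utf-8"


def is_inconsistent(content_type, body):
    """True when the charset parameter contradicts the document itself."""
    charset = charset_param(content_type)
    if charset is None:
        return False  # no external metadata, so nothing to contradict
    declared = declared_encoding(body)
    try:
        # Normalize aliases such as "latin-1" and "ISO-8859-1".
        return codecs.lookup(charset).name != codecs.lookup(declared).name
    except LookupError:
        return charset.lower() != declared.lower()
```

For example, a Latin-1 document served as `application/xml; charset=utf-8` is reported as inconsistent, while the same document served with no charset parameter is not.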
Section 7.1 of [RFC3023] states:
The use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity.
It also states that, when used, the charset parameter is always authoritative. However, a receiving application can, with very high reliability, determine the encoding of an XML document by reading it, without reference to any external headers; this is reflected by RFC 3023 in the following sections:
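To illustrate how a recipient can determine the encoding from the document alone, here is a minimal Python sketch modeled on the detection approach described in Appendix F of the XML 1.0 Recommendation. It is an assumption-laden simplification: it recognizes only UTF-8 and UTF-16 (UCS-4 and EBCDIC detection are omitted), and the function name `sniff_xml_encoding` is hypothetical.

```python
import codecs
import re

# Encoding declaration inside an XML declaration, matched on decoded text.
_DECL = re.compile(r'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the character encoding of an XML entity from its own bytes.

    Sketch after XML 1.0 Appendix F, limited to UTF-8 and UTF-16.
    """
    # 1. A byte order mark settles the question outright.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # 2. Otherwise the byte pattern of "<?xm" reveals the encoding family.
    head = data[:4]
    if head == b"\x00<\x00?":
        family = "utf-16-be"
    elif head == b"<\x00?\x00":
        family = "utf-16-le"
    elif head == b"<?xm":
        family = "latin-1"  # any ASCII-compatible codec reads the declaration
    else:
        # No BOM and no XML declaration: the entity must be UTF-8.
        return "utf-8"
    # 3. Read the encoding declaration using the detected family.
    match = _DECL.match(data[:256].decode(family, errors="replace"))
    if match:
        return match.group(1).lower()
    # No encoding declaration: ASCII-compatible entities are UTF-8; for
    # BOM-less UTF-16 this sketch falls back to the detected byte order.
    return "utf-8" if family == "latin-1" else family
```

Note that no HTTP header is consulted at any step: everything the function needs is in the first few hundred bytes of the representation.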
Thus there is no ambiguity when the charset is omitted, and the STRONGLY RECOMMENDED injunction to use the charset is misplaced for application/xml and for non-text "+xml" types. Consequently, for XML representations, server-side applications SHOULD only supply a charset header when there is complete certainty as to the encoding in use. Otherwise, an incorrect charset value will cause a perfectly usable representation to be rejected by an architecturally sound client.
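On the server side, the rule can be applied when the Content-Type value is built. The following Python sketch is one possible reading, under stated assumptions: `xml_content_type` and its `known_charset` parameter are hypothetical, `known_charset` is taken to mean an encoding the server is certain it used, and the body is assumed to be ASCII-compatible. The charset parameter is emitted only when it is known and agrees with the document's own encoding declaration (or the UTF-8 default when there is none).

```python
import codecs
import re
from typing import Optional

# Encoding declaration at the start of an ASCII-compatible XML body.
_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_content_type(media_type: str, body: bytes,
                     known_charset: Optional[str] = None) -> str:
    """Build a Content-Type value for an XML representation (hypothetical helper).

    known_charset: the encoding the server is certain it used, or None.
    """
    if known_charset is None:
        # Not certain: let the client read the encoding from the document.
        return media_type
    match = _DECL.match(body)
    declared = match.group(1).decode("ascii") if match else "utf-8"
    try:
        agrees = codecs.lookup(known_charset).name == codecs.lookup(declared).name
    except LookupError:
        agrees = False  # unknown codec name: do not risk an authoritative error
    if not agrees:
        # Omit the parameter rather than contradict the document.
        return media_type
    return f"{media_type}; charset={known_charset}"
```

The design choice is to fail toward omission: a missing charset parameter costs nothing for XML, whereas a wrong one overrides a correct encoding declaration.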
We recommend that section 7.1 of [RFC3023] be amended to something like the following:
The use of the charset parameter, when the charset is reliably known and agrees with the encoding declaration, is RECOMMENDED, since this information can be used by non-XML processors to determine authoritatively the charset of the XML MIME entity.