Unicode Support

Background:

  • See UTR #17, Unicode Character Encoding Model - if you're brave enough to tackle the mysteries of CCSs, CEFs, CESs, etc.
  • See also Sections 3.8, 9, 10 of Unicode 5 for more punishment.
  • See also the ICU page for lots of detailed documentation on how Unicode is supposed to work in running software, including discussions of what can possibly go wrong.
  • There are three encoding forms: UTF-8, UTF-16, and UTF-32. The older ISO 10646 names UCS-2 (a BMP-only subset of UTF-16) and UCS-4 (effectively UTF-32) also turn up.
  • JSON text must be Unicode
  • The default encoding of JSON is UTF-8, which effectively means UTF-8 must be supported, but JSON data can also be delivered in UTF-16 or UTF-32
  • SPARQL syntax is UTF-8 Unicode: "The encoding is always UTF-8 [RFC3629]. Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-F]". In other words, a SPARQL processor must detect and reject non-UTF-8 input. But it isn't clear whether a conformant SPARQL parser must accept Unicode expressed with escapes (which, like UTF-7, lets arbitrary code points travel as pure ASCII).
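The UTF-8-only rule and the escape syntax quoted above could be handled roughly as follows. This is a sketch, not a conformant parser: the function name and regex are invented for illustration, and it accepts lowercase hex digits as well as the uppercase ones shown in the quoted grammar.

```python
import re

def decode_sparql_escapes(raw: bytes) -> str:
    """Decode a SPARQL query: reject non-UTF-8 bytes, then expand
    \\uXXXX and \\UXXXXXXXX escapes into the code points they name."""
    # The byte stream must be UTF-8; anything else is an error to
    # reject, not something to transcode silently.
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on bad bytes

    def expand(m: re.Match) -> str:
        return chr(int(m.group(1) or m.group(2), 16))

    # \uXXXX covers U+0000..U+FFFF; \UXXXXXXXX covers U+10000 onwards.
    return re.sub(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})", expand, text)
```

Note that a real implementation must also decide *where* such escapes are legal (e.g. inside string literals and IRIs), which this whole-string substitution glosses over.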

Requirements:

  • The XML declaration of a result document should always explicitly declare the encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>)
  • Content negotiation (Accept-Charset, Content-Type 'charset' parameter, etc.) should be used to specify encodings and forms
  • A SPARQL query whose Accept header specifies JSON must return results in UTF-8 unless another charset is explicitly requested

Other:

  • Acceptance and conversion of other encodings for incoming data?
  • Collations?
  • Date comparisons?
  • Other locale-specific logic?
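As a concrete instance of the collation question: sorting by raw code point differs from what users in most locales expect. The accent-stripping key below is a crude, locale-blind stand-in for a real collator (e.g. ICU's), shown only to make the gap visible.

```python
import unicodedata

def strip_accents(s: str) -> str:
    # Decompose to NFD, then drop combining marks.
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

words = ["Émile", "apple", "Zebra"]

# Raw code-point order: uppercase ASCII sorts before lowercase, and
# "É" (U+00C9) sorts after every ASCII letter.
naive = sorted(words)  # ['Zebra', 'apple', 'Émile']

# Accent- and case-insensitive key: closer to (but still far short of)
# a real UCA/ICU collation with locale tailoring.
collated = sorted(words, key=lambda w: strip_accents(w).casefold())
# ['apple', 'Émile', 'Zebra']
```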