Unicode » History » Version 4
Gregg -, 09/09/2009 03:30 AM
h1. Unicode Support

Background:

* See "UTR 17, Unicode Character Encoding Model":http://unicode.org/reports/tr17/ - if you're brave enough to tackle the mysteries of CCSs, CEFs, CESs, etc.
* See also "Sections 3.8, 3.9, and 3.10":http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf of Unicode 5.0 for more punishment.
* See also the "ICU":http://site.icu-project.org/ page for lots of detailed documentation on how Unicode is supposed to work in running software, including discussions of what can possibly go wrong.
* There are three encoding forms: UTF-8, UTF-16, and UTF-32. There are also UCS-2 (fixed 16-bit units, so limited to U+0000 through U+FFFF) and UCS-4 (functionally the same as UTF-32).
* JSON text must be encoded in Unicode.
* The default encoding of JSON is UTF-8, which effectively means UTF-8 must be supported, but JSON data can also be delivered in the other two forms, UTF-16 and UTF-32 (see RFC 4627, Section 3).
* SPARQL syntax is UTF-8 Unicode: "The encoding is always UTF-8 [RFC3629]. Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-F]". In other words, a SPARQL processor must detect and reject input that is not UTF-8. But it isn't clear whether a conformant SPARQL parser _must_ accept Unicode expressed with escapes (an ASCII-only notation, similar in spirit to UTF-7, though not UTF-7 itself).

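The SPARQL point above can be made concrete with a small sketch. It assumes a front end that receives the query as raw bytes, rejects anything that is not valid UTF-8, and then expands the @\uXXXX@ and @\UXXXXXXXX@ escapes using the uppercase hex range quoted above. The names @decode_query@ and @expand_escapes@ are hypothetical, and whether a conformant parser is obliged to do the expansion is still the open question noted in the list.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits), using the
# [0-9A-F] range quoted from the SPARQL spec. Accepting lowercase hex
# digits as well would be an extension beyond that quote.
_ESCAPE = re.compile(r"\\u([0-9A-F]{4})|\\U([0-9A-F]{8})")


def decode_query(raw: bytes) -> str:
    """Decode query bytes as strict UTF-8; raise ValueError for anything else."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"query is not valid UTF-8: {exc}") from exc


def expand_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the code points they name."""

    def _substitute(match):
        code_point = int(match.group(1) or match.group(2), 16)
        # Surrogates and values above U+10FFFF are not valid scalar values.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"invalid code point in escape {match.group(0)}")
        return chr(code_point)

    return _ESCAPE.sub(_substitute, text)
```

A caller would run @expand_escapes(decode_query(raw_bytes))@ before handing the text to the parser. This sketch does not handle escaped backslashes or other corner cases of the grammar.
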
Requirements:

* The XML header (the XML declaration) of a result should always explicitly declare the encoding.
* Content negotiation (Accept-Charset, the Content-Type 'charset' parameter, etc.) should be used to specify encodings and forms.
* A SPARQL query whose Accept header specifies JSON must always return results in UTF-8 if no other charset is requested.

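As a rough illustration of the requirements above, here is a minimal sketch of charset negotiation for a result. Everything named here is an assumption for illustration: the @SUPPORTED@ list, the helper names, the fallback to UTF-8 when nothing acceptable is offered, and the use of a 'charset' parameter on the @application/sparql-results+json@ media type.

```python
from typing import List, Optional, Tuple

# Assumption: the server can emit these three encoding forms.
SUPPORTED = ("utf-8", "utf-16", "utf-32")


def parse_accept_charset(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Charset value into (charset, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # q defaults to 1 when absent
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs.append((name.strip().lower(), q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_charset(accept_charset: Optional[str]) -> str:
    """Pick a charset, defaulting to UTF-8 when none is requested."""
    if not accept_charset:
        return "utf-8"
    for name, q in parse_accept_charset(accept_charset):
        if q <= 0:
            continue
        if name == "*" or name in SUPPORTED:
            return "utf-8" if name == "*" else name
    return "utf-8"  # assumption: fall back rather than answer 406


def json_response(text: str, accept_charset: Optional[str]) -> Tuple[str, bytes]:
    """Return a (Content-Type, body) pair for JSON results."""
    charset = choose_charset(accept_charset)
    content_type = f"application/sparql-results+json; charset={charset}"
    return content_type, text.encode(charset)


def xml_declaration(charset: str) -> str:
    """Build an XML declaration that states the encoding explicitly."""
    return f'<?xml version="1.0" encoding="{charset}"?>'
```

With no Accept-Charset header, @json_response@ encodes the body as UTF-8, which matches the third requirement. Note that Python's @utf-16@ and @utf-32@ codecs prepend a byte order mark, which may or may not be wanted.
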
Other:

* Acceptance and conversion of other encodings for incoming data?
* Collations?
* Date comparisons?
* Other locale-specific logic?
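
On the first open question, one possible approach for incoming JSON is sketched below; it is an illustration, not a decision. It relies on the observation in RFC 4627, Section 3 that the first two characters of a JSON text are always ASCII, so the pattern of zero bytes in the first four octets identifies the encoding form. The function names are hypothetical, and the sketch assumes the input carries no byte order mark.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding form of a JSON text from its first four octets."""
    head = data[:4]
    if len(head) < 4:
        return "utf-8"  # too short to tell; assume the JSON default
    nulls = tuple(byte == 0 for byte in head)
    if nulls == (True, True, True, False):
        return "utf-32-be"  # 00 00 00 xx
    if nulls == (True, False, True, False):
        return "utf-16-be"  # 00 xx 00 xx
    if nulls == (False, True, True, True):
        return "utf-32-le"  # xx 00 00 00
    if nulls == (False, True, False, True):
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx


def to_utf8(data: bytes) -> bytes:
    """Transcode incoming JSON bytes to UTF-8; invalid input raises an error."""
    return data.decode(detect_json_encoding(data)).encode("utf-8")
```

Normalising everything to UTF-8 at the boundary would keep the rest of the pipeline single-encoding; whether other (non-Unicode) encodings should be accepted at all remains open.
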