indigo.i18n.textcodec
| Created | 23.07.2005 |
| Last modified | 11.08.2005 |
Provides the TextCodec base class to encode and decode character encodings.
Summary
| Name | Description |
| CodecException | This exception is thrown by TextCodec and its subclasses. |
| this() | Creates a new exception with the message msg. |
| TextCodec | The TextCodec class is the base class of all classes used to translate character encodings. |
| this() | Constructs a new TextCodec and registers it. |
| convertFromUtf16() | Encodes the input, and returns the result. |
| convertToUtf16() | Decodes the input, and returns the UTF-16 result. |
| convertFromUtf8() | Works like convertFromUtf16(), but uses UTF-8 as the input encoding. |
| convertToUtf8() | Works like convertToUtf16(), but uses UTF-8 as the output encoding. |
| availableCodecs | Returns a list of the names and aliases for all available codecs. |
| availableMibs | Returns the list of the MIBs for all available codecs. |
| codecForLocale | Returns the codec most suitable for this locale. |
| codecForMib() | Returns the TextCodec that matches the mibEnum value mib. |
| codecForName() | Returns the TextCodec that matches name. |
| opIndex() | This is an alias for codecForName(). |
| mibEnum | Returns the MIBenum value (as defined by IANA) of the encoding. |
| name | Returns the name of the encoding. |
| aliases | Returns a list of aliases for this codec. |
| canEncode(dchar) | Returns true if this codec can fully encode the character ch, false otherwise. |
| canEncode(wchar[]) | Returns true if this codec can fully encode the string str, false otherwise. |
| canEncode(char[]) | Returns true if this codec can fully encode the string str, false otherwise. |
| fromUtf16() | Encodes the string str, which is in UTF-16 format. |
| fromUtf8() | Encodes the string str, which is in UTF-8 format. |
| toUtf16() | Decodes the string str and returns the UTF-16 encoded result. |
| toUtf8() | Decodes the string str and returns the UTF-8 encoded result. |
| moreNeeded() | Returns the number of bytes that must be added to input for it to be a complete character according to this encoding. |
| maxCharSize() | Returns the maximum length in bytes of a character under this encoding. |
CodecException
This exception is thrown by TextCodec and its subclasses. Get an error description with toString().
this()
Creates a new exception with the message msg.
TextCodec
The TextCodec class is the base class of all classes used to translate character encodings.
this()
Constructs a new TextCodec and registers it. The mibEnum(), name() and aliases() functions of the codec must be fully working before this constructor is called. The constructor is not reentrant.
convertFromUtf16()
abstract protected void[] convertFromUtf16(inout wchar[] input, void[] buffer)
Encodes the input, and returns the result. input is replaced by an in-place slice of the trailing code units that could not be converted because they were incomplete. buffer should be used to avoid memory allocation. Subclasses of TextCodec must implement this function. If the input contains characters that cannot be encoded, a CodecException must be thrown.
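The contract above can be illustrated with a minimal sketch of an ISO-8859-1 encoder. It is written as a free function in current D so that it compiles on its own; it is not a TextCodec subclass and not the library's implementation. The function name encodeLatin1 and the local CodecException class are stand-ins assumed for this example, ref takes the place of the documented inout, and ubyte[] takes the place of void[].

```d
/// Stand-in for the library's CodecException (assumed for this sketch).
class CodecException : Exception
{
    this(string msg) { super(msg); }
}

/// Encodes UTF-16 code units as ISO-8859-1 bytes.
/// `buffer` is reused when it is large enough; `input` is narrowed to the
/// unconverted tail, which is always empty for this single-unit encoding.
ubyte[] encodeLatin1(ref const(wchar)[] input, ubyte[] buffer)
{
    ubyte[] result = buffer.length >= input.length
        ? buffer[0 .. input.length]
        : new ubyte[input.length];

    foreach (i, unit; input)
    {
        // ISO-8859-1 covers U+0000 to U+00FF only; anything else
        // (including surrogates) cannot be encoded.
        if (unit > 0xFF)
            throw new CodecException("character cannot be encoded as ISO-8859-1");
        result[i] = cast(ubyte) unit;
    }

    input = input[$ .. $]; // everything was consumed
    return result;
}
```

Passing a preallocated buffer of at least input.length bytes avoids the allocation; passing null lets the sketch allocate.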
convertToUtf16()
abstract protected wchar[] convertToUtf16(inout void[] input, wchar[] buffer)
Decodes the input, and returns the UTF-16 result. input is replaced by an in-place slice of the trailing bytes that could not be converted because they were incomplete. buffer should be used to avoid memory allocation. Subclasses of TextCodec must implement this function. If there are invalid bytes in the input, a CodecException must be thrown. The returned string must be null-terminated, but the terminating null is not included in the returned array.
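As a rough illustration of this decoding contract, the sketch below decodes big-endian two-byte code units (UCS-2BE), an encoding chosen here only because an odd trailing byte makes the "incomplete input" case easy to see. It is a free function in current D, not the library's code; decodeUcs2Be and the local CodecException are assumed names, and ref and ubyte[] replace the documented inout and void[].

```d
/// Stand-in for the library's CodecException (assumed for this sketch).
class CodecException : Exception
{
    this(string msg) { super(msg); }
}

/// Decodes big-endian two-byte code units (UCS-2BE) into UTF-16.
wchar[] decodeUcs2Be(ref const(ubyte)[] input, wchar[] buffer)
{
    immutable size_t units = input.length / 2;

    // One extra slot holds the terminating null that the contract asks for.
    wchar[] store = buffer.length > units ? buffer : new wchar[units + 1];

    foreach (i; 0 .. units)
    {
        immutable uint unit = (input[2 * i] << 8) | input[2 * i + 1];
        // UCS-2 has no surrogate pairs, so surrogate values are invalid here.
        if (unit >= 0xD800 && unit <= 0xDFFF)
            throw new CodecException("invalid UCS-2 code unit");
        store[i] = cast(wchar) unit;
    }
    store[units] = 0;

    // Leave only the incomplete trailing byte (if any) in `input`.
    input = input[2 * units .. $];
    return store[0 .. units];
}
```

A caller that reads from a stream can prepend the bytes left in input to the next chunk before calling the decoder again.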
convertFromUtf8()
void[] convertFromUtf8(inout char[] input, void[] buffer, inout wchar[] utf16Buffer)
Works like convertFromUtf16(), but uses UTF-8 as the input encoding. This function has a default implementation that converts the input into UTF-16 and calls convertFromUtf16(); derived classes can override it if they know a faster way.
convertToUtf8()
char[] convertToUtf8(inout void[] input, char[] buffer, inout wchar[] utf16Buffer)
Works like convertToUtf16(), but uses UTF-8 as the output encoding. This function has a default implementation that calls convertToUtf16() and converts the output to UTF-8; derived classes can override it if they know a faster way.
availableCodecs
Returns a list of the names and aliases for all available codecs. A codec that has aliases appears in the list once for its name and once for each alias. See also codecForName(), name, aliases and availableMibs.
codecForLocale
Read: Returns the codec most suitable for this locale.
Write: Sets the locale codec. This might be needed for some applications that want to use their own mechanism for setting the locale.
codecForMib()
static TextCodec codecForMib(int mib)
Returns the TextCodec that matches the mibEnum value mib. If there is no such codec, null is returned. See also codecForName(). TODO Change implementation as soon as DMD problem is fixed.
codecForName()
static TextCodec codecForName(char[] name)
Returns the TextCodec that matches name. If there is no such codec, null is returned. name is compared against each codec's name and aliases. See also codecForMib().
FIXME: The comparison should at least be case-insensitive; fuzzy matching would be better (for example, “UTF-8” should also be recognized as “utf8”).
TODO: Change implementation as soon as DMD problem is fixed.
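One possible way to get the matching that the FIXME asks for is to compare normalized keys instead of raw names. The sketch below is only an assumption about how that could look; codecNameKey and sameCodecName are hypothetical helpers that are not part of the documented API, and the code is written in current D.

```d
import std.ascii : isAlphaNum, toLower;

/// Hypothetical helper: reduces a codec name to a comparison key by
/// dropping everything except ASCII letters and digits and folding
/// letters to lower case.
string codecNameKey(const(char)[] name)
{
    char[] key;
    foreach (char c; name)
    {
        if (isAlphaNum(c))
            key ~= toLower(c);
    }
    return key.idup;
}

/// Two names are treated as the same codec when their keys are equal.
bool sameCodecName(const(char)[] a, const(char)[] b)
{
    return codecNameKey(a) == codecNameKey(b);
}
```

With this scheme “UTF-8”, “utf8” and “Utf_8” all map to the key “utf8”. The trade-off is that names differing only in punctuation can no longer be told apart, so a lookup built on it would need to check that no two registered names share a key.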
opIndex()
alias codecForName opIndex
This is an alias for codecForName().
aliases
Returns a list of aliases for this codec. Override this function in derived classes if there are aliases for the codec. The default returns an empty list. For a list of alias names see http://www.iana.org/assignments/character-sets.
canEncode(dchar)
final int canEncode(dchar ch)
Returns true if this codec can fully encode the character ch, false otherwise.
canEncode(wchar[])
final int canEncode(wchar[] str)
Returns true if this codec can fully encode the string str, false otherwise.
canEncode(char[])
final int canEncode(char[] str)
Returns true if this codec can fully encode the string str, false otherwise.
fromUtf16()
final void[] fromUtf16(wchar[] str, void[] buffer = null)
Encodes the string str, which is in UTF-16 format. If you specify a buffer, the result is stored in it to avoid reallocation.
fromUtf8()
final void[] fromUtf8(char[] str, void[] buffer = null)
Encodes the string str, which is in UTF-8 format. If you specify a buffer, the result is stored in it to avoid reallocation.
toUtf16()
final wchar[] toUtf16(void[] str, wchar[] buffer = null)
Decodes the string str and returns the UTF-16 encoded result. This function throws an exception if str contains an incomplete encoding at the end.
toUtf8()
final char[] toUtf8(void[] str, char[] buffer = null)
Decodes the string str and returns the UTF-8 encoded result. This function throws an exception if str contains an incomplete encoding at the end.
moreNeeded()
int moreNeeded(void[] input)
Returns the number of bytes that must be added to input for it to be a complete character according to this encoding. If an implementation knows the exact number of bytes required, it returns this number; if it only knows the minimum, it returns the negative of that value. If input is null or zero-length, the return value corresponds to the minimum length of a character under the encoding.
This is useful for multibyte encodings, to determine whether more bytes need to be read from a stream before a character is extracted. The function should be called only on the beginning or whole of a single character, or on a null or empty string.
The default implementation returns 1 - input.length, and can be used as is for encodings where every character is a single byte.
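To make the return convention concrete, here is a sketch of what a moreNeeded() override might compute for UTF-8, where the first byte of a character determines its total length. This is an assumption for illustration, written as a free function in current D; moreNeededUtf8 is a hypothetical name, ubyte[] stands in for void[], and the error handling (a plain Exception) is a choice made for this sketch rather than documented behavior.

```d
/// Hypothetical UTF-8 counterpart of moreNeeded().
int moreNeededUtf8(const(ubyte)[] input)
{
    // Empty input: report the minimum character length, which is 1 byte.
    if (input.length == 0)
        return 1;

    immutable ubyte lead = input[0];
    int total;
    if (lead < 0x80)                total = 1; // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) total = 2; // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) total = 3; // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) total = 4; // 11110xxx
    else
        throw new Exception("not the start of a UTF-8 character");

    // The lead byte fixes the total length, so the count is exact and the
    // negative "minimum only" form is never needed.
    immutable int have = cast(int) input.length;
    return have >= total ? 0 : total - have;
}
```

An encoding in which the length cannot be read from the first byte would instead return the negated minimum, and the caller would call the function again after reading that many bytes.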
maxCharSize()
Returns the maximum length in bytes of a character under this encoding. The default implementation simply returns 1, and can be used as is for encodings where every character is a single byte.