indigo.i18n.textcodec

Created 23.07.2005
Last modified 11.08.2005

Provides the TextCodec base class to encode and decode character encodings.

Summary
CodecException      This exception is thrown by TextCodec and its subclasses.
this()              Creates a new exception with the message msg.
TextCodec           The TextCodec class is the base class of all classes used to translate character encodings.
this()              Constructs a new TextCodec and registers it.
convertFromUtf16()  Encodes the input, and returns the result.
convertToUtf16()    Decodes the input, and returns the UTF-16 result.
convertFromUtf8()   Works like convertFromUtf16(), but uses UTF-8 as the input encoding.
convertToUtf8()     Works like convertToUtf16(), but uses UTF-8 as the output encoding.
availableCodecs     Returns a list of the names and aliases for all available codecs.
availableMibs       Returns the list of the MIBs for all available codecs.
codecForLocale      Returns the codec most suitable for this locale.
codecForMib()       Returns the TextCodec that matches the mibEnum value mib.
codecForName()      Returns the TextCodec that matches name.
opIndex()           This is an alias for codecForName().
mibEnum             Returns the MIBenum value (as defined by IANA) of the encoding.
name                Returns the name of the encoding.
aliases             Returns a list of aliases for this codec.
canEncode(dchar)    Returns true if this codec can fully encode the character ch, false otherwise.
canEncode(wchar[])  Returns true if this codec can fully encode the string str, false otherwise.
canEncode(char[])   Returns true if this codec can fully encode the string str, false otherwise.
fromUtf16()         Encodes the string str, which is in UTF-16 format.
fromUtf8()          Encodes the string str, which is in UTF-8 format.
toUtf16()           Decodes the string str and returns the UTF-16 encoded result.
toUtf8()            Decodes the string str and returns the UTF-8 encoded result.
moreNeeded()        Returns the number of bytes that must be added to input for it to be a complete character according to this encoding.
maxCharSize()       Returns the maximum length in bytes of a character under this encoding.

CodecException

This exception is thrown by TextCodec and its subclasses.  Get an error description with toString().

Functions

this()

public this(char[] msg)

Creates a new exception with the message msg.

TextCodec

The TextCodec class is the base class of all classes used to translate character encodings.

Functions and Properties

this()

protected this()

Constructs a new TextCodec and registers it.  This constructor needs fully working mibEnum(), name() and aliases() functions before it may be called.  It is not reentrant.

convertFromUtf16()

abstract protected void[] convertFromUtf16(inout wchar[] input, void[] buffer)

Encodes the input and returns the result.  input is replaced by an in-place slice of the trailing code units that could not be converted because they were incomplete.  buffer should be used to avoid memory allocation.  Subclasses of TextCodec must implement this function.  If the input contains characters that cannot be encoded, a CodecException must be thrown.
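
The contract above can be illustrated with a minimal stand-alone sketch for ISO-8859-1.  It is not taken from Indigo: the free function encodeLatin1() and the local CodecException stand-in are hypothetical, and the code uses current D syntax (ref where the signature above says inout) so that it compiles on its own.

```d
// Stand-in for Indigo's CodecException so the sketch compiles on its own.
class CodecException : Exception
{
    this(string msg) { super(msg); }
}

// Hypothetical ISO-8859-1 encoder following the convertFromUtf16() contract.
ubyte[] encodeLatin1(ref const(wchar)[] input, ubyte[] buffer)
{
    // Use the caller's buffer when it is large enough, otherwise allocate.
    if (buffer.length < input.length)
        buffer = new ubyte[input.length];

    foreach (i, wchar c; input)
    {
        // Anything above U+00FF (including surrogates) has no Latin-1 form.
        if (c > 0xFF)
            throw new CodecException("character cannot be encoded as ISO-8859-1");
        buffer[i] = cast(ubyte) c;
    }

    auto result = buffer[0 .. input.length];
    // Every code unit was complete, so the leftover slice is empty.
    input = input[$ .. $];
    return result;
}
```

Because Latin-1 maps one code unit to one byte, the leftover slice is always empty here; an encoder for a target that needs surrogate pairs would leave a trailing lone high surrogate in input instead.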

convertToUtf16()

abstract protected wchar[] convertToUtf16(inout void[] input, wchar[] buffer)

Decodes the input and returns the UTF-16 result.  input is replaced by an in-place slice of the trailing bytes that could not be converted because they were incomplete.  buffer should be used to avoid memory allocation.  Subclasses of TextCodec must implement this function.  If there are invalid bytes in the input, a CodecException must be thrown.  The returned string must be null-terminated, but the terminating null must not be included in the returned array.
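
As a rough illustration of the trailing-slice and null-termination rules, the following stand-alone sketch decodes little-endian UTF-16 bytes.  The function decodeUtf16le() is hypothetical and not part of Indigo, it uses current D syntax (ref instead of inout), and it skips the surrogate validation that a real codec would need in order to throw CodecException on invalid input.

```d
// Hypothetical UTF-16LE decoder following the convertToUtf16() contract.
wchar[] decodeUtf16le(ref const(ubyte)[] input, wchar[] buffer)
{
    size_t count = input.length / 2;   // number of complete code units

    // Reserve one extra slot for the terminating null.
    if (buffer.length < count + 1)
        buffer = new wchar[count + 1];

    foreach (i; 0 .. count)
        buffer[i] = cast(wchar)(input[2 * i] | (input[2 * i + 1] << 8));

    buffer[count] = 0;                 // terminator sits just past the result
    input = input[2 * count .. $];     // zero or one incomplete byte remains
    return buffer[0 .. count];         // the null is not part of the slice
}
```

A caller that reads from a stream can append the next chunk to whatever is left in input and call the function again.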

convertFromUtf8()

void[] convertFromUtf8(inout char[] input, void[] buffer, inout wchar[] utf16Buffer)

Works like convertFromUtf16(), but uses UTF-8 as the input encoding.  This function has a default implementation that converts the input into UTF-16 and calls convertFromUtf16(); derived classes can override it if they know a faster way.

convertToUtf8()

char[] convertToUtf8(inout void[] input, char[] buffer, inout wchar[] utf16Buffer)

Works like convertToUtf16(), but uses UTF-8 as the output encoding.  This function has a default implementation that calls convertToUtf16() and converts the output to UTF-8; derived classes can override it if they know a faster way.

availableCodecs

Returns a list of the names and aliases for all available codecs.  A codec that has aliases appears in the list several times, once under its name and once under each alias.

See also codecForName(), name, aliases and availableMibs.

availableMibs

Returns the list of the MIBs for all available codecs.

See also codecForMib(), mibEnum and availableCodecs.

codecForLocale

Read

Returns the codec most suitable for this locale.

Write

Sets the locale codec.  This might be needed for some applications that want to use their own mechanism for setting the locale.

codecForMib()

static TextCodec codecForMib(int mib)

Returns the TextCodec that matches the mibEnum value mib.  If there is no such codec, null is returned.

See also codecForName().

TODO Change implementation as soon as DMD problem is fixed.

codecForName()

static TextCodec codecForName(char[] name)

Returns the TextCodec that matches name.  If there is no such codec, null is returned. name is compared to the codecs’ name and aliases.

See also codecForMib().

FIXME This should be case-insensitive at least, better a fuzzy matching (for example “UTF-8” should also be recognized as “utf8”).

TODO Change implementation as soon as DMD problem is fixed.
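
One possible way to get the case-insensitive, fuzzy matching mentioned in the FIXME note is to normalize both the requested name and the registered names before comparing them.  The helper below is only a sketch: normalizeCodecName() is a hypothetical name, it is not part of Indigo, and it relies on the std.ascii module of the current D standard library.

```d
import std.ascii : isAlphaNum, toLower;

// Hypothetical helper: keeps ASCII letters and digits only, lowercased,
// so that "UTF-8", "utf8" and "Utf_8" all normalize to the same key.
string normalizeCodecName(const(char)[] name)
{
    char[] result;
    foreach (char c; name)
    {
        if (isAlphaNum(c))
            result ~= toLower(c);
    }
    return result.idup;
}
```

Dropping all punctuation is a deliberately coarse rule; whether it could make two distinct registered names collide would have to be checked against the names and aliases actually in use.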

Operators

opIndex()

alias codecForName opIndex

This is an alias for codecForName().

Properties

mibEnum

Returns the MIBenum value (as defined by IANA) of the encoding.  This function must be overridden by derived classes.  For a list of numbers see http://www.iana.org/assignments/character-sets.

name

Returns the name of the encoding.  It should be the standard IANA name if one exists.  This function must be overridden by derived classes.  For a list of names see http://www.iana.org/assignments/character-sets.

aliases

Returns a list of aliases for this codec.  Override this function in derived classes if there are aliases for the codec.  The default returns an empty list.  For a list of alias names see http://www.iana.org/assignments/character-sets.

Functions

canEncode(dchar)

final int canEncode(dchar ch)

Returns true if this codec can fully encode the character ch, false otherwise.

canEncode(wchar[])

final int canEncode(wchar[] str)

Returns true if this codec can fully encode the string str, false otherwise.

canEncode(char[])

final int canEncode(char[] str)

Returns true if this codec can fully encode the string str, false otherwise.

fromUtf16()

final void[] fromUtf16(wchar[] str, void[] buffer = null)

Encodes the string str, which is in UTF-16 format.  If you specify a buffer, the result is stored in it to avoid reallocation.

fromUtf8()

final void[] fromUtf8(char[] str, void[] buffer = null)

Encodes the string str, which is in UTF-8 format.  If you specify a buffer, the result is stored in it to avoid reallocation.

toUtf16()

final wchar[] toUtf16(void[] str, wchar[] buffer = null)

Decodes the string str and returns the UTF-16 encoded result.  This function throws an exception if str contains an incomplete encoding at the end.

toUtf8()

final char[] toUtf8(void[] str, char[] buffer = null)

Decodes the string str and returns the UTF-8 encoded result.  This function throws an exception if str contains an incomplete encoding at the end.

moreNeeded()

int moreNeeded(void[] input)

Returns the number of bytes that must be added to input for it to be a complete character according to this encoding.  If an implementation knows the exact number of bytes required then it will return this number; if it only knows the minimum then it will return the negative of this value.  If input is null or zero-length, then the return value corresponds to the minimum length of a character under the encoding.

This is useful for multibyte encodings, to determine if more bytes need to be read from a stream before a character is extracted.  The function should be called only on the beginning or whole of a single character, or on a null or empty string.

The default implementation returns 1 - input.length, and can be used as is for encodings where every character is a single byte.
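
To make the return-value convention concrete, here is a small stand-alone sketch of the single-byte default next to a UTF-8 variant.  Both free functions are hypothetical and not taken from Indigo; the UTF-8 version throws a plain Exception where a real codec would presumably throw CodecException.

```d
// Default behaviour described above: every character is exactly one byte.
int moreNeededSingleByte(const(ubyte)[] input)
{
    return 1 - cast(int) input.length;
}

// Hypothetical UTF-8 variant.  The lead byte fixes the character length,
// so the exact number of missing bytes is always known (never negative).
int moreNeededUtf8(const(ubyte)[] input)
{
    if (input.length == 0)
        return 1; // minimum length of a UTF-8 character

    immutable ubyte lead = input[0];
    int total;
    if (lead < 0x80)
        total = 1;
    else if ((lead & 0xE0) == 0xC0)
        total = 2;
    else if ((lead & 0xF0) == 0xE0)
        total = 3;
    else if ((lead & 0xF8) == 0xF0)
        total = 4;
    else
        throw new Exception("invalid UTF-8 lead byte");

    immutable have = cast(int) input.length;
    return have >= total ? 0 : total - have;
}
```

A stream reader would keep fetching bytes while the result is non-zero and extract the character once it reaches 0.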

maxCharSize()

size_t maxCharSize()

Returns the maximum length in bytes of a character under this encoding.

The default implementation simply returns 1, and can be used as is for encodings where every character is a single byte.
