indigo.i18n.characters

Created: 12.07.2005
Last modified: 18.07.2005

This module implements Unicode character properties.  The function names closely follow the POSIX ctype functions, for example isAlpha(), isAlNum() and toLower().  You can use this module in place of std.ctype or indigo.posix.ctype.

Summary
isSurrogate(): Is the UTF-16 code unit a surrogate (U+D800..U+DFFF)?
isLeadSurrogate(): Is the UTF-16 code unit a lead surrogate (U+D800..U+DBFF)?
isTrailSurrogate(): Is the UTF-16 code unit a trail surrogate (U+DC00..U+DFFF)?
toUtf32(): Converts the two UTF-16 code units lead and trail, which must be a lead and a trail surrogate, to the equivalent Unicode code point.
isAlpha(): Returns true if the Unicode code point ch is alphabetic.  This covers general category “L” and some other code points.
isLower(): Returns true if the Unicode code point ch is a lowercase character.  This covers general category “Ll” and some other code points.
isUpper(): Returns true if the Unicode code point ch is an uppercase character.  This covers general category “Lu” and some other code points.
isSpace(): Returns true if the Unicode code point ch is a whitespace character.
isDigit(): Returns true if the Unicode code point ch is a decimal digit, that is, if it has general category “Nd”.
isHexDigit(): Returns true if the Unicode code point ch is a hexadecimal digit.
isAlNum(): Returns true if the Unicode code point ch is alphabetic or a decimal digit.
isDefined(): Returns true if the Unicode code point ch is “defined”, meaning that a character is assigned to it.
isControl(): Returns true if the Unicode code point ch is a control character.
toLower(): Returns the simple lowercase mapping for ch, or ch itself if there is no such mapping.
toUpper(): Returns the simple uppercase mapping for ch, or ch itself if there is no such mapping.

Functions

isSurrogate()

int isSurrogate(wchar ch)

Is the UTF-16 code unit a surrogate (U+D800..U+DFFF)?

isLeadSurrogate()

int isLeadSurrogate(wchar ch)

Is the UTF-16 code unit a lead surrogate (U+D800..U+DBFF)?

isTrailSurrogate()

int isTrailSurrogate(wchar ch)

Is the UTF-16 code unit a trail surrogate (U+DC00..U+DFFF)?
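
The three surrogate tests are plain range checks on a UTF-16 code unit.  The following is a minimal sketch of that logic, not the module's actual source: the names ending in Sketch are hypothetical stand-ins, and they return bool rather than int for brevity.  It can be run with dmd -unittest -main.

```d
// Hypothetical stand-ins for isSurrogate(), isLeadSurrogate() and
// isTrailSurrogate(); the real implementation may differ.
bool isSurrogateSketch(wchar ch)
{
    return ch >= 0xD800 && ch <= 0xDFFF;
}

bool isLeadSurrogateSketch(wchar ch)
{
    return ch >= 0xD800 && ch <= 0xDBFF;
}

bool isTrailSurrogateSketch(wchar ch)
{
    return ch >= 0xDC00 && ch <= 0xDFFF;
}

unittest
{
    // U+1F600 is encoded in UTF-16 as the pair D83D DE00.
    wchar lead = cast(wchar) 0xD83D;
    wchar trail = cast(wchar) 0xDE00;

    assert(isSurrogateSketch(lead) && isLeadSurrogateSketch(lead));
    assert(isSurrogateSketch(trail) && isTrailSurrogateSketch(trail));

    // The lead and trail ranges do not overlap.
    assert(!isTrailSurrogateSketch(lead));
    assert(!isLeadSurrogateSketch(trail));

    // An ordinary BMP code unit such as 'A' is not a surrogate.
    assert(!isSurrogateSketch('A'));
}
```

Every surrogate is either a lead or a trail surrogate, so isSurrogate() is true exactly when one of the other two tests is true.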

toUtf32()

dchar toUtf32(wchar lead,
wchar trail)

Converts the two UTF-16 code units lead and trail, which must be a lead and a trail surrogate, to the equivalent Unicode code point.  In release mode, this function does not check for invalid encodings.
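
The conversion follows the standard UTF-16 arithmetic: each surrogate carries 10 payload bits, and the pair encodes the code point minus 0x10000.  The sketch below illustrates this under stated assumptions: toUtf32Sketch is a hypothetical name, and the asserts only imitate the debug-only checking described above (D drops asserts when compiling with -release); the module's real checks may look different.  It can be run with dmd -unittest -main.

```d
// Hypothetical sketch of the surrogate-pair arithmetic; not the
// module's actual source.
dchar toUtf32Sketch(wchar lead, wchar trail)
{
    // Input validation that disappears in release builds.
    assert(lead >= 0xD800 && lead <= 0xDBFF, "not a lead surrogate");
    assert(trail >= 0xDC00 && trail <= 0xDFFF, "not a trail surrogate");

    // High 10 bits come from the lead surrogate, low 10 bits from the
    // trail surrogate; 0x10000 is the start of the supplementary planes.
    return cast(dchar)(0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00));
}

unittest
{
    // D83D DE00 -> 0x10000 + (0x3D << 10) + 0x200 = 0x1F600
    assert(toUtf32Sketch(cast(wchar) 0xD83D, cast(wchar) 0xDE00) == 0x1F600);

    // Smallest and largest supplementary code points.
    assert(toUtf32Sketch(cast(wchar) 0xD800, cast(wchar) 0xDC00) == 0x10000);
    assert(toUtf32Sketch(cast(wchar) 0xDBFF, cast(wchar) 0xDFFF) == 0x10FFFF);
}
```

Because the checks vanish in release builds, callers would typically confirm the pair with isLeadSurrogate() and isTrailSurrogate() before converting untrusted input.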

isAlpha()

public int isAlpha(dchar ch)

Returns true if the Unicode code point ch is alphabetic.  This covers general category “L” and some other code points.

isLower()

public int isLower(dchar ch)

Returns true if the Unicode code point ch is a lowercase character.  This covers general category “Ll” and some other code points.

See also isUpper().

isUpper()

public int isUpper(dchar ch)

Returns true if the Unicode code point ch is an uppercase character.  This covers general category “Lu” and some other code points.

See also isLower().

isSpace()

public int isSpace(dchar ch)

Returns true if the Unicode code point ch is a whitespace character.  This applies not only to the ASCII space, linefeed and tab, but also to many other Unicode characters from category Z.

isDigit()

public int isDigit(dchar ch)

Returns true if the Unicode code point ch is a decimal digit, that is, if it has general category “Nd”.

isHexDigit()

public int isHexDigit(dchar ch)

Returns true if the Unicode code point ch is a hexadecimal digit.

isAlNum()

public int isAlNum(dchar ch)

Returns true if the Unicode code point ch is alphabetic or a decimal digit.  This is equivalent to testing whether isAlpha() or isDigit() returns true for ch.

isDefined()

public int isDefined(dchar ch)

Returns true if the Unicode code point ch is “defined”, meaning that a character is assigned to it.

isControl()

public int isControl(dchar ch)

Returns true if the Unicode code point ch is a control character.  This is currently true for characters with the general category “Cc”, but the set might be expanded to other characters in the future.

toLower()

public dchar toLower(dchar ch)

Returns the simple lowercase mapping for ch, or ch itself if there is no such mapping.  Note that this function is not sufficient for all languages, so you should use collation where possible.

See also toUpper().

toUpper()

public dchar toUpper(dchar ch)

Returns the simple uppercase mapping for ch, or ch itself if there is no such mapping.  Note that this function is not sufficient for all languages, so you should use collation where possible.

See also toLower().
