The CharacterEncoding class defines the interface that byte and
character encodings provide for predicates and conversions.
deferred String name;
deferred char decode byte b;
Return the decoding of b, i.e. the Unicode character corresponding to the byte b in the receiving encoding.
deferred byte encode char c;
Return the encoding of c. If the byte equivalent of the character c does not exist in the receiving encoding, an encoding-condition is signaled. The encoded byte is then the byteValue of the object returned from signaling the condition, or 127 if nil is returned.
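The fallback rule for encode can be illustrated with a minimal Python sketch. It is not the library's implementation: the names EncodingCondition, FALLBACK_BYTE, and the handler callback are hypothetical stand-ins for the encoding-condition and its handler, and masking with 0xFF is only an assumed analogue of byteValue.

```python
class EncodingCondition(Exception):
    """Hypothetical stand-in for the encoding-condition described above."""

    def __init__(self, char):
        super().__init__(f"no byte equivalent for {char!r}")
        self.char = char


FALLBACK_BYTE = 127  # byte used when the handler returns None (nil)


def encode(char, table, handler=None):
    """Encode one character using `table`, a dict from character to byte."""
    if char in table:
        return table[char]
    # No byte equivalent: hand the condition to the handler, if there is one.
    replacement = handler(EncodingCondition(char)) if handler else None
    if replacement is None:
        return FALLBACK_BYTE
    return replacement & 0xFF  # assumed analogue of byteValue


# A 7-bit ASCII table, used only for illustration.
ascii_table = {chr(i): i for i in range(128)}
```

With this sketch, `encode("A", ascii_table)` yields 65, an unmappable character yields 127 when no handler is given, and a handler that returns `ord("?")` substitutes a question mark instead.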
deferred boolean isAlpha byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a letter.
deferred boolean isDigit byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a digit.
deferred boolean isLower byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a lowercase letter.
deferred boolean isPunct byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a punctuation character.
deferred boolean isSpace byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a space character.
deferred boolean isUpper byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is an uppercase letter.
deferred byte toLower byte b;
Return the lowercase version of b, according to the receiving encoding. If the character is not uppercase, it is returned unchanged.
deferred byte toUpper byte b;
Return the uppercase version of b, according to the receiving encoding. If the character is not lowercase, it is returned unchanged.
deferred int digitValue byte b;
Return the numeric value of the digit b in the receiving encoding.
deferred int alphaValue byte b;
Return the value of the letter b relative to the start of its letter range. Thus, 'a' returns 0, 'f' returns 5, etc.
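As an illustration of digitValue and alphaValue, here is a small Python sketch for the ASCII letter and digit ranges only. The function names are hypothetical, and raising ValueError for bytes outside those ranges is an assumption, since the behaviour for such input is not described here.

```python
def digit_value(b):
    """Return the numeric value of the ASCII digit byte `b`."""
    if not 0x30 <= b <= 0x39:  # '0'..'9'
        raise ValueError(f"byte {b} is not a digit")  # assumed behaviour
    return b - 0x30


def alpha_value(b):
    """Return the offset of the ASCII letter byte `b` within its range."""
    if 0x61 <= b <= 0x7A:  # 'a'..'z'
        return b - 0x61
    if 0x41 <= b <= 0x5A:  # 'A'..'Z'
        return b - 0x41
    raise ValueError(f"byte {b} is not a letter")  # assumed behaviour
```

Both cases of a letter map to the same offset, so `alpha_value(ord("f"))` and `alpha_value(ord("F"))` are both 5, which is what makes this kind of function convenient for parsing hexadecimal or base-36 digits.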
The CharEncoding class maintains information on a particular mapping
for encoding a subset of Unicode characters to 8-bit bytes. An example
of such a mapping is iso-8859-1, the well-known Western European byte
encoding, of which USASCII is a subset.
static MutableDictionary encodings;
ByteArray loadBytes int num from String name extension String ext;
Load num bytes from the file with the given name and the extension ext (without the dot). The full path of the file is obtained from the main Bundle.
instance (id) named String name;
Return the CharEncoding known by the given name. This always succeeds, as a CharEncoding reads the resources it needs on demand.
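The on-demand behaviour can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: the LazyEncoding class and its registry are hypothetical, the idea that named caches instances in the encodings dictionary is a guess, and the loader simply fabricates a Latin-1 table instead of reading a resource file.

```python
class LazyEncoding:
    # Assumed to play the role of the static `encodings` dictionary.
    registry = {}

    def __init__(self, name):
        self.name = name
        self._decoding = None  # nothing is loaded at construction time

    @classmethod
    def named(cls, name):
        """Return the encoding registered under `name`, creating it if needed."""
        enc = cls.registry.get(name)
        if enc is None:
            enc = cls(name)  # cheap: no resources are read here
            cls.registry[name] = enc
        return enc

    def decoding(self):
        """Return the decoding table, loading it on first use."""
        if self._decoding is None:
            self._decoding = self._load_decoding()
        return self._decoding

    def _load_decoding(self):
        # Placeholder for reading a resource file. In Latin-1, byte n
        # decodes to code point n, so the table can be generated directly.
        return [chr(i) for i in range(256)]
```

Because construction does no I/O, a lookup by name cannot fail at that point; under this design any problem with a missing resource would surface only when a table is first needed.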
public String name;
CharArray decoding;
IntDictionary encoding;
ByteArray to_lower;
ByteArray to_upper;
ByteArray to_title;
ByteArray is_digit;
ByteArray is_letter;
ByteArray is_lower;
ByteArray is_punct;
ByteArray is_space;
ByteArray is_upper;
id init String n;
char decode byte b;
Return the decoding of b, i.e. the Unicode character corresponding to the byte b in the receiving encoding.
CharArray decoding;
Return the decoding map, reading it if necessary.
byte encode char c;
Return the encoding of c. If the byte equivalent of the character c does not exist in the receiving encoding, an encoding-condition is signaled. The encoded byte is then the byteValue of the object returned from signaling the condition, or 127 if nil is returned.
IntDictionary encoding;
Return the encoding map, creating it from the decoding map if necessary.
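Deriving an encoding map from a decoding map amounts to inverting a 256-entry table. The Python sketch below shows one way to do it; build_encoding is a hypothetical helper, and keeping the first byte when two bytes decode to the same character is an assumption.

```python
def build_encoding(decoding):
    """Invert a decoding table (byte index -> character) into a dict
    from Unicode code point to byte."""
    encoding = {}
    for byte, char in enumerate(decoding):
        # setdefault keeps the first byte seen for a duplicated character.
        encoding.setdefault(ord(char), byte)
    return encoding


# Latin-1 decoding table, generated here purely for illustration.
latin1_decoding = [chr(i) for i in range(256)]
latin1_encoding = build_encoding(latin1_decoding)
```

Keying the result by integer code point mirrors the IntDictionary type of the encoding map; code points missing from the dictionary are exactly the characters that have no byte equivalent.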
protected ByteArray loadConversion String conversion;
Load the conversion table named conversion for the receiving encoding.
protected ByteArray loadPredicateSet String predicate;
Load the predicate set named predicate for the receiving encoding.
boolean isAlpha byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a letter.
boolean isDigit byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a digit.
boolean isLower byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a lowercase letter.
boolean isPunct byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a punctuation character.
boolean isSpace byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is a space character.
boolean isUpper byte b;
Return TRUE iff the character denoted by the byte b in the receiving encoding is an uppercase letter.
byte toLower byte b;
Return the lowercase version of b, according to the receiving encoding. If the character is not uppercase, it is returned unchanged.
byte toUpper byte b;
Return the uppercase version of b, according to the receiving encoding. If the character is not lowercase, it is returned unchanged.
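Case conversion within a byte encoding can be done with a 256-entry lookup table. The Python sketch below builds such tables for a Latin-1 style encoding, in which byte n is code point n. The names build_table, TO_LOWER, and TO_UPPER are hypothetical, and leaving a byte as it is when its converted form does not fit in one byte is an assumption that matches the "returned unchanged" rule.

```python
def build_table(convert):
    """Build a 256-byte conversion table from a str -> str case function."""
    table = bytearray(range(256))  # start from the identity mapping
    for b in range(256):
        mapped = convert(chr(b))
        # Keep only results that are a single character within the encoding.
        if len(mapped) == 1 and ord(mapped) < 256:
            table[b] = ord(mapped)
    return bytes(table)


TO_LOWER = build_table(str.lower)
TO_UPPER = build_table(str.upper)


def to_lower(b):
    return TO_LOWER[b]


def to_upper(b):
    return TO_UPPER[b]
```

Bytes such as 0xDF (sharp s) and 0xFF (y with diaeresis) have uppercase forms outside the 8-bit range, so they map to themselves in this sketch.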
int digitValue byte b;
Return the numeric value of the digit b in the receiving encoding.
int alphaValue byte b;
Return the value of the letter b relative to the start of its letter range. Thus, 'a' returns 0, 'f' returns 5, etc.
The USASCIIEncoding class is the CharEncoding used during program
initialization.
static USASCIIEncoding shared;
The shared USASCIIEncoding object.
instance (id) shared;
String name;
char decode byte b;
byte encode char c;
boolean isAlpha byte b;
boolean isDigit byte b;
boolean isLower byte b;
boolean isPunct byte b;
boolean isSpace byte b;
boolean isUpper byte b;
byte toLower byte b;
byte toUpper byte b;
int digitValue byte b;
int alphaValue byte b;