CHAR signature
The CHAR signature defines a type char of characters and provides basic operations and predicates on values of that type. Characters are linearly ordered. In addition, characters are encoded as a contiguous range of non-negative integers, and this encoding preserves the linear ordering.
There are two structures matching the CHAR signature. The Char structure defines a superset of the usual ASCII characters and locale-independent operations on them. For this structure, Char.maxOrd = 255.
The optional WideChar structure defines wide characters, which are represented by a fixed number of 8-bit words (bytes). If the WideChar structure is provided, it is distinct from the Char structure.
signature CHAR
structure Char : CHAR
structure WideChar : CHAR
eqtype char
eqtype string
val minChar : char
val maxChar : char
val maxOrd : int
val ord : char -> int
val chr : int -> char
val succ : char -> char
val pred : char -> char
val < : (char * char) -> bool
val <= : (char * char) -> bool
val > : (char * char) -> bool
val >= : (char * char) -> bool
val compare : (char * char) -> order
val contains : string -> char -> bool
val notContains : string -> char -> bool
val toLower : char -> char
val toUpper : char -> char
val isAlpha : char -> bool
val isAlphaNum : char -> bool
val isAscii : char -> bool
val isCntrl : char -> bool
val isDigit : char -> bool
val isGraph : char -> bool
val isHexDigit : char -> bool
val isLower : char -> bool
val isPrint : char -> bool
val isSpace : char -> bool
val isPunct : char -> bool
val isUpper : char -> bool
val fromString : String.string -> char option
val scan : (Char.char, 'a) StringCvt.reader -> 'a -> (char * 'a) option
val toString : char -> String.string
val fromCString : String.string -> char option
val toCString : char -> String.string
eqtype char
eqtype string
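minChar
The least character in the ordering. It always equals chr 0.
maxChar
The greatest character in the ordering. It equals chr maxOrd.
maxOrd
The greatest character code. It equals ord maxChar.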
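As a small illustration of how the three bounds relate, the following sketch inspects them for the default Char structure. It assumes an implementation that provides Char as described on this page; the binding names are illustrative only.

```sml
(* The bounds of the Char structure and how they relate. *)
val lowest  = Char.minChar        (* equals Char.chr 0 *)
val highest = Char.maxChar        (* equals Char.chr Char.maxOrd *)
val count   = Char.maxOrd + 1     (* number of distinct characters; 256 for Char *)
```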
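ord c
returns the (non-negative) integer code of the character c.
chr i
returns the character whose code is i. Raises Chr if i < 0 or i > maxOrd. When chr is restricted to the interval [0,maxOrd], these two functions denote the character encoding function and its inverse.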
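The round trip between a character and its code can be sketched as follows. The helper safeChr is hypothetical, and the sketch assumes that an out-of-range argument to chr raises the Chr exception, as it does in current Basis Library implementations.

```sml
(* Round trip between a character and its integer code. *)
val codeA = Char.ord #"A"          (* 65 in the Char structure *)
val charA = Char.chr codeA         (* #"A" again *)

(* chr is partial: turn the Chr exception into an option.
   safeChr is a hypothetical helper, not part of the CHAR signature. *)
fun safeChr i = SOME (Char.chr i) handle Chr => NONE

val inRange  = safeChr 97                     (* SOME #"a" *)
val tooLarge = safeChr (Char.maxOrd + 1)      (* NONE *)
val negative = safeChr ~1                     (* NONE *)
```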
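succ c
returns the character immediately following c in the ordering, or raises Chr if c = maxChar. When defined, succ c is equivalent to chr(ord c + 1).
pred c
returns the character immediately preceding c in the ordering, or raises Chr if c = minChar. When defined, pred c is equivalent to chr(ord c - 1).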
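A minimal sketch of stepping through the ordering, assuming that succ and pred raise Chr at the ends of the range. The helper safeSucc is hypothetical.

```sml
(* Stepping forward and backward in the character ordering. *)
val b = Char.succ #"a"             (* #"b" *)
val a = Char.pred b                (* #"a" *)

(* succ is undefined at maxChar; wrap the exception in an option.
   safeSucc is a hypothetical helper. *)
fun safeSucc c = SOME (Char.succ c) handle Chr => NONE

val atEnd = safeSucc Char.maxChar  (* NONE *)
```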
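c < d
c <= d
c > d
c >= d
These compare characters in the character ordering. Because ord and chr preserve the ordering, c < d holds exactly when ord c < ord d.
compare (c, d)
returns LESS, EQUAL, or GREATER, depending on whether c precedes, equals, or follows d in the character ordering.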
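The following sketch shows compare in use and checks, for one pair, that comparing characters agrees with comparing their codes. The functions describe and agrees are hypothetical names introduced for illustration.

```sml
(* Turn the result of compare into a word. *)
fun describe (c, d) =
  case Char.compare (c, d) of
      LESS    => "precedes"
    | EQUAL   => "equals"
    | GREATER => "follows"

val r1 = describe (#"a", #"b")     (* "precedes" *)
val r2 = describe (#"z", #"z")     (* "equals" *)

(* The encoding preserves the ordering, so comparing codes
   gives the same answer as comparing characters. *)
fun agrees (c, d) =
  Char.compare (c, d) = Int.compare (Char.ord c, Char.ord d)

val ok = agrees (#"A", #"a")       (* true *)
```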
contains s c
returns true if character c occurs in the string s; otherwise false.
Implementation note:
In some implementations, the partial application of contains to s may build a table, which the resulting function uses to decide whether a given character is in the string. Hence it may be expensive to compute val p = contains s, but fast to compute p c for any given character c.
notContains s c
returns true if character c does not occur in the string s; false otherwise. Equivalent to not(contains s c).
Implementation note:
As with contains, notContains may be implemented via table lookup.
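Given the implementation notes, it can pay to apply contains to its string once and reuse the resulting predicate. A minimal sketch, with isVowel and the sample text chosen purely for illustration:

```sml
(* Apply contains to the string once; any table is built at most once. *)
val isVowel = Char.contains "aeiouAEIOU"

(* Reuse the predicate for every character of the input. *)
val vowels = List.filter isVowel (String.explode "Standard ML")

(* notContains gives the complementary predicate. *)
val isConsonantOrOther = Char.notContains "aeiouAEIOU"
```

Whether a table is actually built depends on the implementation; the code behaves the same either way.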
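toLower c
toUpper c
These return the lowercase (respectively, uppercase) letter corresponding to c if c is a letter; otherwise they return c.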
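A short sketch of case conversion on single characters and, via String.map, on a whole string. The sample values are illustrative.

```sml
(* Letters are converted; other characters pass through unchanged. *)
val upperQ = Char.toUpper #"q"                    (* #"Q" *)
val lowerQ = Char.toLower #"Q"                    (* #"q" *)
val digit  = Char.toUpper #"7"                    (* #"7", not a letter *)

(* Lift the conversion to strings with String.map. *)
val shout = String.map Char.toUpper "char 255"    (* "CHAR 255" *)
```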
isAlpha c
returns true if c is a letter (lowercase or uppercase).
isAlphaNum c
returns true if c is alphanumeric (a letter or a decimal digit).
isAscii c
returns true if c is a (seven-bit) ASCII character, i.e., 0 <= ord c <= 127. Note that this function is independent of locale.
isCntrl c
returns true if c is a control character. Equivalent to not o isPrint.
isDigit c
returns true if c is a decimal digit (0-9).
isGraph c
returns true if c is a graphical character, that is, it is printable and not a whitespace character.
isHexDigit c
returns true if c is a hexadecimal digit (0-9, a-f, A-F).
isLower c
returns true if c is a lowercase letter.
isPrint c
returns true if c is a printable character (space or visible), i.e., not a control character.
isSpace c
returns true if c is a whitespace character (space, newline, tab, carriage return, vertical tab, formfeed).
isPunct c
returns true if c is a punctuation character: graphical but not alphanumeric.
isUpper c
returns true if c is an uppercase letter.
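The predicates compose naturally into a simple classifier. The sketch below uses a hypothetical function classify and tests the predicates in a fixed order, so a character that satisfies several of them is reported under the first match.

```sml
(* Report the first class, in this fixed order, that a character belongs to. *)
fun classify c =
  if Char.isDigit c then "digit"
  else if Char.isUpper c then "upper"
  else if Char.isLower c then "lower"
  else if Char.isSpace c then "space"
  else if Char.isPunct c then "punct"
  else if Char.isCntrl c then "control"
  else "other"

(* Classify each character of a small sample string. *)
val classes = map classify (explode "aZ9 !\n")
```

For the sample string, the newline is reported as "space" because isSpace is tested before isCntrl.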
fromString s
scan getc strm
These scan a character or an SML escape sequence representing a character from the prefix of a string or character stream, returning NONE if no character can be scanned. On success, scan also returns the remainder of the stream.
The allowable escape sequences are:
\a       Alert (ASCII 0x07)
\b       Backspace (ASCII 0x08)
\t       Horizontal tab (ASCII 0x09)
\n       Linefeed or newline (ASCII 0x0A)
\v       Vertical tab (ASCII 0x0B)
\f       Form feed (ASCII 0x0C)
\r       Carriage return (ASCII 0x0D)
\\       Backslash
\"       Double quote
\^c      A control character whose encoding is C - 64, where C is the encoding of the character c, with C in the range [64,95].
\ddd     The character whose encoding is the number ddd, three decimal digits denoting an integer in the range [0,255].
\f...f\  This sequence is ignored, where f...f stands for a sequence of one or more formatting characters.
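The escape sequences can be tried out with fromString. Note that the argument strings below contain a literal backslash, which is itself written as \\ in SML source. The sketch assumes StringCvt.scanString is available to run scan over a string, and that an unrecognized escape yields NONE.

```sml
(* Each argument is the text of a character, possibly an escape sequence. *)
val plain   = Char.fromString "a"        (* SOME #"a" *)
val newline = Char.fromString "\\n"      (* backslash, n: the newline character *)
val ctrlA   = Char.fromString "\\^A"     (* control notation: code 65 - 64 = 1 *)
val decimal = Char.fromString "\\065"    (* three decimal digits: #"A" *)
val bad     = Char.fromString "\\q"      (* not an allowable escape *)

(* scan reads from the front of a stream; scanString supplies
   a string as the stream and discards the remainder. *)
val scanned = StringCvt.scanString Char.scan "\\tabc"
```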
toString c
returns a printable string representation of the character, using, if necessary, SML escape sequences. Printable characters, except for #"\\" and #"\"", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\"". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07)                "\\a"
Backspace (ASCII 0x08)            "\\b"
Horizontal tab (ASCII 0x09)       "\\t"
Linefeed or newline (ASCII 0x0A)  "\\n"
Vertical tab (ASCII 0x0B)         "\\v"
Form feed (ASCII 0x0C)            "\\f"
Carriage return (ASCII 0x0D)      "\\r"
The remaining characters whose codes are less than 32 are represented by three-character strings in "control character" notation, e.g., #"\000" maps to "\\^@", #"\001" maps to "\\^A", etc. All other characters (i.e., those whose codes are 127 or greater) are mapped to four-character strings of the form "\\ddd", where ddd are the three decimal digits corresponding to the character's code.
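The cases described for toString can be sketched one per binding. The expected results in the comments are written as SML string literals, so each literal backslash appears doubled.

```sml
(* One example for each case of toString. *)
val s1 = Char.toString #"a"        (* printable, unchanged: "a" *)
val s2 = Char.toString #"\\"       (* backslash: "\\\\" *)
val s3 = Char.toString #"\""       (* double quote: "\\\"" *)
val s4 = Char.toString #"\n"       (* common control character: "\\n" *)
val s5 = Char.toString #"\001"     (* control notation: "\\^A" *)
val s6 = Char.toString #"\255"     (* decimal escape: "\\255" *)

(* The result can be read back with fromString. *)
val back = Char.fromString s5      (* SOME #"\001" *)
```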
fromCString s
scans a character or a C escape sequence representing a character from the prefix of the string s, returning NONE if no character can be scanned.
toCString c
returns a printable string representation of the character, using, if necessary, C escape sequences. Printable characters, except for #"\\", #"\"", #"?" and #"'", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\""; question mark #"?" becomes "\\?"; single quote #"'" becomes "\\'". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07)                "\\a"
Backspace (ASCII 0x08)            "\\b"
Horizontal tab (ASCII 0x09)       "\\t"
Linefeed or newline (ASCII 0x0A)  "\\n"
Vertical tab (ASCII 0x0B)         "\\v"
Form feed (ASCII 0x0C)            "\\f"
Carriage return (ASCII 0x0D)      "\\r"
All other characters are represented by one to three octal digits, corresponding to the character's code, preceded by a backslash.
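For comparison with toString, the sketch below shows the characters that toCString treats differently. The octal form is left without an expected value because the number of digits emitted may vary between implementations.

```sml
(* Characters that C requires to be escaped in addition to SML's. *)
val c1 = Char.toCString #"?"       (* "\\?" *)
val c2 = Char.toCString #"'"       (* "\\'" *)

(* Common control characters use the same two-character escapes. *)
val c3 = Char.toCString #"\n"      (* "\\n" *)

(* Other characters become a backslash followed by octal digits. *)
val c4 = Char.toCString #"\001"

(* Plain printable characters are unchanged. *)
val c5 = Char.toCString #"a"       (* "a" *)
```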
In WideChar, the functions toLower, toUpper, isAlpha,..., isUpper are locale-dependent. In Char, these functions are locale-independent, with the following semantics:
Add table for ISO Latin-1 characters and predicates.
See Also: Locale, MultiByte, STRING
Last Modified May 15, 1996
Copyright © 1996 AT&T