The Standard ML Basis Library

The `WORD` signature

Instances of the signature WORD provide a type of unsigned integers with arithmetic and logical operations and conversion operations. They are also meant to give efficient access to the primitive machine word types of the underlying hardware.

Note: This text assumes the following bit ordering: the most significant bit (MSB) is the leftmost bit, and the least significant bit (LSB) is the rightmost bit. This assumption does not affect the semantics of the operations, only the (intuitive) description of the shift operators.

Synopsis

  signature WORD
  structure Word : WORD
  structure Word8 : WORD
  structure LargeWord : WORD
  structure Word{N} : WORD
  structure SysWord : WORD

Interface

  eqtype word

  val wordSize : int

  val toLargeWord   : word -> LargeWord.word
  val toLargeWordX  : word -> LargeWord.word
  val fromLargeWord : LargeWord.word -> word
  val toLargeInt    : word -> LargeInt.int
  val toLargeIntX   : word -> LargeInt.int
  val fromLargeInt  : LargeInt.int -> word
  val toInt   : word -> Int.int
  val toIntX  : word -> Int.int
  val fromInt : Int.int -> word

  val orb  : (word * word) -> word
  val xorb : (word * word) -> word
  val andb : (word * word) -> word
  val notb : word -> word
  val <<  : (word * Word.word) -> word
  val >>  : (word * Word.word) -> word
  val ~>> : (word * Word.word) -> word

  val + : (word * word) -> word
  val - : (word * word) -> word
  val * : (word * word) -> word
  val div : (word * word) -> word
  val mod : (word * word) -> word

  val compare : (word * word) -> order
  val >  : (word * word) -> bool
  val <  : (word * word) -> bool
  val >= : (word * word) -> bool
  val <= : (word * word) -> bool
  val min : (word * word) -> word
  val max : (word * word) -> word

  val fmt        : StringCvt.radix -> word -> string
  val toString   : word -> string
  val fromString : string -> word option
  val scan       : StringCvt.radix -> (char, 'a) StringCvt.reader -> 'a -> (word, 'a) option

Description

eqtype word

wordSize

is the number of bits in type word. wordSize need not be a power of two. Note that word has a fixed, finite precision.

toLargeWord w

toLargeWordX w

convert w to a value of type LargeWord.word. In the former case, w is converted to its equivalent LargeWord.word value in the range [0,2^(wordSize)-1]. In the latter case, w is "sign-extended," i.e., the wordSize low-order bits of w and toLargeWordX w are the same, and the remaining bits of toLargeWordX w are all equal to the most significant bit of w.
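The difference between the two conversions is easiest to see on a word whose MSB is set. The following is a minimal sketch, assuming that the implementation provides Word8 and that LargeWord.wordSize is greater than 8; the value names are chosen for illustration only.

```sml
(* A Word8 value with its most significant bit set: 10000000 *)
val w : Word8.word = 0wx80

(* Zero extension: the numeric value 128 is preserved *)
val zeroExt = Word8.toLargeWord w

(* Sign extension: every bit above bit 7 is a copy of the MSB of w *)
val signExt = Word8.toLargeWordX w

(* The low-order 8 bits of both results agree with w *)
val sameLow = LargeWord.andb (signExt, 0wxFF) = zeroExt

(* The remaining bits of signExt are all ones *)
val highOnes =
  LargeWord.>> (signExt, 0w8) = LargeWord.>> (LargeWord.notb 0w0, 0w8)
```

Under the stated assumptions, both sameLow and highOnes evaluate to true.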

fromLargeWord w

converts w of the type LargeWord.word to the value w modulo 2^(wordSize) of type word.
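As an illustration of the modular reduction, the sketch below narrows a LargeWord.word value to Word8.word. It assumes that Word8 is available and that LargeWord.wordSize is at least 16, so that the literal fits.

```sml
(* 0x1234 does not fit in 8 bits *)
val big : LargeWord.word = 0wx1234

(* Only the low-order 8 bits survive: 0x1234 mod 2^8 = 0x34 *)
val low = Word8.fromLargeWord big
```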

toLargeInt w

toLargeIntX w

convert w to a value of type LargeInt.int. In the former case, w is viewed as an integer value in the range [0,2^(wordSize)-1]. In the latter case, w is treated as a 2's complement signed integer with wordSize precision, thereby having a value in the range [-2^(wordSize-1),2^(wordSize-1)-1]. toLargeInt raises Overflow if the target integer value cannot be represented as a LargeInt.int. Since the precision of LargeInt.int is always at least wordSize, toLargeIntX will never raise an exception.

fromLargeInt i

converts i of the type LargeInt.int to a value of type word. This has the effect of taking the low-order wordSize bits of the two's complement representation of i.

toInt w

toIntX w

convert w to a value of the default integer type. In the former case, w is viewed as an integer value in the range [0,2^(wordSize)-1]. In the latter case, w is treated as a 2's complement signed integer with wordSize precision, thereby having a value in the range [-2^(wordSize-1),2^(wordSize-1)-1]. Both functions raise Overflow if the target integer value cannot be represented as an Int.int.
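The unsigned and signed views of the same bit pattern can be compared directly. This is a small sketch using Word8 (assumed to be available); toLargeInt and toLargeIntX behave analogously with LargeInt.int as the target type.

```sml
(* All eight bits set: 11111111 *)
val w = Word8.fromInt 255

(* Unsigned view: a value in [0, 255] *)
val unsigned = Word8.toInt w     (* 255 *)

(* Two's-complement view: a value in [~128, 127] *)
val signed = Word8.toIntX w      (* ~1 *)
```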

fromInt i

converts i of the default integer type to a value of type word. This has the effect of taking the low-order wordSize bits of the two's complement representation of i. If the precision of Int.int is less than wordSize, then i is sign-extended to wordSize bits.
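Because only the low-order wordSize bits are kept, fromInt wraps out-of-range and negative arguments instead of raising an exception. A brief sketch with Word8 (assumed to be available):

```sml
(* 257 = 1 00000001 in binary; only the low 8 bits are kept *)
val a = Word8.fromInt 257    (* 0w1 *)

(* ~1 is all ones in two's complement *)
val b = Word8.fromInt ~1     (* 0wxFF *)

(* Values already in range are unchanged *)
val c = Word8.fromInt 42     (* 0w42 *)
```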

orb (i, j)

returns the bit-wise OR of i and j.

xorb (i, j)

returns the bit-wise exclusive OR of i and j.

andb (i, j)

returns the bit-wise AND of i and j.

notb i

returns the bit-wise complement (NOT) of i.
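The four bit-wise operations can be checked on a pair of 8-bit patterns. The sketch below assumes Word8 is available; the bit patterns are shown in the comments.

```sml
val a : Word8.word = 0wxCC            (* 11001100 *)
val b : Word8.word = 0wxAA            (* 10101010 *)

val or'  = Word8.orb  (a, b)          (* 11101110 = 0wxEE *)
val xor' = Word8.xorb (a, b)          (* 01100110 = 0wx66 *)
val and' = Word8.andb (a, b)          (* 10001000 = 0wx88 *)
val not' = Word8.notb a               (* 00110011 = 0wx33 *)
```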

<< (i, n)

shifts i to the left by n bit positions, filling in zeros from the right. When i and n are interpreted as unsigned binary numbers, returns (i * 2^(n)) mod 2^(wordSize). In particular, shifting by greater than or equal to the word size results in 0. This operation is similar to the "(logical) shift left" instruction in many processors.

>> (i, n)

shifts i to the right by n bit positions, filling in zeros from the left. When i and n are interpreted as unsigned binary numbers, returns floor (i / 2^(n)). In particular, shifting by greater than or equal to the word size results in 0. This operation is similar to the "logical shift right" instruction in many processors.

~>> (i, n)

shifts i to the right by n bit positions. The value of the leftmost bit (the MSB) of i is filled in from the left; in a two's-complement interpretation this corresponds to sign extension. When i is interpreted as a wordSize-bit two's-complement integer and n is interpreted as an unsigned binary number, returns floor (i / 2^(n)). In particular, shifting by greater than or equal to the word size results in either 0 or all 1's. This operation is similar to the "arithmetic shift right" instruction in many processors.
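The three shift operators can be compared on one operand whose MSB is set. The following sketch assumes Word8 is available; note that the shift amount always has type Word.word, whatever the structure.

```sml
val w : Word8.word = 0wx96                 (* 10010110 = 150 *)

(* Left shift: (150 * 4) mod 256 = 88 *)
val l = Word8.<< (w, 0w2)                  (* 01011000 = 0wx58 *)

(* Logical right shift: floor (150 / 4) = 37 *)
val r = Word8.>> (w, 0w2)                  (* 00100101 = 0wx25 *)

(* Arithmetic right shift: w is ~106 as a signed value,
   and floor (~106 / 4) = ~27, i.e. 229 unsigned *)
val ar = Word8.~>> (w, 0w2)                (* 11100101 = 0wxE5 *)

(* Shifting by wordSize or more *)
val lAll  = Word8.<<  (w, 0w8)             (* 0w0 *)
val rAll  = Word8.>>  (w, 0w9)             (* 0w0 *)
val arAll = Word8.~>> (w, 0w8)             (* 0wxFF, since the MSB of w is 1 *)
```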

i + j

When i and j are interpreted as unsigned binary numbers, returns the sum of i and j modulo (2^(wordSize)). Does not raise Overflow.

i - j

When i and j are interpreted as unsigned binary numbers, returns the difference of i and j modulo (2^(wordSize)):

(2^(wordSize) + i - j) mod (2^(wordSize))

Does not raise Overflow.

i * j

When i and j are interpreted as unsigned binary numbers, returns the product of i and j modulo (2^(wordSize)). Does not raise Overflow.
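Addition, subtraction, and multiplication all wrap around modulo 2^(wordSize) rather than raising Overflow. A short sketch with Word8 (assumed to be available), where the modulus is 256:

```sml
val x : Word8.word = 0w200
val y : Word8.word = 0w100

(* (200 + 100) mod 256 = 44 *)
val sum = Word8.+ (x, y)

(* (256 + 100 - 200) mod 256 = 156 *)
val diff = Word8.- (y, x)

(* (200 * 100) mod 256 = 20000 mod 256 = 32 *)
val prod = Word8.* (x, y)
```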

i div j

When i and j are interpreted as unsigned binary numbers, returns the truncated quotient of i and j: floor (i / j). Raises Div when j = 0.

i mod j

When i and j are interpreted as unsigned binary numbers, returns the remainder of the division of i by j:

i - j * floor (i / j).

Raises Div when j = 0.
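The quotient and remainder satisfy i = j * (i div j) + (i mod j), and a zero divisor raises Div. The sketch below uses Word8 (assumed to be available); divByZeroRaises is a name introduced here for illustration.

```sml
val i : Word8.word = 0w200
val j : Word8.word = 0w7

val q = Word8.div (i, j)      (* floor (200 / 7) = 28 *)
val r = Word8.mod (i, j)      (* 200 - 7 * 28 = 4 *)

(* The defining identity holds *)
val identity = Word8.+ (Word8.* (j, q), r) = i

(* Division by zero raises Div *)
val divByZeroRaises = (Word8.div (i, 0w0); false) handle Div => true
```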

compare (i, j)

When i and j are interpreted as unsigned binary numbers, returns LESS, EQUAL, or GREATER if and only if i is less than, equal to, or greater than j, respectively.

i > j

i < j

i >= j

i <= j

return true if and only if i and j satisfy the given relation when interpreted as unsigned binary numbers.

min (i, j)

max (i, j)

return the smaller (respectively, larger) of i and j, when interpreted as unsigned binary numbers.
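Since all comparisons are unsigned, a word with its MSB set compares as large, even though its signed interpretation is negative. A minimal sketch with Word8 (assumed to be available):

```sml
val big   : Word8.word = 0wxFF     (* 255 unsigned, ~1 signed *)
val small : Word8.word = 0w1

val ord = Word8.compare (big, small)             (* GREATER *)
val gt  = Word8.> (big, small)                   (* true *)
val lo  = Word8.min (big, small)                 (* 0w1 *)
val hi  = Word8.max (big, small)                 (* 0wxFF *)

(* The signed views order the other way round: ~1 < 1 *)
val signedLess = Word8.toIntX big < Word8.toIntX small
```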

fmt radix i

returns a string containing a numeric representation of i using the given radix. No prefix "0w", "0wX", etc. is generated. The hexadecimal digits 10-15 are represented as [A-F].

toString i

returns a hexadecimal string representation of i. It is equivalent to fmt StringCvt.HEX i.
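The same value formatted in each radix, as a small sketch with Word8 (assumed to be available). No prefix appears in any of the results.

```sml
val w : Word8.word = 0wxAB                 (* 171 *)

val hex  = Word8.toString w                (* "AB" *)
val hex' = Word8.fmt StringCvt.HEX w       (* "AB" *)
val bin  = Word8.fmt StringCvt.BIN w       (* "10101011" *)
val oct  = Word8.fmt StringCvt.OCT w       (* "253" *)
val dec  = Word8.fmt StringCvt.DEC w       (* "171" *)
```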

fromString s

returns SOME w if an unsigned hexadecimal number in the format (0wx|0wX|0x|0X)?[0-9a-fA-F]+ can be parsed from a prefix of string s, ignoring initial whitespace; NONE is returned otherwise. w is the value of the number parsed. Raises Overflow when a hexadecimal numeral can be parsed, but is too large to fit in type word. Equivalent to StringCvt.scanString (scan StringCvt.HEX).
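The following sketch exercises the cases described above with Word8 (assumed to be available). The last binding relies on the Overflow behavior as specified; tooLargeRaises is a name introduced here for illustration.

```sml
(* An optional prefix is accepted *)
val a = Word8.fromString "0wx1F"           (* SOME 0wx1F *)

(* Initial whitespace is skipped, and parsing stops at the first
   character that is not a hexadecimal digit *)
val b = Word8.fromString "  ff rest"       (* SOME 0wxFF *)

(* No hexadecimal numeral at the start of the string *)
val c = Word8.fromString "xyz"             (* NONE *)

(* "100" is hexadecimal 256, which does not fit in 8 bits *)
val tooLargeRaises =
  (Word8.fromString "100"; false) handle Overflow => true
```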

scan radix getc src

returns SOME (w,r) if an unsigned number in the format denoted by radix can be parsed from a prefix of the character source src using the character input function getc; w is the value of the number parsed, r is the rest of the character source. Initial whitespace is ignored. NONE is returned otherwise. Raises Overflow when a number can be parsed, but is too large to fit in type word. The type of scan can also be written as

StringCvt.radix -> (char, 'a) StringCvt.reader -> (word, 'a) StringCvt.reader

The format expected depends on radix. The formats are as follows:

          StringCvt.BIN - (0w)?[0-1]+
          StringCvt.OCT - (0w)?[0-7]+
          StringCvt.DEC - (0w)?[0-9]+
          StringCvt.HEX - (0wx|0wX|0x|0X)?[0-9a-fA-F]+
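Because scan is parameterized over the character source, it works with any reader. The sketch below uses a hypothetical reader over character lists, listReader, together with Word8 (assumed to be available), and also shows the StringCvt.scanString shortcut for strings.

```sml
(* A character reader over a list of characters (illustrative helper) *)
fun listReader [] = NONE
  | listReader (c :: cs) = SOME (c, cs)

(* Parse a decimal word; the unparsed remainder is returned as well *)
val r = Word8.scan StringCvt.DEC listReader (String.explode "0w42abc")
(* SOME (0w42, [#"a", #"b", #"c"]) *)

(* Scanning directly from a string in binary *)
val s = StringCvt.scanString (Word8.scan StringCvt.BIN) "1010"
(* SOME 0w10 *)
```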

Discussion

The type LargeWord.word represents the largest word type supported. We require that LargeWord.wordSize <= LargeInt.precision.

The structure SysWord is used with the optional Posix modules. The type SysWord.word is guaranteed to be large enough to hold any integral value used by the underlying system.

For words and integers of the same precision/word size, the operations fromInt and toIntX act as bit-wise identity functions. However, even in this case, toInt will raise Overflow if the high-order bit of the word is set.

Conversion between words and integers of any size can be handled by intermediate conversion into LargeWord.word and LargeInt.int. For example, the functions fromInt, toInt and toIntX are respectively equivalent to:

  fromLargeWord o LargeWord.fromLargeInt o Int.toLarge
  Int.fromLarge o LargeWord.toLargeInt   o toLargeWord
  Int.fromLarge o LargeWord.toLargeIntX  o toLargeWordX
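These equivalences can be spot-checked for a particular structure. The sketch below instantiates them for Word8 (assumed to be available); the names fromIntVia, toIntVia, and toIntXVia are introduced here for illustration.

```sml
(* The three compositions, specialized to Word8 *)
val fromIntVia = Word8.fromLargeWord o LargeWord.fromLargeInt o Int.toLarge
val toIntVia   = Int.fromLarge o LargeWord.toLargeInt  o Word8.toLargeWord
val toIntXVia  = Int.fromLarge o LargeWord.toLargeIntX o Word8.toLargeWordX

val w : Word8.word = 0wx80                     (* MSB set *)

val sameFromInt = fromIntVia 300 = Word8.fromInt 300     (* both 0w44 *)
val sameToInt   = toIntVia w  = Word8.toInt w            (* both 128 *)
val sameToIntX  = toIntXVia w = Word8.toIntX w           (* both ~128 *)
```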

Typically, implementations will provide very efficient word operations by inline-expanding them to a few machine instructions. It is also assumed that implementations will recognize the idiom of converting between words and integers of differing precisions using an intermediate representation (e.g., Word32.fromLargeWord o Word8.toLargeWord) and optimize these conversions.