The REAL signature specifies structures that implement floating-point
numbers. The semantics of floating-point numbers should follow the
IEEE standard 754-1985 and the ANSI/IEEE standard 854-1987. In
addition, implementations of the REAL signature are required to use
non-trapping semantics. Additional aspects of the design of the REAL
and MATH signatures were guided by the Floating-Point C Extensions
developed by the X3J11 ANSI committee and the lecture notes by
W. Kahan on the IEEE standard 754.
Although there can be many representations for NaN values, the Library
models them as a single value and currently provides no explicit way
to distinguish among them, ignoring the sign bit. Thus, in the
descriptions below and in the Math structure, we just refer to the NaN
value.
structure Real64 : REAL (basis)
structure Real : REAL (basis)
structure LargeReal : REAL (basis)
signature REAL = sig
type real
structure Math : MATH
val radix : int
val precision : int
val maxFinite : real
val minPos : real
val minNormalPos : real
val posInf : real
val negInf : real
val + : real * real -> real
val - : real * real -> real
val * : real * real -> real
val / : real * real -> real
val rem : real * real -> real
val *+ : real * real * real -> real
val *- : real * real * real -> real
val ~ : real -> real
val abs : real -> real
val min : real * real -> real
val max : real * real -> real
val sign : real -> int
val signBit : real -> bool
val sameSign : real * real -> bool
val copySign : real * real -> real
val compare : real * real -> order
val compareReal : real * real -> IEEEReal.real_order
val < : real * real -> bool
val <= : real * real -> bool
val > : real * real -> bool
val >= : real * real -> bool
val == : real * real -> bool
val != : real * real -> bool
val ?= : real * real -> bool
val unordered : real * real -> bool
val isFinite : real -> bool
val isNan : real -> bool
val isNormal : real -> bool
val class : real -> IEEEReal.float_class
val toManExp : real -> {man : real, exp : int}
val fromManExp : {man : real, exp : int} -> real
val split : real -> {whole : real, frac : real}
val realMod : real -> real
val nextAfter : real * real -> real
val checkFloat : real -> real
val realFloor : real -> real
val realCeil : real -> real
val realTrunc : real -> real
val realRound : real -> real
val floor : real -> int
val ceil : real -> int
val trunc : real -> int
val round : real -> int
val toInt : IEEEReal.rounding_mode -> real -> int
val toLargeInt : IEEEReal.rounding_mode -> real -> LargeInt.int
val fromInt : int -> real
val fromLargeInt : LargeInt.int -> real
val toLarge : real -> LargeReal.real
val fromLarge : IEEEReal.rounding_mode -> LargeReal.real -> real
val fmt : StringCvt.realfmt -> real -> string
val toString : real -> string
val scan : (char, 'a) StringCvt.reader -> (real, 'a) StringCvt.reader
val fromString : string -> real option
(*
val toDecimal : real -> IEEEReal.decimal_approx
val fromDecimal : IEEEReal.decimal_approx -> real option
*)
end
[type real]
The type of reals. Notice that real is not an equality
type.
[precision]
The number of digits, each between 0 and radix-1, in the
mantissa. Note that the precision includes the implicit (or hidden)
bit used in the IEEE representation (e.g., the value of
Real64.precision is 53).
[rem (x, y)]
returns the remainder x - n*y, where n = trunc (x /
y). The result has the same sign as x and has absolute value less than
the absolute value of y. If x is an infinity or y is 0, rem returns
NaN. If y is an infinity, rem returns x.
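As a small illustration of the rem rule x - n*y with n = trunc (x / y), the sketch below uses exactly representable values. The names r1 and r2 are illustrative only and are not part of the library.

```sml
(* n = trunc 2.75 = 2, so the result is 5.5 - 2 * 2.0 = 1.5 *)
val r1 = Real.rem (5.5, 2.0)

(* n = trunc ~2.75 = ~2, so the result is ~5.5 + 4.0 = ~1.5;
   the sign follows the first argument *)
val r2 = Real.rem (~5.5, 2.0)
```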
[*+ (a, b, c)]
[*- (a, b, c)]
These return a*b + c and a*b - c, respectively. Their behavior on
infinities follows from that of addition, subtraction, and
multiplication. The precise semantics of these
operations depend on the language implementation and the underlying
hardware. Specifically, certain architectures provide these operations
as a single instruction, possibly using a single rounding
operation. Thus, the use of these operations may be faster than
performing the individual arithmetic operations sequentially, but may
also cause different rounding behavior.
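A minimal sketch of calling the fused operation *+ next to the equivalent two-step expression. The operands are chosen to be exactly representable, so both forms agree here; with other inputs the results may differ in the last bit on hardware that rounds only once. The names fused and separate are illustrative.

```sml
val fused    = Real.*+ (2.0, 3.0, 1.0)   (* a*b + c in one call *)
val separate = 2.0 * 3.0 + 1.0           (* multiply, then add *)
val agree    = Real.== (fused, separate)
```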
[sign x]
returns ~1, 0, or 1, according to whether x is negative, zero, or
positive. An infinity returns its sign; a zero returns 0 regardless of
its sign. It raises Domain on NaN.
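The following sketch exercises sign on ordinary values, a negative zero, an infinity, and NaN. The helper signOpt is hypothetical; it simply turns the Domain exception into NONE. The NaN is produced with 0.0 / 0.0, which assumes the non-trapping semantics described at the top of this page.

```sml
val nan = 0.0 / 0.0                 (* NaN under non-trapping semantics *)

val s1 = Real.sign ~3.2             (* ~1 *)
val s2 = Real.sign ~0.0             (* 0: the sign of a zero is ignored *)
val s3 = Real.sign Real.posInf      (* 1: an infinity returns its sign *)

(* Hypothetical wrapper: NONE instead of the Domain exception on NaN. *)
fun signOpt x = SOME (Real.sign x) handle Domain => NONE
```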
[compare (x, y)]
[compareReal (x, y)]
The function compare returns LESS, EQUAL, or GREATER according to
whether its first argument is less than, equal to, or greater than the
second. It raises IEEEReal.Unordered on unordered arguments. The
function compareReal behaves similarly except that the values it
returns have the extended type IEEEReal.real_order and it returns
IEEEReal.UNORDERED on unordered arguments.
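The difference between compare and compareReal on unordered arguments can be sketched as follows. The wrapper compareOpt is a hypothetical helper, not a library function.

```sml
val nan = 0.0 / 0.0                      (* NaN under non-trapping semantics *)

val c1 = Real.compare (1.0, 2.0)         (* LESS *)
val c2 = Real.compareReal (1.0, nan)     (* IEEEReal.UNORDERED *)

(* Hypothetical wrapper: NONE instead of the Unordered exception. *)
fun compareOpt (x, y) =
  SOME (Real.compare (x, y)) handle IEEEReal.Unordered => NONE
```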
[x >= y]
returns true if x is greater than or equal to y, and false otherwise.
Note that the operators <, <=, >, and >= return false on unordered
arguments, i.e., if either argument is NaN, so that the usual reversal
of comparison under negation does not hold, e.g., a < b is not the
same as not (a >= b).
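A short sketch of why a < b and not (a >= b) differ when one argument is NaN. The names lt and notGe are illustrative.

```sml
val nan = 0.0 / 0.0               (* NaN under non-trapping semantics *)

val lt    = nan < 1.0             (* false: unordered arguments *)
val notGe = not (nan >= 1.0)      (* true, because nan >= 1.0 is also false *)
```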
[== (x, y)]
[!= (x, y)]
The first returns true if and only if neither y nor x is NaN, and y
and x are equal, ignoring signs on zeros. This is equivalent to the
IEEE = operator. The second function != is equivalent to not o op ==
and the IEEE ?<> operator.
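For illustration, the sketch below applies == and != to signed zeros and to NaN. Since real is not an equality type, these functions take the place of the polymorphic = on reals.

```sml
val nan = 0.0 / 0.0               (* NaN under non-trapping semantics *)

val e1 = Real.== (0.0, ~0.0)      (* true: signs on zeros are ignored *)
val e2 = Real.== (nan, nan)       (* false: NaN equals nothing, itself included *)
val e3 = Real.!= (nan, nan)       (* true: the negation of == *)
```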
[toManExp r]
returns {man, exp}, where man and exp are the mantissa
and exponent of r, respectively. Specifically, we have the relation
r = man * radix^(exp)
where 1.0 <= man * radix < radix. This function is comparable to frexp
in the C library. If r is +-0, man is +-0 and exp is +0. If r is
+-infinity, man is +-infinity and exp is unspecified. If r is NaN, man
is NaN and exp is unspecified.
[fromManExp {man, exp}]
returns man * radix^(exp). This function is
comparable to ldexp in the C library. Note that, even if man is a
non-zero, finite real value, the result of fromManExp can be zero or
infinity because of underflows and overflows. If man is +-0, the
result is +-0. If man is +-infinity, the result is +-infinity. If man
is NaN, the result is NaN.
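A minimal sketch of the relation r = man * radix^(exp): decompose a value with toManExp, rebuild it with fromManExp, and scale a mantissa directly. The names back and scaled are illustrative, and the expected value of scaled is computed from Real.radix rather than assuming radix 2.

```sml
(* Decompose 8.0 and put it back together. *)
val {man, exp} = Real.toManExp 8.0
val back = Real.fromManExp {man = man, exp = exp}     (* 8.0 again *)

(* fromManExp also accepts a mantissa chosen by hand: 1.5 * radix^3 *)
val scaled = Real.fromManExp {man = 1.5, exp = 3}
```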
[split r]
[realMod r]
split returns {whole, frac}, where frac and whole are the
fractional and integral parts of r, respectively. Specifically, whole
is integral, |frac| < 1.0, whole and frac have the same sign as r, and
r = whole + frac. This function is comparable to modf in the C
library. If r is +-infinity, whole is +-infinity and frac is +-0. If
r is NaN, both whole and frac are NaN. The realMod function is
equivalent to #frac o split.
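As an illustration of split and realMod, the sketch below uses 3.75, which is exactly representable in binary. The bound names are illustrative.

```sml
val {whole, frac} = Real.split 3.75            (* whole = 3.0, frac = 0.75 *)

(* Both parts carry the sign of the argument. *)
val {whole = w, frac = f} = Real.split ~3.75   (* w = ~3.0, f = ~0.75 *)

val m = Real.realMod 3.75                      (* 0.75, same as #frac o split *)
```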
[nextAfter (r, t)]
returns the next representable real after r in the
direction of t. Thus, if t is less than r, nextAfter returns the
largest representable floating-point number less than r. If r = t then
it returns r. If either argument is NaN, this returns NaN. If r is
+-infinity, it returns +-infinity.
[checkFloat x]
raises Overflow if x is an infinity, and raises Div if
x is NaN. Otherwise, it returns its argument. This can be used to
synthesize trapping arithmetic from the non-trapping operations given
here. Note, however, that infinities can be converted to NaNs by some
operations, so that if accurate exceptions are required, checks must
be done after each operation.
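One way to synthesize a trapping operation with checkFloat is sketched below. The function checkedDiv is a hypothetical helper; it checks the result of a single division, which is the per-operation checking the note above recommends.

```sml
(* Hypothetical helper: division that raises instead of returning
   an infinity or NaN. *)
fun checkedDiv (x, y) = Real.checkFloat (x / y)

val ok = checkedDiv (1.0, 4.0)     (* 0.25, returned unchanged *)

(* 1.0 / 0.0 is an infinity, so checkFloat raises Overflow. *)
val r1 = (checkedDiv (1.0, 0.0); "no exception") handle Overflow => "Overflow"

(* 0.0 / 0.0 is NaN, so checkFloat raises Div. *)
val r2 = (checkedDiv (0.0, 0.0); "no exception") handle Div => "Div"
```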
[realFloor r]
[realCeil r]
[realTrunc r]
[realRound r]
These functions convert real values to integer-valued reals. realFloor
produces floor(r), the largest integer not larger than r. realCeil
produces ceil(r), the smallest integer not less than r. realTrunc
rounds r towards zero, and realRound rounds to the integer-valued real
value that is nearest to r. If r is NaN or an infinity, these
functions return r.
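The four real-to-integer-valued-real functions can be compared on a single negative input, where floor, ceiling, and truncation differ. The bound names are illustrative, and the input avoids a tie so that the rounding result does not depend on tie-breaking.

```sml
val fl = Real.realFloor ~2.7     (* ~3.0: largest integer not larger than r *)
val ce = Real.realCeil  ~2.7     (* ~2.0: smallest integer not less than r *)
val tr = Real.realTrunc ~2.7     (* ~2.0: rounds towards zero *)
val ro = Real.realRound ~2.7     (* ~3.0: nearest integer-valued real *)

(* Infinities are returned unchanged. *)
val inf = Real.realFloor Real.posInf
```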
[round r]
returns the integer nearest to r, using the default rounding
mode. NOTE: This is not the required behaviour: in case of a tie,
round should yield the nearest even integer. May raise Overflow.
[toInt mode x]
[toLargeInt mode x]
These functions convert the argument x to an integral type using the
specified rounding mode. They raise Overflow if the result is not
representable, in particular, if x is an infinity. They raise Domain
if the input real is NaN.
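A sketch of toInt under each rounding mode, together with a hypothetical wrapper toIntOpt that maps the Overflow and Domain exceptions to NONE. The inputs avoid ties.

```sml
val a = Real.toInt IEEEReal.TO_NEGINF  2.7     (* 2 *)
val b = Real.toInt IEEEReal.TO_POSINF  2.2     (* 3 *)
val c = Real.toInt IEEEReal.TO_ZERO    ~2.7    (* ~2 *)
val d = Real.toInt IEEEReal.TO_NEAREST 2.7     (* 3 *)

(* Hypothetical wrapper: NONE for infinities, NaN, and out-of-range values. *)
fun toIntOpt mode x =
  SOME (Real.toInt mode x) handle Overflow => NONE | Domain => NONE
```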
[fromInt i]
[fromLargeInt i]
These functions convert the integer i to a real value. If the absolute
value of i is larger than maxFinite, then the appropriate infinity is
returned. If i cannot be exactly represented as a real value, then the
current rounding mode is used to determine the resulting value. The
top-level function real is an alias for Real.fromInt.
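For illustration, the sketch below converts small integers, which are exactly representable, through fromInt, the top-level alias real, and fromLargeInt. The bound names are illustrative.

```sml
val x = Real.fromInt 3                 (* 3.0 *)
val y = real 3                         (* same value via the top-level alias *)
val z = Real.fromLargeInt 1000000      (* 1000000.0 *)
```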
[toLarge r]
[fromLarge mode r]
These convert between values of type real and type LargeReal.real. If
r is too small or too large to be represented as a real, fromLarge
will convert it to a zero or an infinity.
[fmt spec r]
returns a string representing r, in the format specified
by spec.
spec           description                           C printf
---------------------------------------------------------------
SCI NONE       scientific, 6 digits after point      %e
SCI (SOME n)   scientific, n digits after point      %.ne
FIX NONE       fixed-point, 6 digits after point     %f
FIX (SOME n)   fixed-point, n digits after point     %.nf
GEN NONE       auto choice, 12 significant digits    %.12g
GEN (SOME n)   auto choice, n significant digits     %.ng
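A brief sketch of fmt with a few of the specifications from the table. Expected strings are given only for the fixed-point cases; the exact exponent layout of the scientific form can vary between implementations, so no output is claimed for it. The bound names are illustrative.

```sml
val fixed = Real.fmt (StringCvt.FIX (SOME 2)) 3.14159   (* "3.14" *)
val neg   = Real.fmt (StringCvt.FIX (SOME 1)) ~2.5      (* "~2.5": SML writes minus as ~ *)
val sci   = Real.fmt (StringCvt.SCI (SOME 3)) 1234.5    (* scientific, 3 digits after point *)
val gen   = Real.fmt (StringCvt.GEN NONE) 0.5           (* auto choice, 12 significant digits *)
```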
[scan getc charsrc]
attempts to scan a floating-point number from the
character source charsrc, using the accessor getc, and ignoring any
initial whitespace. If successful, it returns SOME(r, rest) where r
is the number scanned, and rest is the unused part of the character
source. The valid forms of floating-point numerals are described by
[fromString s]
returns SOME(r) if a floating-point numeral can be
scanned from a prefix of string s, ignoring any initial whitespace;
returns NONE otherwise. The valid forms of floating-point numerals
are described by
[toDecimal r]
[fromDecimal d]
These convert between real values and decimal approximations. Decimal
approximations are to be converted using the IEEEReal.TO_NEAREST
rounding mode. toDecimal should produce only as many digits as are
necessary for fromDecimal to convert back to the same number. In
particular, for any normal or subnormal real value r, we have the
bit-wise equality: fromDecimal (toDecimal r) = r. For toDecimal, when
r is neither normal nor subnormal, the exp field is set to 0 and
the digits field is the empty list. In all cases, the sign and class
fields capture the sign and class of r.
For fromDecimal, if class is ZERO or INF, the resulting real is the
appropriate signed zero or infinity. If class is NAN, a signed NaN is
generated. If class is NORMAL or SUBNORMAL, the sign, digits and exp
fields are used to produce a real number whose value is
s * 0.d(1)d(2)...d(n) * 10^(exp)
where digits = [d(1), d(2), ..., d(n)] and where s is -1 if sign is
true and 1 otherwise. Note that the conversion itself should ignore
the class field, so that the resulting value might have class NORMAL,
SUBNORMAL, ZERO, or INF. For example, if digits is empty or a list of
all 0's, the result should be a signed zero. More generally, very
large or small magnitudes are converted to infinities or zeros. If
the argument to fromDecimal does not have a valid format, i.e., if the
digits field contains integers outside the range [0,9], it returns
NONE.
Implementation note: Algorithms for accurately and efficiently
converting between binary and decimal real representations are
readily available, e.g., see the technical report by Gay.