Abstract. Dafny provides several kinds of integer types. This note compares these to help you choose the right numbers for your various uses.
The most common integer type in Dafny is the built-in type int
. It represents the
mathematical integers, that is, the numbers we learned in school and that have no
bound (in either the positive direction or negative direction). In computing,
mathematical integers are also known as unbounded integers.
Type int
is the easiest integer type to work with, because you don't have to worry
about how large the magnitudes of your numbers are.
In particular, because int
represents the unbounded integers, there are no proof
obligations associated with integer operations (well, except division and modulus, which
incur a proof obligation that the denominator is not zero). This simplifies specifications
(I'll say more about this when discussing the other types below).
At run time, a value of type int
is represented by what is usually know as a “big-num”
(or “big-int”). This means that the number of bytes used to store the value depends
on the magnitude of the number. You may think of such a representation as using an
array of bytes. Big-nums incur some run-time overhead. However, since most numbers in
a program are relatively small (say, less than a million), a good big-num implementation
has a fast path for small numbers.
Note, even though we speak of “unbounded” numbers, any physical machine is going to have a limited amount of available memory. If your program uses a number whose representation (again, think: array of bytes) would require more storage than the executing machine has available, then your program will crash. Such a crash can also come about because your program simultaneously uses a very large number of big-nums, just like your program can run out of memory if it uses a very large number of objects or arrays, or if the program's recursion depth exceeds the stack space available. Dafny's verifier does not check for these kind of storage error (because it does not know what resources are available on an executing machine).
Dafny lets you define your own types that represent a subset of the values of another
type. We say that such a subset type is based on some base type. The base type can
be any type, but for the purposes of this note, I'll just consider the case when the
base type is int
. Using a subset type, you can define an integer type whose values
fall into a given range. The most common subset type in Dafny is the built-in type nat
,
which represents the natural numbers as a subset of the integers:
type nat = x: int | 0 <= x
Here is an example of a user-defined subset type:
type int32 = x: int | -0x8000_0000 <= x < 0x8000_0000
The values of this type are the same integers as can be represented using a 2's complement 32-bit integer.
A subset type gives a concise way to write specifications for values. For example, rather than defining a function like
predicate IsEven(x: int)
requires 0 <= x
a program can simply declare
predicate IsEven(n: nat)
This makes programs more concise and easier to read.
The conciseness is more dramatically pronounced when you use the subset type as a type parameter. Let me illustrate with an example. Suppose you have a list type that supports concatenation:
datatype List<T> = Nil | Cons(T, List<T>)
function Concat<T>(xs: List<T>, ys: List<T>): List<T> {
match xs
case Nil => ys
case Cons(x, xs') => Cons(x, Concat(xs', ys))
}
Furthermore, suppose you have a special interest in lists that contain natural numbers, as expressed by this predicate:
predicate IsNatList(xs: List<int>) {
match xs
case Nil => true
case Cons(x, xs') => 0 <= x && IsNatList(xs')
}
Now, if you have two lists xs: List<int>
and ys: List<int>
that each satisfies
IsNatList
, then you probably also want to know that Concat(xs, ys)
satisfies
IsNatList
. You can prove this using a lemma:
lemma ConcatIsNatList(xs: List<int>, ys: List<int>)
requires IsNatList(xs) && IsNatList(ys)
ensures IsNatList(Concat(xs, ys))
{
}
Thanks to Dafny's automatic induction, the proof of this lemma is simply the { }
given
here. Still, if we instead had represented the two lists using the type List<nat>
,
then the type signature of Concat
would immediately make it evident that the concatenation
also is a List<nat>
. So, by instantiating the List
type parameter with nat
, we
immediately learn things about functions like Concat
without having to state or prove any
lemma.
The constraint of a subset type is enforced whenever you assign a value to a variable of that
subset type. For example, if x
has type int
and n
has type nat
, then the assignment
n := x;
is allowed by the type system and the constraint that the value assigned to n
is non-negative
is enforced by the verifier.
More generally, it is only the final value of the RHS that's checked against the type of the
LHS. Subexpressions of the RHS are allowed to be any value of the base type (because the type
inferred for those subexpressions is the base type, not the subset type). For example,
for the same x
and n
above, the assignment
n := x + n - x;
is always allowed, even if the subexpression x + n
is negative.
To say it again, the intermediate values of a subexpression leading to a subset-type value are not themselves constrained to be values of the subset type.
At run time, a subset type is represented in the same way as its base type. So, the subset
type int32
above is stored using a big-num, just like its base type int
. This goes along
with the fact that subexpressions are not constrained by the constraint of a subtype type.
(If your intention is to define a type that is represented as a 32-bit integer at run time,
then you don't want the subset type int32
that I defined in this section. Rather, you'd like
a newtype, or possibly a bitvector type, but I'll get to those types later in this note.)
The property of being non-negative is relatively easy to handle in specifications. This is
because programs that use nat
tend use operations under which the natural numbers are closed
(e.g., +
, *
, /
, and %
). A program that uses subtraction on a nat
quite often also
precedes the subtraction by a run-time test, e.g.:
function StringRepeat(s: string, n: nat): string {
if n == 0 then
""
else
s + StringRepeat(s, n - 1)
}
In these common cases, no additional specifications are needed.
For other constraints, like type int32
above, even fairly simple operations may require
specifications. For example, if you use int32
in a mutable counter object, you'll need
a precondition for the Increment
operation:
class Counter {
var c: int32
constructor ()
ensures c == 0
{
c := 0;
}
method Increment()
requires c + 1 < 0x8000_0000
modifies this
{
c := c + 1;
}
}
If you had a more complicated operation, say, like matrix inversion where the matrix
elements are represented using the type int32
, then the necessary precondition
specification becomes very complicated—it can easily become more complicated than
the operation it's trying to specify.
Use a subset type to constrain your integer values when you mostly want a more frugal
way to express repeated specifications (this guideline applies to subset types with any
base type, not just int
). But don't go overboard. There is no point in defining
a subset type for every kind of range you might encounter in your program—a straightforward
precondition is far better for one-off situations.
By the way, of interest in this note are integer range constraints. But subset types allow any constraint. For example, you can write
type EvenInt = x: int | x % 2 == 0
However, the constraint cannot be parameterized and cannot read the heap. This means you cannot write anything like
// syntax error: a subset type is not allowed to take any parameters (other
// than type parameters)
type GoodIndex(a: array<int>) = x: int | 0 <= x < a.Length
or
// error: the following constraint depends the array's mutable elements, which
// is not allowed in a type
type AllTheSame = a: array<int> |
forall i, j :: 0 <= i < j < a.Length ==> a[i] == a[j]
For such constraints, express your needs using specifications like method preconditions.
Whereas a subset type is a restricted form of its base type, a newtype is a distinct clone of its base type. For example,
newtype MyInt = int
defines MyInt
to be a type like int
, with the same built-in literals and operations as int
,
but being a type that's incompatible with int
. It is possible to convert between int
and
MyInt
, but doing so requires explicit type-conversion expressions (as
):
var m: MyInt;
var x: int;
m := 15;
x := m as int;
m := (x + 1) as MyInt;
Like a subset type, a newtype can have a constraint. For example, here is a declaration of an integer newtype in the range from 0 to 256:
newtype Byte = x: int | 0 <= x < 256
In Ada, such a newtype is called a derived type [0].
The whole point of a newtype is to let the rules of the language enforce a separation from other
types. This gives a compiler the freedom to choose for the newtype a run-time representation that
is different from that of the base type, an opportunity that's profitable when the newtype
has a constraint. For instance, the standard Dafny compiler chooses an 8-bit target-native type to
represent Byte
. More generally, it chooses the smallest native type supported by the compilation
target that can fit the whole range of numeric values of the newtype; if there is no such native
type, then the compiler chooses a big-num.
Dafny has several compilation targets. Let's dwell for another moment on how the compiler
chooses a target representation for a given newtype. First, the compiler performs a simpleminded
syntactic scan on the constraint of the newtype. This syntactic scan looks for
inequality comparisons with the given bound variable (x
in the declaration of Byte
above).
If it finds both a lower and an upper bound, then it compares that range with the native
numeric types in the target language and chooses the one with the tightest fit.
You can learn which native type was chosen by looking at the IDE's hovertext for the Dafny type
declaration. For the specific case of Byte
above, the standard Dafny compiler will
for C# choose byte
(which is an unsigned 8-bit type), for Go uint8
(also an unsigned
8-bit type), for Java byte
(which is a signed 8-bit type, but the Dafny compiler makes
sure to choose operations on this 8-bit quantity that respect the desired unsigned quantity),
for JavaScript the built-in number type (which is a 64-bit double-precision floating-point
number that offers 52 bits of precision for integer magnitudes), and for Python the built-in
big-num type.
Secion 1 used int32
as an example. If you want such a type to be compiled
to a native 32-bit type, then you should declare it as a newtype, not as a subset type:
newtype int32 = x: int | -0x8000_0000 <= x < 0x8000_0000
A newtype is different from a subset type also in that intermediate values are enforced to be values of the type. In other words, when the operands of an arithmetic operator are of some newtype, then so is the result of the operation, so Dafny enforces that this result be a legal value of the newtype.
For example, if b
is of type Byte
, then the expression
5 + b - 5
is not allowed, unless the verifier can prove that b
is less than 251
.
Like for the bounded subset types, writing specifications for newtypes takes more effort than
for the unbounded type int
. The considerations mentioned in Section 2.2 apply.
In addition, intermediate values of newtypes are subject to the constraint, so note that the
precondition of Increment
in Section 2.2 is not well-formed if you use the
int32
type in Section 2.0. Instead, you have to rephrase the precondition,
for example as
requires c < 0x7FFF_FFFF
or as
requires c as int + 1 < 0x8000_0000
A newtype is appropriate if you want to keep the numeric type separate from other types. For example, to avoid mixing up Celsius and Fahrenheit values in a program, use two distinct newtypes. A newtype is also appropriate if you need or want the run-time representation of the type to use a certain number of bits. For example, you may need this if the quantity is part of a data structure stored on disk or traveling in a network packet, or if your program is interfacing with native code in the target language, and you may want this if performance is critical for operations on these kinds of numbers.
The final choice of types for integers are Dafny's built-in bitvector types. These come
in any width (bv0
, bv1
, bv2
, bv3
, bv4
, bv5
, …), but they are not polymorphic in the
bit width (as they are in, e.g., Cryptol [1]).
Each bitvector type has a bounded number of values, namely the $2^n$ values starting from 0
,
where $n$ is the width of the type. The (one and only) reason to choose a bitvector type over
the integer types previously described is if the application uses these “numbers” primarily
as, yup, vectors of bits. Such an application may interrogate or change the value of a particular
bit or may shift the bits left or right. Bitvector types also support arithmetic operations, but
unlike the other integer types, bitvector arithmetic operations are performed modulo $2^n$
(where, again, $n$ is the bit width). Such arithmetic is said to have wraparound semantics
(aka modulo semantics).
Bitvectors are bounded types, but since their operations wrap rather than overflow, the operations themselves do not require any preconditions. In other words, because the arithmetic operations on bitvectors are more permissive than those of the previously mentioned bounded types, you're always allowed to perform the operations. The price you pay is when you have to reason about the result of the operations. If your application relies on the wraparound, then it uses properties beyond basic arithmetic, which may complicate specifications and reasoning. On the other hand, if your application does not ever intend for the operations to wrap around, then you'll have to write your own specifications to limit the input values.
Because they are separate types, the compiler chooses native types to represent bitvectors when
possible, just like it does for newtypes.
In the context of bitvectors, we often think of such native types as “machine words”.
Unlike in several other languages, bitvectors in Dafny are always unsigned. (Also, unlike
languages whose operator precedence is C-like, Dafny's bitwise |
, &
, and ^
operators
binding more strongly than the boolean operators ||
and &&
. This follows Dafny's general
rule that thinner—meaning, using fewer ASCII characters—operators have higher precedence.)
At run time, the operations on bitvectors are very efficient (when the compiled code can use
machine words). But for the verifier, the situation is different. Experience with the verifier
shows that it can be very slow when bit-centric operations are combined with arithmetic
operations or with conversions between bitvectors and other integer types. Therefore, it
is generally recommended to use the other integer types unless you need the concrete view of
the bits. Though some programmers tend to immediately think of x << 1
instead of 2 * x
or
think of x & 0xFF
instead of x % 256
, using the latter may let a program stick with the
ordinary integer types rather than going to bitvectors.
The following table summarizes the points made in this note:
type | unbounded | bounded | negative | arithmetic | wrap-around | bit-twiddling | as int | subexpression checks | run-time |
---|---|---|---|---|---|---|---|---|---|
int | yes | no | yes | yes | N/A | no | no | no | big-nums |
subset type | yes | yes | yes | yes | no | no | no | no | big-nums |
newtype | yes | yes | yes | yes | no | no | yes | yes | machine words, big-nums |
bitvector type | no | yes | no | yes | yes | yes | yes | no | machine words, big-nums |
Remarks:
The row for “subset type” refers to subset types that use int
as their base type.
The row for newtype
refers to newtypes that use int
as their base type.
“unbounded” says whether or not the type has support for unbounded integers.
“bounded” says whether or not the type has the ability to bound the magnitude of its numbers.
“arithmetic” says whether or not the type supports arithmetic
addition (+
), subtraction (-
), multiplication (*
),
Euclidean division (/
), and Euclidean modulus (%
).
For bounded types, “wraparound” refers to “wraparound semantics” (aka “modulo semantics”, in contrast to
“overflow semantics”). Under this semantics, +
, -
, and *
operations are performed modulo $N$, where $N$ is
the rnumber of elements in the type. One consequence of wraparound semantics is that the sum of two numbers
may be smaller than either operand.
“bit-twiddling” says whether or not the type supports bitwise or (|
), bitwise and (&
),
bitwise xor (^
), left and right shift (<<
and >>
), and bit rotations (.RotateLeft
and .RotateRight
).
"as int
" says whether or not type-conversion expressions are necessary to go between the type and type int
.
“subexpression checks” says whether or not the type enforces a constraint on intermediate values.
“run-time” says how the compiler may choose to represent the numbers in a running program, and “machine word” refers to a target-native type that occupies a fixed number of bits.