<small>Dafny Power User:</small>\ How to choose your number types

Dafny Power User:
How to choose your number types

K. Rustan M. Leino

Manuscript KRML 289, 11 March 2023

Abstract. Dafny provides several kinds of integer types. This note compares these to help you choose the right numbers for your various uses.

0. Mathematical integers

The most common integer type in Dafny is the built-in type int. It represents the mathematical integers, that is, the numbers we learned in school and that have no bound (in either the positive direction or negative direction). In computing, mathematical integers are also known as unbounded integers.

Type int is the easiest integer type to work with, because you don't have to worry about how large the magnitudes of your numbers are. In particular, because int represents the unbounded integers, there are no proof obligations associated with integer operations (well, except division and modulus, which incur a proof obligation that the denominator is not zero). This simplifies specifications (I'll say more about this when discussing the other types below).

At run time, a value of type int is represented by what is usually know as a “big-num” (or “big-int”). This means that the number of bytes used to store the value depends on the magnitude of the number. You may think of such a representation as using an array of bytes. Big-nums incur some run-time overhead. However, since most numbers in a program are relatively small (say, less than a million), a good big-num implementation has a fast path for small numbers.

Note, even though we speak of “unbounded” numbers, any physical machine is going to have a limited amount of available memory. If your program uses a number whose representation (again, think: array of bytes) would require more storage than the executing machine has available, then your program will crash. Such a crash can also come about because your program simultaneously uses a very large number of big-nums, just like your program can run out of memory if it uses a very large number of objects or arrays, or if the program's recursion depth exceeds the stack space available. Dafny's verifier does not check for these kind of storage error (because it does not know what resources are available on an executing machine).

1. Subset types

Dafny lets you define your own types that represent a subset of the values of another type. We say that such a subset type is based on some base type. The base type can be any type, but for the purposes of this note, I'll just consider the case when the base type is int. Using a subset type, you can define an integer type whose values fall into a given range. The most common subset type in Dafny is the built-in type nat, which represents the natural numbers as a subset of the integers:

type nat = x: int | 0 <= x

Here is an example of a user-defined subset type:

type int32 = x: int | -0x8000_0000 <= x < 0x8000_0000

The values of this type are the same integers as can be represented using a 2's complement 32-bit integer.

A subset type gives a concise way to write specifications for values. For example, rather than defining a function like

predicate IsEven(x: int)
  requires 0 <= x

a program can simply declare

predicate IsEven(n: nat)

This makes programs more concise and easier to read.

1.0. Lemmas for free

The conciseness is more dramatically pronounced when you use the subset type as a type parameter. Let me illustrate with an example. Suppose you have a list type that supports concatenation:

datatype List<T> = Nil | Cons(T, List<T>)

function Concat<T>(xs: List<T>, ys: List<T>): List<T> {
  match xs
  case Nil => ys
  case Cons(x, xs') => Cons(x, Concat(xs', ys))
}

Furthermore, suppose you have a special interest in lists that contain natural numbers, as expressed by this predicate:

predicate IsNatList(xs: List<int>) {
  match xs
  case Nil => true
  case Cons(x, xs') => 0 <= x && IsNatList(xs')
}

Now, if you have two lists xs: List<int> and ys: List<int> that each satisfies IsNatList, then you probably also want to know that Concat(xs, ys) satisfies IsNatList. You can prove this using a lemma:

lemma ConcatIsNatList(xs: List<int>, ys: List<int>)
  requires IsNatList(xs) && IsNatList(ys)
  ensures IsNatList(Concat(xs, ys))
{
}

Thanks to Dafny's automatic induction, the proof of this lemma is simply the { } given here. Still, if we instead had represented the two lists using the type List<nat>, then the type signature of Concat would immediately make it evident that the concatenation also is a List<nat>. So, by instantiating the List type parameter with nat, we immediately learn things about functions like Concat without having to state or prove any lemma.

1.1. Intermediate values

The constraint of a subset type is enforced whenever you assign a value to a variable of that subset type. For example, if x has type int and n has type nat, then the assignment

n := x;

is allowed by the type system and the constraint that the value assigned to n is non-negative is enforced by the verifier.

More generally, it is only the final value of the RHS that's checked against the type of the LHS. Subexpressions of the RHS are allowed to be any value of the base type (because the type inferred for those subexpressions is the base type, not the subset type). For example, for the same x and n above, the assignment

n := x + n - x;

is always allowed, even if the subexpression x + n is negative.

To say it again, the intermediate values of a subexpression leading to a subset-type value are not themselves constrained to be values of the subset type.

1.2. Run-time representation

At run time, a subset type is represented in the same way as its base type. So, the subset type int32 above is stored using a big-num, just like its base type int. This goes along with the fact that subexpressions are not constrained by the constraint of a subtype type.

(If your intention is to define a type that is represented as a 32-bit integer at run time, then you don't want the subset type int32 that I defined in this section. Rather, you'd like a newtype, or possibly a bitvector type, but I'll get to those types later in this note.)

1.3. Specifications

The property of being non-negative is relatively easy to handle in specifications. This is because programs that use nat tend use operations under which the natural numbers are closed (e.g., +, *, /, and %). A program that uses subtraction on a nat quite often also precedes the subtraction by a run-time test, e.g.:

function StringRepeat(s: string, n: nat): string {
  if n == 0 then
    ""
  else
    s + StringRepeat(s, n - 1)
}

In these common cases, no additional specifications are needed.

For other constraints, like type int32 above, even fairly simple operations may require specifications. For example, if you use int32 in a mutable counter object, you'll need a precondition for the Increment operation:

class Counter {
  var c: int32
  constructor ()
    ensures c == 0
  {
    c := 0;
  }
  method Increment()
    requires c + 1 < 0x8000_0000
    modifies this
  {
    c := c + 1;
  }
}

If you had a more complicated operation, say, like matrix inversion where the matrix elements are represented using the type int32, then the necessary precondition specification becomes very complicated—it can easily become more complicated than the operation it's trying to specify.

1.4. Usage guidelines

Use a subset type to constrain your integer values when you mostly want a more frugal way to express repeated specifications (this guideline applies to subset types with any base type, not just int). But don't go overboard. There is no point in defining a subset type for every kind of range you might encounter in your program—a straightforward precondition is far better for one-off situations.

By the way, of interest in this note are integer range constraints. But subset types allow any constraint. For example, you can write

type EvenInt = x: int | x % 2 == 0

However, the constraint cannot be parameterized and cannot read the heap. This means you cannot write anything like

// syntax error: a subset type is not allowed to take any parameters (other
// than type parameters)
type GoodIndex(a: array<int>) = x: int | 0 <= x < a.Length

// error: the following constraint depends the array's mutable elements, which
// is not allowed in a type
type AllTheSame = a: array<int> |
  forall i, j :: 0 <= i < j < a.Length ==> a[i] == a[j]

For such constraints, express your needs using specifications like method preconditions.

2. Newtypes

Whereas a subset type is a restricted form of its base type, a newtype is a distinct clone of its base type. For example,

newtype MyInt = int

defines MyInt to be a type like int, with the same built-in literals and operations as int, but being a type that's incompatible with int. It is possible to convert between int and MyInt, but doing so requires explicit type-conversion expressions (as):

var m: MyInt;
var x: int;
m := 15;
x := m as int;
m := (x + 1) as MyInt;

Like a subset type, a newtype can have a constraint. For example, here is a declaration of an integer newtype in the range from 0 to 256:

newtype Byte = x: int | 0 <= x < 256

In Ada, such a newtype is called a derived type [0].

2.0. Run-time representation

The whole point of a newtype is to let the rules of the language enforce a separation from other types. This gives a compiler the freedom to choose for the newtype a run-time representation that is different from that of the base type, an opportunity that's profitable when the newtype has a constraint. For instance, the standard Dafny compiler chooses an 8-bit target-native type to represent Byte. More generally, it chooses the smallest native type supported by the compilation target that can fit the whole range of numeric values of the newtype; if there is no such native type, then the compiler chooses a big-num.

Dafny has several compilation targets. Let's dwell for another moment on how the compiler chooses a target representation for a given newtype. First, the compiler performs a simpleminded syntactic scan on the constraint of the newtype. This syntactic scan looks for inequality comparisons with the given bound variable (x in the declaration of Byte above). If it finds both a lower and an upper bound, then it compares that range with the native numeric types in the target language and chooses the one with the tightest fit. You can learn which native type was chosen by looking at the IDE's hovertext for the Dafny type declaration. For the specific case of Byte above, the standard Dafny compiler will for C# choose byte (which is an unsigned 8-bit type), for Go uint8 (also an unsigned 8-bit type), for Java byte (which is a signed 8-bit type, but the Dafny compiler makes sure to choose operations on this 8-bit quantity that respect the desired unsigned quantity), for JavaScript the built-in number type (which is a 64-bit double-precision floating-point number that offers 52 bits of precision for integer magnitudes), and for Python the built-in big-num type.

Secion 1 used int32 as an example. If you want such a type to be compiled to a native 32-bit type, then you should declare it as a newtype, not as a subset type:

newtype int32 = x: int | -0x8000_0000 <= x < 0x8000_0000

2.1. Intermediate values

A newtype is different from a subset type also in that intermediate values are enforced to be values of the type. In other words, when the operands of an arithmetic operator are of some newtype, then so is the result of the operation, so Dafny enforces that this result be a legal value of the newtype.

For example, if b is of type Byte, then the expression

5 + b - 5

is not allowed, unless the verifier can prove that b is less than 251.

2.2. Specifications

Like for the bounded subset types, writing specifications for newtypes takes more effort than for the unbounded type int. The considerations mentioned in Section 2.2 apply. In addition, intermediate values of newtypes are subject to the constraint, so note that the precondition of Increment in Section 2.2 is not well-formed if you use the int32 type in Section 2.0. Instead, you have to rephrase the precondition, for example as

requires c < 0x7FFF_FFFF

or as

requires c as int + 1 < 0x8000_0000

2.3. Usage guidelines

A newtype is appropriate if you want to keep the numeric type separate from other types. For example, to avoid mixing up Celsius and Fahrenheit values in a program, use two distinct newtypes. A newtype is also appropriate if you need or want the run-time representation of the type to use a certain number of bits. For example, you may need this if the quantity is part of a data structure stored on disk or traveling in a network packet, or if your program is interfacing with native code in the target language, and you may want this if performance is critical for operations on these kinds of numbers.

3. Bitvector types

The final choice of types for integers are Dafny's built-in bitvector types. These come in any width (bv0, bv1, bv2, bv3, bv4, bv5, …), but they are not polymorphic in the bit width (as they are in, e.g., Cryptol [1]).

Each bitvector type has a bounded number of values, namely the $$2^n$$ values starting from 0, where $$n$$ is the width of the type. The (one and only) reason to choose a bitvector type over the integer types previously described is if the application uses these “numbers” primarily as, yup, vectors of bits. Such an application may interrogate or change the value of a particular bit or may shift the bits left or right. Bitvector types also support arithmetic operations, but unlike the other integer types, bitvector arithmetic operations are performed modulo $$2^n$$ (where, again, $$n$$ is the bit width). Such arithmetic is said to have wraparound semantics (aka modulo semantics).

Bitvectors are bounded types, but since their operations wrap rather than overflow, the operations themselves do not require any preconditions. In other words, because the arithmetic operations on bitvectors are more permissive than those of the previously mentioned bounded types, you're always allowed to perform the operations. The price you pay is when you have to reason about the result of the operations. If your application relies on the wraparound, then it uses properties beyond basic arithmetic, which may complicate specifications and reasoning. On the other hand, if your application does not ever intend for the operations to wrap around, then you'll have to write your own specifications to limit the input values.

Because they are separate types, the compiler chooses native types to represent bitvectors when possible, just like it does for newtypes. In the context of bitvectors, we often think of such native types as “machine words”. Unlike in several other languages, bitvectors in Dafny are always unsigned. (Also, unlike languages whose operator precedence is C-like, Dafny's bitwise |, &, and ^ operators binding more strongly than the boolean operators || and &&. This follows Dafny's general rule that thinner—meaning, using fewer ASCII characters—operators have higher precedence.)

At run time, the operations on bitvectors are very efficient (when the compiled code can use machine words). But for the verifier, the situation is different. Experience with the verifier shows that it can be very slow when bit-centric operations are combined with arithmetic operations or with conversions between bitvectors and other integer types. Therefore, it is generally recommended to use the other integer types unless you need the concrete view of the bits. Though some programmers tend to immediately think of x << 1 instead of 2 * x or think of x & 0xFF instead of x % 256, using the latter may let a program stick with the ordinary integer types rather than going to bitvectors.

4. Summary

The following table summarizes the points made in this note:

type	unbounded	bounded	negative	arithmetic	wrap-around	bit-twiddling	`as int`	subexpression checks	run-time

`int`	yes	no	yes	yes	N/A	no	no	no	big-nums

subset type	yes	yes	yes	yes	no	no	no	no	big-nums

`newtype`	yes	yes	yes	yes	no	no	yes	yes	machine words, big-nums

bitvector type	no	yes	no	yes	yes	yes	yes	no	machine words, big-nums

Remarks:

The row for “subset type” refers to subset types that use int as their base type.
The row for newtype refers to newtypes that use int as their base type.
“unbounded” says whether or not the type has support for unbounded integers.
“bounded” says whether or not the type has the ability to bound the magnitude of its numbers.
“arithmetic” says whether or not the type supports arithmetic addition (+), subtraction (-), multiplication (*), Euclidean division (/), and Euclidean modulus (%).
For bounded types, “wraparound” refers to “wraparound semantics” (aka “modulo semantics”, in contrast to “overflow semantics”). Under this semantics, +, -, and * operations are performed modulo $$N$$ , where $$N$$ is the rnumber of elements in the type. One consequence of wraparound semantics is that the sum of two numbers may be smaller than either operand.
“bit-twiddling” says whether or not the type supports bitwise or (|), bitwise and (&), bitwise xor (^), left and right shift (<< and >>), and bit rotations (.RotateLeft and .RotateRight).
"as int" says whether or not type-conversion expressions are necessary to go between the type and type int.
“subexpression checks” says whether or not the type enforces a constraint on intermediate values.
“run-time” says how the compiler may choose to represent the numbers in a running program, and “machine word” refers to a target-native type that occupies a fixed number of bits.

References

[0]Ada Confirmity Assessment Authority. Ada 2012 language reference manual. http://www.ada-auth.org/standards/ada12.html, 2012. 🔎

[1]Galois, Inc. Cryptol: The language of cryptography. https://cryptol.net/, 2021. 🔎