Abstract. For simple examples, Dafny's automatic induction is powerful enough to prove some lemmas without manual input and helps reduce the manual input required to prove others. This note explains how automatic induction is applied and how the two main knobs are used to adjust what the automation does.
Dafny not only supports manually written lemmas, but also provides some automation that assists in proving lemmas by induction [0]. Let's start with variations of a manual proof, then see how automatic induction provides or fails to provide a proof, and finally how to adjust what automation does.
Consider the following standard definition of recursively defined lists
along with a Length
function and a function that constructs a list of
increasing integers.
datatype List<A> = Nil | Cons(A, List<A>)
function method Length(list: List): nat {
match list
case Nil => 0
case Cons(_, tail) => 1 + Length(tail)
}
function method Range(start: int, len: nat): List<int>
decreases len
{
if len == 0 then Nil else Cons(start, Range(start+1, len-1))
}
With these definitions, let's set out to prove that
the length of the list Range(start, len)
is len
.
So, we're going to prove that the length of the list Range(start, len)
is len
. We start with a manual proof. To be sure that automatic induction
does not help us along in this first example, I have marked with lemmas
with the attribute {:induction false}
.
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
if len == 0 {
// trivial
} else {
calc {
Length(Range(start, len));
== // def. Range, since len > 0
Length(Cons(start, Range(start+1, len-1)));
== // def. Length on Cons
1 + Length(Range(start+1, len-1));
== { RangeLength(start+1, len-1); } // apply induction hypothesis
1 + len - 1;
== // arithmetic
len;
}
}
}
For the purpose of this note, I assume you have a basic understanding of how to write proof like this manually, either in Dafny or on paper. I will describe the elements of the proof here, but won't try to explain all possible considerations.
The lemma is called RangeLength
and is parameterized by start
and len
,
just like the Range
function is. The property that the lemma proves is
stated in the ensures
clause. It is also called the proof goal.
The signature part of the lemma also defines
a decreases
clause, which says that the expression len
is to be used as the
termination metric for any recursive calls of the lemma. This termination
metric is the same as the one of the Range
function. This is common, because
the structure of a proof typically follows the structure of a function in
the lemma's proof goal.
The body of the lemma is divided up into two cases, following the cases in
the definition of Range
. The first case is len == 0
and gives rise to a
very simple proof: when len == 0
, Range
returns Nil
whose length is 0
.
The other case makes use of a verified calculation, commonly known as
a calc
statement [1]. This calculation starts with
the expression Length(Range(start, len))
and uses equality-preserving
transformations to arrive at len
, which is exactly what the proof goal says
the lemma has to prove. The first two steps of the calculation apply the
definitions of Range
and Length
to rewrite the expression into a form
where we see the expression
Length(Range(start+1, len-1))
This is where the one interesting step of the proof takes place. We want to
replace this expression with len - 1
, which requires a justification that
the expression indeed equals len - 1
. With the right parameterization,
this is exactly what the lemma we're trying to prove says. Therefore,
we use a calc
-statement hint and call the lemma:
RangeLength(start+1, len-1);
Because this is a recursive call to the lemma we're trying to prove, we
must prove termination. A recursive call to a lemma is commonly known as
an application of the induction hypothesis. In that way, our termination
check corresponds to making sure that the induction is well-founded.
In Dafny, termination is proved by
showing that every recursive call decreases some termination metric,
which is to say that some expression evaluates to a smaller value for
the recursive call than it does for the caller. What is this termination-metric
expression that is to be smaller and what does “smaller” mean? The expression is
len
, as given by the decreases
clause, and since len
is an integer,
Dafny uses "integer less-than, bounded below by 0
" as the order.
That was a mouthful. Let's review that last part again, the part about termination.
To prove that recursive calls to a (function or method or) lemma terminate,
we associate each invocation of the lemma with a value in a well-founded
order. Being well-founded means that there are no infinite descending chains.
That is, there is fixed ordering and every sequence of successively
smaller values in that ordering is finite. For example, in the
"integer less-than, bounded below by 0
" ordering, one descending chain
is
57, 56, 48, 39, 20, 4
This chain is finite. No matter what descending chain you write down, it
will be finite in this ordering (otherwise, you would eventually get a negative
number, but remember we said "bounded below by 0
"0).
Dafny builds in a fixed well-founded order for every type.1 It also supports lexicographic tuples of values, and the well-founded order for such tuples is the lexicographic ordering of the types on each component.
The way we associate a lemma invocation with a value in this fixed well-founded
order is that we declare a decreases
clause. It takes as its argument a
list of expressions, and these form a lexicographic tuple. For our
RangeLength
lemma, we used decreases len
. This says that each invocation
of the lemma will be associated with the same value as is passed in as
parameter len
.
Function Range
is also recursive, and for its invocations,
we also used len
as the termination metric. Function Length
is recursive,
too, but evidently we didn't give a decreases
clause for it. In the absence
of a decreases
clause, Dafny provides one for us, namely the lexicographic
tuple consisting of the function/method/lemma's parameters, in the order
given.2 So, the termination metric of Length
is list
(and
Dafny's well-founded order for inductive datatypes is structural inclusion).
Dafny's IDEs provide a tool tip (which you can see by hovering over the
function/method/lemma
declaration) that tells you which decreases
clause it picks for recursive
functions/methods/lemmas.
So far, I've said that Dafny defines a fixed well-founded ordering and the
way to associate lemma invocation with a value in that ordering is to
declare a decreases
clause. How does the verifier use these things to prove
termination? It proves that the value associated with a callee is strictly
below the value associated with the caller. In other words, it proves that
every recursive call takes a step in a descending chain. Because every
chain is finite, it follows that there is no infinite recursion. In other
words, the recursive calls terminate.
In the example, there's a recursive call from RangeLength(start, len)
to RangeLength(start+1, len-1)
. The lemma uses decreases len
, so the
verifier checks that len-1 < len
, which proves termination.
Similarly, there is a recursive call from Range(start, len)
to
Range(start+1, len-1)
. The verifier checks len-1 < len
and this proves
termination.
Finally, there is a recursive call from Length(list)
to Length(tail)
, where
tail
is structurally included in list
. Thus, the verifier is able to prove
termination here as well.
RangeLength
It is instructive to consider what would happen if we had chosen a different
termination metric for RangeLength
. For each one we consider, the proof
obligation is constructed the same way: the value for
RangeLength(start+1, len-1)
must be smaller than the value for
RangeLength(start, len)
.
Suppose we declared RangeLength
with decreases 10*len + 28
. Yes, this
termination metric is good enough to prove termination, because 10*len + 18
is less than 10*len + 28
.
What about decreases len - 6
? No, Dafny will complain about not being
able to prove termination if you give it this termination metric. It will
fail to prove that len-7
is below len-6
in the integer ordering, because
these could be negative.3
Let's try decreases start + len
. No, because start+1 + len-1
is not
less than start + len
.
How about decreases start + 2*len
. Yes, this proves termination, because
start+1 + 2*(len-1)
is less than start + 2*len
.
What about using the lexicographic tuple decreases start, len
?
No, this won't prove termination, because start+1, len-1
is not lexicographically
smaller than start, len
(in fact, it is lexicographically larger).
Had we left off the decreases
clause from RangeLength
altogether,
Dafny would have generated one for us. It generates it to be decreases start, len
,
because the parameters of RangeLength
are start
and len
, in that order.
So, without an explicit decreases
clause, Dafny would complain about not
being able to prove termination.
How about switching the order of the arguments, as in
decreases len, start
? Yes, this proves termination, because
len-1, start+1
is smaller than len, start
.
If you change RangeLength
to swap its parameters start
and len
(and do
the same swapping for the recursive call to RangeLength
), then you could
have left off the decreases
clause. In this case, Dafny would generate
decreases len, start
, and that proves termination. However, it's usually
advisable to stick with the parameter ordering that is most natural for
the function/method/lemma at hand.
Lastly, how about this quirky termination metric: decreases 7, len
?
When you first read this out loud, you might say
Decreases 7? How can anything decrease 7? 7 is what it was when my great-grandparents were alive, and 7 will always be just that. There is no way to decrease 7!
You are right, but that's not what the decreases
clause says. The
decreases
clause simply says how to map each lemma invocation to a value
in the well-founded order. So, with this quirky termination metric,
the proof obligation is to check that 7, len-1
is lexicographically
smaller than 7, len
, which it is. Not so quirky after all.
Let's simplify the manual proof we wrote for RangeLength
.
The calc
statement is helpful when we write a proof in the first
place, and it also gives a readable presentation of the proof.
But once we have figured out a proof, we sometimes choose to shorten it,
perhaps because in retrospect we found we were too punctilious even
for our own taste.
The calc
statement we wrote above really just has one non-trivial step,
the application of the induction hypothesis. In fact, we can replace
the entire calc
statement with just that one call:
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
if len == 0 {
} else {
RangeLength(start+1, len-1);
}
}
Since the “then” branch of the if
is empty, we can of course also
negate the guard, swap the branches, and omit the empty else
:
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
if len != 0 {
RangeLength(start+1, len-1);
}
}
Can we do even better than that one if
statement? Well, that depends
on what you think is better, but there is a way in Dafny to eliminate this
branch. We can do that by replacing the entire if
statement and its
one call to RangeLength
by a forall
statement that calls RangeLength
for a whole bunch of values.
To get this started, suppose we tried using the following as the body of
RangeLength
:
forall start', len' {
RangeLength(start', len');
}
This rather cavalier statement calls RangeLength
for all possible values
of start'
and len'
. Well, this doesn't work, because many of those recursive
calls won't terminate. We need to restrict ourselves to values of start'
and
len'
that decrease the termination metric.
So, let's try this:
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
forall start', len' | 0 <= len' < len {
RangeLength(start', len');
}
}
This calls RangeLength
for all values of start'
and all values of len'
smaller
than len
. That is, this forall
statement makes an infinite number of
recursive calls to RangeLength
, all at once. Each of these infinitely many
calls terminates, because the value of len'
is smaller than len
.4
This version of RangeLength
verifies. The mathematical name for calling
an induction hypothesis for all smaller values like this is called
strong induction.
Well, values of start'
can be larger than start
(in that, it is crucial
for the proof that start + 1
is one of those values). So, you may opine that
the forall
statement above does more than just strong induction.
Indeed, some may argue that this forall
statement also performs what is
called generalizing the lemma to all values of start'
. But if you consider
the ordering that determines “smaller” to be one that just compares the
len
component, then start', len'
is indeed smaller than start, len
, as long
as len'
is smaller than len
. However you want to think about it or whatever
mathematical name you want to give to it, Dafny accepts the forall
statement
above as a proof. In programming terms, all we're doing is proving that
each recursive call terminates, and that follows from the way we defined
the termination metric by decreases len
.
With all that background, we are finally ready to learn what
Dafny does to automate induction. Here's the short of it: if you remove
the attribute {:induction false}
(which I introduced only to disable
automatic induction while we were discussing what makes a proof in the
first place), then what Dafny does by default is insert the forall
statement we just saw. This means you can remove the manually inserted
forall
statement, because Dafny will add it for you. In other words,
Dafny accepts the following as a proof of the lemma:
lemma RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
}
Short and sweet.
Dafny automated induction all comes down to the maneuver of
automatically generating one forall
statement at the beginning
of every lemma. This does not
solve all problems of induction, but I'm amazed at how many simple
problems of induction that this simple maneuver does solve.
Let's look at the ingredients that go into the maneuver.
For a lemma L
with formal arguments args
, precondition P(args)
,5
and termination metric T(args)
, Dafny automatic induction inserts
the following forall
statement at the beginning of the body of L
:
forall aa' | P(args') && T(args') < T(args) {
L(args');
}
where
aa
is a subset of the formal parameters args
,
aa'
is aa
, suitably renamed to have fresh names,
args'
denotes args
but with every variable in aa
replaced by the
corresponding one in aa'
, and
<
denotes Dafny's well-founded lexicographic order.
The subset aa
of the variables args
can be customized by placing
the attribute {:induction aa}
on the lemma. If no such attribute is
given, then Dafny picks aa
heuristically. Dafny reports what it picks
in a tool tip that the Dafny IDEs show when you hover over the name of
the lemma.
We might think of {:induction x}
as saying we're "doing
induction over x
“, but be a little bit careful with this. I find that
common mathematical usage of the phrase ”do induction over …" conflates
several ideas. So, to be precise about it, what the {:induction aa}
attribute specifies is which parameters are to be universally quantified over.
There is one more subtlety that is easy to forget:
the decreases
clause matters. When we write a lemma like RangeLength
,
we might expect the proof to be simple enough that Dafny's automatic
induction will take care of it automatically. If so, we might just
write down the type signature of the lemma and the (pre- and) postcondition.
This often is all that's needed, but in the case of RangeLength
, it is
also necessary to supply a decreases
clause. Usually, if the main
function that the lemma is about (Range
in the running example) needs an
explicit decreases
clause, then the lemma does, too. Alas, this can be
easy to forget.
Next, let's consider some ways of customizing the automatic induction for
RangeLength
.
decreases
What happens if you do forget the decreases
clause of RangeLength
?
The lemma then looks like this:
lemma RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
{
}
For this lemma, Dafny comes up with {:induction start, len}
and
decreases start, len
.6
That means the implicit forall
statement
that the induction maneuver inserts is
forall start': nat, len': nat |
start' < start || (start' == start && len' < len)
{
RangeLength(start', len');
}
As we explored in the previous section, this does not prove the lemma,
because the crucial call RangeLength(start+1, len-1)
is not among the
calls performed.
len
What if we manually override the quantified variables and list only len
?
lemma {:induction len} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
{
}
We then get the following forall
statement:
forall len': nat |
start < start || (start == start && len' < len)
{
RangeLength(start, len');
}
Notice how there is no start'
in this example. Rather, the original
formal parameter start
is used in the formulaic range expression
P(start, len') && T(start, len') < T(start, len)
There is no precondition in the example, so the P
part is true
.
Because we didn't supply an explicit decreases
clause, Dafny generates
decreases start, len
. Therefore, the range expression that restricts the
universally quantified variables (that is, len'
) is
start < start || (start == start && len' < len)
which simplifies to len' < len
.
Again, since the recursive calls entailed by this forall
statement
does not include the crucial call RangeLength(start+1, len-1)
, Dafny
will report an error that the lemma's postcondition might not hold.
The result would be the same even if we supplied a decreases
clause
manually. That would still restrict len'
to values that are smaller than
len
, which is fine, but the problem is still that the inductive hypothesis
is only considered for the given value of start
.
I'm grateful to Sean McLaughlin, who provided both the running example
and the question about automatic induction for RangeLength
.
0.I'm simplifying a little bit. Dafny's well-founded order on integers is
actually “integer less-than, with no more than one negative value”. So, a
chain is allowed to dip down below 0
, but once the chain includes a
negative number, it must stop. This extension is a well-founded order,
because there is still no way to make an infinite descending chain.
↩
1.I'm simplifying a little bit. Dafny's fixed well-founded order also orders some values across different types. ↩
2.This is a slight simplification.
Dafny's heuristic for coming up with a decreases
clause
for a recursive function/method/lemma omits parameters whose types aren't
helpful in letting you prove termination. For example, a parameter
whose type is a type parameter is omitted from automatically generated
decreases
clauses.
↩
3.I was fast and loose in the previous example, because I should have also
pointed out that 10*len + 18
is non-negative.)
↩
4.“An infinite number of calls?!”, you say to yourself. “How is that termination, even if each such call terminates?” I won't delve into that here, but I will at least point out that we're calling a lemma, not a compiled method. So, you don't need to worry about having enough computational resources to make an infinite number of calls. ↩
5.A precondition of a lemma is declared using the keyword requires
.
It is like an antecedent of the lemma, and it is checked to hold at
every call site. The running example does not include a precondition.
A simple way to include one would be to change the type of start
from nat
to int
and to add requires 0 <= start
.
↩
6.I mentioned that tool tips in the Dafny IDEs show you what decreases
clause Dafny picks. As of this writing, such a tool tip is shown only
when the body of the function/method/lemma includes an explicit
recursive call. In particular, if the body of the lemma is empty,
there is no tool tip that shows you what decreases
clause is generated.
↩