Dafny Power User:

Automatic Induction

Automatic Induction

**Abstract.** For simple examples,
Dafny's automatic induction is powerful enough to prove some lemmas without
manual input and helps reduce the manual input required to prove others.
This note explains how automatic induction is applied and how the two main
knobs are used to adjust what the automation does.

Dafny not only supports manually written lemmas, but also provides some automation that assists in proving lemmas by induction [0]. Let's start with variations of a manual proof, then see how automatic induction provides or fails to provide a proof, and finally how to adjust what automation does.

Consider the following standard definition of recursively defined lists
along with a `Length`

function and a function that constructs a list of
increasing integers.

```
datatype List<A> = Nil | Cons(A, List<A>)
function method Length(list: List): nat {
match list
case Nil => 0
case Cons(_, tail) => 1 + Length(tail)
}
function method Range(start: int, len: nat): List<int>
decreases len
{
if len == 0 then Nil else Cons(start, Range(start+1, len-1))
}
```

With these definitions, let's set out to prove that
the length of the list `Range(start, len)`

is `len`

.

So, we're going to prove that the length of the list `Range(start, len)`

is `len`

. We start with a manual proof. To be sure that automatic induction
does not help us along in this first example, I have marked with lemmas
with the attribute `{:induction false}`

.

```
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
if len == 0 {
// trivial
} else {
calc {
Length(Range(start, len));
== // def. Range, since len > 0
Length(Cons(start, Range(start+1, len-1)));
== // def. Length on Cons
1 + Length(Range(start+1, len-1));
== { RangeLength(start+1, len-1); } // apply induction hypothesis
1 + len - 1;
== // arithmetic
len;
}
}
}
```

For the purpose of this note, I assume you have a basic understanding of how to write proof like this manually, either in Dafny or on paper. I will describe the elements of the proof here, but won't try to explain all possible considerations.

The lemma is called `RangeLength`

and is parameterized by `start`

and `len`

,
just like the `Range`

function is. The property that the lemma proves is
stated in the `ensures`

clause. It is also called the *proof goal*.

The signature part of the lemma also defines
a `decreases`

clause, which says that the expression `len`

is to be used as the
termination metric for any recursive calls of the lemma. This termination
metric is the same as the one of the `Range`

function. This is common, because
the structure of a proof typically follows the structure of a function in
the lemma's proof goal.

The body of the lemma is divided up into two cases, following the cases in
the definition of `Range`

. The first case is `len == 0`

and gives rise to a
very simple proof: when `len == 0`

, `Range`

returns `Nil`

whose length is `0`

.

The other case makes use of a *verified calculation*, commonly known as
a `calc`

statement [1]. This calculation starts with
the expression `Length(Range(start, len))`

and uses equality-preserving
transformations to arrive at `len`

, which is exactly what the proof goal says
the lemma has to prove. The first two steps of the calculation apply the
definitions of `Range`

and `Length`

to rewrite the expression into a form
where we see the expression

`Length(Range(start+1, len-1))`

This is where the one interesting step of the proof takes place. We want to
replace this expression with `len - 1`

, which requires a justification that
the expression indeed equals `len - 1`

. With the right parameterization,
this is exactly what the lemma we're trying to prove says. Therefore,
we use a `calc`

-statement hint and call the lemma:

`RangeLength(start+1, len-1);`

Because this is a recursive call to the lemma we're trying to prove, we
must prove termination. A recursive call to a lemma is commonly known as
an application of the *induction hypothesis*. In that way, our termination
check corresponds to making sure that the induction is *well-founded*.
In Dafny, termination is proved by
showing that every recursive call decreases some *termination metric*,
which is to say that some expression evaluates to a smaller value for
the recursive call than it does for the caller. What is this termination-metric
expression that is to be smaller and what does “smaller” mean? The expression is
`len`

, as given by the `decreases`

clause, and since `len`

is an integer,
Dafny uses "integer less-than, bounded below by `0`

" as the order.

That was a mouthful. Let's review that last part again, the part about termination.

To prove that recursive calls to a (function or method or) lemma terminate,
we associate each invocation of the lemma with a value in a well-founded
order. Being well-founded means that there are no infinite descending chains.
That is, there is fixed ordering and every sequence of successively
smaller values in that ordering is finite. For example, in the
"integer less-than, bounded below by `0`

" ordering, one descending chain
is

`57, 56, 48, 39, 20, 4`

This chain is finite. No matter what descending chain you write down, it
will be finite in this ordering (otherwise, you would eventually get a negative
number, but remember we said "bounded below by `0`

"^{0}).

Dafny builds in a fixed well-founded order for every type.^{1}
It also supports *lexicographic tuples* of values, and the well-founded order
for such tuples is the lexicographic ordering of the types on each component.

The way we associate a lemma invocation with a value in this fixed well-founded
order is that we declare a `decreases`

clause. It takes as its argument a
list of expressions, and these form a lexicographic tuple. For our
`RangeLength`

lemma, we used `decreases len`

. This says that each invocation
of the lemma will be associated with the same value as is passed in as
parameter `len`

.

Function `Range`

is also recursive, and for its invocations,
we also used `len`

as the termination metric. Function `Length`

is recursive,
too, but evidently we didn't give a `decreases`

clause for it. In the absence
of a `decreases`

clause, Dafny provides one for us, namely the lexicographic
tuple consisting of the function/method/lemma's parameters, in the order
given.^{2} So, the termination metric of `Length`

is `list`

(and
Dafny's well-founded order for inductive datatypes is structural inclusion).
Dafny's IDEs provide a tool tip (which you can see by hovering over the
function/method/lemma
declaration) that tells you which `decreases`

clause it picks for recursive
functions/methods/lemmas.

So far, I've said that Dafny defines a fixed well-founded ordering and the
way to associate lemma invocation with a value in that ordering is to
declare a `decreases`

clause. How does the verifier use these things to prove
termination? It proves that the value associated with a callee is strictly
below the value associated with the caller. In other words, it proves that
every recursive call takes a step in a descending chain. Because every
chain is finite, it follows that there is no infinite recursion. In other
words, the recursive calls terminate.

In the example, there's a recursive call from `RangeLength(start, len)`

to `RangeLength(start+1, len-1)`

. The lemma uses `decreases len`

, so the
verifier checks that `len-1 < len`

, which proves termination.

Similarly, there is a recursive call from `Range(start, len)`

to
`Range(start+1, len-1)`

. The verifier checks `len-1 < len`

and this proves
termination.

Finally, there is a recursive call from `Length(list)`

to `Length(tail)`

, where
`tail`

is structurally included in `list`

. Thus, the verifier is able to prove
termination here as well.

`RangeLength`

It is instructive to consider what would happen if we had chosen a different
termination metric for `RangeLength`

. For each one we consider, the proof
obligation is constructed the same way: the value for
`RangeLength(start+1, len-1)`

must be smaller than the value for
`RangeLength(start, len)`

.

Suppose we declared `RangeLength`

with `decreases 10*len + 28`

. Yes, this
termination metric is good enough to prove termination, because `10*len + 18`

is less than `10*len + 28`

.

What about `decreases len - 6`

? No, Dafny will complain about not being
able to prove termination if you give it this termination metric. It will
fail to prove that `len-7`

is below `len-6`

in the integer ordering, because
these could be negative.^{3}

Let's try `decreases start + len`

. No, because `start+1 + len-1`

is not
less than `start + len`

.

How about `decreases start + 2*len`

. Yes, this proves termination, because
`start+1 + 2*(len-1)`

is less than `start + 2*len`

.

What about using the lexicographic tuple `decreases start, len`

?
No, this won't prove termination, because `start+1, len-1`

is not lexicographically
smaller than `start, len`

(in fact, it is lexicographically larger).
Had we left off the `decreases`

clause from `RangeLength`

altogether,
Dafny would have generated one for us. It generates it to be `decreases start, len`

,
because the parameters of `RangeLength`

are `start`

and `len`

, in that order.
So, without an explicit `decreases`

clause, Dafny would complain about not
being able to prove termination.

How about switching the order of the arguments, as in
`decreases len, start`

? Yes, this proves termination, because
`len-1, start+1`

is smaller than `len, start`

.

If you change `RangeLength`

to swap its parameters `start`

and `len`

(and do
the same swapping for the recursive call to `RangeLength`

), then you could
have left off the `decreases`

clause. In this case, Dafny would generate
`decreases len, start`

, and that proves termination. However, it's usually
advisable to stick with the parameter ordering that is most natural for
the function/method/lemma at hand.

Lastly, how about this quirky termination metric: `decreases 7, len`

?
When you first read this out loud, you might say

Decreases 7? How can anything decrease 7? 7 is what it was when my great-grandparents were alive, and 7 will always be just that. There is no way to decrease 7!

You are right, but that's not what the `decreases`

clause says. The
`decreases`

clause simply says how to map each lemma invocation to a value
in the well-founded order. So, with this quirky termination metric,
the proof obligation is to check that `7, len-1`

is lexicographically
smaller than `7, len`

, which it is. Not so quirky after all.

Let's simplify the manual proof we wrote for `RangeLength`

.
The `calc`

statement is helpful when we write a proof in the first
place, and it also gives a readable presentation of the proof.
But once we have figured out a proof, we sometimes choose to shorten it,
perhaps because in retrospect we found we were too punctilious even
for our own taste.

The `calc`

statement we wrote above really just has one non-trivial step,
the application of the induction hypothesis. In fact, we can replace
the entire `calc`

statement with just that one call:

```
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
if len == 0 {
} else {
RangeLength(start+1, len-1);
}
}
```

Since the “then” branch of the `if`

is empty, we can of course also
negate the guard, swap the branches, and omit the empty `else`

:

```
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
if len != 0 {
RangeLength(start+1, len-1);
}
}
```

Can we do even better than that one `if`

statement? Well, that depends
on what you think is better, but there is a way in Dafny to eliminate this
branch. We can do that by replacing the entire `if`

statement and its
one call to `RangeLength`

by a `forall`

statement that calls `RangeLength`

for a whole bunch of values.

To get this started, suppose we tried using the following as the body of
`RangeLength`

:

```
forall start', len' {
RangeLength(start', len');
}
```

This rather cavalier statement calls `RangeLength`

for all possible values
of `start'`

and `len'`

. Well, this doesn't work, because many of those recursive
calls won't terminate. We need to restrict ourselves to values of `start'`

and
`len'`

that decrease the termination metric.

So, let's try this:

```
lemma {:induction false} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
forall start', len' | 0 <= len' < len {
RangeLength(start', len');
}
}
```

This calls `RangeLength`

for all values of `start'`

and all values of `len'`

smaller
than `len`

. That is, this `forall`

statement makes an infinite number of
recursive calls to `RangeLength`

, all at once. Each of these infinitely many
calls terminates, because the value of `len'`

is smaller than `len`

.^{4}

This version of `RangeLength`

verifies. The mathematical name for calling
an induction hypothesis for all smaller values like this is called
*strong induction*.

Well, values of `start'`

can be larger than `start`

(in that, it is crucial
for the proof that `start + 1`

is one of those values). So, you may opine that
the `forall`

statement above does more than just strong induction.
Indeed, some may argue that this `forall`

statement also performs what is
called *generalizing* the lemma to all values of `start'`

. But if you consider
the ordering that determines “smaller” to be one that just compares the
`len`

component, then `start', len'`

is indeed smaller than `start, len`

, as long
as `len'`

is smaller than `len`

. However you want to think about it or whatever
mathematical name you want to give to it, Dafny accepts the `forall`

statement
above as a proof. In programming terms, all we're doing is proving that
each recursive call terminates, and that follows from the way we defined
the termination metric by `decreases len`

.

With all that background, we are finally ready to learn what
Dafny does to automate induction. Here's the short of it: if you remove
the attribute `{:induction false}`

(which I introduced only to disable
automatic induction while we were discussing what makes a proof in the
first place), then what Dafny does by default is insert the `forall`

statement we just saw. This means you can remove the manually inserted
`forall`

statement, because Dafny will add it for you. In other words,
Dafny accepts the following as a proof of the lemma:

```
lemma RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
decreases len
{
}
```

Short and sweet.

Dafny automated induction all comes down to the maneuver of
automatically generating one `forall`

statement at the beginning
of every lemma. This does not
solve all problems of induction, but I'm amazed at how many simple
problems of induction that this simple maneuver does solve.
Let's look at the ingredients that go into the maneuver.

For a lemma `L`

with formal arguments `args`

, precondition `P(args)`

,^{5}
and termination metric `T(args)`

, Dafny automatic induction inserts
the following `forall`

statement at the beginning of the body of `L`

:

```
forall aa' | P(args') && T(args') < T(args) {
L(args');
}
```

where

`aa`

is a subset of the formal parameters`args`

,`aa'`

is`aa`

, suitably renamed to have fresh names,`args'`

denotes`args`

but with every variable in`aa`

replaced by the corresponding one in`aa'`

, and`<`

denotes Dafny's well-founded lexicographic order.

The subset `aa`

of the variables `args`

can be customized by placing
the attribute `{:induction aa}`

on the lemma. If no such attribute is
given, then Dafny picks `aa`

heuristically. Dafny reports what it picks
in a tool tip that the Dafny IDEs show when you hover over the name of
the lemma.

We might think of `{:induction x}`

as saying we're "doing
induction over `x`

“, but be a little bit careful with this. I find that
common mathematical usage of the phrase ”do induction over …" conflates
several ideas. So, to be precise about it, what the `{:induction aa}`

attribute specifies is *which parameters are to be universally quantified over*.

There is one more subtlety that is easy to forget:
the `decreases`

clause matters. When we write a lemma like `RangeLength`

,
we might expect the proof to be simple enough that Dafny's automatic
induction will take care of it automatically. If so, we might just
write down the type signature of the lemma and the (pre- and) postcondition.
This often is all that's needed, but in the case of `RangeLength`

, it is
also necessary to supply a `decreases`

clause. Usually, if the main
function that the lemma is about (`Range`

in the running example) needs an
explicit `decreases`

clause, then the lemma does, too. Alas, this can be
easy to forget.

Next, let's consider some ways of customizing the automatic induction for
`RangeLength`

.

`decreases`

What happens if you do forget the `decreases`

clause of `RangeLength`

?
The lemma then looks like this:

```
lemma RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
{
}
```

For this lemma, Dafny comes up with `{:induction start, len}`

and
`decreases start, len`

.^{6}
That means the implicit `forall`

statement
that the induction maneuver inserts is

```
forall start': nat, len': nat |
start' < start || (start' == start && len' < len)
{
RangeLength(start', len');
}
```

As we explored in the previous section, this does not prove the lemma,
because the crucial call `RangeLength(start+1, len-1)`

is not among the
calls performed.

`len`

What if we manually override the quantified variables and list only `len`

?

```
lemma {:induction len} RangeLength(start: nat, len: nat)
ensures Length(Range(start, len)) == len
{
}
```

We then get the following `forall`

statement:

```
forall len': nat |
start < start || (start == start && len' < len)
{
RangeLength(start, len');
}
```

Notice how there is no `start'`

in this example. Rather, the original
formal parameter `start`

is used in the formulaic range expression

`P(start, len') && T(start, len') < T(start, len)`

There is no precondition in the example, so the `P`

part is `true`

.
Because we didn't supply an explicit `decreases`

clause, Dafny generates
`decreases start, len`

. Therefore, the range expression that restricts the
universally quantified variables (that is, `len'`

) is

`start < start || (start == start && len' < len)`

which simplifies to `len' < len`

.

Again, since the recursive calls entailed by this `forall`

statement
does not include the crucial call `RangeLength(start+1, len-1)`

, Dafny
will report an error that the lemma's postcondition might not hold.

The result would be the same even if we supplied a `decreases`

clause
manually. That would still restrict `len'`

to values that are smaller than
`len`

, which is fine, but the problem is still that the inductive hypothesis
is only considered for the given value of `start`

.

I'm grateful to Sean McLaughlin, who provided both the running example
and the question about automatic induction for `RangeLength`

.

[0]K. Rustan M. Leino.
Automating induction with an SMT solver.
In Viktor Kuncak and Andrey Rybalchenko, editors, *Verification,
Model Checking, and Abstract Interpretation — 13th International
Conference, VMCAI 2012*, volume 7148 of *Lecture Notes in Computer
Science*, pages 315–331. Springer, January 2012. 🔎

[1]K. Rustan M. Leino and Nadia Polikarpova.
Verified calculations.
In Ernie Cohen and Andrey Rybalchenko, editors, *Verified
Software: Theories, Tools, Experiments — 5th International Conference,
VSTTE 2013, Revised Selected Papers*, volume 8164 of *Lecture Notes in
Computer Science*, pages 170–190. Springer, 2014. 🔎

^{0.}I'm simplifying a little bit. Dafny's well-founded order on integers is
actually “integer less-than, with no more than one negative value”. So, a
chain is allowed to dip down below `0`

, but once the chain includes a
negative number, it must stop. This extension is a well-founded order,
because there is still no way to make an infinite descending chain.
↩

^{1.}I'm simplifying a little bit. Dafny's fixed well-founded order also orders
some values across different types.
↩

^{2.}This is a slight simplification.
Dafny's heuristic for coming up with a `decreases`

clause
for a recursive function/method/lemma omits parameters whose types aren't
helpful in letting you prove termination. For example, a parameter
whose type is a type parameter is omitted from automatically generated
`decreases`

clauses.
↩

^{3.}I was fast and loose in the previous example, because I should have also
pointed out that `10*len + 18`

is non-negative.)
↩

^{4.}“An infinite number of calls?!”, you say to yourself. “How is that termination,
even if each such call terminates?” I won't delve into that here, but
I will at least point out that we're calling a lemma, not a compiled
method. So, you don't need to worry about having enough computational
resources to make an infinite number of calls.
↩

^{5.}A precondition of a lemma is declared using the keyword `requires`

.
It is like an antecedent of the lemma, and it is checked to hold at
every call site. The running example does not include a precondition.
A simple way to include one would be to change the type of `start`

from `nat`

to `int`

and to add `requires 0 <= start`

.
↩

^{6.}I mentioned that tool tips in the Dafny IDEs show you what `decreases`

clause Dafny picks. As of this writing, such a tool tip is shown only
when the body of the function/method/lemma includes an explicit
recursive call. In particular, if the body of the lemma is empty,
there is no tool tip that shows you what `decreases`

clause is generated.
↩