Dafny Power User:
Automatic Induction
K. Rustan M. Leino
Manuscript KRML 269, 31 May 2019

Abstract. For simple examples, Dafny's automatic induction is powerful enough to prove some lemmas without manual input and helps reduce the manual input required to prove others. This note explains how automatic induction is applied and how the two main knobs are used to adjust what the automation does.

Dafny not only supports manually written lemmas, but also provides some automation that assists in proving lemmas by induction [0]. Let's start with variations of a manual proof, then see how automatic induction provides or fails to provide a proof, and finally how to adjust what automation does.

0. Running example

0.0. List and function definitions

Consider the following standard definition of recursively defined lists along with a Length function and a function that constructs a list of increasing integers.

datatype List<A> = Nil | Cons(A, List<A>)

function method Length(list: List): nat {
  match list
  case Nil => 0
  case Cons(_, tail) => 1 + Length(tail)
}

function method Range(start: int, len: nat): List<int>
  decreases len
{
  if len == 0 then Nil else Cons(start, Range(start+1, len-1))
}

With these definitions, let's set out to prove that the length of the list Range(start, len) is len.

0.1. A lemma and proof

So, we're going to prove that the length of the list Range(start, len) is len. We start with a manual proof. To be sure that automatic induction does not help us along in this first example, I have marked with lemmas with the attribute {:induction false}.

lemma {:induction false} RangeLength(start: nat, len: nat)
  ensures Length(Range(start, len)) == len
  decreases len
{
  if len == 0 {
    // trivial
  } else {
    calc {
      Length(Range(start, len));
    ==  // def. Range, since len > 0
      Length(Cons(start, Range(start+1, len-1)));
    ==  // def. Length on Cons
      1 + Length(Range(start+1, len-1));
    ==  { RangeLength(start+1, len-1); }  // apply induction hypothesis
      1 + len - 1;
    ==  // arithmetic
      len;
    }
  }
}

For the purpose of this note, I assume you have a basic understanding of how to write proof like this manually, either in Dafny or on paper. I will describe the elements of the proof here, but won't try to explain all possible considerations.

The lemma is called RangeLength and is parameterized by start and len, just like the Range function is. The property that the lemma proves is stated in the ensures clause. It is also called the proof goal.

The signature part of the lemma also defines a decreases clause, which says that the expression len is to be used as the termination metric for any recursive calls of the lemma. This termination metric is the same as the one of the Range function. This is common, because the structure of a proof typically follows the structure of a function in the lemma's proof goal.

The body of the lemma is divided up into two cases, following the cases in the definition of Range. The first case is len == 0 and gives rise to a very simple proof: when len == 0, Range returns Nil whose length is 0.

The other case makes use of a verified calculation, commonly known as a calc statement [1]. This calculation starts with the expression Length(Range(start, len)) and uses equality-preserving transformations to arrive at len, which is exactly what the proof goal says the lemma has to prove. The first two steps of the calculation apply the definitions of Range and Length to rewrite the expression into a form where we see the expression

Length(Range(start+1, len-1))

This is where the one interesting step of the proof takes place. We want to replace this expression with len - 1, which requires a justification that the expression indeed equals len - 1. With the right parameterization, this is exactly what the lemma we're trying to prove says. Therefore, we use a calc-statement hint and call the lemma:

RangeLength(start+1, len-1);

Because this is a recursive call to the lemma we're trying to prove, we must prove termination. A recursive call to a lemma is commonly known as an application of the induction hypothesis. In that way, our termination check corresponds to making sure that the induction is well-founded. In Dafny, termination is proved by showing that every recursive call decreases some termination metric, which is to say that some expression evaluates to a smaller value for the recursive call than it does for the caller. What is this termination-metric expression that is to be smaller and what does “smaller” mean? The expression is len, as given by the decreases clause, and since len is an integer, Dafny uses "integer less-than, bounded below by 0" as the order.

0.2. Termination in more detail

That was a mouthful. Let's review that last part again, the part about termination.

To prove that recursive calls to a (function or method or) lemma terminate, we associate each invocation of the lemma with a value in a well-founded order. Being well-founded means that there are no infinite descending chains. That is, there is fixed ordering and every sequence of successively smaller values in that ordering is finite. For example, in the "integer less-than, bounded below by 0" ordering, one descending chain is

57, 56, 48, 39, 20, 4

This chain is finite. No matter what descending chain you write down, it will be finite in this ordering (otherwise, you would eventually get a negative number, but remember we said "bounded below by 0"0).

Dafny builds in a fixed well-founded order for every type.1 It also supports lexicographic tuples of values, and the well-founded order for such tuples is the lexicographic ordering of the types on each component.

The way we associate a lemma invocation with a value in this fixed well-founded order is that we declare a decreases clause. It takes as its argument a list of expressions, and these form a lexicographic tuple. For our RangeLength lemma, we used decreases len. This says that each invocation of the lemma will be associated with the same value as is passed in as parameter len.

Function Range is also recursive, and for its invocations, we also used len as the termination metric. Function Length is recursive, too, but evidently we didn't give a decreases clause for it. In the absence of a decreases clause, Dafny provides one for us, namely the lexicographic tuple consisting of the function/method/lemma's parameters, in the order given.2 So, the termination metric of Length is list (and Dafny's well-founded order for inductive datatypes is structural inclusion). Dafny's IDEs provide a tool tip (which you can see by hovering over the function/method/lemma declaration) that tells you which decreases clause it picks for recursive functions/methods/lemmas.

So far, I've said that Dafny defines a fixed well-founded ordering and the way to associate lemma invocation with a value in that ordering is to declare a decreases clause. How does the verifier use these things to prove termination? It proves that the value associated with a callee is strictly below the value associated with the caller. In other words, it proves that every recursive call takes a step in a descending chain. Because every chain is finite, it follows that there is no infinite recursion. In other words, the recursive calls terminate.

In the example, there's a recursive call from RangeLength(start, len) to RangeLength(start+1, len-1). The lemma uses decreases len, so the verifier checks that len-1 < len, which proves termination.

Similarly, there is a recursive call from Range(start, len) to Range(start+1, len-1). The verifier checks len-1 < len and this proves termination.

Finally, there is a recursive call from Length(list) to Length(tail), where tail is structurally included in list. Thus, the verifier is able to prove termination here as well.

0.3. Other termination metrics for RangeLength

It is instructive to consider what would happen if we had chosen a different termination metric for RangeLength. For each one we consider, the proof obligation is constructed the same way: the value for RangeLength(start+1, len-1) must be smaller than the value for RangeLength(start, len).

Suppose we declared RangeLength with decreases 10*len + 28. Yes, this termination metric is good enough to prove termination, because 10*len + 18 is less than 10*len + 28.

What about decreases len - 6? No, Dafny will complain about not being able to prove termination if you give it this termination metric. It will fail to prove that len-7 is below len-6 in the integer ordering, because these could be negative.3

Let's try decreases start + len. No, because start+1 + len-1 is not less than start + len.

How about decreases start + 2*len. Yes, this proves termination, because start+1 + 2*(len-1) is less than start + 2*len.

What about using the lexicographic tuple decreases start, len? No, this won't prove termination, because start+1, len-1 is not lexicographically smaller than start, len (in fact, it is lexicographically larger). Had we left off the decreases clause from RangeLength altogether, Dafny would have generated one for us. It generates it to be decreases start, len, because the parameters of RangeLength are start and len, in that order. So, without an explicit decreases clause, Dafny would complain about not being able to prove termination.

How about switching the order of the arguments, as in decreases len, start? Yes, this proves termination, because len-1, start+1 is smaller than len, start.

If you change RangeLength to swap its parameters start and len (and do the same swapping for the recursive call to RangeLength), then you could have left off the decreases clause. In this case, Dafny would generate decreases len, start, and that proves termination. However, it's usually advisable to stick with the parameter ordering that is most natural for the function/method/lemma at hand.

Lastly, how about this quirky termination metric: decreases 7, len? When you first read this out loud, you might say

Decreases 7? How can anything decrease 7? 7 is what it was when my great-grandparents were alive, and 7 will always be just that. There is no way to decrease 7!

You are right, but that's not what the decreases clause says. The decreases clause simply says how to map each lemma invocation to a value in the well-founded order. So, with this quirky termination metric, the proof obligation is to check that 7, len-1 is lexicographically smaller than 7, len, which it is. Not so quirky after all.

0.4. A shorter proof

Let's simplify the manual proof we wrote for RangeLength. The calc statement is helpful when we write a proof in the first place, and it also gives a readable presentation of the proof. But once we have figured out a proof, we sometimes choose to shorten it, perhaps because in retrospect we found we were too punctilious even for our own taste.

The calc statement we wrote above really just has one non-trivial step, the application of the induction hypothesis. In fact, we can replace the entire calc statement with just that one call:

lemma {:induction false} RangeLength(start: nat, len: nat)
  ensures Length(Range(start, len)) == len
  decreases len
{
  if len == 0 {
  } else {
    RangeLength(start+1, len-1);
  }
}

Since the “then” branch of the if is empty, we can of course also negate the guard, swap the branches, and omit the empty else:

lemma {:induction false} RangeLength(start: nat, len: nat)
  ensures Length(Range(start, len)) == len
  decreases len
{
  if len != 0 {
    RangeLength(start+1, len-1);
  }
}

0.5. Strong induction

Can we do even better than that one if statement? Well, that depends on what you think is better, but there is a way in Dafny to eliminate this branch. We can do that by replacing the entire if statement and its one call to RangeLength by a forall statement that calls RangeLength for a whole bunch of values.

To get this started, suppose we tried using the following as the body of RangeLength:

forall start', len' {
  RangeLength(start', len');
}

This rather cavalier statement calls RangeLength for all possible values of start' and len'. Well, this doesn't work, because many of those recursive calls won't terminate. We need to restrict ourselves to values of start' and len' that decrease the termination metric.

So, let's try this:

lemma {:induction false} RangeLength(start: nat, len: nat)
  ensures Length(Range(start, len)) == len
  decreases len
{
  forall start', len' | 0 <= len' < len {
    RangeLength(start', len');
  }
}

This calls RangeLength for all values of start' and all values of len' smaller than len. That is, this forall statement makes an infinite number of recursive calls to RangeLength, all at once. Each of these infinitely many calls terminates, because the value of len' is smaller than len.4

This version of RangeLength verifies. The mathematical name for calling an induction hypothesis for all smaller values like this is called strong induction.

Well, values of start' can be larger than start (in that, it is crucial for the proof that start + 1 is one of those values). So, you may opine that the forall statement above does more than just strong induction. Indeed, some may argue that this forall statement also performs what is called generalizing the lemma to all values of start'. But if you consider the ordering that determines “smaller” to be one that just compares the len component, then start', len' is indeed smaller than start, len, as long as len' is smaller than len. However you want to think about it or whatever mathematical name you want to give to it, Dafny accepts the forall statement above as a proof. In programming terms, all we're doing is proving that each recursive call terminates, and that follows from the way we defined the termination metric by decreases len.

1. Automatic induction

With all that background, we are finally ready to learn what Dafny does to automate induction. Here's the short of it: if you remove the attribute {:induction false} (which I introduced only to disable automatic induction while we were discussing what makes a proof in the first place), then what Dafny does by default is insert the forall statement we just saw. This means you can remove the manually inserted forall statement, because Dafny will add it for you. In other words, Dafny accepts the following as a proof of the lemma:

lemma RangeLength(start: nat, len: nat)
  ensures Length(Range(start, len)) == len
  decreases len
{
}

Short and sweet.

1.0. The detailed recipe for automatic induction

Dafny automated induction all comes down to the maneuver of automatically generating one forall statement at the beginning of every lemma. This does not solve all problems of induction, but I'm amazed at how many simple problems of induction that this simple maneuver does solve. Let's look at the ingredients that go into the maneuver.

For a lemma L with formal arguments args, precondition P(args),5 and termination metric T(args), Dafny automatic induction inserts the following forall statement at the beginning of the body of L:

forall aa' | P(args') && T(args') < T(args) {
  L(args');
}

where

The subset aa of the variables args can be customized by placing the attribute {:induction aa} on the lemma. If no such attribute is given, then Dafny picks aa heuristically. Dafny reports what it picks in a tool tip that the Dafny IDEs show when you hover over the name of the lemma.

We might think of {:induction x} as saying we're "doing induction over x“, but be a little bit careful with this. I find that common mathematical usage of the phrase ”do induction over " conflates several ideas. So, to be precise about it, what the {:induction aa} attribute specifies is which parameters are to be universally quantified over.

There is one more subtlety that is easy to forget: the decreases clause matters. When we write a lemma like RangeLength, we might expect the proof to be simple enough that Dafny's automatic induction will take care of it automatically. If so, we might just write down the type signature of the lemma and the (pre- and) postcondition. This often is all that's needed, but in the case of RangeLength, it is also necessary to supply a decreases clause. Usually, if the main function that the lemma is about (Range in the running example) needs an explicit decreases clause, then the lemma does, too. Alas, this can be easy to forget.

Next, let's consider some ways of customizing the automatic induction for RangeLength.

1.1. Example: missing decreases

What happens if you do forget the decreases clause of RangeLength? The lemma then looks like this:

lemma RangeLength(start: nat, len: nat)
  ensures Length(Range(start, len)) == len
{
}

For this lemma, Dafny comes up with {:induction start, len} and decreases start, len.6 That means the implicit forall statement that the induction maneuver inserts is

forall start': nat, len': nat |
  start' < start || (start' == start && len' < len)
{
  RangeLength(start', len');
}

As we explored in the previous section, this does not prove the lemma, because the crucial call RangeLength(start+1, len-1) is not among the calls performed.

1.2. Example: quantifying over just len

What if we manually override the quantified variables and list only len?

lemma {:induction len} RangeLength(start: nat, len: nat)
  ensures Length(Range(start, len)) == len
{
}

We then get the following forall statement:

forall len': nat |
  start < start || (start == start && len' < len)
{
  RangeLength(start, len');
}

Notice how there is no start' in this example. Rather, the original formal parameter start is used in the formulaic range expression

P(start, len') && T(start, len') < T(start, len)

There is no precondition in the example, so the P part is true. Because we didn't supply an explicit decreases clause, Dafny generates decreases start, len. Therefore, the range expression that restricts the universally quantified variables (that is, len') is

start < start || (start == start && len' < len)

which simplifies to len' < len.

Again, since the recursive calls entailed by this forall statement does not include the crucial call RangeLength(start+1, len-1), Dafny will report an error that the lemma's postcondition might not hold.

The result would be the same even if we supplied a decreases clause manually. That would still restrict len' to values that are smaller than len, which is fine, but the problem is still that the inductive hypothesis is only considered for the given value of start.

Acknowledgments

I'm grateful to Sean McLaughlin, who provided both the running example and the question about automatic induction for RangeLength.

References

[0]K. Rustan M. Leino. Automating induction with an SMT solver. In Viktor Kuncak and Andrey Rybalchenko, editors, Verification, Model Checking, and Abstract Interpretation — 13th International Conference, VMCAI 2012, volume 7148 of Lecture Notes in Computer Science, pages 315331. Springer, January 2012. 🔎
[1]K. Rustan M. Leino and Nadia Polikarpova. Verified calculations. In Ernie Cohen and Andrey Rybalchenko, editors, Verified Software: Theories, Tools, Experiments — 5th International Conference, VSTTE 2013, Revised Selected Papers, volume 8164 of Lecture Notes in Computer Science, pages 170190. Springer, 2014. 🔎

0.I'm simplifying a little bit. Dafny's well-founded order on integers is actually “integer less-than, with no more than one negative value”. So, a chain is allowed to dip down below 0, but once the chain includes a negative number, it must stop. This extension is a well-founded order, because there is still no way to make an infinite descending chain.

1.I'm simplifying a little bit. Dafny's fixed well-founded order also orders some values across different types.

2.This is a slight simplification. Dafny's heuristic for coming up with a decreases clause for a recursive function/method/lemma omits parameters whose types aren't helpful in letting you prove termination. For example, a parameter whose type is a type parameter is omitted from automatically generated decreases clauses.

3.I was fast and loose in the previous example, because I should have also pointed out that 10*len + 18 is non-negative.)

4.“An infinite number of calls?!”, you say to yourself. “How is that termination, even if each such call terminates?” I won't delve into that here, but I will at least point out that we're calling a lemma, not a compiled method. So, you don't need to worry about having enough computational resources to make an infinite number of calls.

5.A precondition of a lemma is declared using the keyword requires. It is like an antecedent of the lemma, and it is checked to hold at every call site. The running example does not include a precondition. A simple way to include one would be to change the type of start from nat to int and to add requires 0 <= start.

6.I mentioned that tool tips in the Dafny IDEs show you what decreases clause Dafny picks. As of this writing, such a tool tip is shown only when the body of the function/method/lemma includes an explicit recursive call. In particular, if the body of the lemma is empty, there is no tool tip that shows you what decreases clause is generated.