Dafny Power User:
Case study of definitions, proofs, algorithm correctness: GCD
K. Rustan M. Leino
Manuscript KRML 279, 22 June 2021

Abstract. The purpose of this note is to show an example development of a program, introducing definitions that support the specification of the program, stating and proving lemmas about those definitions, and using the lemmas in proving the correctness of the program. Euclid's subtractive algorithm for computing the greatest common divisor is used as the example.

0. Problem description

Let's specify and verify an algorithm to compute the greatest common divisor (GCD) of two numbers. For the specification, we will introduce a function whose definition is intended to be “obviously correct”. We won't use that function to compute the GCD, because the “obviously correct” definition would give really inefficient code if compiled directly. Instead, we'll use Euclid's algorithm for computing the value that the “obviously correct” function defines. We'll prove that the algorithm does compute that value.

In essence, we'll have

function Gcd(x: pos, y: pos): pos

method EuclidGcd(x: pos, y: pos) returns (gcd: pos)
  ensures gcd == Gcd(x, y)

where pos denotes the type of positive integers.

1. Positive integers

Everything we do will concern (strictly) positive integers. Dafny builds in a type for natural numbers (that is, non-negative integers), but not positive integers. We can define these using a subset type in Dafny:

type pos = x | 1 <= x  // error: cannot find witness to show type is inhabited

Dafny wants to know if this type has any inhabitants, and it doesn't figure that out by itself. This doesn't matter for our example, but we do need to address the error we're getting. To do that, we supply a witness clause:

type pos = x | 1 <= x witness 1

If we really didn't care to exhibit a witness that shows the type to be nonempty, we could have written witness *, which causes Dafny to treat the type pos as possibly empty. For our example, you can do either, but since it's easy to supply an actual witness, we do that.

In the sequel, I will just say number when I mean positive integer.

2. Factors

The divisors of a number are its factors. We define a predicate that says what it means for a number p to be a factor of a number x:

predicate IsFactor(p: pos, x: pos) {
  exists q :: p * q == x
}

In words, p is a factor of x iff there is a multiplicand q such that x is the product p * q.

To talk about all the factors of a number, we introduce a function Factors that we define using a set comprehension. A straightforward definition would be:

function Factors(x: pos): set<pos> {
  set p: pos | IsFactor(p, x)  // error: set constructed must be finite
}

A set in Dafny denotes a finite set (for possibly infinite sets, use iset). In this case, Dafny doesn't immediately see that the comprehension would generate a finite set. Luckily, it is simple for us to add another conjunct to the comprehension that lets Dafny see that the set is finite:

function Factors(x: pos): set<pos> {
  set p: pos | p <= x && IsFactor(p, x)
}

In adding this conjunct, there's a risk we're making a mistake, because perhaps the new set doesn't include all the factors we'd like. Our conjunct p <= x certainly looks innocent enough, but why not prove that adding it does not accidentally leave out any factors. We can do that by proving that this set has the same elements as the possibly infinite set:

lemma FactorsHasAllFactors(x: pos)
  ensures forall n :: n in Factors(x) <==> n in iset p: pos | IsFactor(p, x)
{
}

The proof of a lemma is given in the lemma's body (that is, between the pair of curly braces that follow the lemma's specification). In this case, the proof is empty, because Dafny proves the lemma automatically without any further help from us.

Before leaving the definition of factors, let's state and prove two simple lemmas. These lemmas act as sanity checks on our definitions, and they will also be helpful later in our development.

lemma FactorsContains1(x: pos)
  ensures 1 in Factors(x)
{
  assert 1 * x == x;
}

lemma FactorsContainsSelf(x: pos)
  ensures x in Factors(x)
{
  assert x * 1 == x;
}

To prove that a number n (here, 1 or x) is in the set Factors(x), we need to establish that n satisfies the condition in the set comprehension (in the body of Factors(x)). The conjunct n <= x is proved automatically, but the conjunct IsFactor(n, x) is not. By the definition of IsFactor, we need to prove the existence of a multiplicand q for which n * q == x. Such a proof typically involves demonstrating a witness, which is what the assert statements in the two lemmas above do. From those assertions, the verifier completes the proofs of the lemmas.

3. Max of a set

To talk about the greatest common divisor, we need a function that picks out the largest number in a set. A somewhat declarative way to do that is to use the such-that construct. In particular, for a set s, the let-such-that expression

var x :| x in s && forall y :: y in s ==> y <= x;
x

says to bind x to a value satisfying the condition x in s && forall y :: y in s ==> y <= x, and then return the value of the expression x. The condition says that x is in the set s, and that, among all the numbers in s, x is the largest.

Use of a such-that construct comes with a proof obligation that a value satisfying the given condition exists. If we require s to be nonempty, then the x in s condition is easily satisfied, but it takes more work to convince the verifier that a value for x satisfies the quantifier. We'll define a lemma for that purpose. We'll name the lemma MaxExists and then we can write our function Max as follows:

function Max(s: set<pos>): pos
  requires s != {}
{
  MaxExists(s);
  var x :| x in s && forall y :: y in s ==> y <= x;
  x
}

lemma MaxExists(s: set<pos>)
  requires s != {}
  ensures exists x :: x in s && forall y :: y in s ==> y <= x

Dafny uses the lemma invocation MaxExists(s) in establishing the well-formedness of the subsequent expression. Note, by the way, that Max (and lemma MaxExists, too) has a precondition s != {} (keyword requires). This means that the function (and the lemma, too) can only be called for a nonempty set.

Alright, so then how do we prove MaxExists? The most straightforward way to prove the existence of such an x is to compute an x satisfying the desired properties. We'll introduce another function for computing the max, call it FindMax, and use it in the proof of the MaxExists lemma. Function FindMax will be implemented recursively.

lemma MaxExists(s: set<pos>)
  requires s != {}
  ensures exists x :: x in s && forall y :: y in s ==> y <= x
{
  var x := FindMax(s);
}
 
function FindMax(s: set<pos>): pos
  requires s != {}
  ensures max in s && forall y :: y in s ==> y <= FindMax(s)

Aren't we going in circles now? Yes, in some ways we're making life more difficult than necessary. If we have FindMax, we don't need Max, and then we also don't need lemma MaxExists. Indeed, we could have written and used just FindMax and never introduced Max or MaxExists. But for this example, I wanted the primary definitions to be as clear as possible without concern for how things are computed. In that sense, the body of Max is more declarative than the body we are about to write for FindMax.

Here is the full definition of FindMax:

function FindMax(s: set<pos>): (max: pos)
  requires s != {}
  ensures max in s && forall y :: y in s ==> y <= max
{
  var x :| x in s;
  if s == {x} then
    x
  else
    var s' := s - {x};
    assert s == s' + {x};
    var y := FindMax(s');
    if x < y then y else x
}

When a function postcondition wants to mention the result value of the function, you can just use the function itself, with the arguments given: FindMax(s). I did this when I first introduced FindMax above. In the full definition, I show an alternative way of doing this, which is to introduce a name for the result value: max. That name is usable only in the postcondition of the function. Many times, introducing such a name for the result leads to a shorter specification.

4. GCD

With the functions we defined, we're now ready to define GCD. Take the factors of x and the factors of y, intersect them to get their common factors, and then take the maximum thereof:

function Gcd(x: pos, y: pos): pos {
  var common := Factors(x) * Factors(y);
  Max(common)  // error: common must be nonempty
}

For this simple definition, the verifier reports a precondition violation, because it's unable to prove that common satisfies the precondition of Max. We know that common is nonempty, because we know that 1 is a common factor of any two numbers x and y. To bring that information to the verifier's attention, we write an assertion:

function Gcd(x: pos, y: pos): pos {
  var common := Factors(x) * Factors(y);
  assert 1 in common;  // error: assertion violation
  Max(common)
}

Alas, the verifier is not able to prove this assertion. But we can see that the presence of the assertion is enough to eliminate the precondition violation. So, we now focus on proving the assertion. This is where we use the FactorsContains1 lemma we introduced earlier. Two calls to that lemma will prove the assertion, which is best captured in the program text by changing the assert to an assert by and giving the proof of the assertion in the by block:

function Gcd(x: pos, y: pos): pos {
  var common := Factors(x) * Factors(y);
  assert 1 in common by {
    FactorsContains1(x);
    FactorsContains1(y);
  }
  Max(common)
}

That does it! We have now given a well-formed definition of Gcd.

5. Properties of GCD

We'll prove three properties of our Gcd function—call them sanity checks, if you will. (We'll need a fourth property as well, but I'll introduce it later.)

As a first sanity check, we expect Gcd(x, y) to return a number that is a factor of both x and y. Furthermore, among all the numbers that are factors of both x and y, what Gcd(x, y) returns should be the largest.

lemma AboutGcd(x: pos, y: pos)
  ensures IsFactor(Gcd(x, y), x)
  ensures IsFactor(Gcd(x, y), y)
  ensures forall p: pos :: IsFactor(p, x) && IsFactor(p, y) ==> p <= Gcd(x, y)

The first two postconditions of this lemma are proved automatically, but not the third. How do we go about proving that a universal quantifier (that is, a forall expression) holds? We use Dafny's forall statement. When used in a proof, the forall statement corresponds to the “universal introduction” rule in logic. This is the rule that says "if you want to prove forall x :: P(x), then all you need to do is pick an arbitrary x and prove P(x) for that x.

We introduce the forall statement like this:

  forall p: pos | IsFactor(p, x) && IsFactor(p, y)
    ensures p <= Gcd(x, y)

To prove it, we only need to bring up the fact that p, which is a factor of both x and y, is in the intersection of factors of x and y. The verifier is then able to complete the proof.

lemma AboutGcd(x: pos, y: pos)
  ensures IsFactor(Gcd(x, y), x)
  ensures IsFactor(Gcd(x, y), y)
  ensures forall p: pos :: IsFactor(p, x) && IsFactor(p, y) ==> p <= Gcd(x, y)
{
  forall p: pos | IsFactor(p, x) && IsFactor(p, y)
    ensures p <= Gcd(x, y)
  {
    assert p in Factors(x) * Factors(y);
  }
}

The Dafny verifier often needs help with properties like this. To prove them, just write them as an assertion. In other words, the verifier knows this property about set intersection, but it isn't creative enough the bring that property into the proof. By asserting the property, we're asking the verifier to confirm the property (which it's able to do) and then to use that property in the rest of the proof (which in this case completes the proof).

As a second sanity check, we prove that Gcd is symmetric.

lemma GcdSymmetric(x: pos, y: pos)
  ensures Gcd(x, y) == Gcd(y, x)
{
  assert Factors(x) * Factors(y) == Factors(y) * Factors(x);
}

The proof comes down to the fact that set intersection is symmetric, which we bring to the verifier's attention by writing it as a lemma.

As a third sanity check, we prove that Gcd is idempotent. That is, if you give it the same argument twice, it will return that argument.

lemma GcdIdempotent(x: pos)
  ensures Gcd(x, x) == x
{
  FactorsContainsSelf(x);
  assert x in Factors(x) * Factors(x);
}

The proof of this property comes down to the fact that set intersection is idempotent, as well as the property that a number is one of its own factors.

6. Euclid's algorithm

Euclid's subtractive algorithm for finding the GCD of two numbers is to repeatedly subtract the smaller of the numbers from the larger until they are both equal. Each such subtraction preserves the GCD—an invariant that we will need to prove—and the GCD of two equal numbers is that number—which we established by lemma GcdIdempotent above.

The algorithm, with the loop invariant and idempotence lemma, thus looks like this:

method EuclidGcd(X: pos, Y: pos) returns (gcd: pos)
  ensures gcd == Gcd(X, Y)
{
  var x, y := X, Y;
  while
    invariant Gcd(x, y) == Gcd(X, Y)  // error: invariant not maintained
    decreases x + y
  {
    case x < y =>
      y := y - x;
    case y < x =>
      x := x - y;
  }
  GcdIdempotent(x);
  return x;
}

This method uses a while-case loop. (If you're familiar with Dijkstra's guarded commands [1], this is the do-od loop.) Each iteration of this loop chooses one of the cases to execute. The case chosen must be one whose guard condition evaluates to true (and if the guards of several cases evaluate to true, the loop chooses arbitrarily between those cases) . If no such guard condition evaluates to true, then the loop stops iterating. The loop in EuclidGcd could of course be an ordinary while x != y loop, but the symmetry of the two cases afforded by the while-case loop makes it aesthetically pleasing.

In addition to a loop invariant, the loop also declares a termination metric (keyword decreases). Proving that the loop terminates comes down to proving that each iteration makes the value of the termination metric decrease (in Dafny's built-in well-founded order on integers).

The EuclidGcd method above does not verify, because the verifier is unable to prove that each iteration maintains the loop invariant. For this, we need the fourth property of GCD that I alluded to above:

lemma GcdSubtract(x: pos, y: pos)
  requires x < y
  ensures Gcd(x, y) == Gcd(x, y - x)

Using this lemma and the symmetry of GCD, we can complete the proof of Gcd:

method EuclidGcd(X: pos, Y: pos) returns (gcd: pos)
  ensures gcd == Gcd(X, Y)
{
  var x, y := X, Y;
  while
    invariant Gcd(x, y) == Gcd(X, Y)
    decreases x + y
  {
    case x < y =>
      GcdSubtract(x, y);
      y := y - x;
    case y < x =>
      calc {
        Gcd(x, y);
      ==  { GcdSymmetric(x, y); }
        Gcd(y, x);
      ==  { GcdSubtract(y, x); }
        Gcd(y, x - y);
      ==  { GcdSymmetric(y, x - y); }
        Gcd(x - y, y);
      }
      x := x - y;
  }
  GcdIdempotent(x);
  return x;
}

This version adds a call to GcdSubtract in the first branch of the loop. In the second branch of the loop, the proof calculation uses equality-preserving steps to transform the expression Gcd(x, y) into Gcd(x - y, y). The hints given in the steps appeal to the GcdSubtract and GcdSymmetric lemmas.

7. GCD subtract property

The proof of GcdSubtract is more involved than any of the other definitions and lemmas in this case study.

The proof starts by introducing a name for Gcd(x, y):

  var p := Gcd(x, y);

We know from the definition of Gcd that p is a factor of both x and y, and we can prove that p is also a factor of y - x:

  assert IsFactor(p, y - x) by {
    var a :| p * a == x;
    var b :| p * b == y;
    calc {
      y - x;
    ==
      p * b - p * a;
    ==
      p * (b - a);
    }
  }

To prove IsFactor(p, y - x), we introduce names a and b for the multiplicands that the definition of IsFactor tells us exist (since p is a factor of both x and y). A simple calculation using basic arithmetic steps then gives us that p can be multiplied another number (namely, b - a) to get y - x.

Since p is a factor of both x and y - x, we have that it's in the common factors of x and y - x. We write two lines to make sure the verifier is on board with this property, phrased in terms of set intersection:

  var common := Factors(x) * Factors(y - x);
  assert p in common;

Lastly, we need to show that p is the largest such common factor. We state this property using a forall statement:

  forall q | q in common
    ensures q <= p

To prove this property, we fill in the body of the forall statement. For q, which denotes an arbitrary number in the set common, we give names to the multiplicands that yield the products x and y - x, respectively:

  {
    var a :| q * a == x;
    var b :| q * b == y - x;

Using simple arithmetic steps, we can use a proof calculation to establish that q is also a factor of y:

    assert IsFactor(q, y) by {
      calc {
        y;
      ==
        x + (y - x);
      ==
        q * a + q * b;
      ==
        q * (a + b);
      }
    }

So, since q is a factor of both x and y, the definition of Gcd(x, y) tells us q <= Gcd(x, y). By giving yet another hint about set intersection:

    assert q in Factors(x) * Factors(y);
  }

the verifier completes the proof.

8. More symmetry

While we now have a full proof of the GCD algorithm, your aesthetic sense may be bothered by the asymmetry in how we supplied the proofs in the two cases of the loop. Since the while-case loop affords us a symmetric rendition of the two cases, it would be nice if we could make the proofs of the two cases more similar as well.

There are several ways we can improve on this situation. One is to refactor the proof calculation of the second case into its own lemma. Then, then each case would have one line of proof.

Just for fun, let me describe another “trick” to make the two cases (not entirely symmetric, but at least) more similar. The trick is to make the (already asymmetric) GcdSubtract lemma also swap the arguments to Gcd. We rewrite it into:

lemma GcdSubtract(x: pos, y: pos)
  requires x < y
  ensures Gcd(y, x) == Gcd(x, y - x)
{
  GcdSymmetric(x, y);
  // ... the proof continues as before
}

Note that the left-hand side of the postcondition is now Gcd(y, x), not Gcd(x, y) as it had been in our first version of this lemma. The only change required for the proof is to appeal to the symmetry of Gcd, which we can do by one lemma call immediately inside the body of the lemma. This gives us a proof of our reformulated GcdSubtract lemma.

With this reformulation, we can simplify the second case of EuclidGcd, at the expense of making the first case more complicated. Essentially, we're moving one lemma call from the second case to the first, so instead of having 1 and 3 lemma calls in the two respective cases, we'll have 2 and 2.

    case x < y =>
      GcdSubtract(x, y);
      GcdSymmetric(y, x);
      y := y - x;
    case y < x =>
      GcdSymmetric(x - y, y);
      GcdSubtract(y, x);
      x := x - y;

It's not entirely symmetric, but perhaps you still like it. Or perhaps you'll remember this trick for another situation where the shoe fits perfectly. If nothing else, you can stick with the first complete proof we developed above.

9. Main

If the proof itself doesn't satisfy you and still want to see the algorithm in action, you can write a Main method and compile and run the program. (A simple way of doing that from the command line is to use the /compile:3 option with the dafny tool. It will verify and then run the program.)

Here is a sample Main:

method Main() {
  Test(15, 9);
  Test(14, 22);
  Test(371, 1);
  Test(1, 2);
  Test(1, 1);
  Test(13, 13);
  Test(60, 60);
}

method Test(x: pos, y: pos) {
  var gcd := EuclidGcd(x, y);
  print x, " gcd ", y, "  =  ", gcd, "\n";
}

10. Conclusions

This case study shows how to define a domain of interest (here, factors of numbers, leading up to the definition of GCD), state and prove some lemmas about those definitions, and then use these in the proof of a small program.

The program, including all lemmas and other proof obligations associated with the definitions, takes the Dafny verifier less than 3 seconds to verify. You can find the entire program in the Dafny test suite [3].

Euclid's GCD algorithm is a familiar textbook example. It's proved in different forms in various verifiers. For example, the TLA+ tutorial uses this program as an example [0]. It assumes the mathematical properties of GCD that we proved here. The gallery of Why3 programs contains a version of Euclid's GCD algorithm that uses modulo instead of subtraction with each step, which results in fewer iterations [2].

Acknowledgments

I thank Reto Kramer for suggesting this problem as a useful case study.

References

[1]Edsger W. Dijkstra. A Discipline of Programming. Prentice Hall, Englewood Cliffs, NJ, 1976. 🔎
[2]Jean-Christophe Filliâtre and Claude Marché. Greatest common divisor, using the euclidean algorithm. http://​toccata.​lri.​fr/​gallery/​gcd.​en.​html🔎