Abstract. This note proves termination for a common situation of mutually recursive functions.
After that, it takes an arduous journey to turn an aggregating set operation
into compiled code, which gives an opportunity to showcase Dafny's
function-by-method
construct.
A nonempty prefix-closed set S
of strings can be stored as a
DAG where each edge is labeled with one character.
The presence in S
of a string L
is represented by the presence of a path from the root of the DAG through
edges whose labels, in order, spell out L
.
For example, a DAG that represents the set {"", "a", "ab", "b"}
is
Node (root)
/ \
a b
/ \
Node |
\ |
b |
\ /
Node
We declare the data structure for storing such sets of strings as follows:
datatype StringSet = Node(children: map<char, StringSet>)
predicate In(s: string, tree: StringSet) {
s == "" ||
var ch := s[0];
ch in tree.children.Keys &&
In(s[1..], tree.children[ch])
}
lemma Examples() {
var tree :=
var empty := Node(map[]);
var n := Node(map['b' := empty]);
Node(map['a' := n, 'b' := empty]);
assert In("", tree);
assert In("ab", tree);
assert !In("xyz", tree);
}
Let's write a function that computes the length of the longest string in the set.
We'll do it using two mutually recursive functions (and a helper
function Maximum
).
function MaxLen(tree: StringSet): nat {
if tree.children == map[] then
0
else
1 + MaxLenForest(tree.children.Values)
}
function MaxLenForest(trees: set<StringSet>): nat {
if trees == {} then
0
else
var t :| t in trees;
var m := MaxLen(t);
var n := MaxLenForest(trees - {t});
Maximum(m, n)
}
function Maximum(x: int, y: int): int {
if x < y then y else x
}
But herein lies a problem: proving termination. As it stands, the verifier
complains that it cannot prove termination for the call from MaxLen
to
MaxLenForest
or for the call from MaxLenForest
to MaxLen
. (But it is able to
prove termination for the recursive call to MaxLenForest
.)
Let's remind ourselves of what the decreases
clauses for the two functions are,
since we didn't write them explicitly and thus get Dafny's defaults. If you
hover over the functions in the IDE, you'll discover that the decreases
clauses that Dafny used are the following:
function MaxLen(tree: StringSet): nat
decreases tree
function MaxLenForest(trees: set<StringSet>): nat
decreases trees
Termination for the recursive call of MaxLenForest
verifies, because
trees - {t}
is smaller than tree
. But for the other two calls, it's
no wonder that termination cannot be proved, since Dafny's built-in
well-founded order does not relate datatypes (here, StringSet
) and
sets.
If we could arrange for the decreases
clauses to have the same
type, or at least that the types of one is a prefix of the types of the
other, we may be able to make progress. This can be done by thinking of
MaxLenForest
as operating in “the context of” an enclosing tree, that is,
the “parent” tree of the forest. Adding such a parent
parameter to
MaxLenForest
and, for the purpose of this presentation, showing the
default decreases
clauses explicitly, we get:
function MaxLen(tree: StringSet): nat
decreases tree
{
if tree.children == map[] then
0
else
1 + MaxLenForest(tree, tree.children.Values)
}
function MaxLenForest(parent: StringSet, trees: set<StringSet>): nat
decreases parent, trees
{
if trees == {} then
0
else
var t :| t in trees;
var m := MaxLen(t);
var n := MaxLenForest(parent, trees - {t});
Maximum(m, n)
}
This is a little better, because now the verifier can prove the
termination of the call from MaxLen
to MaxLenForest
. The reason it
can do that is that Dafny's lexicographic ordering considers
the 2-tuple tree, tree.children.Values
to be smaller than the
1-tuple tree
.
But we still need to prove termination of the call from MaxLenForest
back to MaxLen
. We'd like to argue that this call terminates because t
is
“smaller than”—that is, structurally included in—the enclosing parent.
But this information is not spelled out in the program. To make the
information available,
we need to write a precondition for MaxLenForest
that says that
the forest in enclosed in parent
. Dafny allows operator <
to be
used with datatypes, with the meaning of “structurally included in”.
function MaxLen(tree: StringSet): nat
decreases tree
{
if tree.children == map[] then
0
else
1 + MaxLenForest(tree, tree.children.Values)
}
function MaxLenForest(parent: StringSet, trees: set<StringSet>): nat
requires forall t :: t in trees ==> t < parent
decreases parent, trees
{
if trees == {} then
0
else
var t :| t in trees;
var m := MaxLen(t);
var n := MaxLenForest(parent, trees - {t});
Maximum(m, n)
}
With that precondition, the verifier completes the proof of the program.
So, although Dafny's well-founded order does not relate datatypes and sets, even for a set that is structurally included in a datatype value, the verifier does know that the elements of that set (assuming the type of the elements is a datatype) are structurally included in the parent. In symbols, the well-founded order does not let you prove
tree.children < tree
but it does let you prove
tree.children[i] < tree
for any i
in range. This is why “the parent trick” is an idiom that
lets you prove termination for these sorts of mutually recursive
functions.
In summary, the need for the parent trick came about because we had
two mutually recursive functions. This meant that we needed to “remember”
the context in which the MaxLenForest
function is called, which is
done by passing the extra parameter. If MaxLenForest
somehow were
written as just one recursive function, then we would still use Dafny's
knowledge that a datatype value d
inside the set inside a datatype
value D
is structurally included in D
(and thus d < D
), but we
wouldn't need to apply the parent trick to pass the context as a parameter.
I will show such an example below (see function MaxLen
in
Section 6).
First, in presenting the program above, I explicitly showed the decreases
clauses. But if you list the parameters in the order that I did, then the
explicit decreases
clause will coincide with Dafny's defaults. So, you
can omit the decreases
clauses and the verifier will still be able to
prove the program.
Second, as I wrote them, the functions are ghost, so they are erased by
the compiler and not available at run time. If you want the functions to
be available at run time, you might be bothered by having to pass the
“parent” parameter along everywhere. This is easily solved by then marking
the “parent” parameter as ghost
. This is possible, because parent
is
not used by the function, other than for its proof, so it's fine if the
compiler erases the parameter.
If you try doing what I just said in the two remarks, the functions will be declared as follows:
function method MaxLen(tree: StringSet): nat {
if tree.children == map[] then
0
else
1 + MaxLenForest(tree, tree.children.Values)
}
function method MaxLenForest(ghost parent: StringSet, trees: set<StringSet>): nat
requires forall t :: t in trees ==> t < parent
{
if trees == {} then
0
else
var t :| t in trees;
var m := MaxLen(t);
var n := MaxLenForest(parent, trees - {t});
Maximum(m, n)
}
but you'll notice an error:
var t :| t in trees;
^ here
Error: to be compilable, the value of a let-such-that expression must be
uniquely determined
This is a completely different problem than the termination problem we
worked to solve above. The problem is that the choice of a t
that satisfies
t in trees
is not unique, as is required for compiling the let-such-that expression.
The reasons for this restriction are explained in a different
paper [0].
If you don't care about compiling MaxLen
, you can stop reading now. Otherwise,
stretch your legs, get more coffee, and keep reading.
To make the choice of t
unique, we need to strengthen the constraint in the
let-such-that expression. For example, we might want to pick the “smallest” of the
values in trees
. But what does “smallest” mean among a set of StringSet
trees?
An easier way out is to change the program to pass the whole map tree.children
to MaxLenForest
, rather than just the children trees themselves (tree.childre.Values
).
Then, we can pick the smallest among the labels, which—since our labels have
type char
—is easy to do. Here's what we get:
function method MaxLen(tree: StringSet): nat {
if tree.children == map[] then
0
else
1 + MaxLenForest(tree, tree.children)
}
function method MaxLenForest(ghost parent: StringSet,
children: map<char, StringSet>): nat
requires forall lbl :: lbl in children.Keys ==> children[lbl] < parent
{
if children == map[] then
0
else
var lbl :|
lbl in children.Keys &&
forall lbl' :: lbl' in children.Keys ==> lbl <= lbl';
var m := MaxLen(children[lbl]);
var n := MaxLenForest(parent, children - {lbl});
Maximum(m, n)
}
Note that the expression children - {lbl}
is a map-domain subtraction. That is,
it denotes the map that is like children
, except it doesn't have the key lbl
.
The additional constraint makes the choice unique. However, now that we've
strengthened it, the verifier is no longer convinced there is any choice for
lbl
. To fix this problem, we need to demonstrate that every nonempty set of
characters has a smallest element. The easiest way to conduct such a demonstration
is to compute it:
function PickSmallest(s: set<char>): (ch: char)
requires s != {}
ensures ch in s && forall ch' :: ch' in s ==> ch <= ch'
{
var ch :| ch in s;
if ch' :| ch' in s && ch' < ch then
var s' := s - {ch};
assert s == s' + {ch};
PickSmallest(s')
else
ch
}
“But wait!”, you say. "How come this function gets away with the weak let-such-that
constraint that was a problem in MaxLenForest
?" It's because PickSmallest
is
a ghost function, so its let-such-that expression does not have the uniqueness
requirement (see [0]).
Alright, then, so our final step is to use function PickSmallest
just before the let-such-that expression in MaxLenForest
:
function method MaxLenForest(ghost parent: StringSet,
children: map<char, StringSet>): nat
requires forall lbl :: lbl in children.Keys ==> children[lbl] < parent
{
if children == map[] then
0
else
ghost var smallest := PickSmallest(children.Keys);
var lbl :|
lbl in children.Keys &&
forall lbl' :: lbl' in children.Keys ==> lbl <= lbl';
var m := MaxLen(children[lbl]);
var n := MaxLenForest(parent, children - {lbl});
Maximum(m, n)
}
After putting the result of PickSmallest
into a ghost variable, the verifier has
all the evidence it needs for the existence of a value of lbl
. So, this concludes
our program.
The running example has already showed several techniques and features in Dafny.
But in our final program above, we still have a rather big and unsatisfying
let-such-that constraint. The mandate that we have to pick a unique element from
the set seems unnecessarily harsh, since all we're trying to do is compute the
maximum of a set, and the maximum will be the same regardless of which order we
consider the elements in. So, what we'd like to do is use nondeterminism inside
MaxLenForest
, as long as we can prove that the nondeterminism does not affect
the result. And Dafny has just the feature for this: function
-by
-method
declarations.
To make a reusable function, let's define a general function Max
as follows:
function method Max(s: set<int>): int
requires s != {}
ensures Max(s) in s
ensures forall z :: z in s ==> z <= Max(s)
By constructing the set of elements we want to take the maximum of, we can
use Max
in MaxLen
:
function method MaxLen(tree: StringSet): nat {
if tree.children == map[] then
0
else
var s := set t | t in tree.children.Values :: MaxLen(t);
assert s != {} by {
assert forall lbl ::
lbl in tree.children.Keys ==> MaxLen(tree.children[lbl]) in s;
}
1 + Max(s)
}
As you see here, the verifier needed help in proving that s
is nonempty.
This proof obligation is addressed by asserting that, for every label in (the
known to be nonempty) tree.children
, s
contains the MaxLen
of the corresponding
value. (Since MaxLen
is now recursive and doesn't go through a mutually recursive
function like MaxLenForest
, we don't need the parent trick. That is, there's
no longer a need to pass any additional context parameter.)
From now on, we'll focus just on Max
. It would be nice to write its body as
var x :| x in s;
if s == {x} then
x
else
var s' := s - {x};
assert s == s' + {x};
var y := Max(s');
Maximum(x, y)
Alas, this is not allowed, because the compiled let-such-that does not uniquely
determine the value for x
. This is the exactly the problem that we started with
and want to steer around!
Function-by-method to the rescue.
Say what?! A function-by-method
is a combination of a function and method.
The body of the function part of a function-by-method
is a ghost
expression, but this expression acts just as the specification
for what value is to be returned. The body of the method part of the
function-by-method
then gives a compilable method body for computing the value.
Using a bit of program text, the basic idea is to declare a function-by-method
like
function F(x: X): (y: Y)
requires Pre(x)
ensures Post(x, y)
{
Expr;
} by method {
MBody;
}
where Expr
is a ghost expression (the “function part”) and
MBody
is a compilable statement list (the “method part”).
To explain the meaning of this function-by-method
, it's helpful to look at
it as two declarations:
function F(x: X): (y: Y)
requires Pre(x)
ensures Post(x, y)
{
Expr;
}
method _F(x: X) returns (y: Y)
requires Pre(x)
ensures y == F(x)
{
MBody;
}
Now, Dafny arranges that every use of the original function-by-method
in a ghost context calls ghost function F
, and that every use of the
function-by-method
is a non-ghost context calls the compiled method _F
.
Note that any ensures
clause on the original function-by-method
is a
proof obligation of the function part. The postcondition of the method part
is y == F(x)
. (I'll come back to this point later.)
Using a function-by-method
declaration, we write Max
as follows:
function Max(s: set<int>): (m: int)
requires s != {}
ensures m in s && forall z :: z in s ==> z <= m
{
var x :| x in s;
if s == {x} then
x
else
var s' := s - {x};
assert s == s' + {x};
var y := Max(s');
Maximum(x, y)
} by method {
m :| m in s;
var r := s - {m};
while r != {}
invariant r < s
invariant m in s && forall z :: z in s - r ==> z <= m
{
var x :| x in r;
assert forall z :: z in s - (r - {x}) ==> z in s - r || z == x;
r := r - {x};
if m < x {
m := x;
}
}
assert s - {} == s;
}
This function-by-method verifies as given. However, the reason it verifies is subtle
and may not be what you expect (especially if this is the first time you see
a function-by-method
). Let me explain the reason and then show a more typical
proof of a function-by-method
.
The point of the method part of a function-by-method
is to provide a method
implementation that returns exactly the same value as the given function body.
For this reason, the postcondition that the method body has to satisfy is
m == Max(s)
. This is a very precise postcondition.
The ensures
clause declared in the function-by-method
applies to the
function part. It is usually much less precise; in fact, for most functions,
this ensures
clause is omitted, since the body of the function transparently
says what the result value is.
In general, just because the method part of a function-by-method
happens to satisfy the ensures
clause of the function-by-method
does not
mean the method part is correct. But in our Max
example, the two coincide, because
the ensures
clause I gave for the function-by-method
uniquely determines m
.
Thus, any method body that satisfies this ensures
clause also satisfies m == Max(s)
.
(You can confirm this by deleting m in s
from the ensures
clause. This will
cause the method part to fail to verify, while the function part still verifies.)
A more generally applicable pattern for proving the method part of a recursive
makes use of the function in the loop invariant. To illustrate this, I will remove
the ensures
clause of Max
. Consequently, the iterative method body
needs to establish that it computes exactly what the recursion function body
does.
The idea is to use a body like this:
m :| m in s;
var r := {m};
while r != s
invariant {} != r <= s
invariant m == Max(r)
{
var x :| x in s - r;
r := r + {x};
if m < x {
m := x;
}
}
This loops maintains the invariant m == Max(r)
while enlarging r
until it
becomes s
. Doing this proof is tricky, because of two issues. Both of these
issues have standard solutions, which are really good to know. The two issues
and solutions are explained for a simpler example in another Dafny Power User
note [1], but it seems worthwhile to show them here as well.
The first issue is that the let-such-that expression is like a function, but by
Dafny semantics it may be a different function for each textual occurrence of :|
(see [0]).
It is easy to become confused and frustrated if, without realizing it, you're
using more than one :|
function. Therefore, let's make sure there is only one
textual occurrence of this operator. We do that by putting it inside
a named function:
function Pick(s: set<int>): int
requires s != {}
{
var x :| x in s; x
}
With this function, we write the function part of the function-by-method
Max
as
var x := Pick(s);
if s == {x} then
x
else
Maximum(Max(s - {x}), x)
The second issue is that we will need to help the verifier along in the proof
of invariance for the loop. The reason is that the next element that the loop
adds to the growing set r
may not be the same element that the Pick
function
would choose when going in the opposite direction. Specifically, if the loop
adds x
to r
as it's computing Max(r + {x})
, there's no reason to believe
that the definition of Max(r + {x})
picks x
and recurses on Max(r)
.
(Again, for a simpler example and better motivation, see [1].)
This lemma is tricky to prove, even if you've written similar proofs several
times. An additional complication for our function Max
is that Max
requires
its argument to be nonempty. Here is the lemma and a proof:
lemma MaxOfOneMore(t: set<int>, x: int)
requires t != {} && x !in t
ensures Max(t + {x}) == Maximum(Max(t), x)
decreases |t|
{
var u := t + {x};
var z := Pick(u);
var u'x, u'z := u - {x}, u - {z};
assert Max(u) == Maximum(Max(u'z), z);
if z == x {
// Straight from the definition of Max
} else if t == {z} {
assert u'z == {x} && u'x == {z};
// This is known about Max on singletons:
assert Max(u'z) == x;
assert Max(u'x) == z;
// So, regardless of which of x and z gets picked first, the result is the same
assert Max(u) == Maximum(x, z) == Maximum(z, x);
} else {
var u'z'x := u'z - {x};
calc {
Maximum(Max(u'z), z);
== { assert u'z == (u'z'x) + {x}; }
Maximum(Max((u'z'x) + {x}), z);
== { MaxOfOneMore(u'z'x, x); }
Maximum(Maximum(Max(u'z'x), x), z);
== // Maximum is commutative and associative
Maximum(Maximum(Max(u'z'x), z), x);
== { MaxOfOneMore(u'z'x, z); }
Maximum(Max(u'z'x + {z}), x);
== { assert u'z'x + {z} == u'x == t; }
Maximum(Max(t), x);
}
}
}
That's a mouthful. But once you have the lemma, using it once in the loop
body finishes the proof of our function-by-method
:
function Max(s: set<int>): (m: int)
requires s != {}
{
var x := Pick(s);
if s == {x} then
x
else
Maximum(Max(s - {x}), x)
} by method {
m :| m in s;
var r := {m};
while r != s
invariant {} != r <= s
invariant m == Max(r)
{
var x :| x in s - r;
MaxOfOneMore(r, x);
r := r + {x};
if m < x {
m := x;
}
}
}
A final remark. It looks as if Max
uses the lemma MaxOfOneMore
, and the lemma
uses Max
. Doesn't this mutual recursion mean we have to prove termination?
No, because there's isn't actually any mutual recursion in this example. To understand
why, remember that the two parts of a function-by-method
are treated as
if they were one function and one method (see Section 8).
This function and method are different points in the call graph, where calls to
Max
from ghost contexts go to the function part and calls to Max
from compiled
contexts go to the method part.
Therefore, we have that the following call-graph edges:
Max (function) --> Max (function)
MaxOfOneMore --> MaxOfOneMore, Max (function)
Max (method) --> MaxOfOneMore, Max (function)
As you can see from these, the function part of Max
is recursive and the lemma is
recursive, but there is no mutual recursion.
I started this note with the innocent problem of how to prove the termination of two
mutually recursive functions. As the solution, I showed the Parent Trick.
I then changed the code to make it compilable, which
introduced proof obligations to show certain choices to be deterministic. This can
be better solved using a function-by-method
, so I gave a brief introduction
to Dafny's function-by-method
declaration and used it to present two solutions.
The more straightforward solution worked for the example at hand, whereas
the second solution shows a more typical pattern for such method implementations.
I'm grateful to Mikaël Mayer for asking a question about these mutually recursive
functions, which propelled me to write this note, and for his helpful comments
on the write-up. The function-by-method
construct
was designed as a direct outgrowth of some collaborative work with Daniel Matichuk,
Olivier Savary Belanger, and Mike Dodds, where as a step toward verified compilation
we were considering how to implement immutable things with mutable things.