In the fifth chapter of the FFA series, Stanislav presents a slow yet entirely usable constanttime implementation of division and multiplication, and most importantly, one that fits in head. Following up on our previous exercise, we read the chapter and finally arrive at a set of challenges. Cherrypicking^{1} through them, we see:
 Prove that the provided Division and Multiplication operations, work.
This article is an exercise in exactly that. But before delving into the gory details, we must clarify what it means (from the author's point of view and in this particular context) a. to prove, and b. that something "works".
While the proof of correctness is indeed mathematical, it is not of the "formal verification" type in the sense of mechanical gargle found in today's literature. That said, we aim to import a minimal body of prior knowledge, consisting mostly of basic arithmetics and highschool algebra. We will also, when absolutely needed, introduce assumptions and as much as possible state them explicitly.
The article presents an algorithmic proof. That is, we read through the code and we manually translate it into very similar pseudocode when needed, and then we manually translate that into mathematical expressions, for the purpose of algebraic manipulation. Thus egyptian division and multiplication implementations "work" only in the sense that we carefully examine and understand them, and then in the sense that their algorithmic counterparts are led through a deductive process which then leads us to the conclusion. Sadly, we can make no strong statements about either the compiler^{2} or the hardware (architecture and implementation thereof), so we are forced to make the otherwise unwarranted assumption that they also work.
We also assume that the basic operations presented in previous FFA chapters  e.g. addition, subtraction, muxing, shifts  work by the definition of "works" in the previous paragraph. The reader is of course encouraged to not take this statement at face value and (re)examine the implementation of the operations carefully.
We divide the remainder of this article into two pieces. As both the implementation and the underlying equations for egyptian multiplication are simpler, we begin with a correctness proof for it and then later on we continue with the division operation.
Part the first: multiplication
We begin with a description and a code snippet^{3} of
FZ_Mul_Egyptian
. The partial and final results of the operation are
held in a variable (XY
) double the size of the input (X
and Y
). At
first, XY
is cleared (all bits set to zero) and a set of "slides",
XS
and YS
, are initialized with the values of X
and Y
respectively; in particular, XS
is the same size as XY
, as X
must
be repeatedly added into XY
.
Then the implementation loops over YS
, also modifying XS
:
 For each bit of Y:
for i in 1 .. FZ_Bitness(Y) loop
 If lowest bit of YSlide is 1, XSlide is added into XY
FZ_Add_Gated(X => XY, Y => XS, Sum => XY,
Gate => FZ_OddP(YS));
 XSlide := XSlide * 2
FZ_ShiftLeft(XS, XS, 1);
 YSlide := YSlide / 2
FZ_ShiftRight(YS, YS, 1);
end loop;
Then the output variables are set to XY
.
The first observation we make is that, FZ_Add_Gated
being a
conditional add operation very similar to FZ_Mux
, we can also look at
it as:
 If lowest bit of YSlide is 1, XSlide is added into XY
if FZ_OddP(YS) then
FZ_Add(X => XY, Y => XS, Sum => XY);
else
FZ_Add(X => XY, Y => FZ_Zero, Sum => XY);
end if;
where FZ_Zero
is a FZ
constant of the same size as XY
and XS
,
with all bits set to zero. Using our previous assumption about the
correctness of FZ_Add
, we can rewrite this in the following pseudocode
form:
if YSbit(1) is 1 then
XY := XY + XS;
else  YSbit(1) is 0
XY := XY + 0;
end if;
where +
here is an equivalent to FZ_Add
(and consequently we will
later assume that 
represents an equivalent to FZ_Sub
) and
YSbit(I)
represents the I
th ("eldest", least significant) bit of
YS
. We will further say that FZ_Bitness(Y) = N
, so that 1 <= I <=
N
, i.e. 1 is the least significant bit of YS
, while N is the most
significant.
Observe that XY
is updated (XS
is added to it) only when YSbit(1) =
1
. Thus we can make one (final) modification to this bit, by explicitly
introducing YSbit(1)
in the update instruction. The equivalent
(branchless!) assignment would be:
XY := XY + YSbit(1) * XS;
where *
here denotes a multiplication operation between a bit and a
FZ
, however in this case perfectly equivalent to the muxing in
FZ_Add_Gated
.
Furthermore we rewrite the shifting operations as equivalent operations
that multiply (*
) and divide (/
) by 2:
 XSlide
XS := XS * 2;
 YSlide
YS := YS / 2;
The full algorithmic translation of FZ_Mul_Egyptian
then becomes:
FZ_Mul_Egyptian(X, Y) is:
 Initialize variables
1. XY := 0;
2. XS := X;
3. YS := Y;
 Over the bits of Y
4. for I in 1 .. N loop
 Partial product
5. XY := XY + YSbit(1) * XS;
 Partial XSlide
6. XS := XS * 2;
 Partial YSlide
7. YS := YS / 2;
8. end loop;
9. Result := XY;
This version of the algorithm looks pretty good and it can help us model
the problem mathematically, but it's not quite the final version. Diana
identified that instead of using the first bit of YS
for
muxing and then shifting YS
to the right at each step, one can
optimize the execution time of FZ_Mul_Egyptian
by simply using the
I
th bit of Y
. We'll do the same here, with the mention that this
makes our algorithm smaller and it helps us more easily identify an
inductive pattern. Let's first view the modified (and
final) version of the egyptian multiplication algorithm:
FZ_Mul_Egyptian(X, Y) is:
 Initialize variables
1. XY := 0;
2. XS := X;
 Over the bits of Y
3. for I in 1 .. N loop
 Partial product
4. XY := XY + Ybit(I) * XS;
 Partial XSlide
5. XS := XS * 2;
6. end loop;
7. Result := XY;
Note the changes: a. we removed the Yslide; and b. at each step of the
loop, Ybit(I)
is used instead of YSbit(1)
.
Now we can begin to reason about the hardest part of this algorithm, the
loop. While, as described, the loop goes over the bits of Y
(i.e. they
are our "induction variable"), we can see it as the computation of two
recurrence relations, that is, of the following finite sequences. We
begin with XS
:
XS(0) = X
XS(I) = XS(I1) * 2, 1 <= I <= N
This has so far been the largest leap in logic in our article, crossing
suddenly from an algorithmic view of the problem to a mathematical one;
thus I ask the reader to sit for a moment and contemplate this leap. At
the end, the correspondence between the XS(I)
in the recurrence and
the XS
in the algorithm should hopefully become obvious: at each step,
the value of XS
is doubled, and this has an algebraic significance, as
we will see in a moment. Now we model the update of XY
similarly:
XY(0) = 0
XY(I) = XY(I1) + Ybit(I) * XS(I1), 1 <= I 1`, then `K` can be
decomposed into the sum:
K(1) * B^0 + K(2) * B^1 + K(3) * B^2 + ...
where `0 <= K(I) <B> R, Y => Divisor, Difference => R, Underflow => C);
 If C=1, subtraction underflowed, and then Divisor gets added back:
FZ_Add_Gated(X => R, Y => Divisor, Gate => C, Sum => R);
 Current resultbit is equal to NotC, i.e. 1 if Divisor 'went in'
FZ_Or_W(Q, W_Not(C));
end loop;
We take the first left shift first. Shifting QR
does (from an
arithmetic point of view) two things: it multiplies Q
by 2 and it
overflows bits of Q
into R
; the latter meaning the multiplication of
R
by 2 and adding the "newest" (most significant) bit of Q
to
R
. We can thus rewrite it as:
 Advance QR by 1 bit:
R := R * 2 + Qbit(N);
Q := Q * 2;  NB: mod 2^N!
The remaining operations in the loop are dependent upon the result of
the call to FZ_Sub
, which is an implicit R < Divisor
comparison. If
this comparison fails, then the subtraction is undone using
FZ_Add_Gated
and the least significant bit of Q
is set to zero;
else, Divisor
remains subtracted from R
and the least significant
bit of Q
is set to one. As in the case of multiplication, we rewrite
this into a pseudocode using explicit conditionals:
 Check (R  Divisor) underflow
if R < Divisor then
R := R  0;
Q := Q + 0;
else
R := R  Divisor;
Q := Q + 1;
end if;
The intuition is the exact one from long division: once (or if) enough
bits of the dividend have overflown into R
, we subtract the divisor
and we set the digit (i.e. bit) of the quotient accordingly. As with
multiplication, we can get rid of the complicated dependency on specific
bits at each step. We observe that at the first step (I = 1
) Qbit(N)
addresses the N
th bit of Dividend
, then at I = 2
the N1
th and
so on until I = N
and the addressing of the first bit of
Dividend
. Thus we make the first update of R
depend on
Dividendbit(NI+1)
instead of Qbit(N)
.
Thus the algorithmic form (or one of the forms) of FZ_IDiv
is:
FZ_IDiv(Dividend, Divisor) is:
 Initialize variables
1. Q := Dividend;
2. R := 0;
 Over the bits of Dividend
3. for I in 1 .. N loop
 Update nonconditional values of R and Q
4. R := R * 2 + Dividendbit(NI+1);
5. Q := Q * 2;
 Conditional updates of R and Q
6. if R < Divisor then
7. R := R  0;
8. Q := Q + 0;
10. else
11. R := R  Divisor;
12. Q := Q + 1;
13. end if;
14. end loop;
15. Result := Q, R;
Now comes the tricky part, that is, putting the algorithm in a
mathematical form that allows us to reason about it. At least two
aspects are not exactly easy to spot: R
and Q
suffer two updates
each, and the R
in R < Divisor
(on which the conditional updates to
Q
and R
depend) is not the same R
as in the previous iteration of
the loop. We'll try to untangle this first, by taking a
conditionalR
(CR
) which maps to the first update to R
:
 for 1 <= I <= N
CR(I) = R(I1) * 2 + Dividendbit(NI+1)
Next, since it's not obvious yet how we could get rid of the explicit
conditionals  as we did with multiplication by factoring in Ybit(.)
 we define deltas for Q
and R
, depending explicitly on
CR(I)
. That is:
 for 1 <= I <= N
DR(I) = { 0, if CR(I) < Divisor
Divisor, otherwise }
DQ(I) = { 0, if CR(I) < Divisor
1, otherwise }
The recurrence relation for R
is then:
R(0) = 0
R(I) = CR(I)  DR(I)
= R(I1) * 2 + Dividendbit(NI+1)  DR(I),
1 <= I <= N
and for Q
:
Q(0) = Dividend
Q(I) = Q(I1) * 2 + DQ(I), 1 <= I <= N
As before, these recurrences are a leap from the algorithmic to the
mathematical, so it might be completely nonobvious where the relations
come from. The reader is recommended to go through the mental gymnastics
necessary to obtain them by using pen and paper. At the end it should be
clear that Q(N)
is the resulting quotient, while R(N)
is the
resulting remainder, and furthermore we wish to show that the equation:
Dividend = Divisor * Q(N) + R(N)
holds true.
Now let's try to figure out a general form for Q(I)
first, since this
is the simplest of the two:
Q(0) = Dividend
Q(1) = Q(0) * 2 + DQ(1) = Dividend * 2 + DQ(1)
Q(2) = Q(1) * 2 + DQ(2)
= [Dividend * 2 + DQ(1)] * 2 + DQ(2)
= Dividend * 2^2 + DQ(1) * 2 + DQ(2)
 let's do one more
Q(3) = Q(2) * 2 + DQ(3)
= [Dividend * 2^2 + DQ(1) * 2 + DQ(2)] * 2 + DQ(3)
= Dividend * 2^3 + DQ(1) * 2^2 + DQ(2) * 2 + DQ(3)
...
 induction hypothesis
Q(N1) = Dividend * 2^(N1) +
DQ(1) * 2^(N2) +
DQ(2) * 2^(N3) +
... +
DQ(N2) * 2^1 +
DQ(N1) * 2^0
 induction step proof
Q(N) = Q(N1) * 2 + DQ(N)  subst. Q(N1)
= [Dividend * 2^(N1) +
DQ(1) * 2^(N2) +
DQ(2) * 2^(N3) +
... +
DQ(N2) * 2^1 +
DQ(N1) * 2^0 ]
* 2 + DQ(N)
= Dividend * 2^N +
DQ(1) * 2^(N1) +
DQ(2) * 2^(N2) +
... +
DQ(N1) * 2^1 +
DQ(N) * 2^0
 QED.
Moreover, all multiplications are modulo 2^N
(we fit the the results
in N
bits), so the term Dividend * 2^N
vanishes. We are left with a
base2 number:
Q(N) = DQ(1) * 2^(N1) + DQ(2) * 2^(N2) + ... + DQ(N) * 2^0
As for R(I)
:
R(0) = 0
R(1) = R(0) * 2 + Dividendbit(N)  DR(1)
= Dividendbit(N)  DR(1)
R(2) = R(1) * 2 + Dividendbit(N1)  DR(2)
= [Dividendbit(N)  DR(1)] * 2 + Dividendbit(N1)  DR(2)
= Dividendbit(N) * 2  DR(1) * 2 + Dividendbit(N1)  DR(2)
 one moar
R(3) = R(2) * 2 + Dividendbit(N2)  DR(3)
= [Dividendbit(N) * 2 + Dividendbit(N1)  DR(1) * 2  DR(2)] * 2 +
Dividendbit(N2)  DR(3)
= [Dividendbit(N) * 2^2 +
Dividendbit(N1) * 2 +
Dividendbit(N2)] 
[DR(1) * 2^2 + DR(2) * 2 + DR(3)]
Observe how we grouped the terms by Dividendbit(.)
and DR(.)
; we
anticipate for R(.)
to be the difference between two numbers. Now for
the induction hypothesis and step substitution we use an index I < N
,
so as to not confuse with the N
in Dividendbit(N)
. We are very
careful to match the index to R(.)
with the powers of two in the
terms:
 induction hypothesis
R(I1) = [Dividendbit(N) * 2^(I2) +
Dividendbit(N1) * 2^(I3) +
... +
Dividendbit(NI+3) * 2^1 +
Dividendbit(NI+2) * 2^0] 
[DR(1) * 2^(I2) +
DR(2) * 2^(I3) +
... +
DR(I2) * 2^1 +
DR(I1) * 2^0]
 induction step
R(I) = R(I1) * 2 + Dividendbit(NI+1)  DR(I)
= [Dividendbit(N) * 2^(I2) +
... +
Dividendbit(NI+2)] * 2 
[DR(1) * 2^(I2) +
... +
DR(I1)] * 2 +
Dividendbit(NI+1)  DR(I)
= [Dividendbit(N) * 2^(I1) +
Dividendbit(N1) * 2^(I2) +
... +
Dividendbit(NI+2) * 2^1 +
Dividendbit(NI+1) * 2^0] 
[DR(1) * 2^(I1) +
DR(2) * 2^(I2) +
... +
DR(I1) * 2^1 +
DR(I) * 2^0]
 QED.
Then R(N)
is:
R(N) = [Dividendbit(N) * 2^(N1) +
Dividendbit(N1) * 2^(N2) +
... +
Dividendbit(2) * 2^1 +
Dividendbit(1) * 2^0] 
[DR(1) * 2^(N1) + DR(2) * 2^(N2) + ... + DR(N)]
The reader is again recommended to follow the reasoning using pen and paper.
The (by now very) keen reader will notice that the first bracket in the
relation above represents the exact base2 decomposition of Dividend
;
the second term doesn't seem to denote anything in particular, so we'll
give the (notsoinspired) name DR
, leading to:
R(N) = Dividend  DR
 same as:
Dividend = DR + R(N)
On a first glance, it would seem that we're stuck, but there's a wonderful relation hidden behind the equation above. Remember, our goal is to find out whether:
Dividend = Divisor * Q(N) + R(N)
and we also know that:
 for 1 <= I <= N
DR(I) = { 0, if CR(I) < Divisor
Divisor, otherwise }
DQ(I) = { 0, if CR(I) < Divisor
1, otherwise }
So what if we wrote DR(I)
this way?
~~~~
DR(I) = Divisor * DQ(I), 1 <= I = Divisor.
3. At the end of
Nsteps, we will have computed
R(N)and
Q(N)`, the
unique values for which the quotientremainder theorem holds.
In conclusion, and to the best of my knowledge, the FFA implementations of egyptian multiplication and division work. I have therefore added my signature to the code shelf. Comments, reviews, improvements, generalizations and succintizations are more than welcome!

Though I wouldn't advise the reader to be selective, as there's a lot of knowledge to be gained from doing the entire homework.
As for myself, I'm writing only this particular piece because a. to my knowledge no one has yet published something similar, b. there's only that much space on (virtual or otherwise) paper to cover everything and c. I'd like to encourage readers to do their own homework, publish it etc. instead of reading secondhand material. That's why it's called homework, after all. ↩

Bear in mind that while Adacore provide auditable sources for their Ada impementation, the compiler can only be bootstrapped using another Ada compiler, which makes it susceptible to Thompson's hack, i.e. the problem of trust. ↩

The reader is encouraged to read the full implementation etc. ↩