Egyptian div and mul "work": a correctness proof for the arithmetic operations in FFA Chapter 5

February 23, 2018 by Lucian Mogosanu

In the fifth chapter of the FFA series, Stanislav presents a slow yet entirely usable constant-time implementation of division and multiplication, and most importantly, one that fits in head. Following up on our previous exercise, we read the chapter and finally arrive at a set of challenges. Cherry-picking1 through them, we see:

• Prove that the provided Division and Multiplication operations, work.

This article is an exercise in exactly that. But before delving into the gory details, we must clarify what it means (from the author's point of view and in this particular context) a. to prove, and b. that something "works".

While the proof of correctness is indeed mathematical, it is not of the "formal verification" type in the sense of mechanical gargle found in today's literature. That said, we aim to import a minimal body of prior knowledge, consisting mostly of basic arithmetics and highschool algebra. We will also, when absolutely needed, introduce assumptions and as much as possible state them explicitly.

The article presents an algorithmic proof. That is, we read through the code and we manually translate it into very similar pseudo-code when needed, and then we manually translate that into mathematical expressions, for the purpose of algebraic manipulation. Thus egyptian division and multiplication implementations "work" only in the sense that we carefully examine and understand them, and then in the sense that their algorithmic counterparts are led through a deductive process which then leads us to the conclusion. Sadly, we can make no strong statements about either the compiler2 or the hardware (architecture and implementation thereof), so we are forced to make the otherwise unwarranted assumption that they also work.

We also assume that the basic operations presented in previous FFA chapters -- e.g. addition, subtraction, muxing, shifts -- work by the definition of "works" in the previous paragraph. The reader is of course encouraged to not take this statement at face value and (re)examine the implementation of the operations carefully.

We divide the remainder of this article into two pieces. As both the implementation and the underlying equations for egyptian multiplication are simpler, we begin with a correctness proof for it and then later on we continue with the division operation.

Part the first: multiplication

We begin with a description and a code snippet3 of `FZ_Mul_Egyptian`. The partial and final results of the operation are held in a variable (`XY`) double the size of the input (`X` and `Y`). At first, `XY` is cleared (all bits set to zero) and a set of "slides", `XS` and `YS`, are initialized with the values of `X` and `Y` respectively; in particular, `XS` is the same size as `XY`, as `X` must be repeatedly added into `XY`.

Then the implementation loops over `YS`, also modifying `XS`:

``````      -- For each bit of Y:
for i in 1 .. FZ_Bitness(Y) loop

-- If lowest bit of Y-Slide is 1, X-Slide is added into XY
FZ_Add_Gated(X    => XY, Y => XS, Sum => XY,
Gate => FZ_OddP(YS));

-- X-Slide := X-Slide * 2
FZ_ShiftLeft(XS, XS, 1);

-- Y-Slide := Y-Slide / 2
FZ_ShiftRight(YS, YS, 1);

end loop;
``````

Then the output variables are set to `XY`.

The first observation we make is that, `FZ_Add_Gated` being a conditional add operation very similar to `FZ_Mux`, we can also look at it as:

``````         -- If lowest bit of Y-Slide is 1, X-Slide is added into XY
if FZ_OddP(YS) then
FZ_Add(X => XY, Y => XS,      Sum => XY);
else
FZ_Add(X => XY, Y => FZ_Zero, Sum => XY);
end if;
``````

where `FZ_Zero` is a `FZ` constant of the same size as `XY` and `XS`, with all bits set to zero. Using our previous assumption about the correctness of `FZ_Add`, we can rewrite this in the following pseudocode form:

``````         if YSbit(1) is 1 then
XY := XY + XS;
else -- YSbit(1) is 0
XY := XY + 0;
end if;
``````

where `+` here is an equivalent to `FZ_Add` (and consequently we will later assume that `-` represents an equivalent to `FZ_Sub`) and `YSbit(I)` represents the `I`th ("eldest", least significant) bit of `YS`. We will further say that `FZ_Bitness(Y) = N`, so that ```1 <= I <= N```, i.e. 1 is the least significant bit of `YS`, while N is the most significant.

Observe that `XY` is updated (`XS` is added to it) only when ```YSbit(1) = 1```. Thus we can make one (final) modification to this bit, by explicitly introducing `YSbit(1)` in the update instruction. The equivalent (branchless!) assignment would be:

``````            XY := XY + YSbit(1) * XS;
``````

where `*` here denotes a multiplication operation between a bit and a `FZ`, however in this case perfectly equivalent to the muxing in `FZ_Add_Gated`.

Furthermore we rewrite the shifting operations as equivalent operations that multiply (`*`) and divide (`/`) by 2:

``````         -- X-Slide
XS := XS * 2;

-- Y-Slide
YS := YS / 2;
``````

The full algorithmic translation of `FZ_Mul_Egyptian` then becomes:

``````FZ_Mul_Egyptian(X, Y) is:
-- Initialize variables
1.  XY := 0;
2.  XS := X;
3.  YS := Y;
-- Over the bits of Y
4.  for I in 1 .. N loop
-- Partial product
5.     XY := XY + YSbit(1) * XS;
-- Partial X-Slide
6.     XS := XS * 2;
-- Partial Y-Slide
7.     YS := YS / 2;
8.  end loop;
9.  Result := XY;
``````

This version of the algorithm looks pretty good and it can help us model the problem mathematically, but it's not quite the final version. Diana identified that instead of using the first bit of `YS` for muxing and then shifting `YS` to the right at each step, one can optimize the execution time of `FZ_Mul_Egyptian` by simply using the `I`th bit of `Y`. We'll do the same here, with the mention that this makes our algorithm smaller and it helps us more easily identify an inductive pattern. Let's first view the modified (and final) version of the egyptian multiplication algorithm:

``````FZ_Mul_Egyptian(X, Y) is:
-- Initialize variables
1.  XY := 0;
2.  XS := X;
-- Over the bits of Y
3.  for I in 1 .. N loop
-- Partial product
4.     XY := XY + Ybit(I) * XS;
-- Partial X-Slide
5.     XS := XS * 2;
6.  end loop;
7.  Result := XY;
``````

Note the changes: a. we removed the Y-slide; and b. at each step of the loop, `Ybit(I)` is used instead of `YSbit(1)`.

Now we can begin to reason about the hardest part of this algorithm, the loop. While, as described, the loop goes over the bits of `Y` (i.e. they are our "induction variable"), we can see it as the computation of two recurrence relations, that is, of the following finite sequences. We begin with `XS`:

``````XS(0) = X
XS(I) = XS(I-1) * 2, 1 <= I <= N
``````

This has so far been the largest leap in logic in our article, crossing suddenly from an algorithmic view of the problem to a mathematical one; thus I ask the reader to sit for a moment and contemplate this leap. At the end, the correspondence between the `XS(I)` in the recurrence and the `XS` in the algorithm should hopefully become obvious: at each step, the value of `XS` is doubled, and this has an algebraic significance, as we will see in a moment. Now we model the update of `XY` similarly:

``````XY(0) = 0
XY(I) = XY(I-1) + Ybit(I) * XS(I-1), 1 <= I <= N
``````

Observe how `XY(I)` is computed based on `XS(I-1)`; to get this intuition, try to mentally (or on the paper) run the first step of the algorithm: `XY` is updated before `XS`, and thus `XY(1)` is computed based on `XS(0)`, `XY(2)` based on `XS(1)` and so on.

The problem of computing the arithmetic product egyptologically then becomes equivalent to finding the value of `XY(N)`. And thus in order to show that egyptian multiplication is correct, we need to find the general form of this sequence. This can be solved by induction. First we need to do `XS`:

``````XS(0) = X
XS(1) = XS(0) * 2 = X * 2
XS(2) = XS(1) * 2 = X * 2 * 2 = X * 2^2
...
-- by induction
XS(I) = XS(I-1) * 2 = X * 2^(I-1) * 2 = X * 2^I
-- thus, the general form of XS(I)
XS(I) = X * 2^I
``````

Now the harder part, we do the same for `XY`:

``````XY(0) = 0
XY(1) = XY(0) + Ybit(1) * XS(0)
= 0     + Ybit(1) * X     = Ybit(1) * X
XY(2) = XY(1)       + Ybit(2) * XS(1)
= Ybit(1) * X + Ybit(2) * X * 2
-- let's do one more
XY(3) = XY(2)             + Ybit(3) * XS(2)
= Ybit(1) * X * 2^0 +
Ybit(2) * X * 2^1 +
Ybit(3) * X * 2^2
...
-- induction hypothesis
XY(N-1) = Ybit(1)   * X * 2^0     +
Ybit(2)   * X * 2^1     +
...                     +
Ybit(N-1) * X * 2^(N-2)
-- induction step proof
XY(N) = XY(N-1) + Ybit(N) * XS(N-1)     -- subst. XS(N-1)
= XY(N-1) + Ybit(N) * X * 2^(N-1) -- subst. ind. hyp.
= Ybit(1)   * X * 2^0       +
Ybit(2)   * X * 2^1       +
...                       +
Ybit(N-1) * X * 2^(N-2)   +
Ybit(N)   * X * 2^(N-1)
-- QED.
``````

The reader is of course encouraged to replicate the result using pen and paper.

Now that we have the general form of `XY(N)` beneath our eyes, the question remains, what do we do with it? The astute reader has probably observed that we have been juggling with meanings for our ideal multiplication operation, i.e. `*`. For example `Ybit(I) * X` has a somewhat different meaning than `X * 2^(I-1)`. We will therefore intuit that `*` is in fact the arithmetic multiplication and it has the very same properties that we know from school. We then observe that we can factor out the common `X` in the sum, and thus `XY(N)` becomes:

``````XY(N) = X * [Ybit(1) * 2^0 + Ybit(2) * 2^1 + ... + Ybit(N) * 2^(N-1)]
``````

which brings us to some notions of basic algebra and number theory: given an arbitrary number `K` and a base `B > 1`, then `K` can be decomposed into the sum:

``````K(1) * B^0 + K(2) * B^1 + K(3) * B^2 + ...
``````

where `0 <= K(I) < B`, i.e. `K(I)` is a "modulo-B digit". Our particular `K(I)` is `Ybit(I)` and it denotes the elements of the bit-vector `Y`. From this follows that the sum-of-products factor in the relation of `XY(N)` above is in fact `Y`, and thus `XY(N)` becomes:

``````XY(N) = X * Y
``````

This then settles it. A number of refinement steps have brought us from repeated `FZ_Add` operations to the mathematical `*`; the reverse (un-)refinement is also possible4, which serves to prove that the algorithm implemented by `FZ_Mul_Egyptian` and the algebraic `*` are equivalent. The reader is, of course, encouraged to challenge this statement.

More importantly, these refinement steps serve to show us what egyptian multiplication is: given a base `B` and two numbers `X` and `Y`, egyptian multiplication finds the answer to `X * Y` by adding `X` and multiplying it (the same `X`) by `B` a number of times depending on the number of digits (in base-`B` representation) of `Y`. Programmers can then implement a variation of this algorithm on ternary or quaternary computers, etc.5

We move on to the trickier part of our article: egyptian division.

Part the second: division

Egyptian division is very similar to its multiplication counterpart, only it is considerably trickier to comprehend. As `X * Y` means "to add `X` a number of times equal to `Y`", so does `X / Y` mean "to subtract `Y` from `X` a number of times equal to `Q`, until a number `R` remains". The egyptological principle remains the same: we are in base 2 and we shift, subtract and add-conditionally `N` times. But before laying the implementation of `FZ_IDiv` before our eyes, let us consider a few aspects inherent to division.

Computing `X / Y` involves finding two numbers, `Q` and `R`, so that ```X = Y * Q + R```. We call `X` the divisor and `Y` the dividend; and an old theorem called the quotient-remainder theorem6 guarantees us that for arbitrary integers `X` and `Y`, there exist `Q` and `R` satisfying the relation given here; and furthermore, that `Q` and `R` are unique. `FZ_IDiv` is thus supposed to correctly find these two numbers.

Now, for the implementation. `FZ_IDiv` declares `Q` and `R` as a single register, `QR`, whose `N` least significant bits are initially equal to the dividend, and at the end will contain `Q`. The rest of the bits in `QR` are initially zero, and at the end will contain the value of `R`. The (supposed) "induction" part looks as follows:

``````      for i in 1 .. FZ_Bitness(Dividend) loop

-- Advance QR by 1 bit:
FZ_ShiftLeft(QR, QR, 1);

-- Subtract Divisor from R; Underflow goes into C
FZ_Sub(X => R, Y => Divisor, Difference => R, Underflow => C);

-- If C=1, subtraction underflowed, and then Divisor gets added back:
FZ_Add_Gated(X => R, Y => Divisor, Gate => C, Sum => R);

-- Current result-bit is equal to Not-C, i.e. 1 if Divisor 'went in'
FZ_Or_W(Q, W_Not(C));

end loop;
``````

We take the first left shift first. Shifting `QR` does (from an arithmetic point of view) two things: it multiplies `Q` by 2 and it overflows bits of `Q` into `R`; the latter meaning the multiplication of `R` by 2 and adding the "newest" (most significant) bit of `Q` to `R`. We can thus rewrite it as:

``````         -- Advance QR by 1 bit:
R := R * 2 + Qbit(N);
Q := Q * 2;           -- NB: mod 2^N!
``````

The remaining operations in the loop are dependent upon the result of the call to `FZ_Sub`, which is an implicit `R < Divisor` comparison. If this comparison fails, then the subtraction is undone using `FZ_Add_Gated` and the least significant bit of `Q` is set to zero; else, `Divisor` remains subtracted from `R` and the least significant bit of `Q` is set to one. As in the case of multiplication, we rewrite this into a pseudo-code using explicit conditionals:

``````         -- Check (R - Divisor) underflow
if R < Divisor then
R := R - 0;
Q := Q + 0;
else
R := R - Divisor;
Q := Q + 1;
end if;
``````

The intuition is the exact one from long division: once (or if) enough bits of the dividend have overflown into `R`, we subtract the divisor and we set the digit (i.e. bit) of the quotient accordingly. As with multiplication, we can get rid of the complicated dependency on specific bits at each step. We observe that at the first step (`I = 1`) `Qbit(N)` addresses the `N`th bit of `Dividend`, then at `I = 2` the `N-1`th and so on until `I = N` and the addressing of the first bit of `Dividend`. Thus we make the first update of `R` depend on `Dividendbit(N-I+1)` instead of `Qbit(N)`.

Thus the algorithmic form (or one of the forms) of `FZ_IDiv` is:

``````FZ_IDiv(Dividend, Divisor) is:
-- Initialize variables
1.  Q := Dividend;
2.  R := 0;
-- Over the bits of Dividend
3.  for I in 1 .. N loop
-- Update non-conditional values of R and Q
4.     R := R * 2 + Dividendbit(N-I+1);
5.     Q := Q * 2;
-- Conditional updates of R and Q
6.     if R < Divisor then
7.        R := R - 0;
8.        Q := Q + 0;
10.    else
11.       R := R - Divisor;
12.       Q := Q + 1;
13.    end if;
14. end loop;
15. Result := Q, R;
``````

Now comes the tricky part, that is, putting the algorithm in a mathematical form that allows us to reason about it. At least two aspects are not exactly easy to spot: `R` and `Q` suffer two updates each, and the `R` in `R < Divisor` (on which the conditional updates to `Q` and `R` depend) is not the same `R` as in the previous iteration of the loop. We'll try to untangle this first, by taking a conditional-`R` (`CR`) which maps to the first update to `R`:

``````-- for 1 <= I <= N
CR(I) = R(I-1) * 2 + Dividendbit(N-I+1)
``````

Next, since it's not obvious yet how we could get rid of the explicit conditionals -- as we did with multiplication by factoring in `Ybit(.)` -- we define deltas for `Q` and `R`, depending explicitly on `CR(I)`. That is:

``````-- for 1 <= I <= N
DR(I) = { 0,       if CR(I) < Divisor
Divisor, otherwise }
DQ(I) = { 0, if CR(I) < Divisor
1, otherwise }
``````

The recurrence relation for `R` is then:

``````R(0) = 0
R(I) = CR(I)                           - DR(I)
= R(I-1) * 2 + Dividendbit(N-I+1) - DR(I),
1 <= I <= N
``````

and for `Q`:

``````Q(0) = Dividend
Q(I) = Q(I-1) * 2 + DQ(I), 1 <= I <= N
``````

As before, these recurrences are a leap from the algorithmic to the mathematical, so it might be completely non-obvious where the relations come from. The reader is recommended to go through the mental gymnastics necessary to obtain them by using pen and paper. At the end it should be clear that `Q(N)` is the resulting quotient, while `R(N)` is the resulting remainder, and furthermore we wish to show that the equation:

``````Dividend = Divisor * Q(N) + R(N)
``````

holds true.

Now let's try to figure out a general form for `Q(I)` first, since this is the simplest of the two:

``````Q(0) = Dividend
Q(1) = Q(0) * 2 + DQ(1) = Dividend * 2 + DQ(1)
Q(2) = Q(1)                   * 2 + DQ(2)
= [Dividend * 2 + DQ(1)] * 2 + DQ(2)
= Dividend * 2^2 + DQ(1) * 2 + DQ(2)
-- let's do one more
Q(3) = Q(2)                                 * 2 + DQ(3)
= [Dividend * 2^2 + DQ(1) * 2 + DQ(2)] * 2 + DQ(3)
= Dividend * 2^3 + DQ(1) * 2^2 + DQ(2) * 2 + DQ(3)
...
-- induction hypothesis
Q(N-1) = Dividend * 2^(N-1) +
DQ(1)    * 2^(N-2) +
DQ(2)    * 2^(N-3) +
...                +
DQ(N-2)  * 2^1     +
DQ(N-1)  * 2^0
-- induction step proof
Q(N) = Q(N-1) * 2 + DQ(N) -- subst. Q(N-1)
= [Dividend * 2^(N-1) +
DQ(1)    * 2^(N-2) +
DQ(2)    * 2^(N-3) +
...                +
DQ(N-2)  * 2^1     +
DQ(N-1)  * 2^0     ]
* 2 + DQ(N)
= Dividend * 2^N     +
DQ(1)    * 2^(N-1) +
DQ(2)    * 2^(N-2) +
...                +
DQ(N-1)  * 2^1     +
DQ(N)    * 2^0
-- QED.
``````

Moreover, all multiplications are modulo `2^N` (we fit the the results in `N` bits), so the term `Dividend * 2^N` vanishes. We are left with a base-2 number:

``````Q(N) = DQ(1) * 2^(N-1) + DQ(2) * 2^(N-2) + ... + DQ(N) * 2^0
``````

As for `R(I)`:

``````R(0) = 0
R(1) = R(0) * 2 + Dividendbit(N) - DR(1)
=            Dividendbit(N) - DR(1)
R(2) = R(1) * 2 + Dividendbit(N-1) - DR(2)
= [Dividendbit(N) - DR(1)] * 2 + Dividendbit(N-1) - DR(2)
= Dividendbit(N) * 2 - DR(1) * 2 + Dividendbit(N-1) - DR(2)
-- one moar
R(3) = R(2) * 2 + Dividendbit(N-2) - DR(3)
= [Dividendbit(N) * 2 + Dividendbit(N-1) - DR(1) * 2 - DR(2)] * 2 +
Dividendbit(N-2) - DR(3)
= [Dividendbit(N)    * 2^2 +
Dividendbit(N-1)  * 2   +
Dividendbit(N-2)]       -
[DR(1) * 2^2 + DR(2) * 2 + DR(3)]
``````

Observe how we grouped the terms by `Dividendbit(.)` and `DR(.)`; we anticipate for `R(.)` to be the difference between two numbers. Now for the induction hypothesis and step substitution we use an index `I < N`, so as to not confuse with the `N` in `Dividendbit(N)`. We are very careful to match the index to `R(.)` with the powers of two in the terms:

``````-- induction hypothesis
R(I-1) = [Dividendbit(N)     * 2^(I-2) +
Dividendbit(N-1)   * 2^(I-3) +
...                          +
Dividendbit(N-I+3) * 2^1     +
Dividendbit(N-I+2) * 2^0]    -
[DR(1)   * 2^(I-2) +
DR(2)   * 2^(I-3) +
...               +
DR(I-2) * 2^1     +
DR(I-1) * 2^0]
-- induction step
R(I) = R(I-1) * 2 + Dividendbit(N-I+1) - DR(I)
= [Dividendbit(N) * 2^(I-2) +
...                      +
Dividendbit(N-I+2)]        * 2 -
[DR(1) * 2^(I-2)  +
...              +
DR(I-1)]                   * 2 +
Dividendbit(N-I+1)              - DR(I)
= [Dividendbit(N)     * 2^(I-1) +
Dividendbit(N-1)   * 2^(I-2) +
...                          +
Dividendbit(N-I+2) * 2^1     +
Dividendbit(N-I+1) * 2^0]    -
[DR(1)   * 2^(I-1) +
DR(2)   * 2^(I-2) +
...               +
DR(I-1) * 2^1     +
DR(I)   * 2^0]
-- QED.
``````

Then `R(N)` is:

``````R(N) = [Dividendbit(N)   * 2^(N-1) +
Dividendbit(N-1) * 2^(N-2) +
...                        +
Dividendbit(2)   * 2^1     +
Dividendbit(1)   * 2^0]    -
[DR(1) * 2^(N-1) + DR(2) * 2^(N-2) + ... + DR(N)]
``````

The reader is again recommended to follow the reasoning using pen and paper.

The (by now very) keen reader will notice that the first bracket in the relation above represents the exact base-2 decomposition of `Dividend`; the second term doesn't seem to denote anything in particular, so we'll give the (not-so-inspired) name `DR`, leading to:

``````R(N) = Dividend - DR
-- same as:
Dividend = DR + R(N)
``````

On a first glance, it would seem that we're stuck, but there's a wonderful relation hidden behind the equation above. Remember, our goal is to find out whether:

``````Dividend = Divisor * Q(N) + R(N)
``````

and we also know that:

``````-- for 1 <= I <= N
DR(I) = { 0,       if CR(I) < Divisor
Divisor, otherwise }
DQ(I) = { 0, if CR(I) < Divisor
1, otherwise }
``````

So what if we wrote `DR(I)` this way?

``````DR(I) = Divisor * DQ(I), 1 <= I = Divisor`.
``````

This relation is perfectly valid, and it helps us rewrite `DR` as:

``````DR = DR(1) * 2^(N-1) + DR(2) * 2^(N-2) + ... + DR(N)
= Divisor * DQ(1) * 2^(N-1) +
Divisor * DQ(2) * 2^(N-2) +
...                       +
Divisor * DQ(N) * 2^0
= Divisor * [DQ(1) * 2^(N-1) + DQ(2) * 2^(N-2) + ... + DQ(N)]
= Divisor * Q(N)
``````

which seals the deal for us.

Let's now take a few steps back and explain how long egyptological division works:

1. We initialize `R(0)` and `Q(0)`.
2. At each step `I` we: a. compute a `CR(I)`, deciding whether the divisor should be subtracted; b. based on `CR(I)`, compute `DQ(I)` and `Q(I)`, adding 1 to the quotient when a successful subtraction was performed; and c. compute `DR(I)` and `R(I)`, which performs the actual subtraction when `CR(I) >= Divisor`.
3. At the end of `N` steps, we will have computed `R(N)` and `Q(N)`, the unique values for which the quotient-remainder theorem holds.

In conclusion, and to the best of my knowledge, the FFA implementations of egyptian multiplication and division work. I have therefore added my signature to the code shelf. Comments, reviews, improvements, generalizations and succintizations are more than welcome!

1. Though I wouldn't advise the reader to be selective, as there's a lot of knowledge to be gained from doing the entire homework.

As for myself, I'm writing only this particular piece because a. to my knowledge no one has yet published something similar, b. there's only that much space on (virtual or otherwise) paper to cover everything and c. I'd like to encourage readers to do their own homework, publish it etc. instead of reading second-hand material. That's why it's called home-work, after all.

2. Bear in mind that while Adacore provide auditable sources for their Ada impementation, the compiler can only be bootstrapped using another Ada compiler, which makes it susceptible to Thompson's hack, i.e. the problem of trust

3. The reader is encouraged to read the full implementation etc.

4. And a good exercise for the mathematically-inclined reader.

5. History has shown egyptian multiplication to work with some degree of success on decimal computers. This works still, and may continue to work for generations to come -- as long as one does not forget his whip

6. Whose proof is left as an exercise to the reader.

Filed under: math.