COS 226 Programming Assignment 8
Factoring
Write a program to factor huge integers. Read an
integer N from the command line, then print it as the product
of two integers (or note that it is prime) as in the following
brute-force solution:
#include <stdio.h>
#include <limits.h>
main(int argc, char *argv[])
{ int d;
int N = atoi(argv[1]);
for (d = 2; d < N/d; d++)
{
if (N % d == 0) break;
}
if (N % d == 0)
printf("%d = %d*%d\n", N, d, N/d);
else printf("%d is prime\n", N);
}
This code simply checks each integer greater than 1 to see whether it
divides evenly into N. The header limits.h gives
access to constants like INT_MAX that are not used by this
program, but which you may need to avoid overflow in your own code.
We stop after finding any factor since we know that we could find
all factors, if desired, by using the same method for the factors.
The solution just given is not quite a brute-force solution because it
already contains a test that improves performance by a substantial amount:
If we do not find a factor by the time that we reach
the square root of N, then we certainly will not find a factor
larger than that value, and we can declare N to be prime. A truly
naive brute-force implementation might use time proportional to N
(try every integer less than N; this improvement improves the running
time to be proportional to the square root of N, so we can use it to
factor any 32-bit integer in a fraction of a second. In this assignment,
we explore the problem of factoring larger integers.
There are two reasons that the above code is not useful for factoring huge
integers. First, the standard int data type supports 32-bit
integers, so we would get overflow and/or incorrect results if we were
to try to use it for larger numbers. Second, it is too slow. Even if we
could deal with (say) 128-bit integers, we could never afford to try
2^64 divide operations to check for factors.
For this assignment, you will develop an implementation of a faster algorithm
for factoring that is known as Pollard's rho method. The method
is brought to us by the magic of number theory that is beyond the scope of
this course, but it is an easy computation, based on iterating the
function f(x) = x*x + c (mod N). Specifically, we start at some
value x = x0 and maintain,
for i = 1, 2, 3, ... ,
one variable a that tracks
the result of i iterations of f
and another variable b that tracks
the result of 2i iterations of f, computing the
greatest common divisor d of |a - b| and N.
It turns out that whenever we find that
d is not 1, we can infer that
it is a factor of N, so we are done.
(If you know number theory and are interested in details, see
Pollard's paper in BIT 15, 1975, pp. 331-334.)
For example, the following table traces the computation when we use
it to factor 299.
x0 = 1
c = 1
a b |a-b| d
-------------------
2 5 3 1
5 79 74 1
26 174 148 1
79 105 26 13
299 = 13*23
The key to a simple implementation is to note that we can maintain
the values of a and b as required by simply setting
a = f(a) and b = f(f(b)) in the inner loop.
The loop always terminates, since a and b must
eventually be equal (think about why this must be true!) and
the greatest common divisor of 0 and N is N.
Pollard's method is a randomized algorithm that factors with high
probability if the starting value
of a and the
additive constant c
are chosen at random.
Your first task for this assignment is to write a program
small.c to factor int N that uses Pollard's method
to compute the factorization. Instrument your code to also print the
chosen values of x0 and c, the number of times the
loop iterates, and, for N less than 1000, a table
like the one shown above. Use your program to factor
713, 6319, 10609, and 42037.
Completing this part of the
assignment successfully is worth 6 points.
Note that the numbers you have to factor are so small that you do
not have to worry about overflow (see the first Extra Credit below).
You will need to implement a function to compute the greatest common
divisor of two integers. For this purpose, use
a recursive gcd function based on the fact that the
greatest common divisor of a and b is the same as
the greatest common divisor of b and a % b. (This
classic method is known as Euclid's algorithm.) Incidentally, use
a simple if test, not a library function, to compute the
absolute value.
We will hardly notice the difference in performance between
Pollard's algorithm and the brute-force method unless we want to factor huge
integers. To develop an implementation for this purpose, we need an
abstract data type (ADT) that supports all the usual arithmetic
operations on such objects. This is an exercise in ADT design and
implementation, a topic which is covered briefly in COS 126; in some
detail in Chapter 4 of Algorithms in C; and in great detail in
Hanson's book C Interfaces and Implementations and in COS 217.
For this assignment, you can use the interface
bignum.h. At Princeton, there are hundreds of
implementations of this interface lying around, because it is a COS
217 assignment. A trustworthy implementation of this ADT
is provided in ~cs226/prog8_files/ by including the objects
bignum.o and libgmp.a during your compilation.
Your second task for this assignment is to
write a program big.c that uses the BigNum ADT
to factor huge integers, using Pollard's algorithm. Use your program to factor
5352499, 670726081,
9449868410449, 1082154235955237,
121932633334857493, and 13565005454706599869.
Experiment with other huge integers, and include in your readme
file a discussion of the number of iterations that you think are required,
in general.
Extra credit A.
Implement small.c such that it is functionally equivalent to the
sample program given above (but much faster). That is, your program
needs to factor any int, avoiding overflow
in the intermediate calculations.
Extra credit B.
Find the largest prime pair (N and N+2 both prime)
that you can
in about ten seconds of computer time. An impressively large value of
N will earn you an extra point.
Due: 11:59PM Sunday, April 18.