Facebook Prime Bits

This is one of Facebook job candidate puzzles. Given a range [a,b] of positive integer numbers, test for the primality of the number of 1 bits in the binary representation of each number, and do so in O(n) where n is b - a.

Unfortunately the puzzle goes on to assume that n is a 64 bit integer, and that makes asymptotic analysis quite meaningless, so I will discard this assumption for now and comment on it later. The naive solution loops through the given range converting each number to its binary representation, counting 1 bits and testing primality. The complexity of this algorithm is O((b-a) log(b)), using the polynomial time deterministic primality test of Agrawal, Saxena and Kayal. One might think that the primality test is the hard part of the algorithm, but the numbers to be tested are pretty small. The representation of b takes log(b) bits. The number of 1 bits in it is at most, of course, log(b) and that can be represented with log(log(b)) bits, and the ASK algorithm is polynomial w.r.t to the size of its input, or poly(log(log(b))). So is it possible to get rid of the log(b) term depending on counting the bits in the binary representation of a number? The answer is in the following functions and is positive. The main idea is to avoid recomputing the same partial sum of bits over and over again. Imagine a trie or prefix tree of the binary representations of all the numbers in the range. Imagine a traversal in preorder that is performed while keeping a count of 1 bits seen so far between the root (most significant bit) and the current position. Every time a leaf (complete number) is reached, the results are extended with one item.

use strict;

sub countbits {
return f(@_, 0, []);
}

sub f {
my ($a, $b, $count, $retv) = @_;
if ($a != $b) {
my $k = int(log2($b));
if (topbits($a, $k) == topbits($b, $k)) {
return  f(bottombits($a, $k),
bottombits($b, $k),
$count + topbits($a, $k),
$retv );
}
else {
f(bottombits($a, $k),  
2**$k - 1,
$count,
$retv);
return  f(0,
bottombits($b, $k),
$count + 1,  
$retv);
}
}
else {
push @$retv, $count + g($a);
return $retv;
}
}

sub g {
my $a = shift;
return ($a == 0 ? 0 :$a % 2 + g ($a >> 1));
}

sub topbits{
my ($a, $k) = @_;
return $a >> $k;
}

sub bottombits{
my ($a, $k) = @_;
return $a % 2**$k;
}

sub log2 {
my $n = shift;
return log($n)/log(2);
}

The shift and mod operations are not strictly constant time, but they are used just to select the first bit and the remaining bits, so that can be done in constant time but I thought the function was clearer this way. The worst case analysis assumes that the double recursive call is always performed and relies on the fact that the number of significant digits is decreasing by one with each level of recursion. G is of course O(log(a)). This establishes a O((b-a)+log(a)) bound. So there is a log(a) term in there, and we still need the primality test, either precomputed (O((b-a)+log(a)+log(b)poly(log(log(b)))) total time) or performed on each number in O((b-a) poly(log(log(b)))). Neither is superior depending on a. So one can not strictly achieve O(b-a) as claimed by the Facebook gurus. This is not a limitation of my approach. If it were possible, it would imply for a=b that primality can be tested in constant time. Two friends of mine and programming superstars both used the additional assumption of 64 bit numbers to consider log(b) <=64 and therefore constant as we were discussing this problem. Personally, I think mixing finite and asymptotic analysis doesn't really help making things more clear and ridding oneself of a log(b) factor when it can be as large as 64 can have practical consequences. Actually, I think this is a great example that asymptotic analysis is insightful. Still it would be nice to make sure that my algorithm doesn't have horrible constants in its time complexity function. This is left as an exercise for the reader (suggestion: use generating functions).