A cute algorithm for a problem you've never cared to solve

As part of my recent playing with metric search structures, I was wondering about the following problem: Given a metric space X I want to partition X into two sets A and B such that each of A and B has a small diameter. Equivalently, I’m trying to find A that minimizes max(diam(A), diam(\(A^c\))).

The brute force solution for this would be fairly awful: There are \(2^n\) partitions, calculating the score of each is \(O(n^2)\), so the brute force solution takes \(O(2^n n^2)\). So let’s not do that.

I was expecting that I’d have to find an approximate solution, as this looked like a classic hairy NP-hard optimization problem. I’d started contemplating local optimizations, simulated annealing, etc. Then after doing a bit more searching I happened across this paper, which in fact has a polynomial time exact solution to the problem: Diameter Partitioning by David Avis.

The algorithm is incredibly cute - it uses absolutely no advanced ideas, just two clever observations which reduce the entire problem to elementary algorithms. It’s the sort of thing which as soon as you see it you just kick yourself for not having thought of it. So I thought I’d write about it to share how much I enjoyed it, even if no one else is actually likely to care about solving this problem.

The first observation is this:

We can find a partition A, B which has max(diam(A), diam(B))) \(\leq\) t if and only if we can find two sets A, B such that if \(d(x, y) > t\) they are in different halves of the partition (this is restating the definition).

That is, if we draw an edge between x and y if and only if \(d(x, y) > t\), then a partition with max(diam(A), diam(B))) \(\leq\) t is precisely a bipartite matching for this graph.

But bipartite matchings are easy to find! Or at least, \(O(n^2)\) to find, which is no worse than calculating the diameter of a set in the first place. You just do a graph search on each of the components, marking nodes alternatively black and white (i.e. if you’re coming from a white node you mark this node black, if you’re coming from a black you mark this node white) as you find them, and yell “Impossible!” if you ever try to paint a node a different colour than you’ve already painted it. Done.

So for any given t we have an \(O(n^2)\) algorithm that either finds us a partition at least that good or tells us that none exists. Now how do we find a best possible partition?

The obvious answer is binary search, but binary searching on floats is never going to give us the exact answer, only a series of successively better approximations. This then motivates the second observation: Our distances are discrete.

If we work out the distance between every pair (we already have to do that for building the graph) then you can put these pairs + their distances in an array and sort that array. Then it’s just a simple matter of binary searching that array, and additionally you can use it to build the graph without calculating any more distances - just take the pairs to the right of your current point in the array.

Building the array is \(O(n^2 log(n))\), and we will do \(log(n)\) binary searches, each performing an \(O(n^2)\) operation, so in total the cost of this algorithm is \(O(n^2 log(n))\). A lot better than brute force.

Notes



Comments

David R. MacIver on 2012-01-30 15:56:00:

[...] This is motivated by trying to find small diameter partitions of X, which we will be doing by trying to either prove (X, E) is not bipartite or finding a bipartition of it (see this previous post) [...]

Best of drmaciver.com | David R. MacIver on 2013-08-25 13:28:12:

[…] A manifesto for error reporting […]

Mike Jones on 2014-08-30 16:12:29:

error: This blog post links from “A manifesto for error reporting” on this page:
/2013/08/best-of-drmaciver-com/#misc