30 September, 2014

An alternative of nchoosek for selecting all pairs

Sometimes we need to select all pairs from a set, for example to calculate the distance between all point pairs. Function nchoosek provides a convenient way to do this. We simply have to call it by two parameters.

% select the indices of all pairs from 5 elements
p = nchoosek(1 : 5, 2)

The result is:

p =
  1  2
  1  3
  1  4
  1  5
  2  3
  2  4
  2  5
  3  4
  3  5
  4  5

As the number of the elements in the set goes high, the runtime of nchoosek increases a lot. In case of large number of elements we shall look for another approach to select all pairs.

To illustrate the idea, let us draw a matrix and mark all unique pairs. The result will be a lower triangular matrix, because all elements are set under the diagonal:

   1  2  3  4  5
1  0  0  0  0  0
2  1  0  0  0  0
3  1  1  0  0  0
4  1  1  1  0  0
5  1  1  1  1  0

MATLAB provides an easy way to create such a matrix by using the tril function:

% create an 5-by-5 matrix excluding the diagonal
M = tril(ones(5), -1)

The result is:

M =
  0  0  0  0  0
  1  0  0  0  0
  1  1  0  0  0
  1  1  1  0  0
  1  1  1  1  0

Now we only have to extract the row and column indices of the true elements, so let us use the find function with appropriate parameters:

[y, x] = find(M);
p = [x, y]

The result equals to the desired output:

p =
  1  2
  1  3
  1  4
  1  5
  2  3
  2  4
  2  5
  3  4
  3  5
  4  5

Turn the method above to a function:

% return indices of all pairs from n elements
% example output for n = 3
% 1 2
% 1 3
% 2 3
function indices = pairIndices(n)
  [y, x] = find(tril(logical(ones(n)), -1));
  indices = [x, y];
end

In the following table you can find the run-times for sets having different number of elements.

 #set         nchoosek       pairIndices
  100           2.27ms            0.26ms
  250          16.10ms            0.73ms
  500         291.64ms            3.72ms
 1000        2739.42ms           14.55ms

It can be clearly seen, that the required run-time of nchoosek increases exponentially with the number of the elements in the set, the alternative method is about 188 times faster in case of 1000 elements.

         

New comment

comments powered by Disqus