30 September, 2014

# An alternative of nchoosek for selecting all pairs

Sometimes we need to select all pairs from a set, for example to calculate the distance between all point pairs. Function nchoosek provides a convenient way to do this. We simply have to call it by two parameters.

% select the indices of all pairs from 5 elements
p = nchoosek(1 : 5, 2)


The result is:

p =
1  2
1  3
1  4
1  5
2  3
2  4
2  5
3  4
3  5
4  5


As the number of the elements in the set goes high, the runtime of nchoosek increases a lot. In case of large number of elements we shall look for another approach to select all pairs.

To illustrate the idea, let us draw a matrix and mark all unique pairs. The result will be a lower triangular matrix, because all elements are set under the diagonal:

   1  2  3  4  5
1  0  0  0  0  0
2  1  0  0  0  0
3  1  1  0  0  0
4  1  1  1  0  0
5  1  1  1  1  0


MATLAB provides an easy way to create such a matrix by using the tril function:

% create an 5-by-5 matrix excluding the diagonal
M = tril(ones(5), -1)


The result is:

M =
0  0  0  0  0
1  0  0  0  0
1  1  0  0  0
1  1  1  0  0
1  1  1  1  0


Now we only have to extract the row and column indices of the true elements, so let us use the find function with appropriate parameters:

[y, x] = find(M);
p = [x, y]


The result equals to the desired output:

p =
1  2
1  3
1  4
1  5
2  3
2  4
2  5
3  4
3  5
4  5


Turn the method above to a function:

% return indices of all pairs from n elements
% example output for n = 3
% 1 2
% 1 3
% 2 3
function indices = pairIndices(n)
[y, x] = find(tril(logical(ones(n)), -1));
indices = [x, y];
end


In the following table you can find the run-times for sets having different number of elements.

 #set         nchoosek       pairIndices
100           2.27ms            0.26ms
250          16.10ms            0.73ms
500         291.64ms            3.72ms
1000        2739.42ms           14.55ms


It can be clearly seen, that the required run-time of nchoosek increases exponentially with the number of the elements in the set, the alternative method is about 188 times faster in case of 1000 elements.

Tags: