09 March, 2014

Create two-dimensional histograms in MATLAB

Sometimes we need two-dimensional histograms for a task, for example to visualize the distribution of vectors or points. In MATLAB there is no function designed especially for this operation, so we have to find a workaround or look for a suitable solution in the File Exchange section of MATLAB Central. In this post we will borrow a function that was designed for building sparse matrices, not histograms, and use it in a slightly tricky way.

If you are not familiar with histograms, please read this document.

First, create a test point set to visualize in a two-dimensional histogram:

% create 1000 random points, non-uniform distribution
r = rand(1, 1000) .^ 2 * 50;
a = rand(1, 1000) * pi * 2;

% convert polar coordinates to cartesian coordinates
x = round(sin(a) .* r);
y = round(cos(a) .* r);

A thousand random points are generated in polar space with a non-uniform distribution. Note that all the random radius values fall into the [0, 1] interval, so squaring them moves the points towards zero; then they are scaled by 50. In the last step the polar coordinates are converted to Cartesian ones and rounded to the nearest integer.

Sometimes our dataset is not so nice and contains some extremely small or large values. These are called outliers, since they lie outside the region we are interested in. Suppose that in the current case we want to remove every point that is not strictly inside the square region with corners [-25, -25] and [25, 25].

% create a logical array containing true for points where:
%  1. x is less than or equal to -25, or
%  2. x is more than or equal to +25, or
%  3. y is less than or equal to -25, or
%  4. y is more than or equal to +25
m = x <= -25 | x >= 25 | y <= -25 | y >= 25;

% remove the outliers identified by true
x(m) = [];
y(m) = [];
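The masking step can be checked on a handful of hand-picked points. The sketch below uses made-up sample values (an assumption for illustration, not part of the random set) and keeps the inliers with `~m` instead of deleting the outliers; both approaches give the same result.

```matlab
% hypothetical sample points chosen to sit on, inside, and outside the boundary
x = [-30, -25, -24, 0,  24, 25, 40];
y = [  0,   0,   3, 0, -24,  0,  0];

% true where a point lies on or outside the boundary
m = x <= -25 | x >= 25 | y <= -25 | y >= 25;

% keep the inliers (equivalent to deleting with x(m) = [] and y(m) = [])
xk = x(~m);
yk = y(~m);
```

Points exactly on the boundary (x or y equal to -25 or 25) are removed as well, so the remaining coordinates lie between -24 and 24.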

Now we have a nice dataset containing only points inside the [-25, -25], [25, 25] region; the remaining step is to create a two-dimensional histogram from them. Here comes the trick: we will use a function designed not for this operation but for sparse matrices, called sparse.

In some cases a matrix has many zero elements and only a few non-zero ones: $$ A = \begin{bmatrix} 0 & 2 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ \end{bmatrix} $$ In a full matrix we store all elements regardless of their value, which may require a large amount of memory. In a sparse matrix only the non-zero elements and their positions are stored, which can save a lot of memory. To read more on sparse matrices, please have a look at this document.

The general call of this function is the following:

S = sparse(i, j, s);

According to the MATLAB documentation, the vectors i, j, and s all have the same length. In the resulting sparse matrix, S(i(k), j(k)) equals s(k). To convert a sparse matrix back to a full matrix, call the full function. See the example below:

% define points and values
%  - x identifies the columns
%  - y identifies the rows
%  - v identifies the values
x = [1, 4, 2];
y = [4, 4, 1];
v = [1, 1, 1];

% create a sparse matrix
S = sparse(y, x, v);

% convert and display the full matrix
M = full(S)

The output is:

M =
   0   1   0   0
   0   0   0   0
   0   0   0   0
   1   0   0   1
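To see how little a sparse matrix actually stores, the sketch below builds a large, mostly empty matrix and inspects it. It assumes the five-argument form sparse(i, j, s, m, n), which sets the matrix size explicitly; the 1000-by-1000 size and the values are illustrative only.

```matlab
% a 1000-by-1000 matrix with only three non-zero elements (illustrative sizes)
S = sparse([1, 2, 3], [2, 3, 1], [2, 1, 1], 1000, 1000);

n_stored = nnz(S);      % number of stored non-zero elements
n_total  = numel(S);    % number of elements a full matrix would hold

% row indices, column indices, and values of the stored elements
[r, c, v] = find(S);
```

Here only three values and their positions are kept, while the equivalent full matrix would hold a million elements.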

There is an important sentence in the documentation:

Any elements of s that have duplicate values of i and j are added together.

In practice this means that if a coordinate pair occurs multiple times, the corresponding values are added together. This is basically the functionality we want for creating a histogram. Another trick is that although the documentation states that i, j, and s must have the same length, MATLAB also accepts a scalar s and applies it to every coordinate pair. See the example below:

% define points
x = [1, 4, 2, 1, 3, 4, 4];
y = [4, 4, 1, 4, 3, 4, 4];

% create a sparse matrix, then convert it to a full one
M = full(sparse(y, x, 1))

The output is:

M =
   0   1   0   0
   0   0   0   0
   0   0   1   0
   2   0   0   3

As we can see, the point [4, 4] occurs three times in the input list, so the corresponding value in the output matrix is three. Similarly, the point [1, 4] (column 1, row 4) occurs twice, so its value is two.
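As a cross-check, the same counts can be produced with accumarray, a different built-in function that accumulates values by subscript. This is not the method used in this post; the sketch only confirms that both approaches agree on the small example.

```matlab
% the same points as in the example
x = [1, 4, 2, 1, 3, 4, 4];
y = [4, 4, 1, 4, 3, 4, 4];

% histogram via sparse, as in the post
M = full(sparse(y, x, 1));

% the same counts via accumarray: each row of the first argument
% is one [row, column] subscript pair, and 1 is added for each pair
A = accumarray([y(:), x(:)], 1);

% true if both matrices are identical
same = isequal(M, A);
```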

Only one step is left: the row and column indices passed to sparse must be positive integers. Our set contains negative coordinates too, so it must be translated to contain only positive values. Since the smallest possible value is -24 for both the x and y coordinates, we simply add 25 to them.
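One caveat: sparse sizes the result from the largest index it receives, so if no point happens to reach coordinate 24, the histogram will be smaller than 49-by-49. A minimal sketch of one way around this, assuming the five-argument form sparse(i, j, s, m, n) and a made-up point set:

```matlab
% hypothetical small point set inside the region
x = [-24, 0, 0, 10];
y = [  0, 0, 0, -3];

% shift so that -24 maps to index 1 and +24 maps to index 49
j = x + 25;
i = y + 25;

% without an explicit size, the matrix only extends to the largest index
H_auto = full(sparse(i, j, 1));

% with an explicit size, the histogram is always 49-by-49
H = full(sparse(i, j, 1, 49, 49));
```

With a fixed size, the origin always lands at H(25, 25), which makes histograms from different point sets directly comparable.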

The full code example is below:

% create 1000 random points, non-uniform distribution
r = rand(1, 1000) .^ 2 * 50;
a = rand(1, 1000) * pi * 2;

% convert polar coordinates to cartesian coordinates
x = round(sin(a) .* r);
y = round(cos(a) .* r);

% create a logical array containing true for points where:
%  1. x is less than or equal to -25, or
%  2. x is more than or equal to +25, or
%  3. y is less than or equal to -25, or
%  4. y is more than or equal to +25
m = x <= -25 | x >= 25 | y <= -25 | y >= 25;

% remove the outliers identified by true
x(m) = [];
y(m) = [];

% translate points by 25 units to get positive coordinates
j = x + 25;
i = y + 25;

% create two-dimensional histogram using sparse matrix
H = full(sparse(i, j, 1));

Plotted as an image, the output shows most of the points concentrated around the pole, as expected.
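One possible way to draw the histogram is with imagesc. The sketch below is an assumption about how such a plot could be produced, not necessarily how the original figure was made; it also passes an explicit 49-by-49 size to sparse so that the axes always span -24 to 24.

```matlab
% regenerate a point set as in the full example
r = rand(1, 1000) .^ 2 * 50;
a = rand(1, 1000) * pi * 2;
x = round(sin(a) .* r);
y = round(cos(a) .* r);

% remove points on or outside the boundary
m = x <= -25 | x >= 25 | y <= -25 | y >= 25;
x(m) = [];
y(m) = [];

% explicit size so that row and column 25 always correspond to the origin
H = full(sparse(y + 25, x + 25, 1, 49, 49));

% draw the counts as a color-coded image
imagesc(-24:24, -24:24, H);
axis xy;            % put negative y values at the bottom
axis equal tight;   % square bins, no padding around the image
colorbar;
xlabel('x');
ylabel('y');
```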

This is a simple way to create two-dimensional histograms from a point or vector set. The method works only for integer coordinates; a later post will discuss an extension for handling floating-point input values, too.

         
