14 March, 2014

Create two-dimensional histograms in MATLAB #2

In the last post, a simple trick for creating two-dimensional histograms from integer coordinates was explained. As a small supplement, let us see how to create histograms from arbitrary floating-point data. The method works on a single vector: for a two-dimensional histogram we just apply it to each dimension individually.

If you are not familiar with histograms, please read the following Wikipedia article. The method described below is similar to the built-in MATLAB histc function.

First, let A be a vector containing the input values:

$$ A = \{a_0, a_1, \cdots, a_n\} \\ $$

We want to assign each value to one of N bins in the range between rmin and rmax. The range can either be given directly or derived from the input data (for example, rmin equals min(A) and rmax equals max(A)). The given range is divided into N equal sections, so the size of a bin is the following:

$$ d = \frac{r_{max} - r_{min}}{N} \\ $$

To remove the offset from A, subtract rmin from each value. To get the appropriate bin number for a value, simply divide it by d and apply the ceiling function to the result:

$$ b_n = \left\lceil{\frac{a_n - r_{min}}{d}}\right\rceil $$

The advantage of the ceiling function is that it yields 1-based indices, so the result can be used directly for indexing a MATLAB vector.

Values outside the range (outliers) may need to be removed or saturated into the desired range first. Another important case is when a_n equals rmin:

$$ b_n = \left\lceil{\frac{a_n-r_{min}}{d}}\right\rceil = \left\lceil{\frac{0}{d}}\right\rceil = 0 $$

These values are exceptions: they belong to the first bin and must be assigned to it explicitly.
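As a compact summary, the formula and the exception can be folded into a single expression. This is only a restatement of the two steps above, assuming the outliers have already been removed or saturated so that every value lies between rmin and rmax:

$$ b_n = \max\left(1, \left\lceil{\frac{a_n - r_{min}}{d}}\right\rceil\right), \qquad r_{min} \le a_n \le r_{max} $$

With this definition a value equal to rmax gives exactly N, so every index falls between 1 and N.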

Example

See this code example below:

A = (1 : 8) .^ 0.4         % input vector

N = 4;                     % 4 bins
r_min = 1;                 % minimum range
r_max = 3;                 % maximum range
d = (r_max - r_min) / N    % bin size

The script prints the input data and the bin size:

A =
   1.0000   1.3195   1.5518   1.7411   1.9037   2.0477   2.1779   2.2974
d =
   0.5000

There are four bins in the range 1 to 3, so the bin size is 0.5 and we have the following bins:

bin #1    [1.0, 1.5]    A(1), A(2)
bin #2    ]1.5, 2.0]    A(3), A(4), A(5)
bin #3    ]2.0, 2.5]    A(6), A(7), A(8)
bin #4    ]2.5, 3.0]    -

To remove or saturate the outliers, the following code examples can be used:

% removing outliers
A(A < r_min) = [];
A(A > r_max) = [];

% saturating outliers
A(A < r_min) = r_min;
A(A > r_max) = r_max;

To calculate the bin indices for each item, we have only two further steps to do:

% bin indices for each value
b = ceil((A - r_min) / d);

% handle the extreme case of bin #0
b(b == 0) = 1

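Remember: because d is constant, in the step above you can precompute 1 / d and multiply by it instead of dividing, which may make your code faster on large datasets. Be aware that multiplying by a rounded reciprocal can give a slightly different floating-point result for values lying exactly on a bin boundary.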

The output matches the expected bin assignment:

b =
   1   1   2   2   2   3   3   3
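The bin indices are not yet the histogram itself: the indices still have to be counted per bin. One possible way to do this, shown here as a minimal sketch rather than as the method of the original post, is the built-in accumarray function. The variable name counts is introduced only for this illustration.

```matlab
% Bin indices from the example above (1-based, one per input value)
b = [1 1 2 2 2 3 3 3];
N = 4;                               % number of bins

% Add 1 to the counter of the bin each value falls into. The size
% argument [N 1] keeps empty bins (here bin #4) in the result as zeros.
counts = accumarray(b(:), 1, [N 1])';

disp(counts)
```

Passing the size explicitly matters: without it, accumarray would size the result by the largest index that occurs, and the trailing empty bin would be missing.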

To create a two-dimensional histogram, we simply calculate the bin indices for the x and y coordinates individually, then build the histogram from these index pairs; see the previous post for more details. Since the range and the number of bins can be chosen freely for each dimension, the method above gives us a flexible way to create two-dimensional histograms.
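The two-dimensional case can be sketched as follows. The sample points, the variable names and the use of accumarray are assumptions made for this illustration; the previous post may build the histogram from the index pairs in a different way.

```matlab
% Hypothetical sample points (not taken from the original post)
x = [0.1 0.4 0.6 0.9 1.0];
y = [0.0 0.7 0.2 0.8 1.0];

Nx = 2;  Ny = 2;                     % number of bins per dimension
x_min = 0;  x_max = 1;               % range of the x coordinates
y_min = 0;  y_max = 1;               % range of the y coordinates

dx = (x_max - x_min) / Nx;           % bin size along x
dy = (y_max - y_min) / Ny;           % bin size along y

% 1-based bin indices per dimension; max(..., 1) moves values equal
% to the range minimum from bin #0 into bin #1
bx = max(ceil((x - x_min) / dx), 1);
by = max(ceil((y - y_min) / dy), 1);

% Count the index pairs: rows of H are y bins, columns are x bins
H = accumarray([by(:) bx(:)], 1, [Ny Nx]);

disp(H)
```

The sketch assumes that all points already lie inside the given ranges; otherwise remove or saturate the outliers first, as shown earlier.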

         
