27 October, 2013

Vectorization example on MATLAB Cody problem #17

  • Can you show me a simple example of vectorization on a MATLAB Cody problem?
  • Is the vectorized code faster than the code using for loops?

MATLAB Cody is a good starting point for anyone who wants to learn MATLAB. A single task can have multiple solutions, and it is worth following how they evolve and how efficient they are.

The task analyzed here is Cody problem #17:

Given an input vector x, find all elements of x less than 0 or greater than 10 and replace them with NaN.

An example input and output pair:

x = [  5  17 -20  99  3.4  2  8  -6 ]
y = [  5 NaN NaN NaN  3.4  2  8 NaN ]

The straightforward solution is:

function x = clean_data(x)
  for i = 1 : length(x)
    if x(i) < 0 || x(i) > 10
      x(i) = NaN;
    end
  end
end

To measure the run time of this code, we first generate a random input vector with 10000 elements (integers between -5 and 14):

x = floor(rand(1, 10000) * 20 - 5);

The run time can be measured with the tic and toc functions or with a more detailed timing function:

tic
clean_data(x);
toc

The output is:

Elapsed time is 0.09075 seconds.
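
A single tic/toc pair can be noisy, because the first call may include one-time overhead. As a sketch of a more robust measurement, MATLAB's timeit function (available since R2013b) calls a function handle several times and returns the median run time in seconds. The wrapper name time_clean_data is hypothetical, and the code is assumed to be saved as time_clean_data.m; whether timeit is the "more detailed" function meant above is an assumption.

```matlab
function t = time_clean_data()
  % Hypothetical wrapper: times the loop-based clean_data with timeit.
  x = floor(rand(1, 10000) * 20 - 5);   % same test input as in the text
  t = timeit(@() clean_data(x));        % median run time in seconds
  fprintf('Median run time: %g seconds\n', t);
end

function x = clean_data(x)
  % Loop-based solution from the text, repeated so the file is self-contained.
  for i = 1 : length(x)
    if x(i) < 0 || x(i) > 10
      x(i) = NaN;
    end
  end
end
```

The absolute number depends on the machine and the MATLAB release, so it will not match the value printed above.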

Using vectorization

Now, turn to another approach using vectorization. First, have a look at the following demonstration code:

% an example input for demonstration
x = [-1 5 11 8 20 2]
gt10 = x > 10 % logical indices of elements greater than 10
lt0 = x < 0 % logical indices of elements less than 0
% combining logical indices by using OR operator
indices = gt10 | lt0

% demonstration of selecting values using logical indexing
values = x(indices)
% replace values selected by indices by NaN
x(indices) = NaN

The output is:

x =
  -1   5  11   8  20   2
gt10 =
   0   0   1   0   1   0
lt0 =
   1   0   0   0   0   0
indices =
   1   0   1   0   1   0
values =
  -1  11  20
x =
 NaN   5 NaN   8 NaN   2

The most important thing to know is that comparing a vector with a constant yields a logical array with the same dimensions as the input: it contains 1 (true) where the comparison holds and 0 (false) otherwise.

The steps in this piece of code are as follows.

  • The variable gt10 contains true where the value is greater than 10; the variable lt0 identifies the elements less than 0.
  • The indices variable is then created with the OR operator (|): the resulting vector contains true where the value is outside the range from 0 to 10.
  • The values variable demonstrates that a logical array can be used to index an array: only those elements are selected where the logical index is true. Indexing x with indices therefore reads the out-of-range values.
  • In the last line, logical indexing is used to set the elements outside the range from 0 to 10 to NaN.
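
The properties of the logical array described above can be checked directly. The following minimal sketch reuses the demonstration input; the variable names mask, maskClass, sameSize and numSelected are only illustrative.

```matlab
x = [-1 5 11 8 20 2];
mask = x > 10;                            % compare the vector with a constant

maskClass = class(mask);                  % 'logical'
sameSize = isequal(size(mask), size(x));  % true: same dimensions as the input
numSelected = nnz(mask);                  % 2: the elements 11 and 20 are selected

disp(maskClass)
disp(sameSize)
disp(numSelected)
```

Because the mask has the same shape as x, it can be used both to read elements, as in x(mask), and to assign to them, as in x(mask) = NaN.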

After analyzing the steps above, we can now write the following solution:

function x = clean_data(x)
  x(x < 0 | x > 10) = NaN;
end

The code became simpler, and the for loop was eliminated. Measuring the run time again gives:

Elapsed time is 0.000185013 seconds.

The vectorized code is much faster: about 490 times faster in this measurement (0.09075 s versus 0.000185013 s). This approach is worth studying and using in daily work.
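
Before relying on the faster version, it is worth confirming that both versions return the same result. The sketch below does this with isequaln, which treats NaN values as equal (plain isequal would report a mismatch, since NaN is not equal to itself). The names compare_clean_data, clean_data_loop and clean_data_vec are hypothetical; they only keep the two solutions apart in one file, assumed to be saved as compare_clean_data.m.

```matlab
function ok = compare_clean_data()
  % Hypothetical check: do the loop and vectorized solutions agree?
  x = floor(rand(1, 10000) * 20 - 5);
  ok = isequaln(clean_data_loop(x), clean_data_vec(x));  % NaN-aware comparison
end

function x = clean_data_loop(x)
  % Loop-based solution from the text.
  for i = 1 : length(x)
    if x(i) < 0 || x(i) > 10
      x(i) = NaN;
    end
  end
end

function x = clean_data_vec(x)
  % Vectorized solution from the text.
  x(x < 0 | x > 10) = NaN;
end
```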

         
