27 October, 2013

# Vectorization example on MATLAB Cody problem #17

• Can you show me a simple example of vectorization on a MATLAB Cody problem?
• Is vectorized code faster than code that uses for loops?

MATLAB Cody is a good starting point for anyone who wants to learn MATLAB. A single task can have many solutions, and it is worth following how they evolve and how efficient they are.

The task analyzed here is Cody problem #17:

Given an input vector x, find all elements of x less than 0 or greater than 10 and replace them with NaN.

An example input and output pair:

x = [  5  17 -20  99  3.4  2  8  -6 ]
y = [  5 NaN NaN NaN  3.4  2  8 NaN ]
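
When a solution is checked against this pair, one detail matters: NaN never compares equal to itself, so isequal reports a mismatch even for two identical vectors that contain NaN. A minimal sketch using isequaln, which treats NaN values as equal (y_expected and y_copy are illustrative names, not part of the Cody problem):

```matlab
% Expected output from the example pair above
y_expected = [5 NaN NaN NaN 3.4 2 8 NaN];
y_copy     = y_expected;             % an identical vector

eq_plain = isequal(y_expected, y_copy)    % false, because NaN ~= NaN
eq_nan   = isequaln(y_expected, y_copy)   % true, NaN values are treated as equal
```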


The straightforward solution is:

function x = clean_data(x)
    for i = 1 : length(x)
        if x(i) < 0 || x(i) > 10
            x(i) = NaN;
        end
    end
end


To measure the run time of this code, we first generate a random input vector with 10000 elements:

x = floor(rand(1, 10000) * 20 - 5);


The run time can be measured with the tic and toc functions or with a more detailed function:

tic
clean_data(x);
toc


The output is:

Elapsed time is 0.09075 seconds.
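
A single tic/toc reading can be noisy, since it may include warm-up effects and other activity on the machine. The sketch below is one possible way to get a steadier figure: it repeats the measurement and reports the median. The loop solution is inlined so the snippet runs on its own; n_runs, t and t0 are illustrative names, and 20 repetitions is an arbitrary choice.

```matlab
% Sketch: repeat the measurement and report the median run time.
x = floor(rand(1, 10000) * 20 - 5);   % same kind of test input as above
n_runs = 20;                          % assumed number of repetitions
t = zeros(1, n_runs);                 % one timing per run

for k = 1:n_runs
    y = x;                            % fresh copy for every run
    t0 = tic;
    for i = 1:length(y)               % the loop solution, inlined
        if y(i) < 0 || y(i) > 10
            y(i) = NaN;
        end
    end
    t(k) = toc(t0);                   % elapsed seconds for this run
end

fprintf('median of %d runs: %g s\n', n_runs, median(t));
```

MATLAB R2013b and later also provide timeit, which takes a function handle, for example timeit(@() clean_data(x)), and handles the repetition itself.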

## Using vectorization

Now let us turn to another approach that uses vectorization. First, have a look at the following demonstration code:

% an example input for demonstration
x = [-1 5 11 8 20 2]
gt10 = x > 10    % logical indices of elements greater than 10
lt0  = x < 0     % logical indices of elements less than 0
% combining logical indices by using OR operator
indices = gt10 | lt0
% demonstration of selecting values using logical indexing
values  = x(indices)
% replace values selected by indices by NaN
x(indices) = NaN


The output is:

x =
-1   5  11   8  20   2
gt10 =
0   0   1   0   1   0
lt0 =
1   0   0   0   0   0
indices =
1   0   1   0   1   0
values =
-1  11  20
x =
NaN   5 NaN   8 NaN   2


The most important thing to know is that comparing a vector with a constant yields a logical array with the same dimensions as the input: it contains 1 (true) where the comparison holds and 0 (false) elsewhere.
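
This property can be checked directly. A small sketch, using the same demonstration input (mask, mask_class and same_size are illustrative names):

```matlab
x = [-1 5 11 8 20 2];
mask = x > 10;                              % compare the vector with a constant

mask_class = class(mask)                    % 'logical'
same_size  = isequal(size(mask), size(x))   % true, same dimensions as x
as_numbers = double(mask)                   % 0 0 1 0 1 0
```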

The steps in this piece of code are as follows:

• The variable gt10 is true where the value is greater than 10; the variable lt0 identifies the elements less than 0.
• The indices variable is then created with the OR operator (|): the resulting vector is true where the value lies outside the range 0 to 10.
• The values variable demonstrates that logical arrays can be used to index an array: only those elements are selected where the logical index is true. Indexing the x array with indices reads the out-of-range values.
• In the last line, logical indexing is used to set the elements outside the range 0 to 10 to NaN.
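
A logical index can also be converted to ordinary numeric positions, which may help when inspecting what will be replaced. A brief sketch on the demonstration input (positions and n_replaced are illustrative names):

```matlab
x = [-1 5 11 8 20 2];
indices = x < 0 | x > 10;      % logical index of out-of-range elements

positions  = find(indices)     % 1 3 5, the numeric positions of the true entries
n_replaced = nnz(indices)      % 3, the number of elements that will be replaced

x(positions) = NaN;            % same effect as x(indices) = NaN
```

For this task the logical form is sufficient on its own; find is only needed when the positions themselves are of interest.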

After analyzing the steps above, we can now write the following solution:

function x = clean_data(x)
    x(x < 0 | x > 10) = NaN;
end
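
Before comparing speed, it is reasonable to confirm that the two approaches give the same result. The sketch below inlines both versions so that it runs on its own (y_loop, y_vec and same are illustrative names) and compares them with isequaln, which treats NaN values as equal:

```matlab
x = floor(rand(1, 10000) * 20 - 5);   % random test input

% Loop version, inlined
y_loop = x;
for i = 1:length(y_loop)
    if y_loop(i) < 0 || y_loop(i) > 10
        y_loop(i) = NaN;
    end
end

% Vectorized version
y_vec = x;
y_vec(y_vec < 0 | y_vec > 10) = NaN;

same = isequaln(y_loop, y_vec)        % true when both results match
```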

The code became simpler, and the for loop was eliminated. Measuring the run time again gives:

Elapsed time is 0.000185013 seconds.


In this measurement the vectorized code is roughly 490 times faster (0.09075 s versus 0.000185 s). It is worth studying this approach and using it in daily work.