24 October, 2013

Calculate standard deviation: case of sliding window

  • I need to calculate standard deviation for sliding window: is there a way to do it?
  • How can optimize the straightforward solution?
  • How can I vectorize the calculation of standard deviation for sliding window?

In the last post a memory-friendly calculation method of the standard deviation was shown. Let us continue the discussion of the standard deviation with another use case. First have a look at the definition again:

$$
\sigma = \sqrt{\dfrac{1}{N-1} \sum\limits_{i=1}^N\left(x_i - \bar{x}\right)^2}
$$

We have an x input signal having N elements. Then we summarize the squared difference of each element from the average value x, and divide it by N - 1. In the final step, the square root of this result is calculated.

Now suppose that we want to calculate the standard deviation using sliding window having window size w. We put the window on the first element and calculate the value for the elements in the window. Then it is shifted to the next item, the calculation is done again, and so on until we reach the end of the signal. A code sample doing this process:

h = 1;                      % for half window size of 3
x = [4 8 1 1 1 7 9 3];      % input signal
N = length(x);              % length of the signal

o = ones(1, N);             % array for output
for i = 1 : N
  % calculate standard deviation using the built-in std command
  % for the current window
  o(i) = std(x(max(1, i - h) : min(N, i + h)));
end

The output is:

o =
   2.82843  3.51188  4.04145  0.00000  3.46410  4.16333  3.05505  4.24264

Altough the method works, it is a bit confusing.

We feel, that a lot of calculations had to be redone in each step – for example calculating the mean and the differences from the mean. If we had a signal of thousands of samples, this approach would be extremely slow, even if we use the built in std function, which is the MATLAB command for calculating standard deviation for an input array.

Try another point of view. In the last post it was proven, that:

$$
\sigma^2 = \dfrac{1}{N-1} \left(q - \dfrac{s^2}{N}\right)
$$

Where:

$$
\begin{aligned}
q &= \sum\limits_{i=1}^N{x_i}^2 \\
s &= \sum\limits_{i=1}^N{x_i}
\end{aligned}
$$

It seems much better, worth to analyze. What we need to use this formula is only the summarized samples and their summarized squares in each window. In a previous post it was shown, that summation for a sliding window can be done easily by convolution:

>> x = [1 7 1 4 4 7 1];
>> k = [1 1 1];
>> y = conv(x, k, 'same')
y =
   8   9  12   9  15  12   8

In the example above each element of the output is the windowed summary of the input. So, in this new approach we have to do the following:

  • Calculate the square of each element
  • Do the windowed summarizing for the original and the squared values by using convolution
  • Apply the formula
  • Take the square root of the output

The source code is:

w = 3;                      % sliding window size
x = [4 8 1 1 1 7 9 3];      % input signal
N = length(x);              % length of the signal

% element count in each window
n = conv(ones(1, N), ones(1, w), 'same');

% calculate s vector
s = conv(x, ones(1, w), 'same');

% calculate q vector
q = x .^ 2;
q = conv(q, ones(1, w), 'same');

% calculate output values
o = (q - s .^ 2 ./ n) ./ (n - 1);

% have to take the square root since output o is the 
% square of the standard deviation currently
o = o .^ 0.5

The output is:

o =
   2.82843  3.51188  4.04145  0.00000  3.46410  4.16333  3.05505  4.24264

One more important thing is the calculation of the element count at each position of the window: this is needed at the sides, where the window is sliding off.

This approach of calculating the standard deviation has several advantages:

  • Vectorized code instead of using for loops
  • Fast, since we are using convolution
  • Needs less computation than the straightforward solution
  • Can be extended to 2 dimensions easily
         

New comment

comments powered by Disqus