25 November, 2013

# Pattern search example: monotonically increasing or decreasing subset

• How can I look for a given simple pattern in a dataset?
• How can I look for monotonically increasing, decreasing patterns in a matrix or a vector?
• Is pattern search a task convolution can be used for?

Sometimes it is very useful to apply a simple pattern-search on a dataset in MATLAB. The currently explained example is to search for subsets, which are monotonically decreasing and are at least four-element long.

Suppose, that we have the following input signal:

a = [-1  2  8  9  9  7  4  5  3  2  0 -1  4]


The first step is to calculate the difference between neighboring elements, so we will know, that the function is decreasing or increasing in a given point. To have this information, apply the diff function on the input:

>> a
>> diff(a)a =
-1   2   8   9   9   7   4   5   3   2   0  -1   4
d =
3   6   1   0  -2  -3   1  -2  -1  -2  -1   5

Now we know the relation between neighboring elements: for example the difference is three between the first and the second point, and six between the second and the third element.

Another simple, but important thing to note, that the magnitude of the difference is not important, but its sign indicates the trend of increasing, decreasing or being constant. We can simplify our data by applying the signum function on the differences to get their sign. The definition of the signum function is the following:

$$sign(x) = \begin{cases} -1: & x < 0 \\ \phantom{-}0: & x = 0 \\ \phantom{-}1: & x > 0 \end{cases}$$

As it can be seen, the signum function drops the magnitude indicating only the sign. In MATLAB we can apply it by calling the sign function, so our code turns to be:

a = [-1  2  8  9  9  7  4  5  3  2  0 -1  4]
d = diff(a)
s = sign(d)


The result is the following:

a =
-1   2   8   9   9   7   4   5   3   2   0  -1   4
d =
3   6   1   0  -2  -3   1  -2  -1  -2  -1   5
s =
1   1   1   0  -1  -1   1  -1  -1  -1  -1   1

One of the appropriate subsets and the data derived from it is marked in bold. It can be seen, that a four-element-long monotonically decreasing path suits a three-element-long sequence of minus ones: if we find such a sequence, we know, that an appropriate subset belongs to it.

The only task left is to find this sequence of minus ones. Of course it could be done by using a for cycle, but convolution is a much more convenient way to go. During convolution the kernel window is slid along the input, all coherent elements are multiplied and then summarized. If you are not familiar by this operator, please read this tutorial on convolution.

What kernel shall we use? A three-element-long window containing ones is appropriate, since it gives back the summary of three neighboring elements. If in the result we have an output value of -3, it means, that we have found a good sequence. This works, because the output of the sign function contains only values of -1, 0 and 1: the summary of a three-element-long window can only be -3 if all of its elements are -1.

Our program with the modifications:

a = [-1  2  8  9  9  7  4  5  3  2  0 -1  4]
s = sign(diff(a))
c = conv2(s, [1 1 1], 'valid')


The output is:

a =
-1   2   8   9   9   7   4   5   3   2   0  -1   4
d =
1   1   1   0  -1  -1   1  -1  -1  -1  -1   1
c =
3   2   0  -2  -1  -1  -1  -3  -3  -1

The convolution is done only for the valid elements: it has the advantage, that the position of the -3 elements equal to the position of the generator window and the appropriate subsets. The result of convolution and the idea behind this method can be clearly seen from this output.

The only thing left is to extract the positions of the -3 values by using the find function. All the operations are contracted to a single line in this code example:

a = [-1  2  8  9  9  7  4  5  3  2  0 -1  4]
p = find(conv2(sign(diff(a)), [1 1 1], 'valid') == -3)


The output shows, that there are two monotonically decreasing four-element-long subsets found at positions eight and nine:

a =
-1   2   8   9   9   7   4   5   3   2   0  -1   4
p =
8   9

This approach is fast because of using convolution and can be applied to multiple lines at the same time.