Often it is necessary to find specific locations in a stream of data. For example, you might want to find out when a signal passes a given threshold (Fig. [fig:Events]). Or when you analyze hand movements, you might want to find at which point the hand starts to move, and at which the hand movement ends. These locations are often referred to as features, or events if the data is a time series.

“Events” can be the times when a signal passes a given “Threshold”.

Find simple events in 1-D data¶

In Matlab there are two methods with which you can access single events:

• logical indexing
• the find function.

We will illustrate these methods with an example. Let us take the values

xx = 3:8;
yy = xx/10;


To check whether xx is larger than 5, you can simply use a logical comparison:

isLarge = xx > 5;


The result of the comparison, which is now stored in isLarge, is the boolean vector

[0 0 0 1 1 1]


To find the corresponding data in yy, the easiest (and often most efficient) method is to use logical indexing, which would be

yy(isLarge)


or equivalently

yy(xx > 5)


If it is absolutely necessary to find the indices of xx that are larger than 5, then the find function can be used:

largeIndices = find(isLarge);


or equivalently

largeIndices = find(xx > 5);


Mostly, though, this is unnecessary, and since find is currently far slower than logical indexing, find should only be used as a last resort.

Example 1: Find large signal values in data¶

% Create dummy data
time = 0 : 0.01 : 20;
data = sin(time);

% Set a threshold
threshold = 0.7;

% Find the (binary) indices of all data above that threshold
isLarge = data > threshold;

% At the end, plot the data
subplot(3,1,1)
plot(time, data)
ylabel('All data');

subplot(3,1,2)
plot(time(isLarge), data(isLarge), '.-')
ylabel('Large data')
xlabel('Time [s]')

subplot(3,1,3)
plot(data(is_large))
ylabel('Large data only')
xlabel('Points only')

This code produces the graphs in the figure below:


Selection of large signal values

Now, if we want to eliminate all events that occur in the first 10 seconds, we could add the following code:

 % First find the late section ...
timeThreshold = 10;
isLate = time > timeThreshold;

% ... and then combine it with the other criterion with a bit-wise AND
isLateAndLarge = isLarge & isLate;


Example 2: Find the start and end of an eye movement¶

Real measurement data will always contain noise (and usually a lot more than you want). To remove noise-effects from the analysis, it is therefore often helpful to go from continuous (or almost continuous) data to yes/no representations.

For example, for finding the onset of a signal such as in the figure below, we could break down the analysis into the following steps:

1. Find a threshold (i.e. a signal value that has to be reached in order to be part of our event).

2. For each data-point, calculate if the signal is above the threshold (“yes/no”).

Comment: At this stage, you have gone from a noisy, real-valued signal, to a discrete, binary signal, which can only be 0 or 1!

3. Now the start of our event can be found easily: it is the point where $$diff(signal)$$ is $$1$$. Similarly, the end of our event can be found by checking where $$diff(signal)$$ is $$-1$$.

Real data will probably require a few more steps. But the essence of event-finding is contained elegantly in this example of signal-analysis.

% Get eye positions, sampled with 100 Hz
load HorPos;    % This file has to exist in your current directory!
rate = 100;

% Select an interesting domain
myDomain = 9000 : 13500;

% Plot 1: raw data
subplot(3,2,1)
plot(HorPos(myDomain))
ylabel('Eye Position [deg]');
axis tight

% Plot 2: absolute velocity
orderPoly = 3;
winSize = 71;
deriv = 1;
eyeVelocity = savgol(HorPos, orderPoly, winSize, deriv, rate);
eyeAbsoluteVelocity = abs(eyeVelocity);
subplot(3,2,3)
plot(eyeAbsoluteVelocity(myDomain))
ylabel('Absolute Eye Velocity [deg]')
axis tight

% Set a default threshold, in case the threshold is not determined
% interactively
threshold = 6.3;

%To find the threshold interactively, use the following lines
% set(gcf, 'Name', 'Select the threshold:')
% selectedPoint = ginput(1);
% threshold = selectedPoint(2);    % I only want the y-value
% set(gcf, 'Name', '');
line(xlim, [threshold threshold], 'Color', 'r')

% Plot3: show where the absolute velocity exceeds the threshold
isFast = eyeAbsoluteVelocity > threshold;
subplot(3,2,5)
plot(isFast(myDomain), '-x')
axis tight
ylabel('Above threshold')

% Plot4: as Plot3, but zoomed in
closeDomain = 9900 : 10600;
subplot(3,2,2)
plot(isFast(closeDomain), '-x')
axis tight
ylabel('Above threshold');

% Plot5: Find the start and end of each movement
subplot(3,2,4)
startStop = diff(isFast);
plot(startStop(closeDomain))
ylabel('Start / Stop')

% Find the start and end times for all movements (in sec)
movementStartTimes = find(startStop ==  1)/rate
movementEndTimes   = find(startStop == -1)/rate


The code produces the graphs in the figure below:

Detection of the start and end of the eye movements

Interpolation¶

When finding the point at which a curve crosses a given threshold, we may want more accuracy than we have in the data. To do so we can find datapoints between our recorded samples through interpolation.

What we discuss here is a simplified approach of interpolation, where we interpolate between fixed data points. In digital signal processing, interpolation is typically approached as a combination of low-pass filtering, followed by decimation. (Decimation by a factor of m is if we take every $$n^{th}$$ data point from our signal.) This approach allows the specification of exact frequency responses. You should look these topics up if you e.g. downsample an audio signal from a CD, where you want good control over the way your data manipulation affects the frequency content of the original signal.

Linear interpolation¶

The simplest form is linear interpolation, where we obtain points between samples through linear interpolation between the adjacent samples. While this is a computationally quick approach, it has two disadvantages:

• It is not very accurate.
• It is discontinuous at the location of the samples.

Cubic Spline Interpolation¶

To overcome these problems, cubic spline interpolations are often used. Thereby the datapoints between samples are derived from cubic polynomials. The expression spline indicats that the polynomial coefficients satisfy two conditions:

• They are continuous at the location of the samples.
• The second derivative at the end of each polynomial is zero.

Finding simple features in 2-D data¶

Many methods that are used to find features in one-dimensional signals can also be applied in a modified form to two-dimensional signals. Many examples can be found in the field of image processing.

Find the position of bright pixels in a grayscale image¶

For example, to find the bright pixels in a $$uint8$$ grayscale image, all we need to do is apply a threshold:

rawImage = imread('pout.tif');
brightThreshold = 110;
isBright = rawImage > brightThreshold;
imshow(isBright);


Binary image, showing the areas above threshold in white, the rest in black.

Alternatively, we could use the function im2bw in the Image Processing Toolbox

isBright = im2bw(rawImage, 110/255);


To get the indices of the bright pixels, we can use find again, but now with two output variables:

[rowBright, colBright] = find(isBright);


Matlab has a very powerful Image Processing Toolbox, which is { in my opinion { together with the Signal Processing Toolbox, one of the most powerful and useful tools in Matlab. If you have to work with image or video analysis, you should de nitely check it out!

Cross-correlation¶

Sometimes, the features that you need to find are more complex. For example, in an electrocardiogram, it might be necessary to locate only the peaks of the P waves because you are specifically interested in the contraction of the cardiac atria. However, such a task may be complicated in a real signal because the P wave may be very small or even non-existent.

Schematic of the electrocardiogram, taken from Wikipedia.

A real electrocardiogram, taken from Wikipedia.

There are many ways to tackle this problem. One way would be to find the peak of the R or S wave and then search backwards for the peak of the P wave. This method is fine as long as the signal is not too noisy.

However, if there is more noise in the signal, it may be useful to create a template of the ECG signal and use cross-correlation to align the template with the signal.

Comparing signals using cross-correlation¶

Let us consider how we might assess the similarity of two signals. If the reference signal is R(t) and the compared signal is C(t), we want some similarity function $$Q(R(t), C(t))$$ such that $$S$$ has a maximum when $$C = R$$, and decreases as the difference between $$C$$ and $$R$$ increases.

It can be shown that the function $$Q(t) = C(t) \cdot R(t)$$ satisfies both of these properties. Thus, all we need to do to compare two signals is to multiply the two functions together. If we want to find out how much the reference signal needs to be shifted to match the compared signal, we calculate the similarity of two signals for different relative shifts and choose the shift where the signals have the maximum amount of similarity.

For example, let us take the two signals

$R(t) = (0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0), and \ C(t) = (0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)$

shown in (B). If we calculate the cross-correlation function, we get (C)

Plots of reference signal (top) and compared signal (bottom). C) Cross-correlation function plotted against shift m.

We can see that the maximum value is reached when the compared signal is shifted by four time steps, which is where the two signals align.

So the cross-correlation gives us two pieces of information:

• How similar signal and reference are (through the maximum of the cross-correlation).
• Where the similarity occurs (through the location of the maximum).

In general, if $$x$$ is the compared signal and $$y$$ is the reference signal, the cross-correlation function can be obtained by

$\begin{split}\label{eq:crosscorr} {Q_{xy}}(m) = \left\{ {\begin{array}{{20}{c}} {\sum\limits_{n = 0}^{N - m - 1} {{x_{n + m}}y_n^*} {\mkern 1mu} {\mkern 1mu} {\mkern 1mu} \,\,0 \le m < N - 1}\\ {Q_{yx}^*( - m)\,\,\,\,\,\,1 - N < m < 0} \end{array}} \right.\end{split}$

where x and y are length N vectors $$(N>1)$$. If x and y are not the same length, the shorter vector is zero-padded to the length of the longer vector.

Note that the complex conjugate of $$y$$ must be taken if $$y$$ is a complex function. In practice, the cross-correlation function is not calculated directly, but rather indirectly, using the Fourier transform. If you are interested, you can take a look at the implementation of xcorr in Matlab. The concept of cross-correlation can be generalised to higher dimensions, for example to two dimensions to find a known object in an image.

To visualize cross correlations, check out the Matlab-function Corrviz.m on Moodle.

Auto-correlation¶

If the two signals being compared are the same (i.e., $$x = y$$), the result is called the auto-correlation function. Naturally, the auto-correlation function is not used to find events. Rather it can be used to detect periodicity of a noisy signal.

Examples¶

Finding a pattern in a signal¶

% Create the Reference Pattern
myPattern = zeros(1,5);
myPattern(2:4)=1;

% Create the Signal
x = linspace(0,2*pi,18);
y = sin(x);
mySignal = [myPattern, zeros(1,3), y];

% Zero-pad the Reference Pattern to the same length as the Signal
myPattern = [zeros(1,length(mySignal)-length(myPattern)) myPattern];

% Plot the signals and the cross-correlation
subplot(311), plot(mySignal, '*-');
title('mySignal');
line(xlim, [0 0], 'LineStyle', '--', 'Color', [0.3 0.3 0.3]);

subplot(312), plot(myPattern, '*-');
title('myPattern');
% line(xlim, [0 0], 'LineStyle', '--, 'Color', [0.3 0.3 0.3]');

subplot(313), plot(xcorr(mySignal, myPattern), '*-')
title('Cross-Correlation');
axis tight;
line(xlim, [0 0], 'LineStyle', '--', 'Color', [0.3 0.3 0.3]);
shg


Resulting Signals

Auto-correlation functions for simple signals¶

Examples of the auto-correlation function of some simple functions.

Features of Cross-correlation functions¶

Length of cross correlation function¶

If the signal has a length of $$n$$ points, and the pattern a length of $$m$$ points then the cross-correlation has a length of $$n+m-1$$ points. For example, the auto-correlation function of a signal with 10 points has a length of 19 points.

Maximum of cross correlation function¶

If the signal is multiplied by a factor of $$a$$, and the pattern multiplied by a factor or $$b$$, then the maximum of the cross-correlation function increases by a factor $$a*b$$. For example, if a signal increases by a factor of $$2$$, the maximum of the auto-correlation function increases by a factor of $$4$$.

Normalization of auto-correlation function¶

Sometimes we want to compare the shape of signals, regardless of their amplitude. For example, myo-electric prostheses use EMG-signals for their control. Now we would like the prosthesis to behave in a repeatable manner, independently of the current quality of the electrode contacts.

In order to evaluate the shape of a signal, regardless of its overall amplitude, we have to normalize the signal.

Thereby the normalization has to take three aspects of the signal into consideration:

Offset
To normalize the offset, we can subtract the mean of the signal, or the smallest value of the signal. (The latter option ensures that our resulting signal is always positive.)
Duration
To ensure that two signals have the same length, we can interpolate them to a fixed number of data points.
Amplitude

One way to normalize the amplitude is to scale the signal such that the smallest value is $$0$$ (see above), and the largest value is $$1$$. However, a better way is to normalize it such that the maximum value of the auto-correlation function = 1. We can achieve this with

$si{g_{normalized}} = \frac{{si{g_{raw}}}}{{\sqrt {\max (xcorr(si{g_{raw}}))}}}$

Cross-correlation and convolution¶

Cross-correlation and convolution are actually very similar: a convolution of $$f(t)$$ with $$g(t)$$ is equivalent to a cross-correlation of $$f(t)$$ with $$g(-t)$$ .

Comparison of Convolution, Cross-correlation and Autocorrelation.

Exercises¶

The code below generates a signal, containing i) steps with a length of 50, and ii) sinusoids with a length of 50. Use a cross correlation, and templates for the step and the sine-wave, and find the location where each of these signals occurs.

% Generate a signal, containing steps and sine-waves

% Base signal
rate = 100;
dt = 1/rate;
noiseAmp = 0.1;
signalLength = 50;
t = 0:dt:signalLength;

% Step and Sine
sig = randn(1, length(t))*noiseAmp;
step = ones(1,50);
sineT = linspace(0,2*pi, 50);
sine = sin(sineT);

stepOnsets = [500, 1500, 4400];
sineOnsets = [1000, 2500, 4100];

for ii = 1:3
onset = stepOnsets(ii);
sig(onset:onset+49) = sig(onset:onset+49) + step;
onset = sineOnsets(ii);
sig(onset:onset+49) = sig(onset:onset+49) + sine;
end