Often it is necessary to find specific locations in a stream of data.
For example, you might want to find out when a signal passes a given
threshold (Fig. [fig:Events]). Or when you analyze hand movements, you
might want to find at which point the hand starts to move, and at which
the hand movement ends. These locations are often referred to as
*features*, or *events* if the data is a time series.

*“Events” can be the times when a signal passes a given “Threshold”.*

In Matlab there are two methods with which you can access single events:

- logical indexing
- the
*find*function.

We will illustrate these methods with an example. Let us take the values

```
xx = 3:8;
yy = xx/10;
```

To check whether xx is larger than 5, you can simply use a logical comparison:

```
isLarge = xx > 5;
```

The result of the comparison, which is now stored in *isLarge*, is the
boolean vector

```
[0 0 0 1 1 1]
```

To find the corresponding data in yy, the easiest (and often most efficient) method is to use logical indexing, which would be

```
yy(isLarge)
```

or equivalently

```
yy(xx > 5)
```

If it is absolutely necessary to find the indices of xx that are larger than 5, then the find function can be used:

```
largeIndices = find(isLarge);
```

or equivalently

```
largeIndices = find(xx > 5);
```

Mostly, though, this is unnecessary, and since *find* is currently far
slower than logical indexing, find should only be used as a last resort.

```
% Create dummy data
time = 0 : 0.01 : 20;
data = sin(time);
% Set a threshold
threshold = 0.7;
% Find the (binary) indices of all data above that threshold
isLarge = data > threshold;
% At the end, plot the data
subplot(3,1,1)
plot(time, data)
ylabel('All data');
subplot(3,1,2)
plot(time(isLarge), data(isLarge), '.-')
ylabel('Large data')
xlabel('Time [s]')
subplot(3,1,3)
plot(data(is_large))
ylabel('Large data only')
xlabel('Points only')
This code produces the graphs in the figure below:
```

*Selection of large signal values*

Now, if we want to eliminate all events that occur in the first 10 seconds, we could add the following code:

```
% First find the late section ...
timeThreshold = 10;
isLate = time > timeThreshold;
% ... and then combine it with the other criterion with a bit-wise AND
isLateAndLarge = isLarge & isLate;
```

Real measurement data will always contain noise (and usually a lot more than you want). To remove noise-effects from the analysis, it is therefore often helpful to go from continuous (or almost continuous) data to yes/no representations.

For example, for finding the onset of a signal such as in the figure below, we could break down the analysis into the following steps:

Find a threshold (i.e. a signal value that has to be reached in order to be part of our event).

For each data-point, calculate if the signal is above the threshold (“yes/no”).

**Comment:**At this stage, you have gone from a noisy, real-valued signal, to a discrete, binary signal, which can only be 0 or 1!Now the start of our event can be found easily: it is the point where \(diff(signal)\) is \(1\). Similarly, the end of our event can be found by checking where \(diff(signal)\) is \(-1\).

Real data will probably require a few more steps. But the essence of event-finding is contained elegantly in this example of signal-analysis.

```
% Get eye positions, sampled with 100 Hz
load HorPos; % This file has to exist in your current directory!
rate = 100;
% Select an interesting domain
myDomain = 9000 : 13500;
% Plot 1: raw data
subplot(3,2,1)
plot(HorPos(myDomain))
ylabel('Eye Position [deg]');
axis tight
% Plot 2: absolute velocity
orderPoly = 3;
winSize = 71;
deriv = 1;
eyeVelocity = savgol(HorPos, orderPoly, winSize, deriv, rate);
eyeAbsoluteVelocity = abs(eyeVelocity);
subplot(3,2,3)
plot(eyeAbsoluteVelocity(myDomain))
ylabel('Absolute Eye Velocity [deg]')
axis tight
% Set a default threshold, in case the threshold is not determined
% interactively
threshold = 6.3;
%To find the threshold interactively, use the following lines
% set(gcf, 'Name', 'Select the threshold:')
% selectedPoint = ginput(1);
% threshold = selectedPoint(2); % I only want the y-value
% set(gcf, 'Name', '');
line(xlim, [threshold threshold], 'Color', 'r')
% Plot3: show where the absolute velocity exceeds the threshold
isFast = eyeAbsoluteVelocity > threshold;
subplot(3,2,5)
plot(isFast(myDomain), '-x')
axis tight
ylabel('Above threshold')
% Plot4: as Plot3, but zoomed in
closeDomain = 9900 : 10600;
subplot(3,2,2)
plot(isFast(closeDomain), '-x')
axis tight
ylabel('Above threshold');
% Plot5: Find the start and end of each movement
subplot(3,2,4)
startStop = diff(isFast);
plot(startStop(closeDomain))
ylabel('Start / Stop')
% Find the start and end times for all movements (in sec)
movementStartTimes = find(startStop == 1)/rate
movementEndTimes = find(startStop == -1)/rate
```

The code produces the graphs in the figure below:

*Detection of the start and end of the eye movements*

When finding the point at which a curve crosses a given threshold, we may want more accuracy than we have in the data. To do so we can find datapoints between our recorded samples through *interpolation*.

What we discuss here is a simplified approach of interpolation, where we interpolate between fixed data points.
In digital signal processing, interpolation is typically approached as a combination of low-pass filtering,
followed by *decimation*. (Decimation by a factor of m is if we take every \(n^{th}\) data
point from our signal.) This approach allows the specification of exact frequency responses. You should look
these topics up if you e.g. downsample an audio signal from a CD, where you want good control over the way your
data manipulation affects the frequency content of the original signal.

The simplest form is *linear interpolation*, where we obtain points between samples through linear interpolation between the adjacent samples. While this is a computationally quick approach, it has two disadvantages:

- It is not very accurate.
- It is discontinuous at the location of the samples.

To overcome these problems, *cubic spline interpolations* are often used. Thereby the datapoints between samples are derived from cubic polynomials. The expression *spline* indicats that the polynomial coefficients satisfy two conditions:

- They are continuous at the location of the samples.
- The second derivative at the end of each polynomial is zero.

Many methods that are used to find features in one-dimensional signals can also be applied in a modified form to two-dimensional signals. Many examples can be found in the field of image processing.

For example, to find the bright pixels in a \(uint8\) grayscale image, all we need to do is apply a threshold:

```
rawImage = imread('pout.tif');
brightThreshold = 110;
isBright = rawImage > brightThreshold;
imshow(isBright);
```

*Binary image, showing the areas above threshold in white, the rest in
black.*

Alternatively, we could use the function *im2bw* in the Image Processing
Toolbox

```
isBright = im2bw(rawImage, 110/255);
```

To get the indices of the bright pixels, we can use *find* again, but
now with two output variables:

```
[rowBright, colBright] = find(isBright);
```

Matlab has a very powerful Image Processing Toolbox, which is { in my opinion { together with the Signal Processing Toolbox, one of the most powerful and useful tools in Matlab. If you have to work with image or video analysis, you should de nitely check it out!

Sometimes, the features that you need to find are more complex. For example, in an electrocardiogram, it might be necessary to locate only the peaks of the P waves because you are specifically interested in the contraction of the cardiac atria. However, such a task may be complicated in a real signal because the P wave may be very small or even non-existent.

*Schematic of the electrocardiogram, taken from Wikipedia.*

*A real electrocardiogram, taken from Wikipedia.*

There are many ways to tackle this problem. One way would be to find the peak of the R or S wave and then search backwards for the peak of the P wave. This method is fine as long as the signal is not too noisy.

However, if there is more noise in the signal, it may be useful to create a template of the ECG signal and use cross-correlation to align the template with the signal.

Let us consider how we might assess the similarity of two signals. If the reference signal is R(t) and the compared signal is C(t), we want some similarity function \(Q(R(t), C(t))\) such that \(S\) has a maximum when \(C = R\), and decreases as the difference between \(C\) and \(R\) increases.

It can be shown that the function \(Q(t) = C(t) \cdot R(t)\) satisfies both of these properties. Thus, all we need to do to compare two signals is to multiply the two functions together. If we want to find out how much the reference signal needs to be shifted to match the compared signal, we calculate the similarity of two signals for different relative shifts and choose the shift where the signals have the maximum amount of similarity.

For example, let us take the two signals

\[R(t) = (0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0), and \
C(t) = (0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)\]

shown in (B). If we calculate the cross-correlation function, we get (C)

*Plots of reference signal (top) and compared signal (bottom). C) Cross-correlation function plotted against shift m.*

We can see that the maximum value is reached when the compared signal is shifted by four time steps, which is where the two signals align.

So the cross-correlation gives us two pieces of information:

- How similar signal and reference are (through the maximum of the cross-correlation).
- Where the similarity occurs (through the location of the maximum).

In general, if \(x\) is the compared signal and \(y\) is the reference signal, the cross-correlation function can be obtained by

\[\begin{split}\label{eq:crosscorr}
{Q_{xy}}(m) = \left\{ {\begin{array}{{20}{c}}
{\sum\limits_{n = 0}^{N - m - 1} {{x_{n + m}}y_n^*} {\mkern 1mu} {\mkern 1mu} {\mkern 1mu} \,\,0 \le m < N - 1}\\
{Q_{yx}^*( - m)\,\,\,\,\,\,1 - N < m < 0}
\end{array}} \right.\end{split}\]

where *x* and *y* are length *N* vectors \((N>1)\). If *x*
and *y* are not the same length, the shorter vector is zero-padded to
the length of the longer vector.

Note that the complex conjugate of \(y\) must be taken if \(y\)
is a complex function. In practice, the cross-correlation function is
not calculated directly, but rather indirectly, using the Fourier
transform. If you are interested, you can take a look at the
implementation of *xcorr* in Matlab. The concept of cross-correlation
can be generalised to higher dimensions, for example to two dimensions
to find a known object in an image.

To visualize cross correlations, check out the Matlab-function
*Corrviz.m* on Moodle.

If the two signals being compared are the same (i.e., \(x = y\)),
the result is called the *auto-correlation function*. Naturally, the
auto-correlation function is not used to find events. Rather it can be
used to detect periodicity of a noisy signal.

```
% Create the Reference Pattern
myPattern = zeros(1,5);
myPattern(2:4)=1;
% Create the Signal
x = linspace(0,2*pi,18);
y = sin(x);
mySignal = [myPattern, zeros(1,3), y];
% Zero-pad the Reference Pattern to the same length as the Signal
myPattern = [zeros(1,length(mySignal)-length(myPattern)) myPattern];
% Plot the signals and the cross-correlation
subplot(311), plot(mySignal, '*-');
title('mySignal');
line(xlim, [0 0], 'LineStyle', '--', 'Color', [0.3 0.3 0.3]);
subplot(312), plot(myPattern, '*-');
title('myPattern');
% line(xlim, [0 0], 'LineStyle', '--, 'Color', [0.3 0.3 0.3]');
subplot(313), plot(xcorr(mySignal, myPattern), '*-')
title('Cross-Correlation');
axis tight;
line(xlim, [0 0], 'LineStyle', '--', 'Color', [0.3 0.3 0.3]);
shg
```

*Resulting Signals*

*Examples of the auto-correlation function of some simple functions.*

If the signal has a length of \(n\) points, and the pattern a length of \(m\) points then the cross-correlation has a length of \(n+m-1\) points. For example, the auto-correlation function of a signal with 10 points has a length of 19 points.

If the signal is multiplied by a factor of \(a\), and the pattern multiplied by a factor or \(b\), then the maximum of the cross-correlation function increases by a factor \(a*b\). For example, if a signal increases by a factor of \(2\), the maximum of the auto-correlation function increases by a factor of \(4\).

Sometimes we want to compare the shape of signals, regardless of their amplitude. For example, myo-electric prostheses use EMG-signals for their control. Now we would like the prosthesis to behave in a repeatable manner, independently of the current quality of the electrode contacts.

In order to evaluate the shape of a signal, regardless of its overall
amplitude, we have to *normalize* the signal.

Thereby the normalization has to take three aspects of the signal into consideration:

- Offset
- To normalize the
*offset*, we can subtract the mean of the signal, or the smallest value of the signal. (The latter option ensures that our resulting signal is always positive.) - Duration
- To ensure that two signals have the same length, we can
*interpolate*them to a fixed number of data points. - Amplitude
One way to normalize the amplitude is to scale the signal such that the smallest value is \(0\) (see above), and the largest value is \(1\). However, a better way is to normalize it such that the

*maximum value of the auto-correlation function = 1*. We can achieve this with\[si{g_{normalized}} = \frac{{si{g_{raw}}}}{{\sqrt {\max (xcorr(si{g_{raw}}))}}}\]

Cross-correlation and convolution are actually very similar: a convolution of \(f(t)\) with \(g(t)\) is equivalent to a cross-correlation of \(f(t)\) with \(g(-t)\) .

*Comparison of Convolution, Cross-correlation and Autocorrelation.*

The code below generates a signal, containing i) steps with a length of 50, and ii) sinusoids with a length of 50. Use a cross correlation, and templates for the step and the sine-wave, and find the location where each of these signals occurs.

```
% Generate a signal, containing steps and sine-waves
% Base signal
rate = 100;
dt = 1/rate;
noiseAmp = 0.1;
signalLength = 50;
t = 0:dt:signalLength;
% Step and Sine
sig = randn(1, length(t))*noiseAmp;
step = ones(1,50);
sineT = linspace(0,2*pi, 50);
sine = sin(sineT);
stepOnsets = [500, 1500, 4400];
sineOnsets = [1000, 2500, 4100];
for ii = 1:3
onset = stepOnsets(ii);
sig(onset:onset+49) = sig(onset:onset+49) + step;
onset = sineOnsets(ii);
sig(onset:onset+49) = sig(onset:onset+49) + sine;
end
```