In the analysis of data we want to extract those signal components that
are interesting to us, and eliminate artifacts that may distort our
data. For example, in the figure below (top) we have a 1-dimensional
data signal. The original data contain not only the sine wave we are
interested in, but also a low-frequency drift and high-frequency noise.
Ideally, our *Filter* should eliminate all the undesired components. In
other cases, we may want to use a *Filter* to extract the outlines
from 2-dimensional images (see figure below, bottom).

*Filters applied to 1-dimensional (top) and two-dimensional (bottom) signals.*

This chapter describes how such 1-dimensional and 2-dimensional filters work.

To analyse biomedical data, you have to modify them. If the incoming data are called x, and the resulting/outgoing data y, this can be indicated schematically by the diagram in the figure below.

*Transfer Function*

The box modifying the signal, here labelled *G*, is often called the
*transfer function*.

A simple case is the amplification of a signal

\[y = g \cdot x\]

*g* is the *gain* of this transfer function, and a simple amplification
is commonly indicated as shown in the figure below.

*Amplification*

For discrete signals, the output depends only on the instantaneous input:

\[y(n) = g \cdot x(n)\]

Since the gain of many systems covers many orders of magnitude, it is often expressed on a logarithmic scale:

\[\mathrm{attenuation} = 20 \cdot \log_{10} \left( \frac{a_{in}}{a_{out}} \right) \mathrm{dB} .\]

For example, an amplitude reduction by a factor of 2 corresponds to an
attenuation of *6 dB*, and a reduction by a factor of 10 to *20 dB*.
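As a quick check, these numbers follow directly from the formula:

```
% Attenuation in dB for amplitude reductions by factors 2 and 10
att_2  = 20 * log10(2)    % approx. 6 dB
att_10 = 20 * log10(10)   % exactly 20 dB
```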

More options become available if the output values are obtained by weighting the last k incoming data points:

\[\begin{split}\label{eq:fir}
y(n) &= \sum\limits_{i = 0}^{k-1} {w_i}\,x(n - i) \\
&= {w_0}\,x(n) + {w_1}\,x(n - 1) + ... + {w_{k-1}}\,x(n - k + 1)\end{split}\]

This can be seen as a *moving window filter*, which is moved over the
incoming data points from beginning to end, and is called a *finite impulse
response (FIR)* filter.

The choice of indices, starting at *i = 0*, reflects conventions in the
area of *Digital Signal Processing (DSP)*. Watch out, though, because
vectors in Matlab start with *x(1)*!

We can implement an FIR-filter in Matlab as

```
y = filter(w, 1, x)
```

where *w* is a vector containing the weights, and *x* a vector containing the
input. The *1* will become clear when we discuss the more general IIR-filters
below.

In DSP, the equation for FIR filters is sometimes expressed as

\[y(n) = w(k) * x(n)\]

and interpreted as *y of n equals the convolution of (w of k) and (x of n)*.
So don’t get nervous about the complicated sounding term *convolution*:
it is nothing else but the application of an FIR filter.
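To convince yourself of this equivalence, you can compare *filter* and *conv* directly; the following sketch uses dummy weights and data:

```
% An FIR filter is a convolution: compare "filter" and "conv"
w = [0.25 0.5 0.25];      % example weights
x = randn(1,100);         % example input
y1 = filter(w, 1, x);
y2 = conv(w, x);          % has length(w)+length(x)-1 samples
% The first length(x) samples of the convolution match the filter output
difference = max(abs(y1 - y2(1:length(x))))   % practically zero
```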

Another expression that is used in DSP is *impulse response*: If the input consists of

\[\begin{split}x(i) &= 1 \quad for \; i=k \\
x(i) &= 0 \quad for \; i \neq k\end{split}\]

then the output around *y(k)* equals the weight coefficients. The impulse
response of a filter is the filter’s output time-domain sequence when the input
is a single unity-valued sample (impulse) preceded and followed by zero-valued
samples.
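This can be verified with a few lines of Matlab (using arbitrary example weights):

```
% The impulse response of an FIR filter reproduces its weights
w = [1 2 3 2 1]/9;              % example weights
impulse = [1, zeros(1,9)];      % unit impulse, followed by zeros
h = filter(w, 1, impulse);
% h(1:5) now equals w; the remaining samples are zero
```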

Owing to their roots in the hardware design of filters, DSP textbooks also tend to use a different depiction of FIR filters (see figure below). Keep in mind, though, that this is equivalent to the figure above.

When your data needs to be analysed in real-time and online, you only
have older values of x available: \(x(m)\), with \(m \le n\). This
induces a *time-delay* in your output signal. For offline analysis, it
is therefore more convenient to use a window that is centred about your
current position (Fig. below). This reduces or eliminates the
problem of time delays in the output signal y relative to the input
signal x.

\[{y_n} = \sum\limits_{m = - k}^k {{w_m}{x_{n + m}}}\]

*Filter using a centered window (offline analysis).*
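A minimal sketch of such a centered filter, using a simple averaging window and *conv* with the 'same' option:

```
% Centered moving average for offline analysis
k = 2;                          % half-width -> window of 2*k+1 points
w = ones(1, 2*k+1) / (2*k+1);   % equal weights
x = randn(1,100);               % dummy data
y = conv(x, w, 'same');         % centered application, no time delay
```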

If the filter values are chosen as

\[{w_i} = \frac{1}{k}, \quad i = 1, \ldots, k\]

the filter averages over the last k data points. For example, with k=3 and n=10, we get

\[w = \left[ {\begin{array}{ccc}
{\frac{1}{3}}&{\frac{1}{3}}&{\frac{1}{3}}
\end{array}} \right]\]

and

\[y(10) = {w_1} \cdot x(10) + {w_2} \cdot x(9) + {w_3} \cdot x(8) = \frac{{x(10) + x(9) + x(8)}}{3}\]

A differentiation of an incoming signal sampled at intervals \(\Delta t\)

\[y(n) = \frac{\Delta x}{\Delta t} = \frac{x(n) - x(n - 1)}{\Delta t}\]

gives us the filter weights (Fig. below, top)

\[\vec w = \left[ 1 \; -1 \right] / \Delta t\]

For offline analysis, a centered filter may be preferable (Fig. above, bottom):

\[\vec w = \left[ 1 \; 0 \; -1 \right] \cdot \frac{1}{2\,\Delta t}\]
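As a sketch, applying this centered difference to a sine wave should approximately reproduce its cosine derivative (except at the edges of the data):

```
% Central-difference differentiation of a sine wave
dt = 0.01;                      % sampling interval
t = 0:dt:2*pi;
x = sin(t);
w = [1 0 -1];
dx = conv(x, w, 'same') / (2*dt);   % approximates cos(t) in the interior
```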

To optimize the noise and the frequency response, a number of differentiation algorithms have been published, which we won’t discuss here:

- Lanczos differentiators
- Parks-McClellan differentiators
- ...

The moving average and differentiation filters induce bias in the output
since they are so simple. For example, the moving average filter
underestimates the value around the peak of a signal. To overcome this
problem, the idea of the Savitzky-Golay filter is to fit a polynomial of
order \(q\) to the surrounding \(2m + 1\) data points, and use
its value at the centre for data smoothing, its first derivative for
calculating the derivative, etc. One of the best things about the
Savitzky-Golay filter is that you only have to determine the values
\(\vec{w}\) once at the beginning of your analysis; the *same*
values can then be used to estimate the derivatives of order up to
\(q-1\).

*The principle of the Savitzky-Golay filter. A moving average filter for smoothing underestimates peaks.*

The Figure above shows the principle of the Savitzky Golay filter:

- For a given x-value …
- you take a symmetric window of your dataset, then …
- calculate the best polynomial fit to these data, and …
- if you want to smooth the data, take the center-point of the fitted curve; or if you want to differentiate the data, take the first derivative at that location.

In addition, you also see that a moving average filter (dashed line) would underestimate the peak value of the curve.

So to use the Savitzky-Golay filter, you have to define the:

- *input data* (e.g., “x”)
- *order of the polynomial fit* (e.g., “2” for quadratic fits)
- *size of the data window* (must be odd, e.g., “9”)
- *order of derivative* (e.g., “0” for smoothing, “1” for 1st derivative)
- *sampling rate* (in Hz, e.g., “100”)

For example, to smooth data by finding the best-fit second-order polynomial (\(q=2\)) to the data in a five point window (\(m=2\), as \(2*2 + 1 = 5\)), you would need the coefficients

\[\vec w = \left[ {\begin{array}{ccccc}
{ - 0.086}&{0.343}&{0.486}&{0.343}&{ - 0.086}
\end{array}} \right]\]
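Since these weights form a symmetric window, they can be applied like any other centered FIR filter; a minimal sketch with dummy data:

```
% Applying the 5-point quadratic Savitzky-Golay smoothing weights
w = [-0.086 0.343 0.486 0.343 -0.086];
x = randn(1,100);               % dummy data
xSmooth = conv(x, w, 'same');   % centered application, no time delay
```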

For general values of q and m, you can use the Matlab command *sgolay* to
find the corresponding values of \(\vec{w}\). For our example
(\(q=2\) and \(m=2\)), the command

```
[b,g] = sgolay(2,5);
```

gives you the smoothing coefficients in the middle row of *b*, and the derivative coefficients in the columns of *g*.

For simple smoothing, you can also use the Matlab command *sgolayfilt*:

```
y = sgolayfilt(x, q, 2*m + 1);
```

If you want to avoid dealing with the details of applying the weights,
use the function *savgol* (by Thomas Haslwanter), which you can
find on Moodle.

**Reasons to use the Savitzky-Golay filter:**

- It is an efficient way of smoothing data.
- It is a convenient way to find higher derivatives.
- Smoothing and calculating the higher derivatives can be done simultaneously.

**Reasons not to use the Savitzky-Golay filter:**

- It does not have a crisp frequency response. In other words, the gain decreases only gradually as frequency increases. For example, if you know that you only need frequency components below 200 Hz, other filtering techniques are preferable.
- If you are fitting data to a *parametric model*, it is almost always better to use raw data than pre-smoothed data, since the smoothing may cause important information to be lost.
- If you know the ideal signal characteristic, the *Wiener filter*, though more complex to apply, may produce better results.

**Further information**

- Brandt, S. (1999) Datenanalyse. Spektrum Akademischer Verlag.
- Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992) Numerical Recipes in C (2nd Ed.): the Art of Scientific Computing (pp. 650-655). Cambridge University Press.
- Savitzky, A., Golay, M. J. E., Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. 36 (1964) 1627-1639.
- Madden, H. H., Comments on Savitzky-Golay convolution method for least-squares fit smoothing and differentiation of digital data, Anal. Chem. 50 (1978) 1383-1386.

We have seen that the output of an FIR filter only depends on the incoming signal:

\[y(n) = \sum\limits_{i = 0}^{k-1} {{w_i}*x(n - i)}\]

In general, though, the output of a filter may also depend on the \(m\) most recent values of the output signal. In this case,

\[y(n) + {a_1}y(n - 1) + ... + {a_{m-1}}y(n - m + 1) = {b_0}x(n) + {b_1}x(n - 1) + ... + {b_{k-1}}x(n - k + 1)\]

or

\[\sum\limits_{j = 0}^{m-1} {{a_j}y(n - j)} = \sum\limits_{i = 0}^{k-1} {{b_i}x(n - i)}\]

where \(a_0 = 1\). In other words, the coefficients \(a_j\) and
\(b_i\) uniquely determine this type of filter. This type of filter
is sometimes referred to as an *infinite impulse response (IIR)* filter.
The *feedback character* of these filters can be made more obvious by
re-shuffling the equation:

\[y(n) = \left[{b_0}x(n) + {b_1}x(n - 1) + ... + {b_{k-1}}x(n - k + 1)\right] - \left[ {{a_1}y(n - 1) + ... + {a_{m-1}}y(n - m + 1)} \right]\]

*Infinite Impulse Response Filter (IIR Filter)*

While it is more difficult to find the desired coefficients \(a, b\) for IIR filters than for FIR filters, IIR filters have the advantage that they are computationally more efficient, and achieve a better frequency selection with fewer coefficients.

When the change in a signal is proportional to the magnitude of the signal, the signal changes exponentially. For example

\[y_n = \alpha \cdot y_{n-1}\]

For \(\alpha > 1\) the signal grows exponentially, and for \(\alpha < 1\) it decreases exponentially. For example, for \(\alpha=1/2\) we get

\[{y_i} = {y_0} \cdot [1,\frac{1}{2},\frac{1}{4},\frac{1}{8},...]\]

In Matlab, IIR filters can be implemented very easily by simply specifying the coefficients \(a_i\) and \(b_i\) (for FIR-filters \(a=1\)):

```
y = filter(b,a,x)
```

Comparing the general IIR-equation with the equation for an *Exponential Decay Filter*,
we see that we can implement an *Exponential Decay Filter* as follows:

```
% Dummy input data and parameters
x = randn(1,1000);
alpha = 0.5;
% IIR filter coefficients
b = 1;
a = [1 -alpha];
% Apply the filter
y = filter(b,a,x);
```

The exponential decay is a special case of the *Exponential Averaging Filter*, defined as

\[y(n) = \alpha x(n) + (1-\alpha) y(n-1)\]

You see: the exponential decay is the impulse response of an exponential averaging filter.

The nice thing about this filter is that it is a smoothing filter with a single parameter: by tuning \(\alpha\), you can determine the output’s sensitivity to input noise.
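Comparing the coefficients with the general IIR equation gives \(b = \alpha\) and \(a = [1, -(1-\alpha)]\), so the filter can be sketched as:

```
% Exponential averaging filter as an IIR filter
alpha = 0.1;                    % smoothing parameter, 0 < alpha <= 1
x = 1 + randn(1,1000);          % noisy input around 1
y = filter(alpha, [1 -(1-alpha)], x);
```

Small values of \(\alpha\) give strong smoothing but a slow response to changes in the input; large values track the input closely but suppress less noise.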

The Matlab command *butter* provides the \([b,a]\) coefficients of
an IIR filter corresponding to Butterworth filters. The Butterworth
filter is a type of signal processing filter designed to have as flat a
frequency response as possible in the passband. It is also referred to
as a *maximally flat magnitude* filter.

For example, if you have a sampling rate of 1 kHz (and thus a Nyquist frequency of 500 Hz), and you want to design an IIR lowpass filter with a 3 dB cut-off frequency of 40 Hz and a filter order of 5, you get the corresponding \([b,a]\) coefficients as follows:

```
nyq = 500;
cutoff = 40;
order_filter = 5;
[b,a] = butter(order_filter, cutoff/nyq);
```

The two names *finite impulse response filter* and *infinite impulse
response filter* are derived from the differing behaviour of each type
of filter to an impulse input. To demonstrate this graphically, we
implement an example of each type of filter in Matlab.

```
% Input spike at sample 5:
xx = zeros(1,20);
xx(5) = 1;
tt = 1:20;
data.before = xx;
data.after_fir = filter(ones(1,5)/5, 1, xx);
data.after_iir = filter(1, [1 -0.5], xx);
% Graph
ph = plot(tt, data.before, 'bo', ...
          tt, data.after_fir, 'rx-', ...
          tt, data.after_iir, 'b.:');
for ii = 1:3
    set(ph(ii), 'LineWidth', 2);
end
legend('input', 'FIR-filtered', 'IIR-filtered');
```

This Matlab code produces the following graph:

*Comparison of FIR and IIR filter behaviour.*

In this graph, we can see:

- the time delay of the FIR filter
- the finite effect of an impulse on FIR-filtered data
- the instant response of the IIR filter
- the infinite effect of an impulse on IIR-filtered data

For offline analysis, the Matlab command *filtfilt* filters each signal
twice, once forward and once backward. This way time delays are
eliminated.
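A minimal sketch comparing the two, using the Butterworth coefficients from above:

```
% Causal vs. zero-phase filtering
[b,a] = butter(5, 40/500);        % 5th-order lowpass, 40 Hz at 1 kHz sampling
x = randn(1,1000);                % dummy data
y_delayed   = filter(b, a, x);    % causal: introduces a time delay
y_zerophase = filtfilt(b, a, x);  % forward-backward: no time delay
```

Note that *filtfilt* applies the filter twice, so the effective attenuation in the stopband is doubled (in dB).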

While linear filters are good at eliminating noise that has a Gaussian distribution, they fail when you have extreme outliers. Such spikes in the data can be caused, e.g., by faulty sensors, or by loose connections in the analysis setup. For such outliers, a median filter offers better noise suppression than linear filters. In the figure below, the signal has two outliers, one at t=5 and one at t=15. The averaging-filter data have been adjusted to compensate for the delay, and both the averaging and the median filter have a window size of 3.

```
% Create data
x = zeros(1,20);
x(10:20) = 1;
% Add some spikes, at t=5 and t=15
x(15) = 3;
x(5) = 3;
% Median filtered data
xMed = medfilt1(x, 3);
% Average filtered data
b = ones(1,3)/3;
xFilt = filter(b,1,x);
% Plot the data
plot(xFilt(2:end), 'g', 'LineWidth', 2); % shift to compensate delay
hold on
plot(x, '-.o', 'LineWidth', 2)
plot(xMed, 'r', 'LineWidth', 2)
legend('Average', 'Rawdata', 'Median')
```

*Data with extreme outliers, average filtered, and median filtered.*

The simplest image type is the grayscale image: there, each pixel is assigned a gray level

\[0 \le gray\_level \le g_{max}\]

*Example, with \(0 \le gray\_level \le 255\). I have used the Matlab command
“imtool” to obtain these images.*

Here it is up to you which maximum level is used. For many images, 8 bit
is sufficient, giving you \(2^8 = 256\) gray levels. Note that a
higher bit depth only makes sense if your sensing devices can resolve
differences at that level! Also keep in mind that Matlab uses – by
default – double precision values with 64 bit. This requires 8 times as
much memory, which is fine for small images. For larger images you should
stick to the format with the lower memory requirements, and use the
appropriate functions of the Matlab *Image Processing Toolbox*.
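You can inspect the difference in memory consumption directly:

```
% Memory footprint: 8-bit vs. double precision
img8 = imread('pout.tif');        % uint8 grayscale image
imgD = double(img8);              % same image, 8 times the memory
whos img8 imgD                    % compare the "Bytes" column
```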

Filtering can also be performed on two-dimensional signals, like images. Instead of a one-dimensional input and a weight vector, we now have a two-dimensional input and a weight matrix (see figure below). The output is still obtained by summing up the weighted input:

\[y(n,m) = \sum\limits_{i = 1}^{k} \sum\limits_{j = 1}^{l} {w_{ij}}\,x(n - 1 + i, m - 1 + j)\]

The moving window interpretation still holds, except now, the window extends in both dimensions.

*A two-dimensional filter applied to a two-dimensional window.*

If the image is a grayscale image, you could use the command *filter2*.
However, it is advisable to stick with the functions of the Matlab *Image
Processing Toolbox*; in this case, *imfilter* would be the corresponding
command. Using commands from the Image Processing Toolbox has two
advantages: where possible, it does not convert the data to a larger
format (i.e., 8-bit unsigned integers are not converted to double); and
it also works on RGB data (where *filter2* would not work). For example,
in order to blur an image, enhance horizontal lines, or enhance vertical
lines, you can use the following lines of code:

```
% Get the data
data = imread('pout.tif');
% Design the filters
Filter1 = ones(7)/7^2;
Filter2 = [1 1 1; 0 0 0; -1 -1 -1];
Filter3 = Filter2';
% Apply the filters (the "*4" is only to enhance the visibility)
subplot(221), imshow(data);
subplot(222), imshow(imfilter(data, Filter1));
subplot(223), imshow(imfilter(data, Filter2)*4);
subplot(224), imshow(imfilter(data, Filter3)*4);
```

*Original image (top-left), blurred version (top-right), horizontal edges enhanced (bottom-left), and vertical edges enhanced (bottom-right)*

**Tip:** Before you create your own filter, first check if the Matlab
command *fspecial* already provides the filter that you need.

Curve smoothing is a huge topic, and depending on the requirements, a number of solutions are available. For example, you may have data sampled at equal intervals, or you may have recorded them at irregular times. In the former case, you can use Savitzky-Golay filters or IIR filters. In the latter case, you need other approaches; some of them are listed below.

When you have irregularly sampled data points, you cannot apply a
Savitzky-Golay filter. LOESS (which can be interpreted as *LOcal
regrESSion*) and LOWESS (*LOcally WEighted Scatterplot Smoothing*) are
two strongly related non-parametric regression methods that combine
multiple regression models in a k-nearest-neighbor-based meta-model.
*loess* is a later generalization of *lowess*.

*LOWESS and LOESS smoothing of noisy data.*

In short, you specify the percentage of the data that you want to include. For these data, a weighted linear regression is applied. The traditional weight function used for LOESS is the tri-cube weight function,

\[\begin{split}w(x) = (1 - |x|^3)^3 \operatorname{I}_{\left[\left| x\right| < 1\right]}\end{split}\]

\(\operatorname{I}_{...}\) is the *indicator function*, indicating
the range over which the function is different from 0.
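The tri-cube weight function is easily written as an anonymous function; a short sketch:

```
% Tri-cube weight function
tricube = @(d) (1 - abs(d).^3).^3 .* (abs(d) < 1);
d = -1.5:0.01:1.5;
plot(d, tricube(d))
```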

The two methods differ only in the model used in the regression:
*lowess* uses a linear polynomial, while *loess* uses a quadratic
polynomial. An example of lowess and loess smoothing is shown in
the figure above.

```
% Generate the data
x = 0:0.1:10;
y = sin(x)+0.2*randn(size(x));
% Eliminate some, so that we don't have equal sampling distances
curInd = (x>5) & (x<6);
x(curInd) = [];
y(curInd) = [];
% Smooth the data
sm_lowess = smooth(x,y, 0.1, 'lowess');
sm_loess = smooth(x,y, 0.1, 'loess');
% Show the results
close
plot(x,y,'*')
hold on
lh(1) = plot(x, sm_lowess, 'r');
set(lh(1), 'LineWidth', 3);
lh(2) = plot(x,sm_loess, 'g--');
set(lh(2), 'LineWidth', 3);
legend('raw data', 'lowess', 'loess');
shg
```

Another type of smoothing can be needed when you want to know the
*frequency* of events. Such information is often displayed with
histograms. However, it is possible to smooth histogram data
systematically, with a technique called *Kernel Density Estimation
(KDE)*.

For example, your friends at *Birdlife Austria* give you a list with
the observations of bearded vultures (“Laemmergeier”), and ask you to
represent this as a smooth curve. You could do this by representing
every observation by a Gaussian function

*Kernel Density Estimation: The “*” on the x-axis indicate
individual events, and the thin blue lines the corresponding Gaussians. The
thick red line is the sum of the thin blue lines, and provides a “density
estimate” for the event rate.*

\[\label{eq:Gauss}
g(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 }.\]

and then sum all the resulting functions.

```
% Define the smoothness and the Gaussian function
sigma = 1;   % bandwidth
gauss = @(x,mu) 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x-mu)/sigma).^2);
% Define the input and output parameters
x = -10:0.1:10;
mu = [-5, -3, 2, 3, 3.5, 4, 6];
total = zeros(size(x));
% Calculate and plot the individual Gaussians, and the sum total
hold on
set(gca, 'XTickMode', 'manual', 'XTick', []);
for ii = 1:length(mu)
    out = gauss(x, mu(ii));
    plot(x, out)
    total = total + out;
end
lh(1) = plot(mu, zeros(size(mu)), '*');
set(lh(1), 'MarkerSize', 10);
lh(2) = plot(x, total, 'r');
set(lh(2), 'LineWidth', 2);
xlabel('X');
ylabel('Y');
hold off
shg
```

The linear filters that we have seen so far are commutative. To quote Wikipedia: one says that x commutes with y under “*” if:

\[x * y = y * x\]

In other words, it does not matter how many different linear filters you use, or in which sequence you apply them. E.g., if a Savitzky-Golay filter is applied to some data, and then a second Savitzky-Golay filter for calculating the first derivative, the result is the same if the sequence of filters is reversed. There even exists a single filter that has the same effect as the two applied in sequence.
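This commutativity can be checked numerically, e.g. with a moving average and a first-difference filter (the tiny residual difference is due to floating point arithmetic):

```
% Linear filters commute: the order of application does not matter
b1 = ones(1,3)/3;               % moving average
b2 = [1 -1];                    % first difference
x = randn(1,100);               % dummy data
y12 = filter(b2, 1, filter(b1, 1, x));
y21 = filter(b1, 1, filter(b2, 1, x));
difference = max(abs(y12 - y21))   % practically zero
```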

In contrast, *morphological operations* on an image are non-linear
operations, and the final result depends on the sequence in which they
are applied. Consider an image defined by pixels with values
\(x_{ij}\); further, this image is assumed to be a black-and-white
image, so we have

\[x_{ij}= 0\;or\;1, \quad \forall i,j\]

To define a morphological operation, we have to set a *structural
element SE*: for example, a 3x3 matrix covering a part of
the image.

The definition of the *erosion* E is:

\[E(M)=\left\{ \begin{array}{ll}
0, & if\;\sum\limits_{i,j=1}^{3} {(se)_{ij}} < 9 \\
1, & else
\end{array} \right. \quad with\;(se)_{ij}, M \in SE\]

In words: if any of the pixels in the structural element SE has the value 0, the erosion sets the value of M, the pixel at the centre of SE, to 0. Otherwise E(M)=1.

For the *dilation* D the converse holds: if any value in SE is 1, the dilation of M, D(M), is set to 1:

\[D(M)=\left\{ \begin{array}{ll}
1, & if\;\sum\limits_{i,j=1}^{3} {(se)_{ij}} \geq 1 \\
0, & else
\end{array} \right. \quad with\;(se)_{ij}, M \in SE\]

There are two common compositions of dilation and erosion, one called *opening*, the other called *closing*:

\[\begin{array}{rl}
opening &= dilation \circ erosion \\
closing &= erosion \circ dilation
\end{array}\]

ShowMorph.m gives the Matlab code that generates the corresponding figure.
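If you have the *Image Processing Toolbox*, the four operations can also be sketched directly with *imerode*, *imdilate*, *imopen*, and *imclose*:

```
% Morphological operations (requires the Image Processing Toolbox)
bw = imread('circles.png');       % binary demo image shipped with the toolbox
se = strel('square', 3);          % 3x3 structural element
subplot(221), imshow(imerode(bw, se)),  title('erosion');
subplot(222), imshow(imdilate(bw, se)), title('dilation');
subplot(223), imshow(imopen(bw, se)),   title('opening');
subplot(224), imshow(imclose(bw, se)),  title('closing');
```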

- (As in the Exercise in the previous chapter:) Create a noisy sine wave, with an amplitude of 1, a frequency of 0.3 Hz, a sampling rate of 100 Hz, a duration of 10 sec, and a Gaussian random noise with a standard deviation of 0.5.
- Using a Savitzky-Golay filter, calculate the first derivative using a \(2^{nd}\) order polynomial. Try different window sizes, and superpose the results in a plot. Make sure that the axes of the plot are correctly labelled.
- Apply the *exponential averaging filter* to a step input. Plot the response for different values of \(\alpha\), to show why this filter is in some fields referred to as a *leaky integrator*.