In the analysis of data we want to extract those signal components that are interesting to us, and eliminate artifacts that may distort our data. For example, in Fig. [fig:Filters](top) we have a 1-dimensional data signal. The original data contain not only the sine-wave we are interested in, but also a low-frequency drift and high-frequency noise. Ideally, our Filter should eliminate all the undesired components. In other cases, we may want to use a Filterto extract the outlines from 2-dimensional images (see Fig. below, bottom).


Filters applied to 1-dimensional (top) and two-dimensional (bottom) signals.

The following chapter will describe how such 1-dimensional and 2-dimensional filters work

Transfer Function

To analyse biomedical data, you have to modify them. If the incoming data are called x, and the resulting/outgoing data y, this can be indicated schematically by the diagram in the figure below.


Transfer Function

The box modifying the signal, here labelled G, is often called the transfer function.

A simple case is the amplification of a signal

\[y = g \ast x\]

g is the gain of this transfer function, and a simple amplification is commonly indicated as shown in the figure below.



For discrete valued signals, the output only depends on the instantaneous input:

\[y(n) = g \cdot x(n)\]

Since the gain of many systems covers many orders of magnitude, it is often expressed on a logarithmic scale

\[attenuation = 20 \cdot log_{10} \left( \frac{a_{out}}{a_{in}} \right) dB .\]

For example, an amplitude reduction by a factor of 2 corresponds to an attenuation of 6 dB, and a reduction by a factor of 10 to 20 dB.

Finite Impulse Response (FIR) filters

More options become available if the output values are obtained by weighting the last k incoming data points:

\[\begin{split}\label{eq:fir} y(n) = \sum\limits_{i = 0}^{k-1} {{w_i}*x(n - i) = \\ = {w_0}x(n) + {w_1}x(n - 1) + ... + {w_{k-1}}x(n - k)}\end{split}\]

This can be seen as a moving window filter, which is moved over the incoming data points from beginning to end, and is called finite impulse response (FIR) filter.


Matlab Notes

The choice of coefficients reflects conventions in the area of Digital Signal Processing (DSP). Watch out, though, because vectors in Matlab start with x(1)!

We can implement an FIR-filter in Matlab as

y = filter(w, 1, x)

where w is a vector containing the weights, and x a vector containing the input. The 1 will become clear when we discuss the more general IIR-filters below.

DSP Terminology

In DSP, the Equation for FIR-filters is sometimes expressed as

\[y(n) = w(k) * x(n)\]

and interpreted as y of n equals the convolution of (w of k) and (x of n). So don’t get nervous about the complicated sounding term convolution: it is nothing else but the application of an FIR filter.

Another expression that is used in DSP is impulse response: If the input consists of

\[\begin{split}x(i) &=& 1 \; for \; i=k \\ x(i) &=& 0 \; for \; i \neq k\end{split}\]

then the output around y(k) equals the weight coefficients. The impulse response of a filter is the filter’s output time-domain sequence when the input is a single unity-valued samle (impulse) preceded and followed by zero-valued samples.

Coming from the hardware design of filters, DSP textbooks also tend to use a different depiction of FIR filters (see Figure below). Keep in mind, though, that this is equivalent to our Figure above:


Figure 3: DSP representation of an FIR filter.

Offline Analysis

When your data needs to be analysed in real-time and online, you only have older values of x available: \(x(m)\), with \(m \le n\). This induces a time-delay in your output signal. For offline analysis, it is therefore more convenient to use a window that is centred about your current position (Fig. below). This reduces or eliminates the problem of time delays in the output signal y relative to the input signal x.

\[{y_n} = \sum\limits_{m = - k}^k {{w_m}{x_{n + m}}}\]

Filter using a centered window (offline analysis).

FIR Filter 1: Moving average

If the filter values are chosen as

\[{w_i} = \frac{1}{k},\,\,\,\,i = 1:k\]

the filter averages over the last k data points. For example, with k=3 and n=10, we get

\[\begin{split}w = \left[ {\begin{array}{\*{20}{c}} {\frac{1}{3}}&{\frac{1}{3}}&{\frac{1}{3}} \end{array}} \right]\end{split}\]


\[y(10) = {w_1} \cdot x(10) + {w_2} \cdot x(9) + {w_3} \cdot x(8) = \frac{{x(10) + x(9) + x(8)}}{3}\]

FIR Filter 2: Differentiation

First-difference Differentiation

A differentiation of an incoming signal, with \(\Delta(t)=1\)

\[y(n) = \frac{\Delta x}{\Delta t} = \frac{x(n) - x(n - 1)}{\Delta t}\]

gives us the filter weights (Fig. below, top)

\[\vec w = \left[ 1 \; -1 \right] / \Delta t\]

Figure 4: Top) First-difference differentiator. Bottom) Central-difference differentiator.

Central-difference Differentiator

For offline analysis, a centered filter may be preferable (Fig. above, bottom):

\[\vec w = \left[ 1 \;0 \; -1 \right] * \frac{1}{2*\Delta t}\]

Specialized Differentiators

To optimize the noise and the frequency response, a number of differentiation algorithms have been published, which we won’t discuss here:

  • Lanczos differentiators
  • Parks-McClellan differentiators
  • ...

FIR Filter 3: Savitzky-Golay Filter

The moving average and differentiation filters induce bias in the output since they are so simple. For example, the moving average filter underestimates the value around the peak of a signal. To overcome this problem, the idea of the Savitzky-Golay filter is to fit a polynomial of order \(q\) to the surrounding \(2m + 1\) data points, and use its value at the centre for data smoothing, its first derivative for calculating the derivative, etc. One of the best things about the Savitzky-Golay filter is that you only have to determine the values \(\vec{w}\) once at the beginning of your analysis; the same values can then be used to estimate the derivatives of order up to \(q-1\).


The principle of the Savitzky-Golay filter. A moving average filter for smoothing underestimates peaks.

The Figure above shows the principle of the Savitzky Golay filter:

  1. For a given x-value …
  2. you take a symmetric window of your dataset, then …
  3. calculate the best polynomial fit to these data, and …
  4. if you want to smooth the data, take the center-point of the fitted curve; or if you want to differentiate the data, take the first derivative at that location.

In addition, you also see that a moving average filter (dashed line) would underestimate peak value of the curve.

So to use the Savitzky-Golay filter, you have to define the:

  • input data (e.g., “x”)
  • order of the polynomial fit (e.g., “2” for quadratic fits)
  • size of the data window (must be odd, e.g., “9”)
  • order of derivative (e.g., “0” for smoothing, “1” for 1st derivative)
  • sampling rate(in Hz, e.g., “100”)

For example, to smooth data by finding the best-fit second-order polynomial (\(q=2\)) to the data in a five point window (\(m=2\), as \(2*2 + 1 = 5\)), you would need the coefficients

\[\begin{split}\vec w = \left[ {\begin{array}{*{20}{c}} { - 0.086}&{0.343}&{0.486}&{0.343}&{ - 0.086} \end{array}} \right]\end{split}\]

For general values of q and m, you can use the Matlab command sgolay to find the corresponding values of \(\vec{w}\). For our example (\(q=2\) and \(m=2\)), the command

\[[b,g] = sgolay(2,5);\]

gives you the smoothing coefficients in the middle line of b, and the derivative coefficients in the columns of g.

For simple smoothing, you can also use the Matlab command sgolayfilt:

\[y = sgolayfilt(x, q, 2*m + 1);\]

If you want to avoid dealing with the details of applying the weights, use the function function savgol (by Thomas Haslwanter), which you can find on Moodle.

Reasons to use the Savitzky-Golay filter:

  • It is an efficient way of smoothing data.
  • It is a convenient way to find higher derivatives.
  • Smoothing and calculating the higher derivatives can be done simultaneously.

Reasons not to use the Savitzky-Golay filter:

  • It does not have a crisp frequency response. In other words, the gain decreases only gradually as frequency increases. For example, if you know that you only need frequency components below 200 Hz, other filtering techniques are preferable.
  • If you are fitting data to a parametric model, it is almost always better to use raw data than pre-smoothed data, since the smoothing may cause important information to be lost.
  • If you know the ideal signal characteristic, the Wiener filter, though more complex to apply, may produce better results.

Further information

  • Brandt, S. (1999) Datenanalyse. Spektrum Akademischer Verlag.
  • Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992) Numerical Recipes in C (2nd Ed.): the Art of Scientific Computing (pp. 650-655). Cambridge University Press.
  • Savitzky, A., Golay, M. J. E., Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. 36 (1964) 1627-1639.
  • Madden, H. H., Comments on Savitzky-Golay convolution method for least-squares fit smoothing and differentiation of digital data, Anal. Chem. 50 (1978) 1383-1386.

Infinite Impulse Response (IIR) filters

We have seen that the output of a FIR filter only depends on the incoming signal:

\[y(n) = \sum\limits_{i = 0}^{k-1} {{w_i}*x(n - i)}\]

In general, though, the output of a filter may also depend on the \(m\) most recent values of the output signal. In this case,

\[y(n) + {a_1}y(n - 1) + ... + {a_{m-1}}y(n - m) = {b_0}x(n) + {b_1}x(n - 1) + ... + {b_n}x(n - k)\]


\[\sum\limits_{j = 0}^{m-1} {{a_j}y(n - j)} = \sum\limits_{i = 0}^{k-1} {{b_i}x(n - i)}\]

where \(a_0 = 1\). In other words, the coefficients \(a_i\) and \(b_j\) uniquely determine this type of filter. This type of filter is sometimes referred to as an infinite impulse response (IIR) filter. The feedback characterof these filters can be made more obvious by re-shuffling the equation:

\[y(n) = \left[{b_0}x(n) + {b_1}x(n - 1) + ... + {b_k}x(n - k)\right] - \left[ {{a_1}y(n - 1) + ... + {a_m}y(n - m)} \right]\]

Infinite Impulse Response Filter (IIR Filter)

While it is more difficult to find the desired coefficients $a,b$ for IIR filters than for FIR filters, IIR filters have the advantage that they are more efficient computationally, and achieve a better frequency selection with fewer coefficients.

IIR Filter 1: Exponential Decay

When the change in a signal is proportional to the magnitude of the signal, the signal changes exponentially. For example

\[y_n = \alpha * y_{n-1}\]

For \(\alpha > 1\) the signal grows exponentially, and for \(\alpha < 1\) it decreases exponentially. For example, for \(\alpha=1/2\) we get

\[{y_i} = {y_0} \cdot [1,\frac{1}{2},\frac{1}{4},\frac{1}{8},...]\]

Matlab Implementation

In Matlab, IIR filters can be implemented very easily by simply specifying the coefficients \(a_i\) and \(b_i\) (for FIR-filters \(a=1\)):

y = filter(b,a,x)

Comparing the general IIR-equation with the equation for an Exponential Decay Filter, we see that we can implement an Exponential Decay Filter as follows:

% Dummy inData and parameters
x = randn(1,1000);
alpha = 0.5;

% IIR filter coefficients
a = [1 -0.5];
b = [];

% Apply the filter
y = filter(b,a,x);

IIR Filter 2: Leaky Integrator

The exponential decay is a special case of the Exponential Averaging Filter, defined as

\[y(n) = \alpha x(n) + (1-\alpha) y(n-1)\]

You see: the exponential decay is the impulse response of an exponential averaging filter.

The nice thing about this filter is that it is a smoothing filter with a single parameter: by tuning \(\alpha\), you can determine the output’s sensitivity to input noise.

IIR Filter 3: Butterworth lowpass filter

The Matlab command butter provides the \([b,a]\) coefficients of an IIR filter corresponding to Butterworth filters. The Butterworth filter is a type of signal processing filter designed to have as flat a frequency response as possible in the passband. It is also referred to as a maximally flat magnitude filter.

For example, if you have a sampling rate of 1 kHz (and thus a Nyquist frequency of 500 Hz), and you want to design an IIR lowpass filter with a 3dB cut-off frequency of 40 Hz and a filter of order 5, you get the corresponding \([b,a]\) coefficients as follows:

nyq = 500;
cutoff = 40;
order_filter = 5;
[b,a] = butter(order_filter, cutoff/nyq);

Difference between FIR and IIR filters

The two names finite impulse response filter and infinite impulse response filter are derived from the differing behaviour of each type of filter to an impulse input. To demonstrate this graphically, we implement an example of each type of filter in Matlab.

% Input spike at 5 sec:
xx = zeros(1,20);
xx(5) = 1;
tt = 1:20;

data.before    = xx;
data.after_fir = filter(ones(1,5)/5 , 1, xx);
data.after_iir = filter([1] , [1 -0.5], xx);

% Graph
ph = plot(tt, data.before,    'bo', ...
     tt, data.after_fir, 'rx-', ...
     tt, data.after_iir, 'b.:');
for ii = 1:3
    set(ph(ii), 'LineWidth', 2);
legend('input', 'FIR-filtered', 'IIR-filtered');

This Matlab code produces the following graph:


Comparison of FIR and IIR filter behaviour.

In this graph, we can see:

  • the time delay of the FIR filter
  • the finite effect of an impulse on FIR-filtered data
  • the instant response of the IIR filter
  • the infinite effect of an impulse on IIR-filtered data

For offline analysis, the Matlab command filtfilt filters each signal twice, once forward and once backward. This way time delays are eliminated.

Median Filter

While linear filter are good at eliminating noise that has a Gaussian distribution, it fails when you have extreme outliers. Such spikes in the data can be caused e.g. by faulty sensors, or by loose connections in the analysis setup. For such outliers, a median filter offers a better noise suppression than linear filters. In Fig. ref{fig:median} the signal has two outliers, one at t=5, and one at t=15. The averaging filter data have been adjusted to compensate for the delay, and both averaging and median filter have a window size of 3.

% Create data
x = zeros(1,20);
x(10:20) = 1;

% Add some spikes, at t=5 and t=15
x(15) = 3;
x(5) = 3;

% Median filtered data
xMed = medfilt1(x, 3);

% Average filtered data
b = ones(1,3)/3;
xFilt = filter(b,1,x);

% Plot the data
plot(xFilt(2:end), 'g', 'LineWidth', 2); % shift to compensate delay
hold on
plot(x, '-.o', 'LineWidth', 2)
plot(xMed, 'r', 'LineWidth', 2)
legend('Average', 'Rawdata', 'Median')

Data with extreme outliers, average filtered, and median filtered.

Filtering images (2-D filtering)

Representation of Grayscale Images

The simplest image-type is grayscale images: there each pixel is given a gray-level

\[\begin{split}0 < gray\_level < g_{max}\end{split}\]

Example, with $0 < gray_levels < 255$. I have used the Matlab command “imtool” to obtain these images.

Here it is up to you which maximum level is used. For many images, 8-bit is sufficient, giving you \(2^8 = 256\) gray levels. Note that a higher image depth only makes sense if your sensing devices can show differences at a higher level! Also keep in mind that Matlab uses – by default – double precision values with 64 bit. This requires 8 times as much memory, and is ok for small images. For larger images you should stick to the format with the lower memory requirements, and use the appropriate functions of the Matlab Image Processing Toolbox.


Filtering can also be performed on two-dimensional signals, like images. Instead of a one-dimensional input and a weight vector, we now have a two-dimensional input, and a weight matrix (Fig. [fig:imageFiltering]). The output is still obtained by summing up the weighted input.

\[y(n,m) = \sum\limits_{i = 1}^k {} \sum\limits_{j = 1}^l {{w_{ij}}*x(n - 1 + i,m - 1 + j)}\]

The moving window interpretation still holds, except now, the window extends in both dimensions.


A two-dimensional filter applied to a two-dimensional windowA two-dimensional filter applied to a two-dimensional window.


Figure 5: Definition of “Structural Element”. Note that a structural element does not have to be a square; it can also be a circle, a rectangle, etc.

If the image is a grayscale image, you could use the command filter2. However, it is recommendable to stick with the functions of the Matlab Image Processing Toolbox; in this case, imfilter would be the corresponding command. Using commands from the image processing toolbox has two advantages: where possible, it does not convert the data to a larger format (i.e., 8-bit unsigned integers are not converted to double); and it also works on RGB data (where filter2 would not work). For example, in order to blur an image, enhance horizontal lines, or enhance vertical lines, you can use the following lines of code:

% Get the data
data = imread('pout.tif');

% Design the filters
Filter1 = ones(7)/7^2;
Filter2 = [1 1 1; 0 0 0; -1 -1 -1];
Filter3 = Filter2';

% Apply the filters (the "*4" is only to enhance the visibility)
subplot(221), imshow(data);
subplot(222), imshow(imfilter(data, Filter1));
subplot(223), imshow(imfilter(data, Filter)*4);
subplot(224), imshow(imfilter(data, Filter3)*4);

Original image (top-left), blurred version (top-right), horizontal edges enhanced (bottom-left), and vertical edges enhanced (bottom-right)

Tip: Before you create your own filter, first check if the Matlab command fspecial already provides the filter that you need.

The Next Level

Curve smoothing is a huge topic, and depending on the requirements, a number of solutions are available. For example you may have data sampled at equal intervals, or you may have recorded them randomly. In the former case, you can use Savitzky-Golay filters or IIR-filters. In the latter case you need other approaches. Some of them are listed below.

Lowess and Loess Smoothing

When you have irregularly sampled data points, you cannot apply a Savitzky-Golay filter. LOESS (which can be interpreted as LOcal regrESSion, and LOWESS (LOcally WEighted Scatterplot Smoothing) are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. loess is a later generalization of lowess.


LOWESS and LOESS smoothing of noisy data.

In short, you specify the percentage of the data that you want to include. For these data, a weighted linear regression is applied. The traditional weight function used for LOESS is the tri-cube weight function,

\[\begin{split}w(x) = (1 - |x|^3)^3 \operatorname{I}_{\left[\left| x\right| < 1\right]}\end{split}\]

\(\operatorname{I}_{...}\) is the indicator function, indicating the range over which the function is different from 0.

The methods are just differentiated by the model used in the regression: lowess uses a linear polynomial, while loess uses a quadratic polynomial. An example of lowess and loess smoothing is shown in the Figure above.

% Generate the data
x = 0:0.1:10;
y = sin(x)+0.2*randn(size(x));

% Eliminate some, so that we don't have equal sampling distances
curInd = (x>5) & (x<6);
x(curInd) = [];
y(curInd) = [];

% Smooth the data
sm_lowess = smooth(x,y, 0.1, 'lowess');
sm_loess = smooth(x,y, 0.1, 'loess');

% Show the results
hold on
lh(1) = plot(x, sm_lowess, 'r');
set(lh(1), 'LineWidth', 3);
lh(2) = plot(x,sm_loess, 'g--');
set(lh(2), 'LineWidth', 3);
legend('raw data', 'lowess', 'loess');

Kernel Density Estimation

Another type of smoothing can be needed when you want to know the frequency of events. Such information is often displayed with histograms . However, it is possible to smooth histogram data systematically, with a technique called Kernel Density Estimation (KDE).

For example, your friends, the Birdlife Austria, give you a list with the observations of bearded vultures (“Laemmergeier”), and ask you to represent this as a smooth curve. You could do this, by representing every observation by a Gaussian function


Kernel Density Estimation: The “*” on the x-axis indicate individual events, and the thin blue lines the corresponding Gaussians. The thick red line is the sum of the thin blue lines, and provides a “density estimate” for the event rate.

\[\label{eq:Gauss} g(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 }.\]

and then sum all the resulting functions.

% Define the smoothness and the Gaussian functions
sigma = 1;  % bandwidth
gauss = @(x,mu) 1/(sigma*sqrt(2*pi))*exp(-0.5*((x-mu)/sigma).^2)

% Define the input and output parameters
x = -10:0.1:10;
mu = [-5, -3, 2, 3,3.5, 4, 6]
total = zeros(size(x));

% Calculate and plot the individual exponentials, and the sum total
hold on
set(gca, 'XTickMode', 'manual', ...
for ii = 1:length(mu)
    out = gauss(x, mu(ii));
    plot(x, out)
    total = total + out;
lh(1) = plot(mu, zeros(size(mu)), '*');
set(lh(1), 'MarkerSize', 10);
lh(2) = plot(x, total, 'r');
set(lh(2), 'LineWidth', 2);

hold off

Morphological Operations in Image Processing

Erosion and Dilation of Images

For linear filters as seen before, it holds that they are commutative. Cite from wikipedia: One says that x commutes with y under “*” if:

\[x * y = y * x\]

In other words, it does not matter how many and in which sequence different linear filters you use. E.g. if a Savitzky-Golay filter is applied to some date, and then a second Savitzky-Golay filter for calculationg the first derivative, the result is the same if the sequence of filters is reversed. It even holds, that there would have been one filter, which does the same as the two applied.

In contrast morphological operations on an image are non-linear operations and the final result depends on the sequence. If we think of any image, it is defined by pixels with values \(x_{ij}\). Further this image is assumed to be a black-and-white image, so we have

\[x_{ij}= 0\;or\;1, \forall i,j\]

To define a morphological operation we have to set a structural element SE. As example, a 3x3-Matrix as a part of the image.

The definition of erosion E says:

\[\begin{split}E(M)=\left\{ \begin{align} & 0,\ \,if\sum\limits_{i,j=0}^{3}{{{(se)}_{ij}}}<9 \\ & 1,\ \,else \\ \end{align} \right.\ \ ,with\ {{(se)}_{ij}},M\in SE\end{split}\]

So in words, if any of the pixels in the structural element M has value 0, the erosion sets the value of M, a specific pixel in M, to zero. Otherwise E(M)=1.

And for the dilation D it holds, if any value in SE is 1, the dilation of M, D(M), is set to 1.

\[\begin{split}D(M)=\left\{ \begin{align} & 1,\; if \sum_{i,j=0}^3 (se)_{ij} >=1 \\ & 0,\; else \end{align} \right. \; \; ,with\; (se)_{ij},M \in SE\end{split}\]

Opening and Closing of Images


Figure 6: Sample image for demonstrating the effect of “opening” and “closing” of binary images.


Figure 7: Compositions of Dilation and Erosion: Opening and Closing of Images.

There are two compositions of dilation and erosion. One called opening the other called closing. It holds:

\[\begin{split}\begin{array} opening &= dilation \circ erosion \\ closing &= erosion \circ dilation \end{array}\end{split}\]

matlab ShowMorph.m gives the Matlab code that generates Fig. ref{fig:morphological}.


  • (As in the Exercise in the previous chapter:) Create a noisy sine wave, with an amplitude of 1, a frequency of 0.3 Hz, a sampling rate of 100 Hz, a duration of 10 sec, and a Gaussian random noise with a standard deviation of 0.5.
  • Using a Savitzky Golay filter, calculate the first derivative using a \(2^{nd}\) order polynomial. Try different window sizes, and superpose the results in a plot. Make sure that the axes of the plot are correctly labelled.
  • Apply the exponential averaging filter to a step-input. Plot the response for different values of \(\alpha\), to show why this filter is in some fields referred to as leaky integrator.