FPGA HARDWARE DESIGN, SIMULATION AND SYNTHESIS FOR A INDEPENDENT COMPONENT ANALYSIS ALGORITHM USING SYSTEM-LEVEL DESIGN SOFTWARE

ALAN PAULO OLIVEIRA DA SILVA∗, ANA MARIA GUIMARÃES GUERREIRO∗, ADRIÃO DUARTE DÓRIA NETO∗

∗Federal University of Rio Grande do Norte,
UFRN - CT - PPgEEC,
Campus Universitário - Centro de Tecnologia - 59072-970,
Natal, RN, Brazil

Emails: alan@dca.ufrn.br, anamaria@dca.ufrn.br, adriao@dca.ufrn.br

Abstract— In this work we proposed the design, simulation and synthesis of a hardware that performs the Independent Component Analysis (ICA) in a reconfigurable hardware platform, more specifically a FPGA. The simulation of the hardware was done by models implemented in Simulink environment and the synthesis was possible through the Altera system-level design software DSP Builder, that contains specific FPGA blocks that can be synthesized in hardware. In order to validate the hardware, manually generated data and real electroencephalogram signals were used in the experiments.

Keywords— ICA, FPGA.

1 Introduction

Independent Component Analysis is a technique main applied in Blind Source Separation (BSS) problems that estimates source signals from its observed mixtures. This technique can be used in many signal process applications such as biomedical applications, noise extraction and telecommunications.

This work proposes the design, simulation and the synthesis of a reconfigurable hardware that performs the ICA technique by implementing the fixed-point algorithm, FastICA, in a Altera Cyclone II FPGA board.

Signals generated manually in Simulink and real electroencephalogram signals were used to perform the simulations while the hardware synthesis was possible through the system-level design software called DSP Builder, from Altera Corporation1, which contains specific Altera FPGA blocks for the use in Simulink models that can be synthesized in a Altera FPGA board. The electroencephalogram signals were extracted from experiments done by the International Institute of Neuroscience of Natal Edmond and Lily Safra - INNN-ELS2.

Some efforts have been done by researchers to work with ICA and implement in hardware attacking several problems. Some of these works are cited below. In 2001, Nordin, Hsu and Szu proposed a FPGA design of the algorithm FastICA, applied in hyper-spectral (HIS) image processing. These images are generally used in problems of remote recognition of geographical areas. They translated part of the code available in the FastICA Package3 in C language modules and made simulations (Anis Nordin and Szu, 2001).

Du and Qi proposed a new parallel implementation of ICA technique in a FPGA platform (Du and Qi, 2004) to perform dimension reduction in hyper-spectral images (HIS) with the main goal of reducing the computational time necessary to process the great amount of data present in HIS processing.

In 2005, Charoensak and Sattar proposed the generation of a Simulink model that performs the BSS problem using an ICA algorithm based in a modified Torkkola network and the synthesizing of the model on a Vixtex-E FPGA board through the Xilinx software System Generator (Charoensak and Sattar, 2005).

In 2006, Shyu and Li proposed the implementation of the FastICA algorithm embedded on a FPGA board in Hardware Description Language (HDL), including the implementation of an architecture that performs floating-point arithmetic in order to perform accurate calculations (Shyu and Li, 2006). Though the HDL design may be difficult, there is a greater control in hardware use.

In (Kuo-Kai Shyu and Lee, 2008), the work proposed in (Shyu and Li, 2006) was implemented in a Pipeline architecture, with the purpose of reaching higher sample rate and clock frequency, keeping the good results obtained in the previous work.

This work implements the FastICA in hardware using System-level software DSP Builder. Our goal is to develop the hardware design, preprocess the real data of electroencephalogram. The preprocessed data will be processed in a FPGA wich contains the synthesized hardware. This work is part of a higher system for disease

1http://www.altera.com
2http://natalneuroscience.com
2 Background on ICA Algorithm

Independent Component Analysis is a technique mainly applied in Blind Source Separation (BSS) problem and it consists in recovering source signals from its observed mixtures. The relationship between source signals $s$ and observed mixtures $x$ is given, in matrix notation, as follows:

$$x = As$$  \hspace{1cm} (1)

where $A$ is a full rank matrix which is called mixing matrix.

Under some assumptions, ICA performs the BSS problem by finding an inverse linear transformation such that maximizes the statistical independence between the observed mixtures, which would necessarily correspond to the original sources (Aapo Hyvärinen, 2001). In practice, ICA finds an unmixing matrix $W^*$ so that $y = W^*x$ and $\| y - s \| = \text{min}$.

### 2.1 Preprocessing in ICA

It is highly recommended that some preprocessing are made before applying the ICA algorithm in order to simplify the estimation of the sources. The first step in preprocessing is called **Centering** and consists in subtracting from each observed mixture its mean.

The second step is called **Whitening** and consists in linearly transform of the centered observed mixtures, so that we obtain new vectors which are white. The components of a whitened vector are uncorrelated and their variances equals to unity. This means that the covariance matrix of whitened data is equal to identity matrix. One way to perform whitening is using Eigenvalue Decomposition (EVD) method. The whitening matrix is now given by

$$V = ED^{-\frac{1}{2}} E^T$$  \hspace{1cm} (2)

where $E$ is the orthogonal matrix of eigenvector calculated from the covariance matrix $E\{xx^T\}$ and $D$ is the diagonal matrix of the eigenvalues associated with each eigenvector.

### 2.2 The FastICA Algorithm

A good and fast algorithm for ICA estimation is the FastICA ([Aapo Hyvärinen, 2001], (Aapo Hyvärinen, 2000) e (Hyvarinen, 1999)]. It’s a fixed-point iteration algorithm that, in each iteration, finds a vector $w$ that maximizes the statistical independence of $w^Tz$ (Aapo Hyvärinen, 2000). Based on the Central Limit Theorem, the algorithm maximizes the independence by measuring the nongaussianity and iteratively maximizing it until a stop criterion.

A good measure for nongaussianity is called **Negentropy** and is defined using the concept of the differential entropy ($H$) of a random variable $y$ with density $p_y(\eta)$, $H(y)$ (Aapo Hyvärinen, 2001), so that the Negentropy ($J$), of a random variable $y$, is defined as

$$J(y) = H(y_{\text{gauss}}) - H(y)$$  \hspace{1cm} (3)

where $y_{\text{gauss}}$ is a random variable with the same covariance matrix as $y$ but with gaussian distribution, and $H(y)$ is defined as

$$H(y) = \frac{1}{2} \int f(p_y(\eta))d\eta$$  \hspace{1cm} (4)

Due to it’s computational complexity and the need to know the probability density function, negentropy is usually approximated to more simple functions by using high-order cumulants. This gives a good approximation of negentropy based on expectations of any nonquadratic function $G$ and suggests the FastICA iteration as $w \leftarrow E\{yg(w^Tz)\} - E\{g'(w^Tz)\}w$ [see (Aapo Hyvärinen, 2001)], where $g(y) = y^3$. So, the FastICA main iteration is given by:

$$w^* \leftarrow E\{z(w^Tz)^3\} - 3w$$  \hspace{1cm} (5)

Considering that the data was preprocessed, the FastICA algorithm for the calculation of one source is given by

1. Choose an initial weight vector $w$.
2. $w \leftarrow E\{yg(w^Tz)\} - E\{g'(w^Tz)\}w$.
3. $w \leftarrow \frac{w}{\|w\|}$.
4. If not converged, back to step 2.

The algorithm needs to be executed once for each source. In order to prevent that the algorithm estimates the same component more than one time, the following orthogonalization is made:

$$w = w - W^T W w$$ \hspace{1cm} (6)

$$w = w / \| w \|$$ \hspace{1cm} (7)

where $W$ is the matrix containing the $w$ vectors previously calculated.

### 3 Hardware Design

In this section we present the hardware designed to perform the FastICA algorithm. Basically, the hardware was designed based on the
In order to perform an orthogonalization step in order to estimate all of the components. This process may be achieved through the Gram-Schmidt method, in expression 7. The hardware for this step follows the equation 9 and is shown in figure 2.

\[ w_{i,\text{new}} = w_i - (W_{1i}(W_{11}w_1 + W_{12}w_2 + W_{13}w_3) + W_{2i}(W_{21}w_1 + W_{22}w_2 + W_{23}w_3) + W_{3i}(W_{31}w_1 + W_{32}w_2 + W_{33}w_3)) \]  

\[ \eta_i \leq w_{i,\text{new}} + w_i \]

After the orthogonalization step, we take the norm of the new projection \( w_{i,\text{new}} \) and the stop criterion calculation is made. This calculation consists in observe if the difference between the new and previous projection is within a tolerance previously specified (\( \eta \)), ie: \( w_{i,\text{new}} - w_i \leq \eta \). As consequence of one of the ambiguities of the basic ICA model, the sources can be reversed, so, the convergence of the algorithm must also take into account the criterion \( w_{i,\text{new}} + w_i \).

When the stop criterion is reached, the return of the algorithm is calculated and the process is repeated for the remaining independent components.

### 3.1 System-level Design

Altera DSP Builder software offers a set of specific Altera FPGA blocks that can be used in MATLAB Simulink models for functional simulation of the FPGA hardware design. After the simulation, it’s possible to automatically generate a HDL (Hardware Description Language) and synthesize the design on a FPGA platform through the DSP Builder Signal Compiler block.

In figure 3 we show the DSP Builder design for the main iteration of the FastICA algorithm and how to import data from MAT files, which usually are floating-point signals, into DSP Builder blocks, casting these signals to fixed-point format. In this work, the signals are represented using 12 bits for the integer and fractional parts, for a total of 24 bits.

In figure 4 we show the DSP Builder design for the orthogonalization process.

Although we show only the DSP Builder designs for the main iteration and orthogonalization, the whole design consists in more subsystems. These subsystems were developed to control all the process. It contains a set of multiplexer, bus builder, divider, magnitude and other blocks.

### 4 Results

In this section we show the obtained results from the experiments made to validate the hardware designed. The experiments were done by generating MATLAB Simulink models using Altera DSP Builder blocks for the FPGA functional simulation and synthesis, some signals to test the hardware and electroencephalogram biological signals.
Figure 3: System-level design for the FastICA main iteration.

Figure 4: System-level design for the orthogonalization step.
Figure 5: System-level design of the process output using Simulink scope blocks.

Figure 6: Source signals from first experiment.

Figure 7: Observed mixtures from the first experiment.

Figure 5 shows how to use the Simulink scope block to display the signals generated by the FPGA platform during the process.

The first application is a computational experiment proposed in (Haykin, 2001). We use their experiment to test our developed system. The experiment proposes the estimation of the independent components using the following source signals and mixing matrix.

1. \( u_1(k) = 0.1 \sin(400k) \cos(30k) \)
2. \( u_2(k) = 0.01 \text{sign}(\sin((500k) + 9\cos(40k))) \)
3. \( u_3(k) \) = Uniform distributed noise in the interval \([-1, 1] \)

The source signals and mixtures manually generated are shown in figures 6 and 7. Figure 8 shows the independent components obtained through our developed system. The system was able to separate the signals.

In the second experiment, biological electroencephalogram signals given by the International Institute of Neuroscience of Natal Edmond and Lily Safra - IINN-ELS were used. Those signals were extracted in experiments where 22 sensors were put on the subject’s scalp to measure EEG signals, and other 4 sensors measure muscular activity form ocular movement, chin muscle activity and heart beat, in a total of 26 signals. It would be necessary a much more bigger hardware than the one we present here in this work to estimate all of the independent components. To overcome this, as in (Ricardo Vigário and Oja, 2000), we made a dimension reduction through the PCA process. In this experiment, the PCA was performed to decrease the dimension to \( n = 3 \).

In figure 9 we have the estimated independent components from the second experiment. The first and third components represent EEG signals and the second represents muscular activity.

5 Conclusions and Future Works

In this work we propose the design, simulation and synthesis of a hardware that performs the FastICA algorithm, by generating MATLAB Simulink models and using Altera DSP Builder software, which offers a set of blocks that can be synthesized on a FPGA platform, to perform the FPGA functional simulation and synthesis on a Altera Cyclone II board of the design. Through the DSP Builder software we may also use Simulink scope blocks to display the signals generated by the FPGA on the process.

While all the signals from the first experiment were successfully estimated, this is far from being a real time application. In the second experiment, though we used real electroencephalogram signals and three independent components were also successfully estimated, it is also an experiment that can’t be said to be real time.
As future work, we pursue the improving of the design so that it can be used in real time applications, such as Blind Source Separation in biological environments for the purpose of healthy care. Also, this work will integrate a bigger system implemented in the same FPGA, for the purpose of disease detection.

References


