2.2.11. ICA - Independent Component Analysis
Independent component analysis (ICA), or blind source separation, is, like principal component analysis (PCA), a technique that reduces the total number of variables by expressing the original variables as linear combinations in a new set of variables. However, while PCA searches for components or directions that capture the largest possible variance of the data, independent component analysis searches for statistically independent components or sources.
Independent component analysis can be considered an extension of PCA, but it is more powerful and able to identify factors (sources) that principal component analysis does not find.
The ICA technique assumes that a set of sources, no greater in number than the total number of variables, is mixed (mixing), and that the observed data or values result from this mixture. These variables can then be separated again (demixing) and the sources identified. The sources are mutually independent and have non-normal distributions (at most one source may have a normal distribution, which is not a restriction in the case of financial data).
Important:
A source is said to be independent of another when observing it provides no information about the behavior of the other, and vice versa!
To reduce the total number of sources, the same criterion as in PCA is adopted: sources related to very low (or relatively low) eigenvalues must be rejected. Drawing a parallel with sound signals, reducing the number of sources is equivalent to reducing the noise present in the mixed sound signals. However, the importance of an independent component is not directly associated with the importance or ranking of the corresponding eigenvalue. A second criterion, such as the amplitude of variation, can also be used to select the independent components.
Consider the example of two mixed signals. The ICA technique consists of determining the independent sources from these mixtures.
ICA transforms the candidate sources to obtain uncorrelated results and uses rotations to converge, seeking to minimize the mutual information of these sources or to maximize their non-normality (measured, for example, by kurtosis or by negentropy, which is more robust). The Central Limit Theorem is one of the guiding principles that justify this way of determining the sources: a mixture of independent sources tends to be closer to normal than the sources themselves.
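The Central Limit Theorem argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the two uniform sources, the mixing weights and the helper name `excess_kurtosis` are hypothetical and are not part of the ICA command. It compares the excess kurtosis (zero for a normal distribution) of one source with that of a mixture.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis; it is 0 for a normal distribution."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)

rng = np.random.default_rng(0)            # fixed seed, illustration only
s1 = rng.uniform(-1.0, 1.0, 100_000)      # hypothetical non-normal source
s2 = rng.uniform(-1.0, 1.0, 100_000)      # second, independent source
mixture = 0.6 * s1 + 0.8 * s2             # arbitrary mixing weights (assumption)

k_source = excess_kurtosis(s1)
k_mixture = excess_kurtosis(mixture)
print("source :", round(k_source, 3))
print("mixture:", round(k_mixture, 3))
```

The mixture should show an excess kurtosis closer to zero than the source, which is why maximizing non-normality pushes the estimates back toward the individual sources.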
Before the independent components are determined, the analyzed data must be preprocessed (centering, removal of correlations, etc.). These treatments do not affect the results and are used only to make the components easier to determine. The treatment that removes correlations between the data is known as whitening, and its inverse is dewhitening.
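Centering, whitening and dewhitening can be sketched with NumPy as follows. This is a minimal sketch, not the routine used by the ICA command: the function name `whiten`, the eigendecomposition-based whitening and the test data are assumptions made for illustration.

```python
import numpy as np

def whiten(x):
    """Center the columns of x (n x m) and remove their correlations.

    Returns the whitened data, the column means, and the whitening
    and dewhitening matrices.
    """
    mean = x.mean(axis=0)
    centered = x - mean
    cov = np.cov(centered, rowvar=False)           # sample covariance (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitening = (eigvecs / np.sqrt(eigvals)).T     # D^(-1/2) E^T
    dewhitening = eigvecs * np.sqrt(eigvals)       # E D^(1/2)
    return centered @ whitening.T, mean, whitening, dewhitening

rng = np.random.default_rng(0)
blend = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.2, 0.0, 1.0]])
x = rng.normal(size=(500, 3)) @ blend + np.array([1.0, -2.0, 0.5])  # hypothetical data

z, mean, whitening, dewhitening = whiten(x)
restored = z @ dewhitening.T + mean                # dewhitening undoes the treatment
```

After whitening, the sample covariance of `z` is the identity matrix, and applying the dewhitening matrix and adding the means back recovers the original data.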
ICA presents some ambiguities. The mixing and demixing matrices allow the sources to be identified, but the intensity (scale) of these sources is not unique: the original sources may appear multiplied by arbitrary constants. Another ambiguity is that the order of the components cannot be determined: changing the order of the sources and of the corresponding vectors in the matrices yields the same observed result.
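Both ambiguities can be illustrated with a small numeric sketch. All values below (sources, mixing matrix, scale factors) are hypothetical; the shapes follow the convention of this manual, with sources as n x mi and mixing as m x mi.

```python
import numpy as np

rng = np.random.default_rng(1)
sources = rng.laplace(size=(1000, 2))             # n x mi, hypothetical sources
mixing = np.array([[1.0, 0.5], [0.3, 2.0]])       # m x mi, hypothetical mixing
signals = sources @ mixing.T                      # n x m observed signals

# Rescale the sources (including a sign flip) and swap their order.
scale = np.diag([2.0, -0.5])
swap = np.array([[0.0, 1.0], [1.0, 0.0]])
transform = swap @ scale

alt_sources = sources @ transform.T               # different sources ...
alt_mixing = mixing @ np.linalg.inv(transform)    # ... and a different mixing matrix
alt_signals = alt_sources @ alt_mixing.T          # ... give the same observed signals
```

The pair `alt_sources`, `alt_mixing` reproduces exactly the same signals as the original pair, so the data alone cannot determine the scale, sign or order of the sources.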
Independent component analysis is widely used in audio processing and in the separation of sound sources (the cocktail-party problem).
The financial applicability of ICA includes revealing mechanisms not shown by other analyses of time series. In some cases, using it together with PCA can be complementary, making it a powerful tool for analyzing economic and financial events.
Another example of a financial application, in research, is the identification of common factors in an economic sector that are responsible for changes in the cash flow of the companies in that sector.
Important:
Independent component analysis can be carried out only if at most one source has a normal distribution!
2.2.11.1. ICA command
Access:
- Menu - Metrixus | ICA
- Toolbar Metrixus
Description:
Runs independent component analysis (ICA) on the selected data range. Returns the independent components, the mixing matrix and the demixing matrix. Also returns indicators that help analyze the results, such as the eigenvalues of the covariance matrix of the centered data.
The number of independent components presented can be limited with the option Number of components limited to:. The purpose of limiting the number of independent components is to reduce the total number of sources or, as with sound signals, to reduce noise. The ex-post analysis of the eigenvalues of the covariance matrix of the centered data indicates the total number of sources that should be extracted, which is equivalent to disregarding sources associated with relatively small eigenvalues. The importance of each independent component is not directly related to the ranking of the associated eigenvalue.
The Show "mixing" and "demixing" matrix option indicates whether these matrices are presented in the results.
The Show independent components option indicates whether the source values (obtained from the data sample) are shown.
The iterative process that determines the sources uses a fixed-point scheme based on the FastICA algorithm of Aapo Hyvärinen, with a convergence precision of 0.0001 (or a maximum of 10,000 iterations). To maximize non-normality, it uses a negentropy measure based on Gaussian moments as the contrast function.
The selected data range must be a contiguous area in which each column holds the values of one variable. At least two columns, with an equal number of rows, are required to run ICA. Fields that are empty or contain text are disregarded, as are the corresponding fields in all other columns. The data range must be selected before calling this command.
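The fixed-point iteration mentioned in the description can be sketched in NumPy. This is a minimal sketch and not the code of the ICA command: the function name `fastica_sketch`, the symmetric decorrelation step, the identity starting matrix and the specific nonlinearity g(u) = u exp(-u^2 / 2) (one common Gaussian choice in FastICA) are assumptions. Only the tolerance of 0.0001 and the limit of 10,000 iterations come from the description above.

```python
import numpy as np

def fastica_sketch(x, n_components, tol=1e-4, max_iter=10_000):
    """Return (sources n x mi, mixing m x mi, demixing mi x m) for data x (n x m)."""
    centered = x - x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]        # largest eigenvalues first
    whitening = (eigvecs[:, order] / np.sqrt(eigvals[order])).T   # mi x m
    z = centered @ whitening.T                               # whitened data, n x mi

    def decorrelate(w):
        # Symmetric orthogonalization: (W W^T)^(-1/2) W
        vals, vecs = np.linalg.eigh(w @ w.T)
        return (vecs / np.sqrt(vals)) @ vecs.T @ w

    w = np.eye(n_components)                                 # deterministic start (assumption)
    for _ in range(max_iter):
        u = z @ w.T                                          # current source estimates
        e = np.exp(-u**2 / 2)
        g, g_prime = u * e, (1 - u**2) * e                   # assumed Gaussian nonlinearity
        update = g.T @ z / len(z) - g_prime.mean(axis=0)[:, None] * w
        w_new = decorrelate(update)
        done = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1)) < tol
        w = w_new
        if done:
            break
    demixing = w @ whitening
    mixing = np.linalg.pinv(demixing)
    return centered @ demixing.T, mixing, demixing

# Hypothetical two-source example: one uniform and one Laplace source.
rng = np.random.default_rng(0)
true_sources = np.column_stack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
observed = true_sources @ np.array([[1.0, 0.5], [0.3, 2.0]]).T
estimated, mixing, demixing = fastica_sketch(observed, 2)
match = np.abs(np.corrcoef(estimated.T, true_sources.T)[:2, 2:])
```

Each column of `match` should contain one entry close to 1, meaning that every true source is recovered by some estimated component up to sign, scale and order.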
Important:
If a mathematical operator (such as log) is applied to the original data, ICA will represent linear combinations of the transformed variables. Remember to reverse the operation after using the independent components!
This command generates a new file containing the results in table form.
Generating spreadsheets without colors makes the data easier to print and also improves execution performance.
The result of the independent component analysis is a new spreadsheet with static data, that is, without links to the original data. This new spreadsheet has the following information, where n is the number of valid data points, m is the number of variables or signals, and mi is the number of extracted sources or independent components:
- Means: average of each variable or signal or column.

- St. Dev.: standard deviation of each variable or signal or column of the sample.

Important:
No mathematical operator (a logarithm, for example) is applied when determining the statistical parameters of the data, such as the mean and standard deviation. The presented average must therefore be understood as the average of the values, and the standard deviation as the volatility of the values!
Important:
All data are treated as samples, so all standard deviations are sample-based and not population-based.
- Covariances: m x m matrix containing the covariances between data or signals or columns.

Important:
The covariance calculation can differ from the Microsoft Excel covariance, because the calculation here is for a sample and not for a population! This matrix is the covariance matrix of the original (non-centered) data!
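The difference between the sample and population calculations can be seen in a short NumPy sketch. The data values are hypothetical; the `ddof` argument of `numpy.cov` selects the divisor (n - 1 for a sample, n for a population).

```python
import numpy as np

# Hypothetical data: 4 observations (rows) of 2 variables (columns).
data = np.array([[1.0, 2.0],
                 [2.0, 1.0],
                 [4.0, 5.0],
                 [5.0, 8.0]])
n = data.shape[0]

sample_cov = np.cov(data, rowvar=False, ddof=1)       # divides by n - 1
population_cov = np.cov(data, rowvar=False, ddof=0)   # divides by n

print(sample_cov)
print(population_cov)
```

The two matrices differ only by the factor n / (n - 1), so the gap shrinks as the number of observations grows.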
- Eigenvalues: characteristic values of the covariance matrix of the centered data. Their purpose is to allow an ex-post analysis of the total number of sources that should be extracted from the data or signals. The number of eigenvalues presented equals the number of variables or columns (m). To determine the eigenvalues, the matrix is reduced to Hessenberg form by Householder transformations and the QR algorithm is then iterated. The precision of the QR iterations is fixed at 0.00001.

Important:
The eigenvalues are generated from the covariance matrix of the centered data, which is different from the reported covariance matrix!
- Mixing: m x mi matrix that mixes the sources to generate each signal or column or variable. Shown only if the option is set.
 Fontes = Sources Sinais = Signals
- Demixing: mi x m matrix that converts signals or variables or columns into independent sources. Shown only if the option is set.
- Sources: n x mi matrix with the values of the independent components or sources for each point of the sample. All components up to the set limit are shown (maximum of 99 independent components).
Important:
The T superscript denotes the transposed matrix!
Important:
In general, the independent components, mixing matrix and demixing matrix can differ between runs or applications, because both sides of the matrix equation can be multiplied by different numbers (an ambiguity that does not influence the results). The ICA command, however, uses a deterministic algorithm to generate the initial points of the iterations. The same result is therefore always obtained in different runs on the same data, which makes calculations easier to repeat and audit.
Example of ICA for stock returns:
The following example shows ICA applied to stock analysis. The goal is to identify sources or independent components from historical data series and to reproduce the historical data from these sources, identifying the factors that drive price mechanisms.
The data range represents the prices of some stocks traded on BOVESPA, the main Brazilian stock exchange, from October 1997 to July 1998. These data are close to historical prices, but should be considered hypothetical here!
- Number of independent components 5
- Show mixing and demixing
- Show sources
- Stock prices of 10 traded companies. Range L4:U205
Because the distribution of stock returns is considered log-normal here, and because of the stationarity condition on the data (limited values), log-returns are analyzed instead of prices.
The input data are not reproduced here, but they can be obtained by applying the log operator to the stock returns:
log-return(t) = ln( price(t) / price(t-1) )
where ln is the natural logarithm.
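The log-return transformation, and its reversal after the analysis, can be sketched as follows. The price values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

prices = np.array([10.0, 10.5, 10.2, 10.8])        # hypothetical closing prices

# One log-return per pair of consecutive prices: n prices give n - 1 returns.
log_returns = np.log(prices[1:] / prices[:-1])

# Reversing the operation: accumulate the returns and apply the exponential.
rebuilt = prices[0] * np.exp(np.cumsum(log_returns))

print(log_returns)
print(rebuilt)
```

The series of log-returns is one element shorter than the price series, and accumulating the returns from the first price recovers the remaining prices.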
Results:
Total number of signals or variables. The sample includes 202 days and not 203 because log-returns are used!
| | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
| Means | -0.0017 | -0.0004 | 0.0008 | -0.0027 | -0.0053 | -0.0028 | -0.0059 | -0.0015 | -0.0012 | -0.0020 |
| St. Dev. | 0.0303 | 0.0370 | 0.0502 | 0.0381 | 0.0440 | 0.0369 | 0.0447 | 0.0305 | 0.0474 | 0.0378 |

| Covariances | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
| S1 | 0.0009 | 0.0004 | 0.0006 | 0.0005 | 0.0006 | 0.0005 | 0.0003 | 0.0004 | 0.0006 | 0.0005 |
| S2 | 0.0004 | 0.0014 | 0.0008 | 0.0007 | 0.0008 | 0.0006 | 0.0005 | 0.0004 | 0.0008 | 0.0007 |
| S3 | 0.0006 | 0.0008 | 0.0025 | 0.0009 | 0.0013 | 0.0010 | 0.0010 | 0.0007 | 0.0013 | 0.0010 |
| S4 | 0.0005 | 0.0007 | 0.0009 | 0.0014 | 0.0009 | 0.0008 | 0.0007 | 0.0005 | 0.0010 | 0.0008 |
| S5 | 0.0006 | 0.0008 | 0.0013 | 0.0009 | 0.0019 | 0.0010 | 0.0010 | 0.0007 | 0.0013 | 0.0011 |
| S6 | 0.0005 | 0.0006 | 0.0010 | 0.0008 | 0.0010 | 0.0014 | 0.0007 | 0.0005 | 0.0011 | 0.0009 |
| S7 | 0.0003 | 0.0005 | 0.0010 | 0.0007 | 0.0010 | 0.0007 | 0.0020 | 0.0003 | 0.0009 | 0.0007 |
| S8 | 0.0004 | 0.0004 | 0.0007 | 0.0005 | 0.0007 | 0.0005 | 0.0003 | 0.0009 | 0.0006 | 0.0006 |
| S9 | 0.0006 | 0.0008 | 0.0013 | 0.0010 | 0.0013 | 0.0011 | 0.0009 | 0.0006 | 0.0022 | 0.0011 |
| S10 | 0.0005 | 0.0007 | 0.0010 | 0.0008 | 0.0011 | 0.0009 | 0.0007 | 0.0006 | 0.0011 | 0.0014 |
Statistical indicators for the log-returns of the stocks. Once again, the covariance table may differ from the Microsoft Excel covariance, because it is calculated here for a sample!
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Eigenvalues | 0.0089 | 0.0015 | 0.0012 | 0.0009 | 0.0008 | 0.0008 | 0.0006 | 0.0006 | 0.0005 | 0.0004 |
The eigenvalue table helps identify sources that are closer to noise than to an independent component. Once again, a second criterion, such as the amplitude of each component, can be used to select sources.
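One possible way to read the eigenvalue table is to look at the share of each eigenvalue in the total. The sketch below uses the rounded eigenvalues of this example; the share and cumulative share are illustrative aids chosen here, not indicators produced by the ICA command.

```python
import numpy as np

# Eigenvalues from the example table (rounded to 4 decimals).
eigenvalues = np.array([0.0089, 0.0015, 0.0012, 0.0009, 0.0008,
                        0.0008, 0.0006, 0.0006, 0.0005, 0.0004])

share = eigenvalues / eigenvalues.sum()      # fraction of the total per eigenvalue
cumulative = np.cumsum(share)                # running total of the fractions

for k, (s, c) in enumerate(zip(share, cumulative), start=1):
    print(f"{k:2d}  share={s:.3f}  cumulative={c:.3f}")
```

With these rounded values, the first eigenvalue accounts for roughly 55% of the total and the first five for roughly 82%. Since the importance of an independent component is not directly tied to the eigenvalue ranking, such shares are only a rough guide when choosing the number of components.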
| Mixing | ICA1 | ICA2 | ICA3 | ICA4 | ICA5 |
| S1 | -0.002 | -0.008 | 0.005 | 0.000 | 0.003 |
| S2 | 0.000 | -0.007 | -0.001 | -0.003 | 0.018 |
| S3 | -0.001 | -0.016 | -0.037 | 0.001 | 0.018 |
| S4 | -0.014 | -0.011 | 0.003 | 0.000 | 0.023 |
| S5 | 0.006 | -0.019 | -0.001 | 0.008 | 0.022 |
| S6 | -0.00 | -0.010 | -0.001 | 0.002 | 0.015 |
| S7 | -0.002 | -0.011 | -0.007 | 0.038 | 0.013 |
| S8 | 0.002 | -0.004 | -0.001 | -0.001 | 0.025 |
| S9 | -0.001 | -0.042 | -0.003 | 0.002 | 0.015 |
| S10 | 0.002 | -0.011 | -0.002 | 0.005 | 0.017 |
|
| Demixing | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
| ICA1 | -2.147 | 1.983 | -0.955 | -16.967 | 8.534 | 1.031 | -1.594 | 2.707 | 0.398 | 3.414 |
| ICA2 | -0.633 | 5.000 | 0.916 | 1.166 | -2.668 | 8.382 | 1.630 | 4.826 | -27.423 | 3.033 |
| ICA3 | 8.413 | -0.182 | -25.960 | 6.184 | 7.158 | 5.536 | -1.132 | 1.280 | 1.830 | 0.783 |
| ICA4 | 2.752 | -3.622 | -4.846 | -4.571 | -1.715 | -3.684 | 26.081 | -0.352 | -3.479 | 3.691 |
| ICA5 | -13.955 | 5.846 | -1.248 | 12.997 | 2.291 | -3.522 | 1.396 | 26.514 | -4.187 | 0.561 |
Mixing matrix and demixing matrix. Use these matrices to obtain the sources or independent components for these stocks.
Important:
Determining independent components is an iterative process. If the upper limit of iterations is reached without convergence, all values in the mixing and demixing matrices are set to 0. After running ICA, if many zeros appear, multiply the mixing and demixing matrices and test whether the product has the form of an identity matrix, to check whether convergence has failed!
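The identity-matrix test can be sketched as follows. The function name `has_converged`, the tolerance and the small matrices are hypothetical; in practice the values would be read from the result spreadsheet, where rounding of the displayed values makes the product only approximately an identity matrix.

```python
import numpy as np

def has_converged(mixing, demixing, atol=1e-6):
    """True when demixing (mi x m) times mixing (m x mi) is close to the identity."""
    product = demixing @ mixing
    return bool(np.allclose(product, np.eye(product.shape[0]), atol=atol))

# Hypothetical converged result with m = 3 signals and mi = 2 components.
mixing = np.array([[1.0, 0.5],
                   [0.3, 2.0],
                   [0.7, -1.0]])
demixing = np.linalg.pinv(mixing)          # stands in for the reported demixing matrix

ok = has_converged(mixing, demixing)
failed = has_converged(np.zeros((3, 2)), np.zeros((2, 3)))   # all-zero matrices
print(ok, failed)
```

All-zero matrices give a zero product rather than an identity matrix, which is how a failed run shows up in this test. When checking values copied from a spreadsheet, a looser tolerance is needed because of rounding.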
The following charts show the stock prices of Petrobrás ON and Banco do Brasil ON rebuilt from just 5 independent components.
PETR3 (ON) chart with just 5 independent components.
BBAS3 (ON) chart with just 5 independent components.
Analyzing the charts above, we can see that it is possible to decompose stock returns into just a few statistically independent sources. These sources contribute to price formation, so studying them may lead to a better understanding of some of the mechanisms that affect stock markets.
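Rebuilding prices from a limited number of components can be sketched as below. This is a sketch under assumptions: the function name `rebuild_prices`, all numeric values, and the step of adding the column means back (on the assumption that the sources were obtained from centered log-returns) are illustrative and are not taken from the example spreadsheet.

```python
import numpy as np

def rebuild_prices(sources, mixing, means, first_price):
    """Rebuild price paths from independent components.

    sources: n x mi, mixing: m x mi, means: length m (mean log-returns),
    first_price: length m (price just before the first return).
    """
    log_returns = sources @ mixing.T + means            # n x m rebuilt log-returns
    return first_price * np.exp(np.cumsum(log_returns, axis=0))

rng = np.random.default_rng(3)
sources = rng.laplace(size=(50, 2))                     # hypothetical, mi = 2
mixing = np.array([[0.010, 0.004],
                   [0.002, 0.012],
                   [0.006, 0.006]])                     # hypothetical, m = 3
means = np.array([0.001, -0.002, 0.0])
first_price = np.array([10.0, 20.0, 5.0])

prices = rebuild_prices(sources, mixing, means, first_price)
```

Because the analysis is run on log-returns, the exponential and the cumulative sum reverse the log operator and turn the rebuilt returns back into price paths.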
2.2.11.2. ICA Excel worksheet example
ICA_EXAMPLE.xls