Entropy-Based Independence Test

Abstract. This paper presents a new test of independence (linear and nonlinear) between distributions, based on Shannon entropy. The main advantages of the proposed approach are that the measure does not need to assume any theoretical probability distribution and that it captures both linear and nonlinear dependencies, without requiring the specification of any model of dependence.


Introduction
The notions of "independence", "distance" and "divergence" between distributions have been central in statistical inference and econometrics from the earliest stages. This is evident in the work of Kullback, Zellner and many others (see Maasoumi [8]). Some authors, such as Cover et al. [1] and Maasoumi [8], moved by the "elegance" and potential power of information theory, brought a new way of interpreting and motivating research in statistical inference. In addition, the axiomatic systems in information theory suggest principles of decomposition that distinguish between different information functions and "entropies", and identify desirable measures, decision criteria and indices (Maasoumi [8]).
Several measures have been used as independence tests and/or dependence measures in this field. The best-known measure of dependence between random variables is the Pearson correlation coefficient. However, this is nothing but a normalized covariance and only accounts for linear (or linearly transformed) relationships (see e.g. Granger et al. [4], Maasoumi et al. [9]). In general, this statistic may fail to capture serial dependence when there are nonlinearities in the data. In this context, a measure of global dependence is required, that is, a measure that captures both linear and nonlinear dependencies without requiring the specification of any model of dependence. Urbach [11] argues for a strong relationship between entropy, dependence and predictability. This relation has been studied by several authors, namely Granger and Lin [4], Maasoumi and Racine [9], and Darbellay and Wuertz [3]. On the basis of these arguments, in this paper we evaluate the efficiency of a new entropy-based independence test that requires neither mean-variance models nor theoretical probability distributions. In the next section we discuss information and predictability in the context of entropy, and we then illustrate our test with evidence based on empirical financial data.

Information and Predictability
A measure that takes the value 0 when there is total independence and 1 when there is total dependence is one of the most practical ways to evaluate (in)dependence between two vectors of random variables. Let p_{X,Y}(x,y) be the joint probability distribution of (X,Y), and p_X(x) and p_Y(y) the marginal distributions of X and Y. Mutual information can be written as

I(X,Y) = Σ_x Σ_y p_{X,Y}(x,y) log[ p_{X,Y}(x,y) / (p_X(x) p_Y(y)) ].   (1)

If the two events are independent, then p_{X,Y}(x,y) = p_X(x) p_Y(y), and so equation (1) will be equal to zero.
Granger, Maasoumi and Racine [5] consider that a good measure of dependence should satisfy six "ideal" properties. Let p_X, p_Y and p_{X,Y} be the probability density functions (pdf) of X and Y and the joint probability distribution of (X,Y), respectively. Denote by H(X) the entropy of X, by H(X,Y) the joint entropy of the two arguments (X,Y), and by H(Y|X) the conditional entropy of Y given X. Then, mutual information can be defined by the following expression:

I(X,Y) = H(X) + H(Y) − H(X,Y) = H(Y) − H(Y|X) ≥ 0,   (2)

with equality iff X and Y are statistically independent. Thus, the mutual information between the vectors of random variables X and Y can be considered as a measure of dependence between these variables or, better, of the statistical correlation between X and Y.
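The entropy identity I(X,Y) = H(X) + H(Y) − H(X,Y), and the fact that mutual information vanishes under independence, can be checked numerically. A minimal sketch for discrete distributions (the function names are ours, for illustration only):

```python
import math

def entropy(p):
    """Shannon entropy (natural logarithm) of a probability list."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) for a joint pmf given as a 2-D list."""
    px = [sum(row) for row in joint]                 # marginal of X
    py = [sum(col) for col in zip(*joint)]           # marginal of Y
    flat = [p for row in joint for p in row]         # joint pmf flattened
    return entropy(px) + entropy(py) - entropy(flat)

# Independent case: p(x,y) = p(x)p(y), so I(X,Y) = 0
print(round(mutual_information([[0.25, 0.25], [0.25, 0.25]]), 6))  # ~0
# Total dependence (X = Y): I(X,Y) = H(X) = log 2
print(round(mutual_information([[0.5, 0.0], [0.0, 0.5]]), 6))      # 0.693147
```

The two extreme cases bracket the range of the statistic: zero under independence and the full marginal entropy under perfect dependence.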
The statistic defined in equation (2) satisfies some of the desirable properties of a good measure of dependence (see Granger et al. [5]). However, in equation (2) we have 0 ≤ I(X,Y) < +∞, which renders comparisons between different samples difficult. In this context, Granger and Lin [4] and Darbellay [2], among others, use a standardised measure of mutual information, the global correlation coefficient, defined by

λ = [1 − exp(−2 I(X,Y))]^{1/2}.   (3)

According to the properties displayed by mutual information, and because independence is one of the most valuable concepts in econometrics, we can construct a test of independence based on the following hypotheses:

H0: p_{X,Y}(x,y) = p_X(x) p_Y(y) for all (x,y) (independence);
H1: p_{X,Y}(x,y) ≠ p_X(x) p_Y(y).
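The standardisation λ = [1 − exp(−2 I(X,Y))]^{1/2} maps the unbounded statistic onto [0, 1); for jointly normal variables it reduces to the absolute value of the linear correlation coefficient, which motivates the name. A one-line sketch:

```python
import math

def global_correlation(mi):
    """lambda = sqrt(1 - exp(-2 I)): maps I in [0, inf) onto [0, 1)."""
    return math.sqrt(1.0 - math.exp(-2.0 * mi))

print(global_correlation(0.0))                     # 0.0 (independence)
print(round(global_correlation(math.log(2)), 4))   # 0.866 for I = log 2
```

Because the mapping is monotonic, tests based on I(X,Y) and on λ reject in exactly the same cases; λ simply makes the strength of dependence comparable across samples.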
If λ = 0, then H0: p_{X,Y}(x,y) = p_X(x) p_Y(y) holds and we conclude that there is independence between the variables; if λ = 1, we reject the null hypothesis of independence. The above hypotheses can thus be reformulated as H0: λ = 0 against H1: λ ≠ 0. In order to test adequately for independence between variables (or vectors of variables) we need to calculate the corresponding critical values. In our case, we have simulated critical values for the null distribution, using the percentile approach. One of the problems with calculating mutual information from empirical data lies in the fact that the underlying pdf is unknown. There are, essentially, three different methods for estimating mutual information: histogram-based estimators, kernel-based estimators, and parametric methods. According to Kraskov, Stogbauer and Grassberger [6] and Moddemeijer [10], the most straightforward and widespread approach to estimating mutual information consists of partitioning the supports of X and Y into bins of finite size, i.e. using histogram-based estimators. Histogram-based estimators are divided into two groups: equidistant cells (see e.g. Moddemeijer [10]) and equiprobable cells, i.e. marginal equiquantisation (see e.g. Darbellay [2]). The second approach, marginal equiquantisation, has some advantages, since it allows for better adherence to the data and maximizes mutual information (Darbellay [2]).
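The two ingredients just described can be sketched together: a histogram estimator with equiprobable cells (here a fixed number of marginal quantile bins, a simplification of Darbellay's adaptive partitioning) and critical values simulated under the null by the percentile approach (here via random permutations of one series, which destroy any dependence). All names and parameters below are illustrative, not from the paper:

```python
import math, random

def equiquantise(data, n_bins):
    """Marginal equiquantisation: assign each observation to an
    equiprobable quantile bin of its own marginal distribution."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    bins = [0] * len(data)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(data)
    return bins

def mi_histogram(x, y, n_bins=4):
    """Histogram estimate of I(X,Y) from paired samples (natural log)."""
    bx, by = equiquantise(x, n_bins), equiquantise(y, n_bins)
    n = len(x)
    joint = {}
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = [bx.count(k) / n for k in range(n_bins)]
    py = [by.count(k) / n for k in range(n_bins)]
    return sum(c / n * math.log((c / n) / (px[a] * py[b]))
               for (a, b), c in joint.items())

# Percentile approach: simulate the null distribution of the statistic
# by permuting y, which destroys any dependence on x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [xi + 0.5 * random.gauss(0, 1) for xi in x]   # strongly dependent on x
null = []
for _ in range(200):
    yp = y[:]
    random.shuffle(yp)
    null.append(mi_histogram(x, yp))
crit = sorted(null)[int(0.95 * len(null))]        # simulated 95% critical value
print(mi_histogram(x, y) > crit)                  # True: reject independence
```

The estimated statistic is compared with the simulated critical value exactly as any classical test statistic is compared with a tabulated one.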
More generally, the definition of mutual information is expressed in abstract terms, based on partitions of the space:

I(X,Y) = sup Σ_i Σ_j P_{X,Y}(A_i × B_j) log[ P_{X,Y}(A_i × B_j) / (P_X(A_i) P_Y(B_j)) ],

where the supremum is taken over all finite partitions {A_i} and {B_j} of the supports of X and Y. The mutual information and the global correlation coefficient (λ) satisfy almost all of the properties of a good measure of dependence presented here, but they are not measures of "distance", since they do not satisfy the triangle inequality. Kraskov, Stogbauer, Andrzejak, and Grassberger [7] present a modified mutual-information-based measure which is a metric in the strict sense. According to these authors, this modification presents some difficulties when dealing with continuous random variables. One solution to this problem consists of dividing mutual information by the sum or by the maximum of the dimensions of the continuous variables under study (Kraskov et al. [7]).
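One normalisation of this kind, used by Kraskov et al. [7] in the discrete case, divides mutual information by the joint entropy, giving D(X,Y) = 1 − I(X,Y)/H(X,Y): it equals 0 when the variables are identical and 1 when they are independent. A sketch for discrete joint distributions (the function names are ours, and we take this particular normalisation as an illustrative assumption):

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a probability list."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def mi_distance(joint):
    """D(X,Y) = 1 - I(X,Y)/H(X,Y): a normalised, metric version of
    mutual information for a discrete joint pmf (2-D list)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    hxy = entropy([p for row in joint for p in row])
    mi = entropy(px) + entropy(py) - hxy
    return 1.0 - mi / hxy

print(round(mi_distance([[0.5, 0.0], [0.0, 0.5]]), 6))      # 0.0: X = Y
print(round(mi_distance([[0.25, 0.25], [0.25, 0.25]]), 6))  # 1.0: independent
```

Note that D reverses the orientation of λ: small values now indicate strong dependence, which is what makes it usable as a distance.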

Empirical Evidence
We now apply the concepts of mutual information and the global correlation coefficient as measures of dependence in financial time series, in order to evaluate the overall performance of these measures and to highlight the advantages of this approach over the traditional linear correlation coefficient. Mutual information was estimated through marginal equiquantisation and was applied to a number of stock market indices.
From the DataStream database we selected the daily closing prices of several stock market indices. We should note the presence of statistically significant values of mutual information at some lags in all the indices, denoting the presence of nonlinear dependence at those lags.
The results obtained for mutual information allow us to identify the lags for which a more detailed analysis is necessary, in an attempt to identify the type of nonlinearity. It can be inferred that the sources of the captured nonlinearity are not only nonlinearity in the mean and heteroscedasticity. Mutual information does not provide guidance about the type of nonlinearity, but it indicates which lags are the "most problematic" and, through the calculation of the global correlation coefficient (λ), the level of existing nonlinear dependence.
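Our results above refer to the DataStream index data. Purely as an illustration of the procedure, the following sketch computes the lagged mutual information I(x_t, x_{t−k}) on a synthetic ARCH-type series, which has no linear autocorrelation but exactly the kind of nonlinear (variance) dependence the measure is meant to detect; the series and all parameters are simulated, not taken from our data set:

```python
import math, random

def mi_lag(series, lag, n_bins=4):
    """Histogram estimate of I(x_t, x_{t-lag}) with equiprobable marginal bins."""
    x, y = series[lag:], series[:-lag]
    n = len(x)
    def bins(d):
        order = sorted(range(n), key=lambda i: d[i])
        b = [0] * n
        for rank, i in enumerate(order):
            b[i] = rank * n_bins // n
        return b
    bx, by = bins(x), bins(y)
    count = {}
    for a, b in zip(bx, by):
        count[(a, b)] = count.get((a, b), 0) + 1
    return sum(c / n * math.log((c / n) / ((bx.count(a) / n) * (by.count(b) / n)))
               for (a, b), c in count.items())

# Synthetic ARCH(1)-style returns: serially uncorrelated, but the
# conditional variance depends on the previous observation.
random.seed(1)
r, prev = [], 0.0
for _ in range(2000):
    prev = math.sqrt(0.2 + 0.5 * prev * prev) * random.gauss(0, 1)
    r.append(prev)
for k in range(1, 6):
    print(k, round(mi_lag(r, k), 4))   # estimated MI per lag, in nats
```

The statistically significant lags would then be those whose estimated mutual information exceeds the simulated critical value for the corresponding sample size.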
To conclude, the main advantage of applying mutual information to financial time series is that this measure captures global serial dependence (linear and nonlinear) without requiring a theoretical probability distribution or a specific model of dependence. Even if this dependence is not able to refute the efficient market hypothesis, it is important for the investor to know that the rate of return is not independent and identically distributed. According to Granger, Maasoumi and Racine [5], the critical values can be used as the basis of a test for serial independence in time series.

Appendix A
Tables of critical values for testing serial independence through mutual information.