An independence test checks whether there is a statistical link between two variables X and Y. The two variables are said to be independent when no such link exists; in other words, knowing X provides no information about Y.
The independence of two variables can be checked with a χ2 (chi-squared) test of independence, also called Pearson's χ2 test.
Carrying out a χ2 independence test
Formulation of a hypothesis
A null hypothesis (H0) is formulated: the variables X and Y are independent of each other.
Calculation of a distance
The null hypothesis implies that the variables X and Y are not related to each other. Under this condition, the expectation of a class can be defined as follows:
$$E_{ij} = \frac{\left(\sum_{k=1}^{J} O_{ik}\right)\left(\sum_{k=1}^{I} O_{kj}\right)}{N}$$
A class is defined by a pair of values $(i, j)$ of the variables X and Y.
$E_{ij}$ is the expectation of the class, $O_{ij}$ is its observed value (the number of samples that fall in the class), I is the number of values of the variable X, J is the number of values of the variable Y, and N is the number of samples.
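A χ2 distance is then measured between the expected value and the observed value of each class, and summed over all classes:
$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$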
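The calculation can be sketched in a few lines of plain Python. This is only an illustration: the 2×2 table of counts and the function names `expected_counts` and `chi2_distance` are made up for the example. Each expected count is the product of the row total and the column total divided by N, and the distance sums (O - E)² / E over every class.

```python
# Hypothetical contingency table of observed counts O[i][j]:
# rows are the I values of X, columns are the J values of Y.
observed = [
    [20, 30],
    [30, 20],
]

def expected_counts(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_distance(observed, expected):
    """Pearson chi-squared distance: sum of (O - E)^2 / E over all classes."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

expected = expected_counts(observed)
chi2 = chi2_distance(observed, expected)
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
print(chi2)      # 4.0
```

No continuity correction is applied here. Library routines may differ on this point: for instance, `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, so it would return a smaller distance for the same table.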
Analysis of results
The χ2 distance is compared with a reference table according to the number of degrees of freedom, which is (I - 1)(J - 1) for this test. The independence hypothesis is generally rejected when the p-value associated with the χ2 distance is less than 0.05.
If the p-value is below this threshold, the independence hypothesis is rejected; otherwise, it is not rejected.
If the independence hypothesis is not rejected, the data do not show a link between the two variables.
If the independence hypothesis is rejected, the two variables are statistically linked: knowing the value of one provides information about the other.
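As an illustration of the decision step, the sketch below computes the p-value without a reference table, using only the Python standard library. The function names `chi2_p_value` and `independence_decision`, the input distance of 4.0, and the 2×2 table size are assumptions made for the example. The p-value is the upper-tail probability of a χ2 law with (I - 1)(J - 1) degrees of freedom; it is obtained from the closed forms for 1 and 2 degrees of freedom and a recurrence that adds one term for every 2 extra degrees of freedom.

```python
import math

def chi2_p_value(chi2, dof):
    """Upper-tail probability P(X >= chi2) of a chi-squared law with dof degrees of freedom."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if chi2 <= 0:
        return 1.0
    half = chi2 / 2.0
    # Closed-form starting points: dof = 1 (odd case) or dof = 2 (even case).
    if dof % 2 == 1:
        k, p = 1, math.erfc(math.sqrt(half))
    else:
        k, p = 2, math.exp(-half)
    # Going from k to k + 2 degrees of freedom adds one term to the tail probability.
    while k < dof:
        p += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p

def independence_decision(chi2, n_rows, n_cols, alpha=0.05):
    """Return (p-value, reject_h0); reject_h0 is True when independence is rejected."""
    dof = (n_rows - 1) * (n_cols - 1)
    p = chi2_p_value(chi2, dof)
    return p, p < alpha

# Hypothetical input: a distance of 4.0 measured on a 2x2 table (1 degree of freedom).
p_value, reject_h0 = independence_decision(4.0, 2, 2)
print(round(p_value, 4), reject_h0)  # 0.0455 True
```

With these example numbers the p-value is just under 0.05, so the independence hypothesis would be rejected at that threshold.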
Conditions for performing the χ2 independence test
The χ2 independence test should only be performed when the number of samples is greater than 30.
Cochran's criterion must also be respected. It states that:
- All classes have a non-zero expectation
- At least 80% of the classes have an expectation greater than 5
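These conditions can be checked before running the test. The sketch below is one possible way to do so in plain Python; the function name `check_conditions` and the two example tables are hypothetical. It recomputes the expected counts from the observed table, then tests the sample size, the non-zero expectations, and the 80% rule.

```python
def check_conditions(observed, min_samples=30):
    """Check the sample-size rule and Cochran's criterion on a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("the table contains no samples")
    # Expected count of every class under independence, flattened into one list.
    expected = [r * c / n for r in row_totals for c in col_totals]
    share_above_5 = sum(1 for e in expected if e > 5) / len(expected)
    return {
        "enough_samples": n > min_samples,
        "all_nonzero": all(e > 0 for e in expected),
        "cochran_80_percent": share_above_5 >= 0.8,
    }

# Hypothetical tables: the first satisfies every condition, the second is too small.
print(check_conditions([[20, 30], [30, 20]]))
print(check_conditions([[3, 2], [1, 4]]))
```

The test should only be trusted when all three entries are true. In the second table there are only 10 samples and no expected count exceeds 5, so two of the checks fail.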