R provides several methods for testing the independence of categorical variables. In my tutorial, I will show three tests: the chi-square test of independence, the Fisher exact test, and the Cochran-Mantel–Haenszel test.

The Chi-Square test is a statistical method used to determine whether two categorical variables are significantly associated. Both variables are measured on the same population, and each one takes a small set of categories such as Male/Female or True/False.

The function `chisq.test()` is used to perform this test. I will show an example with the built-in `Arthritis` data from the `vcd` package. You can also import your own data into R from a CSV, Excel or SPSS data file. We will also see how to interpret the results of the Chi-square test.

#### Hypotheses of Chi-Square test

Null hypothesis – Assumes that there is no association between the two variables.

Alternative hypothesis – Assumes that there is an association between the two variables.
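To make the decision between these two hypotheses concrete, here is a minimal sketch that computes the test statistic by hand and compares it with `chisq.test()`. The counts in `observed` are hypothetical, and the 0.05 significance level is an assumed convention rather than something the test itself fixes.

```r
# Hypothetical 2 x 3 contingency table (made-up counts, for illustration only)
observed <- matrix(c(20, 30,
                     25, 25,
                     35, 15),
                   nrow = 2,
                   dimnames = list(Group = c("A", "B"),
                                   Outcome = c("Low", "Mid", "High")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper tail of the chi-square distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# The built-in test should agree with the manual calculation
res <- chisq.test(observed)
all.equal(unname(res$statistic), x2)
all.equal(res$p.value, p_value)

alpha <- 0.05  # assumed significance level
if (p_value < alpha) {
  message("Reject the null hypothesis of independence")
} else {
  message("Fail to reject the null hypothesis of independence")
}
```

A table larger than 2 x 2 is used here on purpose: for 2 x 2 tables `chisq.test()` applies a continuity correction by default, so the manual figure would not match unless `correct = FALSE` is passed.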

Let us see an example now.

#### Example

To install the *vcd* package, use the command `install.packages("vcd")`. Then use the following code to perform the Chi-Square test in R for two different pairs of variables and to understand when to reject the null hypothesis and when not to.

    > library(vcd)
    > chisq.test(Arthritis$Treatment,Arthritis$Improved)

        Pearson's Chi-squared test

    data:  Arthritis$Treatment and Arthritis$Improved
    X-squared = 13.055, df = 2, p-value = 0.001463

    > chisq.test(Arthritis$Improved,Arthritis$Sex)

        Pearson's Chi-squared test

    data:  Arthritis$Improved and Arthritis$Sex
    X-squared = 4.8407, df = 2, p-value = 0.08889

    Warning message:
    In chisq.test(Arthritis$Improved, Arthritis$Sex) :
      Chi-squared approximation may be incorrect
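The same test can also be run on a cross-tabulation instead of two vectors, which makes it easy to look at the counts first. This is a minimal sketch, assuming the `vcd` package is installed; the names `tab` and `res` are only illustrative.

```r
# Load the Arthritis data frame from vcd without attaching the whole package
data("Arthritis", package = "vcd")

# Cross-tabulate treatment against improvement and inspect the counts
tab <- table(Arthritis$Treatment, Arthritis$Improved)
print(tab)

# chisq.test() accepts the table directly
res <- chisq.test(tab)

# Individual pieces of the result can be pulled out by name
res$statistic   # the X-squared value
res$parameter   # degrees of freedom
res$p.value     # p-value
```

Storing the result in an object is handy when the p-value has to be reused later, for example in a report or a further comparison.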

From the result of `chisq.test(Arthritis$Treatment,Arthritis$Improved)`, there appears to be a relationship between treatment received and level of improvement. We come to this conclusion because the p-value (0.001463) is less than 0.01. Hence, we reject the null hypothesis in favour of the alternative hypothesis.

But the result of `chisq.test(Arthritis$Improved,Arthritis$Sex)` gives no evidence of a relationship between patient sex and improvement, because the p-value (0.08889) is greater than 0.05. Hence, we fail to reject the null hypothesis. Note that this is not the same as proving that the two variables are independent.

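The warning message is produced because one of the six cells in the table (male, some improvement) has an expected value of less than five, which may invalidate the chi-square approximation. The code `head(Arthritis)` only previews the first rows of the data; to check the expected counts themselves, inspect the `expected` component of the test result, e.g. `chisq.test(Arthritis$Improved,Arthritis$Sex)$expected`.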
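One way to see which cell triggers the warning is to look at the expected counts stored in the test result. The sketch below is an illustration under stated assumptions: the counts are entered by hand as the improvement-by-sex cross-tabulation so that it runs without `vcd` (with the package loaded, `table(Arthritis$Improved, Arthritis$Sex)` should give the same table). It also runs Fisher's exact test, which does not rely on the chi-square approximation.

```r
# Improved x Sex counts, entered by hand (assumption: these match
# table(Arthritis$Improved, Arthritis$Sex) from the vcd package)
tab <- matrix(c(25, 12, 22,
                17,  2,  6),
              ncol = 2,
              dimnames = list(Improved = c("None", "Some", "Marked"),
                              Sex = c("Female", "Male")))

# suppressWarnings() hides the approximation warning while we inspect the result
res <- suppressWarnings(chisq.test(tab))

# Expected counts under independence; look for cells below 5
round(res$expected, 2)
which(res$expected < 5, arr.ind = TRUE)

# Fisher's exact test avoids the large-sample approximation
fisher.test(tab)
```

If the exact test and the chi-square test lead to the same conclusion, the warning is of little practical concern for this table; if they disagree, the exact test is usually the safer one to report.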

So, this is how you can perform a Chi-Square test in R and interpret the result.