The phi coefficient is a measure of the strength of association between two binary variables. It can be calculated in R using the phi() function from the ‘psych’ package. This function takes a 2×2 table of counts for the two binary variables as input and returns the phi coefficient, a value between -1 and 1. Values closer to -1 or 1 indicate a stronger association between the two variables, and values near 0 indicate little or no association.
A Phi Coefficient (sometimes called a mean square contingency coefficient) is a measure of the association between two binary variables.
For a given 2×2 table for two random variables x and y:
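The Phi Coefficient can be calculated as:
Φ = (AD − BC) / √((A+B)(C+D)(A+C)(B+D))
Here A and B are the counts in the first row of the table and C and D are the counts in the second row, so the four sums under the square root are the two row totals and the two column totals.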
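To make the formula concrete, here is a minimal sketch that evaluates it step by step in base R. The four cell counts are hypothetical values chosen only for illustration and do not come from the survey used later in this tutorial.

```r
# Hypothetical cell counts for a 2x2 table (illustration only)
A <- 10; B <- 5   # first row
C <- 3;  D <- 12  # second row

# Numerator: difference of the diagonal products
numerator <- A * D - B * C

# Denominator: square root of the product of the row and column totals
denominator <- sqrt((A + B) * (C + D) * (A + C) * (B + D))

phi_value <- numerator / denominator
phi_value  # about 0.47
```

Note that the square root covers the whole product of the four totals, not just the first one.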
Example: Calculating a Phi Coefficient in R
Suppose we want to know whether gender is associated with political party preference, so we take a sample of 25 voters and survey them on which party they prefer.
The following table shows the results of the survey:
We can use the following code to enter this data into a 2×2 matrix in R:
#create 2x2 table
data = matrix(c(4, 8, 9, 4), nrow = 2)

#view dataset
data

     [,1] [,2]
[1,]    4    9
[2,]    8    4
We can then use the phi() function from the psych package to calculate the Phi Coefficient between the two variables:
#load psych package
library(psych)

#calculate Phi Coefficient
phi(data)

[1] -0.36
The Phi Coefficient turns out to be -0.36.
Note that the phi() function rounds to 2 digits by default, but you can use the digits argument to round to as many digits as you’d like:
#calculate Phi Coefficient and round to 6 digits
phi(data, digits = 6)

[1] -0.358974
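As a sanity check, the same value can be reproduced without the psych package. The sketch below applies the formula directly to the survey counts and then, as a second check, expands the table into one 0/1 observation per voter and computes the ordinary Pearson correlation, which coincides with the Phi Coefficient for two binary variables. The variable names and the 0/1 coding are assumptions made for this illustration.

```r
# Same 2x2 table as in the example above
tab <- matrix(c(4, 8, 9, 4), nrow = 2)

# Apply the formula directly: A, B are the first row; C, D the second row
A <- tab[1, 1]; B <- tab[1, 2]
C <- tab[2, 1]; D <- tab[2, 2]
phi_formula <- (A * D - B * C) / sqrt((A + B) * (C + D) * (A + C) * (B + D))

# Expand the table into one observation per voter.
# as.vector(tab) lists the cells in column-major order:
# [1,1], [2,1], [1,2], [2,2]
row_var <- rep(c(0, 1, 0, 1), times = as.vector(tab))
col_var <- rep(c(0, 0, 1, 1), times = as.vector(tab))
phi_cor <- cor(row_var, col_var)

round(phi_formula, 6)  # -0.358974
round(phi_cor, 6)      # -0.358974
```

Both approaches agree with the output of phi(data, digits = 6). Swapping which category is coded 0 and which is coded 1 for one of the variables flips the sign but not the magnitude.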
How to Interpret a Phi Coefficient
A Phi Coefficient takes on values between -1 and 1, where:
- -1 indicates a perfectly negative relationship between the two variables.
- 0 indicates no association between the two variables.
- 1 indicates a perfectly positive relationship between the two variables.
In general, the further a Phi Coefficient is from zero, the stronger the relationship between the two variables. In other words, a value far from zero is more evidence of a systematic pattern between the two variables, while a value near zero suggests little or none.
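The three reference values can be illustrated with a few made-up tables. This is only a sketch in base R: phi_from_table is a hypothetical helper name, not a function from any package, and the counts are invented to hit the extremes.

```r
# Compact form of the formula: det() of a 2x2 matrix is AD - BC, and the
# row and column totals are the four sums under the square root.
# phi_from_table is a hypothetical helper, not part of any package.
phi_from_table <- function(tab) {
  det(tab) / sqrt(prod(rowSums(tab), colSums(tab)))
}

# Made-up tables chosen to hit the three reference values
perfect_positive <- matrix(c(10, 0, 0, 10), nrow = 2)  # all counts on the main diagonal
no_association   <- matrix(c(5, 5, 5, 5), nrow = 2)    # counts spread evenly
perfect_negative <- matrix(c(0, 10, 10, 0), nrow = 2)  # all counts off the main diagonal

phi_from_table(perfect_positive)  # 1
phi_from_table(no_association)    # 0
phi_from_table(perfect_negative)  # -1
```

Real data rarely reach the extremes. The survey table above falls in between at about -0.36.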