How can I calculate the p-value of an F-statistic in R?

Calculating the p-value of an F-statistic in R amounts to finding the probability of obtaining an F-statistic at least as extreme as the one observed, assuming the null hypothesis is true. This can be done with the “pf” function, or read directly from the output of functions such as “anova” or “summary”. A lower p-value provides stronger evidence against the null hypothesis. By calculating the p-value in R, you can assess the strength of the relationship between variables and make decisions based on the significance of the F-statistic.

Calculate the P-Value of an F-Statistic in R


An F-test produces an F-statistic. To find the p-value associated with an F-statistic in R, you can use the following command:

pf(fstat, df1, df2, lower.tail = FALSE)

  • fstat – the value of the F-statistic
  • df1 – the numerator degrees of freedom
  • df2 – the denominator degrees of freedom
  • lower.tail – whether to return the probability associated with the lower tail of the F distribution. This is TRUE by default, so set it to FALSE to get the upper-tail probability, which is the p-value.

For example, here is how to find the p-value associated with an F-statistic of 5, with degrees of freedom 1 = 3 and degrees of freedom 2 = 14:

pf(5, 3, 14, lower.tail = FALSE)

#[1] 0.01457807
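
Equivalently, since lower.tail is TRUE by default, you can subtract the lower-tail probability from 1. A minimal sketch using the same values:

#equivalent calculation: subtract the lower-tail probability from 1
1 - pf(5, 3, 14)

#[1] 0.01457807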

One of the most common uses of an F-test is for testing the overall significance of a regression model. In the following example, we show how to calculate the p-value of the F-statistic for a regression model.

Example: Calculating p-value from F-statistic

Suppose we have a dataset that shows the total number of hours studied, total prep exams taken, and final exam score received for 12 different students:

#create dataset
data <- data.frame(study_hours = c(3, 7, 16, 14, 12, 7, 4, 19, 4, 8, 8, 3),
                   prep_exams = c(2, 6, 5, 2, 7, 4, 4, 2, 8, 4, 1, 3),
                   final_score = c(76, 88, 96, 90, 98, 80, 86, 89, 68, 75, 72, 76))

#view first six rows of dataset
head(data)

#  study_hours prep_exams final_score
#1           3          2          76
#2           7          6          88
#3          16          5          96
#4          14          2          90
#5          12          7          98
#6           7          4          80

Next, we can fit a linear regression model to this data using study hours and prep exams as the predictor variables and final score as the response variable. Then, we can view the output of the model:

#fit regression model
model <- lm(final_score ~ study_hours + prep_exams, data = data)

#view output of the model
summary(model)

#Call:
#lm(formula = final_score ~ study_hours + prep_exams, data = data)
#
#Residuals:
#    Min      1Q  Median      3Q     Max 
#-13.128  -5.319   2.168   3.458   9.341 
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept)   66.990      6.211  10.785  1.9e-06 ***
#study_hours    1.300      0.417   3.117   0.0124 *  
#prep_exams     1.117      1.025   1.090   0.3041    
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 7.327 on 9 degrees of freedom
#Multiple R-squared:  0.5308,	Adjusted R-squared:  0.4265 
#F-statistic: 5.091 on 2 and 9 DF,  p-value: 0.0332

On the very last line of the output we can see that the F-statistic for the overall regression model is 5.091. This F-statistic has 2 degrees of freedom for the numerator (one for each predictor) and 9 degrees of freedom for the denominator (n - k - 1 = 12 - 2 - 1 = 9). R automatically calculates that the p-value for this F-statistic is 0.0332.

To calculate this same p-value ourselves, we can use the following code:

pf(5.091, 2, 9, lower.tail = FALSE)

#[1] 0.0331947

Notice that we get the same answer (but with more decimals displayed) as the linear regression output above.
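
As a further sketch using the model fitted above, you can pull the F-statistic and its degrees of freedom straight from the model summary and pass them to pf, rather than typing the numbers by hand:

#extract the F-statistic, numerator df, and denominator df from the model summary
fstat <- summary(model)$fstatistic

#calculate the p-value without hard-coding any values
pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)

This returns the same p-value as above, without first rounding the F-statistic.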
