What is the process for creating and interpreting a Q-Q plot in R?

The process for creating and interpreting a Q-Q (quantile-quantile) plot in R involves the following steps:

1. Import the dataset: The first step is to import the dataset into R, either from a file or by manually entering the data.

2. Prepare the data: Extract the variable of interest as a numeric vector. The data does not need to be sorted beforehand, because the plotting functions order the values internally.

3. Understand the quantiles: The sorted sample values act as the sample quantiles, and the plotting functions compute the matching theoretical quantiles automatically. The built-in function “quantile()” is useful for inspecting individual quantiles, but it is not required to build the plot.

4. Create the plot: To compare the data to a normal distribution, pass the data vector to the function “qqnorm()”. The function “qqplot()” is used instead to compare two samples, or a sample and a set of theoretical quantiles, against one another.

5. Interpret the plot: The Q-Q plot displays the quantiles of the data on the y-axis and the theoretical quantiles on the x-axis. Points that fall close to a straight diagonal line indicate that the data are consistent with a normal distribution. Systematic deviations from the line, such as curvature or points that drift away at the tails, can indicate skewness, heavy tails, or outliers.

6. Add a reference line: A reference line can be added to the Q-Q plot using the function “qqline()”. By default this line passes through the first and third quartiles rather than being a line of best fit. It helps in visually judging the normality of the data.

7. Add labels and titles: It is important to add appropriate labels and titles to the plot for better interpretation.

In conclusion, a Q-Q plot in R can be created by passing the data to the “qqnorm()” function and adding a reference line with “qqline()”. The plot can then be interpreted to assess the normality of the data and the presence of outliers.

Create & Interpret a Q-Q Plot in R


A Q-Q plot, short for “quantile-quantile” plot, is a type of plot that we can use to determine whether or not a set of data plausibly came from some theoretical distribution.

Many statistical tests make the assumption that a set of data follows a normal distribution, and a Q-Q plot is often used to assess whether or not this assumption is met.

Although a Q-Q plot isn’t a formal statistical test, it does provide an easy way to visually check whether a dataset follows a normal distribution, and if not, how this assumption is violated and which data points potentially cause this violation.

We can create a Q-Q plot by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, then the points on the plot should roughly form a straight diagonal line.
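
As a minimal sketch of this idea, the base R function qqplot() accepts two samples and plots their quantiles against one another. The seed, sample sizes, and variable names below are illustrative assumptions, not part of the original example:

```r
#illustrative data: two independent samples drawn from the same normal distribution
set.seed(1)
sample_a <- rnorm(200)
sample_b <- rnorm(150)

#plot the quantiles of one sample against the quantiles of the other
qq <- qqplot(sample_a, sample_b,
             xlab = 'Quantiles of sample_a', ylab = 'Quantiles of sample_b')

#add the line y = x; points close to it suggest the two distributions are similar
abline(a = 0, b = 1, lty = 2)
```

Because the two samples differ in size, qqplot() interpolates the larger one so that both sets of quantiles have the same length.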

Example of a Q-Q plot in R

Quantiles represent points in a dataset below which a certain proportion of the data fall. For example, the 0.9 quantile is the point below which 90% of the data fall. The 0.5 quantile (the median) is the point below which 50% of the data fall, and so on.
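
To make this definition concrete, the following sketch uses the built-in quantile() function to compute the 0.5 and 0.9 quantiles of a simulated sample and then checks what proportion of the values fall at or below each one. The simulated data and the variable names are assumptions chosen for illustration:

```r
#make this example reproducible
set.seed(11)

#illustrative sample of 100 values
values <- rnorm(100)

#compute the 0.5 and 0.9 quantiles
q <- quantile(values, probs = c(0.5, 0.9))
q

#proportion of values at or below each quantile
prop_50 <- mean(values <= q[['50%']])
prop_90 <- mean(values <= q[['90%']])
c(prop_50, prop_90)
```

The two proportions should be close to 0.5 and 0.9, which is what the definition of a quantile leads us to expect.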

Q-Q plots identify the quantiles in your sample data and plot them against the quantiles of a theoretical distribution. In most cases the normal distribution is used, but a Q-Q plot can actually be created for any theoretical distribution.
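
As a rough sketch of what happens behind the scenes, we can build a normal Q-Q plot by hand: the sorted sample values are the sample quantiles, and qnorm() applied to the evenly spaced probabilities returned by ppoints() gives the theoretical quantiles. The variable names are illustrative, and the last lines simply compare the result with the coordinates that the built-in qqnorm() function returns:

```r
#make this example reproducible
set.seed(11)

#illustrative sample of 100 values
data <- rnorm(100)
n <- length(data)

#sample quantiles: the sorted observations
sample_q <- sort(data)

#theoretical quantiles: normal quantiles at n evenly spaced probabilities
theoretical_q <- qnorm(ppoints(n))

#plot sample quantiles against theoretical quantiles
plot(theoretical_q, sample_q,
     xlab = 'Theoretical Quantiles', ylab = 'Sample Quantiles')

#compare with the coordinates computed by qqnorm() (without drawing a plot)
built_in <- qqnorm(data, plot.it = FALSE)
all.equal(sort(built_in$x), theoretical_q)
```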

If the data points fall along a straight diagonal line in a Q-Q plot, then the dataset likely follows a normal distribution.

How to Create a Q-Q Plot in R

We can easily create a Q-Q plot to check if a dataset follows a normal distribution by using the built-in qqnorm() function.

For example, the following code generates a vector of 100 random values that follow a normal distribution and creates a Q-Q plot for this dataset to verify that it does indeed follow a normal distribution:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a normal distribution
data <- rnorm(100)

#create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)

Normal Q-Q Plot

To make it even easier to see if the data falls along a straight line, we can use the qqline() function:

#create Q-Q plot
qqnorm(data)

#add straight diagonal line to plot
qqline(data)

Q-Q plot with a diagonal line in R

We can see that the data points near the tails don’t fall exactly along the straight line, but for the most part this sample data appears to be normally distributed (as it should be since we told R to generate the data from a normal distribution).
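
Note that the line drawn by qqline() is not a least-squares fit. By default it passes through the first and third quartiles of the sample and of the theoretical distribution. The following sketch reproduces such a line by hand with abline(); the variable names are illustrative assumptions:

```r
#make this example reproducible
set.seed(11)
data <- rnorm(100)

#first and third quartiles of the sample and of the standard normal distribution
sample_quartiles <- quantile(data, probs = c(0.25, 0.75), names = FALSE)
normal_quartiles <- qnorm(c(0.25, 0.75))

#slope and intercept of the line through the two quartile points
slope <- diff(sample_quartiles) / diff(normal_quartiles)
intercept <- sample_quartiles[1] - slope * normal_quartiles[1]

#draw the Q-Q plot and add the hand-computed reference line
qqnorm(data)
abline(a = intercept, b = slope, col = 'red')
```

Because the line is anchored at the quartiles, a few extreme values in the tails do not pull it away from the bulk of the data.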

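However, the following code generates a vector of 100 random values that follow a gamma distribution and creates a Q-Q plot for this data to check if it follows a normal distribution:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a gamma distribution (shape = 1)
data <- rgamma(100, shape = 1)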

#create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)
qqline(data)

Q-Q plot in R for a gamma distribution

We can see the clear departure from the straight line in this Q-Q plot, indicating that this dataset likely does not follow a normal distribution.

Consider another chunk of code that generates a vector of 100 random values that follow a Chi-Square distribution with 5 degrees of freedom and creates a Q-Q plot for this data to check if it follows a normal distribution:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a Chi-Square distribution
data <- rchisq(100, 5)

#create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)
qqline(data)

Q-Q plot in R for a Chi-Square distribution

Once again we can see that this dataset does not appear to follow a normal distribution, especially near the tails.
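
Since a Q-Q plot can be created for any theoretical distribution, we can also compare this data to the Chi-Square distribution itself rather than to the normal distribution. The following is a minimal sketch that assumes the degrees of freedom are known to be 5; it passes theoretical Chi-Square quantiles to qqplot() and supplies a matching quantile function to qqline():

```r
#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a Chi-Square distribution
data <- rchisq(100, df = 5)

#theoretical Chi-Square quantiles at 100 evenly spaced probabilities
theoretical_q <- qchisq(ppoints(100), df = 5)

#create Q-Q plot to compare this dataset to a theoretical Chi-Square distribution
qq <- qqplot(theoretical_q, data, main = 'Chi-Square Q-Q Plot',
             xlab = 'Theoretical Quantiles', ylab = 'Sample Quantiles')

#add reference line based on the Chi-Square quantile function
qqline(data, distribution = function(p) qchisq(p, df = 5))
```

When the theoretical distribution matches the one that generated the data, we would expect the points to lie much closer to the reference line than in the normal Q-Q plot of the same data.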

Modifying the Aesthetics of a Q-Q Plot in R

We can modify some of the aesthetics of the Q-Q plot in R including the title, axis labels, data point colors, line color, and line width.

The following code modifies the title, axis labels, and color of the points in the plot:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a normal distribution
data <- rnorm(100)

#create Q-Q plot
qqnorm(data, main = 'Q-Q Plot for Normality', xlab = 'Theoretical Dist',
       ylab = 'Sample Dist', col = 'steelblue')

Q-Q plot in R

Next, the following code adds a straight diagonal line to the plot that is red (col = 'red'), twice the default width (lwd = 2, default is 1), and dashed (lty = 2, default is 1, which is a solid line):

qqline(data, col = 'red', lwd = 2, lty = 2)

Q-Q plot with straight diagonal line

Technical Notes

Keep in mind that a Q-Q plot is simply a way to visually check if a dataset follows a theoretical distribution. To formally test whether or not a dataset follows a particular distribution, the following tests can be performed (assuming you’re comparing your dataset to a normal distribution):

Anderson-Darling Test
Shapiro-Wilk Test
Kolmogorov-Smirnov Test
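
As a hedged illustration, two of these tests are available in base R through shapiro.test() and ks.test(). The sketch below applies them to simulated data; the variable names are assumptions. The Kolmogorov-Smirnov test here compares the data to a standard normal distribution (mean 0, standard deviation 1), which is appropriate only because the data was simulated from that distribution. The Anderson-Darling test is not part of base R and requires an add-on package (for example, ad.test() in the nortest package), so it is not run here:

```r
#make this example reproducible
set.seed(11)

#illustrative data: one normal sample and one gamma sample
normal_data <- rnorm(100)
gamma_data <- rgamma(100, shape = 1)

#Shapiro-Wilk test (null hypothesis: the data is normally distributed)
sw_normal <- shapiro.test(normal_data)
sw_gamma <- shapiro.test(gamma_data)

#Kolmogorov-Smirnov test against a standard normal distribution
ks_normal <- ks.test(normal_data, 'pnorm')

#view the p-values
sw_normal$p.value
sw_gamma$p.value
ks_normal$p.value
```

A small p-value (for example, below 0.05) is evidence against the hypothesis that the data follows the reference distribution.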
