In R, you can create a scatterplot with a regression line by using the plot() function to plot two numeric variables against each other and the abline() function to add the fitted line. Passing a fitted lm() model to abline() draws the least-squares line, which is the straight line that best fits the data points on the scatterplot. You can customize the appearance of the scatterplot and regression line with arguments to plot() and abline() such as col (color), pch (point shape), and lty (line type). The resulting plot lets you visually assess whether the relationship between the two variables is approximately linear.
Create a Scatterplot with a Regression Line in R
Often when we perform simple linear regression, we’re interested in creating a scatterplot to visualize the various combinations of x and y values.
Fortunately, R makes it easy to create scatterplots using the plot() function. For example:
    #create some fake data
    data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11),
                       y = c(13, 14, 17, 12, 23, 24, 25, 25, 24, 28, 32, 33, 35, 40, 41))

    #create scatterplot of data
    plot(data$x, data$y)
It’s also easy to add a regression line to the scatterplot using the abline() function.
For example:
    #fit a simple linear regression model
    model <- lm(y ~ x, data = data)

    #add the fitted regression line to the scatterplot
    abline(model)
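To make the link between the fitted model and the drawn line explicit, here is a minimal sketch that pulls the intercept and slope out of the model with coef() and passes them to abline() through its a and b arguments. The variable names intercept and slope are illustrative and do not appear in the original example; the line drawn should be the same one that abline(model) produces.

```r
# same fake data as in the example above
data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11),
                   y = c(13, 14, 17, 12, 23, 24, 25, 25, 24, 28, 32, 33, 35, 40, 41))

# fit the simple linear regression model
model <- lm(y ~ x, data = data)

# extract the estimated intercept and slope (illustrative variable names)
intercept <- coef(model)[["(Intercept)"]]
slope <- coef(model)[["x"]]

# draw the scatterplot, then the line y = intercept + slope * x
plot(data$x, data$y)
abline(a = intercept, b = slope)
```

Passing the model object directly is shorter, but spelling out a and b can be handy when the intercept and slope come from somewhere other than lm().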
We can also add confidence interval lines to the plot by using the predict() function:
    #define range of x values
    newx <- seq(min(data$x), max(data$x), by = 1)

    #find 95% confidence interval for the range of x values
    conf_interval <- predict(model, newdata = data.frame(x = newx),
                             interval = "confidence", level = 0.95)

    #create scatterplot of values with regression line
    plot(data$x, data$y)
    abline(model)

    #add dashed lines (lty=2) for the 95% confidence interval
    lines(newx, conf_interval[,2], col = "blue", lty = 2)
    lines(newx, conf_interval[,3], col = "blue", lty = 2)
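The column indices 2 and 3 used above refer to the lower and upper bounds of the interval. As a small sketch of an alternative, the matrix returned by predict() for an lm model with an interval requested has columns named fit, lwr, and upr, so the bounds can also be selected by name, which some readers may find easier to follow.

```r
# same fake data and model as in the example above
data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11),
                   y = c(13, 14, 17, 12, 23, 24, 25, 25, 24, 28, 32, 33, 35, 40, 41))
model <- lm(y ~ x, data = data)

# range of x values and the 95% confidence interval at each one
newx <- seq(min(data$x), max(data$x), by = 1)
conf_interval <- predict(model, newdata = data.frame(x = newx),
                         interval = "confidence", level = 0.95)

# inspect the column names of the returned matrix
colnames(conf_interval)

# same plot as before, selecting the bounds by column name
plot(data$x, data$y)
abline(model)
lines(newx, conf_interval[, "lwr"], col = "blue", lty = 2)
lines(newx, conf_interval[, "upr"], col = "blue", lty = 2)
```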
Or we could instead add prediction interval lines to the plot by specifying the interval type within the predict() function:
    #define range of x values
    newx <- seq(min(data$x), max(data$x), by = 1)

    #find 95% prediction interval for the range of x values
    pred_interval <- predict(model, newdata = data.frame(x = newx),
                             interval = "prediction", level = 0.95)

    #create scatterplot of values with regression line
    plot(data$x, data$y)
    abline(model)

    #add dashed lines (lty=2) for the 95% prediction interval
    lines(newx, pred_interval[,2], col = "red", lty = 2)
    lines(newx, pred_interval[,3], col = "red", lty = 2)
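The red prediction lines sit farther from the regression line than the blue confidence lines did, because a prediction interval accounts for the scatter of individual observations in addition to the uncertainty in the fitted mean. The following sketch checks this numerically for the same data; conf_width and pred_width are illustrative names that are not part of the original example.

```r
# same fake data and model as in the examples above
data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11),
                   y = c(13, 14, 17, 12, 23, 24, 25, 25, 24, 28, 32, 33, 35, 40, 41))
model <- lm(y ~ x, data = data)
newx <- seq(min(data$x), max(data$x), by = 1)

# both interval types at the same 95% level
conf_interval <- predict(model, newdata = data.frame(x = newx),
                         interval = "confidence", level = 0.95)
pred_interval <- predict(model, newdata = data.frame(x = newx),
                         interval = "prediction", level = 0.95)

# width of each interval at every x value (illustrative names)
conf_width <- conf_interval[, "upr"] - conf_interval[, "lwr"]
pred_width <- pred_interval[, "upr"] - pred_interval[, "lwr"]

# TRUE when the prediction interval is wider at every x value
all(pred_width > conf_width)
```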
Lastly, we can make the plot more aesthetically pleasing by adding a title, changing the axis labels, filling in the individual points, and changing the color of the regression line.
    plot(data$x, data$y,
         main = "Scatterplot of x vs. y", #add title
         pch = 16,                        #specify points to be filled in
         xlab = 'x',                      #change x-axis name
         ylab = 'y')                      #change y-axis name

    abline(model, col = 'steelblue')      #specify color of regression line
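As one possible way to combine the pieces above, the sketch below draws the styled scatterplot, a thicker regression line, and the dashed 95% confidence lines in a single figure. The lwd argument and the legend() call are additions that do not appear in the original example, and the legend labels and position are arbitrary choices.

```r
# same fake data and model as in the examples above
data <- data.frame(x = c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11),
                   y = c(13, 14, 17, 12, 23, 24, 25, 25, 24, 28, 32, 33, 35, 40, 41))
model <- lm(y ~ x, data = data)

# 95% confidence interval over the range of x values
newx <- seq(min(data$x), max(data$x), by = 1)
conf_interval <- predict(model, newdata = data.frame(x = newx),
                         interval = "confidence", level = 0.95)

# styled scatterplot with title, filled points, and axis labels
plot(data$x, data$y,
     main = "Scatterplot of x vs. y",
     pch = 16,
     xlab = "x",
     ylab = "y")

# solid, slightly thicker regression line (lwd is an added choice)
abline(model, col = "steelblue", lwd = 2)

# dashed confidence interval lines
lines(newx, conf_interval[, "lwr"], col = "blue", lty = 2)
lines(newx, conf_interval[, "upr"], col = "blue", lty = 2)

# legend matching the line styles used above (labels are illustrative)
legend("topleft",
       legend = c("Regression line", "95% confidence interval"),
       col = c("steelblue", "blue"),
       lty = c(1, 2),
       lwd = c(2, 1))
```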