## How to Use the Method of Least Squares in R

In R, the method of least squares is applied to linear models with the lm() function. The workflow is to specify the response and predictor variables, fit the model with lm(), review the fitted model with summary(), and plot the data together with the fitted line. Nonlinear least squares uses a different function, nls(), and is not covered here.


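The method of least squares finds the regression line that best fits a given dataset by minimizing the sum of squared residuals, that is, the squared vertical distances between the observed values and the line.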
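
For a single predictor, the criterion can be written out explicitly. The notation below is an assumption introduced for illustration and does not come from the R output: x_i and y_i are the observed predictor and response values, b_0 is the intercept, and b_1 is the slope.

```latex
% Least squares criterion for a line with intercept b_0 and slope b_1
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2

% Closed-form solution for simple linear regression
\hat{b}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}
```

Here x̄ and ȳ are the sample means of the predictor and the response.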

To use the method of least squares to fit a regression line in R, we can use the lm() function.

This function uses the following basic syntax:

model <- lm(response ~ predictor, data=df)

The following example shows how to use this function in R.

### Example: Method of Least Squares in R

Suppose we have the following data frame in R that shows the number of hours studied and the corresponding exam score for 15 students in some class:

#create data frame
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))

#view first six rows of data frame
head(df)

  hours score
1     1    64
2     2    66
3     4    76
4     5    73
5     5    74
6     6    81

We can use the lm() function to fit a regression line to this data by the method of least squares:

#use method of least squares to fit regression line
model <- lm(score ~ hours, data=df)

#view regression model summary
summary(model)

Call:
lm(formula = score ~ hours, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-5.140 -3.219 -1.193  2.816  5.772 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   65.334      2.106  31.023 1.41e-13 ***
hours          1.982      0.248   7.995 2.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.641 on 13 degrees of freedom
Multiple R-squared:  0.831,	Adjusted R-squared:  0.818 
F-statistic: 63.91 on 1 and 13 DF,  p-value: 2.253e-06

From the values in the Estimate column of the output, we can write the following fitted regression line:

Exam Score = 65.334 + 1.982(Hours)
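
As a rough check on the Estimate column, the same coefficients can be computed by hand from the standard closed-form least squares formulas for a single predictor. This is a minimal sketch: the names x, y, slope, and intercept are introduced here for illustration only.

```r
#recreate the example data and fitted model
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
model <- lm(score ~ hours, data=df)

#compute least squares estimates from the closed-form formulas
x <- df$hours
y <- df$score
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

#view coefficients estimated by lm()
coef(model)

#view coefficients computed by hand
c(intercept, slope)

#check that both approaches agree (up to floating point tolerance)
all.equal(unname(coef(model)), c(intercept, slope))
```

Both approaches should give an intercept of about 65.334 and a slope of about 1.982, matching the summary output.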

Here’s how to interpret each coefficient in the model:

  • Intercept: For a student who studies 0 hours, the expected exam score is 65.334.
  • hours: For each additional hour studied, the expected exam score increases by 1.982 points.

We can use this equation to estimate the exam score a student will receive based on their hours studied. For example, a student who studies for 5 hours is expected to score:

Exam Score = 65.334 + 1.982(5) = 75.244
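
The same estimate can be obtained from the fitted model with the predict() function instead of plugging values into the equation by hand. In this sketch the names new_student and predicted_score are hypothetical and used only for illustration.

```r
#recreate the example data and fitted model
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
model <- lm(score ~ hours, data=df)

#define a new observation with 5 hours studied
new_student <- data.frame(hours=5)

#predict exam score for the new observation
predicted_score <- predict(model, newdata=new_student)
predicted_score
```

The predicted value should be close to the hand calculation but may differ slightly in the later decimal places, because predict() uses the unrounded coefficients rather than the rounded values 65.334 and 1.982.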

Lastly, we can create a scatter plot of the original data with the fitted regression line overlaid on the plot:

#create scatter plot of data
plot(df$hours, df$score, pch=16, col='steelblue')

#add fitted regression line to scatter plot
abline(model)

The blue circles represent the data and the black line represents the fitted regression line.
