Computing the coefficient of determination for the 1:1 line (intercept = 0 and slope = 1).

1 Make some data:
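The chunk that generated the data is not shown here. A minimal sketch, assuming 100 independent standard normal draws for each variable (the sample size matches the 100 residual degrees of freedom reported further down; the seed and the distributions are assumptions, so the numbers printed in this post will not be reproduced exactly):

```r
# Assumption: seed and distributions are illustrative, not the original ones.
set.seed(2024)
n <- 100
x <- rnorm(n)  # independent variable
y <- rnorm(n)  # dependent variable, drawn independently of x

# Scatter plot of the simulated data
plot(x, y, main = "Some random data")
```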

Some random data

2 The 1:1 line

Simple linear regression has the form y = a + b*x + e. For a 1:1 line the intercept is a = 0 and the slope is b = 1, so y = x + e, where e are the errors (residuals, that is, deviations from the 1:1 line). In R, a 1:1 line can be added to an existing plot with abline(0, 1).
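A sketch of how the two lines could be drawn, assuming stand-in simulated data (the seed and distributions are assumptions) and base R graphics:

```r
# Assumption: stand-in data, not the original values.
set.seed(2024)
x <- rnorm(100)
y <- rnorm(100)

# Ordinary least squares fit with free intercept and slope
fit <- lm(y ~ x)

plot(x, y)
abline(0, 1, col = "red")   # 1:1 line: intercept 0, slope 1
abline(fit, col = "blue")   # least squares regression line
```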

1:1 line in red and least squares regression line in blue

Note that, for the 1:1 line, the errors (residuals) are simply the differences: e = y - x
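As a sketch, the residuals from the 1:1 line can be computed and drawn as vertical segments between each point and the line. The data are stand-in simulated values (seed and distributions are assumptions):

```r
# Assumption: stand-in data, not the original values.
set.seed(2024)
x <- rnorm(100)
y <- rnorm(100)

# Residuals from the 1:1 line: the fitted value at x is x itself
e <- y - x

plot(x, y)
abline(0, 1, col = "red")
# Vertical segments from the 1:1 line (x, x) up or down to the point (x, y)
segments(x0 = x, y0 = x, x1 = x, y1 = y, col = "grey40")
```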

error (residuals/deviations from the 1:1 line)

3 The coefficient of determination

The coefficient of determination “is the proportion of the variance in the dependent variable that is predictable from the independent variable”.

Checking the Wikipedia page, we can see that “the most general definition of the coefficient of determination” is given in relation to the unexplained variance - the fraction of variance unexplained (FVU):

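$$R^2 = 1 - FVU = 1 - \frac{SS_{res}}{SS_{tot}}$$

where FVU is the sum of squares of residuals ($SS_{res}$) divided by the total sum of squares ($SS_{tot}$):

$$SS_{res} = \sum_i (y_i - \hat{y}_i)^2 = \sum_i e_i^2$$

$$SS_{tot} = \sum_i (y_i - \bar{y})^2$$

Here $\hat{y}_i$ are the fitted values and $\bar{y}$ is the mean of the observed $y_i$.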

As already pointed out above, for the 1:1 line the fitted values are $\hat{y}_i = x_i$, so the errors (residuals) are the differences e = y - x. Therefore, $SS_{res}$ can be written as:

$$SS_{res} = \sum_i e_i^2 = \sum_i (y_i - x_i)^2$$

In R, this computation for the 1:1 line gives:

## [1] -1.032755

The value falls outside the usual 0 to 1 range because the 1:1 line fits these data worse than the horizontal line at $\bar{y}$, that is, the line with intercept = $\bar{y}$ and slope = 0. For that horizontal line every fitted value is $\hat{y}_i = \bar{y}$, which makes $SS_{res} = SS_{tot}$ and therefore $R^2 = 0$.
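The computation can be sketched in R as follows. The original chunk is not preserved here, so x and y are stand-in simulated data (seed and distributions are assumptions) and the result will not equal the value printed above:

```r
# Assumption: stand-in data, not the original values.
set.seed(2024)
x <- rnorm(100)
y <- rnorm(100)

# Residual sum of squares around the 1:1 line (fitted value = x)
ss_res <- sum((y - x)^2)
# Total sum of squares around the mean of y
ss_tot <- sum((y - mean(y))^2)

r2_one_to_one <- 1 - ss_res / ss_tot
print(r2_one_to_one)

# Reference case: the horizontal line at mean(y) has SSres = SStot
r2_mean_line <- 1 - sum((y - mean(y))^2) / ss_tot
print(r2_mean_line)
```

With unrelated x and y such as these, the first value will typically be negative, while the horizontal line at mean(y) always gives exactly 0.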

4 Extra thoughts

4.1 Flipping axis

We can wrap the above in a simple R function and call it twice, first as is and then with x and y swapped:

## [1] -1.032755
## [1] -1.014109
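A minimal sketch of such a function. The name r2_1to1 is hypothetical, and the data are stand-in simulated values (seed and distributions are assumptions), so the printed numbers will differ from those above:

```r
# Hypothetical helper: R squared of y against the 1:1 line y = x
r2_1to1 <- function(x, y) {
  stopifnot(length(x) == length(y))
  ss_res <- sum((y - x)^2)        # residuals from the 1:1 line
  ss_tot <- sum((y - mean(y))^2)  # spread of the second argument
  1 - ss_res / ss_tot
}

# Assumption: stand-in data, not the original values.
set.seed(2024)
x <- rnorm(100)
y <- rnorm(100)

print(r2_1to1(x, y))  # y treated as the dependent variable
print(r2_1to1(y, x))  # roles swapped
```

Swapping the arguments leaves the residual sum of squares unchanged, since (y - x)^2 = (x - y)^2, but the total sum of squares is then taken around the mean of x instead of the mean of y.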

Note that if we swap x and y we get a different coefficient of determination (R squared) for the 1:1 line. However, when fitting the usual simple linear regression (no constraints on intercept or slope), we get the same coefficient of determination whichever variable is treated as the response.

## [1] 4.091335e-05
## [1] 4.091335e-05
## [1] TRUE
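The symmetry of the unconstrained fit can be checked with a short sketch, again on stand-in simulated data (seed and distributions are assumptions). For simple linear regression with an intercept, R squared equals the squared Pearson correlation, which does not depend on the order of the variables:

```r
# Assumption: stand-in data, not the original values.
set.seed(2024)
x <- rnorm(100)
y <- rnorm(100)

# R squared of the ordinary least squares fit in both directions
r2_yx <- summary(lm(y ~ x))$r.squared
r2_xy <- summary(lm(x ~ y))$r.squared

print(r2_yx)
print(r2_xy)
print(all.equal(r2_yx, r2_xy))

# Both match the squared correlation coefficient
print(all.equal(r2_yx, cor(x, y)^2))
```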

4.2 Another way - lm(y ~ 0 + offset(x))

Another way of getting the R squared is to fit a linear model with no intercept and the slope fixed at 1 through an offset, as explained here:

## 
## Call:
## lm(formula = y ~ 0 + offset(1 * x))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6403 -1.0285 -0.0701  0.8686  3.5844 
## 
## No Coefficients
## 
## Residual standard error: 1.438 on 100 degrees of freedom
## [1] 0
## [1] TRUE
## [1] -1.032755
## [1] TRUE
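A sketch of the offset approach on stand-in simulated data (seed and distributions are assumptions, so the summary will differ from the one above). The `[1] 0` in the output above is presumably the R squared stored by summary(), which is not informative for a model with no estimated coefficients, so the coefficient of determination is computed by hand from the residuals:

```r
# Assumption: stand-in data, not the original values.
set.seed(2024)
x <- rnorm(100)
y <- rnorm(100)

# No intercept, no estimated slope: the fitted values are the offset x itself
fit0 <- lm(y ~ 0 + offset(1 * x))

# Residuals of this model are the deviations from the 1:1 line
res <- unname(residuals(fit0))
print(all.equal(res, y - x))

# R squared reported by summary() for a model without coefficients
print(summary(fit0)$r.squared)

# R squared computed by hand from the model residuals
r2_offset <- 1 - sum(res^2) / sum((y - mean(y))^2)
print(r2_offset)
```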