Computing the coefficient of determination for the 1:1 line (intercept = 0 and slope = 1).
set.seed(2018)
y <- rnorm(n = 100, mean = 0, sd = 1)
x <- rnorm(n = 100, mean = 0, sd = 1)
plot(y ~ x)
# Add least squares regression line
abline(lm(y ~ x), col = "blue")
The simple linear regression is of the form y = a + b*x + e. For a 1:1 line the intercept is a = 0 and the slope is b = 1, so that y = x + e, where e are the errors (residuals, i.e. the deviations from the 1:1 line). In R, a 1:1 line can simply be plotted with abline(0, 1).
plot(x, y, xlim = c(-2, 3), ylim = c(-2, 3))
# add grid
abline(h = seq(-2, 3, 1),
v = seq(-2, 3, 1),
lty = "dashed",
col = "gray70")
abline(lm(y ~ x), col = "blue") # least squares regression line
abline(0, 1, col = "red", lwd = 2) # 1:1 line
Note that, for the 1:1 line, the errors (residuals) are simply the differences e = y - x, which can be visualized as vertical segments:
plot(x, y, xlim = c(-2, 3), ylim = c(-2, 3))
abline(0, 1, col = "red") # 1:1 line
segments(x0 = x,
y0 = y,
x1 = x,
y1 = x, # y1 = y - e = y - y + x = x
col = "red",
lty = "dashed")
The coefficient of determination “is the proportion of the variance in the dependent variable that is predictable from the independent variable”.
Checking the Wikipedia page, we can see that “the most general definition of the coefficient of determination” is given in terms of the unexplained variance, namely the fraction of variance unexplained (FVU):
R² = 1 - FVU = 1 - SS_res / SS_tot

where FVU is the sum of squares of residuals (SS_res) divided by the total sum of squares (SS_tot):

SS_res = ∑(yᵢ - ŷᵢ)² = ∑eᵢ²

SS_tot = ∑(yᵢ - ȳ)²
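As a quick sanity check of this general definition, we can recompute R² by hand for the unconstrained regression of y on x and compare it with the value reported by summary.lm() (a small sketch using the simulated data from above):
# Verify the general definition against R's built-in R squared
fit <- lm(y ~ x)
SS_res <- sum(residuals(fit) ^ 2)  # sum of squares of residuals
SS_tot <- sum((y - mean(y)) ^ 2)   # total sum of squares
all.equal(1 - SS_res / SS_tot, summary(fit)$r.squared) # expect TRUE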
As already pointed out above, for the 1:1 line the errors (residuals) are the differences e = y - x. Therefore, SS_res can be written as:

SS_res = ∑eᵢ² = ∑(yᵢ - xᵢ)²
The R implementation for the 1:1 line is:
SS_res <- sum((y - x) ^ 2)
SS_tot <- sum((y - mean(y)) ^ 2)
1 - SS_res / SS_tot
## [1] -1.032755
Here we get a value outside the usual range of 0 to 1 because the 1:1 line fits the data worse than the horizontal line through ȳ (the line with intercept = ȳ and slope = 0). For that horizontal line, ŷᵢ = ȳ for every point, so SS_res = SS_tot and R² = 0.
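As a minimal check, we can confirm that the horizontal ȳ line yields an R² of exactly 0:
# For the horizontal line at mean(y), every prediction is mean(y),
# so SS_res coincides with SS_tot and R squared is exactly 0
SS_res_mean <- sum((y - mean(y)) ^ 2)
SS_tot_mean <- sum((y - mean(y)) ^ 2)
1 - SS_res_mean / SS_tot_mean # returns 0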
We can put the above in a simple R function:
r2_1to1 <- function(xx, yy) {
SS_res <- sum((yy - xx) ^ 2)
SS_tot <- sum((yy - mean(yy)) ^ 2)
r2 <- 1 - SS_res / SS_tot
return(r2)
}
r2_1to1(x, y)
## [1] -1.032755
r2_1to1(y, x) # swapping the roles of x and y
## [1] -1.014109
Note that, if we swap x and y, we get a different coefficient of determination (R squared) for the 1:1 line. However, when fitting the usual simple linear regression (no constraints on the intercept or slope), the coefficient of determination stays the same when we switch x with y.
# R squared doesn't change for the least squares line when switching x with y
summary(lm(y ~ x))$r.squared
## [1] 4.091335e-05
summary(lm(x ~ y))$r.squared
## [1] 4.091335e-05
all.equal(summary(lm(y ~ x))$r.squared,
          summary(lm(x ~ y))$r.squared)
## [1] TRUE
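This symmetry is expected: for a simple linear regression with an intercept, R² equals the squared Pearson correlation between x and y, and the correlation does not care which variable is the response. A quick check:
# R squared of the simple regression equals the squared correlation
all.equal(summary(lm(y ~ x))$r.squared, cor(x, y) ^ 2) # expect TRUE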
lm(y ~ 0 + offset(x))
Another way of getting the R squared is to fit a linear model with a fixed slope, as explained here:
lm_1to1 <- lm(y ~ 0 + offset(1*x))
# 1* - indicates that the slope is constrained to 1; can simply be offset(x)
# 0 - indicates that there is no intercept (-1 has the same effect)
# Note, if you need a certain value for the intercept, check https://stackoverflow.com/a/7333292/5193830
summary(lm_1to1) # it is OK to see "No Coefficients" since they are constrained to 0 and 1
##
## Call:
## lm(formula = y ~ 0 + offset(1 * x))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6403 -1.0285 -0.0701 0.8686 3.5844
##
## No Coefficients
##
## Residual standard error: 1.438 on 100 degrees of freedom
summary(lm_1to1)$r.squared # not informative here: the model has no estimated coefficients
## [1] 0
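As an aside to the Stack Overflow link in the comments above, a specific nonzero intercept can be fixed in the same way by folding it into the offset; a minimal sketch, with 0.5 as a purely illustrative value:
# Constrain the intercept to 0.5 and the slope to 1 (0.5 is illustrative only)
lm_fixed <- lm(y ~ 0 + offset(0.5 + 1 * x))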
# Another function to compute R squared
r2 <- function(yy, model) {
SS_res <- sum(model$residuals ^ 2) # residuals are taken from the model
SS_tot <- sum((yy - mean(yy)) ^ 2)
r2 <- 1 - SS_res / SS_tot
return(r2)
}
# This tests that function r2() is working properly
all.equal(summary(lm(y ~ x))$r.squared,
r2(y, lm(y ~ x)))
## [1] TRUE
# This is the R squared for the 1:1 fit.
# The result is identical to what we obtained previously with the r2_1to1() function.
r2(y, lm_1to1)
## [1] -1.032755
all.equal(r2_1to1(x, y), r2(y, lm_1to1))
## [1] TRUE
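The same offset trick reproduces the swapped-axes value obtained earlier with r2_1to1(y, x); a quick check:
# Constrained 1:1 fit with x as the response and y as the offset
lm_1to1_swap <- lm(x ~ 0 + offset(y))
all.equal(r2_1to1(y, x), r2(x, lm_1to1_swap)) # expect TRUE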