library(tidyverse)
library(parallel)
theme_set(theme_bw())
Nonlinear Regression
Lecture Notes
Preliminary
Only functions from base R and stats (preloaded) are required, plus packages from the tidyverse for data representation and manipulation. You could also try the package broom, which standardizes the output of built-in R functions for statistical modelling.
1 Introduction
The faithful data (provided by the base R package datasets) consist of the waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
Let us see what these data look like.
faithful_plot <- faithful %>%
  ggplot() + aes(x = eruptions, y = waiting) +
  geom_point(size = 2, colour = "#993399") +
  ylab("Waiting time to next eruption (mn)") + xlab("Duration of the eruption (mn)")
faithful_plot
We aim to fit a model to this data that describes the relationship between duration and waiting time.
If we try to fit a polynomial model, we can check that a polynomial of degree 4 is considered the "best" polynomial model.
poly4 <- lm(waiting ~ poly(eruptions, 4), data = faithful)
faithful_plot <- faithful_plot +
  geom_smooth(method = "lm", formula = y ~ poly(x, 4), se = FALSE, colour = "#339900")
faithful_plot
Even if this model is the "best" polynomial model, we may have serious doubts about its ability to predict waiting times for durations outside of the observed range of durations.
Furthermore, the parameters of the model, i.e. the polynomial coefficients, have no obvious physical interpretation. Using a polynomial model here, we are therefore not seeking to build a structural model that approximates a physical phenomenon, but merely seeking to relate the variability in the observations to the explanatory variables $x_1, x_2, \ldots$ We therefore need to consider other types of models, i) that do not necessarily assume linear relationships between the response variable and the explanatory variables, ii) whose parameters have some physical interpretation.
A logistic function (or logistic curve) is a common "S" shape (sigmoid curve), with equation:
$$f(x) = \frac{A}{1 + e^{-\gamma (x - \tau)}}.$$
Here, $A$ is the limiting value (when $x \to +\infty$), $\gamma$ measures the steepness of the curve and $\tau$ is the $x$-value of the sigmoid's midpoint.
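To fix ideas, here is a minimal sketch of a logistic curve, with arbitrary illustrative values A = 90, gamma = 2 and tau = 3 (these are not estimates):
curve(90 / (1 + exp(-2 * (x - 3))), from = 0, to = 6, xlab = "x", ylab = "f(x)")
abline(v = 3, h = 45, lty = 2)  # the curve crosses A/2 at its midpoint x = tau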
This model is a nonlinear model in the sense that the regression function is a nonlinear function of the parameters. We can fit this model to our data using the nls function.
nlm1 <- nls(waiting ~ A / (1 + exp(-gamma * (eruptions - tau))), faithful, start = c(A = 70, gamma = 2, tau = 1))
summary(nlm1)
Formula: waiting ~ A/(1 + exp(-gamma * (eruptions - tau)))
Parameters:
Estimate Std. Error t value Pr(>|t|)
A 93.1097 4.5080 20.654 < 2e-16 ***
gamma 0.6394 0.1022 6.254 1.57e-09 ***
tau 1.4623 0.1092 13.391 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.763 on 269 degrees of freedom
Number of iterations to convergence: 9
Achieved convergence tolerance: 6.211e-06
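As mentioned in the preliminary, the broom package (assuming it is installed) can standardize this output as tidy data frames:
broom::tidy(nlm1)    # one row per parameter: estimate, std. error, statistic, p-value
broom::glance(nlm1)  # one-row model summary: sigma, logLik, AIC, BIC, ...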
We will see in the next sections what these results are and how they are computed.
An extension of the logistic function assumes a minimum waiting time $S$ between eruptions:
$$f(x) = S + \frac{A - S}{1 + e^{-\gamma (x - \tau)}}.$$
We can again use nls to fit this nonlinear model to the data:
nlm2 <- nls(waiting ~ (A - S) / (1 + exp(-gamma * (eruptions - tau))) + S, faithful, start = c(A = 90, gamma = 2, tau = 2, S = 50))
summary(nlm2)
Formula: waiting ~ (A - S)/(1 + exp(-gamma * (eruptions - tau))) + S
Parameters:
Estimate Std. Error t value Pr(>|t|)
A 82.4659 0.9973 82.689 < 2e-16 ***
gamma 2.2539 0.4355 5.175 4.47e-07 ***
tau 3.0553 0.1107 27.610 < 2e-16 ***
S 51.3221 1.8303 28.040 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.622 on 268 degrees of freedom
Number of iterations to convergence: 9
Achieved convergence tolerance: 6.243e-06
We can now compute and plot the waiting times predicted with these two fitted models.
faithful_plot <- faithful_plot +
  geom_smooth(
    method = "nls", se = FALSE, color = "#E69F00",
    formula = y ~ A / (1 + exp(-gamma * (x - tau))),
    method.args = list(start = c(A = 70, gamma = 2, tau = 1))) +
  geom_smooth(
    method = "nls", se = FALSE, color = "#56B4E9",
    formula = y ~ (A - S) / (1 + exp(-gamma * (x - tau))) + S,
    method.args = list(start = c(A = 90, gamma = 2, tau = 2, S = 50)))
faithful_plot
We will see in the next sections:
- how to fit a nonlinear model to the data; we will use the first model and show how to retrieve the results of nlm1,
- how to evaluate the capability of the model to describe the observed data,
- how to compare possible models,
- how to compute confidence intervals and prediction intervals for predicted values.
2 Fitting a nonlinear model
2.1 Estimation of the parameters of the model
2.1.1 Least squares estimation
In the model
$$y_i = f(x_i, \beta) + \varepsilon_i, \quad 1 \leq i \leq n,$$
the least squares (LS) estimator of $\beta$ minimizes the residual sum of squares (RSS):
$$\hat{\beta} = \arg\min_{\beta} \mathrm{RSS}(\beta), \qquad \mathrm{RSS}(\beta) = \sum_{i=1}^{n} \left( y_i - f(x_i, \beta) \right)^2.$$
Here, there is no closed-form expression for $\hat{\beta}$. An optimization procedure is then used for computing it.
We define the first model as
$$y_i = f_1(x_i, \beta) + \varepsilon_i,$$
where $f_1$ is the logistic function defined above and where $\beta = (A, \gamma, \tau)$. In R,
f_1 <- function(beta, x) {
  A <- beta[1]; gamma <- beta[2]; tau <- beta[3]
  A / (1 + exp(-gamma * (x - tau)))
}
Let us check that the function nls computes the nonlinear least-squares estimates of the parameters of the (nonlinear) model, that is, solves the above optimization problem in $\beta$. We first create a function that computes the residual sum of squares for a given vector of parameters $\beta$, which will be the objective (or cost) function from the optimization point of view:
rss_1 <- function(beta, x, y) sum((y - f_1(beta, x))^2)
Then the LS estimate of $\beta$ can be computed using nlm (nonlinear minimization), which minimizes the residual sum of squares using a Newton-type algorithm.
optim_nlm1 <- nlm(rss_1, c(A = 90, gamma = 2, tau = 2), faithful$eruptions, faithful$waiting)
beta_hat <- setNames(optim_nlm1$estimate, names(coef(nlm1)))
beta_hat
A gamma tau
93.1100965 0.6393835 1.4622674
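We can check that these estimates coincide, up to the optimization tolerance, with the ones returned by nls:
rbind(nlm = beta_hat, nls = coef(nlm1))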
Assume now that the residual errors $(\varepsilon_i, 1 \leq i \leq n)$ are random variables with mean 0 and variance $\sigma^2$. Then, following the approach used for linear models, if $\beta$ is a vector of length $p$, there are $n - p$ degrees of freedom, the residual error variance is estimated as
$$\hat{\sigma}^2 = \frac{\mathrm{RSS}(\hat{\beta})}{n - p},$$
and the so-called residual standard error is $\hat{\sigma} = \sqrt{\mathrm{RSS}(\hat{\beta}) / (n - p)}$.
n <- nrow(faithful)
p <- length(beta_hat)
df <- n - p
sigma_hat <- sqrt(rss_1(beta_hat, faithful$eruptions, faithful$waiting) / df)
sigma_hat
[1] 5.762887
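We can check that this value coincides with the residual standard error reported by summary(nlm1), which can also be extracted directly:
sigma(nlm1)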
2.1.2 Maximum likelihood estimation
Let $y_i = f(x_i, \beta) + \varepsilon_i$, where $(\varepsilon_i, 1 \leq i \leq n)$ is a sequence of independent and normally distributed random variables with mean 0 and variance $\sigma^2$. We can then rewrite the model as follows:
$$y_i \sim \mathcal{N}\left( f(x_i, \beta),\ \sigma^2 \right), \quad 1 \leq i \leq n.$$
The maximum likelihood (ML) estimator of $\beta$ coincides with the least squares estimator,
$$\hat{\beta}_{\mathrm{ML}} = \hat{\beta},$$
and the ML estimators of $\sigma^2$ and $\sigma$ are
$$\hat{\sigma}^2_{\mathrm{ML}} = \frac{\mathrm{RSS}(\hat{\beta})}{n}, \qquad \hat{\sigma}_{\mathrm{ML}} = \sqrt{\frac{\mathrm{RSS}(\hat{\beta})}{n}}.$$
sigma_hat_ML <- sqrt(rss_1(beta_hat, faithful$eruptions, faithful$waiting) / n)
sigma_hat_ML
[1] 5.731018
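Note that the two estimators only differ by the factor used to normalize the RSS, so that $\hat{\sigma}_{\mathrm{ML}} = \sqrt{(n-p)/n}\ \hat{\sigma}$. A quick check:
sigma_hat * sqrt((n - p) / n)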
2.2 Standard errors of the parameter estimates
Several methods exist for estimating the standard errors of the parameter estimates. In particular, the nls
function uses a linear approximation of the model, but a likelihood approach or a parametric bootstrap may also provide estimates of these s.e.
2.2.1 Linearization approach
An nls object has methods for several generic functions, including vcov, which computes the variance-covariance matrix of the estimated parameters $\hat{\beta}$.
vcov(nlm1)
A gamma tau
A 20.3217751 -0.451414904 0.432032648
gamma -0.4514149 0.010452941 -0.008769221
tau 0.4320326 -0.008769221 0.011923449
The standard errors of the estimates are then the square roots of the diagonal elements of this matrix
sqrt(diag(vcov(nlm1)))
A gamma tau
4.5079680 0.1022396 0.1091945
The nls function linearizes the model for computing this variance-covariance matrix. Indeed, for any $\beta$ "close" to $\hat{\beta}$,
$$f(x_i, \beta) \approx f(x_i, \hat{\beta}) + \nabla f(x_i, \hat{\beta})\, (\beta - \hat{\beta}),$$
where $\nabla f(x_i, \hat{\beta})$ is the gradient of $f$ at $\hat{\beta}$, i.e. the row vector of the first derivatives of $f$ with respect to the components of $\beta$. Setting $z_i = y_i - f(x_i, \hat{\beta}) + \nabla f(x_i, \hat{\beta})\, \hat{\beta}$, the original model can be approximated by the linear model
$$z_i = \nabla f(x_i, \hat{\beta})\, \beta + \varepsilon_i.$$
Writing this model in the matrix form $z = \mathrm{G}\, \beta + \varepsilon$, where $\nabla f(x_i, \hat{\beta})$ is the $i$th row of matrix $\mathrm{G}$, we can check that the LS estimator of $\beta$ for this linearized model is the LS estimator $\hat{\beta}$ of the original model.
Proposition 1 (Equivalence of the two LS estimators) The LS estimator of the linearized model is equivalent to the LS estimator of the original model:
$$(\mathrm{G}^\top \mathrm{G})^{-1} \mathrm{G}^\top z = \hat{\beta}.$$
Let $\hat{\beta}_{\mathrm{lin}}$ be the LS estimator of the linearized model. Then,
$$\hat{\beta}_{\mathrm{lin}} = (\mathrm{G}^\top \mathrm{G})^{-1} \mathrm{G}^\top z = (\mathrm{G}^\top \mathrm{G})^{-1} \mathrm{G}^\top \left( y - f(x, \hat{\beta}) + \mathrm{G}\, \hat{\beta} \right).$$
By definition, $\hat{\beta}$ minimizes $\mathrm{RSS}(\beta) = \| y - f(x, \beta) \|^2$. Then,
$$\nabla \mathrm{RSS}(\hat{\beta}) = -2\, \mathrm{G}^\top \left( y - f(x, \hat{\beta}) \right) = 0.$$
Thus, $\hat{\beta}_{\mathrm{lin}} = (\mathrm{G}^\top \mathrm{G})^{-1} \mathrm{G}^\top \mathrm{G}\, \hat{\beta} = \hat{\beta}$.
Let us check this property numerically:
hx <- deriv(
  expr = y ~ A / (1 + exp(-gamma * (x - tau))),
  namevec = c("A", "gamma", "tau"),
  function.arg = function(A, gamma, tau, x) { }
)
fr <- hx(beta_hat[1], beta_hat[2], beta_hat[3], faithful$eruptions)
G <- attr(fr, "gradient")
z <- faithful$waiting - f_1(beta_hat, faithful$eruptions) + G %*% beta_hat
solve(crossprod(G)) %*% crossprod(G, z)
[,1]
A 93.110121
gamma 0.639383
tau 1.462268
Since $\hat{\beta} = (\mathrm{G}^\top \mathrm{G})^{-1} \mathrm{G}^\top z$, the variance-covariance of $\hat{\beta}$ can be approximated by
$$\mathrm{V}_{\mathrm{lin}} = \hat{\sigma}^2\, (\mathrm{G}^\top \mathrm{G})^{-1}.$$
V_lin <- sigma_hat^2 * solve(t(G) %*% G)
V_lin
A gamma tau
A 20.3230488 -0.451427057 0.432076372
gamma -0.4514271 0.010452837 -0.008769862
tau 0.4320764 -0.008769862 0.011924743
and we can derive standard errors for the parameter estimates,
se_lin <- sqrt(diag(V_lin))
se_lin
A gamma tau
4.5081092 0.1022391 0.1092005
2.2.2 Maximum likelihood approach
Let $I_n(\hat{\beta})$ be the observed Fisher information matrix at $\hat{\beta}$:
$$I_n(\hat{\beta}) = -\frac{\partial^2}{\partial \beta\, \partial \beta^\top} \log \ell(\hat{\beta}, \hat{\sigma}^2; y).$$
Then, the Central Limit Theorem states that the variance of $\hat{\beta}$ can be approximated by the inverse of $I_n(\hat{\beta})$:
$$\mathrm{V}(\hat{\beta}) \approx I_n(\hat{\beta})^{-1}.$$
Function nlm can return the Hessian of the function rss_1 to minimize, i.e. the matrix of its second derivatives $H = \frac{\partial^2}{\partial \beta\, \partial \beta^\top} \mathrm{RSS}(\hat{\beta})$:
optim_nlm1 <- nlm(rss_1, c(90, 2, 2), faithful$eruptions, faithful$waiting, hessian = TRUE)
H <- optim_nlm1$hessian
H
[,1] [,2] [,3]
[1,] 324.8939 10848.5 -3794.477
[2,] 10848.4973 379318.3 -114782.340
[3,] -3794.4768 -114782.3 58845.488
We then derive the FIM $I_n(\hat{\beta}) = H / (2 \hat{\sigma}^2_{\mathrm{ML}})$ and the variance $\mathrm{V}_{\mathrm{ML}} = I_n(\hat{\beta})^{-1}$:
V_ml <- solve(H / (2 * sigma_hat_ML^2))
V_ml
[,1] [,2] [,3]
[1,] 17.4344834 -0.386666080 0.369991066
[2,] -0.3866661 0.008998208 -0.007381367
[3,] 0.3699911 -0.007381367 0.010576191
and
se_ml <- sqrt(diag(V_ml))
se_ml
[1] 4.17546206 0.09485888 0.10284061
Besides, using the fact that $n\, \hat{\sigma}^2_{\mathrm{ML}} / \sigma^2$ approximately follows a $\chi^2$ distribution with $n - p$ degrees of freedom, we can show that $\mathrm{V}(\hat{\sigma}) \approx \sigma^2 / (2n)$. Then the standard error of $\hat{\sigma}$ is approximately $\hat{\sigma} / \sqrt{2n}$.
sigma_hat / sqrt(2 * n)
[1] 0.2470817
2.2.3 Parametric bootstrap
If we were able to repeat the same experiment under the same conditions, we would observe a new vector of observations $y^{(1)}$ and we would compute $\hat{\beta}^{(1)}$, an estimate of $\beta$. Then, if we could repeat this experiment $L$ times, we would get $L$ estimates $\hat{\beta}^{(1)}, \hat{\beta}^{(2)}, \ldots, \hat{\beta}^{(L)}$ of $\beta$. This sequence of estimates would be a sample of random variables distributed as $\hat{\beta}$, and could therefore be used for estimating this distribution.
When such replicates are not available, parametric bootstrap (or Monte-Carlo simulation) is a way to mimic the repetition of an experiment.
For $b = 1, 2, \ldots, L$, we generate "observations" $(y_i^{(b)}, 1 \leq i \leq n)$ with the model of interest, the original explanatory variables $(x_i)$ and the estimated parameters $\hat{\beta}$ and $\hat{\sigma}^2$:
$$y_i^{(b)} = f(x_i, \hat{\beta}) + \varepsilon_i^{(b)},$$
where $\varepsilon_i^{(b)} \sim \mathcal{N}(0, \hat{\sigma}^2)$. We also compute the LS/ML estimate of $\beta$:
$$\hat{\beta}^{(b)} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i^{(b)} - f(x_i, \beta) \right)^2.$$
The variance-covariance matrix of $\hat{\beta}$ is then estimated by the empirical variance-covariance matrix of $(\hat{\beta}^{(b)}, 1 \leq b \leq L)$:
$$\mathrm{V}_{\mathrm{mc}} = \frac{1}{L - 1} \sum_{b=1}^{L} \left( \hat{\beta}^{(b)} - \bar{\beta} \right) \left( \hat{\beta}^{(b)} - \bar{\beta} \right)^\top,$$
where $\bar{\beta} = \frac{1}{L} \sum_{b=1}^{L} \hat{\beta}^{(b)}$.
L <- 1000
y_hat_ref <- predict(nlm1)
beta_hat <- coef(nlm1)
x <- faithful$eruptions
betas_boot <- parallel::mclapply(1:L, function(b) {
  y_b <- y_hat_ref + sigma_hat * rnorm(length(y_hat_ref))
  coef(suppressWarnings(nls(y_b ~ f_1(beta, x), start = list(beta = beta_hat))))
}) %>% unlist() %>% matrix(ncol = L) %>% t()
V_mc <- cov(betas_boot)
V_mc
[,1] [,2] [,3]
[1,] 27.0193082 -0.49290098 0.73934710
[2,] -0.4929010 0.01036602 -0.01164034
[3,] 0.7393471 -0.01164034 0.02408383
se_mc <- sqrt(diag(V_mc))
se_mc
[1] 5.1980100 0.1018137 0.1551896
Remark. It would be equivalent to directly compute the empirical standard deviation of each component of the sequence $(\hat{\beta}^{(b)}, 1 \leq b \leq L)$:
apply(betas_boot, 2, sd)
[1] 5.1980100 0.1018137 0.1551896
One of the main advantages of this method is that it doesn't make any assumption about the distribution of $\hat{\beta}$, contrary to the maximum likelihood approach, whose estimator has an asymptotic distribution known to be normal, with a known asymptotic variance. This asymptotic distribution is then used with a finite set of observations for approximating the distribution of the ML estimator, but without knowing how good this approximation is. For its part, the linearization approach makes use of an approximation of the structural model, again without knowing how good this approximation is.
In this example, the ML approach seems to underestimate the standard errors of the estimates. On the other hand, results obtained with the linearization approach are very similar to those obtained by Monte Carlo simulation.
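As a quick recap, we can gather the three sets of standard errors in a single table (assuming the objects se_lin, se_ml and se_mc computed above are still in the workspace):
data.frame(linearization = se_lin, ML = se_ml, bootstrap = se_mc, row.names = names(coef(nlm1)))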
2.3 Statistical tests for the model parameters
The summary of model nlm1 includes several statistics about the model parameters:
summary(nlm1)$coefficient
Estimate Std. Error t value Pr(>|t|)
A 93.1097350 4.5079680 20.654480 1.974732e-57
gamma 0.6393917 0.1022396 6.253854 1.567061e-09
tau 1.4622598 0.1091945 13.391326 1.110071e-31
Let $\beta = (\beta_1, \ldots, \beta_p)$ be the $p$-vector of parameters of the model. In the linearized model $z = \mathrm{G}\beta + \varepsilon$,
$$\frac{\hat{\beta}_k - \beta_k}{\mathrm{se}(\hat{\beta}_k)}$$
follows a $t$-distribution with $n - p$ degrees of freedom. We can then perform a $t$-test to test if $\beta_k = 0$.
The test statistic is $t_k = \hat{\beta}_k / \mathrm{se}(\hat{\beta}_k)$ and the $p$-value for this test is
$$p_k = 2 \left( 1 - \mathbb{P}(T_{n-p} \leq |t_k|) \right).$$
t_stat <- beta_hat / se_lin
p_value <- 2 * (1 - pt(abs(t_stat), n - p))
cbind(beta_hat, se_lin, t_stat, p_value) %>% round(4)
beta_hat se_lin t_stat p_value
A 93.1097 4.5081 20.6538 0
gamma 0.6394 0.1022 6.2539 0
tau 1.4623 0.1092 13.3906 0
2.4 Confidence intervals for the model parameters
2.4.1 Linearization approach
Using the linearized model, we can compute a confidence interval for each component $\beta_k$ of $\beta$ as we do with any linear model:
$$\mathrm{CI}_{\mathrm{lin}, 1-\alpha}(\beta_k) = \left[ \hat{\beta}_k + qt_{\alpha/2, n-p}\, \mathrm{se}(\hat{\beta}_k)\ ,\ \hat{\beta}_k + qt_{1-\alpha/2, n-p}\, \mathrm{se}(\hat{\beta}_k) \right],$$
where $qt_{u, \nu}$ is the quantile of order $u$ for a $t$-distribution with $\nu$ degrees of freedom.
level <- 0.95
alpha <- 1 - level
CI_linearized <-
  cbind(
    beta_hat + qt(alpha/2, n - p) * se_lin,
    beta_hat + qt(1 - alpha/2, n - p) * se_lin) %>% as.data.frame() %>%
  setNames(c(paste0((1 - level)/2 * 100, "%"), paste0((1 + level)/2 * 100, "%")))
CI_linearized
2.5% 97.5%
A 84.2340705 101.9853995
gamma 0.4381011 0.8406823
tau 1.2472635 1.6772560
2.4.2 Maximum likelihood approach
We can adopt the same approach with the ML estimate. Here, the standard errors are derived from the asymptotic variance-covariance matrix of the parameter estimates .
CI_ML <-
  cbind(
    beta_hat + qt(alpha/2, n - p) * se_ml,
    beta_hat + qt(1 - alpha/2, n - p) * se_ml) %>% as.data.frame() %>%
  setNames(c(paste0((1 - level)/2 * 100, "%"), paste0((1 + level)/2 * 100, "%")))
CI_ML
2.5% 97.5%
A 84.8889935 101.3304765
gamma 0.4526314 0.8261519
tau 1.2597849 1.6647346
2.4.3 Parametric bootstrap
The sequence $(\hat{\beta}^{(b)}, 1 \leq b \leq L)$ obtained by Monte Carlo simulation can be used for computing an empirical confidence interval:
$$\mathrm{CI}_{\mathrm{boot}, 1-\alpha}(\beta_k) = \left[ q_{\alpha/2}(\hat{\beta}_k)\ ,\ q_{1-\alpha/2}(\hat{\beta}_k) \right],$$
where, for any $0 < u < 1$, $q_u(\hat{\beta}_k)$ is the empirical quantile of order $u$ of $(\hat{\beta}_k^{(b)}, 1 \leq b \leq L)$:
CI_bootstrap <-
  apply(as.matrix(betas_boot), 2, quantile, probs = c(alpha/2, 1 - alpha/2)) %>%
  t() %>% as.data.frame() %>%
  setNames(c(paste0((1 - level)/2 * 100, "%"), paste0((1 + level)/2 * 100, "%")))
CI_bootstrap
2.5% 97.5%
1 86.7787669 106.9205636
2 0.4561864 0.8477302
3 1.3088312 1.9176370
Remark. These confidence intervals are slightly biased. We will use a linear model to explain where this bias comes from and show how to remove it.
Consider the linear model $y = X\beta + \varepsilon$. A confidence interval of level $1 - \alpha$ for $\beta_k$ is
$$\mathrm{CI}_{1-\alpha}(\beta_k) = \left[ \hat{\beta}_k \pm qt_{1-\alpha/2, n-p}\, \mathrm{se}(\hat{\beta}_k) \right],$$
where $\mathrm{se}(\hat{\beta}_k) = \hat{\sigma} \sqrt{ \left[ (X^\top X)^{-1} \right]_{kk} }$.
On the other hand, for $b = 1, \ldots, L$,
$$y^{(b)} = X\hat{\beta} + \varepsilon^{(b)}, \quad \varepsilon^{(b)} \sim \mathcal{N}(0, \hat{\sigma}^2 I_n),$$
and
$$\hat{\beta}^{(b)} = (X^\top X)^{-1} X^\top y^{(b)}.$$
Thus, conditionally on the observations $y$, i.e. conditionally on $\hat{\beta}$ and $\hat{\sigma}$, $\hat{\beta}^{(b)} \sim \mathcal{N}(\hat{\beta}, \hat{\sigma}^2 (X^\top X)^{-1})$. Then, the empirical quantile $q_u(\hat{\beta}_k)$ is an estimator of the quantile of order $u$ of a normal distribution with mean $\hat{\beta}_k$ and variance $\mathrm{se}(\hat{\beta}_k)^2$. In other words, the bootstrap confidence interval is an estimator of the interval $[\hat{\beta}_k \pm qnorm_{1-\alpha/2}\, \mathrm{se}(\hat{\beta}_k)]$, where $qnorm_{1-\alpha/2}$ is the quantile of order $1 - \alpha/2$ for a $\mathcal{N}(0, 1)$ distribution.
We see that these quantiles for a normal distribution should be transformed into quantiles for a $t$-distribution with $n - p$ df. An unbiased confidence interval for $\beta_k$ is therefore
$$\hat{\beta}_k + \frac{qt_{1-\alpha/2, n-p}}{qnorm_{1-\alpha/2}} \left( q_u(\hat{\beta}_k) - \hat{\beta}_k \right), \quad u = \alpha/2,\ 1 - \alpha/2.$$
The same correction can be used for nonlinear models:
rq <- qt(1 - alpha/2, df) / qnorm(1 - alpha/2)
beta_hat + rq * (CI_bootstrap - beta_hat)
2.5% 97.5%
1 86.7501543 106.9829812
2 0.4553584 0.8486718
3 1.3081377 1.9196951
2.4.4 Profile likelihood
Function confint uses the profile likelihood method for computing confidence intervals for parameters in a fitted model.
CI_profiled <- confint(nlm1, level = level)
CI_profiled
2.5% 97.5%
A 87.1321726 105.5760368
gamma 0.4625255 0.8324095
tau 1.3109469 1.8569351
Profile likelihood confidence intervals are based on the log-likelihood function.
Imagine that we want to compute a confidence interval for $A$. The profile likelihood of $A$ is defined by
$$\ell_p(A) = \max_{\gamma, \tau} \ell(A, \gamma, \tau; y).$$
$\ell_p(A)$ no longer depends on $(\gamma, \tau)$, since these parameters have been profiled out.
As an example, let us compute and display the profile log-likelihood of $A$ for model nlm1:
f_1A <- function(gamma, tau, x, A) { A / (1 + exp(-gamma * (x - tau))) }
values_A <- seq(86, 110, by = 0.1)
start <- list(gamma = beta_hat[2], tau = beta_hat[3])
logLik_A <- map(values_A, ~
  nls(waiting ~ .x / (1 + exp(-gamma * (eruptions - tau))), faithful, start = start)
) %>% map(logLik) %>% map_dbl(as.numeric)
data.frame(A = values_A, logLik = logLik_A) %>%
ggplot() + aes(x = A, y = logLik) + geom_line(color="blue", linewidth=1) +
geom_vline(xintercept = beta_hat[1], color="red", linetype = "longdash") +
scale_x_continuous(breaks = c(seq(85,110, by = 5), round(beta_hat[[1]],2)),"A")
Consider the test of $H_0: A = A_0$ against $H_1: A \neq A_0$. The likelihood ratio statistic is
$$\mathrm{LR}(A_0) = 2 \left( \log \ell(\hat{A}, \hat{\gamma}, \hat{\tau}) - \log \ell_p(A_0) \right),$$
where $\ell_p(A_0)$ is the maximum value of the likelihood under $H_0$, i.e. with $A = A_0$ fixed.
Under $H_0$, $\mathrm{LR}(A_0)$ asymptotically follows a $\chi^2$ distribution with 1 df. Then, the test is significant (i.e. we reject $H_0$) if $\mathrm{LR}(A_0) > q_{\chi^2_1, 1-\alpha}$, where $q_{\chi^2_1, 1-\alpha}$ is the quantile of order $1 - \alpha$ for a $\chi^2$ distribution with 1 df.
A "profile likelihood confidence interval" of level $1 - \alpha$ for $A$ consists of those values $A_0$ for which the test is not significant:
$$\mathrm{CI}_{\mathrm{profile}, 1-\alpha}(A) = \left\{ A_0 :\ \mathrm{LR}(A_0) \leq q_{\chi^2_1, 1-\alpha} \right\}.$$
qlevel <- qchisq(level, 1)

lp_A <- function(A, qlevel) {
  nlm1_A <- nls(waiting ~ A / (1 + exp(-gamma * (eruptions - tau))), faithful,
                start = list(gamma = beta_hat[2], tau = beta_hat[3]))
  res <- as.numeric(2 * (logLik(nlm1) - logLik(nlm1_A)) - qlevel)
  res
}
c1_A <- uniroot(lp_A, c(80, beta_hat[1]), qlevel)$root
c2_A <- uniroot(lp_A, c(beta_hat[1], 110), qlevel)$root
dlogLik_A <- 2 * (as.numeric(logLik(nlm1)) - logLik_A)
data.frame(A = values_A, dlogLik = dlogLik_A) %>%
  ggplot() + geom_line(aes(values_A, dlogLik), linewidth = 1, color = "blue") +
  geom_hline(yintercept = qlevel, color = "red", linetype = "longdash") +
  geom_segment(aes(x = c1_A, xend = c1_A, y = -Inf, yend = qlevel), color = "red", linetype = "longdash") +
  geom_segment(aes(x = c2_A, xend = c2_A, y = -Inf, yend = qlevel), color = "red", linetype = "longdash") +
  scale_y_continuous(breaks = c(0, 2, 3, 6, round(qlevel, 2)), 1) +
  scale_x_continuous(breaks = c(seq(85, 100, by = 5), round(c1_A, 1), round(c2_A, 1), 110), "A")
Let us now compute the profile likelihood confidence intervals for $\gamma$ and $\tau$:
lp_gamma <- function(gamma) {
  nlm1_gamma <- nls(waiting ~ A / (1 + exp(-gamma * (eruptions - tau))), faithful,
                    start = list(A = beta_hat[1], tau = beta_hat[3]))
  as.numeric(2 * (logLik(nlm1) - logLik(nlm1_gamma)) - qlevel)
}
c1_gamma <- uniroot(lp_gamma, lower = 0.40, upper = beta_hat[2])$root
c2_gamma <- uniroot(lp_gamma, lower = beta_hat[2], upper = 1)$root

lp_tau <- function(tau) {
  nlm1_tau <- nls(waiting ~ A / (1 + exp(-gamma * (eruptions - tau))), faithful,
                  start = list(A = beta_hat[1], gamma = beta_hat[2]))
  as.numeric(2 * (logLik(nlm1) - logLik(nlm1_tau)) - qlevel)
}
c1_tau <- uniroot(lp_tau, lower = 1.25, upper = beta_hat[3])$root
c2_tau <- uniroot(lp_tau, lower = beta_hat[3], upper = 2.25)$root
CI_profiled_custom <-
  rbind(A = c(c1_A, c2_A), gamma = c(c1_gamma, c2_gamma), tau = c(c1_tau, c2_tau)) %>%
  as.data.frame() %>%
  setNames(c(paste0((1 - level)/2 * 100, "%"), paste0((1 + level)/2 * 100, "%")))
CI_profiled_custom
2.5% 97.5%
A 87.1523081 105.4511191
gamma 0.4636369 0.8310879
tau 1.3118040 1.8520538
Remark. The confint R function doesn't use a $\chi^2$ distribution with 1 df for the LRT statistic (which is theoretically the right asymptotic distribution). Instead, the square root of $\mathrm{LR}(A_0)$ is assumed to follow a half $t$-distribution with $n - p$ df. Then, the null hypothesis is rejected when $\mathrm{LR}(A_0) > (qt_{1-\alpha/2, n-p})^2$.
qlevel <- qt(1 - alpha/2, df)^2
c1R_A <- uniroot(lp_A, c(80, beta_hat[1]), qlevel)$root
c2R_A <- uniroot(lp_A, c(beta_hat[1], 110), qlevel)$root
c(c1R_A, c2R_A)
[1] 87.13229 105.53713
The two tests, and then the two confidence intervals, are equivalent for large $n$, since a $t$-distribution with $n - p$ df converges to a $\mathcal{N}(0, 1)$ when $n$ goes to infinity. Then, for any $0 < \alpha < 1$,
$$(qt_{1-\alpha/2, n-p})^2 \xrightarrow[n \to \infty]{} (qnorm_{1-\alpha/2})^2 = q_{\chi^2_1, 1-\alpha}.$$
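A quick numerical check of this convergence (here with $\alpha = 0.05$):
sapply(c(10, 100, 1000, 10000), function(nu) qt(1 - alpha/2, nu)^2)  # t-based thresholds
qchisq(level, 1)                                                     # chi-square threshold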
3 Diagnostic plots
Let us plot
- the observed waiting times versus predicted waiting times,
- the residuals versus eruption times,
- the residuals versus predicted waiting times,
- the distribution of the residuals
## unfortunately, no plotting function is defined for 'nls' objects
residual_nlm1 <- resid(nlm1) / sd(resid(nlm1))
par(mfrow = c(2, 2))
plot(predict(nlm1), faithful$waiting)
abline(a=0, b=1, lty=1, col="magenta")
plot(x, residual_nlm1)
abline(a=0, b=0, lty=1, col="magenta")
plot(predict(nlm1), residual_nlm1)
abline(a=0, b=0, lty=1, col="magenta")
boxplot(residual_nlm1)
abline(a=0, b=0, lty=2, col="magenta")
abline(a=qnorm(0.25), b=0, lty=2, col="magenta")
abline(a=qnorm(0.75), b=0, lty=2, col="magenta")
On one hand, observations and predictions look randomly distributed around the line $y = x$. On the other hand, residuals look well distributed around 0, with a constant variance. Furthermore, the distribution of the residuals appears to be symmetrical, with quantiles close to those of a normal distribution.
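As an additional check, not part of the panel above, we may draw a normal Q-Q plot of the standardized residuals:
qqnorm(residual_nlm1)
qqline(residual_nlm1, col = "magenta")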
Then, based on these graphs, we don't have any good reason for rejecting model nlm1 … which doesn't mean we should stay with this model as our final model!
4 Model comparison
We can produce the same diagnostic plots with model nlm2 and arrive at the same conclusion concerning this model.
residual_nlm2 <- resid(nlm2) / sd(resid(nlm2))
par(mfrow = c(2, 2))
plot(predict(nlm2), faithful$waiting)
abline(a=0, b=1, lty=1, col="magenta")
plot(x, residual_nlm2)
abline(a=0, b=0, lty=1, col="magenta")
plot(predict(nlm2), residual_nlm2)
abline(a=0, b=0, lty=1, col="magenta")
boxplot(residual_nlm2)
abline(a=0, b=0, lty=2, col="magenta")
abline(a=qnorm(0.25), b=0, lty=2, col="magenta")
abline(a=qnorm(0.75), b=0, lty=2, col="magenta")
Since nlm1 and nlm2 are two possible models for fitting our data, we need some criteria for comparing them. The statistical tests and the information criteria used for comparing linear models can also be used for comparing nonlinear models.
First, we can perform an ANOVA for testing model nlm1 against model nlm2, since these two models are nested (nlm1 corresponds to nlm2 when $S = 0$).
anova(nlm1, nlm2)
Analysis of Variance Table
Model 1: waiting ~ A/(1 + exp(-gamma * (eruptions - tau)))
Model 2: waiting ~ (A - S)/(1 + exp(-gamma * (eruptions - tau))) + S
Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1 269 8933.7
2 268 8469.4 1 464.3 14.692 0.0001578 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let $\mathrm{RSS}_1$ and $\mathrm{RSS}_2$ be the residual sums of squares under, respectively, nlm1 and nlm2, and let $p_1$ and $p_2$ be the lengths of the parameter vectors $\beta_1$ and $\beta_2$. Then, the $F$-statistic is
$$F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2) / (p_2 - p_1)}{\mathrm{RSS}_2 / (n - p_2)}.$$
RSS1 <- sum(resid(nlm1)^2)
RSS2 <- sum(resid(nlm2)^2)
p1 <- length(coef(nlm1))
p2 <- length(coef(nlm2))
F.stat <- ((RSS1 - RSS2) / (p2 - p1)) / (RSS2 / (n - p2))
c(RSS2, RSS1 - RSS2, F.stat, 1 - pf(F.stat, p2 - p1, n - p2))
[1] 8.469424e+03 4.642987e+02 1.469192e+01 1.577976e-04
Remark: since the model is nonlinear, we cannot decompose the residual sum of squares as we did with linear models. Indeed, here,
$$\mathrm{RSS}_1 - \mathrm{RSS}_2 \neq \sum_{i=1}^{n} \left( f(x_i, \hat{\beta}_1) - f(x_i, \hat{\beta}_2) \right)^2.$$
c(RSS1-RSS2, sum((predict(nlm1)-predict(nlm2))^2))
[1] 464.2987 520.3559
Another way to test nlm1 against nlm2 consists in testing if $S = 0$ in model nlm2:
summary(nlm2)$coefficients
Estimate Std. Error t value Pr(>|t|)
A 82.465891 0.9973073 82.688550 8.973936e-193
gamma 2.253936 0.4355265 5.175197 4.469535e-07
tau 3.055263 0.1106569 27.610221 2.420336e-80
S 51.322077 1.8302942 28.040343 1.110792e-81
Even if both tests clearly prefer model nlm2, we can remark that the $F$-test and the $t$-test are not equivalent since the models are nonlinear.
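We can quantify this: for a linear model, the squared $t$-statistic for $S$ would equal the $F$-statistic exactly; here the two values differ markedly:
c(t_squared = summary(nlm2)$coefficients["S", "t value"]^2, F_stat = F.stat)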
Information criteria such as AIC and BIC also prefer model nlm2:
as.matrix(AIC(nlm1, nlm2))
df AIC
nlm1 4 1729.668
nlm2 5 1717.152
as.matrix(BIC(nlm1, nlm2))
df BIC
nlm1 4 1744.092
nlm2 5 1735.181
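These criteria can be recomputed by hand from the log-likelihood, remembering that the number of estimated parameters includes $\sigma$ (hence df = $p + 1$ in the tables above):
-2 * as.numeric(logLik(nlm2)) + 2 * (p2 + 1)       # AIC of nlm2
-2 * as.numeric(logLik(nlm2)) + log(n) * (p2 + 1)  # BIC of nlm2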
5 Confidence intervals and prediction intervals
There are no ready-made functions to calculate confidence intervals for predicted values and prediction intervals for new data. We will see how to do it by implementing two different methods.
5.1 The delta-method
For a given value $x$ of the explanatory variable, we can use the model with the estimated parameter $\hat{\beta}$ and predict the response as $f(x, \hat{\beta})$.
Since $\hat{\beta}$ is a random vector with variance-covariance matrix $\mathrm{V}(\hat{\beta})$, $f(x, \hat{\beta})$ is also a random variable that can be approximated by a linear function of $\hat{\beta}$:
$$f(x, \hat{\beta}) \approx f(x, \beta) + \nabla f(x, \beta)\, (\hat{\beta} - \beta).$$
Then, the so-called delta method consists in using this approximation for approximating the variance of $f(x, \hat{\beta})$ by
$$\mathrm{V}\left( f(x, \hat{\beta}) \right) \approx \nabla f(x, \hat{\beta})\; \mathrm{V}(\hat{\beta})\; \nabla f(x, \hat{\beta})^\top,$$
and we can now use this approximation for computing a confidence interval for each prediction $f(x, \beta)$:
$$\mathrm{CI}_{1-\alpha}\left( f(x, \beta) \right) = \left[ f(x, \hat{\beta}) \pm qt_{1-\alpha/2, n-p}\; \mathrm{se}\left( f(x, \hat{\beta}) \right) \right],$$
where $\mathrm{se}(f(x, \hat{\beta}))$ is the standard error of $f(x, \hat{\beta})$ defined as
$$\mathrm{se}\left( f(x, \hat{\beta}) \right) = \sqrt{ \nabla f(x, \hat{\beta})\; \mathrm{V}(\hat{\beta})\; \nabla f(x, \hat{\beta})^\top }.$$
We can also compute a prediction interval for a future observation
$$y_{\mathrm{new}} = f(x_{\mathrm{new}}, \beta) + \varepsilon_{\mathrm{new}}.$$
The prediction for $y_{\mathrm{new}}$ is $\hat{y}_{\mathrm{new}} = f(x_{\mathrm{new}}, \hat{\beta})$. The standard error for this prediction should take into account both the uncertainty on $f(x_{\mathrm{new}}, \hat{\beta})$ and the variability of the residual error $\varepsilon_{\mathrm{new}}$:
$$\mathrm{se}(\hat{y}_{\mathrm{new}}) = \sqrt{ \nabla f(x_{\mathrm{new}}, \hat{\beta})\; \mathrm{V}(\hat{\beta})\; \nabla f(x_{\mathrm{new}}, \hat{\beta})^\top + \hat{\sigma}^2 }.$$
Then,
$$\mathrm{PI}_{1-\alpha}(y_{\mathrm{new}}) = \left[ \hat{y}_{\mathrm{new}} \pm qt_{1-\alpha/2, n-p}\; \mathrm{se}(\hat{y}_{\mathrm{new}}) \right].$$
As an example, let us compute the variance of $f(x, \hat{\beta})$ for model nlm2, on a grid of values $x_{\mathrm{new}}$ between 1 and 6:
f_prime <- deriv(y ~ (A - S) / (1 + exp(-gamma * (x - tau))) + S, c("A", "gamma", "tau", "S"),
                 function(A, gamma, tau, S, x) { })
x_new <- seq(1, 6, by = 0.1)
beta_hat <- coef(nlm2)
f_new <- f_prime(beta_hat[1], beta_hat[2], beta_hat[3], beta_hat[4], x_new)
grad_new <- attr(f_new, "gradient")
GS <- rowSums((grad_new %*% vcov(nlm2)) * grad_new)
We can then derive a confidence interval for each $f(x_{\mathrm{new}}, \beta)$,
alpha <- 0.05
delta_f <- sqrt(GS) * qt(1 - alpha/2, n - length(beta_hat))
df_delta <- data.frame(
  x = x_new,
  f = f_new,
  lwr_conf = f_new - delta_f,
  upr_conf = f_new + delta_f
)
and a prediction interval for each $y_{\mathrm{new}}$:
delta_y <- sqrt(GS + sigma(nlm2)^2) * qt(1 - alpha/2, df)
df_delta[c("lwr_pred", "upr_pred")] <- cbind(f_new - delta_y, f_new + delta_y)
We can now plot these two intervals together with the data:
ggplot(faithful) + geom_point(aes(x = eruptions, y = waiting)) +
  geom_ribbon(data = df_delta, aes(x = x_new, ymin = lwr_pred, ymax = upr_pred), alpha = 0.1, fill = "blue") +
  geom_ribbon(data = df_delta, aes(x = x_new, ymin = lwr_conf, ymax = upr_conf), alpha = 0.2, fill = "#339900") +
  geom_line(data = df_delta, aes(x = x_new, y = f_new), colour = "#339900", linewidth = 1)
5.2 Parametric bootstrap
As we have already seen, parametric bootstrap consists in simulating replicates of the experiment by drawing random observations with the fitted model, i.e. using the estimated parameters $\hat{\beta}$ and $\hat{\sigma}$. Then, for each replicate $b$,
- an estimate $\hat{\beta}^{(b)}$ of the vector of parameters is computed,
- a prediction $f(x, \hat{\beta}^{(b)})$ is computed for each value of $x$,
- a new observation $y^{(b)}(x) = f(x, \hat{\beta}^{(b)}) + \varepsilon^{(b)}$ is randomly drawn for each value of $x$.
We can then use the empirical quantiles of $(f(x, \hat{\beta}^{(b)}))$ and $(y^{(b)}(x))$ to compute a confidence interval for $f(x, \beta)$ and a prediction interval for $y(x)$.
Let $q_{f, u}(x)$ and $q_{y, u}(x)$ be the empirical quantiles of order $u$ of $(f(x, \hat{\beta}^{(b)}), 1 \leq b \leq L)$ and $(y^{(b)}(x), 1 \leq b \leq L)$, respectively. Instead of using the empirical intervals
$$\left[ q_{f, \alpha/2}(x),\ q_{f, 1-\alpha/2}(x) \right] \quad \text{and} \quad \left[ q_{y, \alpha/2}(x),\ q_{y, 1-\alpha/2}(x) \right],$$
we can define the confidence and prediction intervals using the correction previously introduced for obtaining unbiased intervals in the case of a linear model:
$$f(x, \hat{\beta}) + \frac{qt_{1-\alpha/2, n-p}}{qnorm_{1-\alpha/2}} \left( q_{\cdot, u}(x) - f(x, \hat{\beta}) \right), \quad u = \alpha/2,\ 1 - \alpha/2.$$
Let us compute these confidence and prediction intervals by bootstrapping for model nlm2:
f_hat <- function(beta, x) { beta[4] + (beta[1] - beta[4]) / (1 + exp(-beta[2] * (x - beta[3]))) }
beta_hat <- coef(nlm2)
y_hat_ref <- f_hat(beta_hat, x)
df_mc <- data.frame(x = x_new, f = f_new)

res <- parallel::mclapply(1:1000, function(b) {
  y_b <- y_hat_ref + sigma(nlm2) * rnorm(n)
  nlm2_b <- nls(y_b ~ f_hat(beta, x), start = list(beta = beta_hat))
  f_b <- predict(nlm2_b, newdata = df_mc)
  list(f_hat = f_b, y_hat = f_b + rnorm(1, 0, sigma(nlm2)))
})
c("lwr_conf","upr_conf")] <-
df_mc[map(res, "f_hat") %>% reduce(rbind) %>%
apply(2, quantile, c(alpha/2,1-alpha/2)) %>% t()
c("lwr_pred","upr_pred")] <-
df_mc[map(res, "y_hat") %>% reduce(rbind) %>%
apply(2, quantile, c(alpha/2,1-alpha/2)) %>% t()
## removing bias
3:6)] <- f_new + rq*(df_mc[,(3:6)] - f_new) df_mc[,(
ggplot(faithful) + geom_point(aes(x = eruptions, y = waiting)) +
  geom_ribbon(data = df_mc, aes(x = x_new, ymin = lwr_pred, ymax = upr_pred), alpha = 0.1, fill = "blue") +
  geom_ribbon(data = df_mc, aes(x = x_new, ymin = lwr_conf, ymax = upr_conf), alpha = 0.2, fill = "#339900") +
  geom_line(data = df_mc, aes(x = x_new, y = f_new), colour = "#339900", linewidth = 1)