Class StabilityPath
Details
Class of object returned by the QuadrupenFit$stability() method or the
stability() function. Owns print() and plot() methods.
Public fields
probabilities: a Matrix object containing the estimated probabilities of selection along the path of solutions.
regParam: a list with the levels of the regularizing parameters used.
subsamples: a list that contains the folds used for each subsample.
Active bindings
nvar: number of variables (without intercept)
nobs: number of observations (sample size)
nonzero: variables with a non-null probability of selection along the stability path
nonzeroprob: subset of the stability path probabilities restricted to the nonzero variables
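As an illustration of what the nonzero and nonzeroprob bindings describe, here is a minimal sketch on a toy probability matrix. It does not use the package itself: the object names (prob, nonzero_idx, nonzero_prob) are hypothetical, and the layout with penalty levels in rows and variables in columns is an assumption made for the example, not something the class documentation states.

```r
## Toy selection probabilities: 3 penalty levels (rows) x 4 variables (columns).
## ASSUMPTION: this orientation is chosen for illustration only.
prob <- matrix(c(0.0, 0.2, 0.0, 0.9,
                 0.0, 0.5, 0.0, 1.0,
                 0.0, 0.7, 0.1, 1.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, paste0("x", 1:4)))

## Variables selected at least once along the path
## (non-null probability for at least one penalty level)
nonzero_idx <- which(colSums(prob) > 0)

## Restriction of the stability path to those variables
nonzero_prob <- prob[, nonzero_idx, drop = FALSE]
```

Here x1 is never selected, so it is dropped from both the index vector and the restricted path.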
Methods
Method new()
Constructor for a StabilityPath object
Intended to be called internally by QuadrupenFit$stability().
Usage
StabilityPath$new(probabilities, regParam, subsamples)
Method selection()
Perform variable selection based on the stability path
Arguments
sel_mode: a character string, either "rank" or "PFER". In the first case, the selection is based on the rank of the total probabilities of each variable along the path: the first nvarsel variables are selected (see below). In the second case, the PFER control is used as described in Meinshausen and Bühlmann's paper. Default is "rank".
cutoff: value of the cutoff probability (only relevant when sel_mode equals "PFER").
PFER: value of the per-family error rate to control (only relevant when sel_mode equals "PFER").
nvarsel: number of variables selected (only relevant when sel_mode equals "rank"). Default is floor(n/log(p)).
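The "rank" mode can be sketched in a few lines of base R. This is an illustrative reading of the description above, not the package's implementation: rank_select() is a hypothetical helper, the matrix layout (penalty levels in rows, variables in columns) is an assumption, and n and p in the default are taken to be the number of observations and of variables.

```r
## Hypothetical helper: keep the nvarsel variables with the largest
## total selection probability accumulated along the path.
## ASSUMPTION: rows of prob are penalty levels, columns are variables.
rank_select <- function(prob, nvarsel) {
  total <- colSums(prob)                    # total probability per variable
  ord <- order(total, decreasing = TRUE)    # best-ranked variables first
  sort(ord[seq_len(min(nvarsel, length(ord)))])
}

## Toy path: 3 penalty levels, 4 variables
prob <- matrix(c(0.1, 0.8, 0.3, 0.0,
                 0.2, 0.9, 0.6, 0.1,
                 0.2, 1.0, 0.7, 0.1),
               nrow = 3, byrow = TRUE)

selected <- rank_select(prob, nvarsel = 2)

## Documented default for nvarsel, evaluated for n = 100 and p = 95
## (the dimensions used in the example of this page)
default_nvarsel <- floor(100 / log(95))
```

Summing over the path favors variables that stay selected across many penalty levels rather than those that peak at a single level.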
Method plot()
Produce a plot of the stability path obtained by stability selection.
Arguments
xvar: variable to plot on the X-axis: either "lambda" (first penalty level) or "fraction" (fraction of the penalty level applied, tuned by \(\lambda_1\)). Default is "lambda".
title: the plot title. If none is given, an appropriate title is generated automatically.
labels: an optional vector of labels for each variable in the path (e.g., 'relevant'/'irrelevant'). See examples.
sel_mode: a character string, either "rank" or "PFER". In the first case, the selection is based on the rank of the total probabilities of each variable along the path: the first nvarsel variables are selected (see below). In the second case, the PFER control is used as described in Meinshausen and Bühlmann's paper. Default is "rank".
cutoff: value of the cutoff probability (only relevant when sel_mode equals "PFER").
PFER: value of the per-family error rate to control (only relevant when sel_mode equals "PFER").
nvarsel: number of variables selected (only relevant when sel_mode equals "rank"). Default is floor(n/log(p)).
log_scale: logical; indicates if a log-scale should be used when xvar="lambda". Default is TRUE.
plot: logical; indicates if the graph should be plotted. Default is TRUE. If FALSE, only the ggplot2 object is returned.
Returns
a list with a ggplot2 object, which can be plotted
via the print method, and a vector of selected variables
corresponding to the chosen method ("rank" or
"PFER").
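For the "PFER" mode, the cutoff and PFER arguments are tied together by the bound in Meinshausen and Bühlmann's paper, E[V] <= q^2 / ((2 * cutoff - 1) * p), where V is the number of falsely selected variables, q the average number of variables selected per subsample and p the total number of variables. The sketch below only evaluates that bound; pfer_bound() and pfer_max_q() are hypothetical helpers, and whether the package applies the bound in exactly this form is an assumption.

```r
## Upper bound on the expected number of false selections
## (Meinshausen and Buhlmann's theorem; valid for 0.5 < cutoff <= 1).
pfer_bound <- function(q, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1, p >= 1)
  q^2 / ((2 * cutoff - 1) * p)
}

## Largest integer q for which the bound stays below the target PFER
pfer_max_q <- function(cutoff, PFER, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1, PFER > 0, p >= 1)
  floor(sqrt(PFER * (2 * cutoff - 1) * p))
}

## Settings used in the example of this page: cutoff = 0.75, PFER = 2, p = 95
q <- pfer_max_q(cutoff = 0.75, PFER = 2, p = 95)
bound <- pfer_bound(q, cutoff = 0.75, p = 95)
```

Raising the cutoff or the tolerated PFER increases the number of variables that may be selected per subsample while the bound still holds.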
Examples
\dontrun{
library(quadrupen) ## provides stability()
library(Matrix)    ## provides bdiag()
## Simulating multivariate Gaussian with blockwise correlation
## and piecewise constant vector of parameters
beta <- rep(c(0,1,0,-1,0), c(25,10,25,10,25))
Soo <- matrix(0.75,25,25) ## block correlation between zero variables
Sww <- matrix(0.75,10,10) ## block correlation between active variables
Sigma <- bdiag(Soo,Sww,Soo,Sww,Soo) + 0.2
diag(Sigma) <- 1
n <- 100
x <- as.matrix(matrix(rnorm(95*n),n,95) %*% chol(Sigma))
y <- 10 + x %*% beta + rnorm(n,0,10)
## Build a vector of labels for the true nonzeros
labels <- rep("irrelevant", length(beta))
labels[beta != 0] <- "relevant"
labels <- factor(labels, ordered=TRUE, levels=c("relevant","irrelevant"))
## Call to the stability selection function, with 200 subsamples
stab <- stability(x,y, subsamples=200, lambda2=1, minratio=1e-2)
## Build the plot and recover the selected variables
plot(stab, labels=labels)
plot(stab, xvar="fraction", labels=labels, sel_mode="PFER", cutoff=0.75, PFER=2)
}