A function to select groups of variables that are good at predicting a given phenotype. The groups considered correspond to various cut levels in a user-defined hierarchy. The selection is performed by penalty-based regression methods (weighted LASSO or group-LASSO).
getHierLevel( X, y, hc.object, selection = c("rho-sicomore", "sicomore", "mlgl"), compression = "mean", depth.cut = 3, choice = c("lambda.min", "lambda.1se"), mc.cores = 2, stab = FALSE, stab.param = list(B = c(100, 100), cutoff = c(0.75, 0.75), PFER = c(1, 1)) )
X | input matrix |
---|---|
y | response variable |
hc.object | output of a hierarchical clustering algorithm in the |
selection | method used to perform variable selection. Either 'sicomore', 'rho-sicomore' or 'mlgl' (see details). Default is 'rho-sicomore'. |
compression | a string (either "mean" or "SNP.dist"). Indicates how groups of variables are compressed before variable selection is performed at each level of the hierarchy. Only relevant for 'sicomore' or 'rho-sicomore'. |
depth.cut | an integer specifying the depth of the search space for the variable selection part of the algorithm. Restricting the search space speeds up the algorithm without affecting performance too much. A value between 3 and 6 is recommended; the smaller, the faster. |
choice | a string (either "lambda.min" or "lambda.1se"). Indicates how the tuning parameter is chosen in the penalized regression approach. |
mc.cores | an integer for the number of cores to use in the parallelization of the cross-validation and some other functions. Default is 2. |
stab | a boolean indicating whether the algorithm performs a lasso stability selection using the stabsel function from the stabs package. |
stab.param | a list of parameters passed to the stabsel function when stab = TRUE: the per-family error rate PFER (1 by default), the cutoff (0.75 by default) and the number of subsamples B (100 by default). |
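
The two values accepted by choice can be illustrated with a small base-R sketch. It assumes the usual convention for these names (the one popularized by cross-validated LASSO software): "lambda.min" minimizes the mean cross-validation error, and "lambda.1se" is the largest penalty whose error stays within one standard error of that minimum. The data frame cv and its columns are hypothetical, and whether getHierLevel computes the two values exactly this way is an assumption.

```r
# Hypothetical cross-validation summary: one row per candidate lambda.
cv <- data.frame(
  lambda = c(1.00, 0.50, 0.25, 0.10, 0.05),
  cvm    = c(2.40, 1.60, 1.25, 1.20, 1.35),  # mean CV error
  cvsd   = c(0.20, 0.15, 0.10, 0.10, 0.12)   # standard error of the mean CV error
)

# "lambda.min": the lambda with the smallest mean CV error.
i.min <- which.min(cv$cvm)
lambda.min <- cv$lambda[i.min]

# "lambda.1se": the largest lambda whose mean CV error is at most
# (minimum error + one standard error at the minimum).
threshold <- cv$cvm[i.min] + cv$cvsd[i.min]
lambda.1se <- max(cv$lambda[cv$cvm <= threshold])

c(lambda.min = lambda.min, lambda.1se = lambda.1se)
```

Under this convention "lambda.1se" never gives a smaller penalty than "lambda.min", so it tends to select fewer groups.
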
an RC object with class 'sicomore-model', with methods nGrp(), nVar(), getGrp(), getVar(), getCV(), getX.comp(), getCoef() and with the following fields:
groups: a list with the selected groups of predictors
coefficients: a vector with the estimated coefficients (one per selected group) if stab=FALSE
X.comp: the compressed version of the original input matrix (as many columns as the number of selected groups)
cv.error: for the best grouping, a data frame showing the cross-validation error used in the variable selection procedure if stab=FALSE
selection: the selection method used
compression: the compression method used
group_inference: the group selection inferred by "lasso", or by "hclust" if the lasso selects nothing.
The methods for variable selection are variants of the LASSO or the group-LASSO designed to select interactions between multiple hierarchies. 'sicomore' and 'rho-sicomore' (see Ambroise et al. (2018), Park et al. (2007)) use a LASSO penalty on compressed groups of variables along the hierarchies to select the interactions. The rho-sicomore variant is a weighted version of sicomore whose weights depend on the levels in the hierarchies. The method 'mlgl' of Grimonprez (2016) uses a weighted group-LASSO penalty, which does not require compression but is more computationally demanding.
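
The compression step used by 'sicomore' and 'rho-sicomore' can be pictured with a minimal base-R sketch. It assumes that compression = "mean" replaces each group of variables by the average of its members, and that groups are obtained by cutting a hierarchy at several levels. The helper compress_mean is hypothetical, the Ward linkage is an arbitrary choice, and the package's own implementation may differ; the LASSO step applied to the compressed columns is not shown.

```r
set.seed(1)
n <- 20; p <- 6
X <- matrix(rnorm(n * p), n, p)           # toy input matrix, variables in columns
colnames(X) <- paste0("v", seq_len(p))

# Hierarchy over the variables: cluster the columns of X.
hc <- hclust(dist(t(X)), method = "ward.D2")

# Hypothetical helper: cut the hierarchy into k groups and replace each
# group by the row-wise mean of its member columns (one column per group).
compress_mean <- function(X, hc, k) {
  grp <- cutree(hc, k = k)
  sapply(split(seq_len(ncol(X)), grp),
         function(idx) rowMeans(X[, idx, drop = FALSE]))
}

# One compressed matrix per cut level (here 2, 3 and 4 groups).
X.comp <- lapply(2:4, function(k) compress_mean(X, hc, k))
sapply(X.comp, dim)
```

Each cut level yields a matrix with n rows and as many columns as groups; a penalized regression on these columns then selects groups rather than individual variables.
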
Ambroise C, Chiquet J, Guinot F, Szafranski M (2018).
“Fast Computation of Genome-Metagenome Interaction Effects.”
arXiv preprint arXiv:1810.12169.
Grimonprez Q (2016).
Selection de groupes de variables corrélées en grande dimension.
Ph.D. thesis, Université de Lille.
Park MY, Hastie T, Tibshirani R (2007).
“Averaged gene expressions for regression.”
Biostatistics, 8(2), 212-227.