sicomore

Selection of Interaction effects in COmpressed Multiple Omics REpresentation

sicomore(
  y,
  X.list,
  compressions = rep("mean", length(X.list)),
  selection = c("rho-sicomore", "sicomore", "mlgl"),
  choice = c("lambda.min", "lambda.min"),
  method.clus = c("ward.D2", "ward.D2"),
  depth.cut = c(3, 3),
  mc.cores = 1,
  taxonomy = NULL,
  verbose = TRUE,
  stab = TRUE,
  stab.param = list(B = c(100, 100), cutoff = c(0.75, 0.75), PFER = c(1, 1))
)
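
As a minimal sketch of how a call might look, the snippet below simulates two matrices standing in for two omics data sets. The names X1, X2, n and res are illustrative and do not come from the package, and the call is guarded so that the snippet still runs when no sicomore function is available in the session.

```r
set.seed(1)
n  <- 50                                   # number of observations (illustrative)
X1 <- matrix(rnorm(n * 20), nrow = n)      # first input matrix, 20 predictors
X2 <- matrix(rnorm(n * 10), nrow = n)      # second input matrix, 10 predictors
y  <- rnorm(n)                             # simulated phenotype

X.list <- list(X1, X2)
# All input matrices must share the same number of rows
stopifnot(length(unique(vapply(X.list, nrow, integer(1)))) == 1)

# Only call sicomore() if it is available in the current session
if (exists("sicomore", mode = "function")) {
  res <- sicomore(
    y, X.list,
    selection   = "sicomore",
    method.clus = c("ward.D2", "ward.D2"),
    depth.cut   = c(3, 3),
    stab        = FALSE
  )
  pvals <- res$pval                        # field documented under Value
}
```

Stability selection is switched off here only to keep the sketch light; the remaining arguments are left at the defaults shown in the usage.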

Arguments

y	response variable (phenotype)
X.list	a list of input matrices. All matrices must have the same number of rows (observations); they may have different numbers of columns (predictors).
compressions	a vector of strings giving the compression method for each data set. By default, the mean is used to compress the predictors within each group.
selection	method used to perform variable selection. Either 'sicomore', 'mlgl' or 'rho-sicomore' (see details). Default is 'sicomore'.
choice	a string, either "lambda.min" or "lambda.1se", indicating how the tuning parameter is chosen in the penalized regression approach. The default in the usage supplies two values, c("lambda.min", "lambda.min").
method.clus	a vector of strings specifying the method used for the hierarchical clustering, one for each input matrix in X.list. By default, the hierarchy is obtained by Ward clustering on the scaled input matrix. To use the SNP-specific spatially constrained hierarchical clustering (Dehman et al. 2015) from the adjclust package, specify "snpClust". It is also possible to specify no hierarchy with "noclust".
depth.cut	a vector of integers specifying the depth of the search space for the variable selection part of the algorithm. Restricting the search space speeds up the algorithm without degrading performance too much. A value between 3 and 6 is recommended; the smaller the value, the faster the algorithm.
mc.cores	an integer for the number of cores to use in the parallelization of the cross-validation and some other functions. Default is 1.
taxonomy	a hierarchical tree object constructed from taxonomic units. This argument is useful for providing a specific hierarchical tree based on taxonomy rather than on Euclidean distance. Recommended for detecting genomic-metagenomic interactions.
verbose	not yet documented
stab	A boolean indicating whether the algorithm performs a lasso stability selection using the stabsel function from the stabs package.
stab.param	A list of parameters for the stabsel function, used if stab = TRUE. The parameters to choose are the per-family error rate PFER (1 by default), the cutoff (0.75 by default) and the bootstrap number B (100 by default, as in the usage above).
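
To make the roles of method.clus and compressions more concrete, here is a minimal base R sketch of Ward clustering of the predictors followed by mean compression of each group. It is not the package's internal code: the simulated matrix X, the number of groups k = 4 and the name X.comp are assumptions chosen for illustration.

```r
set.seed(2)
X <- matrix(rnorm(40 * 12), nrow = 40)     # 40 observations, 12 predictors
colnames(X) <- paste0("v", 1:12)

# Ward clustering of the predictors (columns) on the scaled matrix
hc <- hclust(dist(t(scale(X))), method = "ward.D2")

# Cut the hierarchy into k groups (k = 4 is an arbitrary illustrative level)
groups <- cutree(hc, k = 4)

# Mean compression: one column per group, the row-wise mean of its members
X.comp <- sapply(sort(unique(groups)), function(g) {
  rowMeans(X[, groups == g, drop = FALSE])
})
dim(X.comp)                                # observations x groups
```

Each column of X.comp summarizes one group of correlated predictors, which is the kind of compressed variable the selection step works with.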

Value

an RC object with class sicomore, with methods plot() and getSignificance() and the following fields:

pval: A matrix of p-values for the interaction effects between the compressed variables originating from the 2 input matrices.
pval.beta1: A matrix of p-values for the corresponding main effects of the compressed variables originating from the first input matrix.
pval.beta2: A matrix of p-values for the corresponding main effects of the compressed variables originating from the second input matrix.
models: A 'sicomore-model' class RC object obtained from the getHierLevel function, with methods nGrp(), nVar(), getGrp(), getVar(), getCV(), getX.comp(), getCoef().
tuplets: A list of integer vectors specifying the indexes of the compressed variables which are fitted together in a linear model with interaction.
dim: An array of integers specifying the dimensions of the compressed matrices.

Details

Given a set of input matrices and a phenotype measured on the same set of individuals, sicomore is a two-step method which:
1. finds and selects groups of correlated variables in each input matrix that are good predictors of the common phenotype;
2. finds the most predictive interaction effects between the data sets by testing for interaction between the selected groups of each input matrix.
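
The second step can be pictured with a small base R sketch: for every pair of compressed variables, one from each data set, fit a linear model with interaction and keep the p-value of the interaction term. The matrices Z1 and Z2, the simulated phenotype and the loop are illustrative assumptions, not the package's implementation.

```r
set.seed(3)
n  <- 60
Z1 <- matrix(rnorm(n * 3), nrow = n)       # compressed variables, first data set
Z2 <- matrix(rnorm(n * 2), nrow = n)       # compressed variables, second data set
# Simulated phenotype with an interaction between Z1[, 1] and Z2[, 2]
y  <- Z1[, 1] + Z2[, 2] + 2 * Z1[, 1] * Z2[, 2] + rnorm(n)

# One p-value per pair (i, j): rows index Z1, columns index Z2
pval <- matrix(NA_real_, ncol(Z1), ncol(Z2))
for (i in seq_len(ncol(Z1))) {
  for (j in seq_len(ncol(Z2))) {
    a <- Z1[, i]
    b <- Z2[, j]
    fit <- lm(y ~ a * b)                   # main effects plus interaction a:b
    pval[i, j] <- summary(fit)$coefficients["a:b", "Pr(>|t|)"]
  }
}
round(pval, 3)
```

The resulting matrix has the same layout as the pval field described under Value, with one row per compressed variable of the first matrix and one column per compressed variable of the second.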

The methods for variable selection are variants of the Lasso or group-Lasso designed to select interactions between multiple hierarchies. 'sicomore' and 'rho-sicomore' (see Ambroise et al. 2018) use a Lasso penalty on compressed groups of variables along the hierarchies to select interactions; 'rho-sicomore' is a variant that uses a sounder weighting scheme, depending on the level of the hierarchy considered. The method 'mlgl' of Grimonprez (2016) uses a group-Lasso penalty, which does not require compression but requires heavier computational resources.
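
As a rough illustration of what "compressed groups of variables along the hierarchies" can mean, the base R sketch below cuts one hierarchy at several levels and stacks the mean-compressed groups from every level into a single design matrix, to which a Lasso could then be applied. The levels 2 to 4, the names memb and X.multi, and the absence of any weighting are simplifying assumptions; in this simplified version a group that persists across levels appears more than once.

```r
set.seed(4)
X  <- matrix(rnorm(30 * 6), nrow = 30)     # 30 observations, 6 predictors
hc <- hclust(dist(t(scale(X))), method = "ward.D2")

# Group memberships at levels with 2, 3 and 4 groups (illustrative choice)
levels.k <- 2:4
memb <- cutree(hc, k = levels.k)           # 6 x 3 matrix, one column per level

# Compress each group at each level by its mean and stack the results
X.multi <- do.call(cbind, lapply(seq_along(levels.k), function(l) {
  sapply(seq_len(levels.k[l]), function(g) {
    rowMeans(X[, memb[, l] == g, drop = FALSE])
  })
}))
ncol(X.multi)                              # 2 + 3 + 4 compressed variables
```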

References

Ambroise C, Chiquet J, Guinot F, Szafranski M (2018). “Fast Computation of Genome-Metagenome Interaction Effects.” arXiv preprint arXiv:1810.12169.

Dehman A, Ambroise C, Neuvial P (2015). “Performance of a blockwise approach in variable selection using linkage disequilibrium information.” BMC Bioinformatics, 16, 148.

Grimonprez Q (2016). Selection de groupes de variables corrélées en grande dimension. Ph.D. thesis, Université de Lille.