library(igraph) # graph manipulation
library(sbm) # stochastic bloc model
library(tidyverse) # data manipulation
library(aricode) # clustering measures comparison
library(corrplot) # plot of covariance/correlation matrices
Graph Clustering: Stochastic Blockmodels
An illustration on antogonistic tree/fungus network
1 Introduction
This tutorial introduces the model-based approaches, namely, stochastic blockmodels (binary, weighted, bipartite, w/o covariates)
These methods will be illustrated for the analysis of an (ecological) network data set.
1.1 Requirements
The packages required for the analysis are sbm
and igraph
, plus some others for data manipulation and representation.
2 Advice
Use the documentation of the aforementioned packages!!, and the vignettes available at https://grosssbm.github.io/sbm/
2.1 Data set: antagonistic tree/fungus interaction network
We consider the fungus-tree interaction network studied by Vacher, Piou, and Desprez-Loustau (2008), available with the package sbm:
data("fungusTreeNetwork")
str(fungusTreeNetwork, max.level = 1)
List of 5
$ tree_names : Factor w/ 51 levels "Abies alba","Abies grandis",..: 1 2 3 14 42 4 5 6 7 8 ...
$ fungus_names: Factor w/ 154 levels "Amphiporthe leiphaemia",..: 1 2 3 4 5 6 7 8 9 10 ...
$ tree_tree : num [1:51, 1:51] 0 12 9 3 2 2 0 0 2 7 ...
$ fungus_tree : int [1:154, 1:51] 0 0 0 0 1 1 1 0 0 0 ...
$ covar_tree :List of 3
This data set provides information about 154 fungi sampled on 51 tree species. It is a list with the following entries:
tree_names
: list of the tree species namesfungus_names
: list of the fungus species namestree_tree
: weighted tree-tree interactions (number of common fungal species two tree species host)fungus_tree
: binary fungus-tree interactionscovar_tree
: covariates associated to pairs of trees (namely genetic, taxonomic and geographic distances)
During this tutorial we are going to explore successive variants of the Stochastic Blockmodels to analyse binary, weighted, then bipartite network, also by introducing external information via covariates.
3 Analysis of the tree/tree data
The tree-tree interactions result into a simple network.
3.1 Weighted interaction network with Poisson model
We consider the weighted network where the link between two trees is the number of fungi they share.
\begin{aligned} (Z_i) \text{ i.i.d.} \qquad & Z_i \sim \mathcal{M}(1, \pi) \\ (Y_{ij}) \text{ indep.} \mid (Z_i) \qquad & (Y_{ij} \mid Z_i=k, Z_j = \ell) \sim \mathcal{P}(\exp(\alpha_{kl})) = \mathcal{P}(\lambda_{kl}) \end{aligned}
Remove the isolates node from the weighted adjacency matrix (with degree equal to 0)
Adjust a collection of Poisson SBM and select the best model accoriding to ICL.
Adjust the spectral clustering (coded last time) and modularity-based hierarchical (
igraph::cluster_fast_greedy
) clustering methodsCompare spectral, hierarchical clustering and Poisson SBM with ARI/NID (package
aricode
) and alluvial plots (plotAlluvial
) (chose the number of block selected by the Poisson SBM).
3.2 Including covariate effects
We have on each pair of trees 3 covariates, namely the genetic distance, the taxonomic distance and the geographic distance.
Each covariate has to be introduced as a matrix: X^k_{ij} corresponds to the value of the k-th covariate describing the couple (i,j).
Z_i \sim^{\text{iid}} \mathcal{M}(1, \alpha) \\ Y_{ij} \mid Z_i=k, Z_j = \ell \sim \mathcal{P}(\exp(\pi_{kl} + x_{ij}^\intercal \theta)) = \mathcal{P}(\gamma_{kl}\exp(x_{ij}^\top \theta))
Questions
Remove the isolated nodes in the matrices of covariates (use the same ones as in the graph).
Adjust a Poisson SBM with covariates (1, 2, all of them). Use ICL to select the best model.
Compare the obtained clustering with ARI/NID and alluvial plots of your final model with the model without covariates.