library(igraph) # graph manipulation
library(igraphdata) # network data manipulation
library(tidyverse) # data manipulation
library(corrplot) # fancy matrix representation
library(aricode) # clustering measures comparison
Graph Clustering
Spectral and hierarchical methods
1 Introduction
This tutorial introduces the graph clustering techniques seen during the lectures, that is,
- Hierarchical clustering for network data
- Spectral clustering and its variants
These methods will be illustrated for the analysis of a (friendship/social) network data set.
1.1 Requirements
The packages required for the analysis are igraph
, igraphdata
, aricode
plus some others for data manipulation and representation.
2 Advice
Use the documentation of the aforementioned packages!!
2.1 Data set: Friendship network of a UK university faculty
The personal friendship network of a faculty of a UK university, consists of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. This dataset can serve as a tested for community detection algorithms.
We consider a undirected version of this network, available with the package igraphdata:
data("UKfaculty", package = "igraphdata")
<- as.undirected(UKfaculty)
UKfaculty UKfaculty
IGRAPH 34aa9ed U-W- 81 577 --
+ attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c), Group
| (v/n), weight (e/n)
+ edges from 34aa9ed:
[1] 1-- 4 3-- 4 5-- 6 5-- 7 6-- 7 3-- 9 4-- 9 5-- 9 5--10 7--10
[11] 5--12 7--12 5--13 7--13 10--13 12--13 2--15 14--15 5--16 7--16
[21] 12--16 13--16 3--17 4--17 9--17 2--18 7--18 15--18 15--19 18--19
[31] 2--20 6--20 8--20 14--20 15--20 18--20 19--20 2--21 8--21 10--21
[41] 13--21 15--21 19--21 5--22 7--22 10--22 12--22 21--22 5--23 7--23
[51] 10--23 12--23 13--23 22--23 2--25 15--25 20--25 21--25 2--26 14--26
[61] 20--26 21--26 5--27 7--27 10--27 13--27 16--27 19--27 22--27 23--27
+ ... omitted several edges
This data set is an igraph object with various attributes
V(UKfaculty)$Group
: a node attribute for the school affiliationE(UKfaculty)$weight
: an edge attribute for the weights / freindship strength
3 Analysis of the Friendship network
3.1 Binary interaction network
We first consider a binary version of this network where an edge is drawn between two individual when they are friend, just to become familiar with igraph.
Questions
Create an igraph object similar to
UKfaculty
with binary edgesCheck the attributes of the vertices and edges. Plot the network by adding color to the node related to the group.
Extract the binary adjacency matrix and plot it with the corrplot package.
Plot the distribution of the degree
Plot the matrix of shortest-path distance
Perform hierarchical clustering using various algorithm (modularity, edges betweeness, etc). What do you think?
3.2 Weighted interaction network
We now study the original weighted graph. Plot the graph, the matrix and the distribution of the degree. Put color on node depending on the group attribute, node size depending on the node degree, and edge thickness depending on the weigths.
Perform hierarchical clustering using various distances, in particular the shortest path distance. Plot the dendrogram (use the as.hclust function if necessary)
Compare the AHC clustering to the reference clustering with ARI (package
aricode
) for a varying number of groups.Plot the Fiedler vector for this data (use normalized Laplacian). Comment (use tree names no find some structure in the data).
Implement different the normalized and absolute spectral clusterings and test them on this data. Plot the data matrix reordered by row and column according to these clustering.
Compare all the methods with ARI/NID (package
aricode
) to the reference clsutering for various number of groups.What is your best final model?