Graph Clustering

Spectral and hierarchical methods

1 Introduction

This tutorial introduces the graph clustering techniques seen during the lectures, that is,

  • Hierarchical clustering for network data
  • Spectral clustering and its variants

These methods will be illustrated for the analysis of a (friendship/social) network data set.

1.1 Requirements

The packages required for the analysis are igraph, igraphdata, aricode plus some others for data manipulation and representation.

library(igraph)     # graph manipulation
library(igraphdata) # network data manipulation
library(tidyverse)  # data manipulation
library(corrplot)   # fancy matrix representation
library(aricode)    # clustering measures comparison

2 Advice

Use the documentation of the aforementioned packages!!

2.1 Data set: Friendship network of a UK university faculty

The personal friendship network of a faculty of a UK university, consists of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. This dataset can serve as a tested for community detection algorithms.

We consider a undirected version of this network, available with the package igraphdata:

data("UKfaculty", package = "igraphdata")
UKfaculty <- as.undirected(UKfaculty)
UKfaculty
IGRAPH 34aa9ed U-W- 81 577 -- 
+ attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c), Group
| (v/n), weight (e/n)
+ edges from 34aa9ed:
 [1]  1-- 4  3-- 4  5-- 6  5-- 7  6-- 7  3-- 9  4-- 9  5-- 9  5--10  7--10
[11]  5--12  7--12  5--13  7--13 10--13 12--13  2--15 14--15  5--16  7--16
[21] 12--16 13--16  3--17  4--17  9--17  2--18  7--18 15--18 15--19 18--19
[31]  2--20  6--20  8--20 14--20 15--20 18--20 19--20  2--21  8--21 10--21
[41] 13--21 15--21 19--21  5--22  7--22 10--22 12--22 21--22  5--23  7--23
[51] 10--23 12--23 13--23 22--23  2--25 15--25 20--25 21--25  2--26 14--26
[61] 20--26 21--26  5--27  7--27 10--27 13--27 16--27 19--27 22--27 23--27
+ ... omitted several edges

This data set is an igraph object with various attributes

  • V(UKfaculty)$Group : a node attribute for the school affiliation
  • E(UKfaculty)$weight : an edge attribute for the weights / freindship strength

3 Analysis of the Friendship network

3.1 Binary interaction network

We first consider a binary version of this network where an edge is drawn between two individual when they are friend, just to become familiar with igraph.

Questions

  1. Create an igraph object similar to UKfaculty with binary edges

  2. Check the attributes of the vertices and edges. Plot the network by adding color to the node related to the group.

  3. Extract the binary adjacency matrix and plot it with the corrplot package.

  4. Plot the distribution of the degree

  5. Plot the matrix of shortest-path distance

  6. Perform hierarchical clustering using various algorithm (modularity, edges betweeness, etc). What do you think?

3.2 Weighted interaction network

  1. We now study the original weighted graph. Plot the graph, the matrix and the distribution of the degree. Put color on node depending on the group attribute, node size depending on the node degree, and edge thickness depending on the weigths.

  2. Perform hierarchical clustering using various distances, in particular the shortest path distance. Plot the dendrogram (use the as.hclust function if necessary)

  3. Compare the AHC clustering to the reference clustering with ARI (package aricode) for a varying number of groups.

  4. Plot the Fiedler vector for this data (use normalized Laplacian). Comment (use tree names no find some structure in the data).

  5. Implement different the normalized and absolute spectral clusterings and test them on this data. Plot the data matrix reordered by row and column according to these clustering.

  6. Compare all the methods with ARI/NID (package aricode) to the reference clsutering for various number of groups.

  7. What is your best final model?