Maximum Flow Techniques for Network Clustering (Thesis)

Report ID: TR-652-02
Author: Tsioutsiouliklis, Kostas
Date: 2002-06-00
Pages: 142
Download Formats: |PDF|
Abstract:

The problem of clustering a data-set according to certain optimization criteria is of great theoretical interest, but also of major practical importance. The classification of large data collections (i.e. web-pages, scientific literature, etc.) requires methods that produce clusters of high quality and are efficient in practice, as well.

This thesis focuses on network clustering, based on maximum flow techniques. Central notion in our methods are various definitions of a community within the network, and key tool for extracting commmunities is the minimum cut tree (or Gomory-Hu tree). We study the properties of the produced clusters and present experimental results for real-world data.

We conclude that the maximum flows of a network provide strong relations between vertices, and allow for clustering algorithms of high quality. From a theoretical point of view we can prove strong performance bounds under several settings. Experimentally, the algorithms are relatively simple to implement and perform well in practice.