COS 435, Spring 2006
Summary of the algorithms for ranking nodes in social networks
We calculate scores on the nodes of a directed graph (the social network).
Notation: We consider a directed graph with n nodes. E is the adjacency matrix, i.e. E[i,j] = 1 if there is an edge from node i to node j in the graph and E[i,j] = 0 otherwise.
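As a concrete illustration, E can be built from a list of directed edges. This is a minimal sketch: the helper name `adjacency_matrix`, the use of plain Python lists, and the 0-based node numbers (the notes count from 1) are all assumptions made here.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix E as a list of lists.

    E[i][j] == 1 if (i, j) is in edges, else 0. Nodes are numbered 0..n-1
    in this sketch (hypothetical helper, not from the notes).
    """
    E = [[0] * n for _ in range(n)]
    for i, j in edges:
        E[i][j] = 1  # edge from node i to node j
    return E

# A 3-node example: 0 -> 1, 0 -> 2, 1 -> 2
E = adjacency_matrix(3, [(0, 1), (0, 2), (1, 2)])
print(E)  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

Row i of E lists the edges leaving node i, and column j lists the edges entering node j.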
The HITS (hubs and authorities) algorithm:
Let a be a vector of the n authority values of the n nodes in the graph and h be a vector of the n hub values.
initialize a = (1, 1, ..., 1)^T and h = (1, 1, ..., 1)^T;
repeat until convergence {
    a_new = E^T h;
    h_new = E a;
    a = normalized a_new;
    h = normalized h_new;
}
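The loop above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: "repeat until convergence" is replaced by a hypothetical stopping rule (stop when no component changes by more than `tol`, or after `max_iter` rounds), and nodes are numbered from 0.

```python
import math

def normalize(v):
    """Divide each component of v by v's Euclidean length (sqrt of sum of squares)."""
    length = math.sqrt(sum(x * x for x in v))
    # Leave an all-zero vector unchanged rather than dividing by zero.
    return [x / length for x in v] if length > 0 else list(v)

def hits(E, tol=1e-10, max_iter=1000):
    """Return (a, h): authority and hub vectors for adjacency matrix E.

    tol and max_iter are assumed stopping rules standing in for
    "repeat until convergence"; the notes do not specify them.
    """
    n = len(E)
    a = [1.0] * n  # a = (1, 1, ..., 1)^T
    h = [1.0] * n  # h = (1, 1, ..., 1)^T
    for _ in range(max_iter):
        # a_new = E^T h, i.e. a_new[i] = sum over k of E[k][i] * h[k]
        a_new = [sum(E[k][i] * h[k] for k in range(n)) for i in range(n)]
        # h_new = E a, i.e. h_new[i] = sum over k of E[i][k] * a[k] (previous a)
        h_new = [sum(E[i][k] * a[k] for k in range(n)) for i in range(n)]
        a_new, h_new = normalize(a_new), normalize(h_new)
        # Largest change in any component, used as the convergence test.
        change = max(abs(x - y) for x, y in zip(a + h, a_new + h_new))
        a, h = a_new, h_new
        if change < tol:
            break
    return a, h

# Example graph: 0 -> 1, 0 -> 2, 1 -> 2
E = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
a, h = hits(E)
print("authorities:", a)
print("hubs:", h)
```

On this example node 2, which is pointed to by both other nodes, ends up with the largest authority score, and node 0, which points to both, ends up with the largest hub score.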
The normalization simply divides each vector component by the vector's Euclidean length, i.e. the square root of the sum of the squares of the vector components. Note that this normalization step differs from that in the reading I assigned in Mining the Web: Discovering Knowledge from Hypertext Data; instead, it follows the original paper by Kleinberg.
In components, a_new = E^T h is simply the calculation a_new[i] = SUM_k (E[k,i] * h[k]) for each of the n values of i, and h_new = E a is simply the calculation h_new[i] = SUM_k (E[i,k] * a[k]) for each of the n values of i. The summation index k ranges over all n nodes. Both updates use the values of a and h from the previous iteration.
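To check the component formulas concretely, the following sketch computes one step on a small, made-up three-node graph (0-based node numbers), once with the sums written out and once as ordinary matrix-vector products with E^T and E, and then applies the Euclidean normalization. The helper `matvec` is a hypothetical name introduced here.

```python
import math

E = [[0, 1, 1],   # node 0 -> 1, node 0 -> 2
     [0, 0, 1],   # node 1 -> 2
     [0, 0, 0]]   # node 2 has no outgoing edges
n = len(E)
h = [1.0, 1.0, 1.0]
a = [1.0, 1.0, 1.0]

# Transpose of E: ET[i][k] = E[k][i]
ET = [[E[k][i] for k in range(n)] for i in range(n)]

def matvec(M, v):
    """Ordinary matrix-vector product: (M v)[i] = sum over k of M[i][k] * v[k]."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

# The componentwise sums agree with the matrix forms.
a_new = [sum(E[k][i] * h[k] for k in range(n)) for i in range(n)]
h_new = [sum(E[i][k] * a[k] for k in range(n)) for i in range(n)]
assert a_new == matvec(ET, h)   # a_new = E^T h (column sums of E here)
assert h_new == matvec(E, a)    # h_new = E a  (row sums of E here)

# Euclidean normalization: divide by sqrt of the sum of squares.
length = math.sqrt(sum(x * x for x in a_new))
a_normalized = [x / length for x in a_new]
print(a_new)  # [0.0, 1.0, 2.0]
```

With h and a all ones, the first step just counts in-edges (for a_new) and out-edges (for h_new); here a_new has length sqrt(5), so a_normalized is (0, 1/sqrt(5), 2/sqrt(5)).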
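The PageRank algorithm:
Let pr denote the vector of the n PageRank values of the n nodes in the graph, q be the "random jump" parameter (the probability, between 0 and 1, of jumping to a node chosen uniformly at random), and t_k be the outdegree of vertex k, i.e. the number of edges leaving k (the sum of row k of E).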
initialize pr = (1/n, 1/n, ..., 1/n)^T;
repeat until convergence {
    for i from 1 to n {
        pr_new[i] = q/n + (1 - q) * SUM_k (E[k,i] * (pr[k] / t_k));
    }
    pr = pr_new;
}
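A minimal Python sketch of this iteration follows. Several choices are assumptions, not values from the notes: q = 0.15, the stopping rule (`tol`, `max_iter`), and 0-based node numbers. Because the update divides by t_k, this sketch simply refuses graphs that contain a node with no outgoing edges rather than guessing how to treat them.

```python
def pagerank(E, q=0.15, tol=1e-12, max_iter=1000):
    """Iterate the PageRank update on adjacency matrix E (list of lists, 0-based).

    q = 0.15, tol and max_iter are illustrative assumptions; the notes fix
    none of them.
    """
    n = len(E)
    t = [sum(row) for row in E]  # t[k] = outdegree of node k = row sum of E
    if any(tk == 0 for tk in t):
        # The update divides by t[k]; this sketch does not handle such nodes.
        raise ValueError("every node must have at least one outgoing edge")
    pr = [1.0 / n] * n  # pr = (1/n, ..., 1/n)^T
    for _ in range(max_iter):
        # pr_new[i] = q/n + (1 - q) * sum over k of E[k][i] * (pr[k] / t[k])
        pr_new = [q / n + (1 - q) * sum(E[k][i] * (pr[k] / t[k]) for k in range(n))
                  for i in range(n)]
        change = max(abs(x - y) for x, y in zip(pr, pr_new))
        pr = pr_new
        if change < tol:
            break
    return pr

E = [[0, 1, 1],   # 0 -> 1, 0 -> 2
     [0, 0, 1],   # 1 -> 2
     [1, 0, 0]]   # 2 -> 0
pr = pagerank(E)
print(pr, sum(pr))
```

On this graph node 2, which receives edges from both other nodes, ends with the largest value and node 1 with the smallest; the printed sum is 1 up to rounding.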
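Using this update formula, the components of pr always sum to 1, as one wants if pr represents the probabilities of being at the different vertices, provided every vertex has at least one outgoing edge (t_k > 0 for all k). Summing the update over i and using SUM_i E[k,i] = t_k gives SUM_i pr_new[i] = q + (1 - q) * SUM_k pr[k] = 1 whenever the old components sum to 1. No normalization step is necessary. A vertex with t_k = 0 would make the formula divide by zero, so such vertices need separate treatment.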
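The sum-to-1 argument can be checked numerically. This sketch applies a single update step to one graph in which every vertex has an outgoing edge and to one in which vertex 2 has none. The value q = 0.15, the starting vector, and the helper name `pagerank_step` are arbitrary choices for illustration. To avoid dividing by zero, only nodes k with an edge k -> i contribute to pr_new[i], so in the second graph the total drops by exactly (1 - q) * pr[2].

```python
def pagerank_step(E, pr, q):
    """One application of the update formula from the notes (0-based nodes).

    Hypothetical helper. E[k][i] is 0 or 1, so summing pr[k] / t[k] over the
    k with E[k][i] == 1 equals the notes' SUM_k E[k,i] * (pr[k] / t_k).
    """
    n = len(E)
    t = [sum(row) for row in E]  # t[k] = outdegree of node k
    # A node with t[k] == 0 has no edge k -> i, so it is skipped here and
    # contributes nothing (its probability mass is lost).
    return [q / n + (1 - q) * sum(pr[k] / t[k] for k in range(n) if E[k][i])
            for i in range(n)]

q = 0.15
pr = [0.5, 0.3, 0.2]  # any starting vector whose components sum to 1

E_ok = [[0, 1, 1],        # 0 -> 1, 0 -> 2
        [0, 0, 1],        # 1 -> 2
        [1, 0, 0]]        # 2 -> 0
E_dangling = [[0, 1, 1],  # same, but node 2 has no outgoing edge
              [0, 0, 1],
              [0, 0, 0]]

print(round(sum(pagerank_step(E_ok, pr, q)), 10))        # 1.0
print(round(sum(pagerank_step(E_dangling, pr, q)), 10))  # 0.83 = 1 - (1 - q) * pr[2]
```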