SkipIndex: Towards a Scalable Peer-to-Peer Index Service for High Dimensional Data
Abstract:
Indexing of high-dimensional data is essential for building
applications such as multimedia retrieval, data mining, and spatial
databases. Traditional index structures rely on centralized
processing. This approach does not scale with the rapidly increasing
amount of application data available on massively distributed
systems like the Internet.In this paper, we propose a distributed high-dimensional index
structure based on peer-to-peer overlay routing. A new routing scheme
is used to lookup data keys in the distributed index, which guarantees
logarithmic lookup and maintenance cost, even in the face of skewed
datasets. We propose a novel nearest neighbor (NN) query scheme that
can substantially reduce search cost by sacrificing a small amount of
precision. We propose a load-balancing mechanism that partitions the
high dimensional search space in a balanced manner. We then analyze
the performance of our proposed using a variety of metrics with
simulation as well as a functional PlanetLab implementation.