Reverse Nearest Neighbor Search in Metric Spaces

 

Yufei Tao, Man Lung Yiu, Nikos Mamoulis

 

IEEE Transactions on Knowledge and Data Engineering (TKDE), 18(9): 1239-1252, 2006

 
  Abstract

  
  Given a set D of objects, a reverse nearest neighbor (RNN) query returns the objects o Î D such that o is closer to a query object q than to any other object in D, according to a certain similarity metric. The existing RNN solutions are not sufficient because they (i) either rely on pre-computed information that is expensive to maintain in the presence of updates, or (ii) are applicable only when the data consists of "Euclidean objects", and similarity is measured using the L2 norm. In this paper, we present the first algorithms for efficient RNN search in generic metric spaces. Our techniques require no detailed representations of objects, and can be applied as long as their mutual distances can be computed, and the distance metric satisfies the triangle inequality. We confirm the effectiveness of the proposed methods with extensive experiments.
 

  Important dates
   Submitted in May 2005, revised in Dec 2005, accepted in April 2006.
  Paper download

  
 
  Implementation and datasets
 
  Before you proceed with downloading, please read and agree to the terms of using our implementation.
 
  Download our source codes (implemented: by Man Lung Yiu).
  Datasets used in our experiments: SF, TS, Color, Signature.

  Dataset description: SF contains points representing 174k locations in San Francisco. The second dataset TS is a time series containing 76k values corresponding to Dow Jones indexes on 19k consecutive days (i.e., 4 values per day). We create 5D points by placing a sliding window with a 5-value length on all positions in the series, and taking the values covered by each window as the coordinates of a data point. The third dataset Color involves 4D vectors6 representing the color histograms of 68k images. We also generate a Signature dataset, where each object is a string with 65 English letters. We first obtain 20 “anchor signatures”, whose letters are randomly chosen from the alphabet. Then, each anchor produces a cluster with 2.5k objects (resulting in the total cardinality 50k), each of which is obtained by randomly changing x positions in the corresponding anchor signature to other random letters, where x is uniformly distributed in range [1, 18].

  File format: Each dataset is in binary format. The byte ordering conforms to little endian. The size of an "integer/float" is 4 bytes and that of a "char" is 1 byte. The first 8 bytes (2 integers) constitute the header: the first number gives the dataset cardinality, while the 2nd indicates the storage size of an object (in bytes). Then, object data are stored sequentially after the headers with this format.
 

 

Back to Yufei's home, or publication list