Range Search on Multidimensional Uncertain Data

 

Yufei Tao, Xiaokui Xiao, Reynold Cheng

 

ACM Transactions on Databases Systems (TODS), 32(3), 2007

 
  Abstract

  
  In an uncertain database, every object o is associated with a probability density function, which describes the likelihood that o appears at each position in a multidimensional workspace. This article studies two types of range retrieval fundamental to many analytical tasks. Specifically, a nonfuzzy query returns all the objects that appear in a search region rq with at least a certain probability tq. On the other hand, given an uncertain object q, fuzzy search retrieves the set of objects that are within distance εq from q with no less than probability tq. The core of our methodology is a novel concept of "probabilistically constrained rectangle", which permits effective pruning/validation of nonqualifying/qualifying data. We develop a new index structure called the U-tree for minimizing the query overhead. Our algorithmic findings are accompanied with a thorough theoretical analysis, which reveals valuable insight into the problem characteristics, and mathematically confirms the efficiency of our solutions. We verify the effectiveness of the proposed techniques with extensive experiments.
 

  Paper download

  
   The short version appeared in VLDB 2005. 
 
  Implementation and datasets
 
  Before you proceed with downloading, please read and agree to the terms of using our implementation.
 
  Implementation of the U-tree  (implemented: by Xiaokui Xiao).
  Datasets used in our experiments: CA, LB.

  Dataset format:
 
CA (LB): Each line corresponds to a location in California (the Long Beach county), in the form of:
   tuple-id  <space> x-  <space> y-coordinate

   Based on each point in CA (LB), an uncertain object is defined as described in the experimental settings of our paper. The PCRs of the object are computed in the following function of the U-tree implementation: void U_Tree::init(char *_fname, int _b_length, Cache *_c, int _dimension, int nCataSize, float* pCatalog).  
 
 

Back to Yufei's home, or publication list