Output Perturbation with Query Relaxation

 

Xiaokui Xiao and Yufei Tao

 

In Very Large Data Bases conference (VLDB), 2008

 
Abstract

Given a dataset containing sensitive personal information, a statistical database answers aggregate queries in a manner that preserves individual privacy. We consider the problem of constructing a statistical database using output perturbation, which protects privacy by injecting a small noise into each query result. We show that the state-of-the-art approach, e-differential privacy, suffers from two severe deficiencies: it (i) incurs prohibitive computation overhead, and (ii) can answer only a limited number of queries, after which the statistical database has to be shut down. To remedy the problem, we develop a new technique that enforces e-different privacy with economical cost. Our technique also incorporates a query relaxation mechanism, which removes the restriction on the number of permissible queries. The effectiveness and efficiency of our solution are verified through experiments with real data.

Major theoretical results
     
1. Enforcing e-differential privacy exactly is NP-hard, because judging whether the L1 sensitivity is below a threshold is NP-hard.

2. An upper bound on the maximum number of queries that can be answered by output perturbation with Laplace noise.

3. Introduction of a novel concept called convergence which provides a 2-approximation of the L1 sensitivity, and can be computed in polynomial time. This allows us to enforce e-differential privacy in a conservative manner.
 
Paper download

  
 
Implementation and datasets


Before you proceed with downloading, please read and agree to the terms of using our implementation.

Download our implementation (by Xiaokui Xiao).
Download the adult dataset used in our experiments.
 

Supporting grants

CUHK 4161/07 and 1202/06
 
 

 

Back to Yufei's home, or publication list