m-Invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets

 

Xiaokui Xiao and Yufei Tao

 

In ACM Conference on Management of Data (SIGMOD), 2007

 
Abstract


Previous literature on privacy-preserving data publication has focused on "one-time" releases. Specifically, none of the existing solutions supports re-publication of the microdata after it has been updated with insertions and deletions. This is a serious drawback, because a publisher currently cannot keep providing researchers with the most recent version of the dataset.

This paper remedies the drawback. First, we reveal the characteristics of the re-publication problem that invalidate the conventional approaches leveraging k-anonymity and l-diversity. Based on rigorous theoretical analysis, we develop a new generalization principle, m-invariance, that effectively limits the risk of privacy disclosure in re-publication. We accompany the principle with an algorithm that computes privacy-guarded relations permitting retrieval of accurate aggregate information about the original microdata. Our theoretical results are confirmed by extensive experiments with real data.
 

Paper download

     
 
Implementation and datasets

Before downloading, please read and agree to the terms of use for our implementation.
 
Download our source code for m-invariance and l-diversity (implemented by Xiaokui Xiao)
Datasets used in our experiments: OCC and SAL.

Dataset formats
OCC: Each line corresponds to the personal information of an American adult, in the form of:
    tuple-id <s> age <s> gender <s> education <s> birth-place <s> omit <s> omit <s> omit <s> occupation
where <s> denotes a space, and "omit" represents an attribute that should be omitted.

SAL: Each line corresponds to the personal information of an American adult, in the form of:
    tuple-id <s> age <s> gender <s> education <s> birth-place <s> omit <s> omit <s> omit <s> salary
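
The two formats above differ only in the last attribute, so a single reader can handle both files. The following is a minimal sketch, not part of the distributed implementation: the names Record, parse_line and read_dataset are hypothetical, every value is kept as a string because the value encoding is not described here, and it assumes that no attribute value contains an embedded space.

```python
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    """One line of OCC or SAL with the three "omit" attributes dropped."""
    tuple_id: str
    age: str
    gender: str
    education: str
    birth_place: str
    sensitive: str  # occupation in OCC, salary in SAL


EXPECTED_FIELDS = 9          # nine space-separated attributes per line
OMITTED_POSITIONS = (5, 6, 7)  # zero-based positions of the "omit" attributes


def parse_line(line: str) -> Record:
    """Split one line on whitespace and discard the omitted attributes."""
    fields = line.split()
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_FIELDS} fields, got {len(fields)}: {line!r}"
        )
    kept = [f for i, f in enumerate(fields) if i not in OMITTED_POSITIONS]
    return Record(*kept)


def read_dataset(path: str) -> Iterator[Record]:
    """Yield one Record per non-empty line of the file at `path`."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

For example, parse_line("1 34 0 12 5 x y z 7") keeps the fields at positions 0 to 4 and 8, so the last field of the resulting record is "7". Values are left unconverted on purpose; cast them to numbers only after checking how the downloaded files encode each attribute.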

 

 
