Personalized Privacy Preservation

 

Xiaokui Xiao and Yufei Tao

 

In ACM Conference on Management of Data (SIGMOD), 2006

 
Abstract


We study generalization for preserving privacy in the publication of sensitive data. The existing methods focus on a universal approach that exerts the same amount of preservation for all persons, without catering for their concrete needs. The consequence is that we may be offering insufficient protection to a subset of people, while applying excessive privacy control to another subset.

Motivated by this, we present a new generalization framework based on the concept of personalized anonymity. Our technique performs the minimum generalization for satisfying everybody's requirements, and thus, retains the largest amount of information from the microdata. We carry out a careful theoretical study that leads to valuable insight into the behavior of alternative solutions. In particular, our analysis mathematically reveals the circumstances where the previous work fails to protect privacy, and establishes the superiority of the proposed solutions. The theoretical findings are verified with extensive experiments.
 

 

Paper download

     
 
Implementation and datasets

Before downloading, please read and agree to the terms of use for our implementation.
 
Download our source code (implemented by Xiaokui Xiao)
Datasets used in our experiments: Primary, Nonprimary.

Dataset format: Each line is one tuple of 9 numbers, in the following order:
    (tuple id, age, gender, marital status, education, occupation, income, guarding node, individual id).
All the fields are self-explanatory except guarding node (GN), which takes one of 3 values: -1, 0, or 1. GN = -1 means that the corresponding individual specifies no guarding node. GN = 0 means that the guarding node is the income value of the tuple itself (i.e., a leaf node in the taxonomy). GN = 1 means that the guarding node is the parent of the income value. For experiments with no personalization, disregard the stored GN column and treat every GN value as 0.
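As a rough illustration of the format above, the following Python sketch parses one dataset line into named fields. It is not part of the released implementation. The names `Record`, `GN_MEANING`, and `parse_line` are hypothetical, and two details are assumptions that the page does not state: that fields are separated by whitespace or commas, and that all 9 numbers are integers.

```python
import re
from typing import NamedTuple


class Record(NamedTuple):
    """One dataset line, in the field order given on this page."""
    tuple_id: int
    age: int
    gender: int
    marital_status: int
    education: int
    occupation: int
    income: int
    guarding_node: int  # GN: -1, 0, or 1
    individual_id: int


# Meaning of each GN value, as described in the dataset format notes.
GN_MEANING = {
    -1: "no guarding node specified",
    0: "guarding node is the income value itself (a taxonomy leaf)",
    1: "guarding node is the parent of the income value",
}


def parse_line(line: str, personalized: bool = True) -> Record:
    """Parse one line into a Record.

    Assumption: fields are integers separated by whitespace or commas.
    With personalized=False, GN is forced to 0, following the note on
    experiments with no personalization.
    """
    tokens = [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]
    if len(tokens) != 9:
        raise ValueError(f"expected 9 fields, got {len(tokens)}")
    record = Record(*(int(tok) for tok in tokens))
    if record.guarding_node not in GN_MEANING:
        raise ValueError(f"unexpected GN value: {record.guarding_node}")
    if not personalized:
        record = record._replace(guarding_node=0)
    return record


if __name__ == "__main__":
    # Made-up sample line, for illustration only (not taken from the datasets).
    sample = "1 35 0 2 10 4 27 1 1001"
    rec = parse_line(sample)
    print(rec)
    print(GN_MEANING[rec.guarding_node])
```

If the actual files use a different delimiter or non-integer values, only the tokenizing and conversion steps in `parse_line` would need to change.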
 
 
