|
We identify proximity breach as a privacy threat specific to
numerical sensitive attributes in anonymized data publication. Such a breach
occurs when an adversary concludes with high confidence that the sensitive
value of a victim individual must fall in a short interval, even though
the adversary may have low confidence about the victim's actual value.
None of the existing anonymization principles (e.g., k-anonymity and
l-diversity) can effectively prevent proximity breaches. We remedy
the problem by introducing a novel principle called (e, m)-anonymity.
Intuitively, the principle demands that, given a QI-group G (a set of
tuples that share the same generalized quasi-identifier values), for every
sensitive value x in G, at most a 1/m fraction of the tuples in G can have
sensitive values "similar" to x, where the similarity is controlled by e.
We provide a careful analytical study of the theoretical characteristics of
(e, m)-anonymity and of the corresponding generalization algorithm. Our
findings are verified by experiments with real data.
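
The intuition behind (e, m)-anonymity can be illustrated with a minimal
sketch that checks the condition on a single QI-group. The sketch assumes
that two sensitive values x and y count as "similar" when |y - x| <= e;
this absolute-distance reading is an assumption made here for illustration,
as is the function name satisfies_em_anonymity. It is not the generalization
algorithm itself, only a check of the stated condition.

```python
from typing import Sequence


def satisfies_em_anonymity(values: Sequence[float], e: float, m: float) -> bool:
    """Check one QI-group against the (e, m)-anonymity condition.

    `values` holds the numerical sensitive values of the tuples in the group.
    Assumption (not fixed by the text above): y is "similar" to x
    when abs(y - x) <= e.
    """
    n = len(values)
    if n == 0:
        # An empty group has no sensitive value to violate the condition.
        return True
    for x in values:
        # Count the tuples whose sensitive value is similar to x
        # (x itself is always counted).
        similar = sum(1 for y in values if abs(y - x) <= e)
        # Require similar / n <= 1 / m, written without division.
        if similar * m > n:
            return False
    return True


# Four well-separated values: each value is similar only to itself.
print(satisfies_em_anonymity([10, 20, 30, 40], e=5, m=4))
# Two values lie within e of each other, so half the group is "similar".
print(satisfies_em_anonymity([10, 11, 30, 40], e=5, m=4))
```

Under this reading, a larger e or a larger m makes the condition stricter:
the second group above fails for m = 4 but would pass for m = 2.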
|