|
Privacy preserving data publication (Y. Tao)
Privacy-preserving data publication has become one of the most important research topics, owing to a dilemma that arises in releasing data such as census records. On the one hand, data users (particularly researchers) are raising ever-stronger demands for accurate information that permits effective analysis. On the other hand, data owners (the individuals described in the data) are becoming increasingly concerned about their privacy. To illustrate the dilemma, suppose that the government wants to release a table R with the schema (Age, Gender, Zipcode, Occupation). Each tuple describes the personal information of a taxpayer, together with her/his job title. Assume that Occupation is “sensitive”, meaning that each person is reluctant to disclose her/his occupation. Although R does not contain a directly identifying attribute such as name or SSN, an adversary may still figure out the identity of a taxpayer by consulting another database. For instance, assume that a government worker has access to the voter registration list L, which includes the Age, Gender, Zipcode, and Name of each voter. Then s/he may join R and L on their common attributes (Age, Gender, Zipcode) and obtain the candidate names for each tuple in R; when only one voter matches a tuple, that person's occupation is revealed. To prevent such privacy intrusion, the government can publish only a distorted version R* of R. As a result, a researcher who carries out her/his investigation on R* cannot always draw precise conclusions about R. Thus, an interesting issue arises: how to generate an R* that satisfies both taxpayers and researchers? In this project, we will explore techniques for achieving this goal.
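
The join described above can be made concrete with a small sketch. All rows below are invented for illustration, and the function names link and coarsen are hypothetical. The description does not say how R* is distorted; the sketch assumes one simple option, generalization (20-year age bands and 3-digit zipcode prefixes), purely to show the trade-off between privacy and precision.

```python
from collections import defaultdict

# Hypothetical table R: (Age, Gender, Zipcode, Occupation); no names.
R = [
    (34, "F", "10027", "Nurse"),
    (34, "M", "10027", "Lawyer"),
    (51, "M", "10031", "Teacher"),
]

# Hypothetical voter registration list L: (Age, Gender, Zipcode, Name).
L = [
    (34, "F", "10027", "Alice"),
    (34, "M", "10027", "Bob"),
    (51, "M", "10031", "Carl"),
    (55, "M", "10031", "Dave"),
    (29, "F", "10025", "Erin"),
    (38, "M", "10031", "Frank"),
]

def link(table, voters, coarsen=lambda qi: qi):
    """Join on (Age, Gender, Zipcode) and list candidate names per tuple.

    `coarsen` is applied to each voter's attributes so that the adversary
    can match against a table whose attributes were distorted the same way.
    """
    names_by_key = defaultdict(list)
    for *qi, name in voters:
        names_by_key[coarsen(tuple(qi))].append(name)
    return [(row[3], names_by_key.get(tuple(row[:3]), [])) for row in table]

def coarsen(qi):
    """Assumed distortion: 20-year age band and 3-digit zipcode prefix."""
    age, gender, zipcode = qi
    low = age // 20 * 20
    return ((low, low + 19), gender, zipcode[:3] + "**")

# Distorted table R*: same occupations, coarser identifying attributes.
R_star = [coarsen(row[:3]) + (row[3],) for row in R]

# Joining the exact table pins every occupation to a single name.
print(link(R, L))
# Joining R* leaves two candidate names for every tuple.
print(link(R_star, L, coarsen))
```

On this toy data, the join against R matches each occupation to exactly one voter, whereas the join against R* leaves two candidates per tuple. The price is that a researcher working with R* sees only age bands and zipcode prefixes rather than the exact values.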
|