Privacy preserving data sharing in data mining environment

Sun, Xiaoxun (2010) Privacy preserving data sharing in data mining environment. [Thesis (PhD/Research)]


Numerous organizations collect and distribute non-aggregate personal data for a variety of purposes, including demographic and public health research. In these situations, the data distributor often faces a quandary: on the one hand, it is important to protect the anonymity and personal information of individuals; on the other hand, it is also important to preserve the utility of the data for research.

This thesis presents an extensive study of this problem. We focus primarily on notions of anonymity that are defined with respect to individual identity, or with respect to the value of a sensitive attribute. We discuss anonymization techniques for relational data and large survey rating data. For relational data, we propose a variety of techniques that use generalization (also called recoding) and microaggregation to produce a sanitized view while preserving the utility of the input data. Specifically, we provide a new structure called 'Privacy Hash Table'; propose three enhanced privacy models to limit privacy leakage; inject purpose and trust into the data anonymization process to increase the utility of the anonymized data; and enhance the microaggregation method by using concepts from information theory. For survey rating data, we investigate two important problems (the satisfaction and publication problems) in anonymizing such data. By utilizing the characteristics of sparseness and high dimensionality, we develop a slicing technique for satisfaction problems. By using a graphical representation, we provide a comprehensive analysis of graphical modification strategies. For all the techniques developed in this thesis, we include a set of extensive evaluations indicating that it is possible to distribute high-quality data that respects several meaningful notions of privacy.
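
As a rough illustration of the microaggregation idea mentioned above, the following sketch shows a generic, textbook-style univariate variant: values are sorted, partitioned into groups of at least k records, and each value is replaced by its group mean. It is not the information-theoretic method developed in the thesis; the function name `microaggregate`, the parameter `k`, and the sample ages are assumptions made purely for illustration.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value with the mean of a group of at least k sorted neighbours.

    Hypothetical helper for illustration only; not the thesis's algorithm.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")

    # Indices of the records, ordered by their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)

    start = 0
    while start < len(order):
        end = start + k
        # Fold a short tail into the current group so no group has fewer than k records.
        if len(order) - end < k:
            end = len(order)
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
        start = end
    return result


# Illustrative (made-up) ages; with k = 3 every published value is shared by at least 3 records.
ages = [23, 25, 31, 34, 35, 52, 58]
print(microaggregate(ages, 3))
```

Larger values of k hide individuals in bigger groups but move each published value further from the original, which is the privacy/utility tension described in the abstract.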

Statistics for USQ ePrint 19641
Item Type: Thesis (PhD/Research)
Item Status: Live Archive
Additional Information: Doctor of Philosophy (PhD) thesis.
Faculty/School / Institute/Centre: Historic - Faculty of Sciences - Department of Maths and Computing (Up to 30 Jun 2013)
Supervisors: Wang, Hua; Plank, Ashley
Date Deposited: 15 Sep 2011 05:37
Last Modified: 18 Jul 2016 02:34
Uncontrolled Keywords: privacy; data
Fields of Research (2008): 08 Information and Computing Sciences > 0806 Information Systems > 080608 Information Systems Development Methodologies
08 Information and Computing Sciences > 0803 Computer Software > 080303 Computer System Security
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4609 Information systems > 460905 Information systems development methodologies and practice
46 INFORMATION AND COMPUTING SCIENCES > 4604 Cybersecurity and privacy > 460499 Cybersecurity and privacy not elsewhere classified
