PhD Oral Exam - Rashid Hussain Khokhar, Information and Systems Engineering
Anonymizing and Trading Person-specific Data with Trust
This event is free
School of Graduate Studies
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Over the past decade, data privacy, security, and trustworthiness have gained tremendous attention from research communities, and they remain active areas of research as cloud services and social media applications proliferate. Data is growing at a rapid pace and has become an integral part of almost every industry and business, including commercial and non-profit organizations. It often contains person-specific information, and a data custodian who holds it must be responsible for managing its use, disclosure, accuracy, and privacy protection. In this thesis, we present three research problems. The first two address stakeholders' concerns about privacy protection, data trustworthiness, and profit distribution in the online market for trading person-specific data. The third addresses health information custodians' (HICs') concerns about privacy-preserving healthcare network data publishing.
Our first research problem arises in cloud-based data integration services, where data providers collaborate with their trading partners to deliver quality data mining services. Data-as-a-Service (DaaS) enables data integration to serve the demands of data consumers. Data providers face the challenge not only of protecting private data over the cloud but also of complying with privacy rules when trading person-specific data. We propose a model that allows multiple data providers to collaborate in integrating their data and derives the contribution of each data provider by valuating the incorporated cost factors. This model serves as a guide for business decision-making, such as estimating the potential privacy risk and finding the sub-optimal value for publishing mashup data. Experiments on real-life data demonstrate that our approach can identify the sub-optimal value in data mashup for different privacy models, including k-anonymity, LKC-privacy, and ε-differential privacy, with various anonymization algorithms and privacy parameters.
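As a small illustration of the simplest privacy model named above, the following sketch checks whether a table satisfies k-anonymity: every combination of quasi-identifier values must be shared by at least k records. It is not the thesis's mashup model; the function name, the record layout, and the sample values are hypothetical and chosen only for this example.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times.

    `records` is a list of dicts; `quasi_identifiers` lists the keys that an
    adversary could link to external data (hypothetical schema for this sketch).
    """
    # Group records by their quasi-identifier values and count each group.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (age ranges, truncated postal codes).
table = [
    {"age": "30-39", "postal": "H3G", "diagnosis": "flu"},
    {"age": "30-39", "postal": "H3G", "diagnosis": "asthma"},
    {"age": "40-49", "postal": "H4B", "diagnosis": "flu"},
    {"age": "40-49", "postal": "H4B", "diagnosis": "diabetes"},
]

print(is_k_anonymous(table, ["age", "postal"], k=2))
print(is_k_anonymous(table, ["age", "postal"], k=3))
```

Each group here contains two records, so the table passes for k = 2 and fails for k = 3. Raising k generally requires coarser generalization, which is the privacy-versus-utility trade-off that the valuation model is meant to inform.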
Second, consumers demand good-quality data for accurate analysis and effective decision-making, while data providers aim to maximize their profits by competing with peer providers. In addition, data providers or custodians must conform to privacy policies to avoid potential penalties for privacy breaches. To address these challenges, we propose a two-fold solution: (1) we present the first information entropy-based trust computation algorithm, IEB_Trust, which allows a semi-trusted arbitrator to detect the covert behavior of a dishonest data provider and choose the qualified providers for a data mashup, and (2) we incorporate the Vickrey-Clarke-Groves (VCG) auction mechanism for the valuation of data providers' attributes into the data mashup process. Experiments on real-life data demonstrate the robustness of our approach in restricting dishonest providers from participating in the data mashup and in improving efficiency compared with provenance-based approaches. Furthermore, we derive the monetary shares for the chosen providers from their information utility and trust scores over the differentially private release of the integrated dataset under their joint privacy requirements.
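The abstract does not describe how IEB_Trust works internally, so the sketch below shows only the standard quantity that an entropy-based approach builds on: the Shannon entropy of an attribute's empirical distribution. The function name and the sample columns are hypothetical, and nothing here should be read as the thesis's algorithm.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty column carries no information
    # H = -sum(p * log2(p)) over the observed value frequencies.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical attribute columns submitted by two providers.
informative_column = ["flu", "asthma", "diabetes", "flu"]
constant_column = ["flu", "flu", "flu", "flu"]

print(shannon_entropy(informative_column))
print(shannon_entropy(constant_column))
```

A column whose values never vary has zero entropy and so carries no information for analysis, whereas a more varied column scores higher. This is one intuition for why entropy can serve as an input to a utility or trust score.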
Finally, we address HICs' concerns about exchanging healthcare data to provide better and more timely services while mitigating the risk of exposing patients' sensitive information to privacy threats. We first model a complex healthcare dataset using a heterogeneous information network that consists of multi-type entities and their relationships. We then propose DiffHetNet, an edge-based differentially private algorithm, to protect the sensitive links of patients from inbound and outbound attacks in the heterogeneous health network. We evaluate the performance of our proposed method in terms of information utility and efficiency on different types of real-life datasets that can be modeled as networks. Experimental results suggest that DiffHetNet generally yields less information loss and is significantly more efficient in terms of runtime than existing network anonymization methods. Furthermore, DiffHetNet scales to large network datasets.
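To make the notion of edge-based differential privacy concrete, the sketch below applies the standard Laplace mechanism to a single query, the number of edges in a network. Adding or removing one edge changes this count by at most 1, so Laplace noise with scale 1/ε gives ε-differential privacy at the edge level. This is a textbook mechanism rather than DiffHetNet itself; the function names and the small patient network are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale.

    The difference of two independent exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_edge_count(edges, epsilon, rng):
    """Release the edge count under edge-level epsilon-differential privacy."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # one added or removed edge changes the count by at most 1
    return len(set(edges)) + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical heterogeneous network: patient-to-doctor and patient-to-hospital links.
edges = [("patient1", "doctorA"), ("patient1", "hospitalX"), ("patient2", "doctorA")]

rng = random.Random(0)  # seeded only so that the example is repeatable
print(noisy_edge_count(edges, epsilon=1.0, rng=rng))
```

Smaller values of ε add more noise and give stronger protection for individual links, at the cost of a less accurate released count.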