Enhancing privacy-preserving data synthesis utility with two-stage diffusion models

Research output: Contribution to journal › Article › peer-review

Abstract

Data sharing and publishing practices are pivotal in unlocking the value of data. However, effectively addressing the challenge of safeguarding individual privacy remains a substantial hurdle. Privacy-preserving data synthesis is emerging as a novel solution for safeguarding data privacy, wherein generative models are utilized under differential privacy protection to synthesize surrogate data for downstream tasks in data publishing and sharing. The primary challenge lies in maintaining model utility while injecting the necessary noise for privacy guarantees. To address this, we propose a novel Two-stage Differentially Private Diffusion Model (TDPDM). We optimize privacy budget conservation by decoupling the training process into two stages: privacy encoding and non-privacy diffusion. Additionally, we employ a Privacy Random Variables accountant to estimate privacy budget expenditure in the privacy encoding phase. Empirical studies demonstrate that TDPDM significantly enhances synthetic data utility under identical privacy budgets. Specifically, our model achieves superior performance on both MNIST (with Fréchet Inception Distance (FID) of 15.61, Inception Score (IS) of 9.41, and 98.7 % Convolutional Neural Network (CNN) classification accuracy) and Fashion-MNIST (with FID of 42.62, IS of 7.33, and 87 % CNN classification accuracy). These results highlight the effectiveness of our model in generating high-fidelity data under strict privacy constraints.
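The abstract does not spell out how noise is injected during the privacy encoding stage. As a rough illustration only, the sketch below shows a generic DP-SGD-style gradient step: per-example clipping followed by Gaussian noise. This is a standard technique assumed here for illustration, not necessarily the mechanism TDPDM uses. The function name `dp_noisy_mean_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def dp_noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Generic DP-SGD-style step (illustrative, not the paper's code).

    Clips each example's gradient to L2 norm <= clip_norm, sums the clipped
    gradients, adds Gaussian noise with std noise_multiplier * clip_norm,
    and divides by the batch size.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale is 1 for gradients already within the bound, clip_norm / norm otherwise.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]

# Toy usage: 8 examples, 4 parameters (values are arbitrary).
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 4)) * 5.0
private_grad = dp_noisy_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Under this reading, only the stage that touches the raw data needs such noisy updates. A later stage trained solely on the outputs of a differentially private stage would consume no additional privacy budget, by the post-processing property of differential privacy.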

Original language: English
Article number: 122939
Journal: Information Sciences
Volume: 732
State: Published - 15 Apr 2026

Keywords

  • Data privacy
  • Data sharing
  • Data synthesis
  • Differential privacy
  • Generative models
