Speech Sanitizer: Speech Content Desensitization and Voice Anonymization

  • Jianwei Qian
  • Haohua Du
  • Jiahui Hou
  • Linlin Chen
  • Taeho Jung
  • Xiang Yang Li*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Voice input users' speech recordings are being collected by service providers and shared with third parties, who may abuse users' voiceprints, identify them by voice, and learn their sensitive speech content. In this work, we design Speech Sanitizer to perturb users' speech recordings so that the sanitized speech can be safely shared with third parties. First, we desensitize speech content by identifying sensitive words, localizing them in the audio using DTW-based keyword spotting, and substituting them with safe words. Both common and personalized sensitive words are identified and replaced. Then, we anonymize users' voiceprints with a carefully designed voice conversion mechanism that is resistant to de-anonymization attacks. Meanwhile, we aim to preserve the utility of the sanitized speech, measured by the accuracy of speech recognition performed on it. We implement Speech Sanitizer and present extensive experimental results that validate the effectiveness and efficiency of our algorithms. The results show that we can reduce the chance of a user's voice being identified among 50 people by 83.7 percent while keeping the drop in speech recognition accuracy within 19.1 percent. We can also easily relax the privacy level to improve speech recognition accuracy.
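The abstract names DTW-based keyword spotting as the step that localizes sensitive words in the audio. As a rough illustration only, and not the authors' implementation, the sketch below shows textbook subsequence DTW: a keyword template, assumed here to be a sequence of per-frame feature vectors (for example MFCCs), is aligned against a longer utterance, and the lowest-cost alignment yields candidate start and end frames. The function name `subsequence_dtw` and the Euclidean frame distance are hypothetical, illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per audio frame (assumed, e.g. MFCCs)


def subsequence_dtw(query: List[Frame], stream: List[Frame]) -> Tuple[float, int, int]:
    """Return (cost, start, end) of the best alignment of `query` inside `stream`.

    Hypothetical helper: local cost is the Euclidean distance between frames,
    and the match may begin and end at any frame of `stream`.
    """
    n, m = len(query), len(stream)
    if n == 0 or m == 0:
        raise ValueError("query and stream must be non-empty")
    # cost[i][j]: best accumulated cost aligning query[:i+1] with a stream segment ending at j
    cost = [[math.inf] * m for _ in range(n)]
    # start[i][j]: stream frame where that best alignment begins
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # Free start: the first query frame may align with any stream frame.
        cost[0][j] = math.dist(query[0], stream[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            local = math.dist(query[i], stream[j])
            # Predecessors: vertical (i-1, j), diagonal (i-1, j-1), horizontal (i, j-1).
            best, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                if cost[i - 1][j - 1] < best:
                    best, best_start = cost[i - 1][j - 1], start[i - 1][j - 1]
                if cost[i][j - 1] < best:
                    best, best_start = cost[i][j - 1], start[i][j - 1]
            cost[i][j] = local + best
            start[i][j] = best_start
    # Free end: pick the stream frame where the full query aligns most cheaply.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


if __name__ == "__main__":
    template = [(1.0,), (2.0,), (3.0,)]
    utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (3.0,), (0.0,)]
    print(subsequence_dtw(template, utterance))  # (0.0, 2, 4)
```

In a spotting setting, one would typically compare the returned cost, perhaps normalized by the alignment length, against a tuned threshold before treating frames `start` through `end` as an occurrence of the keyword; that threshold is a hypothetical parameter here, not a value reported in the paper.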

Original language: English
Pages (from-to): 2631-2642
Number of pages: 12
Journal: IEEE Transactions on Dependable and Secure Computing
Volume: 18
Issue number: 6
State: Published - 2021
Externally published: Yes

Keywords

  • Voice privacy
  • speech desensitization
  • speech privacy
  • speech sanitization
  • voice anonymization
