
Generative Spoken Language Modeling with Quantized Feature Enhancement

  • Beihang University
  • Orange R&D Beijing Company Limited

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

In the absence of text, training generative models directly on speech data through a next-token prediction task, as in text-based language models, has been shown to be feasible. However, speech data carries richer feature information than text. To capitalize on these additional features, we propose feature-enhanced generative spoken language modeling (fGSLM). We calculate the difference between the original speech and its normalized version, and extract quantized features with a VQVAE-structured model. These features are then integrated into generative spoken language modeling (GSLM) by fine-tuning the unit language model (uLM) through a multi-stream transformer. To evaluate the effectiveness of our model, we conduct experiments on the ProsAudit evaluation task in the Zero Resource Speech Challenge. Experimental results show that our model significantly improves prosody comprehension at both the sentence and lexical levels, and outperforms baseline models.
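The quantization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that speech is already represented as per-frame feature vectors, that the "difference" is a frame-wise subtraction, and that quantization is a nearest-neighbor lookup in a fixed codebook (the lookup used in VQ-VAE-style models). The function names, the toy codebook, and the `content_units` values are all hypothetical.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector`
    under squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantize_residual(original, normalized, codebook):
    """Quantize the frame-wise difference (original - normalized).

    Assumption: both inputs are lists of equal-length frame vectors.
    Returns one discrete unit (a codebook index) per frame.
    """
    units = []
    for orig_frame, norm_frame in zip(original, normalized):
        residual = [o - n for o, n in zip(orig_frame, norm_frame)]
        units.append(nearest_code(residual, codebook))
    return units


# Toy data (hypothetical): a 4-entry codebook over 2-dimensional frames.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
normalized = [[0.5, 0.5], [0.2, 0.1], [0.3, 0.3]]
original = [[0.6, 0.4], [1.1, 0.2], [0.4, 1.2]]

feature_units = quantize_residual(original, normalized, codebook)

# Hypothetical content units for the same frames; pairing the two
# sequences gives one token per stream at each time step.
content_units = [17, 17, 42]
streams = list(zip(content_units, feature_units))
```

In this toy setup, each time step carries a content unit and a feature unit side by side, which is one simple way to picture the input to a multi-stream model. In a real system the codebook would be learned jointly with an encoder rather than fixed by hand.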

Original language: English
Title of host publication: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359312
DOI
Publication status: Published - 2024
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: 30 Jun 2024 to 5 Jul 2024

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 30/06/24 to 5/07/24
