
LYRICWHIZ: ROBUST MULTILINGUAL ZERO-SHOT LYRICS TRANSCRIPTION BY WHISPERING TO CHATGPT

  • Le Zhuo
  • Ruibin Yuan
  • Jiahao Pan
  • Yinghao Ma
  • Yizhi Li
  • Ge Zhang
  • Si Liu
  • Roger Dannenberg
  • Jie Fu
  • Chenghua Lin
  • Emmanouil Benetos
  • Wenhu Chen
  • Wei Xue
  • Yike Guo
  • Beihang University
  • Beijing Academy of Artificial Intelligence
  • Carnegie Mellon University
  • Hong Kong University of Science and Technology
  • Queen Mary University of London
  • University of Sheffield
  • University of Waterloo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with a strong performance for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our proposed method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task.
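
The abstract reports results as Word Error Rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the normalization (lowercasing and whitespace splitting) are assumptions made for illustration; the record does not state which text normalization the authors apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting; a real evaluation may also strip punctuation, etc.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of five reference words is dropped in the hypothesis.
    print(word_error_rate("hello darkness my old friend",
                          "hello darkness my friend"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.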

Original language: English
Title of host publication: 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Proceedings
Editors: Augusto Sarti, Fabio Antonacci, Mark Sandler, Paolo Bestagini, Simon Dixon, Beici Liang, Gael Richard, Johan Pauwels
Publisher: International Society for Music Information Retrieval
Pages: 343-351
Number of pages: 9
ISBN (electronic): 9781732729933
Publication status: Published - 2023
Event: 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Milan, Italy
Duration: 5 Nov 2023 to 9 Nov 2023

Publication series

Name: 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Proceedings

Conference

Conference: 24th International Society for Music Information Retrieval Conference, ISMIR 2023
Country/Territory: Italy
City: Milan
Period: 5/11/23 to 9/11/23
