Large language model comparisons between English and Chinese query performance for cardiovascular prevention

  • Hongwei Ji
  • Xiaofei Wang
  • Ching Hui Sia
  • Jonathan Yap
  • Soo Teik Lim
  • Andie Hartanto Djohan
  • Yaowei Chang
  • Ning Zhang
  • Mengqi Guo
  • Fuhai Li
  • Zhi Wei Lim
  • Ya Xing Wang
  • Bin Sheng
  • Tien Yin Wong
  • Susan Cheng
  • Khung Keong Yeo
  • Yih Chung Tham*
  • *Corresponding author for this work
  • Tsinghua University
  • Qingdao University
  • National University of Singapore
  • National Heart Centre Singapore
  • Duke-NUS Medical School
  • Yamaguchi University
  • Shanghai Jiao Tong University
  • Singapore National Eye Center
  • Cedars-Sinai Medical Center

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, there has been an increase in the use of large language model (LLM) chatbots by patients seeking medical information. However, the accuracy of information provided by LLMs across different languages remains unclear. This study aimed to evaluate the performance of popular LLM chatbots, such as BARD, ChatGPT-3.5, ChatGPT-4.0, and ERNIE, in answering cardiovascular disease prevention questions in both English and Chinese. We tested these models with 75 questions each, focusing on the accuracy of their responses and their ability to improve over time. The results showed that ChatGPT-4.0 provided the most accurate answers in English and demonstrated the best improvement over time. In Chinese, ERNIE performed better in improving its responses over time. This research highlights the need for ongoing evaluations to ensure the spread of reliable health information by LLMs across diverse languages.

Original language: English
Article number: 177
Journal: Communications Medicine
Volume: 5
Issue number: 1
DOIs
State: Published - Dec 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
