TY - GEN
T1 - Rare Codes Count
T2 - 5th Workshop on Clinical Natural Language Processing, ClinicalNLP 2023, held at ACL 2023
AU - Chen, Jiamin
AU - Li, Xuhong
AU - Xi, Junting
AU - Yu, Lei
AU - Xiong, Haoyi
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Multi-label clinical text classification, such as automatic ICD coding, has long been a challenging task in Natural Language Processing because of its long, domain-specific documents and the long-tail distribution over a large label set. Existing methods adopt different model architectures to encode the clinical notes. However, without exploiting the useful connections between labels, these models show a large gap in predictive performance between rare and frequent codes. In this work, we propose a novel method that further mines the helpful relations between different codes via a relation-enhanced code encoder to improve rare-code performance. Starting from simple code descriptions, the model achieves performance comparable to, or even better than, models that rely on heavy external knowledge. The proposed method is evaluated on MIMIC-III, a common dataset in the medical domain, where it outperforms previous state-of-the-art models on both overall metrics and rare-code performance. Moreover, the interpretation results further demonstrate the effectiveness of our method. Our code is publicly available.
AB - Multi-label clinical text classification, such as automatic ICD coding, has long been a challenging task in Natural Language Processing because of its long, domain-specific documents and the long-tail distribution over a large label set. Existing methods adopt different model architectures to encode the clinical notes. However, without exploiting the useful connections between labels, these models show a large gap in predictive performance between rare and frequent codes. In this work, we propose a novel method that further mines the helpful relations between different codes via a relation-enhanced code encoder to improve rare-code performance. Starting from simple code descriptions, the model achieves performance comparable to, or even better than, models that rely on heavy external knowledge. The proposed method is evaluated on MIMIC-III, a common dataset in the medical domain, where it outperforms previous state-of-the-art models on both overall metrics and rare-code performance. Moreover, the interpretation results further demonstrate the effectiveness of our method. Our code is publicly available.
UR - https://www.scopus.com/pages/publications/85175444249
M3 - Conference contribution
AN - SCOPUS:85175444249
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 403
EP - 413
BT - 5th Workshop on Clinical Natural Language Processing, ClinicalNLP 2023 - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
Y2 - 14 July 2023
ER -