Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage
Abstract
Objective
The manual recording of electronic health records (EHRs) by clinicians in the emergency department (ED) is time-consuming and challenging. In light of recent advancements in large language models (LLMs) such as GPT and BERT, this study aimed to design and validate LLMs for automatic clinical diagnoses. The models were designed to identify 12 medical symptoms and 2 patient histories from simulated clinician–patient conversations within 6 primary symptom scenarios in emergency triage rooms.
Materials and method
We developed classification models by fine-tuning BERT, a transformer-based pre-trained language model. We then interpreted these models with an eXplainable artificial intelligence (XAI) technique, SHapley Additive exPlanations (SHAP). A Turing test was conducted to ascertain the reliability of the XAI results by comparing them with the outcomes of the same tasks performed and explained by medical workers. An emergency medicine specialist assessed the results of both the XAI method and the medical workers.
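For illustration, the following is a minimal multi-label fine-tuning sketch using the Hugging Face transformers library; the checkpoint name (klue/roberta-base), the toy utterance, the 14-dimensional label encoding (12 symptoms plus 2 histories), and all hyperparameters are assumptions made for illustration and are not taken from the paper.

```python
# Minimal multi-label fine-tuning sketch; checkpoint, data, and hyperparameters
# are illustrative assumptions, not the authors' actual configuration.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

NUM_LABELS = 14  # 12 medical symptoms + 2 patient histories, one binary label each

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-base")  # assumed Korean checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "klue/roberta-base",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss per label
)

class TriageDataset(torch.utils.data.Dataset):
    """Transcribed clinician-patient utterances with 14-dim binary label vectors."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i], dtype=torch.float)
        return item

# Hypothetical toy example; the study used transcripts of simulated conversations.
train_ds = TriageDataset(
    ["머리가 어지럽고 속이 울렁거려요."],  # "I feel dizzy and nauseous."
    [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="triage_model", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
)
trainer.train()
```

One common way to obtain SHAP token attributions for such a model is to wrap it in a transformers text-classification pipeline and pass that pipeline to shap.Explainer, which supports text models; this is offered as a general pattern, not as the authors' exact procedure.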
Results
We fine-tuned four pre-trained LLMs and compared their classification performance. The KLUE-RoBERTa-based model achieved the highest performance (F1-score: 0.965, AUROC: 0.893) on human-transcribed script data. The SHAP-based XAI results showed an average Jaccard similarity of 0.722 with the explanations of medical workers across 15 samples. The Turing test revealed a small 6% gap, with XAI and medical workers receiving mean scores of 3.327 and 3.52, respectively.
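For reference, the Jaccard similarity reported above is the size of the intersection of two token sets divided by the size of their union. A minimal sketch comparing a SHAP-highlighted token set with a clinician-highlighted one follows; both token sets are hypothetical examples, not data from the study.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical tokens: words SHAP weighted highly vs. words a medical worker
# marked as evidence for the same label.
shap_tokens = {"어지럽", "울렁", "두통"}
clinician_tokens = {"어지럽", "두통"}
print(jaccard_similarity(shap_tokens, clinician_tokens))  # 2/3 ≈ 0.667
```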
Conclusion
This paper highlights the potential of LLMs for automatic EHR recording in Korean EDs. The KLUE-RoBERTa-based model demonstrated superior classification performance. Furthermore, XAI using SHAP provided reliable explanations for model outputs. The reliability of these explanations was confirmed by a Turing test.
Highlights
• The data were collected from simulated clinician–patient conversations.
• The fine-tuned large language model identifies medical information included in electronic health records.
• The outcomes of the model were interpreted through eXplainable AI.
• A Turing test was conducted to demonstrate the reliability of the eXplainable AI results.
Keywords: Natural language processing, Electronic health record, Large language models, eXplainable artificial intelligence, Turing test
Vol. 77, P. 29-38, March 2024