Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage
Abstract
Objective
The manual recording of electronic health records (EHRs) by clinicians in the emergency department (ED) is time-consuming and challenging. In light of recent advancements in large language models (LLMs) such as GPT and BERT, this study aimed to design and validate LLMs for automatic clinical diagnoses. The models were designed to identify 12 medical symptoms and 2 patient histories from simulated clinician–patient conversations within 6 primary symptom scenarios in emergency triage rooms.
Materials and method
We developed classification models by fine-tuning BERT, a transformer-based pre-trained language model. We then interpreted these models with an eXplainable artificial intelligence (XAI) technique, SHapley Additive exPlanations (SHAP). A Turing test was conducted to ascertain the reliability of the XAI results by comparing them with the outcomes of the same tasks performed and explained by medical workers. An emergency medicine specialist assessed the results of both the XAI method and the medical workers.
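For illustration, the following is a minimal multi-label fine-tuning sketch using the Hugging Face transformers library; the checkpoint name (klue/roberta-base), the toy utterance, the 14-dimensional label encoding (12 symptoms plus 2 histories), and all hyperparameters are assumptions made for illustration and are not taken from the paper.

```python
# Minimal multi-label fine-tuning sketch; checkpoint, data, and hyperparameters
# are illustrative assumptions, not the authors' actual configuration.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

NUM_LABELS = 14  # 12 medical symptoms + 2 patient histories, one binary label each

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-base")  # assumed Korean checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "klue/roberta-base",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss per label
)

class TriageDataset(torch.utils.data.Dataset):
    """Transcribed clinician-patient utterances with 14-dim binary label vectors."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i], dtype=torch.float)
        return item

# Hypothetical toy example; the study used transcripts of simulated conversations.
train_ds = TriageDataset(
    ["머리가 어지럽고 속이 울렁거려요."],  # "I feel dizzy and nauseous."
    [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="triage_model", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
)
trainer.train()
```

One common way to obtain SHAP token attributions for such a model is to wrap it in a transformers text-classification pipeline and pass that pipeline to shap.Explainer, which supports text models; this is offered as a general pattern, not as the authors' exact procedure.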
Results
We fine-tuned four pre-trained LLMs and compared their classification performance. The KLUE-RoBERTa-based model achieved the highest performance (F1-score: 0.965, AUROC: 0.893) on human-transcribed script data. The SHAP-based XAI results showed an average Jaccard similarity of 0.722 with the explanations of medical workers across 15 samples. The Turing test revealed a small 6% gap, with XAI and medical workers receiving mean scores of 3.327 and 3.52, respectively.
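For reference, the Jaccard similarity reported above is the size of the intersection of two token sets divided by the size of their union. A minimal sketch comparing a SHAP-highlighted token set with a clinician-highlighted one follows; both token sets are hypothetical examples, not data from the study.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical tokens: words SHAP weighted highly vs. words a medical worker
# marked as evidence for the same label.
shap_tokens = {"어지럽", "울렁", "두통"}
clinician_tokens = {"어지럽", "두통"}
print(jaccard_similarity(shap_tokens, clinician_tokens))  # 2/3 ≈ 0.667
```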
Conclusion
This paper highlights the potential of LLMs for automatic EHR recording in Korean EDs. The KLUE-RoBERTa-based model demonstrated superior classification performance. Furthermore, XAI using SHAP provided reliable explanations for model outputs. The reliability of these explanations was confirmed by a Turing test.
Highlights
• The data were collected from simulated clinician–patient conversations.
• The fine-tuned large language model identifies medical information included in electronic health records.
• The outcomes of the model were interpreted through eXplainable AI.
• A Turing test was conducted to demonstrate the reliability of the eXplainable AI results.
Keywords: Natural language processing, Electronic health record, Large language models, eXplainable artificial intelligence, Turing test
Vol. 77, P. 29-38, March 2024