ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history
329-349
Bagheri, A.
Sammani, A.
Van Der Heijden, Peter
Asselbergs, F.W.
Oberski, D.L.
October 2020
Bagheri, A., Sammani, A., Van Der Heijden, Peter, Asselbergs, F.W. and Oberski, D.L. (2020) ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history. Journal of Intelligent Information Systems, 55 (2), 329-349. (doi:10.1007/s10844-020-00605-w).
Abstract
Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze and classify clinical notes in electronic health records. One challenge in classifying clinical notes is the sparsity of co-occurrence patterns in short texts. Current approaches to short text classification depend on external information to improve classification performance. However, this is impractical owing to the lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smooth the semantic representations of short texts. The ETM enriches the text representation by incorporating into it probability distributions generated by an unsupervised algorithm. It takes the length of the original texts into account and enhances the representation through an internal knowledge acquisition procedure. In clinical predictive modeling, interpretability improves the acceptance of a model. Thus, for clinical short text classification, the ETM approach employs an initial TF-IDF (term frequency-inverse document frequency) representation, and we use support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a dataset of clinical cardiovascular notes from the Netherlands to compare the classification performance of the proposed method with that of prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
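The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the enrichment formula, so the length-dependent weight below, the helper names `tfidf_vectors` and `enrich`, the toy sentences, and the topic distributions (which would normally come from a fitted latent Dirichlet allocation model) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def enrich(tfidf_vec, topic_dist, n_tokens, ref_len=10):
    """Append a length-weighted topic distribution to a TF-IDF vector.

    ASSUMPTION: the weight ref_len / (ref_len + n_tokens) is a hypothetical
    choice that gives shorter texts more topic information; it is not the
    formula used in the paper.
    """
    weight = ref_len / (ref_len + n_tokens)
    return list(tfidf_vec) + [weight * p for p in topic_dist]


# Toy tokenized sentences (invented for illustration).
docs = [
    ["history", "of", "myocardial", "infarction"],
    ["no", "history", "of", "diabetes"],
]
# ASSUMPTION: per-document topic distributions from an LDA model fitted elsewhere.
doc_topics = [[0.9, 0.1], [0.2, 0.8]]

vocab, vectors = tfidf_vectors(docs)
enriched = [enrich(v, t, len(d)) for v, t, d in zip(vectors, doc_topics, docs)]
```

Each enriched vector keeps the original TF-IDF features and gains one extra feature per topic, so it could then be passed to a classifier such as a support vector machine or a neural network, the two model families named in the abstract.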
Text: ETM Enrichment by Topic Modeling for Automated Clinical Sentence._ - Accepted Manuscript
Text: JIIS-D-19-00360_R2_Accepted - Accepted Manuscript (restricted to repository staff only)
Text: Bagheri 2020 Article ETM Enrichment By Topic Modeling - Version of Record
More information
Accepted/In Press date: 14 April 2020
e-pub ahead of print date: 28 April 2020
Published date: October 2020
Keywords:
Clinical sentence classification, Enriched text representation, Latent Dirichlet allocation, Sentence classification, Short text classification
Identifiers
Local EPrints ID: 439589
URI: http://eprints.soton.ac.uk/id/eprint/439589
PURE UUID: bf2acb51-ea60-44fb-9adf-8b96aad3b491
Catalogue record
Date deposited: 27 Apr 2020 16:46
Last modified: 28 Apr 2022 04:34
Contributors
Author: A. Bagheri
Author: A. Sammani
Author: F.W. Asselbergs
Author: D.L. Oberski