Is language modeling enough? Evaluating effective embedding combinations
Schneider, Rudolf, Oberhauser, Tom, Grundmann, Paul, Gers, Felix Alexander, Löser, Alexander and Staab, Steffen
(2020)
Is language modeling enough? Evaluating effective embedding combinations.
Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France.
11 - 16 May 2020.
(In Press)
Record type: Conference or Workshop Item (Paper)
Abstract
Universal embeddings, such as BERT or ELMo, are useful for a broad set of natural language processing tasks like text classification or sentiment analysis. Moreover, specialized embeddings exist for tasks like topic modeling or named entity disambiguation. We study whether we can complement these universal embeddings with specialized embeddings. We conduct an in-depth evaluation of nine well-known natural language understanding tasks with SentEval. In addition, we extend SentEval with two tasks from the medical domain. We present PubMedSection, a novel topic classification dataset focused on the biomedical domain. Our comprehensive analysis covers 11 tasks and combinations of six embeddings. We report that combined embeddings outperform state-of-the-art universal embeddings without any embedding fine-tuning. We observe that adding topic-model-based embeddings helps for most tasks and that differing pre-training tasks encode complementary features. Moreover, we present new state-of-the-art results on the MPQA and SUBJ tasks in SentEval.
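To illustrate the kind of embedding combination described in the abstract, the following is a minimal sketch of evaluating concatenated sentence embeddings with the SentEval toolkit. The encoders embed_universal and embed_topic are hypothetical placeholders standing in for a universal encoder (e.g. BERT or ELMo) and a topic-model-based encoder; combining them by simple concatenation is an assumption for illustration, not the authors' implementation.

# Minimal sketch (assumptions noted above): combine two sentence embeddings by
# concatenation and evaluate them with SentEval.
import numpy as np
import senteval

def embed_universal(sentence):
    # Placeholder: would return a fixed-size vector from a universal encoder (e.g. BERT/ELMo).
    return np.zeros(768)

def embed_topic(sentence):
    # Placeholder: would return a fixed-size vector from a topic-model-based encoder.
    return np.zeros(100)

def prepare(params, samples):
    # SentEval hook for task-specific preparation; nothing needed in this sketch.
    return

def batcher(params, batch):
    # SentEval passes a batch of sentences (usually tokenized as lists of words).
    sentences = [' '.join(s) if isinstance(s, list) else s for s in batch]
    combined = [np.concatenate([embed_universal(s), embed_topic(s)]) for s in sentences]
    return np.vstack(combined)

params_senteval = {'task_path': 'data/senteval', 'usepytorch': False, 'kfold': 10}
se = senteval.engine.SE(params_senteval, batcher, prepare)
results = se.eval(['MPQA', 'SUBJ'])  # two of the SentEval tasks named in the abstract
print(results)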
Text: LREC20_LM_TM(27)(1) - Author's Original
More information
Accepted/In Press date: 11 February 2020
Venue - Dates: Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France, 2020-05-11 - 2020-05-16
Identifiers
Local EPrints ID: 438613
URI: http://eprints.soton.ac.uk/id/eprint/438613
PURE UUID: df30f6d4-8937-433b-a872-ad41fa4c68c3
Catalogue record
Date deposited: 18 Mar 2020 17:33
Last modified: 17 Mar 2024 03:38
Contributors
Author: Rudolf Schneider
Author: Tom Oberhauser
Author: Paul Grundmann
Author: Felix Alexander Gers
Author: Alexander Löser
Author: Steffen Staab