The University of Southampton
University of Southampton Institutional Repository

Is language modeling enough? Evaluating effective embedding combinations

Schneider, Rudolf, Oberhauser, Tom, Grundmann, Paul, Gers, Felix Alexander, Löser, Alexander and Staab, Steffen (2020) Is language modeling enough? Evaluating effective embedding combinations. Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France. 11-16 May 2020. (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

Universal embeddings, such as BERT or ELMo, are useful for a broad set of natural language processing tasks, including text classification and sentiment analysis. Specialized embeddings also exist for tasks like topic modeling or named entity disambiguation. We study whether these universal embeddings can be complemented with specialized embeddings. We conduct an in-depth evaluation on nine well-known natural language understanding tasks with SentEval. We also extend SentEval to the medical domain with two additional tasks. We present PubMedSection, a novel topic classification dataset focused on the biomedical domain. Our comprehensive analysis covers 11 tasks and combinations of six embeddings. We report that combined embeddings outperform state-of-the-art universal embeddings without any embedding fine-tuning. We observe that adding topic-model-based embeddings helps for most tasks and that different pre-training tasks encode complementary features. Finally, we present new state-of-the-art results on the MPQA and SUBJ tasks in SentEval.
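The abstract describes combining frozen embeddings and scoring them on SentEval-style tasks without fine-tuning. The sketch below illustrates one plausible reading of that setup under stated assumptions; it is not the authors' code. It assumes combination by plain concatenation (the abstract does not name the combination operator), uses two hypothetical toy embedders (`toy_sentiment_embedder`, `toy_topic_embedder`) as stand-ins for a universal and a topic-model-based embedding, and trains only a logistic-regression probe on top of the fixed features, in the spirit of SentEval's classifier-on-frozen-embeddings protocol. All names and the toy data are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

# An embedder maps a sentence to a fixed-size vector. Both embedders below are
# hypothetical toy stand-ins, not BERT, ELMo, or a real topic model.
Embedder = Callable[[str], List[float]]

POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "awful"}
MEDICAL = {"patient", "dose", "clinical"}


def toy_sentiment_embedder(sentence: str) -> List[float]:
    """Hypothetical 'universal' embedder: counts of positive and negative words."""
    words = sentence.lower().split()
    return [float(sum(w in POSITIVE for w in words)),
            float(sum(w in NEGATIVE for w in words))]


def toy_topic_embedder(sentence: str) -> List[float]:
    """Hypothetical 'topic' embedder: share of words from a medical vocabulary."""
    words = sentence.lower().split()
    return [sum(w in MEDICAL for w in words) / max(len(words), 1)]


def combine_embeddings(sentence: str, embedders: Sequence[Embedder]) -> List[float]:
    """Assumed combination: concatenate the frozen outputs of each embedder."""
    combined: List[float] = []
    for embed in embedders:
        combined.extend(embed(sentence))
    return combined


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_probe(features: List[List[float]], labels: List[int],
                         epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit a binary logistic-regression probe with full-batch gradient descent.
    Only the probe's weights are learned; the embedders are never updated."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    # Class 1 when the probe's logit is positive.
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)


EMBEDDERS = [toy_sentiment_embedder, toy_topic_embedder]
TRAIN_DATA = [
    ("good great result", 1),
    ("excellent clinical dose", 1),
    ("bad poor outcome", 0),
    ("awful patient care", 0),
]

if __name__ == "__main__":
    X = [combine_embeddings(s, EMBEDDERS) for s, _ in TRAIN_DATA]
    y = [label for _, label in TRAIN_DATA]
    w, b = train_logistic_probe(X, y)
    print([predict(w, b, x) for x in X])
```

Concatenation keeps each embedding's features intact, so the probe can weight complementary signals from different pre-training tasks independently; the price is a larger input dimension for the classifier.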

Text: LREC20_LM_TM(27)(1) - Author's Original (204 kB)

More information

Accepted/In Press date: 11 February 2020
Venue - Dates: Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France, 2020-05-11 - 2020-05-16

Identifiers

Local EPrints ID: 438613
URI: http://eprints.soton.ac.uk/id/eprint/438613
PURE UUID: df30f6d4-8937-433b-a872-ad41fa4c68c3
ORCID for Steffen Staab: orcid.org/0000-0002-0780-4154

Catalogue record

Date deposited: 18 Mar 2020 17:33
Last modified: 17 Mar 2024 03:38

Contributors

Author: Rudolf Schneider
Author: Tom Oberhauser
Author: Paul Grundmann
Author: Felix Alexander Gers
Author: Alexander Löser
Author: Steffen Staab

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.
