A comparison of correspondence analysis with PMI-based word embedding methods
Popular word embedding methods such as GloVe and Word2Vec are related to the factorization of the pointwise mutual information (PMI) matrix. In this paper, we link correspondence analysis (CA) to the factorization of the PMI matrix. CA is a dimensionality reduction method that uses singular value decomposition (SVD), and we show that CA is mathematically close to the weighted factorization of the PMI matrix. In addition, we present two variants of CA that turn out to be successful in the factorization of the word-context matrix: CA applied to a matrix whose entries undergo a square-root transformation (ROOT-CA), and CA applied to a matrix whose entries undergo a root-root transformation (ROOTROOT-CA). An empirical comparison among CA- and PMI-based methods shows that the overall results of ROOT-CA and ROOTROOT-CA are slightly better than those of the PMI-based methods.
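The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the textbook definition of CA (an SVD of the matrix of standardized residuals, with rows reported in principal coordinates) and reads "root-root" as taking the square root twice, i.e. a fourth root. The function names, the toy count matrix, and the choice of row coordinates are assumptions made for illustration; the paper itself may define or weight these quantities differently.

```python
import numpy as np

def pmi_matrix(counts):
    """PMI: log(p_ij / (p_i. * p_.j)). Zero counts yield -inf."""
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1, keepdims=True)   # row (word) margins
    c = P.sum(axis=0, keepdims=True)   # column (context) margins
    with np.errstate(divide="ignore"):
        return np.log(P / (r * c))

def ca_embeddings(counts, k, power=1.0):
    """Textbook CA after an elementwise power transform of the counts.

    power=1.0 gives plain CA, power=0.5 a square-root transform,
    power=0.25 a fourth-root transform (assumed reading of "root-root").
    Returns the first k principal row coordinates.
    """
    X = np.asarray(counts, dtype=float) ** power
    P = X / X.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    expected = np.outer(r, c)
    # Standardized residuals from the independence model.
    S = (P - expected) / np.sqrt(expected)
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates: D_r^{-1/2} U Sigma, truncated to k dimensions.
    return (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]

# Hypothetical 4-word by 3-context co-occurrence counts.
counts = np.array([[10, 2, 0],
                   [3, 8, 1],
                   [0, 1, 9],
                   [4, 4, 4]])

pmi = pmi_matrix(counts)
plain_ca = ca_embeddings(counts, k=2)
root_ca = ca_embeddings(counts, k=2, power=0.5)
rootroot_ca = ca_embeddings(counts, k=2, power=0.25)
```

One visible difference between the two families in this sketch: the PMI matrix is undefined (minus infinity) wherever a count is zero, whereas the CA residuals stay finite as long as every row and column has a positive total.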
cs.CL
Qi, Qianqian
Hessen, David J.
van der Heijden, Peter G.M.
31 May 2024
Text
2405.20895v1
- Author's Original
Available under License Other.
More information
e-pub ahead of print date: 31 May 2024
Published date: 31 May 2024
Keywords:
cs.CL
Identifiers
Local EPrints ID: 491179
URI: http://eprints.soton.ac.uk/id/eprint/491179
PURE UUID: 9e48958f-8041-4c6a-a54f-637c0306230a
Catalogue record
Date deposited: 14 Jun 2024 16:40
Last modified: 16 Jul 2024 01:44
Contributors
Author:
Qianqian Qi
Author:
David J. Hessen
Author:
Peter G.M. van der Heijden