University of Southampton Institutional Repository

Exploitation of machine learning techniques in modelling phrase movements for machine translation


Ni, Yizhao, Saunders, Craig, Szedmak, Sandor and Niranjan, Mahesan (2011) Exploitation of machine learning techniques in modelling phrase movements for machine translation. Journal of Machine Learning Research, 12, 1-30.

Record type: Article

Abstract

We propose a distance phrase reordering model (DPR) for statistical machine translation (SMT) that aims to learn grammatical rules and context-dependent changes within a phrase reordering classification framework. We consider a variety of machine learning techniques, including state-of-the-art structured prediction methods, and compare and evaluate them on a Chinese-English corpus, a language pair known for extensive reordering that current models cannot adequately capture. On the reordering classification task, the method significantly outperforms the baseline against which it was tested; when integrated as a component of the state-of-the-art machine translation system MOSES, it also improves translation results.

Text
__userfiles.soton.ac.uk_Users_nsc_mydesktop_272421.pdf - Version of Record

More information

Published date: 1 January 2011
Organisations: Southampton Wireless Group

Identifiers

Local EPrints ID: 272421
URI: http://eprints.soton.ac.uk/id/eprint/272421
PURE UUID: 6fc17051-a391-4199-937a-9d2a58edae89
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 06 Jun 2011 19:13
Last modified: 15 Mar 2024 03:29

Contributors

Author: Yizhao Ni
Author: Craig Saunders
Author: Sandor Szedmak
Author: Mahesan Niranjan
