The University of Southampton
University of Southampton Institutional Repository

Beyond multi-class – structured learning for machine translation

In this thesis, we explore and present machine learning (ML) approaches to a particularly challenging research area – machine translation (MT). The study aims at replacing or developing each component in the MT system with an appropriate discriminative model, where the ultimate goal is to create a powerful MT system with cutting-edge ML techniques.

The study regards each sub-problem encountered in the MT field as a classification or regression problem. To model specific mappings in MT tasks, the modern machine learning paradigm known as “structured learning” is pursued. This approach goes beyond classic multiclass pattern classification and explicitly models certain dependencies in the target domain.

Different algorithmic variants are then proposed for constructing the ML-based MT systems. The first application is a kernel-based MT system that projects both input and output into a very high-dimensional linguistic feature space and uses the maximum margin regression (MMR) technique to learn the relations between input and output. It is amongst the first MT systems that work with pure ML techniques. The second application is a max-margin structure (MMS) approach to phrase translation probability modelling in an MT system. The architecture of this approach is shown to capture structural aspects of the problem domains, leading to demonstrable performance improvements on machine translation. Finally, the thesis describes the development of a phrase reordering model for machine translation, where we have compared different ML methods and discovered a particularly efficient paradigm to solve this problem.

Ni, Yizhao (2010) Beyond multi-class – structured learning for machine translation. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 173pp.

Record type: Thesis (Doctoral)

Text
Beyond_multiclass_-_structured_learning_for_machine_translation.pdf - Other
Download (10MB)

More information

Published date: June 2010
Organisations: University of Southampton

Identifiers

Local EPrints ID: 158565
URI: http://eprints.soton.ac.uk/id/eprint/158565
PURE UUID: bda4797d-e565-4816-8aa8-283125d7aff8
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 21 Jun 2010 14:06
Last modified: 14 Mar 2024 02:53

Contributors

Author: Yizhao Ni
Thesis advisor: Mahesan Niranjan
