Online machine learning for combinatorial data

Ghosh, Shaona (2016) Online machine learning for combinatorial data. University of Southampton, Doctoral Thesis, 138pp.

Record type: Thesis (Doctoral)

Abstract

With an ever-increasing demand for large-scale data, it becomes difficult to process and utilise the information available. In particular, making decisions based upon sequentially acquired data, where only limited information is initially known, is an important problem. Often the input data in such problems have a complex combinatorial structure. For example, consider an internet advertising system that manages advertisement placement over a network of websites. The number of ways of placing m different advertisements on n websites with replacement is mⁿ, which grows exponentially in n and scales badly for large n. As a combinatorial problem, the data can be manipulated within a frequently occurring computational object called a graph, allowing the structure to be exploited for intelligent automatic processing. Traditionally, machine learning techniques require a separate initial training phase before predictions can occur on unseen data. However, the sequential nature of some problems necessitates real-time prediction, thereby making many existing techniques unsuitable. Online learning is a field of machine learning that offers a collection of algorithms that learn from sequential streaming data, where the learner cannot control or influence the data collection procedure. Although these existing online methods have theoretical guarantees on performance, they are not yet fully mature in the context of the combinatorial complexity of graphical structures. In this thesis, a series of algorithms that attempt to overcome the shortcomings of existing online algorithms is presented. The discrete graphical model called the Ising model is explored to develop online approximation algorithms for label prediction. A deterministic approximation algorithm with a sequential guarantee is developed by capturing the persistent structures of maximum flows and minimum cuts in the network and by efficiently enumerating all label-consistent minimum cuts. 
Novel mistake bounds are provided that improve upon or match previous performance bounds in the literature. Additionally, a variational approximation technique using the mean field approximation is built for online prediction of multi-class labelling on the Ising model. An online sequential action selection algorithm for the limited-feedback (bandit feedback) setting with side information is developed using a linear programming relaxation of the classic maximum flow problem. Finally, the multi-objective optimization problem with conflicting objectives and full feedback is studied, and an online algorithm is built that outperforms the traditional approaches under similar assumptions.
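
As an illustration of the advertisement-placement count mentioned in the abstract, the following minimal Python sketch compares the closed form mⁿ with a brute-force enumeration for small inputs and prints how quickly the count grows. It is not taken from the thesis; the function names `count_placements` and `enumerate_placements` and the chosen values of m and n are assumptions made purely for illustration.

```python
from itertools import product


def count_placements(m: int, n: int) -> int:
    """Closed-form count: each of n websites independently shows one of m ads."""
    return m ** n


def enumerate_placements(m: int, n: int):
    """Yield every placement as a tuple; entry i is the ad index shown on website i.

    Hypothetical helper for illustration only: feasible just for small m and n.
    """
    return product(range(m), repeat=n)


# For small inputs, brute-force enumeration agrees with the closed form.
assert sum(1 for _ in enumerate_placements(3, 4)) == count_placements(3, 4)

# The count grows exponentially in n, so enumeration quickly becomes infeasible.
for n in (5, 10, 20, 40):
    print(f"m=3, n={n}: {count_placements(3, n)} placements")
```

Even with only three advertisements, the number of placements exceeds three billion at n = 20, which is why exhaustive enumeration is impractical and structure in the data has to be exploited instead.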

Text

ShaonaGhosh_PhDThesis_main - Version of Record

Available under License University of Southampton Thesis Licence.

Download (6MB)