Towards potential-based learning for Pareto trade-offs in on-line prediction with experts

Ghosh, Shaona and Gunn, Steve (2012) Towards potential-based learning for Pareto trade-offs in on-line prediction with experts. Women in Machine Learning 2012, Lake Tahoe, Nevada, United States. 03 Dec 2012.

Record type: Conference or Workshop Item (Poster)

Abstract

The on-line learning with experts paradigm is a well-established machine learning framework in which the decision maker aims to predict as well as the best expert in hindsight. In our work, we apply on-line learning to embedded mobile devices, which face a continual problem of learning conflicting trade-offs: the device should save as much power as possible to extend battery life while executing applications with minimal performance delay. These trade-offs conflict by nature; additional power can be saved while executing applications only at the cost of application performance, and vice versa. An example of a power policy in such devices is one that reduces the voltage or frequency of the CPU/core to decrease its speed, thereby saving power but making the application take longer to execute. Such trade-offs are also known as Pareto trade-offs. The theory of on-line learning with experts does not include support for multi-objective learning, in which the learner is evaluated on how well it performs while predicting the Pareto trade-offs. Typically, to devise a general strategy for the learner, this framework uses a potential function defined as a function of the learner’s cumulative regret over time. We analyse this potential function in the context of our Pareto trade-off environment, where the gradient of the potential function is calculated with respect to each trade-off separately. At every round, if the instantaneous regret vector or expert selected by the learner lies in the negative half-space of the gradients of the potential function with respect to each trade-off, then the cumulative regret of the learner decreases simultaneously for both trade-offs. However, this is only possible when the expert selected is far from the local Pareto optimal point and is well aligned with the gradients of the potential function with respect to each trade-off.
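The half-space condition described above can be illustrated with a minimal sketch. The abstract does not name a specific potential, so the exponential potential used here, the learning rate `eta`, and all function names are assumptions made purely for illustration. Each trade-off has its own cumulative regret vector (one entry per expert), and the check is a first-order test that the instantaneous regret vector has a non-positive inner product with the potential gradient for every trade-off.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def exp_potential_gradient(cum_regret, eta=0.5):
    """Gradient of the (assumed) exponential potential sum_i exp(eta * R_i)."""
    return [eta * math.exp(eta * r) for r in cum_regret]


def in_negative_half_space_for_all(cum_regrets, inst_regrets, eta=0.5):
    """Return True if, for every trade-off, the instantaneous regret vector
    lies in the negative half-space of that trade-off's potential gradient."""
    return all(
        dot(exp_potential_gradient(R, eta), r) <= 0.0
        for R, r in zip(cum_regrets, inst_regrets)
    )


# Hypothetical numbers: two trade-offs (e.g. power and delay), two experts.
cum_regrets = [[1.0, -0.5], [0.2, 0.3]]
aligned = [[-1.0, 0.5], [-0.1, -0.2]]    # satisfies the condition for both
conflicting = [[-1.0, 0.5], [0.1, 0.2]]  # violates it for the second trade-off
```

With these made-up values, `in_negative_half_space_for_all(cum_regrets, aligned)` holds, while the `conflicting` case fails because the second trade-off's inner product is positive.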
At the point of local Pareto optimality, the gradients point in opposite directions, and the learner’s regret can then be minimized only by compromising one trade-off for the other. We consider a Pareto descent direction, in which no other expert is superior in improving the regret of the learner for all trade-offs simultaneously and none of the Pareto descent experts is superior to another. The Pareto descent direction turns out to be a convex combination of the steepest descent directions of the potential functions with respect to each trade-off. We represent this convex combination as a separate weight vector, which enables the trade-offs to be combined in a weighted manner: one trade-off is weighted more than the other, and each weight represents vertices of a convex cone pointed at the origin. This weight vector enables selection of an expert that reduces the learner’s regret only with respect to one trade-off if the weight vector is non-zero. In future work, we wish to derive generalization bounds for our learning algorithm. We also hope to analyse Blackwell’s approachability condition and Hannan consistency in the context of our modified learner’s strategy.
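The convex combination of steepest descent directions can be sketched for the two-trade-off case as follows. The abstract does not say how the weight vector is chosen; the minimum-norm choice below is borrowed from multiple-gradient descent methods and is an assumption, as are the function names. A caller may instead pass any weight `alpha` in [0, 1].

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weight(g1, g2):
    """Weight alpha in [0, 1] minimising ||alpha*g1 + (1-alpha)*g2||.
    This selection rule is an assumption, not taken from the abstract."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any weight gives the same result
    alpha = dot(g2, [-d for d in diff]) / denom
    return min(1.0, max(0.0, alpha))


def pareto_descent_direction(g1, g2, alpha=None):
    """Negative convex combination of the two potential gradients,
    i.e. a convex combination of the two steepest descent directions."""
    if alpha is None:
        alpha = min_norm_weight(g1, g2)
    return [-(alpha * a + (1.0 - alpha) * b) for a, b in zip(g1, g2)]


# Hypothetical gradients of the potential with respect to each trade-off.
d_common = pareto_descent_direction([1.0, 0.0], [0.0, 1.0])
d_opposed = pareto_descent_direction([1.0, 0.0], [-1.0, 0.0])
```

For the orthogonal gradients, `d_common` has a non-positive inner product with both gradients, so it is a first-order descent direction for both trade-offs. For the opposed gradients, the minimum-norm combination vanishes, which matches the description of local Pareto optimality: no direction improves both trade-offs at once.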
