The University of Southampton
University of Southampton Institutional Repository

Fast projection onto the capped simplex with applications to sparse regression in bioinformatics

Ang, Man Shun, Ma, Jianzhu, Liu, Nianjun, Huang, Kun and Wang, Yijie (2021) Fast projection onto the capped simplex with applications to sparse regression in bioinformatics. Ranzato, Marc'Aurelio, Beygelzimer, Alina, Dauphin, Yann, Liang, Percy S. and Wortman Vaughan, Jenn (eds.) In Advances in Neural Information Processing Systems 34: 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Neural Information Processing Systems Foundation. pp. 9990-9999.

Record type: Conference or Workshop Item (Paper)

Abstract

We consider the problem of projecting a vector onto the so-called k-capped simplex, which is a hypercube cut by a hyperplane. For an n-dimensional input vector with bounded elements, we found that a simple algorithm based on Newton's method is able to solve the projection problem to high precision with a complexity of roughly O(n), a much lower computational cost than the existing sorting-based methods proposed in the literature. We provide a theory that partially explains and justifies the method. We demonstrate that the proposed algorithm can solve the projection problem to high precision on large-scale datasets and that it significantly outperforms state-of-the-art methods in runtime (about 6-8 times faster in CPU time than commercial software for input vectors with 1 million or more variables). We further illustrate the effectiveness of the proposed algorithm on sparse regression in a bioinformatics problem. Empirical results on the GWAS dataset (with 1,500,000 single-nucleotide polymorphisms) show that, when the proposed method is used to accelerate the Projected Quasi-Newton (PQN) method, the accelerated PQN algorithm can handle huge-scale regression problems and is more efficient (about 3-6 times faster) than current state-of-the-art methods.
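The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming the k-capped simplex is the set {x : 0 ≤ x_i ≤ 1, x_1 + ... + x_n = k}, i.e. the unit hypercube cut by the hyperplane sum(x) = k. Under that assumption the projection of y has the form x_i = clip(y_i - γ, 0, 1) for a single scalar γ, so the problem reduces to finding the root of a nonincreasing, piecewise-linear function of γ, which is where a Newton iteration applies. Each iteration is one O(n) pass with no sorting. The function name project_capped_simplex, the starting point, and the bisection safeguard are hypothetical illustrative choices, not the authors' published implementation.

```python
from typing import List, Sequence


def project_capped_simplex(
    y: Sequence[float], k: float, tol: float = 1e-10, max_iter: int = 200
) -> List[float]:
    """Euclidean projection of y onto {x : 0 <= x_i <= 1, sum(x) = k}.

    Illustrative sketch (not the paper's code). The solution has the form
    x_i = clip(y_i - gamma, 0, 1), where gamma is a root of
    phi(gamma) = sum_i clip(y_i - gamma, 0, 1) - k, a nonincreasing,
    piecewise-linear function. Newton steps are used, with a bisection
    fallback whenever a step is undefined or leaves the current bracket.
    """
    n = len(y)
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= len(y)")
    if n == 0:
        return []
    # phi(lo) = n - k >= 0 and phi(hi) = -k <= 0, so [lo, hi] brackets a root.
    lo, hi = min(y) - 1.0, max(y)
    # Start where phi would vanish if no coordinate were clipped.
    gamma = (sum(y) - k) / n
    for _ in range(max_iter):
        # One O(n) pass: value of phi and count m of unclipped coordinates,
        # so that phi'(gamma) = -m away from breakpoints.
        phi, m = -k, 0
        for yi in y:
            t = yi - gamma
            if t >= 1.0:
                phi += 1.0
            elif t > 0.0:
                phi += t
                m += 1
        if abs(phi) <= tol:
            break
        # phi is nonincreasing, so its sign tells us which side of the root we are on.
        if phi > 0:
            lo = gamma
        else:
            hi = gamma
        # Newton step: gamma - phi / phi'(gamma) = gamma + phi / m.
        if m > 0 and lo < gamma + phi / m < hi:
            gamma = gamma + phi / m
        else:
            gamma = 0.5 * (lo + hi)  # bisection safeguard
    return [min(max(yi - gamma, 0.0), 1.0) for yi in y]


x = project_capped_simplex([0.5, 2.0, -1.0, 0.3], k=1)
print([round(v, 6) for v in x])  # [0.0, 1.0, 0.0, 0.0]
```

Because phi is piecewise linear, a Newton step lands exactly on the root once the set of unclipped coordinates stops changing; the bracket and bisection fallback are there only to guarantee progress when a step overshoots.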

This record has no associated files available for download.

More information

Published date: 2021
Additional Information: Publisher Copyright: © 2021 Neural information processing systems foundation. All rights reserved.
Venue - Dates: 35th Conference on Neural Information Processing Systems, NeurIPS 2021, Virtual, Online, 2021-12-06 - 2021-12-14

Identifiers

Local EPrints ID: 495224
URI: http://eprints.soton.ac.uk/id/eprint/495224
PURE UUID: 9da9b45e-10fa-4e0f-a003-756f2545fa54
ORCID for Man Shun Ang: orcid.org/0000-0002-8330-758X

Catalogue record

Date deposited: 01 Nov 2024 18:15
Last modified: 02 Nov 2024 03:08

Contributors

Author: Man Shun Ang
Author: Jianzhu Ma
Author: Nianjun Liu
Author: Kun Huang
Author: Yijie Wang
Editor: Marc'Aurelio Ranzato
Editor: Alina Beygelzimer
Editor: Yann Dauphin
Editor: Percy S. Liang
Editor: Jenn Wortman Vaughan
