Comparing onset detection and perceptual attack time
Accurate performance timing is associated with the perceptual attack time (PAT) of notes, rather than their physical or perceptual onset times (PhOT, POT). Since manual annotation of PAT is time-consuming for analysis and impractical for real-time applications, automatic transcription is desirable. However, computational methods for onset detection in audio signals are conventionally evaluated against PhOT or POT data. This paper describes a comparison between PAT and onset detection data to assess whether in some circumstances they are similar enough to be treated as equivalent, or whether additional models of the PAT-PhOT difference are always necessary. Eight published onset detection algorithms and one commercial system were tested with five onset types in short monophonic sequences. Ground truth was established from PAT transcriptions of the audio by multiple human listeners, using rhythm adjustment with synchronous presentation, and the parameters of each detection algorithm were manually adjusted to maximise agreement with the ground truth. Results indicate that for percussive attacks, a number of algorithms produce data close to or within the limits of human agreement and may therefore be substituted for PATs, while for non-percussive sounds corrective measures are necessary to match detector outputs to human estimates.
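The kind of comparison the abstract describes, detector output set against human PAT estimates, can be illustrated with a minimal sketch. This is not the paper's evaluation procedure, which the abstract does not specify: the function names `match_onsets` and `agreement`, the 50 ms tolerance, and the greedy nearest-neighbour pairing are all assumptions chosen for illustration.

```python
from statistics import mean


def match_onsets(detected, reference, tolerance=0.05):
    """Pair each reference time with the nearest unused detection.

    All times are in seconds. `tolerance` (hypothetical default: 50 ms)
    is the largest absolute difference accepted as a match.
    """
    pairs = []
    used = set()
    for ref in reference:
        best = None
        for i, det in enumerate(detected):
            if i in used or abs(det - ref) > tolerance:
                continue
            if best is None or abs(det - ref) < abs(detected[best] - ref):
                best = i
        if best is not None:
            used.add(best)
            pairs.append((ref, detected[best]))
    return pairs


def agreement(detected, reference, tolerance=0.05):
    """Summarise how closely detector output follows the reference PATs."""
    pairs = match_onsets(detected, reference, tolerance)
    deviations = [det - ref for ref, det in pairs]
    return {
        "matched": len(pairs),
        "missed": len(reference) - len(pairs),
        "extra": len(detected) - len(pairs),
        # Signed mean: positive means the detector reports later than listeners.
        "mean_deviation": mean(deviations) if deviations else None,
    }


if __name__ == "__main__":
    # Made-up example values, not data from the paper.
    human_pats = [0.0, 0.5, 1.0, 1.5]
    detector_output = [0.012, 0.498, 1.020, 1.700]
    print(agreement(detector_output, human_pats))
```

Under these assumptions, a mean deviation that is small relative to the spread among human annotators would correspond to the percussive case in the abstract, where detector output may stand in for PATs. A large, consistent deviation would correspond to the non-percussive case, where a corrective model is needed.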
ISBN: 9780615900650
Pages: 523-528
Publisher: International Society for Music Information Retrieval
Author: Polfreman, Richard
26424c3d-b750-4868-bf6e-2bbb3990df84
Editor: de Souza Britto Junior, Alceu
Date: November 2013
Polfreman, Richard (2013) Comparing onset detection and perceptual attack time. de Souza Britto Junior, Alceu, Gouyon, Fabien and Dixon, Simon (eds.) In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013). International Society for Music Information Retrieval.
Record type: Conference or Workshop Item (Paper)
Text
355943 Polfreman 35.pdf
- Version of Record
Restricted to Repository staff only
More information
Published date: November 2013
Organisations: Music
Identifiers
Local EPrints ID: 355943
URI: http://eprints.soton.ac.uk/id/eprint/355943
ISBN: 9780615900650
PURE UUID: 567afe33-e402-4cdb-acc4-20e63d51288f
Catalogue record
Date deposited: 13 Sep 2013 16:03
Last modified: 14 Mar 2024 14:40
Contributors
Editor: Alceu de Souza Britto Junior
Editor: Fabien Gouyon
Editor: Simon Dixon