A comparison of statistical methods for deriving occupancy estimates from machine learning outputs

The combination of autonomous recording units (ARUs) and machine learning enables scalable biodiversity monitoring. These data are often analysed using occupancy models, yet methods for integrating machine learning outputs with these models are rarely compared. Using the Yucatán black howler monkey as a case study, we evaluated four approaches for integrating ARU data and machine learning outputs into occupancy models: (i) standard occupancy models with verified data, and false-positive occupancy models using (ii) presence-absence data, (iii) counts of detections, and (iv) continuous classifier scores. We assessed estimator accuracy and the effects of decision threshold, temporal subsampling, and verification strategies. We found that classifier-guided listening with a standard occupancy model provided an accurate estimate with minimal verification effort. The false-positive models yielded similarly accurate estimates under specific conditions, but were sensitive to subjective choices, including the decision threshold. The inability to determine stable parameter choices a priori, coupled with the increased computational complexity of several models (i.e. the detection-count and continuous-score models), limits the practical application of false-positive models. In the case of a high-performance classifier and a readily detectable species, classifier-guided listening paired with a standard occupancy model provides a practical and efficient approach for accurately estimating occupancy.
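The abstract contrasts a standard occupancy model fitted to verified detections with false-positive occupancy models fitted to thresholded classifier outputs. The Python sketch below illustrates, in textbook form, how those two likelihoods differ and how a decision threshold turns clip-level classifier scores into the presence-absence detection histories such models consume. It is not the authors' code: the data layout (one list of clip scores per survey), the max-score thresholding rule, the function names and the grid-search fit are all hypothetical choices made for illustration, and the detection-count and continuous-score variants compared in the paper are not shown.

```python
import math
from itertools import product


def detection_history(survey_scores, threshold):
    """Hypothetical rule: a survey counts as a detection (1) if any clip
    score within it meets or exceeds the decision threshold."""
    return [1 if max(scores) >= threshold else 0 for scores in survey_scores]


def bernoulli_prob(history, p):
    """Probability of a 0/1 detection history given a constant
    per-survey detection probability p."""
    prob = 1.0
    for y in history:
        prob *= p if y else (1.0 - p)
    return prob


def loglik_standard(psi, p, histories):
    """Single-season occupancy log-likelihood with no false positives:
    an unoccupied site can only produce an all-zero history."""
    total = 0.0
    for h in histories:
        unoccupied = 1.0 if sum(h) == 0 else 0.0
        total += math.log(psi * bernoulli_prob(h, p) + (1.0 - psi) * unoccupied)
    return total


def loglik_false_positive(psi, p11, p10, histories):
    """False-positive extension: occupied sites yield detections with
    probability p11, unoccupied sites with probability p10. In practice
    p11 > p10 is imposed so the two mixture components are identifiable."""
    total = 0.0
    for h in histories:
        total += math.log(
            psi * bernoulli_prob(h, p11) + (1.0 - psi) * bernoulli_prob(h, p10)
        )
    return total


def fit_standard(histories, n=100):
    """Crude maximum-likelihood fit of (psi, p) by grid search over i/n,
    i = 1..n-1; a numerical optimiser would normally be used instead."""
    grid = [i / n for i in range(1, n)]
    return max(
        product(grid, grid),
        key=lambda pars: loglik_standard(pars[0], pars[1], histories),
    )


if __name__ == "__main__":
    # Hypothetical data: 8 sites, 2 surveys each, 3 sites with detections.
    histories = [[1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
    print(fit_standard(histories))  # → (0.5, 0.5)
    print(detection_history([[0.2, 0.91], [0.4, 0.3]], threshold=0.9))  # → [1, 0]
```

Raising or lowering `threshold` in `detection_history` changes which surveys count as detections, and therefore the data passed to either likelihood; this is one simple way to see why estimates from false-positive models can depend on that choice. Setting `p10 = 0` collapses `loglik_false_positive` to `loglik_standard`.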

Keywords: Acoustic monitoring, Autonomous recording units (ARUs), Biodiversity monitoring, False-positive models, Occupancy modelling, Yucatán black howler monkey

DOI: 10.1038/s41598-025-95207-3

ISSN: 2045-2322

Katsis, Lydia K. D.

a90d89d0-22f0-47fd-94a6-bb7f2d9614cf

Rhinehart, Tessa A.

2d84d050-251f-46b7-b852-afbb48cb0507

Dorgay, Elizabeth

a0dfd75f-5323-4b57-bf2c-7e79daf7077e

Sanchez, Emma E.

8a93d885-94f8-4c36-af8a-6f7e902494e1

Snaddon, Jake L.

31a601f7-c9b0-45e2-b59b-fda9a0c5a54b

Doncaster, C. Patrick

0eff2f42-fa0a-4e35-b6ac-475ad3482047

Kitzes, Justin

ef5b2b2a-4b3d-44ca-b93b-e998f734087a

27 April 2025

Katsis, Lydia K. D., Rhinehart, Tessa A., Dorgay, Elizabeth, Sanchez, Emma E., Snaddon, Jake L., Doncaster, C. Patrick and Kitzes, Justin (2025) A comparison of statistical methods for deriving occupancy estimates from machine learning outputs. Scientific Reports, 15 (1), [14700]. (doi:10.1038/s41598-025-95207-3).

Record type: Article

s41598-025-95207-3 - Version of Record

Available under License Creative Commons Attribution.

Accepted/In Press date: 19 March 2025

Published date: 27 April 2025

Additional Information: Publisher Copyright: © The Author(s) 2025.
