Labeling post‐storm coastal imagery for machine learning: measurement of inter‐rater agreement
Goldstein, Evan B., Buscombe, Daniel, Lazarus, Eli, Mohanty, Somya D., Rafique, Shah Nafis, Anarde, Katherine A., Ashton, Andrew D., Beuzen, Thomas, Castagno, Katherine A., Cohn, Nicholas, Conlin, Matthew P., Ellenson, Ashley, Gillen, Megan, Hovenga, Paige A., Over, Jin-Si R., Palermo, Rose V., Ratliff, Katherine M., Reeves, Ian R. B., Sanborn, Lily H., Straub, Jessamin A., Taylor, Luke Alexander, Wallace, Elizabeth J., Warrick, Jonathan, Wernette, Phillipe and Williams, Hannah
(2021)
Labeling post‐storm coastal imagery for machine learning: measurement of inter‐rater agreement.
Earth and Space Science, 8 (9), [e2021EA001896].
(doi:10.1029/2021EA001896).
Abstract
Classifying images using supervised machine learning (ML) relies on labeled training data—classes or text descriptions, for example, associated with each image. Data-driven models are only as good as the data used for training, and this points to the importance of high-quality labeled data for developing an ML model that has predictive skill. Labeling data is typically a time-consuming, manual process. Here, we investigate the process of labeling data, with a specific focus on coastal aerial imagery captured in the wake of hurricanes that affected the Atlantic and Gulf Coasts of the United States. The imagery data set is a rich observational record of storm impacts and coastal change, but the imagery requires labeling to render that information accessible. We created an online interface that served labelers a stream of images and a fixed set of questions. A total of 1,600 images were labeled by at least two or as many as seven coastal scientists. We used the resulting data set to investigate interrater agreement: the extent to which labelers labeled each image similarly. Interrater agreement scores, assessed with percent agreement and Krippendorff's alpha, are higher when the questions posed to labelers are relatively simple, when the labelers are provided with a user manual, and when images are smaller. Experiments in interrater agreement point toward the benefit of multiple labelers for understanding the uncertainty in labeling data for machine learning research.
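The two agreement statistics named in the abstract can be illustrated with a short, self-contained sketch. This is not the authors' code, and the label values, function names, and toy data below are hypothetical: percent agreement counts the images on which all raters concur, while Krippendorff's alpha (nominal form, built here from the standard coincidence-matrix formulation) discounts agreement expected by chance.

```python
from collections import Counter
from itertools import permutations

def percent_agreement(units):
    """Fraction of units (images) on which every rater chose the same label."""
    return sum(len(set(ratings)) == 1 for ratings in units) / len(units)

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels with no missing ratings.

    `units` is a list of per-image rating lists (one label per rater).
    Uses the coincidence-matrix formulation: alpha = 1 - D_o / D_e.
    """
    o = Counter()  # coincidence matrix over ordered label pairs within a unit
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a singly rated unit contributes no pairable values
        for c, k in permutations(ratings, 2):
            o[(c, k)] += 1 / (m - 1)
    marginals = Counter()
    for (c, _k), weight in o.items():
        marginals[c] += weight
    n = sum(marginals.values())
    d_o = sum(w for (c, k), w in o.items() if c != k)   # observed disagreement
    d_e = sum(marginals[c] * marginals[k]               # chance disagreement
              for c in marginals for k in marginals if c != k) / (n - 1)
    return 1.0 - d_o / d_e

# Hypothetical labels: 4 images, each rated by 3 coastal scientists
labels = [
    ["washover", "washover", "washover"],
    ["no-impact", "washover", "no-impact"],
    ["no-impact", "no-impact", "no-impact"],
    ["washover", "washover", "no-impact"],
]
print(percent_agreement(labels))                     # 0.5
print(round(krippendorff_alpha_nominal(labels), 3))  # 0.389
```

Note how the two measures diverge: half the images were labeled unanimously, yet alpha is well below 0.5 because with only two balanced labels a substantial share of the observed agreement is expected by chance.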
Text: 2021EA001896 (Version of Record)
More information
Accepted/In Press date: 26 August 2021
Published date: 3 September 2021
Additional Information:
Funding Information:
We thank the editor, two reviewers, and Chris Sherwood for feedback on this work. The authors gratefully acknowledge support from the U.S. Geological Survey (G20AC00403 to EBG and SDM), NSF (1953412 to EBG and SDM; 1939954 to EBG), Microsoft AI for Earth (to EBG and SDM), The Leverhulme Trust (RPG‐2018‐282 to EDL and EBG), and an Early Career Research Fellowship from the Gulf Research Program of the National Academies of Sciences, Engineering, and Medicine (to EBG). U.S. Geological Survey researchers (DB, J‐SRO, JW, and PW) were supported by the U.S. Geological Survey Coastal and Marine Hazards and Resources Program as part of the response and recovery efforts under congressional appropriations through the Additional Supplemental Appropriations for Disaster Relief Act, 2019 (Public Law 116‐20; 133 Stat. 871).
Publisher Copyright:
© 2021 The Authors. Earth and Space Science published by Wiley Periodicals LLC on behalf of American Geophysical Union.
Keywords:
classification, data annotation, data labeling, hurricane impacts, imagery, machine learning
Identifiers
Local EPrints ID: 452071
URI: http://eprints.soton.ac.uk/id/eprint/452071
ISSN: 2333-5084
PURE UUID: 38d7ea7a-f6a1-4f12-a668-69cccb478b4a
Catalogue record
Date deposited: 10 Nov 2021 17:37
Last modified: 06 Jun 2024 01:58
Contributors
Author:
Evan B. Goldstein
Author:
Daniel Buscombe
Author:
Eli Lazarus
Author:
Somya D. Mohanty
Author:
Shah Nafis Rafique
Author:
Katherine A. Anarde
Author:
Andrew D. Ashton
Author:
Thomas Beuzen
Author:
Katherine A. Castagno
Author:
Nicholas Cohn
Author:
Matthew P. Conlin
Author:
Ashley Ellenson
Author:
Megan Gillen
Author:
Paige A. Hovenga
Author:
Jin-Si R. Over
Author:
Rose V. Palermo
Author:
Katherine M. Ratliff
Author:
Ian R. B. Reeves
Author:
Lily H. Sanborn
Author:
Jessamin A. Straub
Author:
Luke Alexander Taylor
Author:
Elizabeth J. Wallace
Author:
Jonathan Warrick
Author:
Phillipe Wernette
Author:
Hannah Williams