The University of Southampton
University of Southampton Institutional Repository

Data quality concepts and techniques applied to taxonomic databases

The thesis investigates the application of concepts and techniques of data quality in taxonomic databases to enhance the quality of information services and systems in taxonomy. Taxonomic data are arranged and introduced in Taxonomic Data Domains in order to establish a standard and a working framework to support the proposed Taxonomic Data Quality Dimensions, as a specialised application of conventional Data Quality Dimensions in the Taxonomic Data Quality Domains.

The thesis discusses how to improve data quality in taxonomic databases, considering conventional Data Cleansing techniques and applying generic data content error patterns to taxonomic data. Techniques of taxonomic error detection are explored, with special attention to spelling errors in scientific names.

The spelling error problem is examined through spelling error detection techniques and algorithms, which are described and analysed. To evaluate the applicability and efficiency of different spelling error detection algorithms, a suite of experimental spelling error detection tools was developed and a set of experiments was performed on a sample of five different taxonomic databases. The results of the experiments are analysed from both the algorithm and the database points of view.

Database quality assessment procedures and metrics are discussed in the context of taxonomic databases and the previously introduced concepts of Taxonomic Data Domains and Taxonomic Data Quality Dimensions.

Four questions related to Taxonomic Database Quality are discussed, followed by conclusions and recommendations involving information system design and implementation and the processes involved in taxonomic data management and information flow.

Dalcin, Eduardo Couto

Dalcin, Eduardo Couto (2005) Data quality concepts and techniques applied to taxonomic databases. University of Southampton, Doctoral Thesis.

Record type: Thesis (Doctoral)

Text
1014184.pdf - Version of Record
Available under License University of Southampton Thesis Licence.
Download (30MB)

More information

Published date: 2005

Identifiers

Local EPrints ID: 465930
URI: http://eprints.soton.ac.uk/id/eprint/465930
PURE UUID: 2de5668b-b17b-4c38-a486-a4d9df5d1775

Catalogue record

Date deposited: 05 Jul 2022 03:41
Last modified: 16 Mar 2024 20:26

Contributors

Author: Eduardo Couto Dalcin
