Measuring the completeness of scholarly communications databases
As scholarly communication has been digitised and moved online, large streams of data are being generated by millions of publications, citations and viewership statistics. This data, gathered by a few specialised services, plays an important role in helping individual researchers conduct literature reviews, helping science policymakers analyse the impact of research, and helping science as a whole progress effectively.
This research aims to summarise the requirements regarding the scope, quality, transparency and accessibility of scholarly communication databases, and to create a uniform methodology for analysing these datasets based on those requirements. The methodology is then used to analyse Google Scholar, Microsoft Academic and Scopus, and the results are compared with other studies of these datasets. The high similarity between the results obtained with the designed methodology and those of established publications shows that the methodology may be a promising method for partially automated, cross-disciplinary analysis of scholarly databases. Finally, a method for conducting an automated overlap analysis of datasets is presented as a methodological contribution, alongside the relevant precision and recall statistics.
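The abstract does not say how records are matched across databases. As an illustration only, the following Python sketch assumes that two databases' records are matched by exact equality of normalised titles. It also assumes that a hand-verified set of matching pairs is available as ground truth for computing precision and recall. The function names (normalise_title, match_records, overlap_share, precision_recall) and the sample records are hypothetical and are not taken from the thesis.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Hypothetical matching key: strip accents, lowercase, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_records(db_a: dict[str, str], db_b: dict[str, str]) -> set[tuple[str, str]]:
    """Pair record IDs from two databases whose normalised titles are identical."""
    index_b: dict[str, list[str]] = {}
    for rec_id, title in db_b.items():
        index_b.setdefault(normalise_title(title), []).append(rec_id)
    pairs = set()
    for rec_id, title in db_a.items():
        for other_id in index_b.get(normalise_title(title), []):
            pairs.add((rec_id, other_id))
    return pairs


def overlap_share(db_a: dict[str, str], pairs: set[tuple[str, str]]) -> float:
    """Fraction of records in db_a that have at least one match in the other database."""
    matched = {a for a, _ in pairs}
    return len(matched) / len(db_a) if db_a else 0.0


def precision_recall(predicted: set, truth: set) -> tuple[float, float]:
    """Precision and recall of automatic matches against hand-verified pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical sample records (ID -> title) from two databases.
gs = {"g1": "Measuring Completeness of Databases",
      "g2": "A Study of Citations",
      "g3": "Open Access, Revisited"}
scopus = {"s1": "Measuring completeness of databases.",
          "s2": "Open access revisited",
          "s3": "A study on citations"}

predicted = match_records(gs, scopus)
manual = {("g1", "s1"), ("g3", "s2"), ("g2", "s3")}  # hypothetical hand-verified pairs
precision, recall = precision_recall(predicted, manual)
print(f"overlap={overlap_share(gs, predicted):.2f} precision={precision:.2f} recall={recall:.2f}")
```

Exact title matching keeps precision high but misses reworded titles such as the g2/s3 pair above. That is why recall has to be measured against manually checked matches rather than assumed.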
University of Southampton
Paszcza, Bartosz
July 2021
Carr, Leslie
Harnad, Stevan
Frey, Jeremy
Paszcza, Bartosz (2021) Measuring the completeness of scholarly communications databases. University of Southampton, Doctoral Thesis, 86pp.
Record type: Thesis (Doctoral)
More information
Published date: July 2021
Identifiers
Local EPrints ID: 474095
URI: http://eprints.soton.ac.uk/id/eprint/474095
PURE UUID: 742bb97a-c084-4085-9228-eef9906668bf
Catalogue record
Date deposited: 13 Feb 2023 17:58
Last modified: 17 Mar 2024 02:41
Contributors
Author:
Bartosz Paszcza
Thesis advisor:
Stevan Harnad