Hypertext’s meta-history: documenting in-conference citations, authors and keyword data, 1987-2021
Conferences such as ACM Hypertext have been running for many decades and the metadata on their collected publications represents a valuable scholarly meta-history on areas such as the community’s health, diversity, and changing interests. But the metadata about these papers is not readily available for analysis, and the data collection and cleaning tasks appear substantial. In this paper we attempt to explore this challenge using the ACM Hypertext series as a case study. Taking the ACM Digital Library as a starting point, and using a combination of manual and automatic methods, we have constructed and released a 3-star Open Dataset representing over 1,000 publications by almost 2,500 authors. An initial analysis reveals a modestly sized but robust conference, with a changing pattern of in-citations that co-occurs with the arrival of social media, and a relatively consistent but imbalanced gender ratio of authors that shows some signs of recent improvements. The challenges encountered included identifying discrete author names, potential issues with text retrieval from PDF, and a disparate set of author keywords that reveals an absence of a common vocabulary. These insights are the results of a hard-fought process that is made complex by an incomplete digital record and a lack of consistency in naming. This Hypertext case study thus reveals a serious shortfall in the way that scholarly activity is captured and described, and questions PDF as the primary method of recording publications. Addressing these issues would make further analysis more straightforward and would allow larger events (with orders of magnitude more data) to be analysed in a similar way.
Tinderbox, analysis, citation networks, dataset, gender, hypertext, keywording, keywords, knowledge management, linkbases, links, metadata, visualisation
96-106
Association for Computing Machinery
Anderson, Mark
2413b755-3f9e-4f6c-a729-2e1b2862da16
Millard, David
4f19bca5-80dc-4533-a101-89a5a0e3b372
Anderson, Mark and Millard, David (2022) Hypertext’s meta-history: documenting in-conference citations, authors and keyword data, 1987-2021. In HT 2022: 33rd ACM Conference on Hypertext and Social Media - Co-located with ACM WebSci 2022 and ACM UMAP 2022. Association for Computing Machinery. (doi:10.1145/3511095.3531271).
Record type: Conference or Workshop Item (Paper)
Text: HT2022-author-ver - Author's Original
Text: ht22-meta-history-preprint - Accepted Manuscript (Restricted to Repository staff only)
More information
e-pub ahead of print date: 28 June 2022
Additional Information: Publisher Copyright: © 2022 ACM.
Venue - Dates: ACM Conference on Hypertext and Social Media 2022: Hypertext '22, Barcelona, Spain, 2022-06-29 - 2022-07-01
Identifiers
Local EPrints ID: 470579
URI: http://eprints.soton.ac.uk/id/eprint/470579
PURE UUID: 4609ddeb-eeb2-4c7a-80ba-c7feccd2d08e
Catalogue record
Date deposited: 13 Oct 2022 16:38
Last modified: 30 Nov 2024 02:37
Contributors
Author: Mark Anderson
Author: David Millard