University of Southampton Institutional Repository

Hypertext’s meta-history: Documenting in-conference citations, authors and keyword data, 1987-2021

Anderson, Mark and Millard, David (2022) Hypertext’s meta-history: Documenting in-conference citations, authors and keyword data, 1987-2021. In HT 2022: 33rd ACM Conference on Hypertext and Social Media - Co-located with ACM WebSci 2022 and ACM UMAP 2022. ACM New York. pp. 96-106. (doi:10.1145/3511095.3531271).

Record type: Conference or Workshop Item (Paper)

Abstract

Conferences such as ACM Hypertext have been running for many decades, and the metadata on their collected publications represent a valuable scholarly meta-history on areas such as the community’s health, diversity, and changing interests. But the metadata about these papers is not readily available for analysis, and the data collection and cleaning tasks appear substantial. In this paper we explore this challenge using the ACM Hypertext series as a case study. Taking the ACM Digital Library as a starting point, and using a combination of manual and automatic methods, we have constructed and released a 3-star Open Dataset representing over 1,000 publications by almost 2,500 authors. An initial analysis reveals a modestly sized but robust conference, with a changing pattern of in-citations that co-occurs with the arrival of social media, and a relatively consistent but imbalanced gender ratio of authors that shows some signs of recent improvement. The challenges encountered included identifying discrete author names, potential issues with text retrieval from PDF, and a disparate set of author keywords that reveals the absence of a common vocabulary. These insights are the result of a hard-fought process made complex by an incomplete digital record and a lack of consistency in naming. This Hypertext case study thus reveals a serious shortfall in the way that scholarly activity is captured and described, and questions PDF as the primary method of recording publications. Addressing these issues would make further analysis more straightforward and would allow larger events (with orders of magnitude more data) to be analysed in a similar way.

Text
ht22-meta-history-preprint - Accepted Manuscript
Restricted to Repository staff only

More information

e-pub ahead of print date: 28 June 2022
Additional Information: Publisher Copyright: © 2022 ACM.
Venue - Dates: ACM Conference on Hypertext and Social Media 2022: Hypertext '22, Barcelona, Spain, 2022-06-29 - 2022-07-01
Keywords: Tinderbox, analysis, citation networks, dataset, gender, hypertext, keywording, keywords, knowledge management, linkbases, links, metadata, visualisation

Identifiers

Local EPrints ID: 470579
URI: http://eprints.soton.ac.uk/id/eprint/470579
PURE UUID: 4609ddeb-eeb2-4c7a-80ba-c7feccd2d08e
ORCID for Mark Anderson: orcid.org/0000-0001-7396-0721
ORCID for David Millard: orcid.org/0000-0002-7512-2710

Catalogue record

Date deposited: 13 Oct 2022 16:38
Last modified: 31 Jan 2023 02:36

Contributors

Author: Mark Anderson
Author: David Millard
