The University of Southampton
University of Southampton Institutional Repository

Analyzing web archives through topic and event focused sub-collections


Gossen, Gerhard, Demidova, Elena and Risse, Thomas (2016) Analyzing web archives through topic and event focused sub-collections. WebSci '16 Proceedings of the 8th ACM Conference on Web Science, Hannover, Germany. 22 - 25 May 2016. pp. 291-295. (doi:10.1145/2908131.2908175).

Record type: Conference or Workshop Item (Paper)

Abstract

Web archives capture the history of the Web and are therefore an important source for studying how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events. We discuss the opportunities and challenges of this approach and suggest a framework for creating sub-collections.

Text
archive-recrawling-framework-websci16.pdf - Accepted Manuscript

More information

Accepted/In Press date: 23 March 2016
e-pub ahead of print date: 2016
Venue - Dates: WebSci '16 Proceedings of the 8th ACM Conference on Web Science, Hannover, Germany, 2016-05-22 - 2016-05-25
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 392902
URI: http://eprints.soton.ac.uk/id/eprint/392902
PURE UUID: e1e2a418-ca5f-4960-a264-5d0643d67deb

Catalogue record

Date deposited: 21 Apr 2016 11:25
Last modified: 16 Dec 2019 19:59

Contributors

Author: Gerhard Gossen
Author: Elena Demidova
Author: Thomas Risse
