Analyzing web archives through topic and event focused sub-collections
Pages: 291-295
Gossen, Gerhard, Demidova, Elena and Risse, Thomas (2016) Analyzing web archives through topic and event focused sub-collections. WebSci '16 ACM Web Science Conference, Hannover, Germany, 22 - 25 May 2016. (doi:10.1145/2908131.2908175).
Record type: Conference or Workshop Item (Paper)
Abstract
Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events. We discuss the opportunities and challenges of this approach and suggest a framework for creating sub-collections.
Text: archive-recrawling-framework-websci16.pdf (Accepted Manuscript)
More information
Accepted/In Press date: 23 March 2016
e-pub ahead of print date: 2016
Venue - Dates: WebSci '16 ACM Web Science Conference, Hannover, Germany, 2016-05-22 - 2016-05-25
Organisations: Electronics & Computer Science
Identifiers
Local EPrints ID: 392902
URI: http://eprints.soton.ac.uk/id/eprint/392902
PURE UUID: e1e2a418-ca5f-4960-a264-5d0643d67deb
Catalogue record
Date deposited: 21 Apr 2016 11:25
Last modified: 14 Mar 2024 23:52
Contributors
Author: Gerhard Gossen
Author: Elena Demidova
Author: Thomas Risse