University of Southampton Institutional Repository

Emergence of consensus and shared vocabularies in collaborative tagging systems

Robu, Valentin, Halpin, Harry and Shepherd, Hana (2009) Emergence of consensus and shared vocabularies in collaborative tagging systems. ACM Transactions on the Web, 3 (4), 1-34.

Record type: Article

Abstract

This article uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users. First, we study the formation of stable distributions in tagging systems, seen as an implicit form of “consensus” reached by the users of the system around the tags that best describe a resource. We show that final tag frequencies for most resources converge to power-law distributions, and we propose an empirical method to examine the dynamics of the convergence process, based on the Kullback-Leibler divergence measure. The convergence analysis is performed both for the most frequently used tags at the top of tag distributions and for the so-called long tail. Second, we study the information structures that emerge from collaborative tagging, namely tag correlation (or folksonomy) graphs. We show how community-based network techniques can be used to extract simple tag vocabularies from the tag correlation graphs by partitioning them into subsets of related tags. Furthermore, we show, for a specialized domain, that shared vocabularies produced by collaborative tagging are richer than the vocabularies that can be extracted from large-scale query logs provided by a major search engine. Although the empirical analysis presented in this article is based on a set of tagging data obtained from del.icio.us, the methods developed are general, and the conclusions should apply to other websites that employ tagging.
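The abstract names the Kullback-Leibler (KL) divergence as the basis of the convergence method but gives no further details. The following is a minimal Python sketch of the general idea under assumptions that are not taken from the article: each resource's tags are recorded as cumulative count snapshots over time, successive snapshots are compared with D_KL(P_t || P_{t-1}) = Σ_i P_t(i) log2(P_t(i) / P_{t-1}(i)), only the top_k tags of the final snapshot are considered, and additive smoothing (alpha) keeps every probability non-zero. The function names and the top_k and alpha parameters are hypothetical; this is not the authors' implementation. A curve that drops towards zero suggests that the relative tag proportions for a resource have stabilized.

```python
import math
from collections import Counter


def smoothed_distribution(counts: Counter, vocab: list[str], alpha: float = 1.0) -> list[float]:
    """Relative frequencies of the tags in `vocab`.

    Additive smoothing with `alpha` (an assumption of this sketch, not of the
    article) keeps every probability above zero so the KL divergence stays finite.
    """
    total = sum(counts[tag] for tag in vocab) + alpha * len(vocab)
    return [(counts[tag] + alpha) / total for tag in vocab]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D_KL(P || Q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def convergence_curve(snapshots: list[Counter], top_k: int = 25, alpha: float = 1.0) -> list[float]:
    """KL divergence between each cumulative snapshot and the one before it.

    Hypothetical setup: `snapshots` are cumulative tag counts for one resource,
    oldest first, and only the `top_k` tags of the final snapshot are compared.
    """
    vocab = [tag for tag, _ in snapshots[-1].most_common(top_k)]
    dists = [smoothed_distribution(s, vocab, alpha) for s in snapshots]
    return [kl_divergence(dists[i], dists[i - 1]) for i in range(1, len(dists))]


if __name__ == "__main__":
    # Made-up cumulative tag counts for a single bookmarked resource at three times.
    s1 = Counter({"python": 3, "programming": 1})
    s2 = s1 + Counter({"python": 5, "programming": 4, "tutorial": 2})
    s3 = s2 + Counter({"python": 6, "programming": 5, "tutorial": 3})
    print(convergence_curve([s1, s2, s3], top_k=3))
```

With these toy counts the two returned values are non-negative and the second is smaller than the first, which is the pattern a converging resource would be expected to show. Comparing each snapshot against the final distribution instead of the previous one is an equally plausible design; the abstract does not say which reference point the article uses.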

Text: ACMTransactionsPreprint.pdf (Version of Record, 565 kB)

More information

Published date: September 2009
Keywords: Collaborative tagging, community identification algorithms, complex systems, del.icio.us, emergent semantics, graphical models, knowledge extraction, power laws, search engines
Organisations: Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 268192
URI: http://eprints.soton.ac.uk/id/eprint/268192
PURE UUID: fe3269c9-f30f-4208-98af-38e114d6407d

Catalogue record

Date deposited: 11 Nov 2009 17:32
Last modified: 14 Mar 2024 09:05

Contributors

Author: Valentin Robu
Author: Harry Halpin
Author: Hana Shepherd
