The University of Southampton
University of Southampton Institutional Repository

Trendminer: an architecture for real time analysis of social media text

Preotiuc-Pietro, Daniel
4f95f12b-dfc9-4bbf-aa27-182d79025c57
Samangooei, Sina
c380fb26-55d4-4b34-94e7-c92bbb26a40d
Cohn, Trevor
ce35424c-5505-499f-984c-3a92a51881a2
Gibbins, Nicholas
98efd447-4aa7-411c-86d1-955a612eceac
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f

Preotiuc-Pietro, Daniel, Samangooei, Sina, Cohn, Trevor, Gibbins, Nicholas and Niranjan, Mahesan (2012) Trendminer: an architecture for real time analysis of social media text. 6th International AAAI Conference on Weblogs and Social Media (ICWSM-12), Dublin, Ireland. 05 - 07 Jun 2012. 5 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

The emergence of online social networks (OSNs) and the accompanying availability of large amounts of data pose a number of new natural language processing (NLP) and computational challenges. Data from OSNs is different to data from traditional sources (e.g. newswire). The texts are short, noisy and conversational. Another important issue is that data arrives in real-time streams, needing immediate analysis that is grounded in time and context. In this paper we describe a new open-source framework for efficient text processing of streaming OSN data (available at www.trendminer-project.eu). Whilst researchers have made progress in adapting or creating text analysis tools for OSN data, a system to unify these tasks has yet to be built. Our system is focused on a real-world scenario where fast processing and accuracy are paramount. We use the MapReduce framework for distributed computing and present running times for our system in order to show that scaling to online scenarios is feasible. We describe the components of the system and evaluate their accuracy. Our system supports easy integration of future modules in order to extend its functionality.

Text
4739-22047-1-PB.pdf - Version of Record
Restricted to Repository staff only

More information

e-pub ahead of print date: June 2012
Published date: June 2012
Venue - Dates: 6th International AAAI Conference on Weblogs and Social Media (ICWSM-12), Dublin, Ireland, 2012-06-05 - 2012-06-07
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 340056
URI: http://eprints.soton.ac.uk/id/eprint/340056
PURE UUID: 260b480b-cf54-4e2f-9a6a-cc803c0afc19
ORCID for Nicholas Gibbins: orcid.org/0000-0002-6140-9956
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 08 Jun 2012 08:44
Last modified: 15 Mar 2024 03:29

Contributors

Author: Daniel Preotiuc-Pietro
Author: Sina Samangooei
Author: Trevor Cohn
Author: Nicholas Gibbins
Author: Mahesan Niranjan
