The University of Southampton
University of Southampton Institutional Repository

Livedoc: showing contextual information using topic modeling techniques

We present LiveDoc, a solution that augments natural-language text documents with relevant contextual background information. This information, fetched from other sources such as Wikipedia, helps readers better understand the context of the discourse. Readers often lack the background and supplementary information needed to comprehend the purport of a narrative such as a news op-ed article. At the same time, authors cannot provide all of the contextual information while addressing a particular topic. LiveDoc processes the information in a document; uses the extracted entities to fetch background information relevant to the document's context from various user-defined sources, using semantic matching and topic modeling techniques such as Latent Dirichlet Allocation and Hierarchical Dirichlet Process; and presents that information to the user by augmenting the original document with it. The reader is then better equipped to understand the document. We demonstrate the effectiveness of our solution through extensive experimentation and the associated results.
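
The abstract names topic modeling as the way background passages are matched to a document, but this record does not describe the implementation. The following is only a minimal sketch of one plausible reading: fit a small Latent Dirichlet Allocation model (collapsed Gibbs sampling) over the document and the candidate passages together, then rank the candidates by how close their topic mixtures are to the document's. Every name here (`tokenize`, `lda_gibbs`, `hellinger`, `rank_background`), the hyperparameters, and the choice of Hellinger distance are assumptions made for illustration, not details taken from the paper.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; a stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic mixture per document."""
    rng = random.Random(seed)
    vocab_size = len({w for words in docs for w in words})
    doc_topic = [[0] * num_topics for _ in docs]          # counts per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # counts per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, words in enumerate(docs):
        zs = []
        for w in words:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    return [
        [(doc_topic[d][t] + alpha) / (len(words) + num_topics * alpha)
         for t in range(num_topics)]
        for d, words in enumerate(docs)
    ]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def rank_background(document, candidates, num_topics=2, seed=0):
    """Order candidate passages from closest to farthest topic mixture."""
    docs = [tokenize(document)] + [tokenize(c) for c in candidates]
    thetas = lda_gibbs(docs, num_topics=num_topics, seed=seed)
    scored = [(hellinger(thetas[0], t), c) for t, c in zip(thetas[1:], candidates)]
    return [c for _, c in sorted(scored, key=lambda s: s[0])]


if __name__ == "__main__":
    article = "The central bank raised interest rates to curb inflation."
    passages = [
        "Interest rates are set by the central bank to manage inflation.",
        "The football team won the cup final after extra time.",
    ]
    for passage in rank_background(article, passages):
        print(passage)
```

On a corpus this small the sampled topics are noisy, so the ordering is not guaranteed to be meaningful. A realistic pipeline would train on a much larger collection, and could swap in a Hierarchical Dirichlet Process model so the number of topics does not have to be fixed in advance.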

Deshmukh, Jayati, Annervaz, K. M., Sengupta, Shubhashis and Pathak, Neetu (2016) Livedoc: showing contextual information using topic modeling techniques. In Perner, Petra (ed.) Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. vol. 9729, Springer. pp. 468-482. (doi:10.1007/978-3-319-41920-6_37).

Record type: Conference or Workshop Item (Paper)

This record has no associated files available for download.

More information

Published date: 2016
Additional Information: Publisher Copyright: © Springer International Publishing Switzerland 2016.
Venue - Dates: 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016, New York, United States, 2016-07-16 - 2016-07-21
Keywords: Data contextualization, Hierarchical Dirichlet Process, Information retrieval, Latent Dirichlet Allocation, Natural language processing, Topic modeling

Identifiers

Local EPrints ID: 493373
URI: http://eprints.soton.ac.uk/id/eprint/493373
ISSN: 0302-9743
PURE UUID: 4514879d-9245-4516-94c7-002195c88c33
ORCID for Jayati Deshmukh: orcid.org/0000-0002-1144-2635

Catalogue record

Date deposited: 30 Aug 2024 17:09
Last modified: 31 Aug 2024 02:12

Contributors

Author: Jayati Deshmukh
Author: K. M. Annervaz
Author: Shubhashis Sengupta
Author: Neetu Pathak
Editor: Petra Perner
