The University of Southampton
University of Southampton Institutional Repository

Can Automatic Abstracting Improve on Current Extracting Techniques in Aiding Users to Judge the Relevance of Pages in Search Engine Results?

Can Automatic Abstracting Improve on Current Extracting Techniques in Aiding Users to Judge the Relevance of Pages in Search Engine Results?
Can Automatic Abstracting Improve on Current Extracting Techniques in Aiding Users to Judge the Relevance of Pages in Search Engine Results?
Current search engines use sentence extraction techniques to produce snippet result summaries, which users may find less than ideal for determining the relevance of pages. Unlike extracting, abstracting programs analyse the context of documents and rewrite them into informative summaries. Our project aims to produce abstracting summaries which are coherent and easy to read thereby lessening users’ time in judging the relevance of pages. However, automatic abstracting technique has its domain restriction. For solving this problem we propose to employ text classification techniques. We propose a new approach to initially classify whole web documents into sixteen top level ODP categories by using machine learning and a Bayesian classifier. We then manually create sixteen templates for each category. The summarisation techniques we use include a natural language processing techniques to weight words and analyse lexical chains to identify salient phrases and place them into relevant template slots to produce summaries.
Liang, SF
22ac6455-24fb-40d7-b9b6-f8ae62f085fd
Liang, SF
22ac6455-24fb-40d7-b9b6-f8ae62f085fd

(2004) Can Automatic Abstracting Improve on Current Extracting Techniques in Aiding Users to Judge the Relevance of Pages in Search Engine Results? The 7th Computational Linguistics UK.

Record type: Conference or Workshop Item (Poster)

Abstract

Current search engines use sentence extraction techniques to produce snippet result summaries, which users may find less than ideal for determining the relevance of pages. Unlike extracting, abstracting programs analyse the context of documents and rewrite them into informative summaries. Our project aims to produce abstracting summaries which are coherent and easy to read thereby lessening users’ time in judging the relevance of pages. However, automatic abstracting technique has its domain restriction. For solving this problem we propose to employ text classification techniques. We propose a new approach to initially classify whole web documents into sixteen top level ODP categories by using machine learning and a Bayesian classifier. We then manually create sixteen templates for each category. The summarisation techniques we use include a natural language processing techniques to weight words and analyse lexical chains to identify salient phrases and place them into relevant template slots to produce summaries.

PDF
CLUK_2004_proceeding.pdf - Other
Download (37kB)

More information

Published date: 2004
Venue - Dates: The 7th Computational Linguistics UK, 2004-01-01
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 265175
URI: http://eprints.soton.ac.uk/id/eprint/265175
PURE UUID: e5c15fcb-e59b-4178-8945-68f06738cfda

Catalogue record

Date deposited: 14 Feb 2008 12:33
Last modified: 18 Jul 2017 07:28

Export record

Contributors

Author: SF Liang

University divisions

Download statistics

Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.

View more statistics

Atom RSS 1.0 RSS 2.0

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.

We use cookies to ensure that we give you the best experience on our website. If you continue without changing your settings, we will assume that you are happy to receive cookies on the University of Southampton website.

×