Can Automatic Abstracting Improve on Current Extracting Techniques in Aiding Users to Judge the Relevance of Pages in Search Engine Results?
Current search engines use sentence extraction techniques to produce snippet result summaries, which users may find less than ideal for determining the relevance of pages. Unlike extracting programs, abstracting programs analyse the context of documents and rewrite them into informative summaries. Our project aims to produce abstracting summaries that are coherent and easy to read, thereby reducing the time users spend judging the relevance of pages. However, automatic abstracting techniques are domain-restricted. To address this problem, we propose to employ text classification techniques: we first classify whole web documents into sixteen top-level ODP categories using machine learning with a Bayesian classifier. We then manually create sixteen templates, one for each category. The summarisation techniques we use include natural language processing techniques that weight words and analyse lexical chains to identify salient phrases, which are then placed into the relevant template slots to produce summaries.
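The pipeline described in the abstract (classify a page, then fill a category template with salient terms) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the poster: the toy training texts, the two template strings, the `NaiveBayes`, `top_term`, and `summarise` names, and the use of simple term frequency as a stand-in for the word weighting and lexical chain analysis the abstract mentions. Only two of the sixteen ODP categories are shown.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list and templates; the real templates are hand-written per category.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}
TEMPLATES = {
    "Sports": "A Sports page, mainly about '{topic}'.",
    "Computers": "A Computers page, mainly about '{topic}'.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def top_term(text):
    """Return the most frequent non-stopword token (a crude salience weight)."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(1)[0][0]


def summarise(text, clf):
    """Classify the page, then place its top term into that category's template slot."""
    category = clf.predict(text)
    return TEMPLATES[category].format(topic=top_term(text))


# Toy training data, invented for this sketch.
train_docs = [
    "football match goal team league",
    "tennis player match tournament",
    "software compiler code programming",
    "linux kernel software code",
]
train_labels = ["Sports", "Sports", "Computers", "Computers"]
clf = NaiveBayes().fit(train_docs, train_labels)

print(summarise("Football news: the football team won the match", clf))
```

A real system along the lines of the abstract would train on labelled ODP pages across all sixteen categories and would fill several slots per template from lexical chains rather than one slot from a frequency count.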
Liang, SF
22ac6455-24fb-40d7-b9b6-f8ae62f085fd
2004
Liang, SF (2004) Can Automatic Abstracting Improve on Current Extracting Techniques in Aiding Users to Judge the Relevance of Pages in Search Engine Results? The 7th Computational Linguistics UK.
Record type: Conference or Workshop Item (Poster)
Text
CLUK_2004_proceeding.pdf
- Other
More information
Published date: 2004
Venue - Dates:
The 7th Computational Linguistics UK, 2004-01-01
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 265175
URI: http://eprints.soton.ac.uk/id/eprint/265175
PURE UUID: e5c15fcb-e59b-4178-8945-68f06738cfda
Catalogue record
Date deposited: 14 Feb 2008 12:33
Last modified: 14 Mar 2024 08:04
Contributors
Author:
SF Liang