The University of Southampton
University of Southampton Institutional Repository

Automatic Ontology-Based Knowledge Extraction from Web Documents

Alani, Harith, Kim, Sanghee, Millard, David E., Weal, Mark J., Hall, Wendy, Lewis, Paul H. and Shadbolt, Nigel R. (2003) Automatic Ontology-Based Knowledge Extraction from Web Documents. IEEE Intelligent Systems, 18 (1), 14-21.

Record type: Article

Abstract

To bring the Semantic Web to life and provide advanced knowledge services, we need efficient ways to access and extract knowledge from Web documents. Although Web page annotations could facilitate such knowledge gathering, annotations are rare and will probably never be rich or detailed enough to cover all the knowledge these documents contain. Manual annotation is impractical and unscalable, and automatic annotation tools remain largely undeveloped. Specialized knowledge services therefore require tools that can search and extract specific knowledge directly from unstructured text on the Web, guided by an ontology that details what type of knowledge to harvest. An ontology uses concepts and relations to classify domain knowledge. Other researchers have used ontologies to support knowledge extraction [1, 2], but few have explored their full potential in this domain. The Artequakt project links a knowledge-extraction tool with an ontology to achieve continuous knowledge support and guide information extraction. The extraction tool searches online documents and extracts knowledge that matches the given classification structure. It provides this knowledge in a machine-readable format that will be automatically maintained in a knowledge base (KB). Users could further enhance knowledge extraction using a lexicon-based term expansion mechanism that provides extended ontology terminology.

More information

Published date: January 2003
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 257396
URI: http://eprints.soton.ac.uk/id/eprint/257396
ISSN: 1541-1672
PURE UUID: df4550ad-5d12-4ae0-ac5c-dbb7d77918ef
ORCID for David E. Millard: orcid.org/0000-0002-7512-2710
ORCID for Mark J. Weal: orcid.org/0000-0001-6251-8786
ORCID for Wendy Hall: orcid.org/0000-0003-4327-7811

Catalogue record

Date deposited: 14 Apr 2003
Last modified: 15 Mar 2024 02:58

Contributors

Author: Harith Alani
Author: Sanghee Kim
Author: David E. Millard
Author: Mark J. Weal
Author: Wendy Hall
Author: Paul H. Lewis
Author: Nigel R. Shadbolt
