The University of Southampton
University of Southampton Institutional Repository

Automatic Extraction of Knowledge from Web Documents

Alani, Harith
70cdbdce-1494-44c2-9dae-65d82bf7e991
Kim, Sanghee
9e0e5909-9fbe-4c37-9606-2fdea35eac12
Millard, David E.
4f19bca5-80dc-4533-a101-89a5a0e3b372
Weal, Mark J.
e8fd30a6-c060-41c5-b388-ca52c81032a4
Lewis, Paul H.
7aa6c6d9-bc69-4e19-b2ac-a6e20558c020
Hall, Wendy
11f7f8db-854c-4481-b1ae-721a51d8790c
Shadbolt, Nigel R.
5c5acdf4-ad42-49b6-81fe-e9db58c2caf7

Alani, Harith, Kim, Sanghee, Millard, David E., Weal, Mark J., Lewis, Paul H., Hall, Wendy and Shadbolt, Nigel R. (2003) Automatic Extraction of Knowledge from Web Documents. 2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, Sanibel Island, Florida, United States. 20 - 23 Oct 2003.

Record type: Conference or Workshop Item (Paper)

Abstract

A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.

Text
Alani-HLT03-final.pdf - Other
Download (283kB)

More information

Published date: 2003
Additional Information: Event Dates: October 20-23
Venue - Dates: 2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, Sanibel Island, Florida, United States, 2003-10-20 - 2003-10-23
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 258194
URI: http://eprints.soton.ac.uk/id/eprint/258194
PURE UUID: 5efed5dd-80db-4b53-8484-8bc000efe53c
ORCID for David E. Millard: orcid.org/0000-0002-7512-2710
ORCID for Mark J. Weal: orcid.org/0000-0001-6251-8786
ORCID for Wendy Hall: orcid.org/0000-0003-4327-7811

Catalogue record

Date deposited: 18 Oct 2003
Last modified: 15 Mar 2024 02:58

Contributors

Author: Harith Alani
Author: Sanghee Kim
Author: David E. Millard
Author: Mark J. Weal
Author: Paul H. Lewis
Author: Wendy Hall
Author: Nigel R. Shadbolt
