Automatic Extraction of Knowledge from Web Documents

Much of the digital information available today takes the form of text documents such as web pages, reports, papers, and emails. Extracting the knowledge of interest from such documents, across multiple sources and in a timely fashion, is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents, guided by a predefined ontology. The ontology represents the type and form of knowledge to extract. The extracted knowledge is then used to generate tailored biographies. The paper details and evaluates Artequakt's information extraction process.

Alani, Harith

Kim, Sanghee

Millard, David E.

Weal, Mark J.

Lewis, Paul H.

Hall, Wendy

Shadbolt, Nigel R.

2003

Alani, Harith, Kim, Sanghee, Millard, David E., Weal, Mark J., Lewis, Paul H., Hall, Wendy and Shadbolt, Nigel R. (2003) Automatic Extraction of Knowledge from Web Documents. 2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, Sanibel Island, Florida, United States. 20 - 23 Oct 2003.

Record type: Conference or Workshop Item (Paper)

Abstract

Text

Alani-HLT03-final.pdf - Other

Download (283kB)

More information

Published date: 2003

Additional Information: Event Dates: October 20-23

Venue - Dates: 2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, Sanibel Island, Florida, United States, 2003-10-20 - 2003-10-23

Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 258194

URI: http://eprints.soton.ac.uk/id/eprint/258194

PURE UUID: 5efed5dd-80db-4b53-8484-8bc000efe53c

ORCID for David E. Millard:

orcid.org/0000-0002-7512-2710

ORCID for Mark J. Weal:

orcid.org/0000-0001-6251-8786

ORCID for Wendy Hall:

orcid.org/0000-0003-4327-7811

Catalogue record

Date deposited: 18 Oct 2003

Last modified: 15 Mar 2024 02:58

Contributors

Author: Harith Alani

Author: Sanghee Kim

Author: David E. Millard

Author: Mark J. Weal

Author: Paul H. Lewis

Author: Wendy Hall

Author: Nigel R. Shadbolt
