Automatic Extraction of Knowledge from Web Documents
A large amount of the digital information available today is written as text documents: web pages, reports, papers, emails, and so on. Extracting the knowledge of interest from such documents, across multiple sources and in a timely fashion, is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. The extracted knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
Alani, Harith
Kim, Sanghee
Millard, David E.
Weal, Mark J.
Lewis, Paul H.
Hall, Wendy
Shadbolt, Nigel R.
2003
Alani, Harith, Kim, Sanghee, Millard, David E., Weal, Mark J., Lewis, Paul H., Hall, Wendy and Shadbolt, Nigel R.
(2003)
Automatic Extraction of Knowledge from Web Documents.
2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, Sanibel Island, Florida, United States.
20 - 23 Oct 2003.
Record type:
Conference or Workshop Item
(Paper)
Text
Alani-HLT03-final.pdf
- Other
More information
Published date: 2003
Additional Information:
Event Dates: October 20-23
Venue - Dates:
2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, Sanibel Island, Florida, United States, 2003-10-20 - 2003-10-23
Organisations:
Web & Internet Science
Identifiers
Local EPrints ID: 258194
URI: http://eprints.soton.ac.uk/id/eprint/258194
PURE UUID: 5efed5dd-80db-4b53-8484-8bc000efe53c
Catalogue record
Date deposited: 18 Oct 2003
Last modified: 15 Mar 2024 02:58
Contributors
Author:
Harith Alani
Author:
Sanghee Kim
Author:
David E. Millard
Author:
Mark J. Weal
Author:
Paul H. Lewis
Author:
Wendy Hall
Author:
Nigel R. Shadbolt