The University of Southampton
University of Southampton Institutional Repository

The impact of enriched linguistic annotation on the performance of extracting relation triples


Kim, Sanghee, Lewis, Paul and Martinez, Kirk (2004) The impact of enriched linguistic annotation on the performance of extracting relation triples. In Gelbukh, Alexander (ed.) Computational Linguistics and Intelligent Text Processing: 5th International Conference, CICLing 2004. Springer, pp. 547-558.

Record type: Book Section

Abstract

A relation extraction system recognises pre-defined relation types between two identified entities in natural language documents. This is important for automatically locating missing instances in a knowledge base, where each instance is represented as a triple (‘entity – relation – entity’). A relation entry specifies a set of rules associated with the syntactic and semantic conditions under which appropriate relations are extracted. Creating such rules manually requires knowledge from information experts; moreover, it is a time-consuming and error-prone task when the input sentences have little consistency in structure and vocabulary. In this paper, we present an approach that applies a symbolic learning algorithm to sentences in order to automatically induce extraction rules, which are then used to classify new sentences. The proposed approach takes into account semantic attributes (e.g., semantically close words) as well as linguistic features (entity types) when generalising common patterns among the sentences, which enables the system to cope better with syntactically different but semantically similar sentences. Not only does this increase the number of relations extracted, it also improves extraction accuracy by adding features that might not be discovered through syntactic analysis alone. Experimental results show that the approach is effective on sentences from Web documents, obtaining 17% higher precision and 34% higher recall.
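
As a rough illustration of what applying one extraction rule to a sentence might look like, the following minimal Python sketch represents a relation as a triple and checks two kinds of condition mentioned in the abstract: the entity types and the presence of a semantically close word. It is not the paper's method. The rule here is written by hand, whereas the paper induces rules with a symbolic learning algorithm, and every name in the sketch (`Entity`, `Rule`, `WORKS_FOR`, the type labels, the trigger words) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    type: str  # hypothetical entity-type label, e.g. "PERSON"


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    object: str


@dataclass(frozen=True)
class Rule:
    relation: str
    subject_type: str
    object_type: str
    trigger_words: frozenset  # assumed set of semantically close words


def apply_rule(rule, tokens, subject, obj):
    """Return a Triple if the sentence satisfies the rule, else None."""
    types_match = (subject.type == rule.subject_type
                   and obj.type == rule.object_type)
    has_trigger = any(tok.lower() in rule.trigger_words for tok in tokens)
    if types_match and has_trigger:
        return Triple(subject.text, rule.relation, obj.text)
    return None


# Hand-written example rule (an assumption; the paper learns such rules).
WORKS_FOR = Rule(
    relation="works_for",
    subject_type="PERSON",
    object_type="ORGANISATION",
    trigger_words=frozenset({"joined", "works", "employed"}),
)


def extract(sentence, subject, obj, rules=(WORKS_FOR,)):
    """Apply each rule in turn and return the first triple found, if any."""
    # Naive tokenisation: strip commas and full stops, split on whitespace.
    tokens = sentence.replace(",", " ").replace(".", " ").split()
    for rule in rules:
        triple = apply_rule(rule, tokens, subject, obj)
        if triple is not None:
            return triple
    return None
```

With this sketch, "Alice joined Acme in 2003." and "Alice is employed by Acme." would both yield the same `works_for` triple when Alice is typed as a person and Acme as an organisation, which is the sense in which syntactically different but semantically similar sentences can be handled by one rule.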

Text
CICLing04.pdf - Other
Restricted to Repository staff only

More information

Published date: February 2004
Keywords: relation extraction, information extraction, inductive logic programming
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 258880
URI: http://eprints.soton.ac.uk/id/eprint/258880
PURE UUID: 87a3e4cb-a0af-4ede-9135-df0c2bee7cc6
ORCID for Kirk Martinez: orcid.org/0000-0003-3859-5700

Catalogue record

Date deposited: 25 Feb 2004
Last modified: 16 Mar 2024 02:54

Contributors

Author: Sanghee Kim
Author: Paul Lewis
Author: Kirk Martinez
Editor: Alexander Gelbukh
