A sequence modeling approach for structured data extraction from unstructured text
Pages: 41-50
Association for Computational Linguistics (ACL)
Deshmukh, Jayati
5903b0c1-b4d1-4fbf-b687-610d4fde3990
Annervaz, K. M.
60ecdbb0-0673-49ca-92d4-29e48a46a0bb
Sengupta, Shubhashis
b7c8401f-33ff-4edc-89cf-228aa902a6cc
1 January 2019
Deshmukh, Jayati, Annervaz, K. M. and Sengupta, Shubhashis (2019) A sequence modeling approach for structured data extraction from unstructured text. In IJCAI 2019 - Proceedings of the 5th Workshop on Semantic Deep Learning, SemDeep 2019. Association for Computational Linguistics (ACL).
Record type: Conference or Workshop Item (Paper)
Abstract
Extraction of structured information from unstructured text has always been a problem of interest for the NLP community. Structured data is concise to store, search and retrieve, and it is easier for both humans and machines to consume. Traditionally, structured data has been extracted from text using various parsing methodologies combined with domain-specific rules and heuristics. In this work, we leverage developments in sequence modeling for the problem of structured data extraction. Initially, we posed the problem as a machine translation problem and used a state-of-the-art machine translation model. Based on these initial results, we changed the approach to a sequence tagging one. We propose an extension of one of the attractive models for sequence tagging, tailored to our problem. This gave a 4.4% improvement over the vanilla sequence tagging model. We also propose another variant of the sequence tagging model that can handle multiple labels per word. Experiments were performed on the Wikipedia Infobox Dataset of biographies, and results are presented for both the single-label and multi-label models. These models indicate that deep learning techniques are an effective alternative for extracting structured data from raw text.
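As an illustration of the sequence tagging formulation the abstract describes, the sketch below shows how per-word labels can be collapsed into an infobox-style record. It is not the authors' model or code: the function name `tags_to_record`, the field names, the `"O"` outside label, and the example sentence are all hypothetical, and the labels are written by hand where a trained tagger would predict them.

```python
from collections import defaultdict


def tags_to_record(tokens, tags, outside="O"):
    """Group words by their field label into a flat record.

    `tags` holds one entry per token: either a single label (str) or a
    list of labels, which covers the multi-label case. The label set and
    the "O" (outside any field) convention are assumptions for this sketch.
    """
    fields = defaultdict(list)
    for token, labels in zip(tokens, tags):
        if isinstance(labels, str):
            labels = [labels]  # treat the single-label case uniformly
        for label in labels:
            if label != outside:
                fields[label].append(token)
    # Join the words collected for each field into one string value.
    return {field: " ".join(words) for field, words in fields.items()}


# Hypothetical sentence with hand-written labels standing in for tagger output.
tokens = ["Jane", "Doe", "was", "born", "and", "died", "in", "Springfield"]

# Single-label tagging: each word carries exactly one label.
single = ["name", "name", "O", "O", "O", "O", "O", "birth_place"]
single_record = tags_to_record(tokens, single)

# Multi-label tagging: one word may fill several fields at once.
multi = ["name", "name", "O", "O", "O", "O", "O", ["birth_place", "death_place"]]
multi_record = tags_to_record(tokens, multi)

print(single_record)
print(multi_record)
```

In the single-label case "Springfield" can fill only one field, whereas the multi-label variant lets the same word populate both `birth_place` and `death_place`, which is the kind of situation a multi-label tagger is meant to handle.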
More information
Published date: 1 January 2019
Additional Information:
Publisher Copyright:
© IJCAI 2019 - Proceedings of the 5th Workshop on Semantic Deep Learning, SemDeep 2019. All rights reserved.
Venue - Dates:
5th Workshop on Semantic Deep Learning, SemDeep 2019, held in conjunction with IJCAI 2019, Macau, China, 2019-08-12
Identifiers
Local EPrints ID: 493208
URI: http://eprints.soton.ac.uk/id/eprint/493208
PURE UUID: a661303f-dfb6-4fb3-a93d-4c07b93074f9
Catalogue record
Date deposited: 27 Aug 2024 17:31
Last modified: 28 Aug 2024 02:16
Contributors
Author: Jayati Deshmukh
Author: K. M. Annervaz
Author: Shubhashis Sengupta