A sequence modeling approach for structured data extraction from unstructured text

Extracting structured information from unstructured text has long been a problem of interest to the NLP community. Structured data is concise to store, search, and retrieve, and it is easier for both humans and machines to consume. Traditionally, structured data has been extracted from text using various parsing methodologies together with domain-specific rules and heuristics. In this work, we apply recent developments in sequence modeling to the problem of structured data extraction. We initially posed the task as a machine translation problem and used a state-of-the-art machine translation model. Based on these initial results, we changed to a sequence tagging approach. We propose an extension of one of the attractive models for sequence tagging, tailored to our problem. This gave a 4.4% improvement over the vanilla sequence tagging model. We also propose another variant of the sequence tagging model that can handle multiple labels per word. Experiments were performed on the Wikipedia Infobox dataset of biographies, and results are presented for both the single-label and multi-label models. These models indicate that deep learning based techniques are an effective alternative for extracting structured data from raw text.

41-50

Association for Computational Linguistics (ACL)

Deshmukh, Jayati

5903b0c1-b4d1-4fbf-b687-610d4fde3990

Annervaz, K. M.

60ecdbb0-0673-49ca-92d4-29e48a46a0bb

Sengupta, Shubhashis

b7c8401f-33ff-4edc-89cf-228aa902a6cc

1 January 2019

Deshmukh, Jayati, Annervaz, K. M. and Sengupta, Shubhashis (2019) A sequence modeling approach for structured data extraction from unstructured text. In IJCAI 2019 - Proceedings of the 5th Workshop on Semantic Deep Learning, SemDeep 2019. Association for Computational Linguistics (ACL). pp. 41-50.