Chinese Unknown Word Identification Based on Local Bigram Model


Wang, Zhuoran and Liu, Ting (2005) Chinese Unknown Word Identification Based on Local Bigram Model. International Journal of Computer Processing of Oriental Languages, Vol. 1, (3), 185-196.

Full text not available from this repository.

Description/Abstract

This paper presents a Chinese unknown word identification system based on a local bigram model. Our word segmentation system generally uses a statistics-based unigram model. To identify unknown words, however, we exploit their contextual information and apply a bigram model locally. By adjusting an interpolation weight derived from a smoothing method, we combine these two models of different orders. As a simplification of the full bigram model, the method is simple and practical: its algorithmic complexity is low, and it needs relatively little training data. Experimental results show that the approach is effective.
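The abstract does not state which smoothing method yields the interpolation weight or exactly where the bigram is applied, so the following is only a minimal Python sketch under assumptions, not the authors' implementation. It assumes linear (Jelinek-Mercer) interpolation with a hand-set weight `lam`. It maps every out-of-lexicon word to a hypothetical `<UNK>` class so that its right context can be counted, and it applies the interpolated bigram only to the word immediately following an unknown-word candidate. Every other word is scored with the unigram model. The counts, `UNK_PROB`, and all function names are hypothetical toy values chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy statistics. In practice these counts would come from a
# segmented training corpus. "<UNK>" is an assumed class token standing for
# any word missing from the lexicon, so that its contexts can be counted.
UNIGRAM = Counter({"我": 42, "说": 20, "李": 4, "明": 4, "<UNK>": 30})
BIGRAM = Counter({("<UNK>", "说"): 24, ("我", "说"): 8})
TOTAL = sum(UNIGRAM.values())
UNK_PROB = 0.001  # assumed fixed probability for an out-of-lexicon word


def word_class(w):
    """Map out-of-lexicon words to the <UNK> class and keep known words as-is."""
    return w if UNIGRAM[w] and w != "<UNK>" else "<UNK>"


def p_unigram(w):
    """Maximum-likelihood unigram probability, with a floor for unknown words."""
    if word_class(w) == "<UNK>":
        return UNK_PROB
    return UNIGRAM[w] / TOTAL


def p_interpolated(w, prev, lam):
    """Linear (Jelinek-Mercer) mix of a class bigram and the unigram estimate."""
    h = word_class(prev)
    p_bigram = BIGRAM[(h, word_class(w))] / UNIGRAM[h] if UNIGRAM[h] else 0.0
    return lam * p_bigram + (1 - lam) * p_unigram(w)


def log_score(words, lam):
    """Unigram log-probability, except that the word right after an
    unknown-word candidate is scored with the interpolated bigram."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1) so every probability stays positive")
    total = 0.0
    for i, w in enumerate(words):
        if i > 0 and word_class(words[i - 1]) == "<UNK>":
            total += math.log(p_interpolated(w, words[i - 1], lam))
        else:
            total += math.log(p_unigram(w))
    return total


def best_segmentation(candidates, lam):
    """Return the highest-scoring candidate segmentation."""
    return max(candidates, key=lambda words: log_score(words, lam))


if __name__ == "__main__":
    candidates = [["李", "明", "说"], ["李明", "说"]]
    print(best_segmentation(candidates, 0.0))  # ['李', '明', '说']
    print(best_segmentation(candidates, 0.5))  # ['李明', '说']
```

With `lam = 0.0` the local bigram collapses to the unigram model, and the character-by-character reading 李 / 明 / 说 scores higher (0.04 × 0.04 × 0.2 = 0.00032 versus 0.001 × 0.2 = 0.0002). With `lam = 0.5` the strong context estimate P(说 | <UNK>) = 24/30 = 0.8 raises the probability of 说 to 0.5, which lifts the reading that treats 李明 as a single unknown word (0.001 × 0.5 = 0.0005) above the alternative. Restricting the bigram to positions next to unknown-word candidates keeps the extra cost and the amount of bigram data needed small, in the spirit of the "local" model described above.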

Item Type: Article
ISSNs: 0219-4279
Keywords: Unknown word identification; Chinese word segmentation; Local bigram model
Divisions: Faculty of Physical and Applied Science > Electronics and Computer Science
Item ID: 261543
Date Deposited: 13 Nov 2005
Last Modified: 02 Mar 2012 11:39
Contributors: Wang, Zhuoran (Author)
Liu, Ting (Author)
Date: September 2005
Status: Published
Publisher: World Scientific
URI: http://eprints.soton.ac.uk/id/eprint/261543