Chinese Unknown Word Identification Based on Local Bigram Model
Wang, Zhuoran and Liu, Ting (2005) Chinese Unknown Word Identification Based on Local Bigram Model. International Journal of Computer Processing of Oriental Languages, Vol. 1, (3), 185-196.
Full text not available from this repository.
This paper presents a Chinese unknown word identification system based on a local bigram model. Our word segmentation system generally employs a statistical unigram model. To identify unknown words, however, we exploit their contextual information and apply a bigram model locally. The two models, which have different dimensions, are combined by adjusting an interpolation value derived from a smoothing method. As a simplification of the full bigram model, this method is both simple and feasible: its algorithmic complexity is low and it needs relatively little training data. Experimental results show that the approach is effective.
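The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea, assuming plain linear interpolation between a unigram and a bigram estimate. The function names, the add-one smoothing, the toy corpus, and the `lam` weight are all illustrative assumptions, not the authors' implementation. A bigram term is applied only at positions flagged as local context (for example, around an unknown-word candidate); every other position is scored by the unigram model alone.

```python
import math
from collections import Counter


def train(sentences):
    """Count unigrams and adjacent-word bigrams from pre-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def unigram_prob(word, unigrams, vocab_size):
    """Add-one smoothed unigram estimate (an assumed smoothing choice)."""
    total = sum(unigrams.values())
    return (unigrams[word] + 1) / (total + vocab_size)


def interpolated_prob(prev, word, unigrams, bigrams, vocab_size, lam):
    """Linear interpolation: lam * P(word | prev) + (1 - lam) * P(word)."""
    p_uni = unigram_prob(word, unigrams, vocab_size)
    # Maximum-likelihood bigram estimate; 0.0 when the history was never seen.
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni


def score(words, local_positions, unigrams, bigrams, vocab_size, lam=0.5):
    """Log-probability of a segmentation.

    The bigram term is used only at indices in local_positions;
    all other words are scored with the unigram model.
    """
    logp = 0.0
    for i, word in enumerate(words):
        if i > 0 and i in local_positions:
            p = interpolated_prob(words[i - 1], word, unigrams, bigrams,
                                  vocab_size, lam)
        else:
            p = unigram_prob(word, unigrams, vocab_size)
        logp += math.log(p)
    return logp


# Toy pre-segmented corpus (hypothetical data, for illustration only).
corpus = [["我", "喜欢", "北京"], ["我", "喜欢", "上海"]]
unigrams, bigrams = train(corpus)
vocab_size = len(unigrams) + 1  # one extra slot reserved for unseen words

candidate = ["我", "喜欢", "北京"]
unigram_only = score(candidate, set(), unigrams, bigrams, vocab_size)
with_local_bigram = score(candidate, {1}, unigrams, bigrams, vocab_size)
```

In this sketch, setting `lam` to 0 reduces every position to the unigram model, while larger values give more weight to the local context. Restricting the bigram term to a few positions is what keeps the cost close to that of a unigram segmenter.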
|Keywords:||Unknown word identification; Chinese word segmentation; Local bigram model|
|Divisions:||Faculty of Physical and Applied Science > Electronics and Computer Science|
|Date Deposited:||13 Nov 2005|
|Last Modified:||02 Mar 2012 11:39|
|Contributors:||Wang, Zhuoran (Author); Liu, Ting (Author)|