New Technology of Library and Information Service  2008, Vol. 24 Issue (12): 54-58    DOI: 10.11925/infotech.1003-3513.2008.12.10
Chinese Term Recognition Based on Hidden Markov Model
Cen Yonghua 1,2, Han Zhe, Ji Peipei 3,4
1 (School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China)
2 (Department of Information Management, Nanjing University, Nanjing 210093, China)
3 (National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
4 (Graduate University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract  

Based on an analysis of the probabilistic characteristics of syntactic composition in Chinese text, especially part-of-speech (POS) matching patterns, this paper presents and implements a system framework for Chinese term recognition and extraction based on a dual-layer Hidden Markov Model (HMM). The proposed method performs well in tests on text from different domains. The terms recognized and extracted by the implemented system can serve as candidate terms for subsequent false-term elimination and optimization, in combination with parameters such as mutual information, log-likelihood, and domain dependency.
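
The abstract does not give the model's structure or parameters, so the following is only a minimal single-layer sketch of the general idea: treating term boundaries as hidden states over a POS tag sequence and decoding them with the Viterbi algorithm. The B/I/O state set, the POS tag inventory (n, a, v, u), every probability value, and the helper names `viterbi` and `extract_terms` are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the dual-layer design.

```python
import math

# Hidden states (assumed): B = term begins, I = inside a term, O = outside.
STATES = ("B", "I", "O")

# All probabilities below are made-up illustrative values, not from the paper.
START = {"B": 0.3, "I": 0.0, "O": 0.7}
TRANS = {
    "B": {"B": 0.1, "I": 0.6, "O": 0.3},
    "I": {"B": 0.1, "I": 0.4, "O": 0.5},
    "O": {"B": 0.4, "I": 0.0, "O": 0.6},
}
# Emission P(POS tag | state); tags: n = noun, a = adjective, v = verb, u = particle.
EMIT = {
    "B": {"n": 0.5, "a": 0.3, "v": 0.15, "u": 0.05},
    "I": {"n": 0.7, "a": 0.1, "v": 0.15, "u": 0.05},
    "O": {"n": 0.2, "a": 0.1, "v": 0.4, "u": 0.3},
}


def _log(p):
    """Log probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(pos_tags):
    """Return the most likely B/I/O label sequence for a POS tag sequence."""
    if not pos_tags:
        return []
    first = pos_tags[0]
    scores = [{s: _log(START[s]) + _log(EMIT[s].get(first, 0.0)) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for tag in pos_tags[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + _log(TRANS[r][s]))
            cur[s] = prev[best] + _log(TRANS[best][s]) + _log(EMIT[s].get(tag, 0.0))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def extract_terms(words, labels):
    """Join words labelled B followed by I... into candidate term strings."""
    terms, current = [], []
    for word, label in zip(words, labels):
        if label == "B":
            if current:
                terms.append("".join(current))
            current = [word]
        elif label == "I" and current:
            current.append(word)
        else:
            if current:
                terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms
```

Given a segmented and POS-tagged sentence, `extract_terms(words, viterbi(tags))` would yield candidate terms, which could then be filtered with statistics such as mutual information or log-likelihood, as the abstract suggests. In practice the probabilities would be estimated from an annotated corpus rather than set by hand.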

Key words: Chinese term recognition, Hidden Markov model, HMM
Received: 13 August 2008      Published: 25 December 2008
Classification: TP391, G358, H031

Corresponding Authors: Cen Yonghua     E-mail: yhcen@163.com
About authors: Cen Yonghua, Han Zhe, Ji Peipei

Cite this article:

Cen Yonghua, Han Zhe, Ji Peipei. Chinese Term Recognition Based on Hidden Markov Model. New Technology of Library and Information Service, 2008, 24(12): 54-58.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.12.10     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I12/54

