New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 54-59    DOI: 10.11925/infotech.1003-3513.2013.09.09
An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency
Xiong Liyan, Tan Long, Zhong Maosheng
School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
Abstract  Existing methods for automatic Chinese term extraction focus on the high-frequency characteristics and unithood indicators of terms, but lack effective treatment of low-frequency terms and termhood indicators. To address these problems, this paper introduces a background corpus into the C-value method and proposes the concepts of word field distribution degree and effective word frequency. It then extracts terms automatically by calculating the EC-value (Effective C-value) of candidate terms, and improves the extraction of low-frequency terms by combining this with term cluster recognition and mining. A term extraction experiment in the computer domain shows that the proposed EC-value method measures the termhood of terms more effectively and improves extraction performance for low-frequency terms.
Key words  Automatic term extraction; EC-value; Effective word frequency; Term cluster
Received: 17 June 2013      Published: 27 September 2013
CLC Number: TP391.1
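
For orientation, the baseline that the abstract builds on can be sketched in a few lines. The code below is a minimal illustration of the classic C-value measure of Frantzi, Ananiadou and Mima, not the paper's EC-value: the abstract does not define effective word frequency, word field distribution degree, or how the background corpus enters the score, so none of those are implemented here. The function names `c_value` and `_contains`, the tuple-of-words representation, and the toy frequencies are all assumptions made for the example.

```python
import math
from collections import defaultdict


def _contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score candidate terms with the classic C-value.

    freq maps each candidate term (a tuple of words) to its corpus frequency.
    Single-word candidates score 0 because log2(1) == 0; the classic measure
    targets multi-word terms.
    """
    terms = list(freq)
    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in terms:
        for b in terms:
            if len(b) > len(a) and _contains(b, a):
                containers[a].append(b)

    scores = {}
    for a in terms:
        length_weight = math.log2(len(a))
        nested_in = containers[a]
        if not nested_in:
            # Not nested in any longer candidate: weight the raw frequency.
            scores[a] = length_weight * freq[a]
        else:
            # Nested: subtract the mean frequency of the containing candidates.
            mean_container_freq = sum(freq[b] for b in nested_in) / len(nested_in)
            scores[a] = length_weight * (freq[a] - mean_container_freq)
    return scores


# Hypothetical toy frequencies, for illustration only.
demo = {
    ("real", "time", "system"): 5,
    ("real", "time"): 8,
    ("time", "system"): 5,
}
scores = c_value(demo)
```

In this toy input, "time system" never occurs outside "real time system", so its score drops to zero, while "real time" keeps credit for its three independent occurrences. Because the score is driven by raw frequency, rare but genuine terms receive low scores, which is the weakness the abstract says the EC-value is meant to address.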

Cite this article:

Xiong Liyan, Tan Long, Zhong Maosheng. An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency. New Technology of Library and Information Service, 2013, 29(9): 54-59.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.09.09     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I9/54
