New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 54-59    DOI: 10.11925/infotech.1003-3513.2013.09.09
An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency
Xiong Liyan, Tan Long, Zhong Maosheng
School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
Abstract  Existing methods for automatic Chinese term extraction focus on the high-frequency characteristics and unithood indicators of terms, but lack effective treatment of low-frequency terms and termhood indicators. To address these problems, this paper introduces a background corpus into the C-value method and proposes the concepts of word field distribution degree and effective word frequency. It then extracts terms automatically by calculating the EC-value (Effective C-value) of candidate terms, and improves the extraction of low-frequency terms by combining this with term cluster recognition and mining. A term extraction experiment in the computer field shows that the proposed EC-value method measures the termhood of candidate terms more effectively and improves extraction performance for low-frequency terms.
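The abstract does not give the EC-value formula, so the following is only a minimal sketch of the baseline C-value score (Frantzi et al.) that the paper builds on, not the authors' method. The function names, the representation of a term as a tuple of words, and the sample frequencies are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def is_sublist(short, long_):
    """Return True if `short` occurs as a contiguous run inside `long_`."""
    n = len(short)
    return any(long_[i:i + n] == short for i in range(len(long_) - n + 1))


def c_value(freq):
    """Baseline C-value for each candidate term.

    `freq` maps a candidate term (a tuple of words) to its corpus frequency.
    Single-word candidates score 0 because log2(1) == 0; the baseline
    method targets multi-word terms.
    """
    terms = list(freq)
    # For every candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in terms:
        for b in terms:
            if len(b) > len(a) and is_sublist(a, b):
                containers[a].append(b)

    scores = {}
    for a in terms:
        weight = math.log2(len(a))
        longer = containers[a]
        if not longer:
            # Not nested in any longer candidate: length weight times frequency.
            scores[a] = weight * freq[a]
        else:
            # Nested: subtract the mean frequency of the containing candidates.
            nested = sum(freq[b] for b in longer) / len(longer)
            scores[a] = weight * (freq[a] - nested)
    return scores


# Hypothetical frequencies, chosen only to illustrate the calculation.
freq = {
    ("neural", "network"): 10,
    ("deep", "neural", "network"): 4,
    ("recurrent", "neural", "network"): 2,
}
scores = c_value(freq)
```

In this toy input, "neural network" is nested in two longer candidates, so its raw frequency is discounted by their mean frequency before the length weight is applied. An effective word frequency derived from a background corpus, as the abstract proposes, would presumably replace the raw `freq` values, but that step is not specified here.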
Key words: Automatic term extraction; EC-value; Effective word frequency; Term cluster
Received: 17 June 2013      Published: 27 September 2013
CLC number: TP391.1

Cite this article:

Xiong Liyan, Tan Long, Zhong Maosheng. An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency. New Technology of Library and Information Service, 2013, 29(9): 54-59.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.09.09     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I9/54
