English Term Extraction Based on Context Analysis & Statistical Characteristic
Xu Deshan1,2, Zhang Zhixiong1, Wang Feng3, Xing Meifeng1,2
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China;
3. National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan 030051, China
Abstract This article first introduces the basic features of terms and discusses methods for automatically identifying scientific terms. It then proposes V-value, an indicator that improves on two main statistical indicators, TF-IDF and C-value, according to text characteristics. Candidate terms are also assigned different weights according to their position in the text, to reflect the effect of position. Finally, a term extraction system based on statistics and rules is implemented. Because the system combines the position weight, C-value, and TF-IDF, it achieves higher extraction precision.
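For orientation, the two baseline indicators named in the abstract can be sketched as follows. This is a minimal illustration of the textbook TF-IDF formula and the commonly cited C-value formula, not the authors' V-value, whose definition and position weights are not given on this page. The function names `tf_idf`, `c_value`, and `contains`, and the toy frequencies in the usage lines, are assumptions made for the example.

```python
import math


def tf_idf(term, doc_terms, corpus):
    """Textbook TF-IDF: relative frequency in one document times log(N / df).

    doc_terms is a list of terms for one document; corpus is a list of such lists.
    """
    tf = doc_terms.count(term) / len(doc_terms)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / df) if df else 0.0


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Commonly cited C-value for multi-word candidates.

    freq maps a candidate (tuple of words) to its corpus frequency.
    A candidate nested in longer candidates has the mean frequency of
    those longer candidates subtracted before length weighting.
    Note: log2(length) is 0 for single words, so this form only ranks
    multi-word candidates.
    """
    scores = {}
    for cand, f in freq.items():
        nesting = [fb for b, fb in freq.items()
                   if len(b) > len(cand) and contains(b, cand)]
        adjusted = f - sum(nesting) / len(nesting) if nesting else f
        scores[cand] = math.log2(len(cand)) * adjusted
    return scores


# Hypothetical toy frequencies, for illustration only.
freq = {("soft", "contact", "lens"): 5, ("contact", "lens"): 8}
scores = c_value(freq)
corpus = [["term", "extraction"], ["term", "weight"]]
```

In this toy input, "contact lens" is penalized because five of its eight occurrences sit inside the longer candidate "soft contact lens". How the paper's system combines such scores with position weights is not specified here, so no combination step is shown.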
Received: 30 September 2010
Published: 07 January 2011