English Term Extraction Based on Context Analysis & Statistical Characteristic
Xu Deshan1,2, Zhang Zhixiong1, Wang Feng3, Xing Meifeng1,2
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China;
3. National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan 030051, China
This article first introduces the basic features of terms and discusses methods for automatically identifying scientific terms. It then proposes the V-value, which improves on two main statistical indicators, TF-IDF and C-value, according to the characteristics of the text. Candidate terms are also assigned different weights according to their position in the text, so that the effect of position is reflected in the score. Finally, a term extraction system based on statistics and rules is implemented. Because the system combines the position weight, C-value, and TF-IDF, it achieves higher extraction precision.
许德山, 张智雄, 王峰, 邢美凤. 上下文分析与统计特征相结合的英文术语抽取研究[J]. 现代图书情报技术, 2010, 26(12): 28-33.
Xu Deshan, Zhang Zhixiong, Wang Feng, Xing Meifeng. English Term Extraction Based on Context Analysis & Statistical Characteristic. New Technology of Library and Information Service, 2010, 26(12): 28-33.