New Technology of Library and Information Service  2010, Vol. 26 Issue (12): 28-33    DOI: 10.11925/infotech.1003-3513.2010.12.05
English Term Extraction Based on Context Analysis & Statistical Characteristic
Xu Deshan1,2, Zhang Zhixiong1, Wang Feng3, Xing Meifeng1,2
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China;
3. National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan 030051, China
Abstract  

This article first introduces the basic features of terms and discusses methods for automatically identifying scientific terms. It then proposes the V-value, which refines two main statistical indicators, TF-IDF and C-value, according to text characteristics. Candidate terms are also assigned different weights according to their position in the text, so that each position's contribution is reflected. Finally, a term extraction system is implemented based on statistics and rules. Because the system combines position weights, C-value, and TF-IDF, it achieves higher extraction precision.
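
For readers unfamiliar with the indicators named in the abstract, the following is a minimal sketch of the classic TF-IDF score and of the C-value as published by Frantzi et al., together with a position-weighted frequency count. It is not the V-value of this article, whose formula the abstract does not give. The function names, the `POSITION_WEIGHT` table, and its numeric values are illustrative assumptions only.

```python
import math

# Hypothetical position weights; the values are assumptions, not taken from the article.
POSITION_WEIGHT = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def tf_idf(term, doc_terms, corpus):
    """Classic TF-IDF: relative frequency in one document times log(N / df)."""
    tf = doc_terms.count(term) / len(doc_terms)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / df) if df else 0.0


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def c_value(candidate, freq):
    """C-value (Frantzi et al.) of a candidate given as a tuple of words.

    `freq` maps every candidate tuple to its corpus frequency. The frequency
    of a nested candidate is reduced by the mean frequency of the longer
    candidates that contain it. Single-word candidates score 0 because
    log2(1) = 0; the measure targets multi-word terms.
    """
    n = len(candidate)
    longer = [b for b in freq if len(b) > n and contains(b, candidate)]
    f = freq[candidate]
    if longer:
        f -= sum(freq[b] for b in longer) / len(longer)
    return math.log2(n) * f


def weighted_frequency(positions):
    """Sum the (assumed) position weights over the occurrences of one candidate."""
    return sum(POSITION_WEIGHT[p] for p in positions)


freq = {("term", "extraction"): 5, ("automatic", "term", "extraction"): 2}
print(c_value(("term", "extraction"), freq))          # 3.0
print(weighted_frequency(["title", "body", "body"]))  # 5.0
```

In this toy input, "term extraction" occurs five times, two of them inside "automatic term extraction", so its C-value is log2(2) × (5 − 2) = 3. How the article merges such scores into a single ranking is not stated in the abstract, so no combination step is shown here.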

Key words: Term extraction; Multi-word recognition; Weighted TF-IDF; C-value computing
Received: 30 September 2010      Published: 07 January 2011
CLC Number: TP391

Cite this article:

Xu Deshan, Zhang Zhixiong, Wang Feng, Xing Meifeng. English Term Extraction Based on Context Analysis & Statistical Characteristic. New Technology of Library and Information Service, 2010, 26(12): 28-33.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.12.05     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I12/28


