Abstract: [Objective] Extracting science and technology terms is a basic research problem for improving the efficiency of organizing and retrieving scientific and technical literature. [Methods] This paper proposes an automatic extraction method based on the characteristics of science and technology terms and on statistical computation. The method combines the linguistic characteristics of terms with statistical information, such as the combination strength between words and the positions at which terms appear in the literature, to extract terms automatically. [Results] Experimental results show that the average accuracy of scientific term extraction reaches 51.2%. [Limitations] Both the statistical algorithm and the data processing need further improvement, with respect to the algorithm itself and the quality of the data. [Conclusions] The proposed method is effective.
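To make the two statistical signals named in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that combination strength can be approximated by pointwise mutual information (PMI) between adjacent words and that position is handled by a per-section weight. The function name `score_candidates`, the section labels, and the weight values are all hypothetical.

```python
import math
from collections import Counter


def score_candidates(sections, position_weights):
    """Score adjacent word pairs by PMI times a position weight.

    sections: list of (position_label, tokens) pairs, e.g. ("title", [...]).
    position_weights: dict mapping a position label to a weight
        (hypothetical values; the abstract gives no actual weights).
    """
    unigrams = Counter()
    bigrams = Counter()
    best_weight = {}
    for label, tokens in sections:
        weight = position_weights.get(label, 1.0)
        unigrams.update(tokens)
        for pair in zip(tokens, tokens[1:]):
            bigrams[pair] += 1
            # Keep the largest weight among the positions where the pair occurs.
            best_weight[pair] = max(best_weight.get(pair, 0.0), weight)

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    scores = {}
    for (first, second), count in bigrams.items():
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        # PMI stands in for "combination strength" (an assumption).
        pmi = math.log2(p_pair / (p_first * p_second))
        scores[(first, second)] = pmi * best_weight[(first, second)]
    return scores


if __name__ == "__main__":
    sections = [
        ("title", ["term", "extraction", "method"]),
        ("body", ["the", "term", "extraction", "method",
                  "uses", "term", "extraction"]),
    ]
    weights = {"title": 2.0, "body": 1.0}  # hypothetical weights
    ranked = sorted(score_candidates(sections, weights).items(),
                    key=lambda item: item[1], reverse=True)
    for pair, score in ranked:
        print(pair, round(score, 3))
```

Multiplying PMI by the weight is a simplification: it misbehaves when PMI is negative, and a real system would also filter candidates by linguistic rules such as part-of-speech patterns before scoring.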
Zeng Wen, Xu Shuo, Zhang Yunliang, Zhai Juanhua. The Research and Analysis on Automatic Extraction of Science and Technology Literature Terms[J]. New Technology of Library and Information Service, 2014, 30(1): 51-55.