New Technology of Library and Information Service  2015, Vol. 31 Issue (10): 88-94    DOI: 10.11925/infotech.1003-3513.2015.10.12
A Chinese Term Extraction System in New Energy Vehicles Domain
He Yu1, Lv Xueqiang1, Xu Liping2
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China;
2 Beijing Research Center of Urban System Engineering, Beijing 100089, China
Abstract  

[Objective] Chinese term extraction in the new energy vehicles domain requires a dedicated method to improve precision and recall. [Methods] This paper uses a conditional random fields (CRF) model as the extraction model and selects the word, word length, part of speech, dependency relation, dictionary position, stop words, and other characteristics as features for the feature templates. [Results] Experimental results show that precision and recall reach 93.12% and 90.47%, respectively, and that the method improves accuracy by 7.73% over the baseline. [Limitations] The method improves the accuracy of only part of the results. [Conclusions] Using dependency relations as a CRF feature can improve the precision and recall of term extraction in the new energy vehicles domain.
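
The feature set named in the abstract can be illustrated with a minimal sketch that builds one feature dictionary per token, the kind of input a CRF toolkit typically consumes. This is not the authors' implementation: the function names, the B/I/O encoding of the dictionary position, the longest-match lookup, the context window of one token, and the sample sentence with its part-of-speech and dependency labels are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical token record: (word, part-of-speech tag, dependency relation).
Token = Tuple[str, str, str]


def dictionary_positions(words: List[str], lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Mark each word as B (begins), I (inside), or O (outside) a lexicon entry.

    Assumption: greedy longest match over up to max_len consecutive words.
    """
    tags = ["O"] * len(words)
    i = 0
    while i < len(words):
        match = 0
        for n in range(min(max_len, len(words) - i), 0, -1):
            if "".join(words[i:i + n]) in lexicon:
                match = n
                break
        if match:
            tags[i] = "B"
            for j in range(i + 1, i + match):
                tags[j] = "I"
            i += match
        else:
            i += 1
    return tags


def token_features(tokens: List[Token], i: int, dict_tags: List[str],
                   stop_words: Set[str]) -> Dict[str, str]:
    """Build the feature dictionary for token i, plus a one-token context window."""
    word, pos, dep = tokens[i]
    feats = {
        "word": word,
        "length": str(len(word)),
        "pos": pos,
        "dep": dep,
        "dict": dict_tags[i],
        "stop": str(word in stop_words),
    }
    if i > 0:
        feats["prev_word"], feats["prev_pos"] = tokens[i - 1][0], tokens[i - 1][1]
    else:
        feats["BOS"] = "True"  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"], feats["next_pos"] = tokens[i + 1][0], tokens[i + 1][1]
    else:
        feats["EOS"] = "True"  # end of sentence
    return feats


def sentence_features(tokens: List[Token], lexicon: Set[str],
                      stop_words: Set[str]) -> List[Dict[str, str]]:
    """Return one feature dictionary per token of a segmented, parsed sentence."""
    dict_tags = dictionary_positions([w for w, _, _ in tokens], lexicon)
    return [token_features(tokens, i, dict_tags, stop_words) for i in range(len(tokens))]


# Illustrative input only: the tags below are made up, not taken from the paper.
tokens = [("混合", "v", "ATT"), ("动力", "n", "ATT"), ("汽车", "n", "ATT"), ("的", "u", "RAD")]
lexicon = {"混合动力汽车"}
stop_words = {"的"}
features = sentence_features(tokens, lexicon, stop_words)
```

Each dictionary in `features` would be paired with a sequence label (for example B/I/O term tags) when training a CRF; the labeling scheme itself is likewise an assumption here, since the abstract does not specify one.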

Received: 29 January 2015      Published: 06 April 2016
CLC Number: TP391.41

Cite this article:

He Yu, Lv Xueqiang, Xu Liping. A Chinese Term Extraction System in New Energy Vehicles Domain. New Technology of Library and Information Service, 2015, 31(10): 88-94.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.10.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I10/88

