A Chinese Term Extraction System in New Energy Vehicles Domain

doi:10.11925/infotech.1003-3513.2015.10.12

New Technology of Library and Information Service

2015, Vol. 31

Issue (10): 88-94 https://doi.org/10.11925/infotech.1003-3513.2015.10.12

Application Paper

A Chinese Term Extraction System in New Energy Vehicles Domain

He Yu¹, Lv Xueqiang¹, Xu Liping²

1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China;
2 Beijing Research Center of Urban System Engineering, Beijing 100089, China

Abstract

[Objective] To improve the precision and recall of Chinese term extraction in the new energy vehicles domain, this paper proposes an extraction method suited to that domain. [Methods] Building on previous work, the paper uses a conditional random fields (CRF) model as the extraction model and selects the word, word length, part of speech, dependency relation, dictionary position, stop-word status and other features as the feature templates. [Results] In the experiments, precision is 93.12% and recall is 90.47%; precision is 7.73% higher than that of the baseline method. [Limitations] The method improves precision only for relatively short terms. [Conclusions] Using the dependency relation as a feature of the CRF model can improve the precision and recall of Chinese term extraction in the new energy vehicles domain.
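
As an illustration of the feature template named in the abstract, the sketch below builds one feature dictionary per token from the word, word length, part of speech, dependency relation, dictionary position and stop-word status. It is not the authors' implementation: the tag values, the lexicon, the stop-word list, the one-token context window and the way "dictionary position" is encoded are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical token record: (word, part-of-speech tag, dependency relation label).
Token = Tuple[str, str, str]


def dictionary_position(word: str, lexicon: Set[str]) -> str:
    """Coarse dictionary-position tag (an assumed encoding, not from the paper).

    "S": the word is a whole lexicon entry
    "B": the word begins some lexicon entry
    "E": the word ends some lexicon entry
    "O": none of the above
    """
    if word in lexicon:
        return "S"
    if any(entry.startswith(word) for entry in lexicon):
        return "B"
    if any(entry.endswith(word) for entry in lexicon):
        return "E"
    return "O"


def token_features(
    sentence: List[Token], i: int, lexicon: Set[str], stop_words: Set[str]
) -> Dict[str, str]:
    """Build the feature dictionary for the token at position i."""
    word, pos, dep = sentence[i]
    features = {
        "word": word,
        "word_len": str(len(word)),
        "pos": pos,
        "dep": dep,
        "dict_pos": dictionary_position(word, lexicon),
        "is_stop": str(word in stop_words),
    }
    # One-token context window on each side (an assumed window size).
    if i > 0:
        features["prev_word"] = sentence[i - 1][0]
        features["prev_pos"] = sentence[i - 1][1]
    else:
        features["BOS"] = "1"  # beginning of sentence
    if i < len(sentence) - 1:
        features["next_word"] = sentence[i + 1][0]
        features["next_pos"] = sentence[i + 1][1]
    else:
        features["EOS"] = "1"  # end of sentence
    return features


def sentence_features(
    sentence: List[Token], lexicon: Set[str], stop_words: Set[str]
) -> List[Dict[str, str]]:
    """Build feature dictionaries for every token in a segmented sentence."""
    return [
        token_features(sentence, i, lexicon, stop_words)
        for i in range(len(sentence))
    ]


# Illustrative inputs only; the tags and lexicon entries are made up for the example.
EXAMPLE_SENTENCE: List[Token] = [
    ("混合", "v", "ATT"),
    ("动力", "n", "ATT"),
    ("汽车", "n", "SBV"),
    ("的", "u", "RAD"),
]
EXAMPLE_LEXICON: Set[str] = {"混合动力", "汽车"}
EXAMPLE_STOP_WORDS: Set[str] = {"的"}

if __name__ == "__main__":
    for feats in sentence_features(
        EXAMPLE_SENTENCE, EXAMPLE_LEXICON, EXAMPLE_STOP_WORDS
    ):
        print(feats)
```

A CRF toolkit that accepts per-token feature dictionaries could then be trained on such sequences paired with term-boundary labels. Keeping the dependency relation as its own key makes it easy to switch that feature on or off when comparing against a baseline.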

Received: 2015-01-29 Published: 2016-04-06

TP391.41

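Funding:

This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Ontology-Based Automatic Patent Indexing" (Grant No. 61271304), the Beijing Municipal Education Commission Science and Technology Development Program key project and Beijing Natural Science Foundation Class B key project "Research on Domain-Oriented Precise Search Methods for Multimodal Internet Information" (Grant No. KZ201311232037), and the Beijing Academy of Science and Technology Innovation Engineering project "Research on Methods for Comprehensive Assessment of the Impact of Transportation on the Atmospheric Environment Based on the CGE-TIMES Model" (Grant No. PXM2015_178215_000008).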

Corresponding author: He Yu, ORCID: 0002-8314-5525, E-mail: solocode@sina.com

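Author contributions: Lv Xueqiang proposed the research idea and designed the research plan; He Yu carried out the research, including data collection, experiments, and drafting the paper; Xu Liping revised the final version of the paper.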

Cite this article:

He Yu, Lv Xueqiang, Xu Liping. A Chinese Term Extraction System in New Energy Vehicles Domain. New Technology of Library and Information Service, 2015, 31(10): 88-94.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.10.12 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I10/88

