New Technology of Library and Information Service (现代图书情报技术)  2015, Vol. 31, Issue 3: 26-32    DOI: 10.11925/infotech.1003-3513.2015.03.04
Research Paper
带权复杂图模型的专利关键词标引研究
李军锋, 吕学强, 周绍钧
北京信息科技大学网络文化与数字传播北京市重点实验室 北京 100101
Patent Keyword Indexing Based on Weighted Complex Graph Model
Li Junfeng, Lv Xueqiang, Zhou Shaojun
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
Abstract

[Objective] Patent keyword indexing is a fundamental step in Chinese information processing and has high application value in patent retrieval, patent translation, and automatic patent summarization. [Methods] Patent documents are mapped onto a complex network graph model using a K-nearest-neighbor coupled graph. An average connectivity weight index is proposed that combines the change in average path length, the change in average clustering coefficient, and the current node's effect on the liquidity (flow) of the whole graph model. The position, span, and inverse document frequency of keywords are then analyzed, and a patent comprehensive correlation feature is proposed to measure keyword importance. [Results] In experiments on patent documents from the sensor domain, precision reaches 60.9% at Top-8 and recall reaches 73.4% at Top-10. [Limitations] Low-frequency keywords are not handled well, which degrades the indexing results. [Conclusions] The experimental results show that the method is effective and of positive value for patent indexing.

Key words: Complex graph model; Topology potential; Keyword indexing; Average connectivity weight; Comprehensive correlation feature
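
The abstract names the ingredients of the average connectivity weight but does not give its formula. As a rough illustration only, the Python sketch below links each word to the words within a window of k positions (one possible reading of a K-nearest-neighbor coupled graph) and scores a node by how much removing it changes the average path length and the average clustering coefficient. The function names, the window size `k`, the sample tokens, and the equal-weight sum in `removal_impact` are all assumptions made for this example, not the authors' definitions; the liquidity term and the position, span, and IDF features are omitted.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_knn_graph(tokens, k=2):
    """Link each token to the tokens up to k positions after it (undirected)."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + k + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1  # edge weight = co-occurrence count
                graph[other][word] += 1
    return {w: dict(nbrs) for w, nbrs in graph.items()}


def average_path_length(graph):
    """Mean hop distance over all reachable ordered node pairs (unweighted BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    if not graph:
        return 0.0
    coeffs = []
    for nbrs in graph.values():
        if len(nbrs) < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        coeffs.append(2 * links / (len(nbrs) * (len(nbrs) - 1)))
    return sum(coeffs) / len(coeffs)


def without_node(graph, node):
    """Copy of the graph with one node and its incident edges removed."""
    return {w: {n: c for n, c in nbrs.items() if n != node}
            for w, nbrs in graph.items() if w != node}


def removal_impact(graph, node):
    """Assumed score: |change in avg path length| + |change in avg clustering|."""
    reduced = without_node(graph, node)
    d_path = abs(average_path_length(reduced) - average_path_length(graph))
    d_clust = abs(average_clustering(reduced) - average_clustering(graph))
    return d_path + d_clust  # equal weighting is an assumption of this sketch


# Hypothetical, already-segmented tokens from a sensor patent.
tokens = ["sensor", "signal", "circuit", "sensor", "housing", "signal", "output"]
graph = build_knn_graph(tokens, k=2)
ranking = sorted(graph, key=lambda w: removal_impact(graph, w), reverse=True)
print(ranking)
```

Ranking candidates by the effect of deleting them reflects the idea that an important word holds the graph together, so its removal changes the global statistics noticeably. A fuller version would combine this score with the position, span, and inverse document frequency features described in the abstract.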
Received: 2014-08-13
CLC number: TP391.1
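Funding:

This work is one of the research outcomes of the National Natural Science Foundation of China project "Ontology-Based Automatic Patent Indexing" (No. 61271304); the Key Project of the Science and Technology Development Program of the Beijing Municipal Education Commission and Class-B Key Project of the Beijing Natural Science Foundation "Domain-Oriented Precise Search Methods for Multimodal Internet Information" (No. KZ201311232037); and the Innovation Team Building and Teacher Career Development Program for Beijing Municipal Universities project "Theoretical Foundations and Intelligent Processing Technology for Big Data Content Understanding" (No. IDHT20130519).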

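Corresponding author: Li Junfeng, ORCID: 0000-0002-6561-1043, E-mail: lijunfeng1990@live.cn
Author contributions: Lv Xueqiang proposed the research question; Zhou Shaojun collected, cleaned, and analyzed the data and designed the research plan; Li Junfeng designed the research plan, conducted the experiments, drafted the paper, and revised the final version.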
Cite this article:
Li Junfeng, Lv Xueqiang, Zhou Shaojun. Patent Keyword Indexing Based on Weighted Complex Graph Model[J]. New Technology of Library and Information Service, 2015, 31(3): 26-32. DOI: 10.11925/infotech.1003-3513.2015.03.04.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.03.04
