Study on Automatic Classification of Patents Oriented to TRIZ

doi:10.11925/infotech.1003-3513.2015.01.10

New Technology of Library and Information Service, 2015, Vol. 31, Issue 1: 66-74

Research Paper



Hu Zhengyin^1,2, Fang Shu^1, Wen Yi^1, Zhang Xian^1,2, Liang Tian^1

1. Chengdu Document and Information Center, Chinese Academy of Sciences, Chengdu 610041, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China


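Abstract

[Objective] To study an automatic patent classification method for TRIZ applications by constructing a personalized classification system. [Methods] A personalized, TRIZ-oriented classification system is constructed at the macro, meso, and micro levels using a topic model. Different combinations of classification features and algorithms are compared, and the combination with the highest classification accuracy is used to build the initial classifier. The classifier is then optimized by smoothing the imbalanced data and reducing the dimensionality of the feature set, after which the patents are classified automatically. [Results] The approach semi-automatically constructs a TRIZ-oriented personalized classification system and automatically classifies patents against it. On a medium-sized data set (thousands of patents), the combined evaluation metric (F-measure) reaches 90.2%. [Limitations] The method is not suitable for small data sets (hundreds of patents), and its effectiveness on larger data sets (tens of thousands of patents) has not yet been verified. [Conclusions] For medium-sized patent data sets, the method can quickly construct a TRIZ-oriented classification system and classify patents automatically.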
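The selection and optimization steps in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not name the feature sets, classifiers, imbalance-smoothing method, or dimension-reduction method, so random oversampling and document-frequency filtering are assumed stand-ins, macro averaging is an assumed variant of the F-measure, and every function name and class label here is hypothetical. Following the abstract, the candidate combination is chosen by accuracy, while the F-measure is used for the final evaluation.

```python
import random
from collections import Counter, defaultdict


def df_filter(tokenized_docs, min_df=2):
    """Assumed feature reduction: keep only terms that occur in at least
    `min_df` documents, dropping rare terms from every document."""
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each term once per document
    vocab = {term for term, n in df.items() if n >= min_df}
    return [[t for t in tokens if t in vocab] for tokens in tokenized_docs]


def random_oversample(docs, labels, seed=0):
    """Assumed imbalance smoothing: resample minority-class documents with
    replacement until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    target = max(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_docs.extend(items + extra)
        out_labels.extend([label] * target)
    return out_docs, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true, y_pred):
    """Per-class F1 (harmonic mean of precision and recall), averaged
    with equal weight over all classes."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pick_best_combination(predictions, y_true):
    """Return the (feature set, classifier) key whose held-out predictions
    have the highest accuracy; ties go to the first key."""
    return max(predictions, key=lambda key: accuracy(y_true, predictions[key]))
```

For example, with held-out labels `["A", "A", "B", "B"]`, `pick_best_combination` returns the key of whichever candidate prediction list has the higher accuracy, and `macro_f_measure(["A", "A", "B", "B"], ["A", "B", "B", "B"])` gives (2/3 + 4/5) / 2 ≈ 0.733. Oversampling is applied only to training data, so the held-out evaluation still reflects the original class distribution.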


Keywords: Theory of Inventive Problem Solving (TRIZ), topic model, patent classification, personalized classification system


Received: 2014-07-23; Published: 2015-02-12

CLC number: G353.1

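Funding:

This work is one of the research outcomes of the Chinese Academy of Sciences Intellectual Property Special Work Project "Intellectual Property Information Service of the Chinese Academy of Sciences" (Project No. KFJ-EW-STS-032) and the Chinese Academy of Sciences "West Light" program project "Research and Practice of an Ontology-based Technology Mining System for Patent Documents".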

Corresponding author: Hu Zhengyin, ORCID: 0000-0002-5699-9891, E-mail: huzy@clas.ac.cn

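Author contributions: Hu Zhengyin: literature review, empirical analysis, and manuscript writing; Fang Shu: proposing and designing the research question, and revising the manuscript; Wen Yi: application of the LDA topic model; Zhang Xian: construction of the domain vocabulary and of the TRIZ-oriented classification system; Liang Tian: SAO data cleaning and classification data processing.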

Cite this article:

Hu Zhengyin, Fang Shu, Wen Yi, Zhang Xian, Liang Tian. Study on Automatic Classification of Patents Oriented to TRIZ [J]. New Technology of Library and Information Service, 2015, 31(1): 66-74.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.01.10 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I1/66


