Study on Automatic Classification of Patents Oriented to TRIZ
Hu Zhengyin1,2, Fang Shu1, Wen Yi1, Zhang Xian1,2, Liang Tian1
1. Chengdu Document and Information Center, Chinese Academy of Sciences, Chengdu 610041, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
[Objective] This paper proposes an approach for automatically classifying patents for TRIZ applications, based on a personalized classification system. [Methods] A personalized classification system is constructed at the micro, macro, and meso levels using a topic model. Appropriate features and a classifier are then selected to classify the patents preliminarily. The classifier is optimized by smoothing imbalanced data and reducing feature dimensionality. [Results] The approach supports semi-automatic construction of a personalized classification system and automatic classification of patents for TRIZ applications. On a medium-sized data set, it classifies patents with an F-measure of 90.2%. [Limitations] The approach is not applicable to small data sets and has not been verified on large data sets. [Conclusions] The proposed approach can classify patents for TRIZ applications on medium-sized data sets.
胡正银, 方曙, 文奕, 张娴, 梁田. 面向TRIZ的专利自动分类研究[J]. 现代图书情报技术, 2015, 31(1): 66-74.
Hu Zhengyin, Fang Shu, Wen Yi, Zhang Xian, Liang Tian. Study on Automatic Classification of Patents Oriented to TRIZ. New Technology of Library and Information Service, 2015, 31(1): 66-74.
[1] Kaplan S. An Introduction to TRIZ: The Russian Theory of Inventive Problem Solving [EB/OL]. [2013-07-02]. http://www.trizasia.com/FileStorage/6341665956857300352005-Intro_to_TRIZ%20--%20for%20printer.pdf.
[2] Loh H T, He C, Shen L. Automatic Classification of Patent Documents for TRIZ Users [J]. World Patent Information, 2006, 28(1): 6-13.
[3] Hu Z Y, Fang S, Liang T. Automatic Patent Classification Oriented to Problems & Solutions [C]. In: Proceedings of Conference on Artificial Intelligence and Data Mining (AIDM'13), Sanya, China. 2013: 22-24.
[4] 胡正银, 方曙. 专利文本技术挖掘研究进展综述[J]. 现代图书情报技术, 2014(6): 62-70. (Hu Zhengyin, Fang Shu. Review on Text-based Patent Technology Mining [J]. New Technology of Library and Information Service, 2014(6): 62-70.)
[5] WIPO. International Patent Classification (Version 2014) [EB/OL]. [2014-06-01]. http://www.wipo.int/export/sites/www/classifications/ipc/en/guide/guide_ipc.pdf.
[6] He C, Loh H T. Pattern-oriented Associative Rule-based Patent Classification [J]. Expert Systems with Applications, 2010, 37(3): 2395-2404.
[7] 梁艳红, 檀润华, 马建红. 面向产品创新设计的专利文本分类研究[J]. 计算机集成制造系统, 2013, 19(2): 382-390. (Liang Yanhong, Tan Runhua, Ma Jianhong. Study on Patent Text Classification for Product Innovative Design [J]. Computer Integrated Manufacturing Systems, 2013, 19(2): 382-390.)
[8] Teichert T, Mittermayer M A. Text Mining for Technology Monitoring [C]. In: Proceedings of 2002 IEEE International Engineering Management (IEMC'02). IEEE, 2002: 596-601.
[9] Hu Z, Fang S, Liang T. Empirical Study of Constructing a Knowledge Organization System of Patent Documents Using Topic Modeling [J]. Scientometrics, 2014, 100(3): 787-799.
[10] Blei D M. Probabilistic Topic Models [EB/OL]. [2013-06-12]. https://www.cs.princeton.edu/~blei/kdd-tutorial.pdf.
[11] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation [J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[12] Yates A, Cafarella M, Banko M, et al. TextRunner: Open Information Extraction on the Web [C]. In: Proceedings of NAACL-Demonstrations '07 of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Association for Computational Linguistics, 2007: 25-26.
[13] Fader A, Soderland S, Etzioni O. Identifying Relations for Open Information Extraction [EB/OL]. [2013-03-02]. http://ai.cs.washington.edu/www/media/papers/reverb.pdf.
[14] Zhang Y, Porter A L, Hu Z, et al. "Term Clumping" for Technical Intelligence: A Case Study on Dye-sensitized Solar Cells [J]. Technological Forecasting and Social Change, 2014, 85: 26-39.
[15] Thomson Reuters. Thomson Data Analyzer [EB/OL]. [2013-03-03]. http://ip-science.thomsonreuters.com.cn/media/tda.pdf.
[16] The Stanford Natural Language Processing Group. Research [EB/OL]. [2013-03-03]. http://www-nlp.stanford.edu/research.shtml.
[17] Mimno D. Machine Learning with MALLET [EB/OL]. [2013-03-03]. http://mallet.cs.umass.edu/mallet-tutorial.pdf.
[18] 杨建武. 文本自动分类技术 [EB/OL]. [2013-06-13]. http://www.icst.pku.edu.cn/course/mining/11-12spring/TextMining04-%E5%88%86%E7%B1%BB.pdf. (Yang Jianwu. Review on Text Classification [EB/OL]. [2013-06-13]. http://www.icst.pku.edu.cn/course/mining/11-12spring/TextMining04-%E5%88%86%E7%B1%BB.pdf.)
[19] 钱洪波, 贺广南. 非平衡类数据分类概述[J]. 计算机工程与科学, 2010, 32(5): 85-88. (Qian Hongbo, He Guangnan. A Survey of Class-imbalanced Data Classification [J]. Computer Engineering & Science, 2010, 32(5): 85-88.)
[20] Powers D M W. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation [EB/OL]. [2013-03-03]. http://www.infoeng.flinders.edu.au/research/techreps/SIE07001.pdf.