|
|
Study on Automatic Classification of Patents Oriented to TRIZ |
Hu Zhengyin1,2, Fang Shu1, Wen Yi1, Zhang Xian1,2, Liang Tian1 |
1. Chengdu Document and Information Center, Chinese Academy of Sciences, Chengdu 610041, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China |
|
|
Abstract [Objective] This paper proposes an approach to automatically classify patents oriented to TRIZ applications based on a personalized classification system. [Methods] A personalized classification system is constructed in micro-macro-meso levels using topic model. Then, an appropriate feature and classifier are chosen to preliminarily classify patents. The classifier is optimized by smoothing unbalance data and reducing features dimensions. [Results] This approach implements semi-automatically constructing a personalized classification and automatically classifying patents oriented to TRIZ applications. In medium data size, this approach can classify patents with F-measure value of 90.2%. [Limitations] This approach is not available in small size data set and not verified in big size data set. [Conclusions] This paper can classify patents oriented to TRIZ applications in medium data size.
|
Received: 23 July 2014
Published: 12 February 2015
|
|
[1] Kaplan S. An Introduction to TRIZ: The Russian Theory of Inventive Problem Solving [EB/OL]. [2013-07-02]. http://www. trizasia.com/FileStorage/6341665956857300352005-Intro_to_TRIZ%20--%20for%20printer.pdf.
[2] Loh H T, He C, Shen L. Automatic Classification of Patent Documents for TRIZ Users [J]. World Patent Information, 2006, 28(1): 6-13.
[3] Hu Z Y, Fang S, Liang T. Automatic Patent Classification Oriented to Problems & Solutions [C]. In: Proceedings of Conference on Artificial Intelligence and Data Mining (AIDM'13), Sanya, China. 2013: 22-24.
[4] 胡正银, 方曙. 专利文本技术挖掘研究进展综述[J]. 现代图书情报技术, 2014(6): 62-70. (Hu Zhengyin, Fang Shu. Review on Text-based Patent Technology Mining [J]. New Technology of Library and Information Service, 2014(6): 62-70.)
[5] WIPO. International Patent Classification (Version 2014) [EB/OL]. [2014-06-01]. http://www.wipo.int/export/sites/www/classifications/ipc/en/guide/guide_ipc.pdf.
[6] He C, Loh H T. Pattern-oriented Associative Rule-based Patent Classification [J]. Expert Systems with Applications, 2010, 37(3): 2395-2404.
[7] 梁艳红, 檀润华, 马建红. 面向产品创新设计的专利文本分类研究[J]. 计算机集成制造系统, 2013, 19(2): 382-390. (Liang Yanhong, Tan Runhua, Ma Jianhong. Study on Patent Text Classification for Product Innovative Design [J]. Computer Integrated Manufacturing Systems, 2013, 19(2): 382-390.)
[8] Teichert T, Mittermayer M A. Text Mining for Technology Monitoring [C]. In: Proceedings of 2002 IEEE International Engineering Management (IEMC'02). IEEE, 2002: 596-601.
[9] Hu Z, Fang S, Liang T. Empirical Study of Constructing a Knowledge Organization System of Patent Documents Using Topic Modeling [J]. Scientometrics, 2014, 100 (3): 787-799.
[10] Blei D M. Probabilistic Topic Models [EB/OL]. [2013-06-12]. https://www.cs.princeton.edu/~blei/kdd-tutorial.pdf.
[11] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation [J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[12] Yates A, Cafarella M, Banko M, et al. TextRunner: Open Information Extraction on the Web [C]. In: Proceedings of NAACL-Demonstrations '07 of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Association for Computational Linguistics, 2007: 25-26.
[13] Fader A, Soderland S, Etzioni O. Identifying Relations for Open Information Extraction [EB/OL]. [2013-03-02]. http://ai.cs.washington.edu/www/media/papers/reverb.pdf.
[14] Zhang Y, Porter A L, Hu Z, et al. "Term Clumping" for Technical Intelligence: A Case Study on Dye-sensitized Solar Cells [J]. Technological Forecasting and Social Change, 2014, 85: 26-39.
[15] Thomson Reuters. Thomson Data Analyzer [EB/OL]. [2013-03-03]. http://ip-science.thomsonreuters.com.cn/media/tda.pdf.
[16] The Stanford Natural Language Processing Group. Research [EB/OL]. [2013-03-03]. http://www-nlp.stanford.edu/research. shtml.
[17] Mimno D. Machine Learning with MALLET [EB/OL]. [2013-03- 03]. http://mallet.cs.umass.edu/mallet-tutorial.pdf.
[18] 杨建武. 文本自动分类技术 [EB/OL]. [2013-06-13]. http://www. icst.pku.edu.cn/course/mining/11-12spring/TextMining04-%E5%88%86%E7%B1%BB.pdf. (Yang Jianwu. Review on Text Classification [EB/OL]. [2013-06-13]. http://www.icst.pku.edu.cn/course/mining/11-12spring/TextMining04-%E5%88%86%E7%B1%BB.pdf. )
[19] 钱洪波, 贺广南. 非平衡类数据分类概述[J]. 计算机工程与科学, 2010, 32(5): 85-88. (Qian Hongbo, He Guangnan. A Survey of Class-imbalanced Data Classification [J]. Computer Engineering & Science, 2010, 32(5): 85-88.)
[20] Powers D M W. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation [EB/OL]. [2013-03-03]. http://www.infoeng.flinders.edu.au/research/techreps/SIE07001.pdf. |
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|