New Technology of Library and Information Service, 2013, Vol. 29, Issue (3): 38-44. DOI: 10.11925/infotech.1003-3513.2013.03.07
Fundamental Research Questions in Patent Text Categorization
Qu Peng, Wang Huilin
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract  This paper addresses several fundamental questions in patent text categorization: whether terms are feasible as features for automatic categorization, how patent claims can be categorized, and how classes with closely related topics affect categorization results. Experiments are run with two Naive Bayes classifiers, kNN, Rocchio, and SVM classifiers, using cross validation for evaluation. The results show that, under the same settings, terms perform better than common features; that training a classifier on abstracts improves claim categorization; and that classes with closely related topics lead to low precision, so a hierarchical classifier design is necessary. The paper provides baseline data for patent text categorization and can serve as a reference for information analysis and other patent-based applications.
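To make the experimental setup concrete, here is a minimal Python sketch of one of the compared classifiers, Rocchio in its simplified centroid form (no negative-example term), evaluated with k-fold cross validation on either single-word features or multi-word term features. Everything here is an assumption for illustration: the toy documents, labels, and term list are hypothetical, the functions (`word_features`, `term_features`, `train_rocchio`, `cross_validate`) are not taken from the paper, and the abstract does not specify the paper's feature weighting, term extraction method, or number of folds.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: patent-abstract-like snippets with made-up labels.
DOCS = [
    "a method for packet routing in a wireless network",
    "a pharmaceutical composition comprising an active ingredient",
    "apparatus for packet routing over an optical network",
    "tablet formulation with an active ingredient and a carrier",
    "wireless network node that performs packet routing",
    "pharmaceutical composition for oral delivery of an active ingredient",
]
LABELS = ["net", "pharma", "net", "pharma", "net", "pharma"]
# Hypothetical term list, standing in for the output of a term extractor.
TERMS = ["packet routing", "wireless network",
         "active ingredient", "pharmaceutical composition"]


def word_features(text):
    """Baseline features: raw counts of lowercase single words."""
    return Counter(text.lower().split())


def term_features(text, terms):
    """Term features: counts of each (possibly multi-word) term in the text."""
    tokens = text.lower().split()
    counts = Counter()
    for term in terms:
        parts = term.split()
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] == parts:
                counts[term] += 1
    return counts


def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def train_rocchio(vectors, labels):
    """Centroid form of Rocchio: one normalized mean vector per class."""
    sums = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        for k, v in normalize(vec).items():
            sums[label][k] += v
    return {label: normalize(s) for label, s in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is most cosine-similar to vec."""
    unit = normalize(vec)
    return max(centroids, key=lambda c: cosine(unit, centroids[c]))


def cross_validate(docs, labels, featurize, k=3):
    """k-fold cross validation (fold = index mod k); returns accuracy."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        model = train_rocchio([featurize(docs[i]) for i in train],
                              [labels[i] for i in train])
        correct += sum(predict(model, featurize(docs[i])) == labels[i]
                       for i in test)
    return correct / len(docs)


if __name__ == "__main__":
    print("words:", cross_validate(DOCS, LABELS, word_features))
    print("terms:", cross_validate(DOCS, LABELS,
                                   lambda d: term_features(d, TERMS)))
```

Swapping only the `featurize` argument is one way a "terms versus common features" comparison could be run under otherwise identical settings. A toy corpus this small says nothing about whether the paper's finding holds on real patent data.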
Key words: Patent; Text categorization; Text mining
Received: 08 March 2013      Published: 14 May 2013
CLC Number: G353.1

Cite this article:

Qu Peng, Wang Huilin. Fundamental Research Questions in Patent Text Categorization. New Technology of Library and Information Service, 2013, 29(3): 38-44.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.03.07     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I3/38

References
[1] 李程雄, 丁月华, 文贵华. SVM-KNN组合改进算法在专利文本分类中的应用[J]. 计算机工程与应用, 2006, 42(20): 193-195. (Li Chengxiong, Ding Yuehua, Wen Guihua. Application of SVM-kNN Combination Improvement Algorithm on Patent Text Classification[J]. Computer Engineering and Applications, 2006, 42(20): 193-195.)
[2] 丁月华, 文贵华, 郭炜强. 基于核向量空间模型的专利分类[J]. 华南理工大学学报:自然科学版, 2005, 33(8): 58-61. (Ding Yuehua, Wen Guihua, Guo Weiqiang. Patent Categorization Based on Kernel Vector Space Model[J]. Journal of South China University of Technology: Natural Science Edition, 2005, 33(8): 58-61.)
[3] 郭炜强, 文军, 文贵华. 基于贝叶斯模型的专利分类[J]. 计算机工程与设计, 2005, 26(8): 1986-1987, 1996. (Guo Weiqiang, Wen Jun, Wen Guihua. Patent Categorization Based on Bayes Model[J]. Computer Engineering and Design, 2005, 26(8): 1986-1987, 1996.)
[4] 蒋健安, 陆介平, 倪巍伟, 等. 一种面向专利文献数据的文本自动分类方法[J]. 计算机应用, 2008, 28(1): 159-161. (Jiang Jian’an, Lu Jieping, Ni Weiwei, et al. Automatic Text Categorization for Patent Data[J]. Journal of Computer Applications, 2008, 28(1): 159-161.)
[5] 李生珍, 王建新, 齐建东, 等. 基于BP神经网络的专利自动分类法[J]. 计算机工程与设计, 2010, 31(23): 5075-5078. (Li Shengzhen, Wang Jianxin, Qi Jiandong, et al. Automated Categorization of Patent Based on Back-propagation Network[J]. Computer Engineering and Design, 2010, 31(23): 5075-5078.)
[6] 季铎, 蔡云雷, 蔡东风, 等. 基于共享最近邻的专利自动分类技术研究[J]. 沈阳航空工业学院学报, 2010, 27(4): 41-46. (Ji Duo, Cai Yunlei, Cai Dongfeng, et al. Patent Automatic Classification Research Based on Shared Nearest Neighbor[J]. Journal of Shenyang Institute of Aeronautical Engineering, 2010, 27(4): 41-46.)
[7] 褚晓雷. 基于机器学习的专利分类研究[D]. 上海: 上海交通大学, 2008. (Chu Xiaolei. Machine Learning Based Patent Categorization[D]. Shanghai: Shanghai Jiaotong University, 2008.)
[8] 叶志飞. 并行化最小最大模块化支撑向量机及其在专利分类中的应用[D]. 上海: 上海交通大学, 2009. (Ye Zhifei. Parallel Min-Max Modular Support Vector Machine with Application to Patent Classification[D]. Shanghai: Shanghai Jiaotong University, 2009.)
[9] Li Y Y, Bontcheva K, Cunningham H. SVM Based Learning System for F-term Patent Classification[C]. In: Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access. 2007.
[10] Fall C J, Törcsvári A, Benzineb K, et al. Automated Categorization in the International Patent Classification[J/OL]. ACM SIGIR Forum, 2003, 37(1): 10-25. [2013-03-07]. http://www.sigir.org/forum/S2003/CJF_Manuscript_sigir.pdf.
[11] Lai K K, Wu S J. Using the Patent Co-citation Approach to Establish a New Patent Classification System[J]. Information Processing and Management, 2005, 41(2): 313-330.
[12] Li X, Chen H, Zhang Z, et al. Automatic Patent Classification Using Citation Network Information: An Experimental Study in Nanotechnology[C]. In: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries. New York: ACM, 2007: 419-427.
[13] Porter M. The Porter Stemming Algorithm[EB/OL]. (2006-01-01). [2013-03-07]. http://tartarus.org/~martin/PorterStemmer/.
[14] 屈鹏, 王惠临. 面向信息分析的专利术语抽取研究[J]. 图书情报工作, 2013, 57(1): 130-135. (Qu Peng, Wang Huilin. Patent Term Extraction for Information Analysis[J]. Library and Information Services, 2013, 57(1): 130-135.)
[15] Joachims T. Making Large-scale SVM Learning Practical[A] // Schölkopf B, Burges C, Smola A, eds. Advances in Kernel Methods: Support Vector Learning[M]. Cambridge, MA: MIT Press, 1999.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn