New Technology of Library and Information Service  2013, Vol. 29 Issue (3): 38-44    DOI: 10.11925/infotech.1003-3513.2013.03.07
Fundamental Research Questions in Patent Text Categorization
Qu Peng, Wang Huilin
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract  This paper addresses several fundamental problems in patent text categorization: the feasibility of using terms as features for automatic categorization, the categorization of claims, and the effect of classes with closely related topics on categorization results. Experiments are carried out with two Naive Bayes classifiers, a kNN classifier, a Rocchio classifier, and an SVM classifier, and cross-validation is used for testing. The results show that terms outperform common features under the same settings, that training a classifier on abstracts can improve claim categorization, and that classes with closely related topics lead to low precision, which makes a hierarchical classifier design necessary. The paper provides fundamental data for patent text categorization and can serve as a reference for information analysis and other applications that use patents.
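The evaluation setup named in the abstract (a centroid-based Rocchio classifier tested by cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy snippets in `TEXTS`, the class names in `LABELS`, the plain term-frequency weighting, the cosine similarity, and the helper names `tf_vector`, `train_rocchio`, `predict_rocchio`, and `cross_validate`.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; the paper's patent data is not reproduced here.
TEXTS = [
    "engine piston cylinder fuel",
    "antenna signal receiver frequency",
    "piston engine valve fuel",
    "signal antenna transmitter frequency",
    "cylinder valve engine combustion",
    "receiver transmitter signal modulation",
]
LABELS = ["mechanical", "electrical"] * 3


def tf_vector(text):
    """Bag-of-words term-frequency vector, L2-normalised (assumed weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def train_rocchio(vectors, labels):
    """Compute one centroid (mean vector) per class."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    return {
        label: {term: w / sizes[label] for term, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rocchio(centroids, vec):
    """Assign the class whose centroid is most similar to the document."""
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


def cross_validate(texts, labels, k):
    """k-fold cross-validation; document i goes to fold i % k. Returns accuracy."""
    vectors = [tf_vector(t) for t in texts]
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        test = [i for i in range(len(texts)) if i % k == fold]
        centroids = train_rocchio(
            [vectors[i] for i in train], [labels[i] for i in train]
        )
        correct += sum(
            predict_rocchio(centroids, vectors[i]) == labels[i] for i in test
        )
    return correct / len(texts)


accuracy = cross_validate(TEXTS, LABELS, k=3)
print(f"3-fold accuracy on the toy corpus: {accuracy:.2f}")
```

Each document is held out exactly once, so the reported accuracy reflects only predictions on text the classifier did not see during training. The same loop could wrap any of the other classifiers named in the abstract by swapping the train and predict functions.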
Key words: Patent, Text categorization, Text mining
Received: 08 March 2013      Published: 14 May 2013
CLC Number: G353.1

Cite this article:

Qu Peng, Wang Huilin. Fundamental Research Questions in Patent Text Categorization. New Technology of Library and Information Service, 2013, 29(3): 38-44.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.03.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I3/38
