Fundamental Research Questions in Patent Text Categorization
Qu Peng, Wang Huilin
Institute of Scientific & Technical Information of China, Beijing 100038, China

Abstract This paper addresses several fundamental problems in patent text categorization: the feasibility of using terms as features for automatic categorization, the categorization of claims, and the effect of classes with closely related topics on categorization results. Experiments are run with two Naive Bayes classifiers and with kNN, Rocchio, and SVM classifiers, and are evaluated with cross validation. The results show that terms outperform common features under the same settings, that training a classifier on abstracts can improve claim categorization, and that classes with closely related topics lead to low precision, which makes a hierarchical classifier design necessary. The paper provides fundamental data for patent text categorization and can serve as a reference for information analysis and other applications that use patents.
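The evaluation setup named in the abstract (a Rocchio classifier tested with cross validation) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, the number of folds, and the toy documents and labels are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math


def vectorize(text):
    """Bag-of-words term-frequency vector, L2-normalized (assumed weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def train_rocchio(docs, labels):
    """Build one centroid per class by summing the unit vectors of its documents."""
    centroids = defaultdict(Counter)
    for text, label in zip(docs, labels):
        for term, weight in vectorize(text).items():
            centroids[label][term] += weight
    return centroids


def cosine(vec, centroid):
    """Cosine similarity between a unit-length document vector and a centroid."""
    dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
    norm = math.sqrt(sum(v * v for v in centroid.values()))
    return dot / norm if norm else 0.0


def predict(centroids, text):
    """Assign the class whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))


def cross_validate(docs, labels, k):
    """k-fold cross validation; returns the mean accuracy over the k folds."""
    accuracies = []
    for fold in range(k):
        # Every k-th document, starting at index `fold`, is held out for testing.
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [l for i, l in enumerate(labels) if i not in test_idx]
        model = train_rocchio(train_docs, train_labels)
        correct = sum(predict(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical toy corpus standing in for patent abstracts or claims.
TOY_DOCS = [
    "engine piston cylinder", "battery cell electrode",
    "engine valve piston", "battery electrode anode",
    "cylinder valve engine", "cell anode battery",
]
TOY_LABELS = ["mech", "elec", "mech", "elec", "mech", "elec"]

accuracy = cross_validate(TOY_DOCS, TOY_LABELS, k=3)
```

The same `cross_validate` loop could wrap any of the other classifiers mentioned in the abstract by swapping out `train_rocchio` and `predict`, so that different feature sets are compared under identical folds.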
Received: 08 March 2013
Published: 14 May 2013