New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 101-106    DOI: 10.11925/infotech.1003-3513.2013.07-08.15
Research on Chinese Patent Automatic Classification Method Based on Statistical Distribution
Hu Bing1, Zhang Jianli2
1. School of Economics & Management, Xidian University, Xi'an 710071, China;
2. Electronic Technology Information Research Institute, Ministry of Industry and Information Technology of the People's Republic of China, Beijing 100043, China
Abstract  Traditional automatic text classification algorithms based on the Vector Space Model do not take into account the distribution of terms among classes or the position information of terms within a class, which leads to poor performance in patent classification. This paper proposes a Chinese patent automatic classification method based on statistical distribution. First, a distribution information weighting factor is introduced to raise the weight of terms that appear frequently but in few classes. Then, drawing on the structural features of patent text, a position information weighting factor is introduced to highlight the legal and technical characteristics of patents and the differences in content among a patent's elements. Finally, a comparative experiment shows that the proposed method improves classification results.
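As a rough illustration of the two weighting factors named in the abstract, the sketch below multiplies a position-weighted term frequency by an IDF term and a class-distribution factor. The formulas are assumptions made for illustration only: the abstract does not give the paper's actual equations, so `SECTION_WEIGHTS`, `distribution_factor`, and the `log(1 + n / k)` form are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical per-section weights (assumption): the abstract does not
# state which values the paper assigns to each part of a patent.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "claims": 2.0, "description": 1.0}


def weighted_term_frequency(doc):
    """Count term occurrences, scaling each one by its section's weight."""
    tf = Counter()
    for section, tokens in doc["sections"].items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for token in tokens:
            tf[token] += weight
    return tf


def distribution_factor(term, term_classes, n_classes):
    """Assumed form: grows as the term appears in fewer classes."""
    k = len(term_classes.get(term, ()))
    return math.log(1.0 + n_classes / k) if k else 0.0


def term_weights(docs):
    """Return one {term: weight} dict per document."""
    n_docs = len(docs)
    n_classes = len({d["label"] for d in docs})
    doc_freq = Counter()
    term_classes = defaultdict(set)
    for d in docs:
        vocab = {t for tokens in d["sections"].values() for t in tokens}
        for t in vocab:
            doc_freq[t] += 1
            term_classes[t].add(d["label"])

    result = []
    for d in docs:
        weights = {}
        for term, freq in weighted_term_frequency(d).items():
            idf = math.log(1.0 + n_docs / doc_freq[term])
            weights[term] = freq * idf * distribution_factor(term, term_classes, n_classes)
        result.append(weights)
    return result
```

Each document is assumed to be a dict with a class `label` and a `sections` mapping from section name to a list of already segmented tokens. Under these assumptions, a term that occurs in a patent's title and in only one class receives a higher weight than a term spread across every class.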
Key words: Statistical distribution; Patent automatic classification; Weighting factor
Received: 27 March 2013      Published: 02 September 2013



Cite this article:

Hu Bing, Zhang Jianli. Research on Chinese Patent Automatic Classification Method Based on Statistical Distribution. New Technology of Library and Information Service, 2013, 29(7/8): 101-106.

