New Technology of Library and Information Service, 2013, Vol. 29, Issue (7/8): 101-106    DOI: 10.11925/infotech.1003-3513.2013.07-08.15
Information Analysis and Research
Research on Chinese Patent Automatic Classification Method Based on Statistical Distribution
Hu Bing (1), Zhang Jianli (2)
1. School of Economics & Management, Xidian University, Xi'an 710071, China;
2. Electronic Technology Information Research Institute, Ministry of Industry and Information Technology of the People's Republic of China, Beijing 100043, China
Abstract: Traditional automatic text classification algorithms based on the vector space model ignore both how feature terms are distributed across classes and where they occur within a document, so they perform poorly on patent classification. This paper proposes a Chinese patent automatic classification method based on statistical distribution. First, the inter-class distribution of each feature term is computed and an inter-class dispersion weighting factor is introduced to raise the weight of terms that occur frequently but in few classes. Second, drawing on the structure of patent documents, a position weighting factor is introduced to reflect the legal and technical characteristics of patents and the differences in content among the elements that make up a patent. Finally, comparative experiments show that the method effectively improves Chinese patent automatic classification.
Keywords: Statistical distribution; Patent automatic classification; Weighting factor
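The abstract names two weighting factors but this page does not give their formulas. The following Python sketch only illustrates the general idea and should not be read as the authors' method. It makes three assumptions: the inter-class factor is approximated by an entropy-based concentration score, the section weights in `SECTION_WEIGHTS` are invented placeholder values, and the patent text has already been segmented into tokens. All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical position weights per patent section (values are placeholders,
# not taken from the paper).
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "claims": 2.0, "description": 1.0}


def class_dispersion_factor(term, class_term_counts):
    """Return a factor in [1, 2]: 2 when a term occurs in a single class,
    1 when it is spread evenly over all classes (entropy-based assumption)."""
    counts = [c.get(term, 0) for c in class_term_counts.values()]
    total = sum(counts)
    n_classes = len(counts)
    if total == 0 or n_classes < 2:
        return 1.0  # neutral factor for unseen terms or a single class
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 + (1.0 - entropy / math.log(n_classes))


def term_weights(doc, docs, class_term_counts):
    """Weight each term of `doc` by position-weighted TF * smoothed IDF * class factor.

    doc: dict mapping section name -> list of tokens.
    docs: list of such dicts (the corpus, including `doc`).
    """
    weighted_tf = Counter()
    for section, tokens in doc.items():
        w = SECTION_WEIGHTS.get(section, 1.0)
        for tok in tokens:
            weighted_tf[tok] += w  # occurrences in important sections count more
    n_docs = len(docs)
    weights = {}
    for term, tf in weighted_tf.items():
        df = sum(1 for d in docs if any(term in toks for toks in d.values()))
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        weights[term] = tf * idf * class_dispersion_factor(term, class_term_counts)
    return weights


# Toy data: "antenna" is confined to class A, "method" is spread over both classes.
class_term_counts = {
    "A": Counter({"antenna": 5, "method": 3}),
    "B": Counter({"method": 3}),
}
corpus = [
    {"title": ["antenna"], "description": ["method", "antenna"]},
    {"title": ["method"], "description": ["method"]},
]
weights = term_weights(corpus[0], corpus, class_term_counts)
```

In this toy corpus the class-specific term "antenna" ends up with a larger weight than the generic term "method", which is the qualitative behavior the abstract describes for terms that occur frequently but in few classes.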
Received: 2013-03-27
CLC number: TP391.1
Corresponding author: Hu Bing     E-mail: hubing_mafia@sina.com
Cite this article:
Hu Bing, Zhang Jianli. Research on Chinese Patent Automatic Classification Method Based on Statistical Distribution[J]. New Technology of Library and Information Service, 2013, 29(7/8): 101-106. DOI: 10.11925/infotech.1003-3513.2013.07-08.15.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.07-08.15