[1] 魏大威, 刘金哲, 薛尧予. 以数字图书馆推广工程为抓手, 构建覆盖全国的数字图书馆服务体系[J]. 国家图书馆学刊, 2012, 21(5): 14-19. (Wei Dawei, Liu Jinzhe, Xue Yaoyu. Using the Digital Library Promotion Project as a Driver, Construct a Country-Wide Digital Library Service Architecture[J]. Journal of the National Library of China, 2012, 21(5): 14-19.)
[2] 王军. 数字图书馆的知识组织系统: 从理论到实践[M]. 北京: 北京大学出版社, 2008. (Wang Jun.The Knowledge Organization System in Digital Library——From Theory to Practice[M]. Beijing: Peking University Press, 2008.)
[3] Wang J. An Extensive Study on Automated Dewey Decimal Classification[J]. Journal of the American Society for Information Science & Technology, 2009, 60(11): 2269-2286.
[4] 肖雪, 何中市. 基于向量空间模型的中文文本层次分类方法研究[J]. 计算机应用, 2006, 26(5): 1125-1126, 1133. (Xiao Xue, He Zhongshi. Hierarchical Categorization Methods of Chinese Text Based on Vector Space Model[J]. Computer Applications, 2006, 26(5): 1125-1126, 1133.)
[5] 何琳, 侯汉清, 白振田, 等. 基于标引经验和机器学习相结合的多层自动分类[J]. 情报学报, 2006, 25(6): 725-729. (He Lin, Hou Hanqing, Bai Zhentian, et al. Automatic Multi- Layer Classification Method Based on Integration of Machine Learning and Indexing Experience[J]. Journal of the China Society for Scientific and Technical Information, 2006, 25 (6): 725-729.)
[6] 张启蕊, 张凌, 董守斌, 等. 训练集类别分布对文本分类的影响[J]. 清华大学学报: 自然科学版, 2005, 45(S1): 1802-1805. (Zhang Qirui, Zhang Ling, Dong Shoubin, et al. Effects of Category Distribution in a Training Set on Text Categorization[J]. Journal of Tsinghua University: Science and Technology, 2005, 45(S1): 1802-1805.)
[7] 肖希明, 郑燃. 国外图书馆、档案馆和博物馆数字资源整合研究进展[J]. 中国图书馆学报, 2012, 38(3): 26-39. (Xiao Ximing, Zheng Ran. Research Progress on Digital Resources Convergence of Libraries, Archives and Museums in Foreign Countries[J]. Journal of Library Science in China, 2012, 38(3): 26-39.)
[8] 林琛, 李弼程, 周杰. 基于信息粒度的交叠类文本分类方法[J]. 情报学报, 2011, 30(4): 339-346. (Lin Chen, Li Bicheng, Zhou Jie. A Text Categorization Method for Overlapping Classes Based on Information Granularity[J]. Journal of the China Society for Scientific and Technical Information, 2011, 30(4): 339-346.)
[9] García V, Alejo R, Sánchez J S, et al. Combined Effects of Class Imbalance and Class Overlap on Instance-Based Classification[A] //Intelligent Data Engineering and Automated Learning–IDEAL 2006[M]. Berlin, Heidelberg: Springer, 2006: 371-378.
[10] Orriols A, Bernadó-Mansilla E. The Class Imbalance Problem in Learning Classifier Systems: A Preliminary Study[C]. In: Proceedings of the 2005 Workshops on Genetic and Evolutionary Computation. ACM, 2005: 74-78.
[11] Japkowicz N, Stephen S. The Class Imbalance Problem: A Systematic Study[J]. Intelligent Data Analysis, 2002, 6(5): 429-449.
[12] 夏战国, 夏士雄, 蔡世玉, 等.类不均衡的半监督高斯过程分类算法[J]. 通信学报, 2013, 34(5):42-51. (Xia Zhanguo, Xia Shixiong, Cai Shiyu, et al. Semi-Supervised Gaussian Process Classification Algorithm Addressing the ClassImbalance[J]. Journal on Communications, 2013, 34(5): 42-51.)
[13] Jo T, Japkowicz N. Class Imbalances Versus Small Disjuncts[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 40-49.
[14] 江颉, 王卓芳, Gong Rongsheng, 等. 不平衡数据分类方法及其在入侵检测中的应用研究[J]. 计算机科学, 2013, 40(4): 131-135. (Jiang Jie, Wang Zhuofang,Gong Rongsheng, et al. Imbalanced Data Classification and Its Application Research for Intrusion Detection[J]. Computer Science, 2013, 40(4): 131-135.)
[15] Estabrooks A, Jo T, Japkowicz N. A Multiple Resampling Method for Learning from Imbalanced Data Sets[J]. Computational Intelligence, 2004, 20(1): 18-36.
[16] Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic Minority Over-Sampling Technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
[17] Han H, Wang W Y, Mao B H. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning[C]. In: Proceedings of International Conference on intelligent Computing (ICIC 2005), Hefei, China. Berlin, Heidelberg: Springer, 2005: 878-887.
[18] Batista G E, Prati R C, Monard M C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data[J]. ACM Sigkdd Explorations Newsletter, 2004, 6(1): 20-29.
[19] Chen E, Lin Y, Xiong H, et al. Exploiting Probabilistic Topic Models to Improve Text Categorization Under Class Imbalance[J]. Information Processing & Management, 2011, 47(2): 202-214.
[20] 张清华, 王国胤, 胡军, 等. 多粒度知识获取与不确定性度量[M]. 北京: 科学出版社, 2013. (Zhang Qinghua, Wang Guoyin, Hu Jun, et al. Multi-Granularity Knowledge Acquisition and Measure of Uncertainty[M]. Beijing: Science Press, 2013.)
[21] 郭虎升, 亓慧, 王文剑. 处理非平衡数据的粒度SVM学习算法[J]. 计算机工程, 2010, 36(2): 181-183. (Guo Husheng, Qi Hui, Wang Wenjian. Granular SVM Learning Algorithm for Processing Imbalanced Data[J]. Computer Engineering, 2010, 36(2): 181-183.)
[22] 林洋港, 陈恩红. 文本分类中基于概率主题模型的噪声处理方法[J]. 计算机工程与科学, 2010, 32(7): 89-92, 119. (Lin Yanggang, Chen Enhong. A Probabilistic Topic Model Based Noise Processing Method for Text Classification[J]. Computer Engineering and Science, 2010, 32(7): 89-92, 119.)
[23] Zadeh L A. Fuzzy Sets and Information Granularity[A] //Advances in Fuzzy Set Theory and Applications[M]. Amsterdam: North-Holland Publishing Co., 1979: 3-18.
[24] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. The Journal of Machine Learning Research, 2003, 3: 993-1022.
[25] Heinrich G. Parameter Estimation for Text analysis[R]. Germany: Fraunhofer IGD, 2005.
[26] Cao J, Xia T, Li J, et al. A Density-based Method for Adaptive LDA Model Selection[J]. Neurocomputing, 2009, 72(7-9): 1775-1781.
[27] 张华平. ICTCLAS汉语分词系统[EB/OL].[2014-01-01]. http://ictclas.nlpir.org/. (Zhang Huaping. ICTCLAS Chinese Word Segmentation System[EB/OL].[2014-01-01]. http://ictclas.nlpir.org/.)
[28] 李荣陆. 复旦大学中文分类语料库[DB/OL].[2014-01-01]. http://www.datatang.com/data/43318. (Li Ronglu. Chinese Categorization Corpus from Fudan University[DB/OL].[2014-01-01]. http://www.datatang.com/data/43318. )
[29] 搜狗实验室. 文本分类语料库[DB/OL].[2013-08-22]. http://www.sogou.com/labs/dl/t.html. (Sogou Labs. Text Categorization Corpus[DB/OL].[2013-08-22]. http://www.sogou.com/labs/dl/t.html.) |