Construction of Keywords-Chinese Library Classification Codes Integrated Thesaurus
Yang He (1, 2), Yang Yihong (1, 2), Li Ning (2)
1. Institute of Scientific & Technical Information of China, Beijing 100038, China;
2. Beijing Wanfang Data Co., Ltd., Beijing 100038, China
Abstract
Drawing on years of large-scale manual indexing data, this paper builds a natural-language classification thesaurus. The correspondence between keywords and Chinese Library Classification (CLC) codes is analyzed with Mutual Information (MI), Chi-Square (χ²) and Maximum Likelihood Estimation (MLE). The resulting Keywords-Chinese Library Classification Codes Integrated Thesaurus is then applied to the automatic indexing of sci-tech literature, and its performance is evaluated with both closed and open tests.
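The abstract names the three association measures but does not give their formulas. Below is a minimal sketch of how they could be computed from indexing records. It makes three assumptions:

- Each manually indexed record is a pair (keywords, CLC codes).
- Every measure comes from a document-level 2×2 contingency table for a keyword t and a code c.
- MLE means the relative-frequency estimate of P(c | t).

The function names `build_counts`, `association`, `build_thesaurus` and `suggest_codes` are hypothetical illustrations, not the authors' implementation. So are the `min_mle` cut-off, the summed-MLE voting used to suggest codes, and the toy codes in the usage example.

```python
import math
from collections import Counter, defaultdict


def build_counts(records):
    """Count document frequencies from (keywords, class_codes) records."""
    n = 0
    kw_df, cls_df, pair_df = Counter(), Counter(), Counter()
    for keywords, codes in records:
        n += 1
        kws, cs = set(keywords), set(codes)
        kw_df.update(kws)                                  # docs containing keyword t
        cls_df.update(cs)                                  # docs assigned code c
        pair_df.update((k, c) for k in kws for c in cs)    # docs with both
    return n, kw_df, cls_df, pair_df


def association(t, c, n, kw_df, cls_df, pair_df):
    """Return (MI, chi-square, MLE of P(c|t)) for keyword t and code c."""
    a = pair_df[(t, c)]      # docs with keyword t and code c
    b = kw_df[t] - a         # docs with t but not c
    cc = cls_df[c] - a       # docs with c but not t
    d = n - a - b - cc       # docs with neither
    # Pointwise mutual information: log P(t,c) / (P(t) P(c)).
    mi = math.log(a * n / (kw_df[t] * cls_df[c])) if a else float("-inf")
    # Chi-square statistic of the 2x2 table.
    denom = (a + cc) * (b + d) * (a + b) * (cc + d)
    chi2 = n * (a * d - cc * b) ** 2 / denom if denom else 0.0
    # Maximum likelihood (relative frequency) estimate of P(c | t).
    mle = a / kw_df[t] if kw_df[t] else 0.0
    return mi, chi2, mle


def build_thesaurus(records, min_mle=0.5):
    """Map each keyword to codes with P(c|t) >= min_mle (hypothetical cut-off)."""
    n, kw_df, cls_df, pair_df = build_counts(records)
    thesaurus = defaultdict(list)
    for t, c in pair_df:
        _, chi2, mle = association(t, c, n, kw_df, cls_df, pair_df)
        if mle >= min_mle:
            thesaurus[t].append((c, mle, chi2))
    for entries in thesaurus.values():
        # Rank by MLE, break ties by chi-square, then by code for determinism.
        entries.sort(key=lambda e: (-e[1], -e[2], e[0]))
    return dict(thesaurus)


def suggest_codes(keywords, thesaurus, top_k=3):
    """Rank candidate codes for a new document by summed MLE weight."""
    votes = Counter()
    for k in keywords:
        for code, mle, _ in thesaurus.get(k, []):
            votes[code] += mle
    return [code for code, _ in votes.most_common(top_k)]
```

A usage example on toy data:

```python
records = [
    (["neural network", "image"], ["TP183"]),
    (["neural network"], ["TP183"]),
    (["image", "compression"], ["TN919"]),
    (["neural network", "compression"], ["TN919"]),
]
thesaurus = build_thesaurus(records)
suggest_codes(["neural network", "compression"], thesaurus)
```

The second call returns `["TN919", "TP183"]`. "compression" points to TN919 with weight 1.0, while "neural network" points to TP183 with weight 2/3.

Of the three measures, MLE is the easiest to read as a keyword-to-code mapping. χ² and MI, by contrast, also reward codes that are rare overall, which is why the sketch uses χ² only as a tie-breaker.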
Received: 07 April 2013
Published: 02 September 2013