New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 107-113    DOI: 10.11925/infotech.1003-3513.2013.07-08.16
Construction of Keywords-Chinese Library Classification Codes Integrated Thesaurus
Yang He1,2, Yang Yihong1,2, Li Ning2
1. Institute of Scientific & Technical Information of China, Beijing 100038, China;
2. Beijing Wanfang Data Co., Ltd., Beijing 100038, China
Abstract  Drawing on many years of large-scale manual indexing data, this paper constructs a natural-language classification thesaurus, using Mutual Information (MI), Chi-Square (χ²) and Maximum Likelihood Estimation (MLE) to analyze the correspondence between keywords and Chinese Library Classification codes. The performance of the resulting Keywords-Chinese Library Classification Codes Integrated Thesaurus in automatic indexing of scientific and technical literature is evaluated with closed and open tests.
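The abstract names the three association measures but does not give their formulas. The following is a minimal sketch, assuming the standard textbook definitions computed from a keyword/class-code contingency table over indexed documents; the paper's exact variants may differ. The function name `association_scores`, the record layout and the sample data are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def association_scores(records):
    """Score every co-occurring (keyword, class code) pair.

    records: iterable of (keywords, codes) pairs, one per manually
    indexed document. This layout is an assumption of the sketch.
    """
    n_docs = 0
    kw_count = Counter()    # documents containing each keyword
    code_count = Counter()  # documents assigned each class code
    pair_count = Counter()  # documents with both keyword and code

    for keywords, codes in records:
        n_docs += 1
        kws, cs = set(keywords), set(codes)
        kw_count.update(kws)
        code_count.update(cs)
        pair_count.update((k, c) for k in kws for c in cs)

    scores = {}
    for (k, c), a in pair_count.items():
        b = kw_count[k] - a          # keyword present, code absent
        c_only = code_count[c] - a   # code present, keyword absent
        d = n_docs - a - b - c_only  # neither present

        # Pointwise mutual information: log P(k, c) / (P(k) P(c)).
        mi = math.log(a * n_docs / ((a + b) * (a + c_only)))

        # Chi-square statistic of the 2x2 contingency table.
        denom = (a + c_only) * (b + d) * (a + b) * (c_only + d)
        chi2 = n_docs * (a * d - c_only * b) ** 2 / denom if denom else 0.0

        # Maximum likelihood estimate of P(code | keyword).
        mle = a / (a + b)

        scores[(k, c)] = {"mi": mi, "chi2": chi2, "mle": mle}
    return scores


# Hypothetical toy data: four documents with keywords and class codes.
sample = [
    (["thesaurus", "indexing"], ["G254"]),
    (["thesaurus"], ["G254"]),
    (["indexing"], ["TP391"]),
    (["classification"], ["TP391"]),
]
scores = association_scores(sample)
```

In a thesaurus of this kind, each keyword would presumably be linked to the class codes whose scores exceed some threshold; how the three measures are combined or thresholded is not stated in the abstract, so it is left out of the sketch.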
Key words: Keywords-Chinese Library Classification Codes Integrated Thesaurus; Literature processing; Automatic indexing; Automatic categorization
Received: 07 April 2013      Published: 02 September 2013
CLC number: G254
Cite this article:

Yang He, Yang Yihong, Li Ning. Construction of Keywords-Chinese Library Classification Codes Integrated Thesaurus. New Technology of Library and Information Service, 2013, 29(7/8): 107-113.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.07-08.16     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I7/8/107
