Abstract: Using documents from 1987 to 2009 in Wanfang Data, the paper collects all documents on industrial technology. Within 16 second-level categories, it computes keyword frequencies and calculates the standard deviation of keyword frequencies within the relevant categories. More than 50% of the keywords can be attributed to a single category, and nearly 90% can be placed in one to three categories. Among keywords that belong to three or more categories, 16% can be categorized when the word frequency is less than 11, and 49% can be categorized when the word frequency is 11 or greater. The test concludes that keywords can be classified with machine assistance using keyword frequency statistics and standard deviation, and that this approach performs better than the traditional classification method.
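The statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how frequencies are counted or how the standard deviation is applied. The function name `keyword_category_stats`, the record layout, the category labels, and the choice of a population standard deviation over the categories in which a keyword occurs are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def keyword_category_stats(records):
    """Summarize how each keyword is spread across categories.

    `records` is an iterable of (category, keywords) pairs, one per document.
    This layout is hypothetical; the paper's actual data format is not given.
    """
    # keyword -> Counter mapping category -> number of documents using it
    freq = defaultdict(Counter)
    for category, keywords in records:
        for kw in set(keywords):  # count a keyword once per document
            freq[kw][category] += 1

    stats = {}
    for kw, per_cat in freq.items():
        counts = list(per_cat.values())
        stats[kw] = {
            "total": sum(counts),              # overall keyword frequency
            "n_categories": len(counts),       # categories the keyword occurs in
            "std": pstdev(counts),             # assumed: population std. dev.
            "top_category": max(per_cat, key=per_cat.get),
        }
    return stats


# Toy data with made-up category labels and keywords.
sample = [
    ("TP", ["neural network", "control"]),
    ("TP", ["neural network"]),
    ("TP", ["neural network"]),
    ("TM", ["neural network", "motor"]),
    ("TM", ["motor"]),
]
stats = keyword_category_stats(sample)
for kw, row in sorted(stats.items()):
    print(kw, row)
```

Under these assumptions, a keyword that occurs in only one category has a standard deviation of zero and can be assigned directly, while a keyword spread over several categories with a large deviation is dominated by one of them and is a candidate for assignment to its `top_category`. Any thresholds for that decision would have to come from the paper itself.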
常春, 赖院根. 专业概念机器辅助分类方法研究[J]. 现代图书情报技术, 2011, 27(10): 34-39.
Chang Chun, Lai Yuangen. Research on Machine-aided Classification Methods of Domain Concepts. New Technology of Library and Information Service, 2011, 27(10): 34-39.