New Technology of Library and Information Service  2011, Vol. 27 Issue (10): 34-39    DOI: 10.11925/infotech.1003-3513.2011.10.07
Research on Machine-aided Classification Methods of Domain Concepts
Chang Chun, Lai Yuangen
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract  Using documents from 1987 to 2009 in Wanfang Data, the paper collects all documents on industrial technology. Within 16 second-level categories, it computes keyword frequencies and calculates the standard deviation of each keyword's frequency across the relevant categories. More than 50% of keywords can be attributed to a single category, and nearly 90% can be placed in 1-3 categories. Among keywords that belong to 3 or more categories, 16% can be categorized when the word frequency is less than 11, and 49% can be categorized when the word frequency is equal to or greater than 11. The test concludes that keywords can be classified with machine assistance using keyword frequency statistics and standard deviation, and that this approach is better than the traditional classification method.
Key words: Thesaurus, Ontology, Concept, Classification, Keywords frequency
Received: 13 June 2011      Published: 03 December 2011
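
The frequency-and-deviation idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact decision rule, so the rule used here (suggest a category when a keyword's count in it exceeds the mean plus one standard deviation across all categories) is an assumption made only for illustration. The function names, the record layout, and the four category codes are likewise hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def keyword_category_counts(records):
    """Count, for each keyword, how many documents in each category use it.

    `records` is an iterable of (category, keywords) pairs, one per document.
    This layout is an assumption for the sketch, not the paper's data format.
    """
    counts = defaultdict(Counter)
    for category, keywords in records:
        for keyword in set(keywords):  # count each keyword once per document
            counts[keyword][category] += 1
    return counts


def suggest_categories(counts, categories):
    """Suggest categories for each keyword from its frequency distribution.

    Assumed rule (not from the paper): keep the categories whose count is
    greater than mean + population standard deviation over all categories.
    A keyword spread evenly over the categories gets no suggestion.
    """
    suggestions = {}
    for keyword, per_category in counts.items():
        freqs = [per_category.get(c, 0) for c in categories]
        mu = mean(freqs)
        sigma = pstdev(freqs)
        picked = [c for c, f in zip(categories, freqs) if f > mu + sigma]
        suggestions[keyword] = {
            "total": sum(freqs),
            "std": sigma,
            "categories": picked,
        }
    return suggestions


if __name__ == "__main__":
    # Illustrative category codes and toy documents (hypothetical data).
    categories = ["TP", "TN", "TM", "TH"]
    records = [
        ("TP", ["ontology", "thesaurus"]),
        ("TP", ["ontology"]),
        ("TP", ["ontology"]),
        ("TN", ["ontology", "antenna"]),
        ("TN", ["antenna"]),
        ("TP", ["sensor"]),
        ("TN", ["sensor"]),
        ("TM", ["sensor"]),
        ("TH", ["sensor"]),
    ]
    result = suggest_categories(keyword_category_counts(records), categories)
    for keyword, info in sorted(result.items()):
        print(keyword, info["total"], round(info["std"], 3), info["categories"])
```

In this toy data, a keyword concentrated in one category has a high deviation and receives a single suggestion, while a keyword spread evenly across all categories has zero deviation and is left for manual review. That mirrors the machine-aided (rather than fully automatic) workflow the abstract describes.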
CLC number: G254

Cite this article:

Chang Chun, Lai Yuangen. Research on Machine-aided Classification Methods of Domain Concepts. New Technology of Library and Information Service, 2011, 27(10): 34-39.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.10.07     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I10/34
