New Technology of Library and Information Service  2011, Vol. 27 Issue (10): 34-39    DOI: 10.11925/infotech.1003-3513.2011.10.07
Research on Machine-aided Classification Methods of Domain Concepts
Chang Chun, Lai Yuangen
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract  Using documents published from 1987 to 2009 in Wanfang Data, the paper collects all documents in the field of industrial technology. Within 16 second-level categories, it computes keyword frequencies and calculates the standard deviation of keyword frequencies within the relevant categories. More than 50% of the keywords can be attributed to a single category, and nearly 90% fall into one to three categories. For keywords that belong to three or more categories, 16% can be categorized when the word frequency is below 11, and 49% can be categorized when the word frequency is 11 or higher. The test concludes that machine-aided classification of keywords using keyword frequency statistics and standard deviation is feasible and performs better than the traditional classification method.
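The abstract does not spell out the decision rule, so the following is only a minimal sketch of how keyword frequency and standard deviation could be combined to suggest categories. The 16 category labels (`C01` to `C16`), the function names, and the "share above mean plus `k` standard deviations" threshold are all illustrative assumptions, not the authors' published procedure.

```python
from statistics import mean, pstdev

# Assumption: 16 placeholder labels standing in for the 16 second-level categories.
CATEGORIES = [f"C{i:02d}" for i in range(1, 17)]


def category_shares(counts):
    """Turn one keyword's per-category frequencies into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("keyword has no occurrences")
    # Categories missing from `counts` are treated as frequency 0.
    return {c: counts.get(c, 0) / total for c in CATEGORIES}


def suggest_categories(counts, k=1.0):
    """Suggest categories for one keyword.

    Hypothetical rule: keep every category whose share exceeds the mean
    share by more than k population standard deviations. Returns the
    suggested categories and the standard deviation of the shares.
    """
    shares = category_shares(counts)
    values = list(shares.values())
    sigma = pstdev(values)
    threshold = mean(values) + k * sigma
    chosen = [c for c in CATEGORIES if shares[c] > threshold]
    return chosen, sigma


if __name__ == "__main__":
    # A keyword concentrated in one category versus one spread evenly.
    print(suggest_categories({"C01": 40, "C02": 5, "C03": 5}))
    print(suggest_categories({c: 10 for c in CATEGORIES}))
```

Under these assumptions, a keyword concentrated in one category yields a large standard deviation and a single suggestion, while a keyword spread evenly over all 16 categories yields a standard deviation of zero and no suggestion, leaving it for manual review.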
Key words: Thesaurus; Ontology; Concept; Classification; Keywords frequency
Received: 13 June 2011      Published: 03 December 2011
CLC Number: G254

Cite this article:

Chang Chun, Lai Yuangen. Research on Machine-aided Classification Methods of Domain Concepts. New Technology of Library and Information Service, 2011, 27(10): 34-39.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.10.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I10/34
