New Technology of Library and Information Service  2010, Vol. 26 Issue (6): 17-24    DOI: 10.11925/infotech.1003-3513.2010.06.04
Construction of Natural Language Thesauri for Automatic Assistant Indexing Literature System
Yang He1,2 Yang Yihong1,2  Qiao Xiaodong1  Li Ning2  Zhu Lijun1
1(Institute of Scientific & Technical Information of China,Beijing 100038,China)
2(Beijing Wanfang Data Co.Ltd,Beijing 100038,China)
Abstract  

This paper discusses the construction of natural language thesauri for an automatic assistant indexing system for literature. Drawing on a large body of keywords assigned manually over many years, it analyzes regularities in keyword frequency, length, type, and co-occurrence, and proposes a method for constructing both a thesaurus for automatic assistant indexing and a post-controlled vocabulary.
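The kind of keyword statistics named in the abstract (frequency, length, co-occurrence) can be illustrated with a minimal sketch. This is not the authors' method: the function names, the thresholds `min_freq` and `max_len`, and the sample records are all hypothetical, and length is measured simply as character count.

```python
from collections import Counter
from itertools import combinations


def keyword_statistics(records):
    """Compute simple statistics over manually indexed keywords.

    records: a list of keyword lists, one list per indexed document.
    Returns (document frequency, length distribution, co-occurrence counts).
    """
    freq = Counter()
    cooccur = Counter()
    for keywords in records:
        # Count each keyword once per document and fix a stable pair order.
        unique = sorted(set(keywords))
        freq.update(unique)
        for a, b in combinations(unique, 2):
            cooccur[(a, b)] += 1
    # Distribution of keyword lengths (in characters) over distinct keywords.
    length_dist = Counter(len(word) for word in freq)
    return freq, length_dist, cooccur


def candidate_terms(freq, min_freq=2, max_len=20):
    """Hypothetical filter: keep keywords that are frequent and short enough."""
    return sorted(w for w, n in freq.items() if n >= min_freq and len(w) <= max_len)


# Illustrative data only; not taken from the paper.
records = [
    ["automatic indexing", "thesaurus", "keyword"],
    ["thesaurus", "keyword"],
    ["automatic indexing", "thesaurus"],
]
freq, length_dist, cooccur = keyword_statistics(records)
print(freq.most_common())
print(cooccur.most_common())
print(candidate_terms(freq, min_freq=3))
```

Frequently co-occurring pairs are one possible signal for relating terms in a thesaurus; how the paper itself weighs frequency, length, and type is not specified in the abstract.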

Key words: Automatic assistant indexing; Scientific literature processing; Thesauri of automatic assistant indexing; Keyword; Literal similarity algorithm
Received: 12 April 2010      Published: 26 July 2010
CLC Number: G254
Fund:

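*This paper is one of the research outcomes of the sub-project "Upgrading and Transformation of the Data Indexing and Processing System Based on Integrated Vocabularies" (Project No. 2006BAH03B03-02) of "Research and Implementation of the Integration and Service System of Knowledge Organization Systems", a key project of the National Key Technology R&D Program during the 11th Five-Year Plan period.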

Corresponding Authors: Yang Yihong     E-mail: yangyh@wanfangdata.com.cn

Cite this article:

Yang He, Yang Yihong, Qiao Xiaodong, Li Ning, Zhu Lijun. Construction of Natural Language Thesauri for Automatic Assistant Indexing Literature System. New Technology of Library and Information Service, 2010, 26(6): 17-24.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.06.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I6/17

