Construction of Natural Language Thesauri for Automatic Assistant Indexing Literature System
Yang He1,2 Yang Yihong1,2 Qiao Xiaodong1 Li Ning2 Zhu Lijun1
1(Institute of Scientific & Technical Information of China, Beijing 100038, China)
2(Beijing Wanfang Data Co., Ltd., Beijing 100038, China)
Abstract: This paper discusses the construction of natural language thesauri for a system that assists literature indexing automatically. Drawing on a large body of keywords assigned manually over many years, it analyzes regularities in keyword frequency, length, type, and co-occurrence, and proposes a method for constructing a thesaurus that supports automatic assisted indexing and serves as a post-controlled vocabulary.
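The abstract does not spell out how the statistics are computed, so the following is only a minimal sketch of the kind of analysis it names, not the authors' procedure. It assumes each document's manually assigned keywords are available as a list of strings; the sample `records` and the function name `keyword_statistics` are hypothetical. It covers frequency, length, and co-occurrence; keyword type is left out because the abstract gives no type scheme.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: one keyword list per manually indexed document.
records = [
    ["thesaurus", "automatic indexing", "natural language"],
    ["automatic indexing", "natural language", "keyword"],
    ["thesaurus", "post-controlled vocabulary"],
]

def keyword_statistics(records):
    """Return (document frequency, length distribution, co-occurrence counts)."""
    freq = Counter()    # keyword -> number of documents it was assigned to
    cooc = Counter()    # (keyword_a, keyword_b) -> number of shared documents
    for keywords in records:
        unique = sorted(set(keywords))         # ignore repeats within a document
        freq.update(unique)
        cooc.update(combinations(unique, 2))   # unordered pairs, alphabetical
    # Length in characters, counted once per distinct keyword.
    lengths = Counter(len(keyword) for keyword in freq)
    return freq, lengths, cooc

freq, lengths, cooc = keyword_statistics(records)
print(freq.most_common(3))
print(cooc.most_common(1))
```

Frequently co-occurring pairs are natural candidates for related-term links in a thesaurus, while frequency and length thresholds could be used to filter candidate entry terms; any such thresholds would have to be chosen from the real indexing data.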
Received: 12 April 2010
Published: 26 July 2010
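Fund: *This paper is one of the research outcomes of the sub-project "Upgrading the Data Indexing and Processing System Based on an Integrated Vocabulary" (Project No. 2006BAH03B03-02) of the key project "Research and Implementation of the Integration and Service System of Knowledge Organization Systems" under the National Key Technology R&D Program of the 11th Five-Year Plan.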
Corresponding Authors:
Yang Yihong
E-mail: yangyh@wanfangdata.com.cn