New Technology of Library and Information Service, 2013, Vol. 29, Issue 11: 60-67     https://doi.org/10.11925/infotech.1003-3513.2013.11.09
Information Analysis and Research
Research of Co-word Analysis Method of Combining Keywords Extension and Domain Ontology
Tang Xiaobo, Xiao Lu
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract: To address shortcomings of traditional co-word analysis, this paper proposes a new co-word analysis process model that improves the traditional method in two respects. First, author-assigned keywords do not fully describe a paper's subject content, so they are supplemented: high-frequency author-assigned keywords form a supplementary dictionary, candidate keywords are extracted from paper titles by word segmentation based on that dictionary, and the candidates are added according to certain rules. Second, because co-occurrence frequency alone cannot accurately describe the similarity of a word pair, a domain ontology is introduced to compute the semantic similarity of high-frequency keyword pairs, and co-occurrence frequency and semantic similarity are then combined into a relatedness value for each pair. Relatedness is used to describe word-pair similarity and serves as the basis for building the co-word matrix. Finally, experiments demonstrate the effectiveness of the improved method.
Keywords: Co-word analysis; Extension dictionary; Domain ontology
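The two improvements summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the segmentation algorithm, the supplementation rules, or the formula that combines co-occurrence frequency with semantic similarity, so everything below is an assumption made for illustration: forward maximum matching stands in for the dictionary-based segmentation, the supplementation rule simply appends title candidates the author did not list, and relatedness is taken as a weighted mean of an Ochiai-style co-occurrence score and a semantic similarity value supplied by the caller (the ontology-based computation itself is not shown). The function names and the `alpha` weight are hypothetical.

```python
from collections import Counter


def build_dictionary(keyword_lists, min_freq):
    """Supplementary dictionary: author keywords used by at least min_freq papers."""
    counts = Counter(kw for kws in keyword_lists for kw in set(kws))
    return {kw for kw, n in counts.items() if n >= min_freq}


def forward_max_match(title, dictionary):
    """Dictionary-based segmentation (assumed: greedy longest match, left to right)."""
    max_len = max((len(w) for w in dictionary), default=0)
    found, i = [], 0
    while i < len(title):
        # Try the longest possible dictionary entry starting at position i.
        for size in range(min(max_len, len(title) - i), 0, -1):
            piece = title[i:i + size]
            if piece in dictionary:
                found.append(piece)
                i += size
                break
        else:
            i += 1  # no entry starts here; move one character forward
    return found


def supplement(keywords, title, dictionary):
    """Assumed rule: append title candidates that the author did not already list."""
    extra = [w for w in forward_max_match(title, dictionary) if w not in keywords]
    return list(keywords) + list(dict.fromkeys(extra))  # drop repeated candidates


def relatedness(cooc, freq_a, freq_b, sem_sim, alpha=0.5):
    """Assumed combination: weighted mean of an Ochiai-style score and sem_sim."""
    ochiai = cooc / (freq_a * freq_b) ** 0.5 if freq_a and freq_b else 0.0
    return alpha * ochiai + (1 - alpha) * sem_sim


# Toy data, invented for illustration only.
papers = [["co-word analysis", "domain ontology"],
          ["co-word analysis"],
          ["domain ontology", "clustering"]]
dictionary = build_dictionary(papers, min_freq=2)
print(supplement(["co-word analysis"],
                 "co-word analysis with domain ontology", dictionary))
print(relatedness(cooc=2, freq_a=4, freq_b=4, sem_sim=0.8))
```

In a full pipeline under these assumptions, `relatedness` would be evaluated for every pair of high-frequency keywords and the resulting values would populate the co-word matrix in place of raw co-occurrence counts.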
Received: 2013-07-29      Published: 2013-11-29
CLC number: TP391
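Funding: This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194).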
Corresponding author: Xiao Lu     E-mail: ahjk_xiaolu@163.com
Cite this article:
Tang Xiaobo, Xiao Lu. Research of Co-word Analysis Method of Combining Keywords Extension and Domain Ontology. New Technology of Library and Information Service, 2013, 29(11): 60-67.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.11.09      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2013/V29/I11/60