Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue (6): 57-65. DOI: 10.11925/infotech.2096-3467.2018.1159
Research Paper
Discovering Domain Vocabularies Based on Citation Co-word Network*
Qikai Cheng, Jiamin Wang, Wei Lu
(School of Information Management, Wuhan University, Wuhan 430072, China); (Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, China)
Abstract

[Objective] This paper discovers the basic vocabulary of a domain from academic literature, to support an understanding of the discipline's knowledge structure and development. [Methods] We introduced the citation network into co-word analysis to construct a citation co-word network among keywords, then used the PageRank algorithm to rank the importance of candidate terms. We evaluated the method empirically on 110,360 computer science articles. [Results] We compared the proposed method with the word frequency method and co-word analysis, both qualitatively and quantitatively. The proposed method performed better and fit the experts' manual selection more closely; its average precision in a blind selection experiment reached 72.6%. [Limitations] The method was only tested on computer science articles. [Conclusions] The proposed strategy, which combines the citation co-word network with the PageRank algorithm, can improve the efficiency and quality of basic vocabulary discovery in a domain.

Keywords: Basic Vocabulary; Citation Co-word Network; PageRank; Word Frequency; Co-word Analysis
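The method summarized in the abstract (a citation co-word network ranked with PageRank) can be illustrated with a minimal sketch. The abstract does not specify how edges are built or weighted, so the choices here are assumptions made for illustration only: each citation adds a directed edge from every keyword of the citing paper to every keyword of the cited paper, edge weights count such occurrences, and the damping factor is 0.85. The function names, paper identifiers, and toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_citation_coword_network(keywords, citations):
    """Build weighted directed keyword edges from paper-level citations.

    keywords:  dict mapping a paper id to its list of keywords
    citations: iterable of (citing_paper, cited_paper) pairs
    Assumption: an edge runs from each citing-paper keyword to each
    cited-paper keyword; self-loops are skipped.
    """
    edges = defaultdict(lambda: defaultdict(float))
    for citing, cited in citations:
        for src in set(keywords.get(citing, [])):
            for dst in set(keywords.get(cited, [])):
                if src != dst:
                    edges[src][dst] += 1.0
    return edges


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(edges.get(v, {}).values()) for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in edges.items():
            for dst, weight in targets.items():
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy corpus: four papers, their keywords, and who cites whom.
keywords = {
    "p1": ["deep learning", "image classification"],
    "p2": ["neural network", "image classification"],
    "p3": ["neural network", "backpropagation"],
    "p4": ["deep learning", "speech recognition"],
}
citations = [("p1", "p2"), ("p1", "p3"), ("p4", "p3"), ("p2", "p3"), ("p4", "p2")]

edges = build_citation_coword_network(keywords, citations)
ranks = pagerank(edges)
for term, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.4f}")
```

Under this edge direction, rank flows toward keywords of papers that are cited, so terms that many later works build on accumulate higher scores. A plain word frequency baseline would instead count keyword occurrences and ignore the citation structure.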
Received: 2018-10-19
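Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Vocabulary-Function-Oriented Semantic Recognition of Academic Text and Knowledge Graph Construction" (Grant No. 71473183), the National Natural Science Foundation of China Young Scientists Fund project "Research on Citation Recommendation Diversification Based on Deep Semantic Mining" (Grant No. 71704137), and the China Postdoctoral Science Foundation project "Scientific Research Resource Recommendation Based on Vocabulary Function" (Grant No. 2016M602371).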
Cite this article:
Qikai Cheng, Jiamin Wang, Wei Lu. Discovering Domain Vocabularies Based on Citation Co-word Network[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 57-65. DOI: 10.11925/infotech.2096-3467.2018.1159.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1159